All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org,
	linux-kernel@vger.kernel.org, hpa@zytor.com, jeremy@goop.org,
	cpw@sgi.com
Subject: [patch] x86: optimize __pa() to be linear again on 64-bit x86
Date: Mon, 23 Feb 2009 14:38:04 +0100	[thread overview]
Message-ID: <20090223133804.GA20468@elte.hu> (raw)
In-Reply-To: <20090223101741.GP9582@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> > Are __pa()/__va() that hot paths?  Or am I over-estimating 
> > the cost of 2MB dTLB?
> 
> yes, __pa()/__va() is a very hot path - in a defconfig they 
> are used in about a thousand different places.
> 
> In fact it would be nice to get rid of the __phys_addr() 
> redirection on the 64-bit side (which is non-linear and a 
> function there, and all __pa()s go through it) and make it a 
> constant offset again.
> 
> This isnt trivial/possible to do though as .data/.bss is in 
> the high alias. (high .text aliases alone wouldnt be a big 
> issue to fix, but the data aliases are an issue.)
> 
> Moving .data/.bss into the linear space isnt feasible as we'd 
> lose RIP-relative addressing shortcuts.
> 
> Maybe we could figure out the places that do __pa() on a high 
> alias and gradually eliminate them. __pa() on .data/.bss is a 
> rare and unusal thing to do, and CONFIG_DEBUG_VIRTUAL could 
> warn about them without crashing the kernel.
> 
> Later on we could make this check unconditional, and then 
> switch over __pa() to addr-PAGE_OFFSET in the 
> !CONFIG_DEBUG_VIRTUAL case (which is the default).

Ok, i couldnt resist and using ftrace_printk() (regular printk 
in __pa() would hang during bootup) and came up with the patch 
below - which allows the second patch below that does:

 -#define __pa(x)		__phys_addr((unsigned long)(x))
 +#define __pa(x)		((unsigned long)(x)-PAGE_OFFSET)

It cuts a nice (and hotly executed) ~650 bytes chunk out of the 
x86 64-bit defconfig kernel text:

    text	   data	    bss	    dec	    hex	filename
 7999071	1137780	 843672	9980523	 984a6b	vmlinux.before
 7998414	1137780	 843672	9979866	 9847da	vmlinux.after

And it even boots.

(the load_cr3() hack needs to be changed, by setting the init 
pgdir from init_level4_pgt to __va(__pa_symbol(init_level4_pgt).)

(32-bit is untested and likely wont even build.)

It's not even that bad and looks quite maintainable as a 
concept.

This also means that __va() and __pa() will be one and the same 
thing simple arithmetics again on both 32-bit and 64-bit 
kernels.

	Ingo

---
 arch/x86/include/asm/page.h          |    4 +++-
 arch/x86/include/asm/page_64_types.h |    1 +
 arch/x86/include/asm/pgalloc.h       |    4 ++--
 arch/x86/include/asm/pgtable.h       |    2 +-
 arch/x86/include/asm/processor.h     |    7 ++++++-
 arch/x86/kernel/setup.c              |   12 ++++++------
 arch/x86/mm/init_64.c                |    6 +++---
 arch/x86/mm/ioremap.c                |   12 +++++++++++-
 arch/x86/mm/pageattr.c               |   28 ++++++++++++++--------------
 arch/x86/mm/pgtable.c                |    2 +-
 10 files changed, 48 insertions(+), 30 deletions(-)

Index: linux/arch/x86/include/asm/page.h
===================================================================
--- linux.orig/arch/x86/include/asm/page.h
+++ linux/arch/x86/include/asm/page.h
@@ -34,10 +34,11 @@ static inline void copy_user_page(void *
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 #define __pa(x)		__phys_addr((unsigned long)(x))
+#define __pa_slow(x)		__phys_addr_slow((unsigned long)(x))
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
-#define __pa_symbol(x)	__pa(__phys_reloc_hide((unsigned long)(x)))
+#define __pa_symbol(x)	__pa_slow(__phys_reloc_hide((unsigned long)(x)))
 
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
 
@@ -49,6 +50,7 @@ static inline void copy_user_page(void *
  * virt_addr_valid(kaddr) returns true.
  */
 #define virt_to_page(kaddr)	pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
+#define virt_to_page_slow(kaddr) pfn_to_page(__pa_slow(kaddr) >> PAGE_SHIFT)
 #define pfn_to_kaddr(pfn)      __va((pfn) << PAGE_SHIFT)
 extern bool __virt_addr_valid(unsigned long kaddr);
 #define virt_addr_valid(kaddr)	__virt_addr_valid((unsigned long) (kaddr))
Index: linux/arch/x86/include/asm/page_64_types.h
===================================================================
--- linux.orig/arch/x86/include/asm/page_64_types.h
+++ linux/arch/x86/include/asm/page_64_types.h
@@ -67,6 +67,7 @@ extern unsigned long max_pfn;
 extern unsigned long phys_base;
 
 extern unsigned long __phys_addr(unsigned long);
+extern unsigned long __phys_addr_slow(unsigned long);
 #define __phys_reloc_hide(x)	(x)
 
 #define vmemmap ((struct page *)VMEMMAP_START)
Index: linux/arch/x86/include/asm/pgalloc.h
===================================================================
--- linux.orig/arch/x86/include/asm/pgalloc.h
+++ linux/arch/x86/include/asm/pgalloc.h
@@ -51,8 +51,8 @@ extern void __pte_free_tlb(struct mmu_ga
 static inline void pmd_populate_kernel(struct mm_struct *mm,
 				       pmd_t *pmd, pte_t *pte)
 {
-	paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
-	set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+	paravirt_alloc_pte(mm, __pa_symbol(pte) >> PAGE_SHIFT);
+	set_pmd(pmd, __pmd(__pa_symbol(pte) | _PAGE_TABLE));
 }
 
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
Index: linux/arch/x86/include/asm/pgtable.h
===================================================================
--- linux.orig/arch/x86/include/asm/pgtable.h
+++ linux/arch/x86/include/asm/pgtable.h
@@ -20,7 +20,7 @@
  * for zero-mapped memory areas etc..
  */
 extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
-#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
+#define ZERO_PAGE(vaddr) (virt_to_page_slow(empty_zero_page))
 
 extern spinlock_t pgd_lock;
 extern struct list_head pgd_list;
Index: linux/arch/x86/include/asm/processor.h
===================================================================
--- linux.orig/arch/x86/include/asm/processor.h
+++ linux/arch/x86/include/asm/processor.h
@@ -186,9 +186,14 @@ static inline void native_cpuid(unsigned
 	    : "0" (*eax), "2" (*ecx));
 }
 
+extern pgd_t init_level4_pgt[];
+
 static inline void load_cr3(pgd_t *pgdir)
 {
-	write_cr3(__pa(pgdir));
+	if (pgdir == init_level4_pgt)
+		write_cr3((unsigned long)(pgdir) - __START_KERNEL_map);
+	else
+		write_cr3(__pa(pgdir));
 }
 
 #ifdef CONFIG_X86_32
Index: linux/arch/x86/kernel/setup.c
===================================================================
--- linux.orig/arch/x86/kernel/setup.c
+++ linux/arch/x86/kernel/setup.c
@@ -733,12 +733,12 @@ void __init setup_arch(char **cmdline_p)
 	init_mm.brk = (unsigned long) &_end;
 #endif
 
-	code_resource.start = virt_to_phys(_text);
-	code_resource.end = virt_to_phys(_etext)-1;
-	data_resource.start = virt_to_phys(_etext);
-	data_resource.end = virt_to_phys(_edata)-1;
-	bss_resource.start = virt_to_phys(&__bss_start);
-	bss_resource.end = virt_to_phys(&__bss_stop)-1;
+	code_resource.start = __pa_symbol(_text);
+	code_resource.end = __pa_symbol(_etext)-1;
+	data_resource.start = __pa_symbol(_etext);
+	data_resource.end = __pa_symbol(_edata)-1;
+	bss_resource.start = __pa_symbol(&__bss_start);
+	bss_resource.end = __pa_symbol(&__bss_stop)-1;
 
 #ifdef CONFIG_CMDLINE_BOOL
 #ifdef CONFIG_CMDLINE_OVERRIDE
Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -965,11 +965,11 @@ void free_init_pages(char *what, unsigne
 	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
 
 	for (; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
+		ClearPageReserved(virt_to_page_slow(addr));
+		init_page_count(virt_to_page_slow(addr));
 		memset((void *)(addr & ~(PAGE_SIZE-1)),
 			POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
+		free_page((unsigned long)__va(__pa_symbol(addr)));
 		totalram_pages++;
 	}
 #endif
Index: linux/arch/x86/mm/ioremap.c
===================================================================
--- linux.orig/arch/x86/mm/ioremap.c
+++ linux/arch/x86/mm/ioremap.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/mmiotrace.h>
+#include <linux/ftrace.h>
 
 #include <asm/cacheflush.h>
 #include <asm/e820.h>
@@ -29,7 +30,7 @@ static inline int phys_addr_valid(unsign
 	return addr < (1UL << boot_cpu_data.x86_phys_bits);
 }
 
-unsigned long __phys_addr(unsigned long x)
+unsigned long __phys_addr_slow(unsigned long x)
 {
 	if (x >= __START_KERNEL_map) {
 		x -= __START_KERNEL_map;
@@ -43,6 +44,15 @@ unsigned long __phys_addr(unsigned long 
 	}
 	return x;
 }
+EXPORT_SYMBOL(__phys_addr_slow);
+
+unsigned long __phys_addr(unsigned long x)
+{
+	if (x >= __START_KERNEL_map)
+		ftrace_printk("__phys_addr() done on symbol: %p\n", (void *)x);
+
+	return __phys_addr_slow(x);
+}
 EXPORT_SYMBOL(__phys_addr);
 
 bool __virt_addr_valid(unsigned long x)
Index: linux/arch/x86/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86/mm/pageattr.c
+++ linux/arch/x86/mm/pageattr.c
@@ -90,12 +90,12 @@ static inline void split_page_count(int 
 
 static inline unsigned long highmap_start_pfn(void)
 {
-	return __pa(_text) >> PAGE_SHIFT;
+	return __pa_symbol(_text) >> PAGE_SHIFT;
 }
 
 static inline unsigned long highmap_end_pfn(void)
 {
-	return __pa(roundup((unsigned long)_end, PMD_SIZE)) >> PAGE_SHIFT;
+	return __pa_symbol(roundup((unsigned long)_end, PMD_SIZE)) >> PAGE_SHIFT;
 }
 
 #endif
@@ -266,8 +266,8 @@ static inline pgprot_t static_protection
 	 * The .rodata section needs to be read-only. Using the pfn
 	 * catches all aliases.
 	 */
-	if (within(pfn, __pa((unsigned long)__start_rodata) >> PAGE_SHIFT,
-		   __pa((unsigned long)__end_rodata) >> PAGE_SHIFT))
+	if (within(pfn, __pa_symbol((unsigned long)__start_rodata) >> PAGE_SHIFT,
+		   __pa_symbol((unsigned long)__end_rodata) >> PAGE_SHIFT))
 		pgprot_val(forbidden) |= _PAGE_RW;
 
 	prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));
@@ -555,7 +555,7 @@ static int __cpa_process_fault(struct cp
 	if (within(vaddr, PAGE_OFFSET,
 		   PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))) {
 		cpa->numpages = 1;
-		cpa->pfn = __pa(vaddr) >> PAGE_SHIFT;
+		cpa->pfn = __pa_symbol(vaddr) >> PAGE_SHIFT;
 		return 0;
 	} else {
 		WARN(1, KERN_WARNING "CPA: called for zero pte. "
@@ -901,7 +901,7 @@ int set_memory_uc(unsigned long addr, in
 	/*
 	 * for now UC MINUS. see comments in ioremap_nocache()
 	 */
-	if (reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
+	if (reserve_memtype(__pa_symbol(addr), __pa_symbol(addr) + numpages * PAGE_SIZE,
 			    _PAGE_CACHE_UC_MINUS, NULL))
 		return -EINVAL;
 
@@ -918,9 +918,9 @@ int set_memory_array_uc(unsigned long *a
 	 * for now UC MINUS. see comments in ioremap_nocache()
 	 */
 	for (i = 0; i < addrinarray; i++) {
-		start = __pa(addr[i]);
+		start = __pa_symbol(addr[i]);
 		for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
-			if (end != __pa(addr[i + 1]))
+			if (end != __pa_symbol(addr[i + 1]))
 				break;
 			i++;
 		}
@@ -932,12 +932,12 @@ int set_memory_array_uc(unsigned long *a
 				    __pgprot(_PAGE_CACHE_UC_MINUS), 1);
 out:
 	for (i = 0; i < addrinarray; i++) {
-		unsigned long tmp = __pa(addr[i]);
+		unsigned long tmp = __pa_symbol(addr[i]);
 
 		if (tmp == start)
 			break;
 		for (end = tmp + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
-			if (end != __pa(addr[i + 1]))
+			if (end != __pa_symbol(addr[i + 1]))
 				break;
 			i++;
 		}
@@ -958,7 +958,7 @@ int set_memory_wc(unsigned long addr, in
 	if (!pat_enabled)
 		return set_memory_uc(addr, numpages);
 
-	if (reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
+	if (reserve_memtype(__pa_symbol(addr), __pa_symbol(addr) + numpages * PAGE_SIZE,
 		_PAGE_CACHE_WC, NULL))
 		return -EINVAL;
 
@@ -974,7 +974,7 @@ int _set_memory_wb(unsigned long addr, i
 
 int set_memory_wb(unsigned long addr, int numpages)
 {
-	free_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE);
+	free_memtype(__pa_symbol(addr), __pa_symbol(addr) + numpages * PAGE_SIZE);
 
 	return _set_memory_wb(addr, numpages);
 }
@@ -985,11 +985,11 @@ int set_memory_array_wb(unsigned long *a
 	int i;
 
 	for (i = 0; i < addrinarray; i++) {
-		unsigned long start = __pa(addr[i]);
+		unsigned long start = __pa_symbol(addr[i]);
 		unsigned long end;
 
 		for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
-			if (end != __pa(addr[i + 1]))
+			if (end != __pa_symbol(addr[i + 1]))
 				break;
 			i++;
 		}
Index: linux/arch/x86/mm/pgtable.c
===================================================================
--- linux.orig/arch/x86/mm/pgtable.c
+++ linux/arch/x86/mm/pgtable.c
@@ -77,7 +77,7 @@ static void pgd_ctor(pgd_t *pgd)
 				swapper_pg_dir + KERNEL_PGD_BOUNDARY,
 				KERNEL_PGD_PTRS);
 		paravirt_alloc_pmd_clone(__pa(pgd) >> PAGE_SHIFT,
-					 __pa(swapper_pg_dir) >> PAGE_SHIFT,
+					 __pa_symbol(swapper_pg_dir) >> PAGE_SHIFT,
 					 KERNEL_PGD_BOUNDARY,
 					 KERNEL_PGD_PTRS);
 	}

---
 arch/x86/include/asm/page.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/include/asm/page.h
===================================================================
--- linux.orig/arch/x86/include/asm/page.h
+++ linux/arch/x86/include/asm/page.h
@@ -33,7 +33,7 @@ static inline void copy_user_page(void *
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
-#define __pa(x)		__phys_addr((unsigned long)(x))
+#define __pa(x)		((unsigned long)(x)-PAGE_OFFSET)
 #define __pa_slow(x)		__phys_addr_slow((unsigned long)(x))
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.

  reply	other threads:[~2009-02-23 13:38 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-18 12:04 [PATCHSET x86/core/percpu] implement dynamic percpu allocator Tejun Heo
2009-02-18 12:04 ` [PATCH 01/10] vmalloc: call flush_cache_vunmap() from unmap_kernel_range() Tejun Heo
2009-02-19 12:06   ` Nick Piggin
2009-02-19 22:36     ` David Miller
2009-02-18 12:04 ` [PATCH 02/10] module: fix out-of-range memory access Tejun Heo
2009-02-19 12:08   ` Nick Piggin
2009-02-20  7:16   ` Tejun Heo
2009-02-18 12:04 ` [PATCH 03/10] module: reorder module pcpu related functions Tejun Heo
2009-02-18 12:04 ` [PATCH 04/10] alloc_percpu: change percpu_ptr to per_cpu_ptr Tejun Heo
2009-02-18 12:04 ` [PATCH 05/10] alloc_percpu: add align argument to __alloc_percpu Tejun Heo
2009-02-18 12:04 ` [PATCH 06/10] percpu: kill percpu_alloc() and friends Tejun Heo
2009-02-19  0:17   ` Rusty Russell
2009-03-11 18:36   ` Tony Luck
2009-03-11 22:44     ` Rusty Russell
2009-03-12  2:06     ` Tejun Heo
2009-02-18 12:04 ` [PATCH 07/10] vmalloc: implement vm_area_register_early() Tejun Heo
2009-02-19  0:55   ` Tejun Heo
2009-02-19 12:09   ` Nick Piggin
2009-02-18 12:04 ` [PATCH 08/10] vmalloc: add un/map_kernel_range_noflush() Tejun Heo
2009-02-19 12:17   ` Nick Piggin
2009-02-20  1:27     ` Tejun Heo
2009-02-20  7:15   ` Subject: [PATCH 08/10 UPDATED] " Tejun Heo
2009-02-20  8:32     ` Andrew Morton
2009-02-21  3:21       ` Tejun Heo
2009-02-18 12:04 ` [PATCH 09/10] percpu: implement new dynamic percpu allocator Tejun Heo
2009-02-19 10:10   ` Andrew Morton
2009-02-19 11:01     ` Ingo Molnar
2009-02-20  2:45       ` Tejun Heo
2009-02-19 12:07     ` Rusty Russell
2009-02-20  2:35     ` Tejun Heo
2009-02-20  3:04       ` Andrew Morton
2009-02-20  5:29         ` Tejun Heo
2009-02-24  2:52         ` Rusty Russell
2009-02-19 11:51   ` Rusty Russell
2009-02-20  3:01     ` Tejun Heo
2009-02-20  3:02       ` Tejun Heo
2009-02-24  2:56       ` Rusty Russell
2009-02-24  5:27         ` [PATCH tj-percpu] percpu: add __read_mostly to variables which are mostly read only Tejun Heo
2009-02-24  5:47         ` [PATCH 09/10] percpu: implement new dynamic percpu allocator Tejun Heo
2009-02-24 17:41           ` Luck, Tony
2009-02-26  3:17             ` Tejun Heo
2009-02-27 19:41               ` Luck, Tony
2009-02-19 12:36   ` Nick Piggin
2009-02-20  3:04     ` Tejun Heo
2009-02-20  7:30   ` [PATCH UPDATED " Tejun Heo
2009-02-20  8:37     ` Andrew Morton
2009-02-21  3:23       ` Tejun Heo
2009-02-21  3:42         ` [PATCH tj-percpu] percpu: s/size/bytes/g in new percpu allocator and interface Tejun Heo
2009-02-21  7:48           ` Tejun Heo
2009-02-21  7:55             ` [PATCH tj-percpu] percpu: clean up size usage Tejun Heo
2009-02-21  7:56               ` Tejun Heo
2009-02-18 12:04 ` [PATCH 10/10] x86: convert to the new dynamic percpu allocator Tejun Heo
2009-02-18 13:43 ` [PATCHSET x86/core/percpu] implement " Ingo Molnar
2009-02-19  0:31   ` Tejun Heo
2009-02-19 10:51   ` Rusty Russell
2009-02-19 11:06     ` Ingo Molnar
2009-02-19 12:14       ` Rusty Russell
2009-02-20  3:08         ` Tejun Heo
2009-02-20  5:36           ` Tejun Heo
2009-02-20  7:33             ` Tejun Heo
2009-02-19  0:30 ` Tejun Heo
2009-02-19 11:07   ` Ingo Molnar
2009-02-20  3:17     ` Tejun Heo
2009-02-20  9:32       ` Ingo Molnar
2009-02-21  7:10         ` Tejun Heo
2009-02-21  7:33           ` Tejun Heo
2009-02-22 19:38             ` Ingo Molnar
2009-02-23  0:43               ` Tejun Heo
2009-02-23 10:17                 ` Ingo Molnar
2009-02-23 13:38                   ` Ingo Molnar [this message]
2009-02-23 14:08                     ` [patch] x86: optimize __pa() to be linear again on 64-bit x86 Nick Piggin
2009-02-23 14:53                       ` Ingo Molnar
2009-02-24 16:00                         ` Andi Kleen
2009-02-27  5:57                         ` Tejun Heo
2009-02-27  6:57                           ` Ingo Molnar
2009-02-27  7:11                             ` Tejun Heo
2009-02-22 19:27           ` [PATCHSET x86/core/percpu] implement dynamic percpu allocator Ingo Molnar
2009-02-23  0:47             ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090223133804.GA20468@elte.hu \
    --to=mingo@elte.hu \
    --cc=cpw@sgi.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.