* [PATCH 0/2] block: rq_affinity default and reserved tag limits
From: Robert Elliott @ 2014-09-10 0:17 UTC
To: axboe, elliott, hch, linux-kernel
The following series changes the default blk-mq rq_affinity so that
completions are handled on the submitting CPU, and makes the block
layer return an error when too many reserved tags are requested.
---
Robert Elliott (2):
block: default to rq_affinity=2 for blk-mq
block: return error if too many reserved tags are requested
block/blk-mq.c | 8 +++++---
include/linux/blkdev.h | 3 ++-
2 files changed, 7 insertions(+), 4 deletions(-)
--
Robert Elliott, HP Server Storage
* [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
From: Robert Elliott @ 2014-09-10 0:18 UTC
To: axboe, elliott, hch, linux-kernel
From: Robert Elliott <elliott@hp.com>
One change introduced by blk-mq is that it does all
the completion work in hard irq context rather than
soft irq context.
On a 6 core system, if all interrupts are routed to
one CPU, then you can easily run into this:
* 5 CPUs submitting IOs
* 1 CPU spending 100% of its time in hard irq context
processing IO completions, not able to submit anything
itself
Example with CPU5 receiving all interrupts:
CPU usage: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
%usr: 0.00 3.03 1.01 2.02 2.00 0.00
%sys: 14.58 75.76 14.14 4.04 78.00 0.00
%irq: 0.00 0.00 0.00 1.01 0.00 100.00
%soft: 0.00 0.00 0.00 0.00 0.00 0.00
%iowait idle: 85.42 21.21 84.85 92.93 20.00 0.00
%idle: 0.00 0.00 0.00 0.00 0.00 0.00
When the submitting CPUs are forced to process their own
completion interrupts, this steals time from new
submissions and self-throttles them.
Without that, there is no direct feedback to the
submitters to slow down. The only feedback is:
* reaching max queue depth
* lots of timeouts, resulting in aborts, resets, soft
lockups and self-detected stalls on CPU5, bogus
clocksource tsc unstable reports, network
drop-offs, etc.
The SCSI LLD can set affinity_hint for each of its
interrupts to request that a program like irqbalance
route the interrupts back to the submitting CPU.
The latest version of irqbalance ignores those hints,
though, instead offering an option to run a policy
script that could honor them. Otherwise, it balances
them based on its own algorithms. So, we cannot rely
on this.
Hardware might perform interrupt coalescing to help,
but it cannot help 1 CPU keep up with the work
generated by many other CPUs.
rq_affinity=2 helps by pushing most of the block layer
and SCSI midlayer completion work back to the submitting
CPU (via an IPI).
Change the default to rq_affinity=2 under blk-mq
so there's at least some feedback to slow down the
submitters.
Signed-off-by: Robert Elliott <elliott@hp.com>
---
include/linux/blkdev.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 518b465..9f41a02 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -522,7 +522,8 @@ struct request_queue {
(1 << QUEUE_FLAG_ADD_RANDOM))
#define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
- (1 << QUEUE_FLAG_SAME_COMP))
+ (1 << QUEUE_FLAG_SAME_COMP) | \
+ (1 << QUEUE_FLAG_SAME_FORCE))
static inline void queue_lockdep_assert_held(struct request_queue *q)
{
* [PATCH 2/2] block: return error if too many reserved tags are requested
From: Robert Elliott @ 2014-09-10 0:18 UTC
To: axboe, elliott, hch, linux-kernel
From: Robert Elliott <elliott@hp.com>
Make blk_mq_alloc_tag_set return an error if set->reserved_tags
is greater than BLK_MQ_MAX_DEPTH minus the minimum number of
tags, since:
* set->queue_depth is truncated to that value
* set->reserved_tags needs to be less than set->queue_depth
Signed-off-by: Robert Elliott <elliott@hp.com>
---
block/blk-mq.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c49fe00..dc2970d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1936,16 +1936,18 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
return -EINVAL;
if (!set->queue_depth)
return -EINVAL;
+ if (set->reserved_tags > BLK_MQ_MAX_DEPTH - BLK_MQ_TAG_MIN)
+ return -EINVAL;
if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN)
return -EINVAL;
if (!set->nr_hw_queues || !set->ops->queue_rq || !set->ops->map_queue)
return -EINVAL;
- if (set->queue_depth > BLK_MQ_MAX_DEPTH) {
+ if (set->queue_depth > BLK_MQ_MAX_DEPTH - set->reserved_tags) {
+ set->queue_depth = BLK_MQ_MAX_DEPTH - set->reserved_tags;
pr_info("blk-mq: reduced tag depth to %u\n",
- BLK_MQ_MAX_DEPTH);
- set->queue_depth = BLK_MQ_MAX_DEPTH;
+ set->queue_depth);
}
set->tags = kmalloc_node(set->nr_hw_queues *
* Re: [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
From: Jens Axboe @ 2014-09-10 18:14 UTC
To: Robert Elliott, elliott, hch, linux-kernel
On 09/09/2014 06:18 PM, Robert Elliott wrote:
> [full patch description snipped]
I don't think we should do this generically. For "sane" devices with
multiple completion queues, and with proper affinity setting in the
driver, this is going to be a loss.
So let's not add it to QUEUE_FLAG_MQ_DEFAULT, but we can make it default
for nr_hw_queues == 1. I think that would be way saner.
--
Jens Axboe
* Re: [PATCH 2/2] block: return error if too many reserved tags are requested
From: Jens Axboe @ 2014-09-10 18:17 UTC
To: Robert Elliott, elliott, hch, linux-kernel
On 09/09/2014 06:18 PM, Robert Elliott wrote:
> From: Robert Elliott <elliott@hp.com>
>
> [commit message and unchanged diff context snipped]
> - if (set->queue_depth > BLK_MQ_MAX_DEPTH) {
> + if (set->queue_depth > BLK_MQ_MAX_DEPTH - set->reserved_tags) {
This is harder to read than:
if (set->queue_depth + set->reserved_tags > BLK_MQ_MAX_DEPTH) {
which more clearly expresses the same thing. But set->queue_depth is the
total pool, reserved tags come out of that pool. So I don't think this
is correct.
--
Jens Axboe
* RE: [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
From: Elliott, Robert (Server Storage) @ 2014-09-10 19:35 UTC
To: Jens Axboe, Robert Elliott, hch@lst.de,
linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Wednesday, 10 September, 2014 1:15 PM
> To: Robert Elliott; Elliott, Robert (Server Storage); hch@lst.de;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
>
> On 09/09/2014 06:18 PM, Robert Elliott wrote:
> > [full patch description snipped]
>
> I don't think we should do this generically. For "sane" devices with
> multiple completion queues, and with proper affinity setting in the
> driver, this is going to be a loss.
>
> So let's not add it to QUEUE_FLAG_MQ_DEFAULT, but we can make it default
> for nr_hw_queues == 1. I think that would be way saner.
>
> --
> Jens Axboe
If the interrupt does arrive on the submitting CPU, then it
meets the criteria for all the cases:
* 1: complete on any CPU
* 2: complete on submitting CPU's node (QUEUE_FLAG_SAME_COMP)
* 3: complete on submitting CPU (QUEUE_FLAG_SAME_FORCE)
and _blk_complete_request handles it locally rather
than sending an IPI.
	if (req->cpu != -1) {
		ccpu = req->cpu;
		if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
			shared = cpus_share_cache(cpu, ccpu);
	} else
		ccpu = cpu;
	...
	if (ccpu == cpu || shared) {
		struct list_head *list;
do_local:
	...
	} else if (raise_blk_irq(ccpu, req))
		goto do_local;
Are you saying you want the blk_queue_bio submission to
not even set the req->cpu field (which defaulted to -1):
	if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
		req->cpu = raw_smp_processor_id();
when you expect the interrupt routing is good so that
_blk_complete_request can avoid the test_bit and
cpus_share_cache calls?
With irqbalance no longer honoring affinity_hint
by default, I'm worried that most LLDs will not find
their interrupts routed that way anymore. That's
how we ran into this; scsi-mq + kernel-3.17 on an
up-to-date RHEL 6.5 distro (which now carries the
new irqbalance).
We plan to create a policy script for the new irqbalance
for hpsa devices, but other high-IOPS drivers will hit
the same problem.
--
Rob Elliott, HP Server Storage
* Re: [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
From: Jens Axboe @ 2014-09-10 19:51 UTC
To: Elliott, Robert (Server Storage), Robert Elliott, hch@lst.de,
linux-kernel@vger.kernel.org
On 09/10/2014 01:35 PM, Elliott, Robert (Server Storage) wrote:
>
>
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@kernel.dk]
>> Sent: Wednesday, 10 September, 2014 1:15 PM
>> To: Robert Elliott; Elliott, Robert (Server Storage); hch@lst.de;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
>>
>> On 09/09/2014 06:18 PM, Robert Elliott wrote:
>>> [full patch description snipped]
>>
>> I don't think we should do this generically. For "sane" devices with
>> multiple completion queues, and with proper affinity setting in the
>> driver, this is going to be a loss.
>>
>> So let's not add it to QUEUE_FLAG_MQ_DEFAULT, but we can make it default
>> for nr_hw_queues == 1. I think that would be way saner.
>>
>> --
>> Jens Axboe
>
> If the interrupt does arrive on the submitting CPU, then it
> meets the criteria for all the cases:
> * 1: complete on any CPU
> * 2: complete on submitting CPU's node (QUEUE_FLAG_SAME_COMP)
> * 3: complete on submitting CPU (QUEUE_FLAG_SAME_FORCE)
>
> and _blk_complete_request handles it locally rather
> than sending an IPI.
>
> if (req->cpu != -1) {
> ccpu = req->cpu;
> if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
> shared = cpus_share_cache(cpu, ccpu);
> } else
> ccpu = cpu;
> ...
> if (ccpu == cpu || shared) {
> struct list_head *list;
> do_local:
> ...
> } else if (raise_blk_irq(ccpu, req))
> goto do_local;
I forgot about the shared case being handled appropriately, so that
should probably be fine to do. My primary concern here is a performance
penalty on sync IO; I'll run a few tests on a single IRQ case (like the
mtip32xx) and see how that performs. But you are right, it might not be
a bad thing to do by default.
> Are you saying you want the blk_queue_bio submission to
> not even set the req->cpu field (which defaulted to -1):
> if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
> req->cpu = raw_smp_processor_id();
>
> when you expect the interrupt routing is good so that
> _blk_complete_request can avoid the test_bit and
> cpus_share_cache calls?
No, and since those are non-serializing tests, I suspect that if we
start adding a branch to avoid them, we will negate any potential win.
The flags should basically never get dirtied. Well, I guess they could
for heavy uses of start/stop queue, but that might be something that's
worthwhile to tackle separately.
> With irqbalance no longer honoring affinity_hint
> by default, I'm worried that most LLDs will not find
> their interrupts routed that way anymore. That's
> how we ran into this; scsi-mq + kernel-3.17 on an
> up-to-date RHEL 6.5 distro (which now carries the
> new irqbalance).
>
> We plan to create a policyscript for the new irqbalance
> for hpsa devices, but other high-IOPS drivers will hit
> the same problem.
irqbalance has _always_ been a pain in the butt... Suboptimal or
changing behaviour from release to release, it's been one of the most
annoying parts of performance tuning.
--
Jens Axboe