Netdev List
* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: David Miller @ 2007-08-18  0:00 UTC (permalink / raw)
  To: rdreier; +Cc: jeff, netdev, linux-kernel, general
In-Reply-To: <adafy2hyc04.fsf@cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Fri, 17 Aug 2007 16:31:07 -0700

>  > >  > When using RDMA you lose the capability to do packet shaping,
>  > >  > classification, and all the other wonderful networking facilities
>  > >  > you've grown to love and use over the years.
>  > > 
>  > > Same thing with TSO and LRO and who knows what else.
>  > 
>  > Not true at all.  Full classification and filtering still is usable
>  > with TSO and LRO.
> 
> Well, obviously with TSO and LRO the packets that the stack sends or
> receives are not the same as what's on the wire.  Whether that breaks
> your wonderful networking facilities or not depends on the specifics
> of the particular facility I guess -- for example shaping is clearly
> broken by TSO.  (And people can wonder what the packet trains TSO
> creates do to congestion control on the internet, but the netdev crowd
> has already decided that TSO is "good" and RDMA is "bad")

This is also a series of falsehoods.  All packet filtering,
queue management, and packet scheduling facilities work perfectly
fine and as designed with both LRO and TSO.

When problems come up, they are bugs, and we fix them.

Please stop spreading this FUD about TSO and LRO.

The fact is that RDMA bypasses the whole stack so that supporting
these facilities is not even _POSSIBLE_.  With stateless offloads it
is possible to support all of these facilities, and we do.

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-17 23:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Paul Mackerras, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm,
	jesper.juhl, linux-arch, zlynx, satyam, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708162045150.30176@woody.linux-foundation.org>

On Thu, Aug 16, 2007 at 08:50:30PM -0700, Linus Torvalds wrote:
> Just try it yourself:
> 
> 	volatile int i;
> 	int j;
> 
> 	int testme(void)
> 	{
> 	        return i <= 1;
> 	}
> 
> 	int testme2(void)
> 	{
> 	        return j <= 1;
> 	}
> 
> and compile with all the optimizations you can.
> 
> I get:
> 
> 	testme:
> 	        movl    i(%rip), %eax
> 	        subl    $1, %eax
> 	        setle   %al
> 	        movzbl  %al, %eax
> 	        ret
> 
> vs
> 
> 	testme2:
> 	        xorl    %eax, %eax
> 	        cmpl    $1, j(%rip)
> 	        setle   %al
> 	        ret
> 
> (now, whether that "xorl + setle" is better than "setle + movzbl", I don't 
> really know - maybe it is. But that's not the point. The point is the
> difference between
> 
>                 movl    i(%rip), %eax
>                 subl    $1, %eax
> 
> and
> 
>                 cmpl    $1, j(%rip)

gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 23:55 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Paul Mackerras, heiko.carstens, horms, Linux Kernel Mailing List,
	rpjday, ak, netdev, cfriesen, Nick Piggin, linux-arch,
	jesper.juhl, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708180512300.3666@enigma.security.iitk.ac.in>

>>> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
>>>
>>> [ This is exactly equivalent to using "+m" in the constraints, as 
>>> recently
>>>   explained on a GCC list somewhere, in response to the patch in my 
>>> bitops
>>>   series a few weeks back where I thought "+m" was bogus. ]
>>
>> [It wasn't explained on a GCC list in response to your patch, as
>> far as I can see -- if I missed it, please point me to an archived
>> version of it].
>
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

Ah yes, that old thread, thank you.

> That's when _I_ came to know how GCC interprets "+m", but probably
> this has been explained on those lists multiple times. Who cares,
> anyway?

I just couldn't find the thread you meant; I thought I might have
missed it, that's all :-)


Segher


^ permalink raw reply

* Re: [RFC] restore netdev_priv optimization
From: David Miller @ 2007-08-17 23:56 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20070817161928.6b6302fd@freepuppy.rosehill.hemminger.net>

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Fri, 17 Aug 2007 16:19:28 -0700

> The subqueue is only referenced in start/stop queue and that only happens
> once per packet on normal tx, and only if multiqueue is used.

If it only happens when multiqueue, then why does loopback need
at least one queue entry there even though it's not multiqueue? :-)

This thing is deref'd regardless of multi-queue.

Once per TX packet is a lot, and it's the same amount of derefs
as dev->priv will have on the TX path.

^ permalink raw reply

* [PATCH] i386: optimize memset of 6 and 8 bytes
From: Stephen Hemminger @ 2007-08-17 23:50 UTC (permalink / raw)
  To: David S. Miller, Andrew Morton; +Cc: linux-kernel, netdev

The network code does memset for 6 and 8 byte values, that can easily
be optimized into simple assignments without string instructions.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>


--- a/include/asm-i386/string.h	2007-08-17 15:14:37.000000000 -0700
+++ b/include/asm-i386/string.h	2007-08-17 16:49:10.000000000 -0700
@@ -228,6 +228,14 @@ static __always_inline void * __constant
 		case 4:
 			*(unsigned long *)s = pattern;
 			return s;
+		case 6:
+			*(unsigned long *)s = pattern;
+			*(2+(unsigned short *)s) = pattern;
+			return s;
+		case 8:
+			*(unsigned long *)s = pattern;
+			*(1+(unsigned long *)s) = pattern;
+			return s;
 	}
 #define COMMON(x) \
 __asm__  __volatile__( \
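
A minimal user-space sketch of the idea in this patch, assuming a 32-bit
fill pattern built the same way the i386 helper builds it (the fill byte
replicated into every byte of the word). The names set6() and set8() are
hypothetical and are not kernel functions; memcpy() of a fixed size is
used for the stores so the sketch stays free of alignment and aliasing
assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not the kernel's constant-count memset code. */

/* Fill 6 bytes: one 4-byte store followed by one 2-byte store. */
static inline void *set6(void *s, unsigned char c)
{
	uint32_t pattern = 0x01010101u * c;	/* c replicated into 4 bytes */
	uint16_t tail = (uint16_t)pattern;	/* low 2 bytes, both equal to c */

	memcpy(s, &pattern, sizeof(pattern));		/* bytes 0..3 */
	memcpy((char *)s + 4, &tail, sizeof(tail));	/* bytes 4..5 */
	return s;
}

/* Fill 8 bytes: two 4-byte stores. */
static inline void *set8(void *s, unsigned char c)
{
	uint32_t pattern = 0x01010101u * c;

	memcpy(s, &pattern, sizeof(pattern));		/* bytes 0..3 */
	memcpy((char *)s + 4, &pattern, sizeof(pattern));	/* bytes 4..7 */
	return s;
}
```

With a constant size, an optimizing compiler can typically turn each
memcpy() here into a single plain store, which is the effect the patch
is after for the 6 and 8 byte cases.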

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-17 23:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang
In-Reply-To: <a51ab88dc00f8d7fd44f79400671d259@kernel.crashing.org>



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > No it does not have any volatile semantics. atomic_dec() can be
> > > > reordered
> > > > at will by the compiler within the current basic unit if you do not add
> > > > a
> > > > barrier.
> > > 
> > > "volatile" has nothing to do with reordering.
> > 
> > If you're talking of "volatile" the type-qualifier keyword, then
> > http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> > otherwise.
> 
> I'm not sure what in that mail you mean, but anyway...
> 
> Yes, of course, the fact that "volatile" creates a side effect
> prevents certain things from being reordered wrt the atomic_dec();
> but the atomic_dec() has a side effect *already* so the volatile
> doesn't change anything.

That's precisely what that sub-thread (read down to the last mail
there, and not the first mail only) shows. So yes, "volatile" does
have something to do with re-ordering (as guaranteed by the C
standard).


> > > atomic_dec() writes
> > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > long as the compiler cannot optimise the atomic variable away
> > > completely -- any store counts as a side effect.
> > 
> > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > or one that uses a "forget" macro would have the same re-ordering
> > guarantees as an atomic_dec() that uses a volatile access cast.
> 
> The "asm volatile" implementation does have exactly the same
> reordering guarantees as the "volatile cast" thing,

I don't think so.

> if that is
> implemented by GCC in the "obvious" way.  Even a "plain" asm()
> will do the same.

Read the relevant GCC documentation.

[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-17 23:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Paul Mackerras, heiko.carstens, horms, Linux Kernel Mailing List,
	rpjday, ak, netdev, cfriesen, Nick Piggin, linux-arch,
	jesper.juhl, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang
In-Reply-To: <f5796b5aa38ae6305c65ffa0340ba295@kernel.crashing.org>



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> > 
> > [ This is exactly equivalent to using "+m" in the constraints, as recently
> >   explained on a GCC list somewhere, in response to the patch in my bitops
> >   series a few weeks back where I thought "+m" was bogus. ]
> 
> [It wasn't explained on a GCC list in response to your patch, as
> far as I can see -- if I missed it, please point me to an archived
> version of it].

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

is a follow-up in the thread on the gcc-patches@gcc.gnu.org mailing list,
which began with:

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01677.html

that was posted by Jan Hubicka, as he quotes in that initial posting,
after I had submitted:

http://lkml.org/lkml/2007/7/23/252

which was a (wrong) patch to "rectify" what I thought was the "bogus"
"+m" constraint, as per the quoted extract from gcc docs (that was
given in that (wrong) patch's changelog).

That's when _I_ came to know how GCC interprets "+m", but probably
this has been explained on those lists multiple times. Who cares,
anyway?


> One last time: it isn't equivalent on older (but still supported
> by Linux) versions of GCC.  Current versions of GCC allow it, but
> it has no documented behaviour at all, so use it at your own risk.

True.

^ permalink raw reply

* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier @ 2007-08-17 23:31 UTC (permalink / raw)
  To: David Miller; +Cc: jeff, netdev, linux-kernel, general
In-Reply-To: <20070817.142756.65194845.davem@davemloft.net>

 > >  > When using RDMA you lose the capability to do packet shaping,
 > >  > classification, and all the other wonderful networking facilities
 > >  > you've grown to love and use over the years.
 > > 
 > > Same thing with TSO and LRO and who knows what else.
 > 
 > Not true at all.  Full classification and filtering still is usable
 > with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or
receives are not the same as what's on the wire.  Whether that breaks
your wonderful networking facilities or not depends on the specifics
of the particular facility I guess -- for example shaping is clearly
broken by TSO.  (And people can wonder what the packet trains TSO
creates do to congestion control on the internet, but the netdev crowd
has already decided that TSO is "good" and RDMA is "bad")

 - R.

^ permalink raw reply

* [PATCH] ethernet: optimize memcpy and memset
From: Stephen Hemminger @ 2007-08-17 23:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

The ethernet header management only needs to handle a fixed
size address (6 bytes). If the memcpy/memset are changed to
be passed a constant length, then compiler can optimize for
this case (and if it is smart eliminate string instructions).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

--- a/net/ethernet/eth.c	2007-08-17 12:08:51.000000000 -0400
+++ b/net/ethernet/eth.c	2007-08-17 12:16:33.000000000 -0400
@@ -91,10 +91,10 @@ int eth_header(struct sk_buff *skb, stru
 
 	if (!saddr)
 		saddr = dev->dev_addr;
-	memcpy(eth->h_source, saddr, dev->addr_len);
+	memcpy(eth->h_source, saddr, ETH_ALEN);
 
 	if (daddr) {
-		memcpy(eth->h_dest, daddr, dev->addr_len);
+		memcpy(eth->h_dest, daddr, ETH_ALEN);
 		return ETH_HLEN;
 	}
 
@@ -103,7 +103,7 @@ int eth_header(struct sk_buff *skb, stru
 	 */
 
 	if (dev->flags & (IFF_LOOPBACK | IFF_NOARP)) {
-		memset(eth->h_dest, 0, dev->addr_len);
+		memset(eth->h_dest, 0, ETH_ALEN);
 		return ETH_HLEN;
 	}
 
@@ -135,7 +135,7 @@ int eth_rebuild_header(struct sk_buff *s
 		       "%s: unable to resolve type %X addresses.\n",
 		       dev->name, (int)eth->h_proto);
 
-		memcpy(eth->h_source, dev->dev_addr, dev->addr_len);
+		memcpy(eth->h_source, dev->dev_addr, ETH_ALEN);
 		break;
 	}
 
@@ -233,8 +233,8 @@ int eth_header_cache(struct neighbour *n
 		return -1;
 
 	eth->h_proto = type;
-	memcpy(eth->h_source, dev->dev_addr, dev->addr_len);
-	memcpy(eth->h_dest, neigh->ha, dev->addr_len);
+	memcpy(eth->h_source, dev->dev_addr, ETH_ALEN);
+	memcpy(eth->h_dest, neigh->ha, ETH_ALEN);
 	hh->hh_len = ETH_HLEN;
 	return 0;
 }
@@ -251,7 +251,7 @@ void eth_header_cache_update(struct hh_c
 			     unsigned char *haddr)
 {
 	memcpy(((u8 *) hh->hh_data) + HH_DATA_OFF(sizeof(struct ethhdr)),
-	       haddr, dev->addr_len);
+	       haddr, ETH_ALEN);
 }
 
 /**
@@ -271,7 +271,7 @@ static int eth_mac_addr(struct net_devic
 		return -EBUSY;
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
-	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
 	return 0;
 }
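
A short sketch of the difference the changelog describes, assuming a
stand-in header type (struct ethhdr_sketch is hypothetical; ETH_ALEN is
6 as in the kernel headers). The first function copies a length known
only at run time, the second a compile-time constant.

```c
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6	/* octets in an Ethernet address */

struct ethhdr_sketch {			/* hypothetical stand-in for struct ethhdr */
	unsigned char h_dest[ETH_ALEN];
	unsigned char h_source[ETH_ALEN];
	unsigned short h_proto;
};

/* Length only known at run time: the compiler has to emit a general copy. */
static void set_source_runtime(struct ethhdr_sketch *eth,
			       const unsigned char *saddr, size_t addr_len)
{
	memcpy(eth->h_source, saddr, addr_len);
}

/* Constant length: the compiler is free to use a few fixed-size moves. */
static void set_source_const(struct ethhdr_sketch *eth,
			     const unsigned char *saddr)
{
	memcpy(eth->h_source, saddr, ETH_ALEN);
}
```

Both functions produce the same bytes whenever addr_len equals ETH_ALEN;
only the code the compiler is able to generate differs.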
 

^ permalink raw reply

* Re: [RFC] restore netdev_priv optimization
From: Stephen Hemminger @ 2007-08-17 23:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20070817.160409.88475506.davem@davemloft.net>

On Fri, 17 Aug 2007 16:04:09 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Fri, 17 Aug 2007 15:40:22 -0700
> 
> > Compile tested only!!!
> 
> Obviously.  The first loopback transmit is guaranteed to crash.

That is fixable.

> > Fix optimization of netdev_priv() lost by the  addition of multiqueue.
> > Move the variable size subqueues to after the constant size priv area.
> > 
> > When putting back the old netdev_priv() code, I tried to make it
> > clearer by using roundup() and ALIGN() macros.
> > 
> > --- a/include/linux/netdevice.h	2007-08-17 12:08:51.000000000 -0400
> > +++ b/include/linux/netdevice.h	2007-08-17 12:48:03.000000000 -0400
> > @@ -575,16 +575,15 @@ struct net_device
> >  
> >  	/* The TX queue control structures */
> >  	unsigned int			egress_subqueue_count;
> > -	struct net_device_subqueue	egress_subqueue[1];
> > +	struct net_device_subqueue	*egress_subqueue;
> >  };
> 
> This just trades off the dev->priv dereference for a
> dev->egress_subqueue dereference.  I bet they occur about equally in
> the data paths, at least on transmit.

The subqueue is only referenced in start/stop queue and that only happens
once per packet on normal tx, and only if multiqueue is used.

The existing multiqueue penalizes all devices, not just multiqueue devices.

> And this also breaks loopback again, which uses a static struct netdev
> in the kernel image, it doesn't use alloc_netdev(), so egress_subqueue
> of loopback will be NULL.

That can be overcome.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 23:17 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang
In-Reply-To: <alpine.LFD.0.999.0708180002290.3666@enigma.security.iitk.ac.in>

>>> No it does not have any volatile semantics. atomic_dec() can be 
>>> reordered
>>> at will by the compiler within the current basic unit if you do not 
>>> add a
>>> barrier.
>>
>> "volatile" has nothing to do with reordering.
>
> If you're talking of "volatile" the type-qualifier keyword, then
> http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> otherwise.

I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that "volatile" creates a side effect
prevents certain things from being reordered wrt the atomic_dec();
but the atomic_dec() has a side effect *already* so the volatile
doesn't change anything.

>> atomic_dec() writes
>> to memory, so it _does_ have "volatile semantics", implicitly, as
>> long as the compiler cannot optimise the atomic variable away
>> completely -- any store counts as a side effect.
>
> I don't think an atomic_dec() implemented as an inline "asm volatile"
> or one that uses a "forget" macro would have the same re-ordering
> guarantees as an atomic_dec() that uses a volatile access cast.

The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing, if that is
implemented by GCC in the "obvious" way.  Even a "plain" asm()
will do the same.


Segher

^ permalink raw reply

* Re: [RFC] restore netdev_priv optimization
From: David Miller @ 2007-08-17 23:04 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20070817154022.0f3afba4@freepuppy.rosehill.hemminger.net>

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Fri, 17 Aug 2007 15:40:22 -0700

> Compile tested only!!!

Obviously.  The first loopback transmit is guaranteed to crash.

> Fix optimization of netdev_priv() lost by the  addition of multiqueue.
> Move the variable size subqueues to after the constant size priv area.
> 
> When putting back the old netdev_priv() code, I tried to make it
> clearer by using roundup() and ALIGN() macros.
> 
> --- a/include/linux/netdevice.h	2007-08-17 12:08:51.000000000 -0400
> +++ b/include/linux/netdevice.h	2007-08-17 12:48:03.000000000 -0400
> @@ -575,16 +575,15 @@ struct net_device
>  
>  	/* The TX queue control structures */
>  	unsigned int			egress_subqueue_count;
> -	struct net_device_subqueue	egress_subqueue[1];
> +	struct net_device_subqueue	*egress_subqueue;
>  };

This just trades off the dev->priv dereference for a
dev->egress_subqueue dereference.  I bet they occur about equally in
the data paths, at least on transmit.

And this also breaks loopback again, which uses a static struct netdev
in the kernel image, it doesn't use alloc_netdev(), so egress_subqueue
of loopback will be NULL.


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 22:49 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Paul Mackerras, heiko.carstens, horms, Linux Kernel Mailing List,
	rpjday, ak, netdev, cfriesen, Nick Piggin, linux-arch,
	jesper.juhl, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708171016160.3666@enigma.security.iitk.ac.in>

> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
>
> [ This is exactly equivalent to using "+m" in the constraints, as 
> recently
>   explained on a GCC list somewhere, in response to the patch in my 
> bitops
>   series a few weeks back where I thought "+m" was bogus. ]

[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].

One last time: it isn't equivalent on older (but still supported
by Linux) versions of GCC.  Current versions of GCC allow it, but
it has no documented behaviour at all, so use it at your own risk.


Segher
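
For reference, a small sketch of the two spellings being compared,
assuming GCC-style extended asm. forget() is the macro quoted above;
forget_plus(), flag and count_until_set() are hypothetical names added
only for illustration. As noted above, the two forms are not equivalent
on older GCC versions, so this is not a recommendation of either.

```c
/* From the thread: empty asm naming the same lvalue as output and input. */
#define forget(a)	__asm__ __volatile__ ("" : "=m" (a) : "m" (a))

/* The "+m" spelling under discussion. */
#define forget_plus(a)	__asm__ __volatile__ ("" : "+m" (a))

static int flag;	/* hypothetical shared variable, not volatile */

/*
 * The asm claims to write flag, so the compiler cannot keep a cached
 * copy of it across iterations and reloads it for every test.
 */
static int count_until_set(int limit)
{
	int spins = 0;

	while (!flag && spins < limit) {
		forget(flag);
		spins++;
	}
	return spins;
}
```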


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 22:38 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708170957350.3666@enigma.security.iitk.ac.in>

>>> Here, I should obviously admit that the semantics of *(volatile int 
>>> *)&
>>> aren't any neater or well-defined in the _language standard_ at all. 
>>> The
>>> standard does say (verbatim) "precisely what constitutes as access to
>>> object of volatile-qualified type is implementation-defined", but GCC
>>> does help us out here by doing the right thing.
>>
>> Where do you get that idea?
>
> Try a testcase (experimentally verify).

That doesn't prove anything.  Experiments can only disprove
things.

>> GCC manual, section 6.1, "When
>> is a Volatile Object Accessed?" doesn't say anything of the
>> kind.
>
> True, "implementation-defined" as per the C standard _is_ supposed to 
> mean
> "unspecified behaviour where each implementation documents how the 
> choice
> is made". So ok, probably GCC isn't "documenting" this
> implementation-defined behaviour which it is supposed to, but can't 
> really
> fault them much for this, probably.

GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.


Segher


^ permalink raw reply

* [RFC] restore netdev_priv optimization
From: Stephen Hemminger @ 2007-08-17 22:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Compile tested only!!!

Fix optimization of netdev_priv() lost by the  addition of multiqueue.
Move the variable size subqueues to after the constant size priv area.

When putting back the old netdev_priv() code, I tried to make it
clearer by using roundup() and ALIGN() macros.

--- a/include/linux/netdevice.h	2007-08-17 12:08:51.000000000 -0400
+++ b/include/linux/netdevice.h	2007-08-17 12:48:03.000000000 -0400
@@ -575,16 +575,15 @@ struct net_device
 
 	/* The TX queue control structures */
 	unsigned int			egress_subqueue_count;
-	struct net_device_subqueue	egress_subqueue[1];
+	struct net_device_subqueue	*egress_subqueue;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
 #define	NETDEV_ALIGN		32
-#define	NETDEV_ALIGN_CONST	(NETDEV_ALIGN - 1)
 
 static inline void *netdev_priv(const struct net_device *dev)
 {
-	return dev->priv;
+	return (void *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
 }
 
 #define SET_MODULE_OWNER(dev) do { } while (0)
--- a/net/core/dev.c	2007-08-17 09:02:57.000000000 -0400
+++ b/net/core/dev.c	2007-08-17 12:48:30.000000000 -0400
@@ -3706,7 +3706,22 @@ EXPORT_SYMBOL(dev_get_stats);
  *	@queue_count:	the number of subqueues to allocate
  *
  *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
+ *	and performs basic initialization.
+ *
+ *      Layout:
+ *              allocation->+------------+
+ *                          | (pad)      | dev->padded
+ *                   dev -->+------------+ -- (32 byte boundary)
+ *                          | net_device |
+ *      netdev_priv(dev) -->+------------+ -- (32 byte boundary)
+ *                          | device     |
+ *                          | private    |
+ *			    |            |
+ * dev->egress_subqueue  -->+------------+ -- (32 byte boundary)
+ *                          | Tx queue(s)|
+ *                          +------------+
+ *
+ * Also allocates subqueue structs
  *	for each queue on the device at the end of the netdevice.
  */
 struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
@@ -3719,10 +3734,9 @@ struct net_device *alloc_netdev_mq(int s
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
 	/* ensure 32-byte alignment of both the device and private area */
-	alloc_size = (sizeof(*dev) + NETDEV_ALIGN_CONST +
-		     (sizeof(struct net_device_subqueue) * (queue_count - 1))) &
-		     ~NETDEV_ALIGN_CONST;
-	alloc_size += sizeof_priv + NETDEV_ALIGN_CONST;
+	alloc_size = roundup(sizeof(*dev), NETDEV_ALIGN);
+	alloc_size += roundup(sizeof_priv, NETDEV_ALIGN);
+	alloc_size += sizeof(struct net_device_subqueue) * queue_count;
 
 	p = kzalloc(alloc_size, GFP_KERNEL);
 	if (!p) {
@@ -3730,19 +3744,20 @@ struct net_device *alloc_netdev_mq(int s
 		return NULL;
 	}
 
-	dev = (struct net_device *)
-		(((long)p + NETDEV_ALIGN_CONST) & ~NETDEV_ALIGN_CONST);
+	/* Pad so that network device is on cache line boundary	 */
+	dev = (struct net_device *) ALIGN((unsigned long) p, NETDEV_ALIGN);
 	dev->padded = (char *)dev - (char *)p;
 
-	if (sizeof_priv) {
-		dev->priv = ((char *)dev +
-			     ((sizeof(struct net_device) +
-			       (sizeof(struct net_device_subqueue) *
-				(queue_count - 1)) + NETDEV_ALIGN_CONST)
-			      & ~NETDEV_ALIGN_CONST));
-	}
+	if (sizeof_priv)
+		dev->priv = netdev_priv(dev);
 
+	/* Put subqueue(s) which are variable size after fix sized priv area */
 	dev->egress_subqueue_count = queue_count;
+	dev->egress_subqueue = ALIGN((unsigned long)(netdev_priv(dev) + sizeof_priv),
+				     NETDEV_ALIGN);
+
+	pr_debug("alloc_netdev_mq: p=%p dev=%p priv=%p q=%p\n",
+	       p, dev, dev->priv, dev->egress_subqueue);
 
 	setup(dev);
 	strcpy(dev->name, name);
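
The layout in the comment above can be sketched as plain address
arithmetic. This is a user-space illustration only: struct layout,
compute_layout() and ALIGN_UP() are hypothetical names, and the sizes
are passed in as parameters rather than taken from struct net_device.

```c
#include <stddef.h>
#include <stdint.h>

#define NETDEV_ALIGN	32
/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a)	(((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

struct layout {			/* hypothetical, for illustration only */
	uintptr_t dev;		/* start of the net_device */
	uintptr_t priv;		/* what netdev_priv(dev) would return */
	uintptr_t subqueue;	/* start of the Tx subqueue array */
	size_t    padded;	/* bytes skipped in front of dev */
};

/* p is the address returned by the allocator. */
static struct layout compute_layout(uintptr_t p, size_t sizeof_dev,
				    size_t sizeof_priv)
{
	struct layout l;

	l.dev      = ALIGN_UP(p, NETDEV_ALIGN);
	l.padded   = (size_t)(l.dev - p);
	l.priv     = l.dev + ALIGN_UP(sizeof_dev, NETDEV_ALIGN);
	l.subqueue = ALIGN_UP(l.priv + sizeof_priv, NETDEV_ALIGN);
	return l;
}
```

Because priv sits at a fixed offset from dev, it can be computed without
a load, whereas subqueue depends on sizeof_priv and has to be stored.
Whatever padding ends up in front of dev also has to be covered by the
allocation size.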

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 22:34 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, davids, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708170945300.3666@enigma.security.iitk.ac.in>

> Now the second wording *IS* technically correct, but come on, it's
> 24 words long whereas the original one was 3 -- and hopefully anybody
> reading the shorter phrase *would* have known anyway what was meant,
> without having to be pedantic about it :-)

Well you were talking pretty formal (and detailed) stuff, so
IMHO it's good to have that exactly correct.  Sure it's nicer
to use small words most of the time :-)


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 22:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, heiko.carstens, horms, linux-kernel, rpjday, ak,
	netdev, cfriesen, akpm, Nick Piggin, linux-arch, jesper.juhl,
	satyam, zlynx, clameter, schwidefsky, Chris Snook, Herbert Xu,
	davem, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708162033400.30176@woody.linux-foundation.org>

> In a reasonable world, gcc should just make that be (on x86)
>
> 	addl $1,i(%rip)
>
> on x86-64, which is indeed what it does without the volatile. But with 
> the
> volatile, the compiler gets really nervous, and doesn't dare do it in 
> one
> instruction, and thus generates crap like
>
>         movl    i(%rip), %eax
>         addl    $1, %eax
>         movl    %eax, i(%rip)
>
> instead. For no good reason, except that "volatile" just doesn't have 
> any
> good/clear semantics for the compiler, so most compilers will just 
> make it
> be "I will not touch this access in any way, shape, or form". Including
> even trivially correct instruction optimization/combination.

It's just a (target-specific, perhaps) missed-optimisation kind
of bug in GCC.  Care to file a bug report?

> but is
> (again) something that gcc doesn't dare do, since "i" is volatile.

Just nobody taught it it can do this; perhaps no one wanted to
add optimisations like that, maybe with a reasoning like "people
who hit the go-slow-in-unspecified-ways button should get what
they deserve" ;-)


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 22:14 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, Christoph Lameter, Paul Mackerras, heiko.carstens,
	Stefan Richter, horms, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang
In-Reply-To: <46C512EB.7020603@yahoo.com.au>

> (and yes, it is perfectly legitimate to
> want a non-volatile read for a data type that you also want to do
> atomic RMW operations on)

...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-17 22:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Satyam Sharma, Ilpo Järvinen,
	Linux Kernel Mailing List, David Miller, Paul E. McKenney, ak,
	Netdev, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708161953310.30176@woody.linux-foundation.org>

> Of course, since *normal* accesses aren't necessarily limited wrt
> re-ordering, the question then becomes one of "with regard to *what* 
> does
> it limit re-ordering?".
>
> A C compiler that re-orders two different volatile accesses that have a
> sequence point in between them is pretty clearly a buggy compiler. So 
> at a
> minimum, it limits re-ordering wrt other volatiles (assuming sequence
> points exists). It also means that the compiler cannot move it
> speculatively across conditionals, but other than that it's starting to
> get fuzzy.

This is actually really well-defined in C, not fuzzy at all.
"Volatile accesses" are a side effect, and no side effects can
be reordered with respect to sequence points.  The side effects
that matter in the kernel environment are: 1) accessing a volatile
object; 2) modifying an object; 3) volatile asm(); 4) calling a
function that does any of these.

We certainly should avoid volatile whenever possible, but "because
it's fuzzy wrt reordering" is not a reason -- all alternatives have
exactly the same issues.


Segher
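
A minimal sketch of side effect 1) in the list above, assuming the
"volatile access cast" form discussed elsewhere in this thread.
read_once(), counter and sample_twice() are hypothetical names. Whether
a read through such a cast counts as a volatile access is
implementation-defined, as the thread points out, so treat this as an
illustration of the ordering rule rather than a guarantee.

```c
/* Hypothetical helper: one read of a plain int through a volatile cast. */
static inline int read_once(const int *p)
{
	return *(const volatile int *)p;
}

static int counter;	/* deliberately not declared volatile */

/*
 * Two volatile reads, each ending at a sequence point: if the cast is
 * honoured, the compiler must perform both, in this order, and may not
 * merge them into a single load.
 */
static int sample_twice(void)
{
	int first = read_once(&counter);
	int second = read_once(&counter);

	return second - first;
}
```

Ordinary, non-volatile reads of counter elsewhere remain free to be
optimised, which is the point of putting the qualifier on the access
rather than on the declaration.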


^ permalink raw reply

* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: David Miller @ 2007-08-17 21:27 UTC (permalink / raw)
  To: rdreier; +Cc: tom, jeff, swise, mshefty, netdev, linux-kernel, general
In-Reply-To: <adaodh6x7js.fsf@cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Fri, 17 Aug 2007 12:52:39 -0700

>  > When using RDMA you lose the capability to do packet shaping,
>  > classification, and all the other wonderful networking facilities
>  > you've grown to love and use over the years.
> 
> Same thing with TSO and LRO and who knows what else.

Not true at all.  Full classification and filtering still is usable
with TSO and LRO.

^ permalink raw reply

* [PATCH] e1000e: Update e1000e driver to use devres
From: Brandon Philips @ 2007-08-17 20:25 UTC (permalink / raw)
  To: Kok, Auke, jgarzik; +Cc: Tejun Heo, e1000-devel, netdev
In-Reply-To: <46C48B2E.9040105@intel.com>

Conversion of e1000e probe() and remove() to devres.

Depends on "[patch 1/4] Update net core to use devres."

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/net/e1000e/netdev.c |   70 ++++++++++----------------------------------
 1 file changed, 17 insertions(+), 53 deletions(-)

Index: linux-netdev/drivers/net/e1000e/netdev.c
===================================================================
--- linux-netdev.orig/drivers/net/e1000e/netdev.c
+++ linux-netdev/drivers/net/e1000e/netdev.c
@@ -2516,6 +2516,7 @@ void e1000e_reinit_locked(struct e1000_a
 static int __devinit e1000_sw_init(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
+	struct pci_dev *pdev = adapter->pdev;
 	struct net_device *netdev = adapter->netdev;
 
 	adapter->rx_buffer_len = ETH_FRAME_LEN + VLAN_HLEN + ETH_FCS_LEN;
@@ -2523,11 +2524,13 @@ static int __devinit e1000_sw_init(struc
 	hw->mac.max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
 	hw->mac.min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
 
-	adapter->tx_ring = kzalloc(sizeof(struct e1000_ring), GFP_KERNEL);
+	adapter->tx_ring = devm_kzalloc(&pdev->dev,
+					sizeof(struct e1000_ring), GFP_KERNEL);
 	if (!adapter->tx_ring)
 		goto err;
 
-	adapter->rx_ring = kzalloc(sizeof(struct e1000_ring), GFP_KERNEL);
+	adapter->rx_ring = devm_kzalloc(&pdev->dev,
+					sizeof(struct e1000_ring), GFP_KERNEL);
 	if (!adapter->rx_ring)
 		goto err;
 
@@ -2544,8 +2547,6 @@ static int __devinit e1000_sw_init(struc
 
 err:
 	ndev_err(netdev, "Unable to allocate memory for queues\n");
-	kfree(adapter->rx_ring);
-	kfree(adapter->tx_ring);
 	return -ENOMEM;
 }
 
@@ -4016,15 +4017,13 @@ static int __devinit e1000_probe(struct 
 	struct e1000_adapter *adapter;
 	struct e1000_hw *hw;
 	const struct e1000_info *ei = e1000_info_tbl[ent->driver_data];
-	unsigned long mmio_start, mmio_len;
-	unsigned long flash_start, flash_len;
 
 	static int cards_found;
 	int i, err, pci_using_dac;
 	u16 eeprom_data = 0;
 	u16 eeprom_apme_mask = E1000_EEPROM_APME;
 
-	err = pci_enable_device(pdev);
+	err = pcim_enable_device(pdev);
 	if (err)
 		return err;
 
@@ -4042,21 +4041,20 @@ static int __devinit e1000_probe(struct 
 			if (err) {
 				dev_err(&pdev->dev, "No usable DMA "
 					"configuration, aborting\n");
-				goto err_dma;
+				return err;
 			}
 		}
 	}
 
 	err = pci_request_regions(pdev, e1000e_driver_name);
 	if (err)
-		goto err_pci_reg;
+		return err;
 
 	pci_set_master(pdev);
 
-	err = -ENOMEM;
-	netdev = alloc_etherdev(sizeof(struct e1000_adapter));
+	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct e1000_adapter));
 	if (!netdev)
-		goto err_alloc_etherdev;
+		return -ENOMEM;
 
 	SET_MODULE_OWNER(netdev);
 	SET_NETDEV_DEV(netdev, &pdev->dev);
@@ -4073,21 +4071,16 @@ static int __devinit e1000_probe(struct 
 	adapter->hw.mac.type = ei->mac;
 	adapter->msg_enable = (1 << NETIF_MSG_DRV | NETIF_MSG_PROBE) - 1;
 
-	mmio_start = pci_resource_start(pdev, 0);
-	mmio_len = pci_resource_len(pdev, 0);
 
-	err = -EIO;
-	adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
+	adapter->hw.hw_addr = pcim_iomap(pdev, 0, 0);
 	if (!adapter->hw.hw_addr)
-		goto err_ioremap;
+		return -EIO;
 
 	if ((adapter->flags & FLAG_HAS_FLASH) &&
 	    (pci_resource_flags(pdev, 1) & IORESOURCE_MEM)) {
-		flash_start = pci_resource_start(pdev, 1);
-		flash_len = pci_resource_len(pdev, 1);
-		adapter->hw.flash_address = ioremap(flash_start, flash_len);
+		adapter->hw.flash_address = pcim_iomap(pdev, 1, 0);
 		if (!adapter->hw.flash_address)
-			goto err_flashmap;
+			return -EIO;
 	}
 
 	/* construct the net_device struct */
@@ -4112,17 +4105,15 @@ static int __devinit e1000_probe(struct 
 #endif
 	strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
 
-	netdev->mem_start = mmio_start;
-	netdev->mem_end = mmio_start + mmio_len;
+	netdev->mem_start = pci_resource_start(pdev, 0);
+	netdev->mem_end = netdev->mem_start + pci_resource_len(pdev, 0);
 
 	adapter->bd_number = cards_found++;
 
 	/* setup adapter struct */
 	err = e1000_sw_init(adapter);
 	if (err)
-		goto err_sw_init;
-
-	err = -EIO;
+		return err;
 
 	memcpy(&hw->mac.ops, ei->mac_ops, sizeof(hw->mac.ops));
 	memcpy(&hw->nvm.ops, ei->nvm_ops, sizeof(hw->nvm.ops));
@@ -4290,21 +4281,6 @@ err_eeprom:
 	if (!e1000_check_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
 
-	if (adapter->hw.flash_address)
-		iounmap(adapter->hw.flash_address);
-
-err_flashmap:
-	kfree(adapter->tx_ring);
-	kfree(adapter->rx_ring);
-err_sw_init:
-	iounmap(adapter->hw.hw_addr);
-err_ioremap:
-	free_netdev(netdev);
-err_alloc_etherdev:
-	pci_release_regions(pdev);
-err_pci_reg:
-err_dma:
-	pci_disable_device(pdev);
 	return err;
 }
 
@@ -4340,18 +4316,6 @@ static void __devexit e1000_remove(struc
 
 	if (!e1000_check_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
-
-	kfree(adapter->tx_ring);
-	kfree(adapter->rx_ring);
-
-	iounmap(adapter->hw.hw_addr);
-	if (adapter->hw.flash_address)
-		iounmap(adapter->hw.flash_address);
-	pci_release_regions(pdev);
-
-	free_netdev(netdev);
-
-	pci_disable_device(pdev);
 }
 
 /* PCI Error Recovery (ERS) */
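
For readers unfamiliar with the idea behind the conversion above, here is a rough userspace model of managed resources. It is only a sketch of the concept, not the kernel's devres implementation: struct owner, owner_add, owner_zalloc and owner_release_all are hypothetical names, with struct owner standing in for the device that the devm_*/pcim_* calls in the patch attach resources to.

```c
#include <stdlib.h>

/* Hypothetical userspace model of managed resources; not the kernel API. */
struct res_node {
	struct res_node *next;
	void (*release)(void *data);
	void *data;
};

struct owner {
	struct res_node *head;	/* most recently acquired resource first */
};

/* Record a release callback; returns 0 on success, -1 if out of memory. */
static int owner_add(struct owner *o, void (*release)(void *), void *data)
{
	struct res_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->release = release;
	n->data = data;
	n->next = o->head;
	o->head = n;
	return 0;
}

/* Managed allocation: zeroed memory that is freed along with its owner. */
void *owner_zalloc(struct owner *o, size_t size)
{
	void *p = calloc(1, size);

	if (!p)
		return NULL;
	if (owner_add(o, free, p)) {
		free(p);
		return NULL;
	}
	return p;
}

/* Release everything in reverse order of acquisition; returns the count. */
int owner_release_all(struct owner *o)
{
	int released = 0;

	while (o->head) {
		struct res_node *n = o->head;

		o->head = n->next;
		n->release(n->data);
		free(n);
		released++;
	}
	return released;
}
```

With this shape, a failing setup path can simply return an error and leave cleanup to a single owner_release_all() call, which is the same reason the goto-based unwinding disappears from probe() in the patch.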


* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-17 20:12 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Chris Friesen, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <1187380142.2615.6.camel@laptopd505.fenrus.org>

On Fri, Aug 17, 2007 at 12:49:00PM -0700, Arjan van de Ven wrote:
> 
> On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote:
> > > > What about reading values modified in interrupt handlers, as in your 
> > > > "random" case?  Or is this a bug where the user of atomic_read() is 
> > > > invalidly expecting a read each time it is called?
> > > 
> > > the interrupt handler case is an SMP case since you do not know
> > > beforehand what cpu your interrupt handler will run on.
> > 
> > With the exception of per-CPU variables, yes.
> 
> if you're spinning waiting for a per-CPU variable to get changed by an
> interrupt handler... you have bigger problems than "volatile" ;-)

That would be true, if you were doing that.  But you might instead be
simply making sure that the mainline actions were seen in order by the
interrupt handler.  My current example is the NMI-safe rcu_read_lock()
implementation for realtime.  Not the common case, I will admit, but
still real.  ;-)
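
A rough userspace analogy for "mainline actions seen in order by the interrupt handler", assuming a POSIX signal handler as a stand-in for an interrupt handler running on the same CPU. This is not the rcu_read_lock() code mentioned above. The names compiler_barrier, payload, ready, on_signal and publish_and_check are invented for the sketch, and the barrier uses GCC/Clang inline-asm syntax.

```c
#include <signal.h>

/* Compiler-only barrier (GCC/Clang syntax); emits no CPU fence instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int payload;				/* data written by mainline code */
static volatile sig_atomic_t ready;		/* flag published after the data */
static volatile sig_atomic_t handler_saw_bad_order;

/* Stand-in for the interrupt handler: it runs on the same CPU as mainline. */
static void on_signal(int sig)
{
	(void)sig;
	/* If the flag is visible, the earlier payload store must be too. */
	if (ready && payload != 42)
		handler_saw_bad_order = 1;
}

/* Returns 0 if the handler saw a consistent order, 1 if not, -1 on error. */
int publish_and_check(void)
{
	if (signal(SIGUSR1, on_signal) == SIG_ERR)
		return -1;

	payload = 42;
	compiler_barrier();	/* keep the payload store ahead of the flag store */
	ready = 1;

	raise(SIGUSR1);		/* handler runs before raise() returns */
	return handler_saw_bad_order ? 1 : 0;
}
```

Because the handler and the mainline code share one CPU, only compiler reordering is at issue here, so a compiler barrier is enough; a handler that may run on another CPU would need real memory barriers.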

						Thanx, Paul


* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Arjan van de Ven @ 2007-08-17 19:49 UTC (permalink / raw)
  To: paulmck
  Cc: Chris Friesen, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <20070817194924.GG8464@linux.vnet.ibm.com>


On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote:
> > > What about reading values modified in interrupt handlers, as in your 
> > > "random" case?  Or is this a bug where the user of atomic_read() is 
> > > invalidly expecting a read each time it is called?
> > 
> > the interrupt handler case is an SMP case since you do not know
> > beforehand what cpu your interrupt handler will run on.
> 
> With the exception of per-CPU variables, yes.

if you're spinning waiting for a per-CPU variable to get changed by an
interrupt handler... you have bigger problems than "volatile" ;-)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier @ 2007-08-17 19:52 UTC (permalink / raw)
  To: David Miller; +Cc: jeff, netdev, linux-kernel, general
In-Reply-To: <20070816.141751.115907875.davem@davemloft.net>

 > > Isn't RDMA _part_ of the "software net stack" within Linux?

 > It very much is not so.

This is just nit-picking.  You can draw the boundary of the "software
net stack" wherever you want, but I think Sean's point was just that
RDMA drivers already are part of Linux, and we all want them to get
better.

 > When using RDMA you lose the capability to do packet shaping,
 > classification, and all the other wonderful networking facilities
 > you've grown to love and use over the years.

Same thing with TSO and LRO and who knows what else.  I know you're
going to make a distinction between "stateless" and "stateful"
offloads, but really it's just an arbitrary distinction between things
you like and things you don't.

 > Imagine if you didn't know any of this, you purchase and begin to
 > deploy a huge piece of RDMA infrastructure, you then get the mandate
 > from IT that you need to add firewalling on the RDMA connections at
 > the host level, and "oh shit" you can't?

It's ironic that you bring up firewalling.  I've had vendors of iWARP
hardware tell me they would *love* to work with the community to make
firewalling work better for RDMA connections.  But instead we get the
catch-22 of your changing arguments -- first, you won't even consider
changes that might help RDMA work better in the name of
maintainability; then you have to protect poor, ignorant users from
accidentally using RDMA because of some problem or another; and then
when someone tries to fix some of the problems you mention, it's back
to step one.

Obviously some decisions have been prejudged here, so I guess this
moves to the realm of politics.  I have plenty of interesting
technical stuff to work on, so I'll leave it to the people with a
horse in the race to find ways to twist your arm.

 - R.


* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-17 19:49 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Chris Friesen, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <1187376873.2615.2.camel@laptopd505.fenrus.org>

On Fri, Aug 17, 2007 at 11:54:33AM -0700, Arjan van de Ven wrote:
> 
> On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote:
> > Linus Torvalds wrote:
> > 
> > >  - in other words, the *only* possible meaning for "volatile" is a purely 
> > >    single-CPU meaning. And if you only have a single CPU involved in the 
> > >    process, the "volatile" is by definition pointless (because even 
> > >    without a volatile, the compiler is required to make the C code appear 
> > >    consistent as far as a single CPU is concerned).
> > 
> > I assume you mean "except for IO-related code and 'random' values like 
> > jiffies" as you mention later on?  I assume other values set in 
> > interrupt handlers would count as "random" from a volatility perspective?
> > 
> > > So anybody who argues for "volatile" fixing bugs is fundamentally 
> > > incorrect. It does NO SUCH THING. By arguing that, such people only show 
> > > that they have no idea what they are talking about.
> > 
> > What about reading values modified in interrupt handlers, as in your 
> > "random" case?  Or is this a bug where the user of atomic_read() is 
> > invalidly expecting a read each time it is called?
> 
> the interrupt handler case is an SMP case since you do not know
> beforehand what cpu your interrupt handler will run on.

With the exception of per-CPU variables, yes.

							Thanx, Paul


