Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: Eric Dumazet @ 2010-04-26  7:21 UTC (permalink / raw)
  To: David Miller; +Cc: xiaosuo, netdev
In-Reply-To: <20100425.232758.101478192.davem@davemloft.net>

Le dimanche 25 avril 2010 à 23:27 -0700, David Miller a écrit :

> No matter what is faster, we have to process packets in
> FIFO order, otherwise we get reordering within a flow
> which is to be absolutely avoided.

Hmm, completion queue is about freeing skb, and its name is misleading.

This queue is feed by dev_kfree_skb_irq() only




^ permalink raw reply

* Re: RPS and forwarding
From: Herbert Xu @ 2010-04-26  6:31 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev
In-Reply-To: <20100425.232857.145290769.davem@davemloft.net>

On Sun, Apr 25, 2010 at 11:28:57PM -0700, David Miller wrote:
>
> Right, none of this stuff is on by default.

Great, that puts my mind at ease :)

> It might be possible to somehow make the sock table get bypassed for
> forwarded traffic, but I can't think of a cheap way to do that at
> the moment.

Yeah I thought about that too but as we need to go through the
routing table before we know whether something is forwarded or
not, it certainly seems non-trivial.

Hmm, maybe if we used a routing cache keyed by rxhash we could
make an appromixation? After all, we only need to make sure that
no forwarded traffic is redirected, and not the reverse.  IOW,
it's OK if we incorrectly classify some local traffic as forwarded.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: RPS and forwarding
From: David Miller @ 2010-04-26  6:28 UTC (permalink / raw)
  To: therbert; +Cc: herbert, netdev
In-Reply-To: <r2n65634d661004252200x4e7a49f1h181cdd69b3e4ebc0@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Sun, 25 Apr 2010 22:00:09 -0700

>> Right.  The problem is that if you run a distribution kernel on
>> a router with CONFIG_RPS (and hence RFS) enabled, you may suddenly
>> start seeing packet reordering on forwarded traffic due to the
>> presence of local traffic.
>>
>> Can we perhaps add a run-time toggle to disable RFS?
>>
> RFS is not on at run-time by default.  The number of entries in the
> global_rps_sock table needs to be set in sysctl, and the number of
> entries in rps_dev_flow_table needs to be set per RX queue in sysfs.
> You can turn it on for some devices, but not others if that helps.

Right, none of this stuff is on by default.

It might be possible to somehow make the sock table get bypassed for
forwarded traffic, but I can't think of a cheap way to do that at
the moment.

^ permalink raw reply

* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: David Miller @ 2010-04-26  6:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev
In-Reply-To: <1272262154.2069.1032.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 26 Apr 2010 08:09:14 +0200

> LIFO is the slub behavior, for the exact reason its better for cache
> reuse. Same for completion queue.
> 
> I repeat : 
> - slub dont touch objects in normal situations.
> - LIFO is better for caches.
> - LIFO is faster (one pointer for the queue head)

No matter what is faster, we have to process packets in
FIFO order, otherwise we get reordering within a flow
which is to be absolutely avoided.

^ permalink raw reply

* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: Eric Dumazet @ 2010-04-26  6:09 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <q2w412e6f7f1004252244s3a5000eft850c2a5dc0fd72d1@mail.gmail.com>

Le lundi 26 avril 2010 à 13:44 +0800, Changli Gao a écrit :
> On Mon, Apr 26, 2010 at 1:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le lundi 26 avril 2010 à 11:20 +0800, Changli Gao a écrit :
> >> reimplement completion_queue as a FIFO queue.
> >>
> >> As slab allocator always does its best to return the latest unused objects, we'd
> >> better release skb in order, this patch reimplement completion_queue as a FIFO
> >> queue instead of the old LIFO queue.
> >
> >
> > 1) New devices dont use completion queue.
> 
> It is good enough reason for rejection.
> 

I said, this part of the kernel is not used today.

> >
> > 2) Hot objects are the last enqueued.
> >   If many objects were queued, the old one are not hot anymore.
> >
> >   Using FIFO will give more cache misses, when walking the chain
> > (skb->next)
> >
> 
> I meaned that slab allocator maintains objects in the LIFO manner, and
> if we call kmem_cache_alloc(cache) after a kmem_cache_free(cache)
> call, the last object freed by kmem_cache_free(cache) will be
> returned, as this object are more likely in cache and hot. If we don't
> realse skbs in FIFO manner, the last and hot one will not been
> returned at first.
> 

But your patch doesnt change this behavior. This is irrelevant.

LIFO is the slub behavior, for the exact reason its better for cache
reuse. Same for completion queue.

I repeat : 
- slub dont touch objects in normal situations.
- LIFO is better for caches.
- LIFO is faster (one pointer for the queue head)



^ permalink raw reply

* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: Changli Gao @ 2010-04-26  5:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1272258841.2069.968.camel@edumazet-laptop>

On Mon, Apr 26, 2010 at 1:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 26 avril 2010 à 11:20 +0800, Changli Gao a écrit :
>> reimplement completion_queue as a FIFO queue.
>>
>> As slab allocator always does its best to return the latest unused objects, we'd
>> better release skb in order, this patch reimplement completion_queue as a FIFO
>> queue instead of the old LIFO queue.
>>
>
> 1) New devices dont use completion queue.

It is good enough reason for rejection.

>
> 2) Hot objects are the last enqueued.
>   If many objects were queued, the old one are not hot anymore.
>
>   Using FIFO will give more cache misses, when walking the chain
> (skb->next)
>

I meaned that slab allocator maintains objects in the LIFO manner, and
if we call kmem_cache_alloc(cache) after a kmem_cache_free(cache)
call, the last object freed by kmem_cache_free(cache) will be
returned, as this object are more likely in cache and hot. If we don't
realse skbs in FIFO manner, the last and hot one will not been
returned at first.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: RPS and forwarding
From: Eric Dumazet @ 2010-04-26  5:15 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Herbert Xu, David S. Miller, netdev
In-Reply-To: <z2w65634d661004252144z1bf72880wdf7a0143b8490ddf@mail.gmail.com>

Le dimanche 25 avril 2010 à 21:44 -0700, Tom Herbert a écrit :
> > Apart from not using RPS on routers, I suppose people doing
> > forwarding will simply have to maintain a constant RPS table,
> > and forgo its local redirection capabilities.
> >
> 
> I think you'd want RPS on router (steering packets using hash and CPU
> mask).  Most of your concerns are about RFS (steering by flows), which
> is probably more appropriate for a server.

We could probably rename RFS parts to RFS to avoid confusion ?




^ permalink raw reply

* Re: [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: Eric Dumazet @ 2010-04-26  5:14 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1272252016-11755-1-git-send-email-xiaosuo@gmail.com>

Le lundi 26 avril 2010 à 11:20 +0800, Changli Gao a écrit :
> reimplement completion_queue as a FIFO queue.
> 
> As slab allocator always does its best to return the latest unused objects, we'd
> better release skb in order, this patch reimplement completion_queue as a FIFO
> queue instead of the old LIFO queue.
> 

1) New devices dont use completion queue.

2) Hot objects are the last enqueued.
   If many objects were queued, the old one are not hot anymore.

   Using FIFO will give more cache misses, when walking the chain
(skb->next)

3) Claim about slab allocators properties are irrelevant. SLUB doesnt
touch objects, unless some debugging is on.

> At the same time, a opencoding skb queue is replaced with a sk_buff_head queue.
> 

Yes, this is a pro, yet it cost a bit more, because of extra things
(qlen management, and two pointers per object)

For example, compare text size before and after the patch.

If you feel open coding a LIFO is error prone, you could add generic
primitives to kernel ?

> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  include/linux/netdevice.h |    2 +-
>  net/core/dev.c            |   28 ++++++++++------------------
>  net/core/netpoll.c        |   12 +++++-------
>  3 files changed, 16 insertions(+), 26 deletions(-)
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3c5ed5f..785e514 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1386,7 +1386,7 @@ static inline int unregister_gifconf(unsigned int family)
>  struct softnet_data {
>  	struct Qdisc		*output_queue;
>  	struct list_head	poll_list;
> -	struct sk_buff		*completion_queue;
> +	struct sk_buff_head	completion_queue;
>  
>  #ifdef CONFIG_RPS
>  	struct softnet_data	*rps_ipi_list;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4d43f1a..5bbfa0f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1578,8 +1578,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
>  
>  		local_irq_save(flags);
>  		sd = &__get_cpu_var(softnet_data);
> -		skb->next = sd->completion_queue;
> -		sd->completion_queue = skb;
> +		__skb_queue_tail(&sd->completion_queue, skb);
>  		raise_softirq_irqoff(NET_TX_SOFTIRQ);
>  		local_irq_restore(flags);
>  	}
> @@ -2506,18 +2505,16 @@ static void net_tx_action(struct softirq_action *h)
>  {
>  	struct softnet_data *sd = &__get_cpu_var(softnet_data);
>  
> -	if (sd->completion_queue) {
> -		struct sk_buff *clist;
> +	if (!skb_queue_empty(&sd->completion_queue)) {
> +		struct sk_buff_head queue;
> +		struct sk_buff *skb;
>  
> +		__skb_queue_head_init(&queue);
>  		local_irq_disable();
> -		clist = sd->completion_queue;
> -		sd->completion_queue = NULL;
> +		skb_queue_splice_tail_init(&sd->completion_queue, &queue);
>  		local_irq_enable();
>  
> -		while (clist) {
> -			struct sk_buff *skb = clist;
> -			clist = clist->next;
> -
> +		while ((skb = __skb_dequeue(&queue))) {
>  			WARN_ON(atomic_read(&skb->users));
>  			__kfree_skb(skb);
>  		}
> @@ -5593,7 +5590,6 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>  			    unsigned long action,
>  			    void *ocpu)
>  {
> -	struct sk_buff **list_skb;
>  	struct Qdisc **list_net;
>  	struct sk_buff *skb;
>  	unsigned int cpu, oldcpu = (unsigned long)ocpu;
> @@ -5607,13 +5603,9 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>  	sd = &per_cpu(softnet_data, cpu);
>  	oldsd = &per_cpu(softnet_data, oldcpu);
>  
> -	/* Find end of our completion_queue. */
> -	list_skb = &sd->completion_queue;
> -	while (*list_skb)
> -		list_skb = &(*list_skb)->next;
>  	/* Append completion queue from offline CPU. */
> -	*list_skb = oldsd->completion_queue;
> -	oldsd->completion_queue = NULL;
> +	skb_queue_splice_tail_init(&oldsd->completion_queue,
> +				   &sd->completion_queue);
>  
>  	/* Find end of our output_queue. */
>  	list_net = &sd->output_queue;
> @@ -5849,7 +5841,7 @@ static int __init net_dev_init(void)
>  		struct softnet_data *sd = &per_cpu(softnet_data, i);
>  
>  		skb_queue_head_init(&sd->input_pkt_queue);
> -		sd->completion_queue = NULL;
> +		__skb_queue_head_init(&sd->completion_queue);
>  		INIT_LIST_HEAD(&sd->poll_list);
>  
>  #ifdef CONFIG_RPS
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index a58f59b..3905a14 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -222,17 +222,15 @@ static void zap_completion_queue(void)
>  	unsigned long flags;
>  	struct softnet_data *sd = &get_cpu_var(softnet_data);
>  
> -	if (sd->completion_queue) {
> -		struct sk_buff *clist;
> +	if (!skb_queue_empty(&sd->completion_queue)) {
> +		struct sk_buff_head queue;
> +		struct sk_buff *skb;
>  
>  		local_irq_save(flags);
> -		clist = sd->completion_queue;
> -		sd->completion_queue = NULL;
> +		skb_queue_splice_tail_init(&sd->completion_queue, &queue);
>  		local_irq_restore(flags);
>  
> -		while (clist != NULL) {
> -			struct sk_buff *skb = clist;
> -			clist = clist->next;
> +		while ((skb = __skb_dequeue(&queue))) {
>  			if (skb->destructor) {
>  				atomic_inc(&skb->users);
>  				dev_kfree_skb_any(skb); /* put this one back */
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply

* Re: RPS and forwarding
From: Tom Herbert @ 2010-04-26  5:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <20100426045316.GA21810@gondor.apana.org.au>

> Right.  The problem is that if you run a distribution kernel on
> a router with CONFIG_RPS (and hence RFS) enabled, you may suddenly
> start seeing packet reordering on forwarded traffic due to the
> presence of local traffic.
>
> Can we perhaps add a run-time toggle to disable RFS?
>
RFS is not on at run-time by default.  The number of entries in the
global_rps_sock table needs to be set in sysctl, and the number of
entries in rps_dev_flow_table needs to be set per RX queue in sysfs.
You can turn it on for some devices, but not others if that helps.

> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

^ permalink raw reply

* Re: RPS and forwarding
From: Herbert Xu @ 2010-04-26  4:53 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David S. Miller, netdev
In-Reply-To: <z2w65634d661004252144z1bf72880wdf7a0143b8490ddf@mail.gmail.com>

On Sun, Apr 25, 2010 at 09:44:05PM -0700, Tom Herbert wrote:
> 
> I think you'd want RPS on router (steering packets using hash and CPU
> mask).  Most of your concerns are about RFS (steering by flows), which
> is probably more appropriate for a server.

Right.  The problem is that if you run a distribution kernel on
a router with CONFIG_RPS (and hence RFS) enabled, you may suddenly
start seeing packet reordering on forwarded traffic due to the
presence of local traffic.

Can we perhaps add a run-time toggle to disable RFS?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: RPS and forwarding
From: Tom Herbert @ 2010-04-26  4:44 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <20100426022415.GA20323@gondor.apana.org.au>

> Apart from not using RPS on routers, I suppose people doing
> forwarding will simply have to maintain a constant RPS table,
> and forgo its local redirection capabilities.
>

I think you'd want RPS on router (steering packets using hash and CPU
mask).  Most of your concerns are about RFS (steering by flows), which
is probably more appropriate for a server.

Tom

> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH 2/2] net: reimplement softnet_data.output_queue as a FIFO queue
From: Changli Gao @ 2010-04-26  3:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, netdev, Changli Gao

reimplement softnet_data.output_queue as a FIFO queue.

reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
the qdiscs rescheduled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   22 ++++++++++++----------
 2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 785e514..7984c15 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1385,6 +1385,7 @@ static inline int unregister_gifconf(unsigned int family)
  */
 struct softnet_data {
 	struct Qdisc		*output_queue;
+	struct Qdisc		**output_queue_tailp;
 	struct list_head	poll_list;
 	struct sk_buff_head	completion_queue;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 5bbfa0f..b2b7869 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1557,8 +1557,9 @@ static inline void __netif_reschedule(struct Qdisc *q)
 
 	local_irq_save(flags);
 	sd = &__get_cpu_var(softnet_data);
-	q->next_sched = sd->output_queue;
-	sd->output_queue = q;
+	q->next_sched = NULL;
+	*sd->output_queue_tailp = q;
+	sd->output_queue_tailp = &q->next_sched;
 	raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_restore(flags);
 }
@@ -2526,6 +2527,7 @@ static void net_tx_action(struct softirq_action *h)
 		local_irq_disable();
 		head = sd->output_queue;
 		sd->output_queue = NULL;
+		sd->output_queue_tailp = &sd->output_queue;
 		local_irq_enable();
 
 		while (head) {
@@ -5590,7 +5592,6 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 			    unsigned long action,
 			    void *ocpu)
 {
-	struct Qdisc **list_net;
 	struct sk_buff *skb;
 	unsigned int cpu, oldcpu = (unsigned long)ocpu;
 	struct softnet_data *sd, *oldsd;
@@ -5607,13 +5608,12 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	skb_queue_splice_tail_init(&oldsd->completion_queue,
 				   &sd->completion_queue);
 
-	/* Find end of our output_queue. */
-	list_net = &sd->output_queue;
-	while (*list_net)
-		list_net = &(*list_net)->next_sched;
-	/* Append output queue from offline CPU. */
-	*list_net = oldsd->output_queue;
-	oldsd->output_queue = NULL;
+	if (oldsd->output_queue) {
+		*sd->output_queue_tailp = oldsd->output_queue;
+		sd->output_queue_tailp = oldsd->output_queue_tailp;
+		oldsd->output_queue = NULL;
+		oldsd->output_queue_tailp = &oldsd->output_queue;
+	}
 
 	raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_enable();
@@ -5843,6 +5843,8 @@ static int __init net_dev_init(void)
 		skb_queue_head_init(&sd->input_pkt_queue);
 		__skb_queue_head_init(&sd->completion_queue);
 		INIT_LIST_HEAD(&sd->poll_list);
+		sd->output_queue = NULL;
+		sd->output_queue_tailp = &sd->output_queue;
 
 #ifdef CONFIG_RPS
 		sd->csd.func = rps_trigger_softirq;

^ permalink raw reply related

* [PATCH 1/2] net: reimplement completion_queue as a FIFO queue
From: Changli Gao @ 2010-04-26  3:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, netdev, Changli Gao

reimplement completion_queue as a FIFO queue.

As slab allocator always does its best to return the latest unused objects, we'd
better release skb in order, this patch reimplement completion_queue as a FIFO
queue instead of the old LIFO queue.

At the same time, a opencoding skb queue is replaced with a sk_buff_head queue.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |   28 ++++++++++------------------
 net/core/netpoll.c        |   12 +++++-------
 3 files changed, 16 insertions(+), 26 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c5ed5f..785e514 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1386,7 +1386,7 @@ static inline int unregister_gifconf(unsigned int family)
 struct softnet_data {
 	struct Qdisc		*output_queue;
 	struct list_head	poll_list;
-	struct sk_buff		*completion_queue;
+	struct sk_buff_head	completion_queue;
 
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
diff --git a/net/core/dev.c b/net/core/dev.c
index 4d43f1a..5bbfa0f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1578,8 +1578,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
 
 		local_irq_save(flags);
 		sd = &__get_cpu_var(softnet_data);
-		skb->next = sd->completion_queue;
-		sd->completion_queue = skb;
+		__skb_queue_tail(&sd->completion_queue, skb);
 		raise_softirq_irqoff(NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
 	}
@@ -2506,18 +2505,16 @@ static void net_tx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
 
-	if (sd->completion_queue) {
-		struct sk_buff *clist;
+	if (!skb_queue_empty(&sd->completion_queue)) {
+		struct sk_buff_head queue;
+		struct sk_buff *skb;
 
+		__skb_queue_head_init(&queue);
 		local_irq_disable();
-		clist = sd->completion_queue;
-		sd->completion_queue = NULL;
+		skb_queue_splice_tail_init(&sd->completion_queue, &queue);
 		local_irq_enable();
 
-		while (clist) {
-			struct sk_buff *skb = clist;
-			clist = clist->next;
-
+		while ((skb = __skb_dequeue(&queue))) {
 			WARN_ON(atomic_read(&skb->users));
 			__kfree_skb(skb);
 		}
@@ -5593,7 +5590,6 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 			    unsigned long action,
 			    void *ocpu)
 {
-	struct sk_buff **list_skb;
 	struct Qdisc **list_net;
 	struct sk_buff *skb;
 	unsigned int cpu, oldcpu = (unsigned long)ocpu;
@@ -5607,13 +5603,9 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	sd = &per_cpu(softnet_data, cpu);
 	oldsd = &per_cpu(softnet_data, oldcpu);
 
-	/* Find end of our completion_queue. */
-	list_skb = &sd->completion_queue;
-	while (*list_skb)
-		list_skb = &(*list_skb)->next;
 	/* Append completion queue from offline CPU. */
-	*list_skb = oldsd->completion_queue;
-	oldsd->completion_queue = NULL;
+	skb_queue_splice_tail_init(&oldsd->completion_queue,
+				   &sd->completion_queue);
 
 	/* Find end of our output_queue. */
 	list_net = &sd->output_queue;
@@ -5849,7 +5841,7 @@ static int __init net_dev_init(void)
 		struct softnet_data *sd = &per_cpu(softnet_data, i);
 
 		skb_queue_head_init(&sd->input_pkt_queue);
-		sd->completion_queue = NULL;
+		__skb_queue_head_init(&sd->completion_queue);
 		INIT_LIST_HEAD(&sd->poll_list);
 
 #ifdef CONFIG_RPS
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a58f59b..3905a14 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -222,17 +222,15 @@ static void zap_completion_queue(void)
 	unsigned long flags;
 	struct softnet_data *sd = &get_cpu_var(softnet_data);
 
-	if (sd->completion_queue) {
-		struct sk_buff *clist;
+	if (!skb_queue_empty(&sd->completion_queue)) {
+		struct sk_buff_head queue;
+		struct sk_buff *skb;
 
 		local_irq_save(flags);
-		clist = sd->completion_queue;
-		sd->completion_queue = NULL;
+		skb_queue_splice_tail_init(&sd->completion_queue, &queue);
 		local_irq_restore(flags);
 
-		while (clist != NULL) {
-			struct sk_buff *skb = clist;
-			clist = clist->next;
+		while ((skb = __skb_dequeue(&queue))) {
 			if (skb->destructor) {
 				atomic_inc(&skb->users);
 				dev_kfree_skb_any(skb); /* put this one back */

^ permalink raw reply related

* Re: [PATCH net-next-2.6] netns: call ops_free right after ops_exit
From: Eric W. Biederman @ 2010-04-26  3:06 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: David Miller, netdev
In-Reply-To: <20100425184430.GA2891@psychotron.redhat.com>

Jiri Pirko <jpirko@redhat.com> writes:

> Sun, Apr 25, 2010 at 04:50:34PM CEST, ebiederm@xmission.com wrote:
>>David Miller <davem@davemloft.net> writes:
>>
>>> From: Jiri Pirko <jpirko@redhat.com>
>>> Date: Sun, 25 Apr 2010 11:26:01 +0200
>>>
>>>> There's no need to iterate this twice. We can free net generic
>>>> variables right after exit is called.
>>>>
>>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>
>>> Are you sure there are no problems with doing this?
>>>
>>> What if there are inter-net variable reference dependencies
>>> or something like that?
>>>
>>> I really suspect it is being done this way on purpose, but
>>> in the end I defer to experts like Eric B. :-)
>>
>>I am pretty certain there is a problem.  My memory is fuzzy this
>>morning but I believe we can have rcu references between various
>>pieces of the networking stack for a single network namespace.  So we
>>need to cause all of the network namespace to exit before it is safe
>>to free those pieces.
>
> Hmm, that doesn't make much sense to me. Since the allocated memory in question
> is used locally, after exit() is called, the memory chunk should not be used by
> anyone and if it is, I think it's a bug.
>
> Earlier, when the memory wasn't allocated automatically (by filling .size)
> memory was individually freed in exit(). From what I understood from your reply,
> you are telling this was buggy?

Now I remember clearly.  The use case for not freeing memory
immediately is the delayed freeing of network devices.  Ideally we
delay unregistering all of the network devices into
default_device_exit_batch, when we terminate one or namespaces.

Since network devices are freed outside of their exit routines we need
to keep their per net memory around until they are freed.

Ultimately all of this is much easier to think about if these chunks of
memory can be logically thought of as living on struct net and have the
same lifetime rules.

Eric

^ permalink raw reply

* [PATCH v3] TCP: avoid to send keepalive probes if receiving data
From: Flavio Leitner @ 2010-04-26  2:40 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Ilpo, Eric Dumazet, Flavio Leitner
In-Reply-To: <20100425235549.GB2550@sysclose.org>

RFC 1122 says the following:
...
  Keep-alive packets MUST only be sent when no data or
  acknowledgement packets have been received for the
  connection within an interval.
...

The acknowledgement packet is reseting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
 net/ipv4/tcp.c       |    5 ++++-
 net/ipv4/tcp_timer.c |    8 ++++++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0f8caf6..94f605c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			if (sock_flag(sk, SOCK_KEEPOPEN) &&
 			    !((1 << sk->sk_state) &
 			      (TCPF_CLOSE | TCPF_LISTEN))) {
-				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
+				u32 elapsed = min_t(u32,
+				      tcp_time_stamp - icsk->icsk_ack.lrcvtime,
+				      tcp_time_stamp - tp->rcv_tstamp);
+
 				if (tp->keepalive_time > elapsed)
 					elapsed = tp->keepalive_time - elapsed;
 				else
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 8a0ab29..9d1bb6f 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
 	if (tp->packets_out || tcp_send_head(sk))
 		goto resched;
 
+	elapsed = tcp_time_stamp - icsk->icsk_ack.lrcvtime;
+
+	/* receiving data means alive */
+	if (elapsed < keepalive_time_when(tp)) {
+		elapsed = keepalive_time_when(tp) - elapsed;
+		goto resched;
+	}
+
 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
 
 	if (elapsed >= keepalive_time_when(tp)) {
-- 
1.5.5.6


^ permalink raw reply related

* RPS and forwarding
From: Herbert Xu @ 2010-04-26  2:24 UTC (permalink / raw)
  To: David S. Miller, netdev

Hi:

I'm sorry I didn't have time to jump into the RPS discussions
earlier, so in a way I'm just getting what I deserved :)

Anyway, I am specifically concerned about the possibility of
reordering of forwarded traffic.

As RPS is doing fuzzy matching, it is possible (and quite likely
if rps_sock_flow_table is small) for a forwarded flow to be hashed
to the same index as a local flow.

In that case we may end up redirecting a forwarded flow.  That
in itself is undesirable because for forwarded flows the best
solution is to stay on the ingress CPU.

What's worse is that if the local flow bounces around different
CPUs, the forwarded flow will follow it.

For a local flow RPS can guarantee original ordering (assuming
we're not doing anything weird like netfilter queueing), but
this doesn't work for forwarded flows.

Even if netif_receive_skb has completed, the forwarded packet
may still be sitting in a hardware TX queue, selected based
on the processing CPU.  If you then bounce the forwarded flow
then packets may be placed in a different hardware TX queue,
causing reordering.

BTW, selecting hardware TX queues for forwarded flows based
on the rxhash is not a good solution, as that causes cache-line
bouncing between CPUs.

Apart from not using RPS on routers, I suppose people doing
forwarding will simply have to maintain a constant RPS table,
and forgo its local redirection capabilities.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Paul E. McKenney @ 2010-04-26  2:07 UTC (permalink / raw)
  To: David Miller
  Cc: johannes, miles.lane, vgoyal, eparis, laijs, mingo, peterz,
	linux-kernel, nauman, eric.dumazet, netdev, jens.axboe,
	guijianfeng, lizf
In-Reply-To: <20100425.004925.54203352.davem@davemloft.net>

On Sun, Apr 25, 2010 at 12:49:25AM -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Sun, 25 Apr 2010 09:45:34 +0200
> 
> > The station locking is a tad confusing, but I've added the right
> > annotations already, should be coming to a kernel near you soon (i.e.
> > are in net-2.6 right now).
> 
> Linus took in everything I have so it should be in Linus's tree
> by now.

Thank you both, Dave and Johannes!

							Thanx, Paul

^ permalink raw reply

* Re: GRO after RPS?
From: Herbert Xu @ 2010-04-26  1:47 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, netdev
In-Reply-To: <m2q412e6f7f1004251840vdf264f38vd0df335802cb2a91@mail.gmail.com>

On Mon, Apr 26, 2010 at 09:40:35AM +0800, Changli Gao wrote:
> On Mon, Apr 26, 2010 at 9:17 AM, David Miller <davem@davemloft.net> wrote:
> >
> > The goal is to eliminate all packet header references from the pre-RPS
> > path, and let the post-RPS cpu do it.
> 
> If the NIC doesn't provide rxhash, RPS will have to compute one by one
> by itself. Is the hash computation more expensive than GRO? I think
> the hash computation is cheaper than GRO, so we can do RPS ASAP to
> avoid the direct CPU overload.

The dominating cost will be pulling the memory into cache.  So
once you do that the cost of GRO will be negligible.

As I said, if we had hardware rxhashes, we can change GRO to only
pull in the header when we're likely to have a match.

When we do get a mergeable flow, the difference between doing
GRO before/after RPS is going to be redirecting 1 packet vs.
~100 packets.

So if RPS is so cheap that it's able to process 100 packets at
a cost tantamount to 1 packet, then sure we should postpone GRO.

Otherwise I think GRO before RPS is definitely a win.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: GRO after RPS?
From: Herbert Xu @ 2010-04-26  1:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100425.181757.10149105.davem@davemloft.net>

On Sun, Apr 25, 2010 at 06:17:57PM -0700, David Miller wrote:
>
> Either GRO walks the skb->rxhash values one by one, or GRO does, the
> cost is going to be roughly equivalent I think.

(I presume one of those GROs is meant to be RPS).
Right, in fact if we could combine the two it would be even better.

> And when the packets do match, you can't just trust the ->rxhash in
> the GRO code, since a hash match only indicates that the packets might
> have the same flow.  So you're going to have to touch the packet
> headers in that case on the pre-RPS cpu.

Yes, if the rxhash matches then we need to examine the packet
header.  However, in that case the packet will most likely turn
out to be a GRO candidate, meaning that you have to do one fewer
RPS lookup/redirection.

Besides, the place where you need GRO most is where the hardware
doesn't do it directly.  That same hardware is probably not going
to give you a hash either.  When if we have to touch the packet
header, it would be beneficial to do GRO while it's still hot.

> The goal is to eliminate all packet header references from the pre-RPS
> path, and let the post-RPS cpu do it.

So if we changed GRO to use hardware hashes, then the only case
where it would touch the packet header unnecessarily is when we
get a hash collision.

In all other cases the header would only be touched in case we
have a hash match, which will most likely result in a merging.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: GRO after RPS?
From: Changli Gao @ 2010-04-26  1:40 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev
In-Reply-To: <20100425.181757.10149105.davem@davemloft.net>

On Mon, Apr 26, 2010 at 9:17 AM, David Miller <davem@davemloft.net> wrote:
>
> The goal is to eliminate all packet header references from the pre-RPS
> path, and let the post-RPS cpu do it.

If the NIC doesn't provide rxhash, RPS will have to compute one by one
by itself. Is the hash computation more expensive than GRO? I think
the hash computation is cheaper than GRO, so we can do RPS ASAP to
avoid the direct CPU overload.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: GRO after RPS?
From: David Miller @ 2010-04-26  1:17 UTC (permalink / raw)
  To: herbert; +Cc: netdev
In-Reply-To: <20100426004934.GA12525@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 26 Apr 2010 08:49:34 +0800

> On Sun, Apr 25, 2010 at 05:09:33PM -0700, David Miller wrote:
>> 
>> Herbert, after thinking about some ideas we've been discussing and
>> some suggestions from folks like Tom Herbert, I'm thinking of changing
>> it such that we do GRO after RPS sends the packet to a remove cpu.
> 
> Actually, I'd've thought that doing GRO before RPS would be a
> better solution as it means that RPS would have a lot less work
> to do.

Either GRO walks the skb->rxhash values one by one, or GRO does, the
cost is going to be roughly equivalent I think.

And when the packets do match, you can't just trust the ->rxhash in
the GRO code, since a hash match only indicates that the packets might
have the same flow.  So you're going to have to touch the packet
headers in that case on the pre-RPS cpu.

The goal is to eliminate all packet header references from the pre-RPS
path, and let the post-RPS cpu do it.

^ permalink raw reply

* Re: GRO after RPS?
From: Herbert Xu @ 2010-04-26  0:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100425.170933.190068177.davem@davemloft.net>

On Sun, Apr 25, 2010 at 05:09:33PM -0700, David Miller wrote:
> 
> Herbert, after thinking about some ideas we've been discussing and
> some suggestions from folks like Tom Herbert, I'm thinking of changing
> it such that we do GRO after RPS sends the packet to a remove cpu.

Actually, I'd've thought that doing GRO before RPS would be a
better solution as it means that RPS would have a lot less work
to do.

> The idea being that, this way, if we have a device provided ->rxhash
> we can elide touching the packet headers entirely.

Yes not touching the headers was also the plan for GRO, I just
never got around to it.  So we would only examine a packet if
its hash matched one already processed in the current NAPI window.

Of course we'd need to change the way we currently store packets
in GRO and use a data structure more efficient than a linked list.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* GRO after RPS?
From: David Miller @ 2010-04-26  0:09 UTC (permalink / raw)
  To: herbert; +Cc: netdev


Herbert, after thinking about some ideas we've been discussing and
some suggestions from folks like Tom Herbert, I'm thinking of changing
it such that we do GRO after RPS sends the packet to a remove cpu.

The idea being that, this way, if we have a device provided ->rxhash
we can elide touching the packet headers entirely.

Initially I wanted to defer the eth_type_trans() by adding a state bit
to sk_buff, and making ->ndo_type_trans() a new netdev op.  The state
bit exists so that we can have a transition period and avoid doing
the type_trans multiple times if the driver still does it early.

So, in this way, when RPS doesn't even need to touch the packet
headers, due to a device provided skb->rxhash, we can defer the
type_trans all the way to the remote cpu.

The only thing getting in the way of this is GRO, since it wants to
parse the packet headers too for flow matching.

At first I thought this was a bad idea to defer GRO to the remote cpu,
since if we do batch things up it means we have less packets to queue
up to the remote cpu.

But upon further consideration it doesn't matter, because GRO is going
to fuddle with the packets and link them up into a list anyways.

So if the list handling is a wash, then it's a real win to defer GRO
and thus potentially all packet header touching to the remote cpu.

Also, we can add the quick ->rxhash check to the GRO flow matcher
like we discussed several times in the past.  And guess what?  Now
that RPS runs first, we'll always have a valid skb->rhash available
for this purpose since RPS will compute one in software for us :-)

So Herbert, any objections before I start hacking on this?

^ permalink raw reply

* Re: [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: Flavio Leitner @ 2010-04-25 23:55 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, eric.dumazet, netdev
In-Reply-To: <20100421.224205.18551946.davem@davemloft.net>

On Wed, Apr 21, 2010 at 10:42:05PM -0700, David Miller wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Sun, 18 Apr 2010 23:34:15 +0300 (EEST)
> 
> > I fail to see why the addition of this new variable is necessary at all, 
> > could either of you enlight me why exactly it's necessary and rcv_tstamp 
> > will not suffice?
> 
> I agree, the existing rcv_tstamp should serve this purpose just
> fine.

I thought it would break TCP_INFO/tcp_get_info(). Actually, reviewing
the code again I found another variable which does exactly what is needed.

I'll post a new patch for review.

thanks,
-- 
Flavio

^ permalink raw reply

* Re: [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: Flavio Leitner @ 2010-04-25 23:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1271610919.16881.5564.camel@edumazet-laptop>

On Sun, Apr 18, 2010 at 07:15:19PM +0200, Eric Dumazet wrote:
> Le dimanche 18 avril 2010 à 11:55 -0300, Flavio Leitner a écrit :
> > RFC 1122 says the following:
> > ...
> >   Keep-alive packets MUST only be sent when no data or
> >   acknowledgement packets have been received for the
> >   connection within an interval.
> > ...
> > 
> > Fix this by storing the timestamp of last received data
> > packet and checking for it when the keepalive timer expires.
> > 
> > -v2 fix do_tcp_setsockopt() as pointed by Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> 
> 
> I find this patch very welcome, and we could easily use this new
> lrcvtime information available in diagnostic tools (ss command)
> 
> But are you sure you update it for all valid packets ?
> 
> If we receive a pure ACK, it seems you do not ...

Pure ack is handled by rcv_tstamp in the struct which is
considered in tcp_keepalive_time() too.

The idea of exporting those variables is nice, I'll see
how 'ss' works.

thanks for reviewing the patch!
 

 
> > ---
> >  include/linux/tcp.h  |    1 +
> >  net/ipv4/tcp.c       |    5 ++++-
> >  net/ipv4/tcp_input.c |    3 +++
> >  net/ipv4/tcp_timer.c |    8 ++++++++
> >  4 files changed, 16 insertions(+), 1 deletions(-)
> > 
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index a778ee0..405678f 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -314,6 +314,7 @@ struct tcp_sock {
> >   	u32	snd_sml;	/* Last byte of the most recently transmitted small packet */
> >  	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
> >  	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
> > +	u32	lrcvtime;	/* timestamp of last received data packet (for keepalives) */
> >  
> >  	/* Data for direct copy to user */
> >  	struct {
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 0f8caf6..a4048d7 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
> >  			if (sock_flag(sk, SOCK_KEEPOPEN) &&
> >  			    !((1 << sk->sk_state) &
> >  			      (TCPF_CLOSE | TCPF_LISTEN))) {
> > -				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
> > +				u32 elapsed = min_t(u32,
> > +						      tcp_time_stamp - tp->rcv_tstamp,
> > +						      tcp_time_stamp - tp->lrcvtime);
> > +
> >  				if (tp->keepalive_time > elapsed)
> >  					elapsed = tp->keepalive_time - elapsed;
> >  				else
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index f240f57..60d2980 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -5391,6 +5391,8 @@ no_ack:
> >  				__kfree_skb(skb);
> >  			else
> >  				sk->sk_data_ready(sk, 0);
> > +
> > +			tp->lrcvtime = tcp_time_stamp;
> >  			return 0;
> >  		}
> >  	}
> > @@ -5421,6 +5423,7 @@ step5:
> >  
> >  	tcp_data_snd_check(sk);
> >  	tcp_ack_snd_check(sk);
> > +	tp->lrcvtime = tcp_time_stamp;
> >  	return 0;
> >  
> >  csum_error:
> > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> > index 8a0ab29..74dd804 100644
> > --- a/net/ipv4/tcp_timer.c
> > +++ b/net/ipv4/tcp_timer.c
> > @@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
> >  	if (tp->packets_out || tcp_send_head(sk))
> >  		goto resched;
> >  
> > +	elapsed = tcp_time_stamp - tp->lrcvtime;
> > +	
> > +	/* receiving data means alive */
> > +	if (elapsed < keepalive_time_when(tp)) {
> > +		elapsed = keepalive_time_when(tp) - elapsed;
> > +		goto resched;
> > +	}
> > +
> >  	elapsed = tcp_time_stamp - tp->rcv_tstamp;
> >  
> >  	if (elapsed >= keepalive_time_when(tp)) {
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Flavio

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox