* [PATCH] net: core: Fix race by protecting process_queues at CPU hotplug
From: subashab @ 2015-01-15 3:13 UTC (permalink / raw)
To: netdev
I am seeing frequent crashes under high throughput on a multiprocessor
system running kernel version 3.10 where cores are being hot-plugged. I
have pinned the network stack to a particular core using receive packet
steering (RPS). At the time of the crash, it looks like contention on the
process_queue between dev_cpu_callback() and process_backlog().
The following is the log at the moment of the crash:
75241.598056: <6> Unable to handle kernel NULL pointer dereference at
virtual address 00000004
75241.598064: <6> pgd = c0004000
75241.598072: <2> [00000004] *pgd=00000000
75241.598082: <6> Internal error: Oops: 817 [#1] PREEMPT SMP ARM
75241.598096: <2> Modules linked in: wlan(O) [last unloaded: wlan]
75241.598106: <6> CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W O 3.10.49-g0b78c2e-00003-g2eab3e3 #1
75241.598113: <6> task: ee870a80 ti: ee888000 task.ti: ee888000
75241.598133: <2> PC is at process_backlog+0x78/0x140
75241.598142: <2> LR is at __netif_receive_skb_core+0x6fc/0x724
Here is the call stack involved in the crash:
-006|__skb_unlink
-006|__skb_dequeue
-006|dev_cpu_callback
-007|notifier_call_chain
-008|__raw_notifier_call_chain
-009|notifier_to_errno
-009|__cpu_notify
-010|cpu_notify_nofail
-011|check_for_tasks
-011|cpu_down
-012|disable_nonboot_cpus()
-013|suspend_enter
-013|suspend_devices_and_enter
-014|enter_state
-014|pm_suspend
-015|state_store
-016|kobj_attr_store
-017|flush_write_buffer
-017|sysfs_write_file
-018|vfs_write
-019|SYSC_write
-019|sys_write
-020|ret_fast_syscall
I have a fix in mind in which the process queues are protected with spin
locks. I do not observe the crash with this patch applied. Since this
change is in the hot path for incoming packet processing, I would like
reviews of the patch, and I would also like to know whether it introduces
any potential performance bottlenecks.
net: core: Protect process_queues at CPU hotplug
When a CPU is hotplugged while incoming packets are being processed,
dev_cpu_callback() copies the poll list from the offline CPU and raises
the softIRQ. In the same context, it also processes the process_queue of
the offline CPU by dequeueing skbs and calling netif_rx(). This opens a
potential race between process_backlog() and dev_cpu_callback() accessing
the same process_queue. This patch protects the concurrent access with
locking.
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
diff --git a/net/core/dev.c b/net/core/dev.c
index df0b522..aa8f503 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3640,6 +3640,7 @@ static void flush_backlog(void *arg)
struct net_device *dev = arg;
struct softnet_data *sd = &__get_cpu_var(softnet_data);
struct sk_buff *skb, *tmp;
+ unsigned long flags;
rps_lock(sd);
skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
@@ -3651,6 +3652,7 @@ static void flush_backlog(void *arg)
}
rps_unlock(sd);
+ spin_lock_irqsave(&sd->process_queue.lock, flags);
skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
if (skb->dev == dev) {
__skb_unlink(skb, &sd->process_queue);
@@ -3658,6 +3660,7 @@ static void flush_backlog(void *arg)
input_queue_head_incr(sd);
}
}
+ spin_unlock_irqrestore(&sd->process_queue.lock, flags);
}
static int napi_gro_complete(struct sk_buff *skb)
@@ -4021,7 +4024,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
{
int work = 0;
struct softnet_data *sd = container_of(napi, struct softnet_data, backlog);
-
+ unsigned long flags;
#ifdef CONFIG_RPS
/* Check if we have pending ipi, its better to send them now,
* not waiting net_rx_action() end.
@@ -4032,18 +4035,19 @@ static int process_backlog(struct napi_struct *napi, int quota)
}
#endif
napi->weight = weight_p;
- local_irq_disable();
+ spin_lock_irqsave(&sd->process_queue.lock, flags);
while (work < quota) {
struct sk_buff *skb;
unsigned int qlen;
while ((skb = __skb_dequeue(&sd->process_queue))) {
- local_irq_enable();
+ spin_unlock_irqrestore(&sd->process_queue.lock, flags);
__netif_receive_skb(skb);
- local_irq_disable();
+ spin_lock_irqsave(&sd->process_queue.lock, flags);
input_queue_head_incr(sd);
if (++work >= quota) {
- local_irq_enable();
+ spin_unlock_irqrestore(&sd->process_queue.lock,
+ flags);
return work;
}
}
@@ -4054,6 +4058,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
skb_queue_splice_tail_init(&sd->input_pkt_queue,
&sd->process_queue);
+
if (qlen < quota - work) {
/*
* Inline a custom version of __napi_complete().
@@ -4069,7 +4074,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
}
rps_unlock(sd);
}
- local_irq_enable();
+ spin_unlock_irqrestore(&sd->process_queue.lock, flags);
return work;
}
@@ -5991,6 +5996,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
struct sk_buff *skb;
unsigned int cpu, oldcpu = (unsigned long)ocpu;
struct softnet_data *sd, *oldsd;
+ unsigned long flags;
if (action != CPU_DEAD && action != CPU_DEAD_FROZEN)
return NOTIFY_OK;
@@ -6024,11 +6030,15 @@ static int dev_cpu_callback(struct notifier_block *nfb,
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_enable();
+ spin_lock_irqsave(&oldsd->process_queue.lock, flags);
/* Process offline CPU's input_pkt_queue */
while ((skb = __skb_dequeue(&oldsd->process_queue))) {
+ spin_unlock_irqrestore(&oldsd->process_queue.lock, flags);
netif_rx(skb);
+ spin_lock_irqsave(&oldsd->process_queue.lock, flags);
input_queue_head_incr(oldsd);
}
+ spin_unlock_irqrestore(&oldsd->process_queue.lock, flags);
while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx(skb);
input_queue_head_incr(oldsd);
Thanks
KS
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
* Re: [PATCH] net: core: Fix race by protecting process_queues at CPU hotplug
From: Eric Dumazet @ 2015-01-15 7:01 UTC (permalink / raw)
To: subashab; +Cc: netdev
On Thu, 2015-01-15 at 03:13 +0000, subashab@codeaurora.org wrote:
> I am seeing frequent crashes in high throughput conditions in a
> multiprocessor system with kernel version 3.10 where cores are getting hot
> plugged. I have pinned the network stack to a particular core using
> receive packet steering (RPS). At the time of crash, it looks like a
> contention of the process_queue between dev_cpu_callback and
> process_backlog.
Your patch is mangled, hard to tell what is going on.
* Re: [PATCH] net: core: Fix race by protecting process_queues at CPU hotplug
From: subashab @ 2015-01-15 19:03 UTC (permalink / raw)
To: Eric Dumazet, netdev
When a CPU is hotplugged while incoming packets are being processed,
dev_cpu_callback() copies the poll list from the offline CPU and raises
the softIRQ. In the same context, it also processes the process_queue of
the offline CPU by dequeueing skbs and calling netif_rx(). This opens a
potential race between process_backlog() and dev_cpu_callback() accessing
the same process_queue. This patch protects the concurrent access with
locking.
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
net/core/dev.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index df0b522..aa8f503 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3640,6 +3640,7 @@ static void flush_backlog(void *arg)
struct net_device *dev = arg;
struct softnet_data *sd = &__get_cpu_var(softnet_data);
struct sk_buff *skb, *tmp;
+ unsigned long flags;
rps_lock(sd);
skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
@@ -3651,6 +3652,7 @@ static void flush_backlog(void *arg)
}
rps_unlock(sd);
+ spin_lock_irqsave(&sd->process_queue.lock, flags);
skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
if (skb->dev == dev) {
__skb_unlink(skb, &sd->process_queue);
@@ -3658,6 +3660,7 @@ static void flush_backlog(void *arg)
input_queue_head_incr(sd);
}
}
+ spin_unlock_irqrestore(&sd->process_queue.lock, flags);
}
static int napi_gro_complete(struct sk_buff *skb)
@@ -4021,7 +4024,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
{
int work = 0;
struct softnet_data *sd = container_of(napi, struct softnet_data, backlog);
-
+ unsigned long flags;
#ifdef CONFIG_RPS
/* Check if we have pending ipi, its better to send them now,
* not waiting net_rx_action() end.
@@ -4032,18 +4035,19 @@ static int process_backlog(struct napi_struct *napi, int quota)
}
#endif
napi->weight = weight_p;
- local_irq_disable();
+ spin_lock_irqsave(&sd->process_queue.lock, flags);
while (work < quota) {
struct sk_buff *skb;
unsigned int qlen;
while ((skb = __skb_dequeue(&sd->process_queue))) {
- local_irq_enable();
+ spin_unlock_irqrestore(&sd->process_queue.lock, flags);
__netif_receive_skb(skb);
- local_irq_disable();
+ spin_lock_irqsave(&sd->process_queue.lock, flags);
input_queue_head_incr(sd);
if (++work >= quota) {
- local_irq_enable();
+ spin_unlock_irqrestore(&sd->process_queue.lock,
+ flags);
return work;
}
}
@@ -4054,6 +4058,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
skb_queue_splice_tail_init(&sd->input_pkt_queue,
&sd->process_queue);
+
if (qlen < quota - work) {
/*
* Inline a custom version of __napi_complete().
@@ -4069,7 +4074,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
}
rps_unlock(sd);
}
- local_irq_enable();
+ spin_unlock_irqrestore(&sd->process_queue.lock, flags);
return work;
}
@@ -5991,6 +5996,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
struct sk_buff *skb;
unsigned int cpu, oldcpu = (unsigned long)ocpu;
struct softnet_data *sd, *oldsd;
+ unsigned long flags;
if (action != CPU_DEAD && action != CPU_DEAD_FROZEN)
return NOTIFY_OK;
@@ -6024,11 +6030,15 @@ static int dev_cpu_callback(struct notifier_block *nfb,
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_enable();
+ spin_lock_irqsave(&oldsd->process_queue.lock, flags);
/* Process offline CPU's input_pkt_queue */
while ((skb = __skb_dequeue(&oldsd->process_queue))) {
+ spin_unlock_irqrestore(&oldsd->process_queue.lock, flags);
netif_rx(skb);
+ spin_lock_irqsave(&oldsd->process_queue.lock, flags);
input_queue_head_incr(oldsd);
}
+ spin_unlock_irqrestore(&oldsd->process_queue.lock, flags);
while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx(skb);
input_queue_head_incr(oldsd);
--
> On Thu, 2015-01-15 at 03:13 +0000, subashab@codeaurora.org wrote:
>> I am seeing frequent crashes in high throughput conditions in a
>> multiprocessor system with kernel version 3.10 where cores are getting
>> hot
>> plugged. I have pinned the network stack to a particular core using
>> receive packet steering (RPS). At the time of crash, it looks like a
>> contention of the process_queue between dev_cpu_callback and
>> process_backlog.
>
> Your patch is mangled, hard to tell what is going on.
* Re: [PATCH] net: core: Fix race by protecting process_queues at CPU hotplug
From: Eric Dumazet @ 2015-01-15 19:28 UTC (permalink / raw)
To: subashab; +Cc: netdev
On Thu, 2015-01-15 at 19:03 +0000, subashab@codeaurora.org wrote:
> When a CPU is hotplugged while processing incoming packets,
> dev_cpu_callback() will copy the poll list from the offline
> CPU and raise the softIRQ. In the same context, it will also
> process process_queue of the offline CPU by de-queueing and
> calling netif_rx. Due to this there is a potential for race
> condition between process_backlog() and dev_cpu_callback()
> accessing the same process_queue resource. This patch
> protects this concurrent access by locking.
>
> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
> net/core/dev.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index df0b522..aa8f503 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3640,6 +3640,7 @@ static void flush_backlog(void *arg)
> struct net_device *dev = arg;
> struct softnet_data *sd = &__get_cpu_var(softnet_data);
> struct sk_buff *skb, *tmp;
> + unsigned long flags;
>
> rps_lock(sd);
> skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
> @@ -3651,6 +3652,7 @@ static void flush_backlog(void *arg)
> }
> rps_unlock(sd);
>
> + spin_lock_irqsave(&sd->process_queue.lock, flags);
> skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
> if (skb->dev == dev) {
> __skb_unlink(skb, &sd->process_queue);
> @@ -3658,6 +3660,7 @@ static void flush_backlog(void *arg)
> input_queue_head_incr(sd);
> }
> }
> + spin_unlock_irqrestore(&sd->process_queue.lock, flags);
> }
>
> static int napi_gro_complete(struct sk_buff *skb)
> @@ -4021,7 +4024,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
> {
> int work = 0;
> struct softnet_data *sd = container_of(napi, struct softnet_data, backlog);
> -
> + unsigned long flags;
> #ifdef CONFIG_RPS
> /* Check if we have pending ipi, its better to send them now,
> * not waiting net_rx_action() end.
> @@ -4032,18 +4035,19 @@ static int process_backlog(struct napi_struct *napi, int quota)
> }
> #endif
> napi->weight = weight_p;
> - local_irq_disable();
> + spin_lock_irqsave(&sd->process_queue.lock, flags);
> while (work < quota) {
> struct sk_buff *skb;
> unsigned int qlen;
>
> while ((skb = __skb_dequeue(&sd->process_queue))) {
> - local_irq_enable();
> + spin_unlock_irqrestore(&sd->process_queue.lock, flags);
> __netif_receive_skb(skb);
> - local_irq_disable();
> + spin_lock_irqsave(&sd->process_queue.lock, flags);
> input_queue_head_incr(sd);
> if (++work >= quota) {
> - local_irq_enable();
> + spin_unlock_irqrestore(&sd->process_queue.lock,
> + flags);
> return work;
> }
This would be a terrible change in terms of performance.
There is something wrong here: process_queue is local to the cpu.
A cpu does not have to use a spin lock to protect this queue.
Can you reproduce the bug on x86 ?
* [PATCH net] net: rps: fix cpu unplug
From: Eric Dumazet @ 2015-01-15 21:46 UTC (permalink / raw)
To: subashab, Prasad Sodagudi; +Cc: netdev, Tom Herbert
From: Eric Dumazet <edumazet@google.com>
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from the victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into the victim queue.
Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.
This version is better because we do not slow down packet processing,
only make migration safer.
Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
---
Could you test this fix instead of yours ? Thanks !
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1e325adc43678084418ef9e1abb1fca8059ff599..76f72762b325cfa927a793af180189c51e9eaffd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7089,7 +7089,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
netif_rx_internal(skb);
input_queue_head_incr(oldsd);
}
- while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
+ while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx_internal(skb);
input_queue_head_incr(oldsd);
}
* Re: [PATCH net] net: rps: fix cpu unplug
From: subashab @ 2015-01-15 22:29 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Prasad Sodagudi, netdev, Tom Herbert
Thanks for the patch. I shall try it out and provide feedback soon.
But we think the race condition issue is different. The crash was observed
in the process_queue.
In the event of a CPU hotplug, the NAPI poll_list is copied over from the
offline CPU to the CPU on which dev_cpu_callback() was called. These
operations happen in dev_cpu_callback() in the context of the notifier
chain from the hotplug framework. In the same hotplug notifier context,
the input_pkt_queue and process_queue of the offline CPU are dequeued and
sent up the network stack, and this is where I think the race/problem is.
Context 1: The online CPU starts processing the poll_list from
net_rx_action() since a softIRQ was raised in dev_cpu_callback();
process_backlog() drains the process_queue.
Context 2: The hotplug notifier dev_cpu_callback() drains the queues and
calls netif_rx().
From dev_cpu_callback():
/* Process offline CPU's input_pkt_queue */
while ((skb = __skb_dequeue(&oldsd->process_queue))) {
netif_rx(skb);
input_queue_head_incr(oldsd);
}
while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx(skb);
input_queue_head_incr(oldsd);
}
Is this dequeueing (the above code snippet from dev_cpu_callback())
actually necessary, since the poll_list should already contain the backlog
napi struct of the old CPU? In that case, when process_backlog() actually
runs, it should drain these two queues of the older CPU.
Let me know your thoughts.
Thanks
KS
> From: Eric Dumazet <edumazet@google.com>
>
> softnet_data.input_pkt_queue is protected by a spinlock that
> we must hold when transferring packets from victim queue to an active
> one. This is because other cpus could still be trying to enqueue packets
> into victim queue.
>
> Based on initial patch from Prasad Sodagudi & Subash Abhinov
> Kasiviswanathan.
>
> This version is better because we do not slow down packet processing,
> only make migration safer.
>
> Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
> Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tom Herbert <therbert@google.com>
> ---
>
> Could you test this fix instead of yours ? Thanks !
>
> net/core/dev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1e325adc43678084418ef9e1abb1fca8059ff599..76f72762b325cfa927a793af180189c51e9eaffd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7089,7 +7089,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
> netif_rx_internal(skb);
> input_queue_head_incr(oldsd);
> }
> - while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
> + while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
> netif_rx_internal(skb);
> input_queue_head_incr(oldsd);
> }
* Re: [PATCH net] net: rps: fix cpu unplug
From: Eric Dumazet @ 2015-01-16 0:14 UTC (permalink / raw)
To: subashab; +Cc: Prasad Sodagudi, netdev, Tom Herbert
On Thu, 2015-01-15 at 22:29 +0000, subashab@codeaurora.org wrote:
> Thanks for the patch. I shall try it out and provide feedback soon.
> But we think the race condition issue is different. The crash was observed
> in the process_queue.
>
> On the event of a CPU hotplug, the NAPI poll_list is copied over from the
> offline CPU to the CPU on which dev_cpu_callback() was called. These
> operations happens in dev_cpu_callback() in the context of the notifier
> chain from hotplug framework. Also in the same hotplug notifier context
> (dev_cpu_callback) the input_pkt_queue and process_queue of the offline
> CPU are dequeued and sent up the network stack and this is where I think
> the race/problem is.
>
> Context1: The online CPU starts processing the poll_list from
> net_rx_action since a
> softIRQ was raised in dev_cpu_callback(). process_backlog() draining the
> process queue
>
> Context2: hotplug notifier dev_cpu_callback() draining the queues and
> calling netif_rx().
>
> from dev_cpu_callback()
> /* Process offline CPU's input_pkt_queue */
> while ((skb = __skb_dequeue(&oldsd->process_queue))) {
> netif_rx(skb);
> input_queue_head_incr(oldsd);
> }
> while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
> netif_rx(skb);
> input_queue_head_incr(oldsd);
> }
>
> Is this de-queuing(the above code snippet from dev_cpu_callback())
> actually necessary since the poll_list should already have the backlog
> napi struct of the old CPU? In this case when process_backlog()
> actually runs it should drain these two queues of older CPU.
> Let me know your thoughts.
input_pkt_queue and process_queue have nothing to do with the NAPI
poll_list: they store skbs.
dev_cpu_callback() is called when the cpu we are offlining is no longer
running. No interrupts are serviced by this offline cpu either.
You have the absolute guarantee that no one is manipulating process_queue
at the same time as you.
It looks like you found another issue, not related to RPS, but due to
the fact that commit 264524d5e5195f6e
("net: cpu offline cause napi stall") did not exclude the percpu
backlog.
process_backlog() MUST be called by the owner cpu. Otherwise we would
need to add locking everywhere, as you did, and this is simply insane.
I'll send a V2
* [PATCH v2 net] net: rps: fix cpu unplug
From: Eric Dumazet @ 2015-01-16 1:04 UTC (permalink / raw)
To: subashab; +Cc: Prasad Sodagudi, netdev, Tom Herbert
From: Eric Dumazet <edumazet@google.com>
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.
A second problem is that when we transfer the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.
Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.
This version is better because we do not slow down packet processing,
only make migration safer.
Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
---
net/core/dev.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1e325adc4367..7f028d441e98 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7075,10 +7075,20 @@ static int dev_cpu_callback(struct notifier_block *nfb,
oldsd->output_queue = NULL;
oldsd->output_queue_tailp = &oldsd->output_queue;
}
- /* Append NAPI poll list from offline CPU. */
- if (!list_empty(&oldsd->poll_list)) {
- list_splice_init(&oldsd->poll_list, &sd->poll_list);
- raise_softirq_irqoff(NET_RX_SOFTIRQ);
+ /* Append NAPI poll list from offline CPU, with one exception :
+ * process_backlog() must be called by cpu owning percpu backlog.
+ * We properly handle process_queue & input_pkt_queue later.
+ */
+ while (!list_empty(&oldsd->poll_list)) {
+ struct napi_struct *napi = list_first_entry(&oldsd->poll_list,
+ struct napi_struct,
+ poll_list);
+
+ list_del_init(&napi->poll_list);
+ if (napi->poll == process_backlog)
+ napi->state = 0;
+ else
+ ____napi_schedule(sd, napi);
}
raise_softirq_irqoff(NET_TX_SOFTIRQ);
@@ -7089,7 +7099,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
netif_rx_internal(skb);
input_queue_head_incr(oldsd);
}
- while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
+ while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx_internal(skb);
input_queue_head_incr(oldsd);
}
* Re: [PATCH v2 net] net: rps: fix cpu unplug
From: Eric Dumazet @ 2015-01-16 1:21 UTC (permalink / raw)
To: subashab; +Cc: Prasad Sodagudi, netdev, Tom Herbert
On Thu, 2015-01-15 at 17:04 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
Small note: this patch was based on net-next but applies cleanly to the
net tree:
Hunk #1 succeeded at 7072 (offset -3 lines).
Hunk #2 succeeded at 7096 (offset -3 lines).
* Re: [PATCH v2 net] net: rps: fix cpu unplug
From: David Miller @ 2015-01-16 6:05 UTC (permalink / raw)
To: eric.dumazet; +Cc: subashab, psodagud, netdev, therbert
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Jan 2015 17:04:22 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> softnet_data.input_pkt_queue is protected by a spinlock that
> we must hold when transferring packets from victim queue to an active
> one. This is because other cpus could still be trying to enqueue packets
> into victim queue.
>
> A second problem is that when we transfer the NAPI poll_list from
> victim to current cpu, we absolutely need to special case the percpu
> backlog, because we do not want to add complex locking to protect
> process_queue : Only owner cpu is allowed to manipulate it, unless cpu
> is offline.
>
> Based on initial patch from Prasad Sodagudi & Subash Abhinov
> Kasiviswanathan.
>
> This version is better because we do not slow down packet processing,
> only make migration safer.
>
> Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
> Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied and queued up for -stable, thanks Eric.