public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: 2.6.13-rc6-rt6
Date: Fri, 19 Aug 2005 15:47:58 -0700	[thread overview]
Message-ID: <20050819224758.GJ1298@us.ibm.com> (raw)
In-Reply-To: <1124486548.18408.18.camel@localhost.localdomain>

On Fri, Aug 19, 2005 at 05:22:28PM -0400, Steven Rostedt wrote:
> On Wed, 2005-08-17 at 18:23 +0200, Ingo Molnar wrote:
> 
> > 
> > > And it goes on and on. This happens everytime. Without netconsole, I
> > > only get the nonzero lock count error. Also, one of my lockups on SMP
> > > had to do with the kernel_thread_helper:
> > > 
> > > Using IPI Shortcut mode
> > > khelper/794[CPU#0]: BUG in set_new_owner at kernel/rt.c:916
> 
> This was with netconsole and showed up after a bunch of other bugs. So
> this is a side effect of what happened earlier.
> > 
> > this is a 'must not happen'. Somehow lock->held list got non-empty.  
> > Maybe some use-after-free thing? Havent seen it myself.
> 
> I started debugging netconsole with the RT patch and found this
> happening.  After seeing what's wrong, I looked at the latest git
> branch, and it seems to already have a similar solution that I was going
> to make. Here's a description of what's wrong.
> 
> In net/core/dev.c the following code is in net_rx_action:
> 
> 		netpoll_poll_lock(dev);
> 
> 		if (dev->quota <= 0 || dev->poll(dev, &budget)) {
> 			netpoll_poll_unlock(dev);
> 			raw_local_irq_disable();
> 			list_del(&dev->poll_list);
> 			list_add_tail(&dev->poll_list, &queue->poll_list);
> 			if (dev->quota < 0)
> 				dev->quota += dev->weight;
> 			else
> 				dev->quota = dev->weight;
> 		} else {
> 			netpoll_poll_unlock(dev);
> 
> The netpoll_poll_lock and netpoll_poll_unlock look like this (in current RT):
> 
> static inline netpoll_poll_lock(struct net_device *dev)
> {
> 	if (dev->npinfo) {
> 		spin_lock(&dev->npinfo->poll_lock);
> 		dev->npinfo->poll_owner = smp_processor_id();
> 	}
> }
> 
> static inline void netpoll_poll_unlock(struct net_device *dev)
> {
> 	if (dev->npinfo) {
> 		dev->npinfo->poll_owner = -1;
> 		spin_unlock(&dev->npinfo->poll_lock);
> 	}
> }
> 
> 
> The problem here is that between netpoll_poll_lock and
> netpoll_poll_unlock the dev->npinfo gets assigned. So we unlock the
> dev->npinfo->poll_lock without ever locking it.
> 
> Here's the port from the latest git to solve this. I've CCed the netdev,
> since I'm not sure I got all the places for rcu_lock for the netpoll. At
> least to solve this problem.  I did boot up the kernel and this patch
> did fix my bugs that I was getting using netconsole. (I have one more
> patch to send to fix the illegal API messages).
> 
> -- Steve
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> Index: linux_realtime_ernie/include/linux/netpoll.h
> ===================================================================
> --- linux_realtime_ernie/include/linux/netpoll.h	(revision 296)
> +++ linux_realtime_ernie/include/linux/netpoll.h	(working copy)
> @@ -60,25 +60,31 @@
>  	return ret;
>  }

Good catch -- but a few changes needed to be perfectly safe:

	static inline void *netpoll_poll_lock(struct net_device *dev)
	{

		struct netpoll_info *npi;

		rcu_read_lock();
		npi = rcu_dereference(dev)->npinfo;
		if (have) {
			spin_lock(&npi->poll_lock);
			npi->poll_owner = smp_processor_id();
			return npi;
		}
		return NULL;
	}

The earlier version could get in trouble if dev->npinfo was set
to NULL while this was executing.

I am assuming that the dev pointer is really what is being RCU-protected,
but this example uses mostly static data structures, so it is hard
for me to tell.  The npinfo pointer is not being RCU protected, as it
appears to never be changed.  The other candidate is the rx_np pointer,
which is set to NULL in netpoll_cleanup.  I suggest a modification
to netpoll_cleanup below that handles both possibilities.  Of course,
I might well be missing something...

> -static inline void netpoll_poll_unlock(struct net_device *dev)
> +static inline void netpoll_poll_unlock(void *have)
>  {
> -	if (dev->npinfo) {
> -		dev->npinfo->poll_owner = -1;
> -		spin_unlock(&dev->npinfo->poll_lock);
> +	struct netpoll_info *npi = have;
> +
> +	if (npi) {
> +		npi->poll_owner = -1;
> +		spin_unlock(&npi->poll_lock);
>  	}
> +	rcu_read_unlock();
>  }
>  
>  #else
>  #define netpoll_rx(a) 0
> -#define netpoll_poll_lock(a)
> +#define netpoll_poll_lock(a) 0
>  #define netpoll_poll_unlock(a)
>  #endif
>  
> Index: linux_realtime_ernie/net/core/netpoll.c
> ===================================================================
> --- linux_realtime_ernie/net/core/netpoll.c	(revision 296)
> +++ linux_realtime_ernie/net/core/netpoll.c	(working copy)

If netpoll_setup() is implicitly tearing down an earlier netpoll_setup(),
then something like Steve's change below might be needed.

> @@ -726,6 +726,9 @@
>  	/* last thing to do is link it to the net device structure */
>  	ndev->npinfo = npinfo;
>  
> +	/* avoid racing with NAPI reading npinfo */
> +	synchronize_rcu();
> +
>  	return 0;
>  
>   release:

Assuming that it is legal to block in netpoll_cleanup(), the following
should work.  The idea is to NULL the dev pointer, wait for all RCU
readers to get done, and only then complete the cleanup.

	void netpoll_cleanup(struct netpoll *np)
	{
		struct netpoll_info *npinfo;
		unsigned long flags;
		struct net_device *dp;

		if (np->dev) {
			dp = np->dev;
			rcu_assign_pointer(np->dev, NULL);
			synchronize_rcu();
			npinfo = dp->npinfo;
			if (npinfo && npinfo->rx_np == np) {
				spin_lock_irqsave(&npinfo->rx_lock, flags);
				npinfo->rx_np = NULL;
				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
				spin_unlock_irqrestore(&npinfo->rx_lock, flags);
			}
			dev_put(dp);
		}

	}

Again, I do not fully understand this code, so a grain of salt might
come in handy.  But there definitely need to be some rcu_dereference()
and rcu_assign_pointer() primitives in there somewhere.  ;-)

The following changes look good to me, but, as I said earlier, I do
not claim to fully understand this code.

> Index: linux_realtime_ernie/net/core/dev.c
> ===================================================================
> --- linux_realtime_ernie/net/core/dev.c	(revision 296)
> +++ linux_realtime_ernie/net/core/dev.c	(working copy)
> @@ -1723,6 +1723,7 @@
>  
>  	while (!list_empty(&queue->poll_list)) {
>  		struct net_device *dev;
> +		void *have;
>  
>  		if (budget <= 0 || jiffies - start_time > 1)
>  			goto softnet_break;
> @@ -1735,10 +1736,10 @@
>  
>  		dev = list_entry(queue->poll_list.next,
>  				 struct net_device, poll_list);
> -		netpoll_poll_lock(dev);
> +		have = netpoll_poll_lock(dev);
>  
>  		if (dev->quota <= 0 || dev->poll(dev, &budget)) {
> -			netpoll_poll_unlock(dev);
> +			netpoll_poll_unlock(have);
>  			raw_local_irq_disable();
>  			list_del(&dev->poll_list);
>  			list_add_tail(&dev->poll_list, &queue->poll_list);
> @@ -1747,7 +1748,7 @@
>  			else
>  				dev->quota = dev->weight;
>  		} else {
> -			netpoll_poll_unlock(dev);
> +			netpoll_poll_unlock(have);
>  			dev_put(dev);
>  			raw_local_irq_disable();
>  		}

						Thanx, Paul

  reply	other threads:[~2005-08-19 22:47 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-08-16 12:18 2.6.13-rc6-rt3 Ingo Molnar
2005-08-16 15:31 ` 2.6.13-rc6-rt5 Steven Rostedt
2005-08-16 15:44   ` 2.6.13-rc6-rt5 Steven Rostedt
2005-08-16 16:08     ` 2.6.13-rc6-rt5 Steven Rostedt
2005-08-16 16:16       ` 2.6.13-rc6-rt5 Ingo Molnar
2005-08-16 16:22       ` 2.6.13-rc6-rt5 Ingo Molnar
2005-08-16 16:32       ` 2.6.13-rc6-rt5 Ingo Molnar
2005-08-16 16:37         ` 2.6.13-rc6-rt5 Ingo Molnar
2005-08-16 16:52           ` 2.6.13-rc6-rt5 Ingo Molnar
2005-08-16 17:08             ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-16 17:50               ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-16 18:07                 ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-16 18:50                   ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17  4:20                     ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17  5:46                       ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17  6:47                         ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-17 14:05                           ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17 14:24                             ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17 16:13                               ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17 16:23                                 ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-17 17:10                                   ` 2.6.13-rc6-rt6 K.R. Foley
2005-08-17 18:31                                     ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17 19:31                                       ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-18  0:02                                   ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-18  2:44                                     ` 2.6.13-rc6-rt6 Steven Rostedt
     [not found]                                       ` <20050822075012.GB19386@elte.hu>
     [not found]                                         ` <1124704837.5208.22.camel@localhost.localdomain>
     [not found]                                           ` <20050822101632.GA28803@elte.hu>
     [not found]                                             ` <1124710309.5208.30.camel@localhost.localdomain>
     [not found]                                               ` <20050822113858.GA1160@elte.hu>
     [not found]                                                 ` <1124715755.5647.4.camel@localhost.localdomain>
     [not found]                                                   ` <20050822183355.GB13888@elte.hu>
2005-08-22 19:40                                                     ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-22 19:44                                                       ` [RFC] RT-patch update to remove the global pi_lock Steven Rostedt
2005-08-22 22:19                                                         ` Daniel Walker
2005-08-23  0:26                                                           ` Steven Rostedt
2005-08-23  0:51                                                             ` Daniel Walker
2005-08-23  1:32                                                               ` Steven Rostedt
2005-08-23  3:38                                                                 ` Steven Rostedt
     [not found]                                                                   ` <1124908080.5604.22.camel@localhost.localdomain>
     [not found]                                                                     ` <1124917003.5711.8.camel@localhost.localdomain>
2005-08-24 21:05                                                                       ` Thomas Gleixner
2005-08-25  1:13                                                                       ` Steven Rostedt
2005-08-25  1:38                                                                         ` Daniel Walker
2005-08-25  1:48                                                                           ` Steven Rostedt
2005-08-25  6:31                                                                         ` Ingo Molnar
2005-08-25  6:35                                                                         ` Ingo Molnar
2005-08-25 16:15                                                                           ` Steven Rostedt
2005-08-25 19:34                                                                             ` Ingo Molnar
2005-08-25 19:46                                                                               ` Steven Rostedt
2005-08-23  5:29                                                             ` Ingo Molnar
2005-08-25 14:47                                                         ` Steven Rostedt
2005-08-25 15:06                                                           ` Steven Rostedt
2005-08-25 17:47                                                             ` Ingo Molnar
2005-08-25 20:09                                                               ` Steven Rostedt
2005-08-25 21:32                                                                 ` Daniel Walker
2005-08-26  2:23                                                                 ` Steven Rostedt
2005-08-26 13:52                                                                   ` Steven Rostedt
2005-08-30 15:00                                                                     ` Steven Rostedt
2005-08-30 15:52                                                                       ` Steven Rostedt
2005-08-30 23:08                                                                         ` Steven Rostedt
2005-08-31 15:01                                                                         ` [FYI] 2.6.13-rt3 and a nanosleep jitter test Steven Rostedt
2005-08-31 15:12                                                                           ` Daniel Walker
2005-08-31 15:30                                                                             ` Steven Rostedt
2005-08-31 15:13                                                                           ` Daniel Walker
2005-08-31 15:19                                                                             ` Steven Rostedt
2005-08-31 15:30                                                                               ` Daniel Walker
2005-08-23  5:46                                                       ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-19 21:22                                   ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-19 22:47                                     ` Paul E. McKenney [this message]
2005-08-19 23:02                                       ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-19 23:12                                         ` 2.6.13-rc6-rt6 Paul E. McKenney
2005-08-19 23:20                                           ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-19 23:44                                             ` 2.6.13-rc6-rt6 Paul E. McKenney
2005-08-22  7:53                                     ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-17 19:27                                 ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-17 19:39                                   ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17 17:32                           ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17 19:34                             ` 2.6.13-rc6-rt6 Steven Rostedt
2005-08-17  5:59                     ` 2.6.13-rc6-rt6 Ingo Molnar
2005-08-17 20:01 ` 2.6.13-rc6-rt8 Peter Bortas
2005-08-23  6:14   ` 2.6.13-rc6-rt8 Ingo Molnar
2005-08-28 20:36     ` 2.6.13-rc6-rt8 Peter Bortas
2005-08-18  9:57 ` 2.6.13-rc6-rt3 Alistair John Strachan
2005-08-18 10:00   ` 2.6.13-rc6-rt3 Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050819224758.GJ1298@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=netdev@oss.sgi.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox