All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Mackall <mpm@selenic.com>
To: avorontsov@ru.mvista.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	torvalds@linux-foundation.org, a.p.zijlstra@chello.nl,
	oleg@redhat.com, mingo@elte.hu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
Date: Thu, 09 Jul 2009 07:52:45 -0500	[thread overview]
Message-ID: <1247143965.21295.867.camel@calx> (raw)
In-Reply-To: <20090708222003.GA12318@oksana.dev.rtsoft.ru>

On Thu, 2009-07-09 at 02:20 +0400, Anton Vorontsov wrote:
> Using early netconsole and gianfar driver this error pops up:
> 
>   netconsole: timeout waiting for carrier
> 
> It appears that net/core/netpoll.c:netpoll_setup() is using
> cond_resched() in a loop waiting for a carrier.
> 
> The thing is that cond_resched() is a no-op when system_state !=
> SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
> scheduled, therefore link detection doesn't work.
> 
> I belive that the main problem is in cond_resched()[1], but despite
> how the cond_resched() story ends, it might be a good idea to call
> msleep(1) instead of cond_resched(), as suggested by Andrew Morton.
> 
> [1] http://lkml.org/lkml/2009/7/7/463
> 
> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
> ---
> 
> On Wed, Jul 08, 2009 at 02:47:44PM -0700, Andrew Morton wrote:
> > (belatedly cc'ing netdev)
> > 
> > Original diagnosis:
> > 
> > : Using early netconsole and gianfar driver this error pops up:
> > : 
> > :   netconsole: timeout waiting for carrier
> > : 
> > : It appears that net/core/netpoll.c:netpoll_setup() is using
> > : cond_resched() in a loop waiting for a carrier.
> > : 
> > : The thing is that cond_resched() is a no-op when system_state !=
> > : SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
> > : scheduled, therefore link detection doesn't work
> > 
> > > On Thu, 9 Jul 2009 01:33:31 +0400 Anton Vorontsov <avorontsov@ru.mvista.com> wrote:
> > > On Wed, Jul 08, 2009 at 02:10:24PM -0700, Andrew Morton wrote:
> > > > > On Wed, 8 Jul 2009 09:12:30 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > > That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> > > > > Testing that the scheduler is initialized may be the more correct one. I 
> > > > > think the SYSTEM_RUNNING one just comes from that being used for other 
> > > > > debug issues.
> > > > 
> > > > Agreed.  system_state is too general.
> > > > 
> > > > If we specifically want to know whether it is safe to call schedule() then
> > > > let's create a global boolean it_is_safe_to_call_schedule and test that,
> > > > rather than testing something which indirectly and unreliably implies "it
> > > > is safe to call schedule".  If that boolean already exists then no-brainer.
> > > > 
> > > > All that being said, I wonder if the netconsole code should be using
> > > > msleep(1) instead.  Spinning on cond_resched() is a bit rude.  But one
> > > > would have to verify that it is safe to call schedule() at this time, and
> > > > for the netconsole caller, this is dubious.
> > > 
> > > What do you mean by "verify that it is safe"? If it works,
> > > can I assume that it's safe? ;-) It works, fwiw.
> > > 
> > 
> > netconsole is supposed to be available as early as possible in boot for
> > obvious reasons.  I'd say there's a decent risk now and in the future that
> > netconsole will be initialised prior to the scheduler being available.
> > 
> > In fact, if "netconsole: timeout waiting for carrier" newly added to
> > netpoll_setup() a depedency on the scheduler being available then perhaps
> > that was an incorrect change.
> 
> 'git blame' says that carrier detection code didn't change since 2.6.12
> (where git history starts), PHYLIB is using workqueue since its
> submission (2.6.13). And SYSTEM_RUNNING check was added in 2.6.16.
> So it's not a new dependency.
> 
> The netpoll code is using msleep() just a few lines below cond_resched(),
> so we won't make things worse. ;-)

I think that's an improvement with or without the SYSTEM_RUNNING fix.

Signed-off-by: Matt Mackall <mpm@selenic.com>

> Thanks!
> 
>  net/core/netpoll.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 9675f31..df30feb 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -740,7 +740,7 @@ int netpoll_setup(struct netpoll *np)
>  				       np->name);
>  				break;
>  			}
> -			cond_resched();
> +			msleep(1);
>  		}
>  
>  		/* If carrier appears to come up instantly, we don't

-- 
http://selenic.com : development and support for Mercurial and Linux



  parent reply	other threads:[~2009-07-09 12:54 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-07 23:58 [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Anton Vorontsov
2009-07-08  0:50 ` Oleg Nesterov
2009-07-08  6:24   ` Peter Zijlstra
2009-07-08 12:03     ` Anton Vorontsov
2009-07-08 12:12       ` Peter Zijlstra
2009-07-08 12:55         ` Anton Vorontsov
2009-07-08 12:58           ` Peter Zijlstra
2009-07-08 20:45             ` [PATCH] sched: Make cond_resched*() available earlier Anton Vorontsov
2009-07-08 16:12     ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Linus Torvalds
2009-07-08 21:10       ` Andrew Morton
2009-07-08 21:33         ` Anton Vorontsov
2009-07-08 21:47           ` Andrew Morton
2009-07-08 22:20             ` [PATCH] netpoll: Fix carrier detection for drivers that are using phylib Anton Vorontsov
2009-07-09  0:01               ` Linus Torvalds
2009-07-09  3:08                 ` David Miller
2009-07-09  7:56                 ` Peter Zijlstra
2009-07-09 12:56                   ` Matt Mackall
2009-07-09 13:26                 ` Matt Mackall
2009-07-09 13:46                   ` Peter Zijlstra
2009-07-09 14:18                     ` Matt Mackall
2009-07-09 14:31                       ` Peter Zijlstra
2009-07-09 14:43                         ` Matt Mackall
2009-07-09 14:51                           ` Peter Zijlstra
2009-07-09 15:06                             ` Matt Mackall
2009-07-09 17:29                         ` Linus Torvalds
2009-07-09 12:52               ` Matt Mackall [this message]
2009-07-09 23:20         ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1247143965.21295.867.camel@calx \
    --to=mpm@selenic.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=avorontsov@ru.mvista.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=netdev@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.