netdev.vger.kernel.org archive mirror
* Using netconsole for debugging suspend/resume
@ 2006-06-08 17:50 Jeremy Fitzhardinge
  2006-06-08 20:35 ` Auke Kok
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-08 17:50 UTC (permalink / raw)
  To: Matt Mackall, Linux Kernel Mailing List, netdev

I've been trying to get suspend/resume working well on my new laptop.  
In general, netconsole has been pretty useful for extracting oopses and 
other messages, but it is of more limited help in debugging the actual 
suspend/resume cycle.  The problem looks like the e1000 driver won't 
suspend while netconsole is using it, so I have to rmmod/modprobe 
netconsole around the actual suspend/resume.

This is a big problem during resume because the screen is also blank, so 
I get no useful clue as to what went wrong when things go wrong.  I'm 
wondering if there's some way to keep netconsole alive to the last 
possible moment during suspend, and re-woken as soon as possible during 
resume.  It would be nice to have a clean solution, but I'm willing to 
use a bletcherous hack if that's what it takes.

Any ideas?

Thanks,
    J



* Re: Using netconsole for debugging suspend/resume
  2006-06-08 17:50 Using netconsole for debugging suspend/resume Jeremy Fitzhardinge
@ 2006-06-08 20:35 ` Auke Kok
  2006-06-08 20:40 ` Rafael J. Wysocki
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Auke Kok @ 2006-06-08 20:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

Jeremy Fitzhardinge wrote:
> I've been trying to get suspend/resume working well on my new laptop.  
> In general, netconsole has been pretty useful for extracting oopses and 
> other messages, but it is of more limited help in debugging the actual 
> suspend/resume cycle.  The problem looks like the e1000 driver won't 
> suspend while netconsole is using it, so I have to rmmod/modprobe 
> netconsole around the actual suspend/resume.
> 
> This is a big problem during resume because the screen is also blank, so 
> I get no useful clue as to what went wrong when things go wrong.  I'm 
> wondering if there's some way to keep netconsole alive to the last 
> possible moment during suspend, and re-woken as soon as possible during 
> resume.  It would be nice to have a clean solution, but I'm willing to 
> use a bletcherous hack if that's what it takes.
> 
> Any ideas?

Have you tried using different cards/drivers? That would show whether this is 
a generic netconsole problem or a driver-related one (which could affect 
other drivers too).

Off the top of my head I don't see any reason why the e1000 shouldn't handle 
the suspend event - but mind you that a fix for e1000/WoL impacting shutdown 
handlers was only recently added. Which kernels does this impact?

Auke


* Re: Using netconsole for debugging suspend/resume
  2006-06-08 17:50 Using netconsole for debugging suspend/resume Jeremy Fitzhardinge
  2006-06-08 20:35 ` Auke Kok
@ 2006-06-08 20:40 ` Rafael J. Wysocki
  2006-06-09  1:56   ` Jeremy Fitzhardinge
  2006-06-08 21:07 ` Matt Mackall
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2006-06-08 20:40 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

On Thursday 08 June 2006 19:50, Jeremy Fitzhardinge wrote:
> I've been trying to get suspend/resume working well on my new laptop.  
> In general, netconsole has been pretty useful for extracting oopses and 
> other messages, but it is of more limited help in debugging the actual 
> suspend/resume cycle.  The problem looks like the e1000 driver won't 
> suspend while netconsole is using it, so I have to rmmod/modprobe 
> netconsole around the actual suspend/resume.
> 
> This is a big problem during resume because the screen is also blank, so 
> I get no useful clue as to what went wrong when things go wrong.  I'm 
> wondering if there's some way to keep netconsole alive to the last 
> possible moment during suspend, and re-woken as soon as possible during 
> resume.  It would be nice to have a clean solution, but I'm willing to 
> use a bletcherous hack if that's what it takes.
> 
> Any ideas?

Please try doing "echo 8 > /proc/sys/kernel/printk" before suspend.

Greetings,
Rafael


* Re: Using netconsole for debugging suspend/resume
  2006-06-08 17:50 Using netconsole for debugging suspend/resume Jeremy Fitzhardinge
  2006-06-08 20:35 ` Auke Kok
  2006-06-08 20:40 ` Rafael J. Wysocki
@ 2006-06-08 21:07 ` Matt Mackall
  2006-06-09  1:54   ` Jeremy Fitzhardinge
  2006-06-09  2:15   ` [PATCH RFC] netpoll: don't spin forever sending to stopped queues Jeremy Fitzhardinge
  2006-06-09  3:46 ` Using netconsole for debugging suspend/resume Andi Kleen
  2006-06-09  8:34 ` Pavel Machek
  4 siblings, 2 replies; 30+ messages in thread
From: Matt Mackall @ 2006-06-08 21:07 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Linux Kernel Mailing List, netdev

On Thu, Jun 08, 2006 at 10:50:57AM -0700, Jeremy Fitzhardinge wrote:
> I've been trying to get suspend/resume working well on my new laptop.  
> In general, netconsole has been pretty useful for extracting oopses and 
> other messages, but it is of more limited help in debugging the actual 
> suspend/resume cycle.  The problem looks like the e1000 driver won't 
> suspend while netconsole is using it, so I have to rmmod/modprobe 
> netconsole around the actual suspend/resume.

That's odd. Netpoll holds a reference to the device, of course, but so
does a normal "up" interface. So that shouldn't be the problem.
Another possibility is that outgoing packets from printks in the
driver are causing difficulty. Not sure what can be done about that.

> This is a big problem during resume because the screen is also blank, so 
> I get no useful clue as to what went wrong when things go wrong.  I'm 
> wondering if there's some way to keep netconsole alive to the last 
> possible moment during suspend, and re-woken as soon as possible during 
> resume.  It would be nice to have a clean solution, but I'm willing to 
> use a bletcherous hack if that's what it takes.

It's generally going to suck, because unlike a polled serial port, the
device needs to be put to sleep. But if you're doing suspend to RAM,
you might be able to do something like this:

- unhook net device from suspend machinery (possibly just return success)
- bounce out of suspend before the final call to ACPI is made

Net effect is you do OS-level suspend and resume of everything but the
NIC without actually powering down the core. Which should let you
debug just about everything.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Using netconsole for debugging suspend/resume
  2006-06-08 21:07 ` Matt Mackall
@ 2006-06-09  1:54   ` Jeremy Fitzhardinge
  2006-06-09  5:13     ` Auke Kok
  2006-06-09  2:15   ` [PATCH RFC] netpoll: don't spin forever sending to stopped queues Jeremy Fitzhardinge
  1 sibling, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-09  1:54 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Linux Kernel Mailing List, netdev

Matt Mackall wrote:
> That's odd. Netpoll holds a reference to the device, of course, but so
> does a normal "up" interface. So that shouldn't be the problem.
> Another possibility is that outgoing packets from printks in the
> driver are causing difficulty. Not sure what can be done about that.
>   
I only tried once; maybe I misunderstood what was going on.  I'll try 
again tonight.

Oh, I think I see what's happening.  The e1000 suspend routine does this:

	if (netif_running(netdev))
		e1000_down(adapter);

This leaves the interface up, but it stops the queue.  Then 
netpoll_send_skb() has this loop:

	do {
		npinfo->tries--;
		spin_lock(&np->dev->xmit_lock);
		np->dev->xmit_lock_owner = smp_processor_id();

		/*
		 * network drivers do not expect to be called if the queue is
		 * stopped.
		 */
		if (netif_queue_stopped(np->dev)) {
			np->dev->xmit_lock_owner = -1;
			spin_unlock(&np->dev->xmit_lock);
			netpoll_poll(np);
			udelay(50);
			continue;
		}
/* ... */
again: /* proposed */
	} while (npinfo->tries > 0);


so this will end up in an infinite loop, since netif_queue_stopped() 
will always return true, and it never looks at npinfo->tries.  Should 
the "continue" be "goto again"?
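
On the "continue" question, a small user-space sketch may help. In C, a
"continue" inside a do/while jumps to the evaluation of the controlling
expression, so a loop of this shape still tests its counter on every pass;
that is consistent with the bound of about one second per skb worked out
later in this thread. The names below (queue_stopped, passes_until_give_up)
are hypothetical stand-ins chosen for illustration, not the netpoll code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for netif_queue_stopped(): the queue never restarts. */
static bool queue_stopped(void)
{
	return true;
}

/*
 * User-space model of the retry loop's shape (illustrative only).
 * Returns the number of passes made before the loop gives up.
 */
int passes_until_give_up(int tries)
{
	int passes = 0;

	assert(tries > 0);
	do {
		tries--;
		passes++;
		if (queue_stopped())
			continue;	/* jumps to the while () test below */
		break;			/* queue running: would transmit here */
	} while (tries > 0);

	return passes;
}
```

With a permanently stopped queue the model makes exactly as many passes as
it was given tries and then exits, so the loop is long rather than endless.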

Also, e1000_down does a netif_poll_disable(), but I'm not sure what that 
actually does...  Should it prevent netpoll from even trying to send?
> It's generally going to suck, because unlike a polled serial port, the
> device needs to be put to sleep. But if you're doing suspend to RAM,
>   
I'm interested in suspend-to-ram.  I presume that with suspend-to-disk, 
booting with built-in netconsole will tell me useful stuff; that'll be 
the next experiment.

> you might be able to do something like this:
>
> - unhook net device from suspend machinery (possibly just return success)
> - bounce out of suspend before the final call to ACPI is made
>
> Net effect is you do OS-level suspend and resume of everything but the
> NIC without actually powering down the core. Which should let you
> debug just about everything.

Well, the machine has to really suspend so that I can see (and debug) a 
mostly normal resume.  In particular, I need the hardware to be zapped 
so I can see if it is being restarted properly.

What might work is to change the e1000 suspend routine to save enough 
state for resume to work, but keep the interface up so that netconsole 
can keep transmitting all the way up to the point that the final acpi 
call powers off the machine.

Then the e1000 would resume normally, including restarting the xmit 
queue so that netconsole can start again immediately; any netconsole 
output before the e1000 resume would be lost, of course (I guess it 
could be buffered).  That would suit me for now.

    J



* Re: Using netconsole for debugging suspend/resume
  2006-06-08 20:40 ` Rafael J. Wysocki
@ 2006-06-09  1:56   ` Jeremy Fitzhardinge
  2006-06-09 10:34     ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-09  1:56 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

Rafael J. Wysocki wrote:
> Please try doing "echo 8 > /proc/sys/kernel/printk" before suspend.
>   
Um, why?  That would increase the amount of log output, but I don't see 
how it would help with netconsole preventing suspend, or not being able 
to see console messages on a blank screen after resume.

    J



* [PATCH RFC] netpoll: don't spin forever sending to stopped queues
  2006-06-08 21:07 ` Matt Mackall
  2006-06-09  1:54   ` Jeremy Fitzhardinge
@ 2006-06-09  2:15   ` Jeremy Fitzhardinge
  2006-06-11 20:04     ` Matt Mackall
  1 sibling, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-09  2:15 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Linux Kernel Mailing List, netdev

Matt Mackall wrote:
> That's odd. Netpoll holds a reference to the device, of course, but so
> does a normal "up" interface. So that shouldn't be the problem.
> Another possibility is that outgoing packets from printks in the
> driver are causing difficulty. Not sure what can be done about that.
>   
Here's a patch.  I haven't tested it beyond compiling it, and I don't 
know if it is actually correct.  In this case, it seems pointless to 
spin waiting for an event which will never happen.  Should 
netif_poll_disable() cause netpoll_send_skb() (or something) to not even 
bother trying to send?  netif_poll_disable seems mysteriously simple to me.

    J

--

Subject: netpoll: don't spin forever sending to stopped queues

When transmitting a skb in netpoll_send_skb(), only retry a limited
number of times if the device queue is stopped.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>

diff -r aac813f54617 net/core/netpoll.c
--- a/net/core/netpoll.c	Wed Jun 07 14:53:40 2006 -0700
+++ b/net/core/netpoll.c	Thu Jun 08 19:00:29 2006 -0700
@@ -280,15 +280,10 @@ static void netpoll_send_skb(struct netp
 		 * network drivers do not expect to be called if the queue is
 		 * stopped.
 		 */
-		if (netif_queue_stopped(np->dev)) {
-			np->dev->xmit_lock_owner = -1;
-			spin_unlock(&np->dev->xmit_lock);
-			netpoll_poll(np);
-			udelay(50);
-			continue;
-		}
-
-		status = np->dev->hard_start_xmit(skb, np->dev);
+		status = NETDEV_TX_BUSY;
+		if (!netif_queue_stopped(np->dev))
+			status = np->dev->hard_start_xmit(skb, np->dev);
+
 		np->dev->xmit_lock_owner = -1;
 		spin_unlock(&np->dev->xmit_lock);
 





* Re: Using netconsole for debugging suspend/resume
  2006-06-08 17:50 Using netconsole for debugging suspend/resume Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2006-06-08 21:07 ` Matt Mackall
@ 2006-06-09  3:46 ` Andi Kleen
  2006-06-09 15:24   ` Mark Lord
  2006-06-09  8:34 ` Pavel Machek
  4 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2006-06-09  3:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

On Thursday 08 June 2006 19:50, Jeremy Fitzhardinge wrote:
> I've been trying to get suspend/resume working well on my new laptop.  
> In general, netconsole has been pretty useful for extracting oopses and 
> other messages, but it is of more limited help in debugging the actual 
> suspend/resume cycle.  The problem looks like the e1000 driver won't 
> suspend while netconsole is using it, so I have to rmmod/modprobe 
> netconsole around the actual suspend/resume.

If your laptop has firewire you can also use firescope.
(ftp://ftp.suse.com/pub/people/ak/firescope/) 

> This is a big problem during resume because the screen is also blank, so 
> I get no useful clue as to what went wrong when things go wrong.  I'm 
> wondering if there's some way to keep netconsole alive to the last 
> possible moment during suspend, and re-woken as soon as possible during 
> resume.  It would be nice to have a clean solution, but I'm willing to 
> use a bletcherous hack if that's what it takes.

FW keeps running as long as nobody resets the ieee1394 chip.

Networking is much more complex and will likely never work well for such
low level debug situations. Netconsole is mostly useful to catch the
odd oops during runtime.

-Andi




* Re: Using netconsole for debugging suspend/resume
  2006-06-09  1:54   ` Jeremy Fitzhardinge
@ 2006-06-09  5:13     ` Auke Kok
  2006-06-09  5:23       ` David Miller
  2006-06-09  5:45       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 30+ messages in thread
From: Auke Kok @ 2006-06-09  5:13 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
>> That's odd. Netpoll holds a reference to the device, of course, but so
>> does a normal "up" interface. So that shouldn't be the problem.
>> Another possibility is that outgoing packets from printks in the
>> driver are causing difficulty. Not sure what can be done about that.
>>   
> I only tried once; maybe I misunderstood what was going on.  I'll try 
> again tonight.
> 
> Oh, I think I see what's happening.  The e1000 suspend routine does this:
> 
>     if (netif_running(netdev))
>         e1000_down(adapter);
> 
> This leaves the interface up, but it stops the queue.  Then 
> netpoll_send_skb() has this loop:
> 
>     do {
>         npinfo->tries--;
>         spin_lock(&np->dev->xmit_lock);
>         np->dev->xmit_lock_owner = smp_processor_id();
> 
>         /*
>          * network drivers do not expect to be called if the queue is
>          * stopped.
>          */
>         if (netif_queue_stopped(np->dev)) {
>             np->dev->xmit_lock_owner = -1;
>             spin_unlock(&np->dev->xmit_lock);
>             netpoll_poll(np);
>             udelay(50);
>             continue;
>         }
> /* ... */
> again: /* proposed */
>     } while (npinfo->tries > 0);
> 
> 
> so this will end up in an infinite loop, since netif_queue_stopped() 
> will always return true, and it never looks at npinfo->tries.  Should 
> the "continue" be "goto again"?

netconsole should retry. There is no timeout programmed here since that might
lose important information, and you would rather have netconsole survive an odd
unplugged cable than lose vital debugging information when the system is
busy, for instance. (Losing link will cause the interface to be down and thus
the queue to be stopped.)

> Also, e1000_down does a netif_poll_disable(), but I'm not sure what that 
> actually does...  Should it prevent netpoll from even trying to send?

polling is for receives. We're basically telling the stack not to poll our
interface anymore.

>> It's generally going to suck, because unlike a polled serial port, the
>> device needs to be put to sleep. But if you're doing suspend to RAM,
>>   
> I'm interested in suspend-to-ram.  I presume that with suspend-to-disk, 
> booting with built-in netconsole will tell me useful stuff; that'll be 
> the next experiment.
> 
>> you might be able to do something like this:
>>
>> - unhook net device from suspend machinery (possibly just return success)
>> - bounce out of suspend before the final call to ACPI is made
>>
>> Net effect is you do OS-level suspend and resume of everything but the
>> NIC without actually powering down the core. Which should let you
>> debug just about everything.
> 
> Well, the machine has to really suspend so that I can see (and debug) a 
> mostly normal resume.  In particular, I need the hardware to be zapped 
> so I can see if it is being restarted properly.
> 
> What might work is to change the e1000 suspend routine to save enough 
> state for resume to work, but keep the interface up so that netconsole 
> can keep transmitting all the way up to the point that the final acpi 
> call powers off the machine.

e1000_suspend saves the entire configuration of the device and puts it in
Wake-on-LAN mode, allowing it to be woken up by your 'zap' in the proper way.

> Then the e1000 would resume normally, including restarting the xmit 
> queue so that netconsole can start again immediately; any netconsole 
> output before the e1000 resume would be lost, of course (I guess it 
> could be buffered).  That would suit me for now.

after coming out of suspend, e1000_resume is called which basically
reinitializes the entire device. In the entire sequence it is unlikely that
you'll actually be able to maintain netconsole in the first boot stage - the
network device will not be initialized by the kernel yet, and obviously will
be useless until e1000_resume is called!

I'm not sure that tweaking e1000 to survive longer is the answer here, and you
might be better off trying to have netconsole wait gracefully
(msleep_interruptible instead of udelay?). In any case, I see the biggest
problem in the early boot stage, when all NICs are basically uninitialized
until resume starts. You just can't assign it an IP address that easily, for
instance, and even resume causes the device to reset and thus renegotiate the
link, adding crucial seconds to the time that the link is down, during which
you're stacking up netconsole messages or, worse, failing to initialize
netconsole.

I hope this helps - I can't help thinking that netconsole definitely
wasn't designed with this in mind.

Cheers,

Auke



* Re: Using netconsole for debugging suspend/resume
  2006-06-09  5:13     ` Auke Kok
@ 2006-06-09  5:23       ` David Miller
  2006-06-09  5:50         ` Andi Kleen
  2006-06-09  5:45       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 30+ messages in thread
From: David Miller @ 2006-06-09  5:23 UTC (permalink / raw)
  To: auke-jan.h.kok; +Cc: jeremy, mpm, linux-kernel, netdev

From: Auke Kok <auke-jan.h.kok@intel.com>
Date: Thu, 08 Jun 2006 22:13:48 -0700

> netconsole should retry. There is no timeout programmed here since that might
> lose important information, and you rather want netconsole to survive an odd
> unplugged cable then to lose vital debugging information when the system is
> busy for instance. (losing link will cause the interface to be down and thus
> the queue to be stopped)

I completely disagree that netpoll should loop when the ethernet
cable is unplugged.  This stops the entire system.  What if this
is one of my main web servers and I have other links on the machine
for redundancy and load balancing?  Just because some careless
sysop knocks one of the cables out, my system just freezes up and
stops?

What if I'm on a remote serial console, how long should I scratch
my head wondering why the whole machine is frozen up before I "figure
out" that the ethernet cable being out has made my system unusable
because netpoll is just looping on the thing forever?

That's an extremely poor quality of implementation if you ask me.

Netpoll is _BEST_ _EFFORT_, end of story.  It by definition can only
offer that level of service because it does locking in circumstances
where such locking might be illegal or even impossible.  So it has to
try, but if it can't get the resources it needs, it must stop trying
and abort the logging.
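
The best-effort policy described above can be sketched in user space as
follows. This is a minimal model under stated assumptions: struct fake_dev
and best_effort_send are hypothetical names invented for illustration, and
the real netpoll path would also poll the device and delay between
attempts, which is only marked by a comment here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: transmit succeeds only while the queue runs. */
struct fake_dev {
	bool queue_stopped;
	unsigned long sent;
	unsigned long dropped;
};

/*
 * Try to hand one message to the device at most max_tries times.
 * Returns true if it was sent, false if it was dropped.
 */
bool best_effort_send(struct fake_dev *dev, int max_tries)
{
	assert(dev != NULL && max_tries > 0);

	for (int i = 0; i < max_tries; i++) {
		if (!dev->queue_stopped) {
			dev->sent++;
			return true;
		}
		/* a real implementation would poll the device and delay here */
	}

	dev->dropped++;		/* give up instead of stalling the caller */
	return false;
}
```

The point of the design is that the caller always gets control back after a
bounded number of attempts, at the cost of losing the message.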


* Re: Using netconsole for debugging suspend/resume
  2006-06-09  5:13     ` Auke Kok
  2006-06-09  5:23       ` David Miller
@ 2006-06-09  5:45       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-09  5:45 UTC (permalink / raw)
  To: Auke Kok; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

Auke Kok wrote:
> netconsole should retry. There is no timeout programmed here since 
> that might
> lose important information, and you rather want netconsole to survive 
> an odd
> unplugged cable then to lose vital debugging information when the 
> system is
> busy for instance. (losing link will cause the interface to be down 
> and thus
> the queue to be stopped)
Well, the trouble is that it ends up spinning forever in the suspend 
case.  The driver's suspend routine has XOFFed the queue, and it's never 
going to come back if netconsole clogs everything up over it.

Perhaps the correct fix isn't at the netpoll level, but at the 
netconsole level, but the behaviour of "suspend ethernet, netconsole 
drops bits into the bucket until the ether comes back" seems to be the 
best we can hope for.

The present behaviour is definitely bad, since it will prevent any 
system from suspending while using netconsole, so you'd need to make it 
modular and rmmod/modprobe it around the suspend event - definitely 
losing more information.

Also it means that if you kick a cable, the machine will eventually lock 
up, which doesn't seem like the best behaviour...

Even so, it will wait up to 1 second per skb sent (20000 x 50us) for 
the queue to be started, so it will be pretty slow, and will recover 
from little hiccups without losing much.
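
The one-second figure is just the retry count multiplied by the per-retry
delay. A minimal sketch of that arithmetic, assuming the 20000 tries and
50 microsecond delay quoted above and ignoring any time spent inside
netpoll_poll() itself (worst_case_wait_us is a hypothetical helper name):

```c
#include <assert.h>
#include <limits.h>

/* Worst-case busy-wait for one skb, in microseconds: tries * delay. */
unsigned long worst_case_wait_us(unsigned long tries, unsigned long delay_us)
{
	/* guard against overflow of the multiplication */
	assert(delay_us == 0 || tries <= ULONG_MAX / delay_us);
	return tries * delay_us;
}
```

Calling worst_case_wait_us(20000, 50) gives 1000000 microseconds, i.e. one
second for each skb sent while the queue stays stopped.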

> polling is for receives. We're basically telling the stack not to poll 
> our
> interface anymore.
OK, I see.

> e1000_suspend saves the entire configuration of the device and puts it in
> Wake-on-Lan mode, allowing it to be waken up by your 'zap' in the 
> proper way.
Not sure that's terribly useful.  It would be nice to be able to zap the 
ethernet to get a console dump from early stages, but talking to the 
device depends on all the intermediate PCI stuff being set up first, so 
netconsole could cause even more of a mess.
>> Then the e1000 would resume normally, including restarting the xmit 
>> queue so that netconsole can start again immediately; any netconsole 
>> output before the e1000 resume would be lost, of course (I guess it 
>> could be buffered).  That would suit me for now.
>
> after coming out of suspend, e1000_resume is called which basically
> reinitializes the entire device. In the entire sequence it is unlikely 
> that
> you'll actually be able to maintain netconsole in the first boot stage 
> - the
> network device will not be initialized by the kernel yet, and 
> obviously will
> be useless until e1000_resume is called!
Yes, but I think that's OK for what I'm looking at.  The problems I'm 
seeing happen later, and as I said in the first mail, I'm willing to 
accept a bletcherous hack if necessary (though obviously something clean 
and mergeable would be preferable).

At the netpoll level, assuming that netpoll_send_skb doesn't busywait 
forever while the queue is XOFFed, it will toss things until the moment 
the ethernet device queue is up, and then it will resume as normal.

> I'm not sure that tweaking e1000 to survive longer is the answer here, 
> and you
> might be better off trying to have netconsole graciously wait
> (msleep_interruptable instead of udelay?)
Pretty sure netpoll can't sleep there...

> In any case, I see the biggest
> problem in the early boot stage when all nics are basically uninitialized
> until resume starts. You just can't assign it an IP address for 
> instance that
> easy, and even resume causes the device to reset and thus link 
> renegotiation,
> adding crucial seconds to the time that the link is down, in which 
> time you're
> stacking up netconsole messages, or worse, fail to initialize netconsole
netconsole has already been initialized.  It doesn't need reinit on resume.

> I hope this helps - I can't help but thinking that netconsole definately
> wasn't designed with this in mind.
Perhaps not, but it isn't far from being a useful tool in this case.  
It's much better than the alternative of having no information at all 
about the whole process.

    J


* Re: Using netconsole for debugging suspend/resume
  2006-06-09  5:23       ` David Miller
@ 2006-06-09  5:50         ` Andi Kleen
  2006-06-09 17:14           ` Matt Mackall
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2006-06-09  5:50 UTC (permalink / raw)
  To: David Miller; +Cc: auke-jan.h.kok, jeremy, mpm, linux-kernel, netdev

On Friday 09 June 2006 07:23, David Miller wrote:
> From: Auke Kok <auke-jan.h.kok@intel.com>
> Date: Thu, 08 Jun 2006 22:13:48 -0700
> 
> > netconsole should retry. There is no timeout programmed here since that might
> > lose important information, and you rather want netconsole to survive an odd
> > unplugged cable then to lose vital debugging information when the system is
> > busy for instance. (losing link will cause the interface to be down and thus
> > the queue to be stopped)
> 
> I completely disagree that netpoll should loop when the ethernet
> cable is plugged out. 

Currently it is a bit dumb and doesn't distinguish the various cases
well.

At some point I submitted a patch to make the loop a bit more clever. It can
still be found in the netdev archives.

-Andi


* Re: Using netconsole for debugging suspend/resume
  2006-06-08 17:50 Using netconsole for debugging suspend/resume Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2006-06-09  3:46 ` Using netconsole for debugging suspend/resume Andi Kleen
@ 2006-06-09  8:34 ` Pavel Machek
  4 siblings, 0 replies; 30+ messages in thread
From: Pavel Machek @ 2006-06-09  8:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

On Thu 08-06-06 10:50:57, Jeremy Fitzhardinge wrote:
> I've been trying to get suspend/resume working well on my new laptop.  

Suspend-to-disk or -to-ram? You know about suspend.sf.net, right?

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Using netconsole for debugging suspend/resume
  2006-06-09  1:56   ` Jeremy Fitzhardinge
@ 2006-06-09 10:34     ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2006-06-09 10:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Matt Mackall, Linux Kernel Mailing List, netdev

On Friday 09 June 2006 03:56, Jeremy Fitzhardinge wrote:
> Rafael J. Wysocki wrote:
> > Please try doing "echo 8 > /proc/sys/kernel/printk" before suspend.
> >   
> Um, why?  That would increase the amount of log output, but I don't see 
> how it would help with netconsole preventing suspend, or not being able 
> to see console messages on a blank screen after resume.

Ah, that's after resume.  Sorry for the noise. :-)

Rafael


* Re: Using netconsole for debugging suspend/resume
  2006-06-09  3:46 ` Using netconsole for debugging suspend/resume Andi Kleen
@ 2006-06-09 15:24   ` Mark Lord
  2006-06-12 11:21     ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Lord @ 2006-06-09 15:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Matt Mackall, Linux Kernel Mailing List,
	netdev

Andi Kleen wrote:
> 
> If your laptop has firewire you can also use firescope.
> (ftp://ftp.suse.com/pub/people/ak/firescope/) 
..
> FW keeps running as long as nobody resets the ieee1394 chip.

This looks interesting.  But how does one set it up for use
on the *other* end of that firewire cable?  The Quickstart and
manpage don't seem to describe this fully.

Thanks


* Re: Using netconsole for debugging suspend/resume
  2006-06-09  5:50         ` Andi Kleen
@ 2006-06-09 17:14           ` Matt Mackall
  0 siblings, 0 replies; 30+ messages in thread
From: Matt Mackall @ 2006-06-09 17:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, auke-jan.h.kok, jeremy, linux-kernel, netdev

On Fri, Jun 09, 2006 at 07:50:25AM +0200, Andi Kleen wrote:
> On Friday 09 June 2006 07:23, David Miller wrote:
> > From: Auke Kok <auke-jan.h.kok@intel.com>
> > Date: Thu, 08 Jun 2006 22:13:48 -0700
> > 
> > > netconsole should retry. There is no timeout programmed here since that might
> > > lose important information, and you rather want netconsole to survive an odd
> > > unplugged cable then to lose vital debugging information when the system is
> > > busy for instance. (losing link will cause the interface to be down and thus
> > > the queue to be stopped)
> > 
> > I completely disagree that netpoll should loop when the ethernet
> > cable is plugged out. 
> 
> Currently it is a bit dumb and doesn't distingush the various cases
> well.
> 
> I submitted a patch to loop to be a bit more clever at some point. It can be still
> found in the netdev archives.

Agreed that timeouts should happen.

IIRC, the trouble with your patch was that it a) timed out on far too
short a timescale and b) locked up on my box. Unfortunately, so did my
own patch, which made timeouts approximately 1ms.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [PATCH RFC] netpoll: don't spin forever sending to stopped queues
  2006-06-09  2:15   ` [PATCH RFC] netpoll: don't spin forever sending to stopped queues Jeremy Fitzhardinge
@ 2006-06-11 20:04     ` Matt Mackall
  2006-06-12 20:57       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 30+ messages in thread
From: Matt Mackall @ 2006-06-11 20:04 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Linux Kernel Mailing List, netdev

On Thu, Jun 08, 2006 at 07:15:50PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> >That's odd. Netpoll holds a reference to the device, of course, but so
> >does a normal "up" interface. So that shouldn't be the problem.
> >Another possibility is that outgoing packets from printks in the
> >driver are causing difficulty. Not sure what can be done about that.
> >  
> Here's a patch.  I haven't tested it beyond compiling it, and I don't 
> know if it is actually correct.  In this case, it seems pointless to 
> spin waiting for an event which will never happen.  Should 
> netif_poll_disable() cause netpoll_send_skb() (or something) to not even 
> bother trying to send?  netif_poll_disable seems mysteriously simple to me.
> 
>    J

Did this work for you at all?

> When transmitting a skb in netpoll_send_skb(), only retry a limited
> number of times if the device queue is stopped.

Where limited = once?

> Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
> 
> diff -r aac813f54617 net/core/netpoll.c
> --- a/net/core/netpoll.c	Wed Jun 07 14:53:40 2006 -0700
> +++ b/net/core/netpoll.c	Thu Jun 08 19:00:29 2006 -0700
> @@ -280,15 +280,10 @@ static void netpoll_send_skb(struct netp
> 		 * network drivers do not expect to be called if the queue is
> 		 * stopped.
> 		 */
> -		if (netif_queue_stopped(np->dev)) {
> -			np->dev->xmit_lock_owner = -1;
> -			spin_unlock(&np->dev->xmit_lock);
> -			netpoll_poll(np);
> -			udelay(50);
> -			continue;
> -		}
> -
> -		status = np->dev->hard_start_xmit(skb, np->dev);
> +		status = NETDEV_TX_BUSY;
> +		if (!netif_queue_stopped(np->dev))
> +			status = np->dev->hard_start_xmit(skb, np->dev);
> +
> 		np->dev->xmit_lock_owner = -1;
> 		spin_unlock(&np->dev->xmit_lock);
> 
> 
> 

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Using netconsole for debugging suspend/resume
  2006-06-09 15:24   ` Mark Lord
@ 2006-06-12 11:21     ` Andi Kleen
  2006-06-12 15:38       ` Mark Lord
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2006-06-12 11:21 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeremy Fitzhardinge, Matt Mackall, Linux Kernel Mailing List,
	netdev

On Friday 09 June 2006 17:24, Mark Lord wrote:
> Andi Kleen wrote:
> > 
> > If your laptop has firewire you can also use firescope.
> > (ftp://ftp.suse.com/pub/people/ak/firescope/) 
> ..
> > FW keeps running as long as nobody resets the ieee1394 chip.
> 
> This looks interesting.  But how does one set it up for use
> on the *other* end of that firewire cable?  The Quickstart and
> manpage don't seem to describe this fully.

It's in the manpage:

>.SH NOTES
>The target must have the ohci1394 driver loaded. This implies
>that firescope cannot be used in early boot.

That's it.

-Andi


* Re: Using netconsole for debugging suspend/resume
  2006-06-12 11:21     ` Andi Kleen
@ 2006-06-12 15:38       ` Mark Lord
  2006-06-12 15:46         ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Lord @ 2006-06-12 15:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Matt Mackall, Linux Kernel Mailing List,
	netdev

Andi Kleen wrote:
> On Friday 09 June 2006 17:24, Mark Lord wrote:
>> Andi Kleen wrote:
>>> If your laptop has firewire you can also use firescope.
>>> (ftp://ftp.suse.com/pub/people/ak/firescope/) 
>> ..
>>> FW keeps running as long as nobody resets the ieee1394 chip.
>> This looks interesting.  But how does one set it up for use
>> on the *other* end of that firewire cable?  The Quickstart and
>> manpage don't seem to describe this fully.
> 
> It's in the manpage:
> 
>> .SH NOTES
>> The target must have the ohci1394 driver loaded. This implies
>> that firescope cannot be used in early boot.
> 
> That's it.

Okay, so I'm daft.  But.. *what* is "it" ??

We have two machines:  target (being debugged), and host (anything).
Sure, the target has to have ohci1394 loaded, and firescope running.
But what about the *other* end of the connection?  What commands?

Thanks


* Re: Using netconsole for debugging suspend/resume
  2006-06-12 15:38       ` Mark Lord
@ 2006-06-12 15:46         ` Andi Kleen
  2006-06-12 21:25           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2006-06-12 15:46 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeremy Fitzhardinge, Matt Mackall, Linux Kernel Mailing List,
	netdev

On Monday 12 June 2006 17:38, Mark Lord wrote:
> Andi Kleen wrote:
> > On Friday 09 June 2006 17:24, Mark Lord wrote:
> >> Andi Kleen wrote:
> >>> If your laptop has firewire you can also use firescope.
> >>> (ftp://ftp.suse.com/pub/people/ak/firescope/) 
> >> ..
> >>> FW keeps running as long as nobody resets the ieee1394 chip.
> >> This looks interesting.  But how does one set it up for use
> >> on the *other* end of that firewire cable?  The Quickstart and
> >> manpage don't seem to describe this fully.
> > 
> > It's in the manpage:
> > 
> >> .SH NOTES
> >> The target must have the ohci1394 driver loaded. This implies
> >> that firescope cannot be used in early boot.
> > 
> > That's it.
> 
> Okay, so I'm daft.  But.. *what* is "it" ??
> 
> We have two machines:  target (being debugged), and host (anything).
> Sure, the target has to have ohci1394 loaded, and firescope running.
> But what about the *other* end of the connection?  What commands?

From the same manpage:
"The raw1394 module must be loaded and its device node
 be writable (this normally requires root)" 

Ok it doesn't say you need ohci1394 too and doesn't say that's the target.
If I do a new revision I'll perhaps expand the docs a bit.

So load ohci1394/raw1394 and run firescope as root. Your distribution
will hopefully take care of the device nodes. Usually you want 
something like firescope -Au System.map  

-Andi




* Re: [PATCH RFC] netpoll: don't spin forever sending to stopped queues
  2006-06-12 20:57       ` Jeremy Fitzhardinge
@ 2006-06-12 20:53         ` Matt Mackall
  2006-06-12 21:20           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 30+ messages in thread
From: Matt Mackall @ 2006-06-12 20:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Linux Kernel Mailing List, netdev

On Mon, Jun 12, 2006 at 01:57:58PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> >On Thu, Jun 08, 2006 at 07:15:50PM -0700, Jeremy Fitzhardinge wrote:
> >  
> >>Here's a patch.  I haven't tested it beyond compiling it, and I don't 
> >>know if it is actually correct.  In this case, it seems pointless to 
> >>spin waiting for an event which will never happen.  Should 
> >>netif_poll_disable() cause netpoll_send_skb() (or something) to not even 
> >>bother trying to send?  netif_poll_disable seems mysteriously simple to 
> >>me.
> >>
> >>   J
> >>    
> >
> >Did this work for you at all?
> >  
> 
> No, it didn't appear to help; I get the same symptom.  I think the fix is 
> correct (in that it's better than what was there before), but there's 
> probably more going on in my case.  I haven't looked into it more deeply 
> yet.  I suspect there's another netpoll code path which is spinning 
> forever on an XOFFed queue.
> 
> >>When transmitting a skb in netpoll_send_skb(), only retry a limited
> >>number of times if the device queue is stopped.
> >>    
> >
> >Where limited = once?
> >  
> 
> No, it reuses the existing retry logic.  It retries 20000 times with a 
> 50us pause between attempts, so up to a second.  This seems excessive to 
> me; I don't know where those original numbers came from.  I tried 5000 
> retries, but it didn't make any difference to my case.

Ahh, right. I forgot that I'd done that. Can you resend?

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [PATCH RFC] netpoll: don't spin forever sending to stopped queues
  2006-06-11 20:04     ` Matt Mackall
@ 2006-06-12 20:57       ` Jeremy Fitzhardinge
  2006-06-12 20:53         ` Matt Mackall
  0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-12 20:57 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Linux Kernel Mailing List, netdev

Matt Mackall wrote:
> On Thu, Jun 08, 2006 at 07:15:50PM -0700, Jeremy Fitzhardinge wrote:
>   
>> Here's a patch.  I haven't tested it beyond compiling it, and I don't 
>> know if it is actually correct.  In this case, it seems pointless to 
>> spin waiting for an event which will never happen.  Should 
>> netif_poll_disable() cause netpoll_send_skb() (or something) to not even 
>> bother trying to send?  netif_poll_disable seems mysteriously simple to me.
>>
>>    J
>>     
>
> Did this work for you at all?
>   

No, it didn't appear to help; I get the same symptom.  I think the fix is 
correct (in that it's better than what was there before), but there's 
probably more going on in my case.  I haven't looked into it more deeply 
yet.  I suspect there's another netpoll code path which is spinning 
forever on an XOFFed queue.

>> When transmitting a skb in netpoll_send_skb(), only retry a limited
>> number of times if the device queue is stopped.
>>     
>
> Where limited = once?
>   

No, it reuses the existing retry logic.  It retries 20000 times with a 
50us pause between attempts, so up to a second.  This seems excessive to 
me; I don't know where those original numbers came from.  I tried 5000 
retries, but it didn't make any difference to my case.
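
As a rough illustration of that retry budget (this is not the actual
netpoll code): the sketch below models a send loop that treats a stopped
queue like a busy device, so it uses up a retry instead of looping
forever.  The names fake_dev, fake_xmit, bounded_send and
worst_case_wait_us are made up for this example; the constants 20000 and
50 are the figures mentioned above, and the delay is only counted, not
actually slept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, taken from the figures mentioned in this message. */
#define MAX_RETRIES    20000
#define RETRY_DELAY_US 50

/* Hypothetical stand-in for a net device whose TX queue may be stopped. */
struct fake_dev {
	bool queue_stopped;
};

/* Hypothetical stand-in for hard_start_xmit(): returns 0 on success. */
static int fake_xmit(struct fake_dev *dev)
{
	(void)dev;
	return 0;
}

/*
 * Try to send once per iteration.  A stopped queue is reported as
 * "busy", so it consumes one retry rather than restarting the loop.
 * Returns the number of attempts used, or -1 if the budget ran out.
 */
static int bounded_send(struct fake_dev *dev)
{
	int attempt;

	assert(dev != NULL);

	for (attempt = 1; attempt <= MAX_RETRIES; attempt++) {
		int status = 1;	/* busy */

		if (!dev->queue_stopped)
			status = fake_xmit(dev);

		if (status == 0)
			return attempt;

		/* Kernel code would poll the device and udelay(50) here. */
	}
	return -1;
}

/* Worst-case total delay in microseconds if every attempt fails. */
static long worst_case_wait_us(void)
{
	return (long)MAX_RETRIES * RETRY_DELAY_US;
}
```

With these numbers the worst case is 20000 * 50us = 1,000,000us, i.e.
the "up to a second" above; dropping to 5000 retries would cap it at a
quarter of a second.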

    J


* Re: [PATCH RFC] netpoll: don't spin forever sending to stopped queues
  2006-06-12 20:53         ` Matt Mackall
@ 2006-06-12 21:20           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-12 21:20 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Linux Kernel Mailing List, netdev

Matt Mackall wrote:
> Ahh, right. I forgot that I'd done that. Can you resend?
>   
I just respun it against 2.6.17-rc6-mm2.

    J


--

Subject: netpoll: don't spin forever sending to blocked queues

When transmitting a skb in netpoll_send_skb(), only retry a limited
number of times if the device queue is stopped.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>

diff -r 0b8d3d4ee182 net/core/netpoll.c
--- a/net/core/netpoll.c	Mon Jun 12 13:46:23 2006 -0700
+++ b/net/core/netpoll.c	Mon Jun 12 13:48:34 2006 -0700
@@ -279,14 +279,10 @@ static void netpoll_send_skb(struct netp
 		 * network drivers do not expect to be called if the queue is
 		 * stopped.
 		 */
-		if (netif_queue_stopped(np->dev)) {
-			netif_tx_unlock(np->dev);
-			netpoll_poll(np);
-			udelay(50);
-			continue;
-		}
-
-		status = np->dev->hard_start_xmit(skb, np->dev);
+		status = NETDEV_TX_BUSY;
+		if (!netif_queue_stopped(np->dev))
+			status = np->dev->hard_start_xmit(skb, np->dev);
+
 		netif_tx_unlock(np->dev);
 
 		/* success */




* Re: Using netconsole for debugging suspend/resume
  2006-06-12 15:46         ` Andi Kleen
@ 2006-06-12 21:25           ` Jeremy Fitzhardinge
  2006-06-13  3:47             ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-12 21:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Lord, Matt Mackall, Linux Kernel Mailing List, netdev

Andi Kleen wrote:
> On Monday 12 June 2006 17:38, Mark Lord wrote:
>   
>> Okay, so I'm daft.  But.. *what* is "it" ??
>>
>> We have two machines:  target (being debugged), and host (anything).
>> Sure, the target has to have ohci1394 loaded, and firescope running.
>> But what about the *other* end of the connection?  What commands?
>>     
>
> From the same manpage:
> "The raw1394 module must be loaded and its device node
>  be writable (this normally requires root)" 
>
> Ok it doesn't say you need ohci1394 too and doesn't say that's the target.
> If I do a new revision I'll perhaps expand the docs a bit.
>
> So load ohci1394/raw1394 and run firescope as root. Your distribution
> will hopefully take care of the device nodes. Usually you want 
> something like firescope -Au System.map  
>   

I think the confusion here is that the target doesn't need to be running 
anything; you can DMA chunks of memory with the OHCI controller with no 
need for any software support.  The debugger host is what's running 
firescope.

Unless I'm confused too, which is likely.  Andi, I think your docs 
should be more explicit about what runs where.

Also, the tricky bit for me is debugging resume; firescope still 
requires the OHCI device to come up to be useful, but that's no 
different from using netconsole.

Neat stuff; I need to get my two firewire-enabled machines close enough 
to each other to try it out.

    J


* Re: Using netconsole for debugging suspend/resume
  2006-06-12 21:25           ` Jeremy Fitzhardinge
@ 2006-06-13  3:47             ` Andi Kleen
  2006-06-13  4:49               ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2006-06-13  3:47 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Mark Lord, Matt Mackall, Linux Kernel Mailing List, netdev

On Monday 12 June 2006 23:25, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > On Monday 12 June 2006 17:38, Mark Lord wrote:
> >   
> >> Okay, so I'm daft.  But.. *what* is "it" ??
> >>
> >> We have two machines:  target (being debugged), and host (anything).
> >> Sure, the target has to have ohci1394 loaded, and firescope running.
> >> But what about the *other* end of the connection?  What commands?
> >>     
> >
> > From the same manpage:
> > "The raw1394 module must be loaded and its device node
> >  be writable (this normally requires root)" 
> >
> > Ok it doesn't say you need ohci1394 too and doesn't say that's the target.
> > If I do a new revision I'll perhaps expand the docs a bit.
> >
> > So load ohci1394/raw1394 and run firescope as root. Your distribution
> > will hopefully take care of the device nodes. Usually you want 
> > something like firescope -Au System.map  
> >   
> 
> I think the confusion here is that the target doesn't need to be running 
> anything; you can DMA chunks of memory with the OHCI controller with no 
> need for any software support.  

You need ohci1394 loaded at least once. That is why it only works
in relatively late boot.

I've been playing with the idea of writing "early1394" that just
turns the DMA controller on as early as possible similar to earlyprintk
on the target. Then it would be possible to use it for early
debugging too. But so far it's not done yet.

I'll try to write better docs next time.

BTW Bernd did a gdbstub based on firescope,
so you can even examine all kernel variables symbolically.  It can
even write variables, but not change the flow of the CPU.
Standard firescope can only hexdump and read/write symbols. With gdb 
it's also possible to take a core file of the kernel.

-Andi


* Re: Using netconsole for debugging suspend/resume
  2006-06-13  3:47             ` Andi Kleen
@ 2006-06-13  4:49               ` David Miller
  2006-06-13  4:54                 ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2006-06-13  4:49 UTC (permalink / raw)
  To: ak; +Cc: jeremy, lkml, mpm, linux-kernel, netdev

From: Andi Kleen <ak@suse.de>
Date: Tue, 13 Jun 2006 05:47:49 +0200

> I've been playing with the idea of writing "early1394" that just
> turns the DMA controller on as early as possible similar to earlyprintk
> on the target. Then it would be possible to use it for early
> debugging too. But so far it's not done yet.

Does this raw1394 thing with firescope just assume DMA address ==
physical address?  How would it work to access all of physical
memory properly on IOMMU platforms?


* Re: Using netconsole for debugging suspend/resume
  2006-06-13  4:49               ` David Miller
@ 2006-06-13  4:54                 ` Andi Kleen
  2006-06-13  5:03                   ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2006-06-13  4:54 UTC (permalink / raw)
  To: David Miller; +Cc: jeremy, lkml, mpm, linux-kernel, netdev

On Tuesday 13 June 2006 06:49, David Miller wrote:
> From: Andi Kleen <ak@suse.de>
> Date: Tue, 13 Jun 2006 05:47:49 +0200
> 
> > I've been playing with the idea of writing "early1394" that just
> > turns the DMA controller on as early as possible similar to earlyprintk
> > on the target. Then it would be possible to use it for early
> > debugging too. But so far it's not done yet.
> 
> Does this raw1394 thing with firescope just assume DMA address ==
> physical address? 

Yes.

> How would it work to access all of physical 
> memory properly on IOMMU platforms?

It assumes you don't have an IOMMU and relies on all memory
being accessible by ohci1394. On x86-64 it also can't access memory
above 4GB, but that's normally OK because the kernel log buffer
is below that.
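
To make that constraint concrete, here is a small sketch (not taken from
firescope or the 1394 code) of the check it implies: when the bus
address is assumed to equal the physical address and only 32 bits are
addressable, a buffer is readable only if it lies entirely below 4GB.
The helper name reachable_by_32bit_dma and the DMA32_LIMIT constant are
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit: a 32-bit bus address covers the first 4GB only. */
#define DMA32_LIMIT (UINT64_C(1) << 32)

/*
 * Returns true if [phys, phys + len) lies entirely below 4GB, i.e. if a
 * controller that uses physical addresses directly as bus addresses
 * (no IOMMU translation) could reach the whole range.
 */
static bool reachable_by_32bit_dma(uint64_t phys, uint64_t len)
{
	if (len == 0)
		return false;
	if (phys >= DMA32_LIMIT)
		return false;
	/* Written this way so that phys + len cannot overflow. */
	return len <= DMA32_LIMIT - phys;
}
```

A log buffer at, say, physical 0x100000 passes this check; anything
that starts at or crosses the 4GB boundary does not.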

I guess if you use 1394 with remote DMA for other protocols (like
video etc.) there must be some way for the subsystem to map
the memory even on IOMMU systems. I admit I haven't dived that
deeply into the 1394 subsystem so I don't know how that works.

-Andi



* Re: Using netconsole for debugging suspend/resume
  2006-06-13  4:54                 ` Andi Kleen
@ 2006-06-13  5:03                   ` David Miller
  2006-06-13  7:18                     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2006-06-13  5:03 UTC (permalink / raw)
  To: ak; +Cc: jeremy, lkml, mpm, linux-kernel, netdev

From: Andi Kleen <ak@suse.de>
Date: Tue, 13 Jun 2006 06:54:14 +0200

> I guess if you use 1394 with remote DMA for other protocols (like
> video etc.) there must be some way for the subsystem to map
> the memory even on IOMMU systems. I admit I haven't dived that
> deeply into the 1394 subsystem so I don't know how that works.

Video-1394 has its own driver, which does a consistent DMA
allocation, and then maps that into userspace using remap_pfn_range().
Entirely portable.

Strangely I don't even see any bus_to_virt() etc. calls in
the raw1394 driver, just these ptr2int() things...



* Re: Using netconsole for debugging suspend/resume
  2006-06-13  5:03                   ` David Miller
@ 2006-06-13  7:18                     ` Christoph Hellwig
  2006-06-13  7:31                       ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2006-06-13  7:18 UTC (permalink / raw)
  To: David Miller; +Cc: ak, jeremy, lkml, mpm, linux-kernel, netdev

On Mon, Jun 12, 2006 at 10:03:46PM -0700, David Miller wrote:
> From: Andi Kleen <ak@suse.de>
> Date: Tue, 13 Jun 2006 06:54:14 +0200
> 
> > I guess if you use 1394 with remote DMA for other protocols (like
> > video etc.) there must be some way for the subsystem to map
> > the memory even on IOMMU systems. I admit I haven't dived that
> > deeply into the 1394 subsystem so I don't know how that works.
> 
> Video-1394 has its own driver, which does a consistent DMA
> allocation, and then maps that into userspace using remap_pfn_range().
> Entirely portable.

That's actually not portable to certain arm platforms, but that's
a different story.



* Re: Using netconsole for debugging suspend/resume
  2006-06-13  7:18                     ` Christoph Hellwig
@ 2006-06-13  7:31                       ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2006-06-13  7:31 UTC (permalink / raw)
  To: hch; +Cc: ak, jeremy, lkml, mpm, linux-kernel, netdev

From: Christoph Hellwig <hch@infradead.org>
Date: Tue, 13 Jun 2006 08:18:19 +0100

> That's actually not portable to certain arm platforms, but that's
> a different story.

Yes, cache issues :-/


