public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
       [not found]   ` <40FD4CFA.6070603@optonline.net>
@ 2004-07-20 17:46     ` Pavel Machek
  2004-07-20 18:10       ` Nathan Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2004-07-20 17:46 UTC (permalink / raw)
  To: Nathan Bryant; +Cc: linux-scsi, random1, Luben Tuikov, benh, kernel list

Hi!

> >Can you kill #if 0 code?
> 
> Yes. This is a work in progress. Interestingly, the ifdef'd-out code was 
> pasted from mptbase.c in the MPT Fusion driver. If it's broken here, 
> it's probably broken there -- seems the state parameter passed to the 
> pci resume callback is intended to be a PCI D state, not an ACPI S 
> state. Can somebody confirm or deny? The kernel is actually passing 
> state 2 (D2) to the driver when I enter ACPI S3, so presumably the same 
> failure could happen to fusion.

I'm no longer sure what should be passed there... We'll probably need
to turn it into enum... Actually swsusp code in -mm actually passes
value from enum, and mainline swsusp code passes 0/3. 

								Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 17:46     ` device_suspend() levels [was Re: [patch] ACPI work on aic7xxx] Pavel Machek
@ 2004-07-20 18:10       ` Nathan Bryant
  2004-07-20 18:25         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Bryant @ 2004-07-20 18:10 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-scsi, random1, Luben Tuikov, benh, kernel list

Pavel Machek wrote:
> Hi!
> 
> 
>>>Can you kill #if 0 code?
>>
>>Yes. This is a work in progress. Interestingly, the ifdef'd-out code was 
>>pasted from mptbase.c in the MPT Fusion driver. If it's broken here, 
>>it's probably broken there -- seems the state parameter passed to the 
>>pci resume callback is intended to be a PCI D state, not an ACPI S 
>>state. Can somebody confirm or deny? The kernel is actually passing 
>>state 2 (D2) to the driver when I enter ACPI S3, so presumably the same 
>>failure could happen to fusion.
> 
> 
> I'm no longer sure what should be passed there... We'll probably need
> to turn it into enum... Actually swsusp code in -mm actually passes
> value from enum, and mainline swsusp code passes 0/3. 
> 
> 								Pavel

Seems to me, aside from whether it's an enum or not, it should represent 
a D-state not an ACPI S-state. Some platforms (Power Mac?) probably 
implement PCI power management but not in an ACPI way.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 18:10       ` Nathan Bryant
@ 2004-07-20 18:25         ` Benjamin Herrenschmidt
  2004-07-20 18:34           ` Nathan Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-20 18:25 UTC (permalink / raw)
  To: Nathan Bryant
  Cc: Pavel Machek, linux-scsi, random1, Luben Tuikov,
	Linux Kernel list


> Seems to me, aside from whether it's an enum or not, it should represent 
> a D-state not an ACPI S-state. Some platforms (Power Mac?) probably 
> implement PCI power management but not in an ACPI way.

2 comments here:

 - The low level bus state (PCI D state for example) and the "linux"
state should be 2 different entities.

 - For PCI, we probably want a hook so the arch can implement it's own
version of pci_set_power_state() so that ACPI can use it's own trickery
there.

Ben.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 18:25         ` Benjamin Herrenschmidt
@ 2004-07-20 18:34           ` Nathan Bryant
  2004-07-20 19:10             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Bryant @ 2004-07-20 18:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Pavel Machek, linux-scsi, random1, Luben Tuikov,
	Linux Kernel list

Benjamin Herrenschmidt wrote:

> 2 comments here:
> 
>  - The low level bus state (PCI D state for example) and the "linux"
> state should be 2 different entities.
> 
>  - For PCI, we probably want a hook so the arch can implement it's own
> version of pci_set_power_state() so that ACPI can use it's own trickery
> there.

Ok, so the takeaway message for driver writers is to treat the 
pci_dev->suspend() state parameter as an opaque value as far as 
possible, and just pass it on to the other layers


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 18:34           ` Nathan Bryant
@ 2004-07-20 19:10             ` Benjamin Herrenschmidt
  2004-07-20 19:23               ` Pavel Machek
       [not found]               ` <40FD82B1.8030704@optonline.net>
  0 siblings, 2 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-20 19:10 UTC (permalink / raw)
  To: Nathan Bryant
  Cc: Pavel Machek, linux-scsi, random1, Luben Tuikov,
	Linux Kernel list

On Tue, 2004-07-20 at 14:34, Nathan Bryant wrote:
> Benjamin Herrenschmidt wrote:
> 
> > 2 comments here:
> > 
> >  - The low level bus state (PCI D state for example) and the "linux"
> > state should be 2 different entities.
> > 
> >  - For PCI, we probably want a hook so the arch can implement it's own
> > version of pci_set_power_state() so that ACPI can use it's own trickery
> > there.
> 
> Ok, so the takeaway message for driver writers is to treat the 
> pci_dev->suspend() state parameter as an opaque value as far as 
> possible, and just pass it on to the other layers

NO ! The exact opposite in fact. I'll work on cleaning that up and
write some doco this week with Pavel.

Ben.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 19:10             ` Benjamin Herrenschmidt
@ 2004-07-20 19:23               ` Pavel Machek
       [not found]               ` <40FD82B1.8030704@optonline.net>
  1 sibling, 0 replies; 13+ messages in thread
From: Pavel Machek @ 2004-07-20 19:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nathan Bryant, linux-scsi, random1, Luben Tuikov,
	Linux Kernel list

Hi!

> > > 2 comments here:
> > > 
> > >  - The low level bus state (PCI D state for example) and the "linux"
> > > state should be 2 different entities.
> > > 
> > >  - For PCI, we probably want a hook so the arch can implement it's own
> > > version of pci_set_power_state() so that ACPI can use it's own trickery
> > > there.
> > 
> > Ok, so the takeaway message for driver writers is to treat the 
> > pci_dev->suspend() state parameter as an opaque value as far as 
> > possible, and just pass it on to the other layers
> 
> NO ! The exact opposite in fact. I'll work on cleaning that up and
> write some doco this week with Pavel.

Agreed. I do not have strong opinion about Sx or Dx, but it should
be enum or equivalent so it is hard to get wrong. [And not "really
easy to get wrong because noone knows for sure", as it is now].

								Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
       [not found]               ` <40FD82B1.8030704@optonline.net>
@ 2004-07-20 20:41                 ` Benjamin Herrenschmidt
  2004-07-20 20:50                   ` Nathan Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-20 20:41 UTC (permalink / raw)
  To: Nathan Bryant; +Cc: Linux Kernel list

On Tue, 2004-07-20 at 16:38, Nathan Bryant wrote:
> Benjamin Herrenschmidt wrote:
> > NO ! The exact opposite in fact. I'll work on cleaning that up and
> > write some doco this week with Pavel.
> 
> Did more code reading. I assume the value passed to pci_dev->suspend is 
> going to be one of:
> 
> enum {
>          PM_SUSPEND_ON,
>          PM_SUSPEND_STANDBY,
>          PM_SUSPEND_MEM,
>          PM_SUSPEND_DISK,
>          PM_SUSPEND_MAX,
> };

Yes something around those lines

> And the value passed to pci_set_power_state will continue to be 0..3 
> (except enumerated)?

We could keep those unenumerated, which power state is to be used by the
device is under driver control, most of the time, they'll do D3 though,
for suspend-to-RAM, and they can probably just not do anything special
for suspend-to-disk (except shutting down the driver itself of course)

Note regarding aix7xxx, we also need proper hooks in the SCSI stack to
block the queue correctly etc... in the same way we do on IDE. I didn't
have time to look into this yet.

Ben.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 20:41                 ` Benjamin Herrenschmidt
@ 2004-07-20 20:50                   ` Nathan Bryant
  2004-07-20 21:02                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Bryant @ 2004-07-20 20:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list

Benjamin Herrenschmidt wrote:

> Note regarding aix7xxx, we also need proper hooks in the SCSI stack to
> block the queue correctly etc... in the same way we do on IDE. I didn't
> have time to look into this yet.

Here's what we currently do, aic7xxx_core.c - looks like it attempts to 
quiesce, and then refuse to suspend if we happen to be busy. This is a 
little messy because it's done in the suspend call rather than the 
save_state call, therefore resume will still be called if this routine 
returns an error code, which will reinitialize the device when we didn't 
really need to.

int
ahc_suspend(struct ahc_softc *ahc)
{

         ahc_pause_and_flushwork(ahc);

         if (LIST_FIRST(&ahc->pending_scbs) != NULL) {
                 ahc_unpause(ahc);
                 return (EBUSY);
         }

#ifdef AHC_TARGET_MODE
         /*
          * XXX What about ATIOs that have not yet been serviced?
          * Perhaps we should just refuse to be suspended if we
          * are acting in a target role.
          */
         if (ahc->pending_device != NULL) {
                 ahc_unpause(ahc);
                 return (EBUSY);
         }
#endif
         ahc_shutdown(ahc);
         return (0);
}


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 20:50                   ` Nathan Bryant
@ 2004-07-20 21:02                     ` Benjamin Herrenschmidt
  2004-07-24 15:31                       ` Nathan Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-20 21:02 UTC (permalink / raw)
  To: Nathan Bryant; +Cc: Linux Kernel list

On Tue, 2004-07-20 at 16:50, Nathan Bryant wrote:
> Benjamin Herrenschmidt wrote:
> 
> > Note regarding aix7xxx, we also need proper hooks in the SCSI stack to
> > block the queue correctly etc... in the same way we do on IDE. I didn't
> > have time to look into this yet.
> 
> Here's what we currently do, aic7xxx_core.c - looks like it attempts to 
> quiesce, and then refuse to suspend if we happen to be busy. This is a 
> little messy because it's done in the suspend call rather than the 
> save_state call, therefore resume will still be called if this routine 
> returns an error code, which will reinitialize the device when we didn't 
> really need to.

save_state is a nonsense, didn't we kill it ? queue quiescing must be
done by the upper level, which is a bit nasty with things like md &
multipath... basically, the low level driver must have a way to notify
it's functional parent (as opposed to it's bus parent) that it's going
to sleep, and any path using this low level driver must then be
quiesced, the parent must resume only when all the drivers it relies
on are back up.

> int
> ahc_suspend(struct ahc_softc *ahc)
> {
> 
>          ahc_pause_and_flushwork(ahc);
> 
>          if (LIST_FIRST(&ahc->pending_scbs) != NULL) {
>                  ahc_unpause(ahc);
>                  return (EBUSY);
>          }
> 
> #ifdef AHC_TARGET_MODE
>          /*
>           * XXX What about ATIOs that have not yet been serviced?
>           * Perhaps we should just refuse to be suspended if we
>           * are acting in a target role.
>           */
>          if (ahc->pending_device != NULL) {
>                  ahc_unpause(ahc);
>                  return (EBUSY);
>          }
> #endif
>          ahc_shutdown(ahc);
>          return (0);
> }
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-20 21:02                     ` Benjamin Herrenschmidt
@ 2004-07-24 15:31                       ` Nathan Bryant
  2004-07-24 16:00                         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Bryant @ 2004-07-24 15:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list, Pavel Machek

Benjamin Herrenschmidt wrote:

>save_state is a nonsense, didn't we kill it ? queue quiescing must be
>done by the upper level, which is a bit nasty with things like md &
>multipath... basically, the low level driver must have a way to notify
>it's functional parent (as opposed to it's bus parent) that it's going
>to sleep, and any path using this low level driver must then be
>quiesced, the parent must resume only when all the drivers it relies
>on are back up.
>
Isn't sysfs supposed to take care of this for us? IOW, I shouldn't have 
to call up to the SCSI midlayer from pcidev->suspend in order to notify 
it of a suspend, the midlayer should call the driver before we ever try 
to suspend. This may become important some day when the upper layers 
need to decide which order to bring pci devices down

Looking in /sys/devices shows that sysfs already knows that 'host0' is a 
child of a SCSI PCI device.

$ ls 
/sys/devices/pci0000\:00/0000\:00\:1e.0/0000\:02\:02.0/host0/0\:0\:0\:0/
block   detach_state    model  queue_depth  rev         state    type
delete  device_blocked  power  rescan       scsi_level  timeout  vendor


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-24 15:31                       ` Nathan Bryant
@ 2004-07-24 16:00                         ` Benjamin Herrenschmidt
  2004-07-24 16:45                           ` Nathan Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-24 16:00 UTC (permalink / raw)
  To: Nathan Bryant; +Cc: Linux Kernel list, Pavel Machek

On Sat, 2004-07-24 at 11:31, Nathan Bryant wrote:
> Benjamin Herrenschmidt wrote:
> 
> >save_state is a nonsense, didn't we kill it ? 

sysfs only takes care about the bus hierarchy as far as suspend/resume
is concerned (which is the only sane way to do it imho)

> queue quiescing must be
> >done by the upper level, which is a bit nasty with things like md &
> >multipath... basically, the low level driver must have a way to notify
> >it's functional parent (as opposed to it's bus parent) that it's going
> >to sleep, and any path using this low level driver must then be
> >quiesced, the parent must resume only when all the drivers it relies
> >on are back up.
> >
> Isn't sysfs supposed to take care of this for us? IOW, I shouldn't have 
> to call up to the SCSI midlayer from pcidev->suspend in order to notify 
> it of a suspend, the midlayer should call the driver before we ever try 
> to suspend. This may become important some day when the upper layers 
> need to decide which order to bring pci devices down

No, the ordering cannot be dictated by the upper layer, but by the
physical bus hierarchy. The low level driver gets the suspend callback
and need to notify the parent. The md/multipath must keep track that one
of the device it relies on is going away and thus block the queues.

That is at least for machine suspend/resume.

> Looking in /sys/devices shows that sysfs already knows that 'host0' is a 
> child of a SCSI PCI device.

Yes, but the PM herarchy is the bus hierarchy, I don't see a simple way
of going through both in this case ...

> $ ls 
> /sys/devices/pci0000\:00/0000\:00\:1e.0/0000\:02\:02.0/host0/0\:0\:0\:0/
> block   detach_state    model  queue_depth  rev         state    type
> delete  device_blocked  power  rescan       scsi_level  timeout  vendor
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-24 16:00                         ` Benjamin Herrenschmidt
@ 2004-07-24 16:45                           ` Nathan Bryant
  2004-07-24 18:35                             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Bryant @ 2004-07-24 16:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list, Pavel Machek

Benjamin Herrenschmidt wrote:
> sysfs only takes care about the bus hierarchy as far as suspend/resume
> is concerned (which is the only sane way to do it imho)

I saw comments in one of the PCI IDE driver pcidev->suspend routines 
that say "we don't need to iterate over the list of drives, sysfs does 
that for us."

> No, the ordering cannot be dictated by the upper layer, but by the
> physical bus hierarchy. The low level driver gets the suspend callback
> and need to notify the parent. The md/multipath must keep track that one
> of the device it relies on is going away and thus block the queues.
> 
> That is at least for machine suspend/resume.

We're talking past each other. I'm saying you take into consideration 
the physical bus hierarchy: PCI bus x is a parent of SCSI bus y which is 
a parent of SCSI disk drive z. Suspend disk z, with involvement from the 
block layer and scsi midlayer, before even calling the actual 
pcidev->suspend routine on the SCSI bus adapter. Shouldn't require more 
than minimal LLD involvement.

>>Looking in /sys/devices shows that sysfs already knows that 'host0' is a 
>>child of a SCSI PCI device.
> 
> 
> Yes, but the PM herarchy is the bus hierarchy, I don't see a simple way
> of going through both in this case ...

In the case of IDE, IDE is registered as a bus_type and has generic 
suspend code for the whole bus that is unrelated to the pcidev. The PIIX 
IDE (Intel chipsets) PCI pcidev struct doesn't even have suspend and 
resume callbacks filled in, but it works fine!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: device_suspend() levels [was Re: [patch] ACPI work on aic7xxx]
  2004-07-24 16:45                           ` Nathan Bryant
@ 2004-07-24 18:35                             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-24 18:35 UTC (permalink / raw)
  To: Nathan Bryant; +Cc: Linux Kernel list, Pavel Machek

On Sat, 2004-07-24 at 12:45, Nathan Bryant wrote:
> Benjamin Herrenschmidt wrote:
> > sysfs only takes care about the bus hierarchy as far as suspend/resume
> > is concerned (which is the only sane way to do it imho)
> 
> I saw comments in one of the PCI IDE driver pcidev->suspend routines 
> that say "we don't need to iterate over the list of drives, sysfs does 
> that for us."

That's different, because the disks are actually registered as
"struct device" childs of the bus, and thus get proper suspend/resume
callbacks.

> > No, the ordering cannot be dictated by the upper layer, but by the
> > physical bus hierarchy. The low level driver gets the suspend callback
> > and need to notify the parent. The md/multipath must keep track that one
> > of the device it relies on is going away and thus block the queues.
> > 
> > That is at least for machine suspend/resume.
> 
> We're talking past each other. I'm saying you take into consideration 
> the physical bus hierarchy: PCI bus x is a parent of SCSI bus y which is 
> a parent of SCSI disk drive z. Suspend disk z, with involvement from the 
> block layer and scsi midlayer, before even calling the actual 
> pcidev->suspend routine on the SCSI bus adapter. Shouldn't require more 
> than minimal LLD involvement.

Oh sure, the disks are in the loop, the problem happens with multipath
and such which "breaks" the bus hiearchy somewhat. The queue management
is part of the "functional" hierarchy (read: block layer) on top of
SCSI disks, thus the disks will be the one getting the suspend callback,
but they have to "notify" their functional parent (block layer, md, ...)
to properly get the queues stopped.

IDE sort-of does that internally, by generating a special request that
goes down the queue (in order to be properly ordered with whatever
is pending in the queue, including pending tagged commands if any),
and the "toplevel" IDE handling will stop processing the queue once
that request got past, but it's a hackery that at this point is quite
specific to drivers/ide/

> >>Looking in /sys/devices shows that sysfs already knows that 'host0' is a 
> >>child of a SCSI PCI device.
> > 
> > 
> > Yes, but the PM herarchy is the bus hierarchy, I don't see a simple way
> > of going through both in this case ...
> 
> In the case of IDE, IDE is registered as a bus_type and has generic 
> suspend code for the whole bus that is unrelated to the pcidev. The PIIX 
> IDE (Intel chipsets) PCI pcidev struct doesn't even have suspend and 
> resume callbacks filled in, but it works fine!

Well... not exactly. The pci_dev is the parent of the IDE bus in the
bus hierarchy. PIIX may lack the proper callbacks (it probably need
some stuff there too). For a good working example, ide/ppc/pmac.c,
it is a macio_dev (or a pci_dev, depending on the ASIC model).

The suspend request first reaches the disk(s), which does the queue
processing I mentioned earlier & stanby's the disks. Once that's done,
the parent (ide pmac) gets it's suspend call and does some suspend work
on the controller HW. Resume is the opposite, the controller gets
resumed first, then the child(s) (disk(s))

-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-07-24 18:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <40FD38A0.3000603@optonline.net>
     [not found] ` <20040720155928.GC10921@atrey.karlin.mff.cuni.cz>
     [not found]   ` <40FD4CFA.6070603@optonline.net>
2004-07-20 17:46     ` device_suspend() levels [was Re: [patch] ACPI work on aic7xxx] Pavel Machek
2004-07-20 18:10       ` Nathan Bryant
2004-07-20 18:25         ` Benjamin Herrenschmidt
2004-07-20 18:34           ` Nathan Bryant
2004-07-20 19:10             ` Benjamin Herrenschmidt
2004-07-20 19:23               ` Pavel Machek
     [not found]               ` <40FD82B1.8030704@optonline.net>
2004-07-20 20:41                 ` Benjamin Herrenschmidt
2004-07-20 20:50                   ` Nathan Bryant
2004-07-20 21:02                     ` Benjamin Herrenschmidt
2004-07-24 15:31                       ` Nathan Bryant
2004-07-24 16:00                         ` Benjamin Herrenschmidt
2004-07-24 16:45                           ` Nathan Bryant
2004-07-24 18:35                             ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox