LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
From: Srivatsa S. Bhat @ 2013-02-04 13:47 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	Srivatsa S. Bhat, netdev, namhyung, akpm, walken, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20130122073210.13822.50434.stgit@srivatsabhat.in.ibm.com>


On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
> Hi,
> 
> This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
> offline path and provides an alternative (set of APIs) to preempt_disable() to
> prevent CPUs from going offline, which can be invoked from atomic context.
> The motivation behind the removal of stop_machine() is to avoid its ill-effects
> and thus improve the design of CPU hotplug. (More description regarding this
> is available in the patches).
> 
> All the users of preempt_disable()/local_irq_disable() who used to use it to
> prevent CPU offline, have been converted to the new primitives introduced in the
> patchset. Also, the CPU_DYING notifiers have been audited to check whether
> they can cope up with the removal of stop_machine() or whether they need to
> use new locks for synchronization (all CPU_DYING notifiers looked OK, without
> the need for any new locks).
> 
> Applies on v3.8-rc4. It currently has some locking issues with cpu idle (on
> which even lockdep didn't provide any insight unfortunately). So for now, it
> works with CONFIG_CPU_IDLE=n.
> 

I ran this patchset on a POWER 7 machine with 32 cores (128 logical CPUs)
[POWER doesn't have the cpu idle issue]. And the results (latency or the time
taken for a single CPU offline) are shown below.

Experiment:
----------

Run a heavy workload (genload from LTP) that generates significant system time.
With a given number of CPUs online (the '# online CPUs' column below), measure
the time it takes to complete the stop-m/c phase in mainline and the equivalent
phase in the patched kernel for one CPU offline operation. (Note that each
measurement is the average time taken to perform a *single* CPU offline
operation.)

Expected results:
----------------

Since stop-machine doesn't scale with no. of online CPUs, we expect the
mainline kernel to take longer and longer for taking 1 CPU offline, with
increasing no. of online CPUs. The patched kernel is expected to take a
constant amount of time, irrespective of the number of online CPUs, because it
has a scalable design.


Experimental results:
---------------------

                 Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]

# online CPUs    Mainline (with stop-m/c)       This patchset (no stop-m/c)

      8                 17.04                          7.73

     16                 18.05                          6.44

     32                 17.31                          7.39

     64                 32.40                          9.28

    128                 98.23                          7.35


Analysis and conclusion:
------------------------

The patched kernel performs pretty well and meets our expectations. It beats
mainline easily. As shown in the table above and the graph attached with this
mail, it has the following advantages:

1. Avg. latency is lower than mainline's (roughly half of even the lowest
   latency seen in mainline).

2. The avg. latency stays roughly constant (about 6.4 to 9.3 ms), irrespective
   of the number of online CPUs in the system, which shows that the
   design/synchronization scheme is scalable.

3. Throughout the duration shown above, mainline disables interrupts on all
   CPUs. But the patched kernel not only has a smaller duration of hotplug,
   but also keeps interrupts enabled on other CPUs, which makes CPU offline
   less disruptive on latency-sensitive workloads running on the system.
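
As a quick cross-check of points 1 and 2, the table can be reduced to per-row
speedups and a mean latency. This is only an illustrative sketch: the struct
and helper names are made up for this example and are not part of the
patchset; the numbers are copied from the table above.

```c
#include <stddef.h>

/* Average single-CPU offline latencies (ms), copied from the table above. */
struct sample {
	int online_cpus;
	double mainline_ms;	/* with stop_machine() */
	double patched_ms;	/* without stop_machine() */
};

static const struct sample samples[] = {
	{   8, 17.04, 7.73 },
	{  16, 18.05, 6.44 },
	{  32, 17.31, 7.39 },
	{  64, 32.40, 9.28 },
	{ 128, 98.23, 7.35 },
};

#define NR_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* How many times faster the patched kernel is for one table row. */
static double speedup(const struct sample *s)
{
	return s->mainline_ms / s->patched_ms;
}

/* Mean patched-kernel latency (ms) across all rows of the table. */
static double mean_patched_ms(void)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < NR_SAMPLES; i++)
		sum += samples[i].patched_ms;
	return sum / NR_SAMPLES;
}
```

With these numbers, speedup(&samples[0]) is about 2.2 at 8 online CPUs and
speedup(&samples[4]) is about 13.4 at 128 online CPUs, while
mean_patched_ms() comes to about 7.6 ms.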


So, this gives us an idea of how this patchset actually performs. Of course
there are bugs and issues that still need fixing (even mainline crashes with
hotplug sometimes), but I did the above experiment to verify whether the
design is working as expected and whether it really shows significant
improvements over mainline. And thankfully, it does :-)

Regards,
Srivatsa S. Bhat


[-- Attachment #2: CPU hotplug latency.png --]
[-- Type: image/png, Size: 172574 bytes --]


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 14:21 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130204124810.GB22096@kroah.com>

On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > Yes, but those are just remove events and we can only see how destructive they
> > > were after the removal.  The point is to be able to figure out whether or not
> > > we *want* to do the removal in the first place.
> > > 
> > > Say you have a computing node which signals a hardware problem in a processor
> > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > may want to eject that package, but you don't want to kill the system this
> > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > That may be costly, however (maybe weeks of computations), so it should be
> > > avoided if possible, but not at the expense of crashing the box if the eject
> > > doesn't work out.
> > 
> > It seems to me that we could handle that with the help of a new flag, say
> > "no_eject", in struct device, a global mutex, and a function that will walk
> > the given subtree of the device hierarchy and check if "no_eject" is set for
> > any devices in there.  Plus a global "no_eject" switch, perhaps.
> 
> I think this will always be racy, or at worst, slow things down on
> normal device operations as you will always be having to grab this flag
> whenever you want to do something new.

I don't see why this particular scheme should be racy, at least I don't see any
obvious races in it (although I'm not that good at race detection in general,
admittedly).

Also, I don't expect that flag to be used for everything, just for things known
to seriously break if forcible eject is done.  That may not be precise enough,
so that's a matter of defining its purpose more precisely.

We can do something like that on the ACPI level (ie. introduce a no_eject flag
in struct acpi_device and provide an interface for the layers above ACPI to
manipulate it) but then devices without ACPI namespace objects won't be
covered.  That may not be a big deal, though.

So say dev is about to be used for something incompatible with ejecting, so to
speak.  Then, one would do platform_lock_eject(dev), which would check if dev
has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
platform_lock_eject(dev) would need to be checked to see if the device is not
gone.  If it returns success (0), one would do something to the device and
call platform_no_eject(dev) and then platform_unlock_eject(dev).

To clear no_eject one would just call platform_allow_to_eject(dev) that would
do all of the locking and clearing in one operation.

The ACPI eject side would be similar to the thing I described previously,
so it would (1) take acpi_eject_lock, (2) see if any struct acpi_device
involved has no_eject set and if not, then (3) do acpi_bus_trim(), (4)
carry out the eject and (5) release acpi_eject_lock.

Step (2) above might be optional, ie. if eject is forcible, we would just do
(3) etc. without (2).

The locking should prevent races from happening (and it should prevent two
ejects from happening at the same time too, which is not a bad thing by itself).
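
The steps above can be modelled roughly in userspace C. This is a sketch under
stated assumptions, not kernel code: struct toy_dev, subtree_has_no_eject(),
mark_subtree_gone() and toy_eject() are hypothetical names invented for the
illustration, a pthread mutex stands in for acpi_eject_lock, and the
platform_*() helpers are only the proposed names from this mail, not an
existing API.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device node; NOT the kernel's struct acpi_device. */
struct toy_dev {
	bool gone;			/* set once the device has been ejected */
	bool no_eject;			/* set while ejecting would break things */
	struct toy_dev *child;		/* first child in the hierarchy */
	struct toy_dev *sibling;	/* next sibling under the same parent */
};

/* Userspace stand-in for the proposed global acpi_eject_lock. */
static pthread_mutex_t acpi_eject_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the eject lock; fail (lock released) if the device is already gone. */
static int platform_lock_eject(struct toy_dev *dev)
{
	pthread_mutex_lock(&acpi_eject_lock);
	if (dev->gone) {
		pthread_mutex_unlock(&acpi_eject_lock);
		return -ENODEV;
	}
	return 0;
}

/* Mark the device as not ejectable; caller must hold the eject lock. */
static void platform_no_eject(struct toy_dev *dev)
{
	dev->no_eject = true;
}

static void platform_unlock_eject(struct toy_dev *dev)
{
	(void)dev;
	pthread_mutex_unlock(&acpi_eject_lock);
}

/* Lock, clear no_eject and unlock in one operation. */
static int platform_allow_to_eject(struct toy_dev *dev)
{
	int ret = platform_lock_eject(dev);

	if (ret)
		return ret;
	dev->no_eject = false;
	platform_unlock_eject(dev);
	return 0;
}

/* Step (2): is no_eject set on dev or on anything below it? */
static bool subtree_has_no_eject(const struct toy_dev *dev)
{
	const struct toy_dev *c;

	if (dev->no_eject)
		return true;
	for (c = dev->child; c; c = c->sibling)
		if (subtree_has_no_eject(c))
			return true;
	return false;
}

/* Stand-in for steps (3) and (4): trim the subtree and carry out the eject. */
static void mark_subtree_gone(struct toy_dev *dev)
{
	struct toy_dev *c;

	dev->gone = true;
	for (c = dev->child; c; c = c->sibling)
		mark_subtree_gone(c);
}

/* Eject side: steps (1) to (5); a forcible eject skips step (2). */
static int toy_eject(struct toy_dev *dev, bool force)
{
	int ret = 0;

	pthread_mutex_lock(&acpi_eject_lock);		/* (1) */
	if (dev->gone)
		ret = -ENODEV;
	else if (!force && subtree_has_no_eject(dev))	/* (2) */
		ret = -EBUSY;
	else
		mark_subtree_gone(dev);			/* (3) + (4) */
	pthread_mutex_unlock(&acpi_eject_lock);		/* (5) */
	return ret;
}
```

In this model, a non-forcible toy_eject() on a parent returns -EBUSY while any
descendant has no_eject set, and succeeds again after
platform_allow_to_eject() has cleared the flag.  Once the subtree is gone,
platform_lock_eject() on any device in it returns -ENODEV.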

> See my comments earlier about pci hotplug and the design decisions there
> about "no eject" capabilities for why.

Well, I replied to that one too. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-04 14:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <2048116.Qo8UgQ5hjb@vostro.rjw.lan>

On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > Yes, but those are just remove events and we can only see how destructive they
> > > > were after the removal.  The point is to be able to figure out whether or not
> > > > we *want* to do the removal in the first place.
> > > > 
> > > > Say you have a computing node which signals a hardware problem in a processor
> > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > may want to eject that package, but you don't want to kill the system this
> > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > doesn't work out.
> > > 
> > > It seems to me that we could handle that with the help of a new flag, say
> > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > 
> > I think this will always be racy, or at worst, slow things down on
> > normal device operations as you will always be having to grab this flag
> > whenever you want to do something new.
> 
> I don't see why this particular scheme should be racy, at least I don't see any
> obvious races in it (although I'm not that good at races detection in general,
> admittedly).
> 
> Also, I don't expect that flag to be used for everything, just for things known
> to seriously break if forcible eject is done.  That may be not precise enough,
> so that's a matter of defining its purpose more precisely.
> 
> We can do something like that on the ACPI level (ie. introduce a no_eject flag
> in struct acpi_device and provide an interface for the layers above ACPI to
> manipulate it) but then devices without ACPI namespace objects won't be
> covered.  That may not be a big deal, though.
> 
> So say dev is about to be used for something incompatible with ejecting, so to
> speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> platform_lock_eject(dev) would need to be checked to see if the device is not
> gone.  If it returns success (0), one would do something to the device and
> call platform_no_eject(dev) and then platform_unlock_eject(dev).

How does a device "know" it is doing something that is incompatible with
ejecting?  That's a non-trivial task from what I can tell.

What happens if a device wants to set that flag, right after it was told
to eject and the device was in the middle of being removed?  How can you
"fail" the "I can't be removed now, so don't" requirement that it now
has?

thanks,

greg k-h


* Re: [PATCH v2 1/1] powerpc/85xx: Board support for ppa8548
From: Timur Tabi @ 2013-02-04 15:30 UTC (permalink / raw)
  To: Stef van Os; +Cc: Scott Wood, Paul Mackerras, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1359920347-15340-1-git-send-email-stef.van.os@prodrive.nl>

On 02/03/2013 01:39 PM, Stef van Os wrote:

> +	pci0: pci@fe0008000 {
> +		status = "disabled";
> +	};
> +
> +	pci1: pci@fe0009000 {
> +		status = "disabled";
> +	};
> +
> +	pci2: pcie@fe000a000 {
> +		status = "disabled";
> +	};

I was hoping you'd follow my example and include a comment indicating 
why the PCI devices are all disabled.

> +static void ppa8548_show_cpuinfo(struct seq_file *m)
> +{
> +	uint svid, phid1;

Please don't use unsized integers for hardware registers.

	uint32_t svid, phid1;


-- 
Timur Tabi


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 16:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <5192355.CsKHU8mj3W@vostro.rjw.lan>

On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
  :
> > > Yes, but those are just remove events and we can only see how destructive they
> > > were after the removal.  The point is to be able to figure out whether or not
> > > we *want* to do the removal in the first place.
> > 
> > Yes, but, you will always race if you try to test to see if you can shut
> > down a device and then trying to do it.  So walking the bus ahead of
> > time isn't a good idea.
> >
> > And, we really don't have a viable way to recover if disconnect() fails,
> > do we.  What do we do in that situation, restore the other devices we
> > disconnected successfully?  How do we remember/know what they were?
> > 
> > PCI hotplug almost had this same problem until the designers finally
> > realized that they just had to accept the fact that removing a PCI
> > device could either happen by:
> > 	- a user yanking out the device, at which time the OS better
> > 	  clean up properly no matter what happens
> > 	- the user asked nicely to remove a device, and the OS can take
> > 	  as long as it wants to complete that action, including
> > 	  stalling for noticeable amounts of time before eventually,
> > 	  always letting the action succeed.
> > 
> > I think the second thing is what you have to do here.  If a user tells
> > the OS it wants to remove these devices, you better do it.  If you
> > can't, because memory is being used by someone else, either move them
> > off, or just hope that nothing bad happens, before the user gets
> > frustrated and yanks out the CPU/memory module themselves physically :)
> 
> Well, that we can't help, but sometimes users really *want* the OS to tell them
> if it is safe to unplug something at this particular time (think about the
> Windows' "safe remove" feature for USB sticks, for example; that came out of
> users' demand AFAIR).
> 
> So in my opinion it would be good to give them an option to do "safe eject" or
> "forcible eject", whichever they prefer.

For system device hot-plug, it always needs to be "safe eject".  This
feature will be implemented on mission critical servers, which are
managed by professional IT folks.  Crashing a server costs the business
serious money.

A user yanking out a system device won't happen, and it immediately
crashes the system if it is done.  So, we have nothing to do with this
case.  The 2nd case can hang the operation, waiting forever to proceed,
which is still a serious issue for enterprise customers.


> > > Say you have a computing node which signals a hardware problem in a processor
> > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > may want to eject that package, but you don't want to kill the system this
> > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > That may be costly, however (maybe weeks of computations), so it should be
> > > avoided if possible, but not at the expense of crashing the box if the eject
> > > doesn't work out.
> > 
> > These same "situations" came up for PCI hotplug, and I still say the
> > same resolution there holds true, as described above.  The user wants to
> > remove something, so let them do it.  They always know best, and get mad
> > at us if we think otherwise :)
> 
> Well, not necessarily.  Users sometimes really don't know what they are doing
> and want us to give them a hint.  My opinion is that if we can give them a
> hint, there's no reason not to.
> 
> > What does the ACPI spec say about this type of thing?  Surely the same
> > people that did the PCI Hotplug spec were consulted when doing this part
> > of the spec, right?  Yeah, I know, I can dream...
> 
> It's not very specific (as usual), but it gives hints. :-)
> 
> For example, there is the _OST method (Section 6.3.5 of ACPI 5) that we are
> supposed to use to notify the platform of ejection failures and there are
> status codes like "0x81: Device in use by application" or "0x82: Device busy"
> that can be used in there.  So definitely the authors took ejection failures
> for software-related reasons into consideration.

That is correct.  Also, the ACPI spec deliberately does not define
implementation details, so we defined the DIG64 hotplug spec below (which I
contributed to).

http://www.dig64.org/home/DIG64_HPPF_R1_0.pdf

For example, Figure 2 on page 14 shows the memory hot-remove flow.  The
operation needs to either succeed or fail.  Crash or hang is not an
option.


Thanks,
-Toshi


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 16:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <2048116.Qo8UgQ5hjb@vostro.rjw.lan>

On Mon, 2013-02-04 at 15:21 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > Yes, but those are just remove events and we can only see how destructive they
> > > > were after the removal.  The point is to be able to figure out whether or not
> > > > we *want* to do the removal in the first place.
> > > > 
> > > > Say you have a computing node which signals a hardware problem in a processor
> > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > may want to eject that package, but you don't want to kill the system this
> > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > doesn't work out.
> > > 
> > > It seems to me that we could handle that with the help of a new flag, say
> > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > 
> > I think this will always be racy, or at worst, slow things down on
> > normal device operations as you will always be having to grab this flag
> > whenever you want to do something new.
> 
> I don't see why this particular scheme should be racy, at least I don't see any
> obvious races in it (although I'm not that good at races detection in general,
> admittedly).
> 
> Also, I don't expect that flag to be used for everything, just for things known
> to seriously break if forcible eject is done.  That may be not precise enough,
> so that's a matter of defining its purpose more precisely.
> 
> We can do something like that on the ACPI level (ie. introduce a no_eject flag
> in struct acpi_device and provide an interface for the layers above ACPI to
> manipulate it) but then devices without ACPI namespace objects won't be
> covered.  That may not be a big deal, though.

I am afraid that bringing the device status management into the ACPI
level would not be a good idea.  acpi_device should only reflect ACPI
device object information, not how its actual device is being used.

I like your initiative of acpi_scan_driver and I think scanning /
trimming of ACPI object info is what the ACPI drivers should do.


Thanks,
-Toshi


* Re: [PATCH 1/1] powerpc/85xx: Board support for ppa8548
From: Scott Wood @ 2013-02-04 16:34 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Stef van Os, Paul Mackerras, linuxppc-dev
In-Reply-To: <CAOZdJXU6OFu1878s9rrNws3uqT_F7-GFS95ZJT5EZmw35=6hbQ@mail.gmail.com>

On 02/01/2013 10:34:44 PM, Timur Tabi wrote:
> On Fri, Feb 1, 2013 at 6:31 PM, Scott Wood <scottwood@freescale.com> wrote:
> >
> > I guess the reason you're not using fsl/mpc8548si-post.dtsi is that you
> > don't want PCI.  Maybe PCI and srio should be moved out of that file, or
> > ifdeffed if 85xx ever ends up using the preprocessor for its device trees.
>
> Wouldn't it be easier to add status="disabled" in this dts file?  I do
> something similar with the LBC on the p1022rdk.
>
> 	board_lbc: lbc: localbus@ffe05000 {
> 		/* The P1022 RDK does not have any localbus devices */
> 		status = "disabled";
> 	};

Yeah, that'd work.

-Scott


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 16:46 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-mm@kvack.org, yinghai@kernel.org,
	linux-kernel@vger.kernel.org, Rafael J. Wysocki,
	linux-acpi@vger.kernel.org, isimatu.yasuaki@jp.fujitsu.com,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <20130204124612.GA22096@kroah.com>

On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > 
> > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > help from the driver core here.
> > > > > > 
> > > > > > There are three different approaches suggested for system device
> > > > > > hot-plug:
> > > > > >  A. Proceed within system device bus scan.
> > > > > >  B. Proceed within ACPI bus scan.
> > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > 
> > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > 
> > > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > > > clarifies why I am suggesting option C.
> > > > > > 
> > > > > > 1. What are the system devices?
> > > > > > System devices provide system-wide core computing resources, which are
> > > > > > essential to compose a computer system.  System devices are not
> > > > > > connected to any particular standard buses.
> > > > > 
> > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > standard busses".  All this means is that system devices are connected
> > > > > to the "system" bus, nothing more.
> > > > 
> > > > Can you give me a few examples of other devices that support hotplug and
> > > > are not connected to any particular buses?  I will investigate them to
> > > > see how they are managed to support hotplug.
> > > 
> > > Any device that is attached to any bus in the driver model can be
> > > hotunplugged from userspace by telling it to be "unbound" from the
> > > driver controlling it.  Try it for any platform device in your system to
> > > see how it happens.
> > 
> > The unbind operation, as I understand from you, is to detach a driver
> > from a device.  Yes, unbinding can be done for any devices.  It is
> > however different from hot-plug operation, which unplugs a device.
> 
> Physically, yes, but to the driver involved, and the driver core, there
> is no difference.  That was one of the primary goals of the driver core
> creation so many years ago.
> 
> > Today, the unbind operation to an ACPI cpu/memory devices causes
> > hot-unplug (offline) operation to them, which is one of the major issues
> > for us since unbind cannot fail.  This patchset addresses this issue by
> > making the unbind operation of ACPI cpu/memory devices to do the
> > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > are supposed to be controlled by their drivers, cpu and memory modules.
> 
> I think that's the problem right there, solve that, please.

We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
can limit the ACPI drivers to do the scanning stuff only.  This is
precisely the intent of this patchset.  The real stuff, removing actual
devices, is done by the system device drivers/modules.


> > > > > > 2. Why are the system devices special?
> > > > > > The system devices are initialized during early boot-time, by multiple
> > > > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > > > provide low-level services to enable other subsystems to come up.
> > > > > 
> > > > > Sorry, no, that doesn't mean they are special, nothing here is unique
> > > > > for the point of view of the driver model from any other device or bus.
> > > > 
> > > > I think system devices are unique in a sense that they are initialized
> > > > before drivers run.
> > > 
> > > No, most all devices are "initialized" before a driver runs on it, USB
> > > is one such example, PCI another, and I'm pretty sure that there are
> > > others.
> > 
> > USB devices can be initialized after the USB bus driver is initialized.
> > Similarly, PCI devices can be initialized after the PCI bus driver is
> > initialized.  However, CPU and memory are initialized without any
> > dependency to their bus driver since there is no such thing.
> 
> You can create such a thing if you want :)

Well, a pseudo driver could be created for it, but it does not make any
difference.  Access to CPU and memory does not go through any bus
controller visible to the OS.  CPU and memory are connected with links
(which are up from the beginning) and do not have a bus structure any more.


> > In addition, CPU and memory have two drivers -- their actual
> > drivers/subsystems and their ACPI drivers.
> 
> Again, I feel that is the root of the problem.  Rafael seems to be
> working on solving this, which I think is essential to your work as
> well.

Yes, Rafael is doing excellent work to turn ACPI drivers into ACPI
"scan" drivers, removing the device driver portion and keeping them as
attach / detach operations on ACPI device objects.  My patchset is very
much aligned with this direction. :)


Thanks,
-Toshi


* [tip:irq/core] arch Kconfig: Remove references to IRQ_PER_CPU
From: tip-bot for James Hogan @ 2013-02-04 18:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-mips, tony.luck, vapier, lethal, deller, linux-kernel, ralf,
	jejb, fenghua.yu, james.hogan, paulus, hpa, uclinux-dist-devel,
	tglx, rkuo, linuxppc-dev, mingo
In-Reply-To: <1359972583-17134-1-git-send-email-james.hogan@imgtec.com>

Commit-ID:  f7c819c020db9796ae3a662b82a310617f92b15b
Gitweb:     http://git.kernel.org/tip/f7c819c020db9796ae3a662b82a310617f92b15b
Author:     James Hogan <james.hogan@imgtec.com>
AuthorDate: Mon, 4 Feb 2013 10:09:43 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 4 Feb 2013 18:53:20 +0100

arch Kconfig: Remove references to IRQ_PER_CPU

The IRQ_PER_CPU Kconfig symbol was removed in the following commit:

Commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 ("genirq: Remove
CONFIG_IRQ_PER_CPU") merged in v2.6.39-rc1.

But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
files where it was defined or selected. It's completely unused so remove
the remaining references.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: <uclinux-dist-devel@blackfin.uclinux.org>
Cc: <linux-mips@linux-mips.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Link: http://lkml.kernel.org/r/1359972583-17134-1-git-send-email-james.hogan@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/blackfin/Kconfig | 1 -
 arch/hexagon/Kconfig  | 1 -
 arch/ia64/Kconfig     | 1 -
 arch/mips/Kconfig     | 1 -
 arch/parisc/Kconfig   | 1 -
 arch/powerpc/Kconfig  | 1 -
 arch/sh/Kconfig       | 3 ---
 7 files changed, 9 deletions(-)

diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index 86f891f..67e4aaa 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -37,7 +37,6 @@ config BLACKFIN
 	select HAVE_GENERIC_HARDIRQS
 	select GENERIC_ATOMIC64
 	select GENERIC_IRQ_PROBE
-	select IRQ_PER_CPU if SMP
 	select USE_GENERIC_SMP_HELPERS if SMP
 	select HAVE_NMI_WATCHDOG if NMI_WATCHDOG
 	select GENERIC_SMP_IDLE_THREAD
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 40a3185..e4decc6 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -12,7 +12,6 @@ config HEXAGON
 	# select ARCH_WANT_OPTIONAL_GPIOLIB
 	# select ARCH_REQUIRE_GPIOLIB
 	# select HAVE_CLK
-	# select IRQ_PER_CPU
 	# select GENERIC_PENDING_IRQ if SMP
 	select GENERIC_ATOMIC64
 	select HAVE_PERF_EVENTS
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 3279646..00c2e88 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -29,7 +29,6 @@ config IA64
 	select ARCH_DISCARD_MEMBLOCK
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PENDING_IRQ if SMP
-	select IRQ_PER_CPU
 	select GENERIC_IRQ_SHOW
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 121ed51..9becc44 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2160,7 +2160,6 @@ source "mm/Kconfig"
 config SMP
 	bool "Multi-Processing support"
 	depends on SYS_SUPPORTS_SMP
-	select IRQ_PER_CPU
 	select USE_GENERIC_SMP_HELPERS
 	help
 	  This enables support for systems with more than one CPU. If you have
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 9dd5c18..a32e34e 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -15,7 +15,6 @@ config PARISC
 	select BROKEN_RODATA
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PCI_IOMAP
-	select IRQ_PER_CPU
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d45edca..561ccca 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -124,7 +124,6 @@ config PPC
 	select HAVE_GENERIC_HARDIRQS
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select SPARSE_IRQ
-	select IRQ_PER_CPU
 	select IRQ_DOMAIN
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 996e008..9c833c5 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -90,9 +90,6 @@ config GENERIC_CSUM
 config GENERIC_HWEIGHT
 	def_bool y
 
-config IRQ_PER_CPU
-	def_bool y
-
 config GENERIC_GPIO
 	def_bool n
 

^ permalink raw reply related

* Re: [PATCH RESEND 1/1] arch Kconfig: remove references to IRQ_PER_CPU
From: James Bottomley @ 2013-02-04 18:30 UTC (permalink / raw)
  To: James Hogan
  Cc: linux-arch, linux-mips, linux-ia64, linux-parisc, linux-sh,
	linux-hexagon, Helge Deller, linux-kernel, Fenghua Yu,
	James E.J. Bottomley, Paul Mundt, Mike Frysinger, Ralf Baechle,
	uclinux-dist-devel, Thomas Gleixner, linuxppc-dev, Paul Mackerras
In-Reply-To: <1359972583-17134-1-git-send-email-james.hogan@imgtec.com>

On Mon, 2013-02-04 at 10:09 +0000, James Hogan wrote:
> The IRQ_PER_CPU Kconfig symbol was removed in the following commit:
> 
> Commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 ("genirq: Remove
> CONFIG_IRQ_PER_CPU") merged in v2.6.39-rc1.
> 
> But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
> files where it was defined or selected. It's completely unused so remove
> the remaining references.
> 
> Signed-off-by: James Hogan <james.hogan@imgtec.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Paul Mundt <lethal@linux-sh.org>
> Acked-by: Tony Luck <tony.luck@intel.com>
> Acked-by: Richard Kuo <rkuo@codeaurora.org>

For what it's worth ACK, but I don't really think you need it since the
patch is trivial and obviously correct.

> 
> Does anybody want to pick this patch up?

I see Thomas already has.  Thanks, by the way, for not doing this as one
patch per architecture ...

James

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 19:43 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1359994749.23410.113.camel@misato.fc.hp.com>

On Monday, February 04, 2013 09:19:09 AM Toshi Kani wrote:
> On Mon, 2013-02-04 at 15:21 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > we *want* to do the removal in the first place.
> > > > > 
> > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > may want to eject that package, but you don't want to kill the system this
> > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > doesn't work out.
> > > > 
> > > > It seems to me that we could handle that with the help of a new flag, say
> > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > 
> > > I think this will always be racy, or at worst, slow things down on
> > > normal device operations as you will always be having to grab this flag
> > > whenever you want to do something new.
> > 
> > I don't see why this particular scheme should be racy, at least I don't see any
> > obvious races in it (although I'm not that good at race detection in general,
> > admittedly).
> > 
> > Also, I don't expect that flag to be used for everything, just for things known
> > to seriously break if forcible eject is done.  That may be not precise enough,
> > so that's a matter of defining its purpose more precisely.
> > 
> > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > in struct acpi_device and provide an interface for the layers above ACPI to
> > manipulate it) but then devices without ACPI namespace objects won't be
> > covered.  That may not be a big deal, though.
> 
> I am afraid that bringing the device status management into the ACPI
> level would not be a good idea.  acpi_device should only reflect ACPI
> device object information, not how its actual device is being used.
> 
> I like your initiative of acpi_scan_driver and I think scanning /
> trimming of ACPI object info is what the ACPI drivers should do.

ACPI drivers, yes, but the users of ACPI already rely on information
in struct acpi_device.  Like ACPI device power states, for example.

So platform_no_eject(dev) is not much different in that respect from
platform_pci_set_power_state(pci_dev).

The whole "eject" concept is somewhat ACPI-specific, though, and the eject
notifications come from ACPI, so I don't have a problem with limiting it to
ACPI-backed devices for the time being.

If it turns out to be useful outside of ACPI, then we can move it up to the
driver core.  For now I don't see a compelling reason to do that.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 19:45 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-acpi@vger.kernel.org, Greg KH,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <1359996378.23410.130.camel@misato.fc.hp.com>

On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > 
> > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > help from the driver core here.
> > > > > > > 
> > > > > > > There are three different approaches suggested for system device
> > > > > > > hot-plug:
> > > > > > >  A. Proceed within system device bus scan.
> > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > 
> > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > 
> > > > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > > > clarifies why I am suggesting option C.
> > > > > > > 
> > > > > > > 1. What are the system devices?
> > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > connected to any particular standard buses.
> > > > > > 
> > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > standard busses".  All this means is that system devices are connected
> > > > > > to the "system" bus, nothing more.
> > > > > 
> > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > are not connected to any particular buses?  I will investigate them to
> > > > > see how they are managed to support hotplug.
> > > > 
> > > > Any device that is attached to any bus in the driver model can be
> > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > driver controlling it.  Try it for any platform device in your system to
> > > > see how it happens.
> > > 
> > > The unbind operation, as I understand from you, is to detach a driver
> > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > however different from hot-plug operation, which unplugs a device.
> > 
> > Physically, yes, but to the driver involved, and the driver core, there
> > is no difference.  That was one of the primary goals of the driver core
> > creation so many years ago.
> > 
> > > Today, the unbind operation to ACPI cpu/memory devices causes
> > > hot-unplug (offline) operation to them, which is one of the major issues
> > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > making the unbind operation of ACPI cpu/memory devices to do the
> > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > are supposed to be controlled by their drivers, cpu and memory modules.
> > 
> > I think that's the problem right there, solve that, please.
> 
> We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> can limit the ACPI drivers to do the scanning stuff only.   This is
> precisely the intent of this patchset.  The real stuff, removing actual
> devices, is done by the system device drivers/modules.

In case you haven't realized that yet, the $subject patchset has no future.

Let's just talk about how we can get what we need in more general terms.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 19:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1359993766.23410.105.camel@misato.fc.hp.com>

On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
>   :
> > > > Yes, but those are just remove events and we can only see how destructive they
> > > > were after the removal.  The point is to be able to figure out whether or not
> > > > we *want* to do the removal in the first place.
> > > 
> > > Yes, but, you will always race if you try to test to see if you can shut
> > > down a device and then try to do it.  So walking the bus ahead of
> > > time isn't a good idea.
> > >
> > > And, we really don't have a viable way to recover if disconnect() fails,
> > > do we.  What do we do in that situation, restore the other devices we
> > > disconnected successfully?  How do we remember/know what they were?
> > > 
> > > PCI hotplug almost had this same problem until the designers finally
> > > realized that they just had to accept the fact that removing a PCI
> > > device could either happen by:
> > > 	- a user yanking out the device, at which time the OS better
> > > 	  clean up properly no matter what happens
> > > 	- the user asked nicely to remove a device, and the OS can take
> > > 	  as long as it wants to complete that action, including
> > > 	  stalling for noticeable amounts of time before eventually,
> > > 	  always letting the action succeed.
> > > 
> > > I think the second thing is what you have to do here.  If a user tells
> > > the OS it wants to remove these devices, you better do it.  If you
> > > can't, because memory is being used by someone else, either move them
> > > off, or just hope that nothing bad happens, before the user gets
> > > frustrated and yanks out the CPU/memory module themselves physically :)
> > 
> > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > if it is safe to unplug something at this particular time (think about the
> > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > users' demand AFAIR).
> > 
> > So in my opinion it would be good to give them an option to do "safe eject" or
> > "forcible eject", whichever they prefer.
> 
> For system device hot-plug, it always needs to be "safe eject".  This
> feature will be implemented on mission critical servers, which are
> managed by professional IT folks.  Crashing a server costs the business
> serious money.

Well, "always" is a bit too strong a word as far as human behavior is concerned
in my opinion.

That said I would be perfectly fine with not supporting the "forcible eject" to
start with and waiting for the first request to add support for it.  I also
would be fine with taking bets on how much time it's going to take for such a
request to appear. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC/PATCH 29/32] usb: gadget: pxa27x_udc: let udc-core manage gadget->dev
From: Felipe Balbi @ 2013-02-04 19:53 UTC (permalink / raw)
  To: Robert Jarzmik
  Cc: kgene.kim, eric.y.miao, kuninori.morimoto.gx, alexander.shishkin,
	gregkh, yoshihiro.shimoda.uh, Linux USB Mailing List,
	nicolas.ferre, linux-geode, Felipe Balbi, linux-samsung-soc,
	haojian.zhuang, ben-linux, dahlmann.thomas, linux,
	Linux OMAP Mailing List, linuxppc-dev, linux-arm-kernel
In-Reply-To: <87txq1m57u.fsf@free.fr>

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

On Mon, Jan 28, 2013 at 09:18:29PM +0100, Robert Jarzmik wrote:
> Felipe Balbi <balbi@ti.com> writes:
> 
> > By simply setting a flag, we can drop some
> > boilerplate code.
> >
> > Signed-off-by: Felipe Balbi <balbi@ti.com>
> > ---
> >  drivers/usb/gadget/pxa27x_udc.c | 9 +--------
> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
> 
> And I tested also your patch and it works in my environment. For next patches
> I'd like to be CCed for pxa27x_udc stuff as I'm maintaining that one since its
> beginning (and yes, I know, I didn't put that in MAINTAINERS ...).

you should add yourself to MAINTAINERS. Please send a patch to Greg when
you have time.

No need to prepare a tree, though. I just need you to give your Acked-by
and I'll queue the patches myself.

cheers

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 19:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1890597.vDojv7So7R@vostro.rjw.lan>

On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> >   :
> > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > we *want* to do the removal in the first place.
> > > > 
> > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > down a device and then try to do it.  So walking the bus ahead of
> > > > time isn't a good idea.
> > > >
> > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > do we.  What do we do in that situation, restore the other devices we
> > > > disconnected successfully?  How do we remember/know what they were?
> > > > 
> > > > PCI hotplug almost had this same problem until the designers finally
> > > > realized that they just had to accept the fact that removing a PCI
> > > > device could either happen by:
> > > > 	- a user yanking out the device, at which time the OS better
> > > > 	  clean up properly no matter what happens
> > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > 	  as long as it wants to complete that action, including
> > > > 	  stalling for noticeable amounts of time before eventually,
> > > > 	  always letting the action succeed.
> > > > 
> > > > I think the second thing is what you have to do here.  If a user tells
> > > > the OS it wants to remove these devices, you better do it.  If you
> > > > can't, because memory is being used by someone else, either move them
> > > > off, or just hope that nothing bad happens, before the user gets
> > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > 
> > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > if it is safe to unplug something at this particular time (think about the
> > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > users' demand AFAIR).
> > > 
> > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > "forcible eject", whichever they prefer.
> > 
> > For system device hot-plug, it always needs to be "safe eject".  This
> > feature will be implemented on mission critical servers, which are
> > managed by professional IT folks.  Crashing a server costs the business
> > serious money.
> 
> Well, "always" is a bit too strong a word as far as human behavior is concerned
> in my opinion.
> 
> That said I would be perfectly fine with not supporting the "forcible eject" to
> start with and waiting for the first request to add support for it.  I also
> would be fine with taking bets on how much time it's going to take for such a
> request to appear. :-)

Sounds good.  In my experience, though, it actually takes a LONG time to
convince customers that "safe eject" is actually safe.  Enterprise
customers are so afraid of doing anything risky that might cause the
system to crash or hang due to some defect.  I would be very surprised
to see a customer asking for a force operation when we do not guarantee
its outcome.  I have not seen such enterprise customers yet.

Thanks,
-Toshi 

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 20:07 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130204143351.GA20119@kroah.com>

On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > we *want* to do the removal in the first place.
> > > > > 
> > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > may want to eject that package, but you don't want to kill the system this
> > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > doesn't work out.
> > > > 
> > > > It seems to me that we could handle that with the help of a new flag, say
> > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > 
> > > I think this will always be racy, or at worst, slow things down on
> > > normal device operations as you will always be having to grab this flag
> > > whenever you want to do something new.
> > 
> > I don't see why this particular scheme should be racy, at least I don't see any
> > > obvious races in it (although I'm not that good at race detection in general,
> > admittedly).
> > 
> > Also, I don't expect that flag to be used for everything, just for things known
> > to seriously break if forcible eject is done.  That may be not precise enough,
> > so that's a matter of defining its purpose more precisely.
> > 
> > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > in struct acpi_device and provide an interface for the layers above ACPI to
> > manipulate it) but then devices without ACPI namespace objects won't be
> > covered.  That may not be a big deal, though.
> > 
> > So say dev is about to be used for something incompatible with ejecting, so to
> > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > platform_lock_eject(dev) would need to be checked to see if the device is not
> > gone.  If it returns success (0), one would do something to the device and
> > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> 
> How does a device "know" it is doing something that is incompatible with
> ejecting?  That's a non-trivial task from what I can tell.

I agree that this is complicated in general.  But.

There are devices known to have software "offline" and "online" operations
such that after the "offline" the given device is guaranteed to be not used
until "online".  We have that for CPU cores, for example, and user space can
do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
"online" set the no_eject flag (under the lock as appropriate) and the
"offline" clear it?  And why don't we define such "online" and "offline" for
all of the other "system" stuff, like memory, PCI host bridges etc. and make it
behave analogously?

Then, it is quite simple to say which devices should use the no_eject flag:
devices that have "online" and "offline" exported to user space.  And guess
who's responsible for "offlining" all of those things before trying to eject
them: user space is.  From the kernel's point of view it is all clear.  Hands
clean. :-)
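
A minimal user-space sketch of the idea above, not kernel code: "online"
sets no_eject, "offline" clears it, and an eject request first walks the
subtree looking for any device that still has the flag set.  The names
model_dev, model_online(), model_offline() and model_subtree_can_eject()
are made up for illustration, and the lock that would have to protect the
flag is left out here to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CHILDREN 4

/* Hypothetical stand-in for a device in the hierarchy. */
struct model_dev {
	bool no_eject;                                  /* set while "online" */
	struct model_dev *children[MODEL_MAX_CHILDREN];
	size_t nr_children;
};

/* "online": the device is in use, so ejecting it must be refused. */
void model_online(struct model_dev *dev)
{
	dev->no_eject = true;
}

/* "offline": the device is guaranteed unused until the next "online". */
void model_offline(struct model_dev *dev)
{
	dev->no_eject = false;
}

/* Returns true only if no device in the subtree has no_eject set. */
bool model_subtree_can_eject(const struct model_dev *dev)
{
	size_t i;

	assert(dev->nr_children <= MODEL_MAX_CHILDREN);
	if (dev->no_eject)
		return false;
	for (i = 0; i < dev->nr_children; i++)
		if (!model_subtree_can_eject(dev->children[i]))
			return false;
	return true;
}
```

With a package that has two CPU children, the eject check keeps failing
until user space has offlined both of them, which matches the division of
responsibility described above.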

Now, there's a different problem how to expose all of the relevant information
to user space so that it knows what to "offline" for the specific eject
operation to succeed, but that's kind of separate and worth addressing
anyway.

> What happens if a device wants to set that flag, right after it was told
> to eject and the device was in the middle of being removed?  How can you
> "fail" the "I can't be removed now, so don't" requirement that it now
> has?

This one is easy. :-)

If platform_lock_eject() is called when an eject is under way, it will block
on acpi_eject_lock until the eject is complete and if the device is gone as
a result of the eject, it will return an error code.

In turn, if an eject happens after platform_lock_eject(), it will block until
platform_unlock_eject() and if platform_no_eject() is called in between the
lock and unlock, it will notice the device with no_eject set and bail out.

Quite obviously, it would be a bug to call platform_lock_eject() from within an
eject code path.
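
The ordering described above can be sketched in user space roughly as
follows.  This is only a model under stated assumptions: the names
platform_lock_eject(), platform_no_eject(), platform_unlock_eject() and
acpi_eject_lock are the ones proposed in this thread, not existing kernel
interfaces; struct model_dev and model_eject() are hypothetical, and a
pthread mutex stands in for whatever lock the kernel would use.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device that has an ACPI handle. */
struct model_dev {
	bool present;   /* false once an eject has removed the device */
	bool no_eject;  /* set while the device must not be ejected */
};

/* Models the global acpi_eject_lock from the discussion. */
static pthread_mutex_t acpi_eject_lock = PTHREAD_MUTEX_INITIALIZER;

/* Blocks while an eject is under way; fails if the device is gone. */
int platform_lock_eject(struct model_dev *dev)
{
	pthread_mutex_lock(&acpi_eject_lock);
	if (!dev->present) {
		pthread_mutex_unlock(&acpi_eject_lock);
		return -ENODEV;
	}
	return 0;
}

/* Only valid between platform_lock_eject() and platform_unlock_eject(). */
void platform_no_eject(struct model_dev *dev)
{
	assert(dev->present);
	dev->no_eject = true;
}

void platform_unlock_eject(struct model_dev *dev)
{
	(void)dev;
	pthread_mutex_unlock(&acpi_eject_lock);
}

/* Eject path: waits for the lock, then bails out if no_eject is set. */
int model_eject(struct model_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&acpi_eject_lock);
	if (dev->no_eject)
		ret = -EBUSY;
	else
		dev->present = false;
	pthread_mutex_unlock(&acpi_eject_lock);
	return ret;
}
```

Because both paths take the same lock, a caller either sees the device
already gone (-ENODEV) or gets to set no_eject before any later eject
looks at it.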

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: Why is the e500v2 core not using cpuidle?
From: Scott Wood @ 2013-02-04 20:02 UTC (permalink / raw)
  To: Thomas Waldecker
  Cc: Linux PPC dev mailing list (linuxppc-dev@lists.ozlabs.org)
In-Reply-To: <B88C075EE1324644BA0452D5EFDD5828139523EA@TQ-MAIL.tq-net.de>

On 02/02/2013 03:41:27 AM, Thomas Waldecker wrote:
> Hi Scott,
>
> >> Why is there no support for the cpuidle framework?
> > Because nobody implemented it. :-)
> That's the reason I thought before :-)
>
> > The only reason I can think of to implement it on this chip would be to
> > dynamically choose when to enter nap versus doze, rather than always
> > just using doze.  It's not clear whether the difference in power
> > savings is worth it -- do you have any way of measuring?
>
> Is the e500 only using doze? There are comments in the file
> arch/powerpc/kernel/idle_e500.S
> which are stating:
> /*  Go to NAP or DOZE now */
> or
> /* Return from NAP/DOZE ...*/
>
> and because of these comments I thought that both modes are in use.

e500 can use nap instead, but it's statically chosen via sysctl.  The
default is doze.  Entering nap requires flushing the cache, so you'd
only use nap if you care more about lowering idle power consumption
than performance, and you wake infrequently enough that you're not
burning more power on the cache flushes than you save with the deeper
idle state.
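
As a rough illustration of that trade-off, the break-even check could be
written as below.  This is a sketch only: the function name and all four
inputs are hypothetical, the numbers would have to come from measurement,
and the cost of refilling the cache after wakeup is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if nap is expected to use less energy than doze over one
 * idle period.  All inputs are hypothetical measurements:
 *   p_doze_mw, p_nap_mw: idle power in each state, in milliwatts
 *   flush_uj:            energy spent flushing the cache on nap entry,
 *                        in microjoules
 *   idle_ms:             expected length of the idle period, in
 *                        milliseconds
 */
bool nap_saves_energy(double p_doze_mw, double p_nap_mw,
		      double flush_uj, double idle_ms)
{
	assert(p_doze_mw >= p_nap_mw && idle_ms >= 0.0);

	/* mW * ms = uJ, so both sides of the comparison are in microjoules. */
	double saved_uj = (p_doze_mw - p_nap_mw) * idle_ms;

	return saved_uj > flush_uj;
}
```

With made-up figures of 500 mW in doze, 300 mW in nap and 1000 uJ per
flush, nap only pays off once the idle period is longer than 5 ms.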

> I have a way of measuring the power and it is also a small part of my master's thesis,
> but it is not very meaningful because at the measuring point there are other peripheral
> components too.
>
> According to the comments can I activate the nap mode somehow?

echo 1 > /proc/sys/kernel/powersave-nap

If you're able to measure a meaningful difference between the two, I'd
be interested in hearing your results.

> >> How can I debug the e500 idle modes?
> >> Are there any statistics?
> > Top reports idle percentage...
> If the e500 and e500v2 are indeed using only the doze mode it
> would be enough statistics.

Whichever mode you have selected, that will be used for all idling.
Statistics would only be useful if the idle mode were dynamically
chosen.

> Such statistics would be great for the doze, nap (and sleep for the whole package).

The only way you'll get into sleep mode is through /sys/power/state.

-Scott

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 20:12 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1360007184.23410.139.camel@misato.fc.hp.com>

On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > >   :
> > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > we *want* to do the removal in the first place.
> > > > > 
> > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > down a device and then try to do it.  So walking the bus ahead of
> > > > > time isn't a good idea.
> > > > >
> > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > 
> > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > realized that they just had to accept the fact that removing a PCI
> > > > > device could either happen by:
> > > > > 	- a user yanking out the device, at which time the OS better
> > > > > 	  clean up properly no matter what happens
> > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > 	  as long as it wants to complete that action, including
> > > > > 	  stalling for noticeable amounts of time before eventually,
> > > > > 	  always letting the action succeed.
> > > > > 
> > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > can't, because memory is being used by someone else, either move them
> > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > 
> > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > if it is safe to unplug something at this particular time (think about the
> > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > users' demand AFAIR).
> > > > 
> > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > "forcible eject", whichever they prefer.
> > > 
> > > For system device hot-plug, it always needs to be "safe eject".  This
> > > feature will be implemented on mission critical servers, which are
> > > managed by professional IT folks.  Crashing a server costs the business
> > > serious money.
> > 
> > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > in my opinion.
> > 
> > That said I would be perfectly fine with not supporting the "forcible eject" to
> > start with and waiting for the first request to add support for it.  I also
> > would be fine with taking bets on how much time it's going to take for such a
> > request to appear. :-)
> 
> Sounds good.  In my experience, though, it actually takes a LONG time to
> convince customers that "safe eject" is actually safe.  Enterprise
> customers are so afraid of doing anything risky that might cause the
> system to crash or hang due to some defect.  I would be very surprised
> to see a customer asking for a force operation when we do not guarantee
> its outcome.  I have not seen such enterprise customers yet.

But we're talking about a kernel that is supposed to run on mobile phones too,
among other things.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 20:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <23473445.t1DSaBm58X@vostro.rjw.lan>

On Mon, 2013-02-04 at 21:12 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > >   :
> > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > we *want* to do the removal in the first place.
> > > > > > 
> > > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > > > time isn't a good idea.
> > > > > >
> > > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > > 
> > > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > > realized that they just had to accept the fact that removing a PCI
> > > > > > device could either happen by:
> > > > > > 	- a user yanking out the device, at which time the OS better
> > > > > > 	  clean up properly no matter what happens
> > > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > > 	  as long as it wants to complete that action, including
> > > > > > 	  stalling for noticeable amounts of time before eventually,
> > > > > > 	  always letting the action succeed.
> > > > > > 
> > > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > > can't, because memory is being used by someone else, either move them
> > > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > > 
> > > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > > > if it is safe to unplug something at this particular time (think about the
> > > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > > users' demand AFAIR).
> > > > > 
> > > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > > "forcible eject", whichever they prefer.
> > > > 
> > > > For system device hot-plug, it always needs to be "safe eject".  This
> > > > feature will be implemented on mission critical servers, which are
> > > > managed by professional IT folks.  Crashing a server costs the
> > > > business serious money.
> > > 
> > > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > > in my opinion.
> > > 
> > > That said I would be perfectly fine with not supporting the "forcible eject" to
> > > start with and waiting for the first request to add support for it.  I also
> > > would be fine with taking bets on how much time it's going to take for such a
> > > request to appear. :-)
> > 
> > Sounds good.  In my experience, though, it actually takes a LONG time to
> > convince customers that "safe eject" is actually safe.  Enterprise
> > customers are so afraid of doing anything risky that might cause the
> > system to crash or hang due to some defect.  I would be very surprised
> > to see a customer asking for a force operation when we do not guarantee
> > its outcome.  I have not seen such enterprise customers yet.
> 
> But we're talking about a kernel that is supposed to run on mobile phones too,
> among other things.

I think using this feature for RAS, i.e. replacing a faulty device
on-line, will continue to be limited to high-end systems.  For low-end
systems, it does not make sense for customers to pay much $$ for this
feature.  They can just shut the system down for replacement, or they
can simply buy a new system instead of repairing.

That said, using this feature on VM for workload balancing does not
require any special hardware.  So, I can see someone willing to try out
to see how it goes with a force option on VM for personal use.   

Thanks,
-Toshi


 


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 20:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-acpi@vger.kernel.org, Greg KH,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <3007489.fG0fDZGHrB@vostro.rjw.lan>

On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > 
> > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > help from the driver core here.
> > > > > > > > 
> > > > > > > > There are three different approaches suggested for system device
> > > > > > > > hot-plug:
> > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > 
> > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > 
> > > > > > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > > > > > clarifies why I am suggesting option C.
> > > > > > > > 
> > > > > > > > 1. What are the system devices?
> > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > connected to any particular standard buses.
> > > > > > > 
> > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > to the "system" bus, nothing more.
> > > > > > 
> > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > see how they are managed to support hotplug.
> > > > > 
> > > > > Any device that is attached to any bus in the driver model can be
> > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > see how it happens.
> > > > 
> > > > The unbind operation, as I understand from you, is to detach a driver
> > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > however different from hot-plug operation, which unplugs a device.
> > > 
> > > Physically, yes, but to the driver involved, and the driver core, there
> > > is no difference.  That was one of the primary goals of the driver core
> > > creation so many years ago.
> > > 
> > > > Today, the unbind operation on ACPI cpu/memory devices causes a
> > > > hot-unplug (offline) operation on them, which is one of the major issues
> > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > making the unbind operation of ACPI cpu/memory devices to do the
> > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > 
> > > I think that's the problem right there, solve that, please.
> > 
> > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > can limit the ACPI drivers to do the scanning stuff only.   This is
> > precisely the intent of this patchset.  The real stuff, removing actual
> > devices, is done by the system device drivers/modules.
> 
> In case you haven't realized that yet, the $subject patchset has no future.

That's really disappointing, esp. the fact that this basic approach has
been proven to work on other OS for years...


> Let's just talk about how we can get what we need in more general terms.

So, are we heading to an approach of doing everything in ACPI?  I am not
clear about which direction we have agreed with or disagreed with.

As for the eject flag approach, I agree with Greg.


Thanks,
-Toshi


* Re: [PATCH v2] powerpc/512x: add function for chip select parameter configuration
From: Timur Tabi @ 2013-02-04 21:22 UTC (permalink / raw)
  To: Anatolij Gustschin; +Cc: linuxppc-dev
In-Reply-To: <1359972962-9379-1-git-send-email-agust@denx.de>

On Mon, Feb 4, 2013 at 4:16 AM, Anatolij Gustschin <agust@denx.de> wrote:
> Add ability to configure chip select (CS) parameters for devices
> that need different CS parameters setup after their configuration.
> I.e. an FPGA device on LP bus can require different CS parameters
> for its bus interface after loading firmware into it. A driver
> can easily reconfigure the LPC CS parameters using this function.
>
> Signed-off-by: Anatolij Gustschin <agust@denx.de>
> ---

Acked-by: Timur Tabi <timur@tabi.org>

-- 
Timur Tabi
Linux kernel developer at Freescale


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 22:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <5598823.8hjkkMP1h9@vostro.rjw.lan>

On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > we *want* to do the removal in the first place.
> > > > > > 
> > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > doesn't work out.
> > > > > 
> > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > 
> > > > I think this will always be racy, or at worst, slow things down on
> > > > normal device operations as you will always be having to grab this flag
> > > > whenever you want to do something new.
> > > 
> > > I don't see why this particular scheme should be racy, at least I don't see any
> > > obvious races in it (although I'm not that good at race detection in general,
> > > admittedly).
> > > 
> > > Also, I don't expect that flag to be used for everything, just for things known
> > > to seriously break if forcible eject is done.  That may be not precise enough,
> > > so that's a matter of defining its purpose more precisely.
> > > 
> > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > in struct acpi_device and provide an interface for the layers above ACPI to
> > > manipulate it) but then devices without ACPI namespace objects won't be
> > > covered.  That may not be a big deal, though.
> > > 
> > > So say dev is about to be used for something incompatible with ejecting, so to
> > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > gone.  If it returns success (0), one would do something to the device and
> > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> > 
> > How does a device "know" it is doing something that is incompatible with
> > ejecting?  That's a non-trivial task from what I can tell.
> 
> I agree that this is complicated in general.  But.
> 
> There are devices known to have software "offline" and "online" operations
> such that after the "offline" the given device is guaranteed to be not used
> until "online".  We have that for CPU cores, for example, and user space can
> do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> "online" set the no_eject flag (under the lock as appropriate) and the
> "offline" clear it?  And why don't we define such "online" and "offline" for
> all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> behave analogously?
> 
> Then, it is quite simple to say which devices should use the no_eject flag:
> devices that have "online" and "offline" exported to user space.  And guess
> who's responsible for "offlining" all of those things before trying to eject
> them: user space is.  From the kernel's point of view it is all clear.  Hands
> clean. :-)
> 
> Now, there's a different problem how to expose all of the relevant information
> to user space so that it knows what to "offline" for the specific eject
> operation to succeed, but that's kind of separate and worth addressing
> anyway.

So, the idea is to run a user space program that off-lines all relevant
devices before trimming ACPI devices.  Is that right?  That sounds like
an idea worth considering.  This basically moves the "sequencer"
part into user space instead of the kernel space in my proposal.  I
agree that how to expose all of the relevant info to user space is an
issue.  Also, we will need to make sure that the user program always
runs in response to a kernel request and then reports the result back to
the kernel, so that the kernel can do the rest of the operation and
report the result to FW with _OST or _EJ0.  This loop has to close.  I
think it is going to be more complicated than the kernel-only approach.

In addition, I am not sure if the "no_eject" flag in acpi_device is
really necessary here since the user program will inform the kernel if
all devices are off-line.  Also, the kernel will likely need to expose
the device info to the user program to tell which devices need to be
off-lined.  At that time, the kernel already knows if there is any
on-line device in the scope.
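
For reference, the user-space half of "off-line everything first" could
be quite small.  The sketch below is only an illustration: set_online()
is a made-up helper name, and the only thing taken from the thread is
the /sys/devices/system/cpu/cpuX/online attribute mentioned above.  It
assumes other system devices would grow a similar "online" file, which
is a proposal here, not an existing interface.

```c
#include <stdio.h>

/*
 * Write "1" or "0" to a sysfs-style "online" attribute.
 * Returns 0 on success, -1 on failure.  set_online() is a hypothetical
 * helper name, not an existing API.
 */
static int set_online(const char *attr_path, int online)
{
	FILE *f = fopen(attr_path, "w");

	if (!f)
		return -1;
	if (fputs(online ? "1\n" : "0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the data, so a refused write shows up at fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would do something like set_online("/sys/devices/system/cpu/cpu3/online", 0)
for each device in the scope and report back only if every call
returned 0.  How the result gets back to the kernel is the open part.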


> > What happens if a device wants to set that flag, right after it was told
> > to eject and the device was in the middle of being removed?  How can you
> > "fail" the "I can't be removed now, so don't" requirement that it now
> > has?
> 
> This one is easy. :-)
> 
> If platform_lock_eject() is called when an eject is under way, it will block
> on acpi_eject_lock until the eject is complete and if the device is gone as
> a result of the eject, it will return an error code.

In this case, we do really need to make sure that the user program does
not get killed in the middle of its operation since the kernel is
holding a lock while it is under way.


Thanks,
-Toshi


> In turn, if an eject happens after platform_lock_eject(), it will block until
> platform_unlock_eject() and if platform_no_eject() is called in between the
> lock and unlock, it will notice the device with no_eject set and bail out.
> 
> Quite obviously, it would be a bug to call platform_lock_eject() from within an
> eject code path.
> 
> Thanks,
> Rafael
> 
> 
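
To make the locking rules quoted above easier to follow, here is a
user-space model of them.  It is a sketch under assumptions, not kernel
code: the model_* names, struct model_dev, the pthread mutex standing in
for acpi_eject_lock, and the -ENODEV/-EBUSY return values are all
hypothetical.  platform_lock_eject(), platform_no_eject() and
platform_unlock_eject() exist only as a proposal in this thread.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device that may be ejected. */
struct model_dev {
	bool gone;      /* set once an eject has removed the device */
	bool no_eject;  /* set while the device must not be ejected */
};

/* Models the single global eject lock described in the thread. */
static pthread_mutex_t eject_lock = PTHREAD_MUTEX_INITIALIZER;

/* Blocks while an eject is under way; fails if the device is gone. */
static int model_lock_eject(struct model_dev *dev)
{
	pthread_mutex_lock(&eject_lock);
	if (dev->gone) {
		pthread_mutex_unlock(&eject_lock);
		return -ENODEV;
	}
	return 0;
}

/* Caller must hold eject_lock (taken by model_lock_eject()). */
static void model_no_eject(struct model_dev *dev)
{
	dev->no_eject = true;
}

static void model_unlock_eject(void)
{
	pthread_mutex_unlock(&eject_lock);
}

/* Eject path: bail out if any device in the subtree has no_eject set. */
static int model_eject(struct model_dev **subtree, int n)
{
	int i, ret = 0;

	pthread_mutex_lock(&eject_lock);
	for (i = 0; i < n; i++) {
		if (subtree[i]->no_eject) {
			ret = -EBUSY;
			goto out;
		}
	}
	for (i = 0; i < n; i++)
		subtree[i]->gone = true;
out:
	pthread_mutex_unlock(&eject_lock);
	return ret;
}
```

Because the flag is set and checked under the same lock, an eject
either sees no_eject and bails out, or completes first and makes the
later lock attempt fail.  Calling model_lock_eject() from inside
model_eject() would deadlock, which matches the "it would be a bug"
remark above.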


* Re: [PATCH v6 08/15] memory-hotplug: Common APIs to support page tables hot-remove
From: Andrew Morton @ 2013-02-04 23:04 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim,
	linuxppc-dev
In-Reply-To: <1357723959-5416-9-git-send-email-tangchen@cn.fujitsu.com>

On Wed, 9 Jan 2013 17:32:32 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> +static void __meminit
> +remove_pagetable(unsigned long start, unsigned long end, bool direct)
> +{
> +	unsigned long next;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	bool pgd_changed = false;
> +
> +	for (; start < end; start = next) {
> +		pgd = pgd_offset_k(start);
> +		if (!pgd_present(*pgd))
> +			continue;
> +
> +		next = pgd_addr_end(start, end);
> +
> +		pud = (pud_t *)map_low_page((pud_t *)pgd_page_vaddr(*pgd));
> +		remove_pud_table(pud, start, next, direct);
> +		if (free_pud_table(pud, pgd))
> +			pgd_changed = true;
> +		unmap_low_page(pud);
> +	}
> +
> +	if (pgd_changed)
> +		sync_global_pgds(start, end - 1);
> +
> +	flush_tlb_all();
> +}

This generates a compiler warning saying that `next' may be used
uninitialised.

The warning is correct.  If we take that `continue' on the first pass
through the loop, the "start = next" will copy uninitialised data into
`start'.

Is this the correct fix?

--- a/arch/x86/mm/init_64.c~memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix-fix-fix-fix-fix
+++ a/arch/x86/mm/init_64.c
@@ -993,12 +993,12 @@ remove_pagetable(unsigned long start, un
 	bool pgd_changed = false;
 
 	for (; start < end; start = next) {
+		next = pgd_addr_end(start, end);
+
 		pgd = pgd_offset_k(start);
 		if (!pgd_present(*pgd))
 			continue;
 
-		next = pgd_addr_end(start, end);
-
 		pud = (pud_t *)pgd_page_vaddr(*pgd);
 		remove_pud_table(pud, start, next, direct);
 		if (free_pud_table(pud, pgd))
_
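
The pattern behind the warning can be reproduced outside the kernel.
The following is a stand-alone sketch, not the init_64.c code:
REGION_SIZE, region_addr_end() and region_present() are hypothetical
stand-ins for PGDIR_SIZE, pgd_addr_end() and pgd_present(), and address
wrap-around is ignored.  It shows the ordering used in the fix above,
with `next' computed before any `continue' can be taken.

```c
#include <stdbool.h>

#define REGION_SIZE 0x1000UL	/* hypothetical stand-in for PGDIR_SIZE */

/* Stand-in for pgd_addr_end(): next region boundary, clamped to end. */
static unsigned long region_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + REGION_SIZE) & ~(REGION_SIZE - 1);

	return boundary < end ? boundary : end;
}

/* Stand-in for pgd_present(): pretend only odd-numbered regions exist. */
static bool region_present(unsigned long addr)
{
	return (addr / REGION_SIZE) & 1;
}

/* Walk [start, end) and count the regions that are present. */
static unsigned int count_present(unsigned long start, unsigned long end)
{
	unsigned long next;
	unsigned int present = 0;

	for (; start < end; start = next) {
		/* Computed first, so the `continue' below cannot skip it. */
		next = region_addr_end(start, end);

		if (!region_present(start))
			continue;

		present++;
	}
	return present;
}
```

With the assignment placed after the `continue', the very first
iteration here (region 0 is absent) would run "start = next" on an
uninitialised value, which is the case the compiler is warning about.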


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 23:19 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1360010058.23410.169.camel@misato.fc.hp.com>

On Monday, February 04, 2013 01:34:18 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 21:12 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > > > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > > >   :
> > > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > > we *want* to do the removal in the first place.
> > > > > > > 
> > > > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > > > > time isn't a good idea.
> > > > > > >
> > > > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > > > 
> > > > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > > > realized that they just had to accept the fact that removing a PCI
> > > > > > > device could either happen by:
> > > > > > > 	- a user yanking out the device, at which time the OS better
> > > > > > > 	  clean up properly no matter what happens
> > > > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > > > 	  as long as it wants to complete that action, including
> > > > > > > 	  stalling for noticable amounts of time before eventually,
> > > > > > > 	  always letting the action succeed.
> > > > > > > 
> > > > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > > > can't, because memory is being used by someone else, either move them
> > > > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > > > 
> > > > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > > > > if it is safe to unplug something at this particular time (think about the
> > > > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > > > users' demand AFAIR).
> > > > > > 
> > > > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > > > "forcible eject", whichever they prefer.
> > > > > 
> > > > > For system device hot-plug, it always needs to be "safe eject".  This
> > > > > feature will be implemented on mission critical servers, which are
> > > > > > managed by professional IT folks.  Crashing a server costs the
> > > > > > business serious money.
> > > > 
> > > > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > > > in my opinion.
> > > > 
> > > > That said I would be perfectly fine with not supporting the "forcible eject" to
> > > > start with and waiting for the first request to add support for it.  I also
> > > > would be fine with taking bets on how much time it's going to take for such a
> > > > request to appear. :-)
> > > 
> > > Sounds good.  In my experience, though, it actually takes a LONG time to
> > > convince customers that "safe eject" is actually safe.  Enterprise
> > > customers are so afraid of doing anything risky that might cause the
> > > system to crash or hang due to some defect.  I would be very surprised
> > > to see a customer asking for a force operation when we do not guarantee
> > > its outcome.  I have not seen such enterprise customers yet.
> > 
> > But we're talking about a kernel that is supposed to run on mobile phones too,
> > among other things.
> 
> I think using this feature for RAS, i.e. replacing a faulty device
> on-line, will continue to be limited to high-end systems.  For low-end
> systems, it does not make sense for customers to pay much $$ for this
> feature.  They can just shut the system down for replacement, or they
> can simply buy a new system instead of repairing.
> 
> That said, using this feature on VM for workload balancing does not
> require any special hardware.  So, I can see someone willing to try out
> to see how it goes with a force option on VM for personal use.   

Besides, SMP was a $$ "enterprise" feature not so long ago, so things tend to
change. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 23:23 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-acpi@vger.kernel.org, Greg KH,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <1360011567.23410.179.camel@misato.fc.hp.com>

On Monday, February 04, 2013 01:59:27 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > > 
> > > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > > help from the driver core here.
> > > > > > > > > 
> > > > > > > > > There are three different approaches suggested for system device
> > > > > > > > > hot-plug:
> > > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > > 
> > > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > > 
> > > > > > > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > > > > > > clarifies why I am suggesting option C.
> > > > > > > > > 
> > > > > > > > > 1. What are the system devices?
> > > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > > connected to any particular standard buses.
> > > > > > > > 
> > > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > > to the "system" bus, nothing more.
> > > > > > > 
> > > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > > see how they are managed to support hotplug.
> > > > > > 
> > > > > > Any device that is attached to any bus in the driver model can be
> > > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > > see how it happens.
> > > > > 
> > > > > The unbind operation, as I understand from you, is to detach a driver
> > > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > > however different from hot-plug operation, which unplugs a device.
> > > > 
> > > > Physically, yes, but to the driver involved, and the driver core, there
> > > > is no difference.  That was one of the primary goals of the driver core
> > > > creation so many years ago.
> > > > 
> > > > > Today, the unbind operation on ACPI cpu/memory devices causes a
> > > > > hot-unplug (offline) operation on them, which is one of the major issues
> > > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > > making the unbind operation of ACPI cpu/memory devices to do the
> > > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > > 
> > > > I think that's the problem right there, solve that, please.
> > > 
> > > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > > can limit the ACPI drivers to do the scanning stuff only.   This is
> > > precisely the intent of this patchset.  The real stuff, removing actual
> > > devices, is done by the system device drivers/modules.
> > 
> > In case you haven't realized that yet, the $subject patchset has no future.
> 
> That's really disappointing, esp. the fact that this basic approach has
> been proven to work on other OS for years...
> 
> 
> > Let's just talk about how we can get what we need in more general terms.
> 
> So, are we heading to an approach of doing everything in ACPI?  I am not
> clear about which direction we have agreed with or disagreed with.
> 
> As for the eject flag approach, I agree with Greg.

Well, I'm not sure which of Greg's thoughts you agree with. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


