LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 23:23 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-acpi@vger.kernel.org, Greg KH,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <1360011567.23410.179.camel@misato.fc.hp.com>

On Monday, February 04, 2013 01:59:27 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > > 
> > > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > > help from the driver core here.
> > > > > > > > > 
> > > > > > > > > There are three different approaches suggested for system device
> > > > > > > > > hot-plug:
> > > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > > 
> > > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > > 
> > > > > > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > > > > > clarifies why I am suggesting option 3.
> > > > > > > > > 
> > > > > > > > > 1. What are the system devices?
> > > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > > connected to any particular standard buses.
> > > > > > > > 
> > > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > > to the "system" bus, nothing more.
> > > > > > > 
> > > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > > see how they are managed to support hotplug.
> > > > > > 
> > > > > > Any device that is attached to any bus in the driver model can be
> > > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > > see how it happens.
> > > > > 
> > > > > The unbind operation, as I understand from you, is to detach a driver
> > > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > > however different from hot-plug operation, which unplugs a device.
> > > > 
> > > > Physically, yes, but to the driver involved, and the driver core, there
> > > > is no difference.  That was one of the primary goals of the driver core
> > > > creation so many years ago.
> > > > 
> > > > > Today, the unbind operation to an ACPI cpu/memory devices causes
> > > > > hot-unplug (offline) operation to them, which is one of the major issues
> > > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > > making the unbind operation of ACPI cpu/memory devices to do the
> > > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > > 
> > > > I think that's the problem right there, solve that, please.
> > > 
> > > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > > can limit the ACPI drivers to do the scanning stuff only.   This is
> > > precisely the intend of this patchset.  The real stuff, removing actual
> > > devices, is done by the system device drivers/modules.
> > 
> > In case you haven't realized that yet, the $subject patchset has no future.
> 
> That's really disappointing, esp. the fact that this basic approach has
> been proven to work on other OS for years...
> 
> 
> > Let's just talk about how we can get what we need in more general terms.
> 
> So, are we heading to an approach of doing everything in ACPI?  I am not
> clear about which direction we have agreed with or disagreed with.
> 
> As for the eject flag approach, I agree with Greg.

Well, I'm not sure which of the Greg's thoughts you agree with. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-04 23:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-acpi@vger.kernel.org, Greg KH,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <1910026.S9WaQTy2uW@vostro.rjw.lan>

On Tue, 2013-02-05 at 00:23 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 01:59:27 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > > > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > > > 
> > > > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > > when we try do eject a device at the top of a subtree and need to tear down
> > > > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > > > help from the driver core here.
> > > > > > > > > > 
> > > > > > > > > > There are three different approaches suggested for system device
> > > > > > > > > > hot-plug:
> > > > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > > > 
> > > > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > > > 
> > > > > > > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > > > > > > clarifies why I am suggesting option 3.
> > > > > > > > > > 
> > > > > > > > > > 1. What are the system devices?
> > > > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > > > connected to any particular standard buses.
> > > > > > > > > 
> > > > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > > > to the "system" bus, nothing more.
> > > > > > > > 
> > > > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > > > see how they are managed to support hotplug.
> > > > > > > 
> > > > > > > Any device that is attached to any bus in the driver model can be
> > > > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > > > see how it happens.
> > > > > > 
> > > > > > The unbind operation, as I understand from you, is to detach a driver
> > > > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > > > however different from hot-plug operation, which unplugs a device.
> > > > > 
> > > > > Physically, yes, but to the driver involved, and the driver core, there
> > > > > is no difference.  That was one of the primary goals of the driver core
> > > > > creation so many years ago.
> > > > > 
> > > > > > Today, the unbind operation to an ACPI cpu/memory devices causes
> > > > > > hot-unplug (offline) operation to them, which is one of the major issues
> > > > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > > > making the unbind operation of ACPI cpu/memory devices to do the
> > > > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > > > 
> > > > > I think that's the problem right there, solve that, please.
> > > > 
> > > > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > > > can limit the ACPI drivers to do the scanning stuff only.   This is
> > > > precisely the intend of this patchset.  The real stuff, removing actual
> > > > devices, is done by the system device drivers/modules.
> > > 
> > > In case you haven't realized that yet, the $subject patchset has no future.
> > 
> > That's really disappointing, esp. the fact that this basic approach has
> > been proven to work on other OS for years...
> > 
> > 
> > > Let's just talk about how we can get what we need in more general terms.
> > 
> > So, are we heading to an approach of doing everything in ACPI?  I am not
> > clear about which direction we have agreed with or disagreed with.
> > 
> > As for the eject flag approach, I agree with Greg.
> 
> Well, I'm not sure which of the Greg's thoughts you agree with. :-)

Sorry, that was the Greg's comment below.  But then, I saw your other
email clarifying that the no_eject flag only reflects online/offline
status, not how the device is being used.  So, I replied with my
thoughts in a separate email. :)

===
How does a device "know" it is doing something that is incompatible with
ejecting?  That's a non-trivial task from what I can tell.
===

Thanks,
-Toshi

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-04 23:52 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1360016009.23410.213.camel@misato.fc.hp.com>

On Monday, February 04, 2013 03:13:29 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > we *want* to do the removal in the first place.
> > > > > > > 
> > > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > > doesn't work out.
> > > > > > 
> > > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > > 
> > > > > I think this will always be racy, or at worst, slow things down on
> > > > > normal device operations as you will always be having to grab this flag
> > > > > whenever you want to do something new.
> > > > 
> > > > I don't see why this particular scheme should be racy, at least I don't see any
> > > > obvious races in it (although I'm not that good at races detection in general,
> > > > admittedly).
> > > > 
> > > > Also, I don't expect that flag to be used for everything, just for things known
> > > > to seriously break if forcible eject is done.  That may be not precise enough,
> > > > so that's a matter of defining its purpose more precisely.
> > > > 
> > > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > > in struct acpi_device and provide an iterface for the layers above ACPI to
> > > > manipulate it) but then devices without ACPI namespace objects won't be
> > > > covered.  That may not be a big deal, though.
> > > > 
> > > > So say dev is about to be used for something incompatible with ejecting, so to
> > > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > > gone.  If it returns success (0), one would do something to the device and
> > > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> > > 
> > > How does a device "know" it is doing something that is incompatible with
> > > ejecting?  That's a non-trivial task from what I can tell.
> > 
> > I agree that this is complicated in general.  But.
> > 
> > There are devices known to have software "offline" and "online" operations
> > such that after the "offline" the given device is guaranteed to be not used
> > until "online".  We have that for CPU cores, for example, and user space can
> > do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> > "online" set the no_eject flag (under the lock as appropriate) and the
> > "offline" clear it?  And why don't we define such "online" and "offline" for
> > all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> > behave analogously?
> > 
> > Then, it is quite simple to say which devices should use the no_eject flag:
> > devices that have "online" and "offline" exported to user space.  And guess
> > who's responsible for "offlining" all of those things before trying to eject
> > them: user space is.  From the kernel's point of view it is all clear.  Hands
> > clean. :-)
> > 
> > Now, there's a different problem how to expose all of the relevant information
> > to user space so that it knows what to "offline" for the specific eject
> > operation to succeed, but that's kind of separate and worth addressing
> > anyway.
> 
> So, the idea is to run a user space program that off-lines all relevant
> devices before trimming ACPI devices.  Is that right?  That sounds like
> a worth idea to consider with.  This basically moves the "sequencer"
> part into user space instead of the kernel space in my proposal.  I
> agree that how to expose all of the relevant info to user space is an
> issue.  Also, we will need to make sure that the user program always
> runs per a kernel request and then informs a result back to the kernel,
> so that the kernel can do the rest of an operation and inform a result
> to FW with _OST or _EJ0.  This loop has to close.  I think it is going
> to be more complicated than the kernel-only approach.

I actually didn't think about that.  The point is that trying to offline
everything *synchronously* may just be pointless, because it may be
offlined upfront, before the eject is even requested.  So the sequence
would be to first offline things that we'll want to eject from user space
and then to send the eject request (e.g. via sysfs too).

Eject requests from eject buttons and things like that may just fail if
some components involved that should be offline are online.  The fact that
we might be able to offline them synchronously if we tried doesn't matter,
pretty much as it doesn't matter for hot-swappable disks.

You'd probably never try to hot-remove a disk before unmounting filesystems
mounted from it or failing it as a RAID component and nobody sane wants the
kernel to do things like that automatically when the user presses the eject
button.  In my opinion we should treat memory eject, or CPU package eject, or
PCI host bridge eject in exactly the same way: Don't eject if it is not
prepared for ejecting in the first place.

And if you think about it, that makes things *massively* simpler, because now
the kernel doesn't heed to worry about all of those "synchronous removal"
scenarions that very well may involve every single device in the system and
the whole problem is nicely split into several separate "implement
offline/online" problems that are subsystem-specific and a single
"eject if everything relevant is offline" problem which is kind of trivial.
Plus the one of exposing information to user space, which is separate too.

Now, each of them can be worked on separately, *tested* separately and
debugged separately if need be and it is much easier to isolate failures
and so on.

> In addition, I am not sure if the "no_eject" flag in acpi_device is
> really necessary here since the user program will inform the kernel if
> all devices are off-line.  Also, the kernel will likely need to expose
> the device info to the user program to tell which devices need to be
> off-lined.  At that time, the kernel already knows if there is any
> on-line device in the scope.

Well, that depends no what "the kernel" means and how it knows that.  Surely
the "online" components have to be marked somehow so that it is easy to check
if they are in the scope in the subsystem-independent way, so why don't we use
something like the no_eject flag for that?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-05  0:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <7003418.onqVlaaHJS@vostro.rjw.lan>

On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> You'd probably never try to hot-remove a disk before unmounting filesystems
> mounted from it or failing it as a RAID component and nobody sane wants the
> kernel to do things like that automatically when the user presses the eject
> button.  In my opinion we should treat memory eject, or CPU package eject, or
> PCI host bridge eject in exactly the same way: Don't eject if it is not
> prepared for ejecting in the first place.

Bad example, we have disks hot-removed all the time without any
filesystems being unmounted, and have supported this since the 2.2 days
(although we didn't get it "right" until 2.6.)

PCI Host bridge eject is the same as PCI eject today, the user asks us
to do it, and we can not fail it from happening.  We also can have them
removed without us being told about it in the first place, and can
properly clean up from it all.

> And if you think about it, that makes things *massively* simpler, because now
> the kernel doesn't heed to worry about all of those "synchronous removal"
> scenarions that very well may involve every single device in the system and
> the whole problem is nicely split into several separate "implement
> offline/online" problems that are subsystem-specific and a single
> "eject if everything relevant is offline" problem which is kind of trivial.
> Plus the one of exposing information to user space, which is separate too.
> 
> Now, each of them can be worked on separately, *tested* separately and
> debugged separately if need be and it is much easier to isolate failures
> and so on.

So you are agreeing with me in that we can not fail hot removing any
device, nice :)

greg k-h

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-05  1:02 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130205000447.GA21782@kroah.com>

On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > You'd probably never try to hot-remove a disk before unmounting filesystems
> > mounted from it or failing it as a RAID component and nobody sane wants the
> > kernel to do things like that automatically when the user presses the eject
> > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > prepared for ejecting in the first place.
> 
> Bad example, we have disks hot-removed all the time without any
> filesystems being unmounted, and have supported this since the 2.2 days
> (although we didn't get it "right" until 2.6.)

Well, that wasn't my point.

My point was that we have tools for unmounting filesystems from disks that
the user wants to hot-remove and the user is supposed to use those tools
before hot-removing the disks.  At least I wouldn't recommend anyone to
do otherwise. :-)

Now, for memory hot-removal we don't have anything like that, as far as I
can say, so my point was why don't we add memory "offline" that can be
done and tested separately from hot-removal and use that before we go and
hot-remove stuff?  And analogously for PCI host bridges etc.?

[Now, there's a question if an "eject" button on the system case, if there is
one, should *always* cause the eject to happen even though things are not
"offline".  My opinion is that not necessarily, because users may not be aware
that they are doing something wrong.

Quite analogously, does the power button always cause the system to shut down?
No.  So why the heck should an eject button always cause an eject to happen?
I see no reason.

That said, the most straightforward approach may be simply to let user space
disable eject events for specific devices when it wants and only enable them
when it knows that the given devices are ready for removal.

But I'm digressing.]

> PCI Host bridge eject is the same as PCI eject today, the user asks us
> to do it, and we can not fail it from happening.  We also can have them
> removed without us being told about it in the first place, and can
> properly clean up from it all.

Well, are you sure we'll always clean up?  I kind of have my doubts. :-)

> > And if you think about it, that makes things *massively* simpler, because now
> > the kernel doesn't heed to worry about all of those "synchronous removal"
> > scenarions that very well may involve every single device in the system and
> > the whole problem is nicely split into several separate "implement
> > offline/online" problems that are subsystem-specific and a single
> > "eject if everything relevant is offline" problem which is kind of trivial.
> > Plus the one of exposing information to user space, which is separate too.
> > 
> > Now, each of them can be worked on separately, *tested* separately and
> > debugged separately if need be and it is much easier to isolate failures
> > and so on.
> 
> So you are agreeing with me in that we can not fail hot removing any
> device, nice :)

That depends on how you define hot-removing.  If you regard the "offline"
as a separate operation that can be carried out independently and hot-remove
as the last step causing the device to actually go away, then I agree that
it can't fail.  The "offline" itself, however, is a different matter (pretty
much like unmounting a file system).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-02-05  0:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <7003418.onqVlaaHJS@vostro.rjw.lan>

On Tue, 2013-02-05 at 00:52 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 03:13:29 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > > > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > > we *want* to do the removal in the first place.
> > > > > > > > 
> > > > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > > > doesn't work out.
> > > > > > > 
> > > > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > > > 
> > > > > > I think this will always be racy, or at worst, slow things down on
> > > > > > normal device operations as you will always be having to grab this flag
> > > > > > whenever you want to do something new.
> > > > > 
> > > > > I don't see why this particular scheme should be racy, at least I don't see any
> > > > > obvious races in it (although I'm not that good at races detection in general,
> > > > > admittedly).
> > > > > 
> > > > > Also, I don't expect that flag to be used for everything, just for things known
> > > > > to seriously break if forcible eject is done.  That may be not precise enough,
> > > > > so that's a matter of defining its purpose more precisely.
> > > > > 
> > > > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > > > in struct acpi_device and provide an iterface for the layers above ACPI to
> > > > > manipulate it) but then devices without ACPI namespace objects won't be
> > > > > covered.  That may not be a big deal, though.
> > > > > 
> > > > > So say dev is about to be used for something incompatible with ejecting, so to
> > > > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > > > gone.  If it returns success (0), one would do something to the device and
> > > > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> > > > 
> > > > How does a device "know" it is doing something that is incompatible with
> > > > ejecting?  That's a non-trivial task from what I can tell.
> > > 
> > > I agree that this is complicated in general.  But.
> > > 
> > > There are devices known to have software "offline" and "online" operations
> > > such that after the "offline" the given device is guaranteed to be not used
> > > until "online".  We have that for CPU cores, for example, and user space can
> > > do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> > > "online" set the no_eject flag (under the lock as appropriate) and the
> > > "offline" clear it?  And why don't we define such "online" and "offline" for
> > > all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> > > behave analogously?
> > > 
> > > Then, it is quite simple to say which devices should use the no_eject flag:
> > > devices that have "online" and "offline" exported to user space.  And guess
> > > who's responsible for "offlining" all of those things before trying to eject
> > > them: user space is.  From the kernel's point of view it is all clear.  Hands
> > > clean. :-)
> > > 
> > > Now, there's a different problem how to expose all of the relevant information
> > > to user space so that it knows what to "offline" for the specific eject
> > > operation to succeed, but that's kind of separate and worth addressing
> > > anyway.
> > 
> > So, the idea is to run a user space program that off-lines all relevant
> > devices before trimming ACPI devices.  Is that right?  That sounds like
> > a worth idea to consider with.  This basically moves the "sequencer"
> > part into user space instead of the kernel space in my proposal.  I
> > agree that how to expose all of the relevant info to user space is an
> > issue.  Also, we will need to make sure that the user program always
> > runs per a kernel request and then informs a result back to the kernel,
> > so that the kernel can do the rest of an operation and inform a result
> > to FW with _OST or _EJ0.  This loop has to close.  I think it is going
> > to be more complicated than the kernel-only approach.
> 
> I actually didn't think about that.  The point is that trying to offline
> everything *synchronously* may just be pointless, because it may be
> offlined upfront, before the eject is even requested.  So the sequence
> would be to first offline things that we'll want to eject from user space
> and then to send the eject request (e.g. via sysfs too).
> 
> Eject requests from eject buttons and things like that may just fail if
> some components involved that should be offline are online.  The fact that
> we might be able to offline them synchronously if we tried doesn't matter,
> pretty much as it doesn't matter for hot-swappable disks.
> 
> You'd probably never try to hot-remove a disk before unmounting filesystems
> mounted from it or failing it as a RAID component and nobody sane wants the
> kernel to do things like that automatically when the user presses the eject
> button.  In my opinion we should treat memory eject, or CPU package eject, or
> PCI host bridge eject in exactly the same way: Don't eject if it is not
> prepared for ejecting in the first place.
> 
> And if you think about it, that makes things *massively* simpler, because now
> the kernel doesn't heed to worry about all of those "synchronous removal"
> scenarions that very well may involve every single device in the system and
> the whole problem is nicely split into several separate "implement
> offline/online" problems that are subsystem-specific and a single
> "eject if everything relevant is offline" problem which is kind of trivial.
> Plus the one of exposing information to user space, which is separate too.

Oh, I see.  Yes, it certainly makes things really simpler.  It will
bring burden to a user, but it could be solved with proper tools.  I
totally agree that I/Os should be removed beforehand.  For CPUs and
memory, it would be a bad TCE for asking a user to find a right set of
the devices to off-line, but this could be addressed with proper tools.
I think we need to check if memory block (a unit of sysfs memory
online/offline) and an ACPI memory object actually corresponds nicely.
But in high-level, this sounds like a workable plan.


> Now, each of them can be worked on separately, *tested* separately and
> debugged separately if need be and it is much easier to isolate failures
> and so on.

Right, but it is also the case with "synchronous removal" as long as we
have sysfs online interface.  The difference is that this approach only
supports sysfs interface for off-lining.


> > In addition, I am not sure if the "no_eject" flag in acpi_device is
> > really necessary here since the user program will inform the kernel if
> > all devices are off-line.  Also, the kernel will likely need to expose
> > the device info to the user program to tell which devices need to be
> > off-lined.  At that time, the kernel already knows if there is any
> > on-line device in the scope.
> 
> Well, that depends no what "the kernel" means and how it knows that.  Surely
> the "online" components have to be marked somehow so that it is easy to check
> if they are in the scope in the subsystem-independent way, so why don't we use
> something like the no_eject flag for that?

Yes, I see your point.  My previous comment assumed that the kernel
would have to obtain device info and tell a user program to off-line
them.  In such case, I thought we would have to walk thru the actual
device tree and see online/offline info anyway.  But, since we are not
doing anything like that, having the flag in acpi_device seems to be a
reasonable way to avoid dealing with the actual device tree.


Thanks,
-Toshi

^ permalink raw reply

* [PATCH 0/4] Improve CFAR handling
From: Paul Mackerras @ 2013-02-05  4:09 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt, Alexander Graf; +Cc: kvm-ppc

The CFAR (Come From Address Register) is useful for debugging; it
records the address of the most recent taken branch or rfid
instructions.  At present, KVM doesn't even try to context switch it,
and the first-level interrupt handlers for some interrupts have a
branch before it gets saved, which will corrupt it.

This series fixes the interrupt handlers to not corrupt the CFAR, and
makes KVM context-switch it.  The series is against Ben H.'s
next branch.  The last patch in the series corrects a compile error
for 32-bit PR KVM configs which was introduced by an earlier commit in
Ben's next branch.

I suggest this series should go via Ben's tree rather than the KVM
tree, since most of the changes are to core powerpc interrupt handling
code.  Alex, if you could ack patch 3/4 that would be helpful.

Paul.

^ permalink raw reply

* [PATCH 1/4] powerpc: Remove Cell-specific relocation-on interrupt vector code
From: Paul Mackerras @ 2013-02-05  4:09 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt, Alexander Graf; +Cc: kvm-ppc
In-Reply-To: <20130205040902.GA20303@drongo>

The Cell processor doesn't support relocation-on interrupts, so we
don't need relocation-on versions of the interrupt vectors that are
purely Cell-specific.  This removes them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kernel/exceptions-64s.S |   10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7a1c87c..dc64165 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -817,26 +817,16 @@ vsx_unavailable_relon_pSeries_1:
 	. = 0x4f40
 	b	vsx_unavailable_relon_pSeries
 
-#ifdef CONFIG_CBE_RAS
-	STD_RELON_EXCEPTION_HV(0x5200, 0x1202, cbe_system_error)
-#endif /* CONFIG_CBE_RAS */
 	STD_RELON_EXCEPTION_PSERIES(0x5300, 0x1300, instruction_breakpoint)
 #ifdef CONFIG_PPC_DENORMALISATION
 	. = 0x5500
 	b	denorm_exception_hv
 #endif
-#ifdef CONFIG_CBE_RAS
-	STD_RELON_EXCEPTION_HV(0x5600, 0x1602, cbe_maintenance)
-#else
 #ifdef CONFIG_HVC_SCOM
 	STD_RELON_EXCEPTION_HV(0x5600, 0x1600, maintence_interrupt)
 	KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0x1600)
 #endif /* CONFIG_HVC_SCOM */
-#endif /* CONFIG_CBE_RAS */
 	STD_RELON_EXCEPTION_PSERIES(0x5700, 0x1700, altivec_assist)
-#ifdef CONFIG_CBE_RAS
-	STD_RELON_EXCEPTION_HV(0x5800, 0x1802, cbe_thermal)
-#endif /* CONFIG_CBE_RAS */
 
 	/* Other future vectors */
 	.align	7
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/4] powerpc: Save CFAR before branching in interrupt entry paths
From: Paul Mackerras @ 2013-02-05  4:10 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt, Alexander Graf; +Cc: kvm-ppc
In-Reply-To: <20130205040902.GA20303@drongo>

Some of the interrupt vectors on 64-bit POWER server processors are
only 32 bytes long, which is not enough for the full first-level
interrupt handler.  For these we currently just have a branch to an
out-of-line handler.  However, this means that we corrupt the CFAR
(come-from address register) on POWER7 and later processors.

To fix this, we split the EXCEPTION_PROLOG_1 macro into two pieces:
EXCEPTION_PROLOG_0 contains the part up to the point where the CFAR
is saved in the PACA, and EXCEPTION_PROLOG_1 contains the rest.  We
then put EXCEPTION_PROLOG_0 in the short interrupt vectors before
we branch to the out-of-line handler, which contains the rest of the
first-level interrupt handler.  To facilitate this, we define new
_OOL (out of line) variants of STD_EXCEPTION_PSERIES, etc.

In order to get EXCEPTION_PROLOG_0 to be short enough, i.e., no more
than 6 instructions, it was necessary to move the stores that move
the PPR and CFAR values into the PACA into __EXCEPTION_PROLOG_1 and
to get rid of one of the two HMT_MEDIUM instructions.  Previously
there was a HMT_MEDIUM_PPR_DISCARD before the prolog, which was
nop'd out on processors with the PPR (POWER7 and later), and then
another HMT_MEDIUM inside the HMT_MEDIUM_PPR_SAVE macro call inside
__EXCEPTION_PROLOG_1, which was nop'd out on processors without PPR.
Now the HMT_MEDIUM inside EXCEPTION_PROLOG_0 is there unconditionally
and the HMT_MEDIUM_PPR_DISCARD is not strictly necessary, although
this leaves it in for the interrupt vectors where there is room for
it.

Previously we had a handler for hypervisor maintenance interrupts at
0xe50, which doesn't leave enough room for the vector for hypervisor
emulation assist interrupts at 0xe40, since we need 8 instructions.
The 0xe50 vector was only used on POWER6, as the HMI vector was moved
to 0xe60 on POWER7.  Since we don't support running in hypervisor mode
on POWER6, we just remove the handler at 0xe50.

This also changes denorm_exception_hv to use EXCEPTION_PROLOG_0
instead of open-coding it, and removes the HMT_MEDIUM_PPR_DISCARD
from the relocation-on vectors (since any CPU that supports
relocation-on interrupts also has the PPR).

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/exception-64s.h |   84 +++++++++++++++++++++-----
 arch/powerpc/kernel/exceptions-64s.S     |   95 ++++++++++++++++++++----------
 2 files changed, 133 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 370298a..4dfc515 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -50,7 +50,7 @@
 #define EX_PPR		88	/* SMT thread status register (priority) */
 
 #ifdef CONFIG_RELOCATABLE
-#define EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
+#define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
 	ld	r12,PACAKBASE(r13);	/* get high part of &label */	\
 	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
 	LOAD_HANDLER(r12,label);					\
@@ -61,13 +61,15 @@
 	blr;
 #else
 /* If not relocatable, we can jump directly -- and save messing with LR */
-#define EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
+#define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
 	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
 	mfspr	r12,SPRN_##h##SRR1;	/* and SRR1 */			\
 	li	r10,MSR_RI;						\
 	mtmsrd 	r10,1;			/* Set RI (EE=0) */		\
 	b	label;
 #endif
+#define EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
+	__EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
 
 /*
  * As EXCEPTION_PROLOG_PSERIES(), except we've already got relocation on
@@ -75,6 +77,7 @@
  * case EXCEPTION_RELON_PROLOG_PSERIES_1 will be using lr.
  */
 #define EXCEPTION_RELON_PROLOG_PSERIES(area, label, h, extra, vec)	\
+	EXCEPTION_PROLOG_0(area);					\
 	EXCEPTION_PROLOG_1(area, extra, vec);				\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)
 
@@ -135,25 +138,32 @@ BEGIN_FTR_SECTION_NESTED(942)						\
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,0,942)  /*non P7*/		
 
 /*
- * Save PPR in paca whenever some register is available to use.
- * Then increase the priority.
+ * Get an SPR into a register if the CPU has the given feature
  */
-#define HMT_MEDIUM_PPR_SAVE(area, ra)					\
+#define OPT_GET_SPR(ra, spr, ftr)					\
 BEGIN_FTR_SECTION_NESTED(943)						\
-	mfspr	ra,SPRN_PPR;						\
-	std	ra,area+EX_PPR(r13);					\
-	HMT_MEDIUM;							\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,943) 
+	mfspr	ra,spr;							\
+END_FTR_SECTION_NESTED(ftr,ftr,943)
 
-#define __EXCEPTION_PROLOG_1(area, extra, vec)				\
+/*
+ * Save a register to the PACA if the CPU has the given feature
+ */
+#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr)				\
+BEGIN_FTR_SECTION_NESTED(943)						\
+	std	ra,offset(r13);						\
+END_FTR_SECTION_NESTED(ftr,ftr,943)
+
+#define EXCEPTION_PROLOG_0(area)					\
 	GET_PACA(r13);							\
 	std	r9,area+EX_R9(r13);	/* save r9 */			\
-	HMT_MEDIUM_PPR_SAVE(area, r9);					\
+	OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR);			\
+	HMT_MEDIUM;							\
 	std	r10,area+EX_R10(r13);	/* save r10 - r12 */		\
-	BEGIN_FTR_SECTION_NESTED(66);					\
-	mfspr	r10,SPRN_CFAR;						\
-	std	r10,area+EX_CFAR(r13);					\
-	END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66);		\
+	OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
+
+#define __EXCEPTION_PROLOG_1(area, extra, vec)				\
+	OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR);		\
+	OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);		\
 	SAVE_LR(r10, area);						\
 	mfcr	r9;							\
 	extra(vec);							\
@@ -178,6 +188,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,943)
 	__EXCEPTION_PROLOG_PSERIES_1(label, h)
 
 #define EXCEPTION_PROLOG_PSERIES(area, label, h, extra, vec)		\
+	EXCEPTION_PROLOG_0(area);					\
 	EXCEPTION_PROLOG_1(area, extra, vec);				\
 	EXCEPTION_PROLOG_PSERIES_1(label, h);
 
@@ -312,6 +323,13 @@ label##_pSeries:					\
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common,	\
 				 EXC_STD, KVMTEST_PR, vec)
 
+/* Version of above for when we have to branch out-of-line */
+#define STD_EXCEPTION_PSERIES_OOL(vec, label)			\
+	.globl label##_pSeries;					\
+label##_pSeries:						\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);	\
+	EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD)
+
 #define STD_EXCEPTION_HV(loc, vec, label)		\
 	. = loc;					\
 	.globl label##_hv;				\
@@ -321,6 +339,13 @@ label##_hv:						\
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common,	\
 				 EXC_HV, KVMTEST, vec)
 
+/* Version of above for when we have to branch out-of-line */
+#define STD_EXCEPTION_HV_OOL(vec, label)		\
+	.globl label##_hv;				\
+label##_hv:						\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);	\
+	EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV)
+
 #define STD_RELON_EXCEPTION_PSERIES(loc, vec, label)	\
 	. = loc;					\
 	.globl label##_relon_pSeries;			\
@@ -331,6 +356,12 @@ label##_relon_pSeries:					\
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
 				       EXC_STD, KVMTEST_PR, vec)
 
+#define STD_RELON_EXCEPTION_PSERIES_OOL(vec, label)		\
+	.globl label##_relon_pSeries;				\
+label##_relon_pSeries:						\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);	\
+	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_STD)
+
 #define STD_RELON_EXCEPTION_HV(loc, vec, label)		\
 	. = loc;					\
 	.globl label##_relon_hv;			\
@@ -341,6 +372,12 @@ label##_relon_hv:					\
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
 				       EXC_HV, KVMTEST, vec)
 
+#define STD_RELON_EXCEPTION_HV_OOL(vec, label)			\
+	.globl label##_relon_hv;				\
+label##_relon_hv:						\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);		\
+	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_HV)
+
 /* This associate vector numbers with bits in paca->irq_happened */
 #define SOFTEN_VALUE_0x500	PACA_IRQ_EE
 #define SOFTEN_VALUE_0x502	PACA_IRQ_EE
@@ -375,8 +412,10 @@ label##_relon_hv:					\
 #define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)		\
 	HMT_MEDIUM_PPR_DISCARD;						\
 	SET_SCRATCH0(r13);    /* save r13 */				\
-	__EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec);		\
+	EXCEPTION_PROLOG_0(PACA_EXGEN);					\
+	__EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec);			\
 	EXCEPTION_PROLOG_PSERIES_1(label##_common, h);
+
 #define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)		\
 	__MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)
 
@@ -394,9 +433,16 @@ label##_hv:								\
 	_MASKABLE_EXCEPTION_PSERIES(vec, label,				\
 				    EXC_HV, SOFTEN_TEST_HV)
 
+#define MASKABLE_EXCEPTION_HV_OOL(vec, label)				\
+	.globl label##_hv;						\
+label##_hv:								\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);		\
+	EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV);
+
 #define __MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra)	\
 	HMT_MEDIUM_PPR_DISCARD;						\
 	SET_SCRATCH0(r13);    /* save r13 */				\
+	EXCEPTION_PROLOG_0(PACA_EXGEN);					\
 	__EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec);		\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, h);
 #define _MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra)	\
@@ -416,6 +462,12 @@ label##_relon_hv:							\
 	_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label,			\
 					  EXC_HV, SOFTEN_NOTEST_HV)
 
+#define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label)			\
+	.globl label##_relon_hv;					\
+label##_relon_hv:							\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);		\
+	EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_HV);
+
 /*
  * Our exception common code can be passed various "additions"
  * to specify the behaviour of interrupts, whether to kick the
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index dc64165..b9bcf21 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -153,7 +153,10 @@ machine_check_pSeries_1:
 	 * some code path might still want to branch into the original
 	 * vector
 	 */
-	b	machine_check_pSeries
+	HMT_MEDIUM_PPR_DISCARD
+	SET_SCRATCH0(r13)		/* save r13 */
+	EXCEPTION_PROLOG_0(PACA_EXMC)
+	b	machine_check_pSeries_0
 
 	. = 0x300
 	.globl data_access_pSeries
@@ -172,6 +175,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB)
 data_access_slb_pSeries:
 	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXSLB)
 	EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST, 0x380)
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_DAR
@@ -203,6 +207,7 @@ data_access_slb_pSeries:
 instruction_access_slb_pSeries:
 	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXSLB)
 	EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
@@ -284,16 +289,28 @@ system_call_pSeries:
 	 */
 	. = 0xe00
 hv_exception_trampoline:
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	h_data_storage_hv
+
 	. = 0xe20
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	h_instr_storage_hv
+
 	. = 0xe40
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	emulation_assist_hv
-	. = 0xe50
-	b	hmi_exception_hv
+
 	. = 0xe60
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	hmi_exception_hv
+
 	. = 0xe80
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	h_doorbell_hv
 
 	/* We need to deal with the Altivec unavailable exception
@@ -303,14 +320,20 @@ hv_exception_trampoline:
 	 */
 performance_monitor_pSeries_1:
 	. = 0xf00
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	performance_monitor_pSeries
 
 altivec_unavailable_pSeries_1:
 	. = 0xf20
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	altivec_unavailable_pSeries
 
 vsx_unavailable_pSeries_1:
 	. = 0xf40
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	vsx_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
@@ -326,10 +349,7 @@ vsx_unavailable_pSeries_1:
 denorm_exception_hv:
 	HMT_MEDIUM_PPR_DISCARD
 	mtspr	SPRN_SPRG_HSCRATCH0,r13
-	mfspr	r13,SPRN_SPRG_HPACA
-	std	r9,PACA_EXGEN+EX_R9(r13)
-	HMT_MEDIUM_PPR_SAVE(PACA_EXGEN, r9)
-	std	r10,PACA_EXGEN+EX_R10(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	std	r11,PACA_EXGEN+EX_R11(r13)
 	std	r12,PACA_EXGEN+EX_R12(r13)
 	mfspr	r9,SPRN_SPRG_HSCRATCH0
@@ -372,8 +392,10 @@ machine_check_pSeries:
 machine_check_fwnmi:
 	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common,
-				 EXC_STD, KVMTEST, 0x200)
+	EXCEPTION_PROLOG_0(PACA_EXMC)
+machine_check_pSeries_0:
+	EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST, 0x200)
+	EXCEPTION_PROLOG_PSERIES_1(machine_check_common, EXC_STD)
 	KVM_HANDLER_SKIP(PACA_EXMC, EXC_STD, 0x200)
 
 	/* moved from 0x300 */
@@ -510,23 +532,23 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206)
 
 	.align	7
 	/* moved from 0xe00 */
-	STD_EXCEPTION_HV(., 0xe02, h_data_storage)
+	STD_EXCEPTION_HV_OOL(0xe02, h_data_storage)
 	KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02)
-	STD_EXCEPTION_HV(., 0xe22, h_instr_storage)
+	STD_EXCEPTION_HV_OOL(0xe22, h_instr_storage)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe22)
-	STD_EXCEPTION_HV(., 0xe42, emulation_assist)
+	STD_EXCEPTION_HV_OOL(0xe42, emulation_assist)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe42)
-	STD_EXCEPTION_HV(., 0xe62, hmi_exception) /* need to flush cache ? */
+	STD_EXCEPTION_HV_OOL(0xe62, hmi_exception) /* need to flush cache ? */
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe62)
-	MASKABLE_EXCEPTION_HV(., 0xe82, h_doorbell)
+	MASKABLE_EXCEPTION_HV_OOL(0xe82, h_doorbell)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe82)
 
 	/* moved from 0xf00 */
-	STD_EXCEPTION_PSERIES(., 0xf00, performance_monitor)
+	STD_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
 	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf00)
-	STD_EXCEPTION_PSERIES(., 0xf20, altivec_unavailable)
+	STD_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
 	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf20)
-	STD_EXCEPTION_PSERIES(., 0xf40, vsx_unavailable)
+	STD_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
 	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40)
 
 /*
@@ -718,8 +740,8 @@ machine_check_common:
 	. = 0x4380
 	.globl data_access_slb_relon_pSeries
 data_access_slb_relon_pSeries:
-	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXSLB)
 	EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_DAR
@@ -743,8 +765,8 @@ data_access_slb_relon_pSeries:
 	. = 0x4480
 	.globl instruction_access_slb_relon_pSeries
 instruction_access_slb_relon_pSeries:
-	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXSLB)
 	EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x480)
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
@@ -788,33 +810,46 @@ system_call_relon_pSeries:
 	STD_RELON_EXCEPTION_PSERIES(0x4d00, 0xd00, single_step)
 
 	. = 0x4e00
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	h_data_storage_relon_hv
 
 	. = 0x4e20
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	h_instr_storage_relon_hv
 
 	. = 0x4e40
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	emulation_assist_relon_hv
 
-	. = 0x4e50
-	b	hmi_exception_relon_hv
-
 	. = 0x4e60
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	hmi_exception_relon_hv
 
 	. = 0x4e80
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	h_doorbell_relon_hv
 
 performance_monitor_relon_pSeries_1:
 	. = 0x4f00
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	performance_monitor_relon_pSeries
 
 altivec_unavailable_relon_pSeries_1:
 	. = 0x4f20
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	altivec_unavailable_relon_pSeries
 
 vsx_unavailable_relon_pSeries_1:
 	. = 0x4f40
+	SET_SCRATCH0(r13)
+	EXCEPTION_PROLOG_0(PACA_EXGEN)
 	b	vsx_unavailable_relon_pSeries
 
 	STD_RELON_EXCEPTION_PSERIES(0x5300, 0x1300, instruction_breakpoint)
@@ -1171,20 +1206,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 __end_handlers:
 
 	/* Equivalents to the above handlers for relocation-on interrupt vectors */
-	STD_RELON_EXCEPTION_HV(., 0xe00, h_data_storage)
+	STD_RELON_EXCEPTION_HV_OOL(0xe00, h_data_storage)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe00)
-	STD_RELON_EXCEPTION_HV(., 0xe20, h_instr_storage)
+	STD_RELON_EXCEPTION_HV_OOL(0xe20, h_instr_storage)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe20)
-	STD_RELON_EXCEPTION_HV(., 0xe40, emulation_assist)
+	STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe40)
-	STD_RELON_EXCEPTION_HV(., 0xe60, hmi_exception)
+	STD_RELON_EXCEPTION_HV_OOL(0xe60, hmi_exception)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe60)
-	MASKABLE_RELON_EXCEPTION_HV(., 0xe80, h_doorbell)
+	MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
 	KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe80)
 
-	STD_RELON_EXCEPTION_PSERIES(., 0xf00, performance_monitor)
-	STD_RELON_EXCEPTION_PSERIES(., 0xf20, altivec_unavailable)
-	STD_RELON_EXCEPTION_PSERIES(., 0xf40, vsx_unavailable)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/4] KVM: PPC: Book3S HV: Preserve guest CFAR register value
From: Paul Mackerras @ 2013-02-05  4:10 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt, Alexander Graf; +Cc: kvm-ppc
In-Reply-To: <20130205040902.GA20303@drongo>

The CFAR (Come-From Address Register) is a useful debugging aid that
exists on POWER7 processors.  Currently HV KVM doesn't save or restore
the CFAR register for guest vcpus, making the CFAR of limited use in
guests.

This adds the necessary code to capture the CFAR value saved in the
early exception entry code (it has to be saved before any branch is
executed), save it in the vcpu.arch struct, and restore it on entry
to the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/exception-64s.h  |    8 ++++++--
 arch/powerpc/include/asm/kvm_book3s_asm.h |    3 +++
 arch/powerpc/include/asm/kvm_host.h       |    1 +
 arch/powerpc/kernel/asm-offsets.c         |    5 +++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |    9 +++++++++
 5 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 4dfc515..05e6d2e 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -199,10 +199,14 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define __KVM_HANDLER(area, h, n)					\
 do_kvm_##n:								\
+	BEGIN_FTR_SECTION_NESTED(947)					\
+	ld	r10,area+EX_CFAR(r13);					\
+	std	r10,HSTATE_CFAR(r13);					\
+	END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947);		\
 	ld	r10,area+EX_R10(r13);					\
-	stw	r9,HSTATE_SCRATCH1(r13);			\
+	stw	r9,HSTATE_SCRATCH1(r13);				\
 	ld	r9,area+EX_R9(r13);					\
-	std	r12,HSTATE_SCRATCH0(r13);			\
+	std	r12,HSTATE_SCRATCH0(r13);				\
 	li	r12,n;							\
 	b	kvmppc_interrupt
 
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 88609b2..cdc3d27 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -93,6 +93,9 @@ struct kvmppc_host_state {
 	u64 host_dscr;
 	u64 dec_expires;
 #endif
+#ifdef CONFIG_PPC_BOOK3S_64
+	u64 cfar;
+#endif
 };
 
 struct kvmppc_book3s_shadow_vcpu {
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ca9bf45..03d7bea 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -440,6 +440,7 @@ struct kvm_vcpu_arch {
 	ulong uamor;
 	u32 ctrl;
 	ulong dabr;
+	ulong cfar;
 #endif
 	u32 vrsave; /* also USPRG0 */
 	u32 mmucr;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index e39ca55..9a73fb0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -475,6 +475,7 @@ int main(void)
 	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
 	DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
 	DEFINE(VCPU_PTID, offsetof(struct kvm_vcpu, arch.ptid));
+	DEFINE(VCPU_CFAR, offsetof(struct kvm_vcpu, arch.cfar));
 	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
 	DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
 	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
@@ -554,6 +555,10 @@ int main(void)
 	DEFINE(IPI_PRIORITY, IPI_PRIORITY);
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
 
+#ifdef CONFIG_PPC_BOOK3S_64
+	HSTATE_FIELD(HSTATE_CFAR, cfar);
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
 #else /* CONFIG_PPC_BOOK3S */
 	DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
 	DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 10b6c35..e33d11f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -539,6 +539,11 @@ fast_guest_return:
 
 	/* Enter guest */
 
+BEGIN_FTR_SECTION
+	ld	r5, VCPU_CFAR(r4)
+	mtspr	SPRN_CFAR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
+
 	ld	r5, VCPU_LR(r4)
 	lwz	r6, VCPU_CR(r4)
 	mtlr	r5
@@ -604,6 +609,10 @@ kvmppc_interrupt:
 	lwz	r4, HSTATE_SCRATCH1(r13)
 	std	r3, VCPU_GPR(R12)(r9)
 	stw	r4, VCPU_CR(r9)
+BEGIN_FTR_SECTION
+	ld	r3, HSTATE_CFAR(r13)
+	std	r3, VCPU_CFAR(r9)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 
 	/* Restore R1/R2 so we can handle faults */
 	ld	r1, HSTATE_HOST_R1(r13)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 4/4] KVM: PPC: Book3S PR: Fix compilation on 32-bit machines
From: Paul Mackerras @ 2013-02-05  4:11 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt, Alexander Graf; +Cc: kvm-ppc
In-Reply-To: <20130205040902.GA20303@drongo>

Commit a413f474a0 ("powerpc: Disable relocation on exceptions whenever
PR KVM is active") added calls to pSeries_disable_reloc_on_exc() and
pSeries_enable_reloc_on_exc() to book3s_pr.c, and added declarations
of those functions to <asm/hvcall.h>, but didn't add an include of
<asm/hvcall.h> to book3s_pr.c.  64-bit kernels seem to get hvcall.h
included via some other path, but 32-bit kernels fail to compile with:

arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_init_vm’:
arch/powerpc/kvm/book3s_pr.c:1300:4: error: implicit declaration of function ‘pSeries_disable_reloc_on_exc’ [-Werror=implicit-function-declaration]
arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_destroy_vm’:
arch/powerpc/kvm/book3s_pr.c:1316:4: error: implicit declaration of function ‘pSeries_enable_reloc_on_exc’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[2]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
make[1]: *** [arch/powerpc/kvm] Error 2
make: *** [sub-make] Error 2

This fixes it by adding an include of hvcall.h.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_pr.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 67e4708..6702442 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -35,6 +35,7 @@
 #include <asm/mmu_context.h>
 #include <asm/switch_to.h>
 #include <asm/firmware.h>
+#include <asm/hvcall.h>
 #include <linux/gfp.h>
 #include <linux/sched.h>
 #include <linux/vmalloc.h>
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/3] powerpc/mpc512x: fix noderef sparse warnings
From: Anatolij Gustschin @ 2013-02-05  7:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anatolij Gustschin

Fix:
warning: dereference of noderef expression

Signed-off-by: Anatolij Gustschin <agust@denx.de>
---
 arch/powerpc/platforms/512x/clock.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/512x/clock.c b/arch/powerpc/platforms/512x/clock.c
index 7937361..8a784d4 100644
--- a/arch/powerpc/platforms/512x/clock.c
+++ b/arch/powerpc/platforms/512x/clock.c
@@ -184,7 +184,7 @@ static unsigned long spmf_mult(void)
 		36, 40, 44, 48,
 		52, 56, 60, 64
 	};
-	int spmf = (clockctl->spmr >> 24) & 0xf;
+	int spmf = (in_be32(&clockctl->spmr) >> 24) & 0xf;
 	return spmf_to_mult[spmf];
 }
 
@@ -206,7 +206,7 @@ static unsigned long sysdiv_div_x_2(void)
 		52, 56, 58, 62,
 		60, 64, 66,
 	};
-	int sysdiv = (clockctl->scfr2 >> 26) & 0x3f;
+	int sysdiv = (in_be32(&clockctl->scfr2) >> 26) & 0x3f;
 	return sysdiv_to_div_x_2[sysdiv];
 }
 
@@ -230,7 +230,7 @@ static unsigned long sys_to_ref(unsigned long rate)
 
 static long ips_to_ref(unsigned long rate)
 {
-	int ips_div = (clockctl->scfr1 >> 23) & 0x7;
+	int ips_div = (in_be32(&clockctl->scfr1) >> 23) & 0x7;
 
 	rate *= ips_div;	/* csb_clk = ips_clk * ips_div */
 	rate *= 2;		/* sys_clk = csb_clk * 2 */
@@ -284,7 +284,7 @@ static struct clk sys_clk = {
 
 static void diu_clk_calc(struct clk *clk)
 {
-	int diudiv_x_2 = clockctl->scfr1 & 0xff;
+	int diudiv_x_2 = in_be32(&clockctl->scfr1) & 0xff;
 	unsigned long rate;
 
 	rate = sys_clk.rate;
@@ -311,7 +311,7 @@ static void half_clk_calc(struct clk *clk)
 
 static void generic_div_clk_calc(struct clk *clk)
 {
-	int div = (clockctl->scfr1 >> clk->div_shift) & 0x7;
+	int div = (in_be32(&clockctl->scfr1) >> clk->div_shift) & 0x7;
 
 	clk->rate = clk->parent->rate / div;
 }
@@ -329,7 +329,7 @@ static struct clk csb_clk = {
 
 static void e300_clk_calc(struct clk *clk)
 {
-	int spmf = (clockctl->spmr >> 16) & 0xf;
+	int spmf = (in_be32(&clockctl->spmr) >> 16) & 0xf;
 	int ratex2 = clk->parent->rate * spmf;
 
 	clk->rate = ratex2 / 2;
@@ -648,12 +648,12 @@ static void psc_calc_rate(struct clk *clk, int pscnum, struct device_node *np)
 	out_be32(&clockctl->pccr[pscnum], 0x00020000);
 	out_be32(&clockctl->pccr[pscnum], 0x00030000);
 
-	if (clockctl->pccr[pscnum] & 0x80) {
+	if (in_be32(&clockctl->pccr[pscnum]) & 0x80) {
 		clk->rate = spdif_rxclk.rate;
 		return;
 	}
 
-	switch ((clockctl->pccr[pscnum] >> 14) & 0x3) {
+	switch ((in_be32(&clockctl->pccr[pscnum]) >> 14) & 0x3) {
 	case 0:
 		mclk_src = sys_clk.rate;
 		break;
@@ -668,7 +668,7 @@ static void psc_calc_rate(struct clk *clk, int pscnum, struct device_node *np)
 		break;
 	}
 
-	mclk_div = ((clockctl->pccr[pscnum] >> 17) & 0x7fff) + 1;
+	mclk_div = ((in_be32(&clockctl->pccr[pscnum]) >> 17) & 0x7fff) + 1;
 	clk->rate = mclk_src / mclk_div;
 }
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 2/3] powerpc/mpc512x: fix sparce warnings for non static symbols
From: Anatolij Gustschin @ 2013-02-05  7:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anatolij Gustschin
In-Reply-To: <1360048818-3765-1-git-send-email-agust@denx.de>

Fix warnings:
symbol 'clockctl' was not declared. Should it be static?
symbol 'rate_clks' was not declared. Should it be static?
symbol 'dev_clks' was not declared. Should it be static?
symbol 'mpc5121_clk_init' was not declared. Should it be static?

Signed-off-by: Anatolij Gustschin <agust@denx.de>
---
 arch/powerpc/include/asm/mpc5121.h  |    1 +
 arch/powerpc/platforms/512x/clock.c |    7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mpc5121.h b/arch/powerpc/include/asm/mpc5121.h
index e700c8b..1b0ce6d 100644
--- a/arch/powerpc/include/asm/mpc5121.h
+++ b/arch/powerpc/include/asm/mpc5121.h
@@ -147,5 +147,6 @@ struct mpc512x_axe_module {
 
 
 int mpc512x_cs_config(int cs, u32 val);
+int __init mpc5121_clk_init(void);
 
 #endif /* __ASM_POWERPC_MPC5121_H__ */
diff --git a/arch/powerpc/platforms/512x/clock.c b/arch/powerpc/platforms/512x/clock.c
index 8a784d4..52d57d2 100644
--- a/arch/powerpc/platforms/512x/clock.c
+++ b/arch/powerpc/platforms/512x/clock.c
@@ -26,6 +26,7 @@
 
 #include <linux/of_platform.h>
 #include <asm/mpc5xxx.h>
+#include <asm/mpc5121.h>
 #include <asm/clk_interface.h>
 
 #undef CLK_DEBUG
@@ -122,7 +123,7 @@ struct mpc512x_clockctl {
 	u32 dccr;		/* DIU Clk Cnfg Reg */
 };
 
-struct mpc512x_clockctl __iomem *clockctl;
+static struct mpc512x_clockctl __iomem *clockctl;
 
 static int mpc5121_clk_enable(struct clk *clk)
 {
@@ -551,7 +552,7 @@ static struct clk ac97_clk = {
 	.calc = ac97_clk_calc,
 };
 
-struct clk *rate_clks[] = {
+static struct clk *rate_clks[] = {
 	&ref_clk,
 	&sys_clk,
 	&diu_clk,
@@ -607,7 +608,7 @@ static void rate_clks_init(void)
  * There are two clk enable registers with 32 enable bits each
  * psc clocks and device clocks are all stored in dev_clks
  */
-struct clk dev_clks[2][32];
+static struct clk dev_clks[2][32];
 
 /*
  * Given a psc number return the dev_clk
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 3/3] powerpc/mpc5xxx: fix sparse warning for non static symbol
From: Anatolij Gustschin @ 2013-02-05  7:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anatolij Gustschin
In-Reply-To: <1360048818-3765-1-git-send-email-agust@denx.de>

Fix warning:
symbol 'mpc5xxx_get_bus_frequency' was not declared. Should it be static?

Signed-off-by: Anatolij Gustschin <agust@denx.de>
---
 arch/powerpc/sysdev/mpc5xxx_clocks.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/mpc5xxx_clocks.c b/arch/powerpc/sysdev/mpc5xxx_clocks.c
index 96f815a..5492dc5 100644
--- a/arch/powerpc/sysdev/mpc5xxx_clocks.c
+++ b/arch/powerpc/sysdev/mpc5xxx_clocks.c
@@ -9,9 +9,9 @@
 #include <linux/kernel.h>
 #include <linux/of_platform.h>
 #include <linux/export.h>
+#include <asm/mpc5xxx.h>
 
-unsigned int
-mpc5xxx_get_bus_frequency(struct device_node *node)
+unsigned long mpc5xxx_get_bus_frequency(struct device_node *node)
 {
 	struct device_node *np;
 	const unsigned int *p_bus_freq = NULL;
-- 
1.7.5.4

^ permalink raw reply related

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-05 11:11 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130205000447.GA21782@kroah.com>

On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > You'd probably never try to hot-remove a disk before unmounting filesystems
> > mounted from it or failing it as a RAID component and nobody sane wants the
> > kernel to do things like that automatically when the user presses the eject
> > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > prepared for ejecting in the first place.
> 
> Bad example, we have disks hot-removed all the time without any
> filesystems being unmounted, and have supported this since the 2.2 days
> (although we didn't get it "right" until 2.6.)

I actually don't think it is really bad, because it exposes the problem nicely.

Namely, there are two arguments that can be made here.  The first one is the
usability argument: Users should always be allowed to do what they want,
because it is [explicit content] annoying if software pretends to know better
what to do than the user (it is a convenience argument too, because usually
it's *easier* to allow users to do what they want).  The second one is the
data integrity argument: Operations that may lead to data loss should never
be carried out, because it is [explicit content] disappointing to lose valuable
stuff by a stupid mistake if software allows that mistake to be made (that also
may be costly in terms of real money).

You seem to believe that we should always follow the usability argument, while
Toshi seems to be thinking that (at least in the case of the "system" devices),
the data integrity argument is more important.  They are both valid arguments,
however, and they are in conflict, so this is a matter of balance.

You're saying that in the case of disks we always follow the usability argument
entirely.  I'm fine with that, although I suspect that some people may not be
considering this as the right balance.

Toshi seems to be thinking that for the hotplug of memory/CPUs/host bridges we
should always follow the data integrity argument entirely, because the users of
that feature value their data so much that they pretty much don't care about
usability.  That very well may be the case, so I'm fine with that too, although
I'm sure there are people who'll argue that this is not the right balance
either.

Now, the point is that we *can* do what Toshi is arguing for and that doesn't
seem to be overly complicated, so my question is: Why don't we do that, at
least to start with?  If it turns out eventually that the users care about
usability too, after all, we can add a switch to adjust things more to their
liking.  Still, we can very well do that later.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* [PATCH] powerpc: fix ics_rtas_init and start_secondary section mismatch
From: Daniel Borkmann @ 2013-02-05 15:07 UTC (permalink / raw)
  To: linuxppc-dev

It seems, we're fine with just annotating the two functions.
Thus, this fixes the following build warnings on ppc64:

WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1664):
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init
annotation or the annotation of .xics_register_ics is wrong.

WARNING: arch/powerpc/sysdev/built-in.o(.text+0x6044):
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init
annotation or the annotation of .xics_register_ics is wrong.

WARNING: arch/powerpc/kernel/built-in.o(.text+0x2db30):
The function .start_secondary() references
the function __cpuinit .vdso_getcpu_init().
This is often because .start_secondary lacks a __cpuinit
annotation or the annotation of .vdso_getcpu_init is wrong.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 Note: compile-tested only!

 arch/powerpc/kernel/smp.c           |    2 +-
 arch/powerpc/sysdev/xics/ics-rtas.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 793401e..76bd9da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -610,7 +610,7 @@ static struct device_node *cpu_to_l2cache(int cpu)
 }
 
 /* Activate a secondary processor. */
-void start_secondary(void *unused)
+__cpuinit void start_secondary(void *unused)
 {
 	unsigned int cpu = smp_processor_id();
 	struct device_node *l2_cache;
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index c782f85..936575d 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -213,7 +213,7 @@ static int ics_rtas_host_match(struct ics *ics, struct device_node *node)
 	return !of_device_is_compatible(node, "chrp,iic");
 }
 
-int ics_rtas_init(void)
+__init int ics_rtas_init(void)
 {
 	ibm_get_xive = rtas_token("ibm,get-xive");
 	ibm_set_xive = rtas_token("ibm,set-xive");
-- 
1.7.1

^ permalink raw reply related

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-02-05 18:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <4225828.6MQHJn7Yzr@vostro.rjw.lan>

On Tue, Feb 05, 2013 at 12:11:17PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> > On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > > You'd probably never try to hot-remove a disk before unmounting filesystems
> > > mounted from it or failing it as a RAID component and nobody sane wants the
> > > kernel to do things like that automatically when the user presses the eject
> > > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > > prepared for ejecting in the first place.
> > 
> > Bad example, we have disks hot-removed all the time without any
> > filesystems being unmounted, and have supported this since the 2.2 days
> > (although we didn't get it "right" until 2.6.)
> 
> I actually don't think it is really bad, because it exposes the problem nicely.
> 
> Namely, there are two arguments that can be made here.  The first one is the
> usability argument: Users should always be allowed to do what they want,
> because it is [explicit content] annoying if software pretends to know better
> what to do than the user (it is a convenience argument too, because usually
> it's *easier* to allow users to do what they want).  The second one is the
> data integrity argument: Operations that may lead to data loss should never
> be carried out, because it is [explicit content] disappointing to lose valuable
> stuff by a stupid mistake if software allows that mistake to be made (that also
> may be costly in terms of real money).
> 
> You seem to believe that we should always follow the usability argument, while
> Toshi seems to be thinking that (at least in the case of the "system" devices),
> the data integrity argument is more important.  They are both valid arguments,
> however, and they are in conflict, so this is a matter of balance.
> 
> You're saying that in the case of disks we always follow the usability argument
> entirely.  I'm fine with that, although I suspect that some people may not be
> considering this as the right balance.
> 
> Toshi seems to be thinking that for the hotplug of memory/CPUs/host bridges we
> should always follow the data integrity argument entirely, because the users of
> that feature value their data so much that they pretty much don't care about
> usability.  That very well may be the case, so I'm fine with that too, although
> I'm sure there are people who'll argue that this is not the right balance
> either.
> 
> Now, the point is that we *can* do what Toshi is arguing for and that doesn't
> seem to be overly complicated, so my question is: Why don't we do that, at
> least to start with?  If it turns out eventually that the users care about
> usability too, after all, we can add a switch to adjust things more to their
> liking.  Still, we can very well do that later.

Ok, I'd much rather deal with reviewing actual implementations than
talking about theory at this point in time, so let's see what you all
can come up with next and I'll be glad to review it.

thanks,

greg k-h

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-02-05 21:13 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, Toshi Kani, jiang.liu, wency, linux-acpi, yinghai,
	linux-kernel, linux-mm, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130205183948.GA19026@kroah.com>

On Tuesday, February 05, 2013 10:39:48 AM Greg KH wrote:
> On Tue, Feb 05, 2013 at 12:11:17PM +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> > > On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > > > You'd probably never try to hot-remove a disk before unmounting filesystems
> > > > mounted from it or failing it as a RAID component and nobody sane wants the
> > > > kernel to do things like that automatically when the user presses the eject
> > > > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > > > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > > > prepared for ejecting in the first place.
> > > 
> > > Bad example, we have disks hot-removed all the time without any
> > > filesystems being unmounted, and have supported this since the 2.2 days
> > > (although we didn't get it "right" until 2.6.)
> > 
> > I actually don't think it is really bad, because it exposes the problem nicely.
> > 
> > Namely, there are two arguments that can be made here.  The first one is the
> > usability argument: Users should always be allowed to do what they want,
> > because it is [explicit content] annoying if software pretends to know better
> > what to do than the user (it is a convenience argument too, because usually
> > it's *easier* to allow users to do what they want).  The second one is the
> > data integrity argument: Operations that may lead to data loss should never
> > be carried out, because it is [explicit content] disappointing to lose valuable
> > stuff by a stupid mistake if software allows that mistake to be made (that also
> > may be costly in terms of real money).
> > 
> > You seem to believe that we should always follow the usability argument, while
> > Toshi seems to be thinking that (at least in the case of the "system" devices),
> > the data integrity argument is more important.  They are both valid arguments,
> > however, and they are in conflict, so this is a matter of balance.
> > 
> > You're saying that in the case of disks we always follow the usability argument
> > entirely.  I'm fine with that, although I suspect that some people may not be
> > considering this as the right balance.
> > 
> > Toshi seems to be thinking that for the hotplug of memory/CPUs/host bridges we
> > should always follow the data integrity argument entirely, because the users of
> > that feature value their data so much that they pretty much don't care about
> > usability.  That very well may be the case, so I'm fine with that too, although
> > I'm sure there are people who'll argue that this is not the right balance
> > either.
> > 
> > Now, the point is that we *can* do what Toshi is arguing for and that doesn't
> > seem to be overly complicated, so my question is: Why don't we do that, at
> > least to start with?  If it turns out eventually that the users care about
> > usability too, after all, we can add a switch to adjust things more to their
> > liking.  Still, we can very well do that later.
> 
> Ok, I'd much rather deal with reviewing actual implementations than
> talking about theory at this point in time, so let's see what you all
> can come up with next and I'll be glad to review it.

Sure, thanks a lot for your comments so far!

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* [PATCH] perf/powerpc: Fix compile warnings
From: Sukadev Bhattiprolu @ 2013-02-05 23:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linuxppc-dev, sfr, mingo, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Tue, 5 Feb 2013 15:04:49 -0800
Subject: [PATCH] perf/powerpc: Fix compile warnings

Fix compile errors like those below:

  CC      arch/powerpc/perf/power7-pmu.o
/home/git/linux/arch/powerpc/perf/power7-pmu.c:397:2: error: initialization from
incompatible pointer type [-Werror]

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 arch/powerpc/include/asm/perf_event_server.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index ee63205..92460bc 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -124,7 +124,7 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
  * POWER CPU specification.
  */
 #define	EVENT_VAR(_id, _suffix)		event_attr_##_id##_suffix
-#define	EVENT_PTR(_id, _suffix)		&EVENT_VAR(_id, _suffix)
+#define	EVENT_PTR(_id, _suffix)		&EVENT_VAR(_id, _suffix).attr.attr
 
 #define	EVENT_ATTR(_name, _id, _suffix)					\
 	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id,	\
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence
From: Tang Chen @ 2013-02-06  3:07 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, liuj97, len.brown, Miao Xie,
	Wen Congyang, cmetcalf, wujianguo, yinghai, KAMEZAWA Hiroyuki,
	laijs, linux-kernel, minchan.kim, akpm, linuxppc-dev
In-Reply-To: <50ED8834.1090804@parallels.com>

Hi Glauber, all,

An old thing I want to discuss with you. :)

On 01/09/2013 11:09 PM, Glauber Costa wrote:
>>>> memory can't be offlined when CONFIG_MEMCG is selected.
>>>> For example: there is a memory device on node 1. The address range
>>>> is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
>>>> and memory11 under the directory /sys/devices/system/memory/.
>>>>
>>>> If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
>>>> when we online pages. When we online memory8, the memory stored page cgroup
>>>> is not provided by this memory device. But when we online memory9, the memory
>>>> stored page cgroup may be provided by memory8. So we can't offline memory8
>>>> now. We should offline the memory in the reversed order.
>>>>
>>>> When the memory device is hotremoved, we will auto offline memory provided
>>>> by this memory device. But we don't know which memory is onlined first, so
>>>> offlining memory may fail. In such case, iterate twice to offline the memory.
>>>> 1st iterate: offline every non primary memory block.
>>>> 2nd iterate: offline primary (i.e. first added) memory block.
>>>>
>>>> This idea is suggested by KOSAKI Motohiro.
>>>>
>>>> Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>
>>>
>>> Maybe there is something here that I am missing - I admit that I came
>>> late to this one, but this really sounds like a very ugly hack, that
>>> really has no place in here.
>>>
>>> Retrying, of course, may make sense, if we have reasonable belief that
>>> we may now succeed. If this is the case, you need to document - in the
>>> code - while is that.
>>>
>>> The memcg argument, however, doesn't really cut it. Why can't we make
>>> all page_cgroup allocations local to the node they are describing? If
>>> memcg is the culprit here, we should fix it, and not retry. If there is
>>> still any benefit in retrying, then we retry being very specific about why.
>>
>> We try to make all page_cgroup allocations local to the node they are describing
>> now. If the memory is the first memory onlined in this node, we will allocate
>> it from the other node.
>>
>> For example, node1 has 4 memory blocks: 8-11, and we online it from 8 to 11
>> 1. memory block 8, page_cgroup allocations are in the other nodes
>> 2. memory block 9, page_cgroup allocations are in memory block 8
>>
>> So we should offline memory block 9 first. But we don't know in which order
>> the user online the memory block.
>>
>> I think we can modify memcg like this:
>> allocate the memory from the memory block they are describing
>>
>> I am not sure it is OK to do so.
>
> I don't see a reason why not.
>
> You would have to tweak a bit the lookup function for page_cgroup, but
> assuming you will always have the pfns and limits, it should be easy to do.
>
> I think the only tricky part is that today we have a single
> node_page_cgroup, and we would of course have to have one per memory
> block. My assumption is that the number of memory blocks is limited and
> likely not very big. So even a static array would do.
>

About the idea "allocate the memory from the memory block they are 
describing",

online_pages()
  |-->memory_notify(MEM_GOING_ONLINE, &arg) ----------- memory of this 
section is not in buddy yet.
       |-->page_cgroup_callback()
            |-->online_page_cgroup()
                 |-->init_section_page_cgroup()
                      |-->alloc_page_cgroup() --------- allocate 
page_cgroup from buddy system.

When onlining pages, we allocate page_cgroup from buddy. And the being 
onlined pages are not in
buddy yet. I think we can reserve some memory in the section for 
page_cgroup, and return all the
rest to the buddy.

But when the system is booting,

start_kernel()
  |-->setup_arch()
  |-->mm_init()
  |    |-->mem_init()
  |         |-->numa_free_all_bootmem() -------------- all the pages are 
in buddy system.
  |-->page_cgroup_init()
       |-->init_section_page_cgroup()
            |-->alloc_page_cgroup() ------------------ I don't know how 
to reserve memory in each section.

So any idea about how to deal with it when the system is booting please?


And one more question, a memory section is 128MB in Linux. If we reserve 
part of the them for page_cgroup,
then anyone who wants to allocate a contiguous memory larger than 128MB, 
it will fail, right ?
Is it OK ?

Thanks. :)

^ permalink raw reply

* Re: [PATCH 1/3] powerpc/mpc512x: fix noderef sparse warnings
From: Kim Phillips @ 2013-02-06  3:11 UTC (permalink / raw)
  To: Anatolij Gustschin; +Cc: linuxppc-dev
In-Reply-To: <1360048818-3765-1-git-send-email-agust@denx.de>

On Tue, 5 Feb 2013 08:20:16 +0100
Anatolij Gustschin <agust@denx.de> wrote:

> Fix:
> warning: dereference of noderef expression
> 
> Signed-off-by: Anatolij Gustschin <agust@denx.de>
> ---
>  arch/powerpc/platforms/512x/clock.c |   18 +++++++++---------
>  1 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/512x/clock.c b/arch/powerpc/platforms/512x/clock.c
> index 7937361..8a784d4 100644
> --- a/arch/powerpc/platforms/512x/clock.c
> +++ b/arch/powerpc/platforms/512x/clock.c
> @@ -184,7 +184,7 @@ static unsigned long spmf_mult(void)
>  		36, 40, 44, 48,
>  		52, 56, 60, 64
>  	};
> -	int spmf = (clockctl->spmr >> 24) & 0xf;
> +	int spmf = (in_be32(&clockctl->spmr) >> 24) & 0xf;

power arch should start using the more portable i/o accessors
io{read,write}32be instead of the arch-centric {in,out}_be32.  The
io{read,write}32be functions have better sparse annotation, so an
endian check will complain if registers are not defined __be32.

Kim

^ permalink raw reply

* Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence
From: Tang Chen @ 2013-02-06  9:17 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, liuj97, len.brown, Miao Xie,
	Wen Congyang, cmetcalf, wujianguo, yinghai, KAMEZAWA Hiroyuki,
	laijs, linux-kernel, minchan.kim, akpm, linuxppc-dev
In-Reply-To: <5111C8EB.6090805@cn.fujitsu.com>

Hi all,

On 02/06/2013 11:07 AM, Tang Chen wrote:
> Hi Glauber, all,
>
> An old thing I want to discuss with you. :)
>
> On 01/09/2013 11:09 PM, Glauber Costa wrote:
>>>>> memory can't be offlined when CONFIG_MEMCG is selected.
>>>>> For example: there is a memory device on node 1. The address range
>>>>> is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>>>>> memory10,
>>>>> and memory11 under the directory /sys/devices/system/memory/.
>>>>>
>>>>> If CONFIG_MEMCG is selected, we will allocate memory to store page
>>>>> cgroup
>>>>> when we online pages. When we online memory8, the memory stored
>>>>> page cgroup
>>>>> is not provided by this memory device. But when we online memory9,
>>>>> the memory
>>>>> stored page cgroup may be provided by memory8. So we can't offline
>>>>> memory8
>>>>> now. We should offline the memory in the reversed order.
>>>>>
>>>>> When the memory device is hotremoved, we will auto offline memory
>>>>> provided
>>>>> by this memory device. But we don't know which memory is onlined
>>>>> first, so
>>>>> offlining memory may fail. In such case, iterate twice to offline
>>>>> the memory.
>>>>> 1st iterate: offline every non primary memory block.
>>>>> 2nd iterate: offline primary (i.e. first added) memory block.
>>>>>
>>>>> This idea is suggested by KOSAKI Motohiro.
>>>>>
>>>>> Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>
>>>>
>>>> Maybe there is something here that I am missing - I admit that I came
>>>> late to this one, but this really sounds like a very ugly hack, that
>>>> really has no place in here.
>>>>
>>>> Retrying, of course, may make sense, if we have reasonable belief that
>>>> we may now succeed. If this is the case, you need to document - in the
>>>> code - while is that.
>>>>
>>>> The memcg argument, however, doesn't really cut it. Why can't we make
>>>> all page_cgroup allocations local to the node they are describing? If
>>>> memcg is the culprit here, we should fix it, and not retry. If there is
>>>> still any benefit in retrying, then we retry being very specific
>>>> about why.
>>>
>>> We try to make all page_cgroup allocations local to the node they are
>>> describing
>>> now. If the memory is the first memory onlined in this node, we will
>>> allocate
>>> it from the other node.
>>>
>>> For example, node1 has 4 memory blocks: 8-11, and we online it from 8
>>> to 11
>>> 1. memory block 8, page_cgroup allocations are in the other nodes
>>> 2. memory block 9, page_cgroup allocations are in memory block 8
>>>
>>> So we should offline memory block 9 first. But we don't know in which
>>> order
>>> the user online the memory block.
>>>
>>> I think we can modify memcg like this:
>>> allocate the memory from the memory block they are describing
>>>
>>> I am not sure it is OK to do so.
>>
>> I don't see a reason why not.
>>
>> You would have to tweak a bit the lookup function for page_cgroup, but
>> assuming you will always have the pfns and limits, it should be easy
>> to do.
>>
>> I think the only tricky part is that today we have a single
>> node_page_cgroup, and we would of course have to have one per memory
>> block. My assumption is that the number of memory blocks is limited and
>> likely not very big. So even a static array would do.
>>
>
> About the idea "allocate the memory from the memory block they are
> describing",
>
> online_pages()
> |-->memory_notify(MEM_GOING_ONLINE, &arg) ----------- memory of this
> section is not in buddy yet.
> |-->page_cgroup_callback()
> |-->online_page_cgroup()
> |-->init_section_page_cgroup()
> |-->alloc_page_cgroup() --------- allocate page_cgroup from buddy system.
>
> When onlining pages, we allocate page_cgroup from buddy. And the being
> onlined pages are not in
> buddy yet. I think we can reserve some memory in the section for
> page_cgroup, and return all the
> rest to the buddy.
>
> But when the system is booting,
>
> start_kernel()
> |-->setup_arch()
> |-->mm_init()
> | |-->mem_init()
> | |-->numa_free_all_bootmem() -------------- all the pages are in buddy
> system.
> |-->page_cgroup_init()
> |-->init_section_page_cgroup()
> |-->alloc_page_cgroup() ------------------ I don't know how to reserve
> memory in each section.
>
> So any idea about how to deal with it when the system is booting please?
>

How about this way.

1) Add a new flag PAGE_CGROUP_INFO, like SECTION_INFO and MIX_SECTION_INFO.
2) In sparse_init(), reserve some beginning pages of each section as 
bootmem.
3) In register_page_bootmem_info_section(), set these pages as
      page->lru.next = PAGE_CGROUP_INFO;

Then these pages will not go to buddy system.

But I do worry about the fragment problem because part of each section will
be used in the very beginning.

Thanks. :)

>
> And one more question, a memory section is 128MB in Linux. If we reserve
> part of the them for page_cgroup,
> then anyone who wants to allocate a contiguous memory larger than 128MB,
> it will fail, right ?
> Is it OK ?
>
> Thanks. :)
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence
From: Tang Chen @ 2013-02-06 10:10 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, liuj97, len.brown, Miao Xie,
	Wen Congyang, cmetcalf, wujianguo, yinghai, KAMEZAWA Hiroyuki,
	laijs, linux-kernel, minchan.kim, akpm, linuxppc-dev
In-Reply-To: <51121FB7.1070205@cn.fujitsu.com>

On 02/06/2013 05:17 PM, Tang Chen wrote:
> Hi all,
>
> On 02/06/2013 11:07 AM, Tang Chen wrote:
>> Hi Glauber, all,
>>
>> An old thing I want to discuss with you. :)
>>
>> On 01/09/2013 11:09 PM, Glauber Costa wrote:
>>>>>> memory can't be offlined when CONFIG_MEMCG is selected.
>>>>>> For example: there is a memory device on node 1. The address range
>>>>>> is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>>>>>> memory10,
>>>>>> and memory11 under the directory /sys/devices/system/memory/.
>>>>>>
>>>>>> If CONFIG_MEMCG is selected, we will allocate memory to store page
>>>>>> cgroup
>>>>>> when we online pages. When we online memory8, the memory stored
>>>>>> page cgroup
>>>>>> is not provided by this memory device. But when we online memory9,
>>>>>> the memory
>>>>>> stored page cgroup may be provided by memory8. So we can't offline
>>>>>> memory8
>>>>>> now. We should offline the memory in the reversed order.
>>>>>>
>>>>>> When the memory device is hotremoved, we will auto offline memory
>>>>>> provided
>>>>>> by this memory device. But we don't know which memory is onlined
>>>>>> first, so
>>>>>> offlining memory may fail. In such case, iterate twice to offline
>>>>>> the memory.
>>>>>> 1st iterate: offline every non primary memory block.
>>>>>> 2nd iterate: offline primary (i.e. first added) memory block.
>>>>>>
>>>>>> This idea is suggested by KOSAKI Motohiro.
>>>>>>
>>>>>> Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>
>>>>>
>>>>> Maybe there is something here that I am missing - I admit that I came
>>>>> late to this one, but this really sounds like a very ugly hack, that
>>>>> really has no place in here.
>>>>>
>>>>> Retrying, of course, may make sense, if we have reasonable belief that
>>>>> we may now succeed. If this is the case, you need to document - in the
>>>>> code - while is that.
>>>>>
>>>>> The memcg argument, however, doesn't really cut it. Why can't we make
>>>>> all page_cgroup allocations local to the node they are describing? If
>>>>> memcg is the culprit here, we should fix it, and not retry. If
>>>>> there is
>>>>> still any benefit in retrying, then we retry being very specific
>>>>> about why.
>>>>
>>>> We try to make all page_cgroup allocations local to the node they are
>>>> describing
>>>> now. If the memory is the first memory onlined in this node, we will
>>>> allocate
>>>> it from the other node.
>>>>
>>>> For example, node1 has 4 memory blocks: 8-11, and we online it from 8
>>>> to 11
>>>> 1. memory block 8, page_cgroup allocations are in the other nodes
>>>> 2. memory block 9, page_cgroup allocations are in memory block 8
>>>>
>>>> So we should offline memory block 9 first. But we don't know in which
>>>> order
>>>> the user online the memory block.
>>>>
>>>> I think we can modify memcg like this:
>>>> allocate the memory from the memory block they are describing
>>>>
>>>> I am not sure it is OK to do so.
>>>
>>> I don't see a reason why not.
>>>
>>> You would have to tweak a bit the lookup function for page_cgroup, but
>>> assuming you will always have the pfns and limits, it should be easy
>>> to do.
>>>
>>> I think the only tricky part is that today we have a single
>>> node_page_cgroup, and we would of course have to have one per memory
>>> block. My assumption is that the number of memory blocks is limited and
>>> likely not very big. So even a static array would do.
>>>
>>
>> About the idea "allocate the memory from the memory block they are
>> describing",
>>
>> online_pages()
>> |-->memory_notify(MEM_GOING_ONLINE, &arg) ----------- memory of this
>> section is not in buddy yet.
>> |-->page_cgroup_callback()
>> |-->online_page_cgroup()
>> |-->init_section_page_cgroup()
>> |-->alloc_page_cgroup() --------- allocate page_cgroup from buddy system.
>>
>> When onlining pages, we allocate page_cgroup from buddy. And the being
>> onlined pages are not in
>> buddy yet. I think we can reserve some memory in the section for
>> page_cgroup, and return all the
>> rest to the buddy.
>>
>> But when the system is booting,
>>
>> start_kernel()
>> |-->setup_arch()
>> |-->mm_init()
>> | |-->mem_init()
>> | |-->numa_free_all_bootmem() -------------- all the pages are in buddy
>> system.
>> |-->page_cgroup_init()
>> |-->init_section_page_cgroup()
>> |-->alloc_page_cgroup() ------------------ I don't know how to reserve
>> memory in each section.
>>
>> So any idea about how to deal with it when the system is booting please?
>>
>
> How about this way.
>
> 1) Add a new flag PAGE_CGROUP_INFO, like SECTION_INFO and MIX_SECTION_INFO.
> 2) In sparse_init(), reserve some beginning pages of each section as
> bootmem.

Hi all,

After digging into bootmem code, I met another problem.

memblock allocates memory from high address to low address, using 
memblock.current_limit
to remember where the upper limit is. What I am doing will produce a lot 
of fragments,
and the memory will be non-contiguous. So we need to modify memblock again.

I don't think it's a good idea. How do you think ?

Thanks. :)

> 3) In register_page_bootmem_info_section(), set these pages as
> page->lru.next = PAGE_CGROUP_INFO;
>
> Then these pages will not go to buddy system.
>
> But I do worry about the fragment problem because part of each section will
> be used in the very beginning.
>
> Thanks. :)
>
>>
>> And one more question, a memory section is 128MB in Linux. If we reserve
>> part of the them for page_cgroup,
>> then anyone who wants to allocate a contiguous memory larger than 128MB,
>> it will fail, right ?
>> Is it OK ?
>>
>> Thanks. :)
>>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence
From: Glauber Costa @ 2013-02-06 14:24 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, liuj97, len.brown, Miao Xie,
	Wen Congyang, cmetcalf, wujianguo, yinghai, KAMEZAWA Hiroyuki,
	laijs, linux-kernel, minchan.kim, akpm, linuxppc-dev
In-Reply-To: <51122C1D.5020002@cn.fujitsu.com>

On 02/06/2013 02:10 PM, Tang Chen wrote:
> On 02/06/2013 05:17 PM, Tang Chen wrote:
>> Hi all,
>>
>> On 02/06/2013 11:07 AM, Tang Chen wrote:
>>> Hi Glauber, all,
>>>
>>> An old thing I want to discuss with you. :)
>>>
>>> On 01/09/2013 11:09 PM, Glauber Costa wrote:
>>>>>>> memory can't be offlined when CONFIG_MEMCG is selected.
>>>>>>> For example: there is a memory device on node 1. The address range
>>>>>>> is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>>>>>>> memory10,
>>>>>>> and memory11 under the directory /sys/devices/system/memory/.
>>>>>>>
>>>>>>> If CONFIG_MEMCG is selected, we will allocate memory to store page
>>>>>>> cgroup
>>>>>>> when we online pages. When we online memory8, the memory stored
>>>>>>> page cgroup
>>>>>>> is not provided by this memory device. But when we online memory9,
>>>>>>> the memory
>>>>>>> stored page cgroup may be provided by memory8. So we can't offline
>>>>>>> memory8
>>>>>>> now. We should offline the memory in the reversed order.
>>>>>>>
>>>>>>> When the memory device is hotremoved, we will auto offline memory
>>>>>>> provided
>>>>>>> by this memory device. But we don't know which memory is onlined
>>>>>>> first, so
>>>>>>> offlining memory may fail. In such case, iterate twice to offline
>>>>>>> the memory.
>>>>>>> 1st iterate: offline every non primary memory block.
>>>>>>> 2nd iterate: offline primary (i.e. first added) memory block.
>>>>>>>
>>>>>>> This idea is suggested by KOSAKI Motohiro.
>>>>>>>
>>>>>>> Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>
>>>>>>
>>>>>> Maybe there is something here that I am missing - I admit that I came
>>>>>> late to this one, but this really sounds like a very ugly hack, that
>>>>>> really has no place in here.
>>>>>>
>>>>>> Retrying, of course, may make sense, if we have reasonable belief
>>>>>> that
>>>>>> we may now succeed. If this is the case, you need to document - in
>>>>>> the
>>>>>> code - while is that.
>>>>>>
>>>>>> The memcg argument, however, doesn't really cut it. Why can't we make
>>>>>> all page_cgroup allocations local to the node they are describing? If
>>>>>> memcg is the culprit here, we should fix it, and not retry. If
>>>>>> there is
>>>>>> still any benefit in retrying, then we retry being very specific
>>>>>> about why.
>>>>>
>>>>> We try to make all page_cgroup allocations local to the node they are
>>>>> describing
>>>>> now. If the memory is the first memory onlined in this node, we will
>>>>> allocate
>>>>> it from the other node.
>>>>>
>>>>> For example, node1 has 4 memory blocks: 8-11, and we online it from 8
>>>>> to 11
>>>>> 1. memory block 8, page_cgroup allocations are in the other nodes
>>>>> 2. memory block 9, page_cgroup allocations are in memory block 8
>>>>>
>>>>> So we should offline memory block 9 first. But we don't know in which
>>>>> order
>>>>> the user online the memory block.
>>>>>
>>>>> I think we can modify memcg like this:
>>>>> allocate the memory from the memory block they are describing
>>>>>
>>>>> I am not sure it is OK to do so.
>>>>
>>>> I don't see a reason why not.
>>>>
>>>> You would have to tweak a bit the lookup function for page_cgroup, but
>>>> assuming you will always have the pfns and limits, it should be easy
>>>> to do.
>>>>
>>>> I think the only tricky part is that today we have a single
>>>> node_page_cgroup, and we would of course have to have one per memory
>>>> block. My assumption is that the number of memory blocks is limited and
>>>> likely not very big. So even a static array would do.
>>>>
>>>
>>> About the idea "allocate the memory from the memory block they are
>>> describing",
>>>
>>> online_pages()
>>> |-->memory_notify(MEM_GOING_ONLINE, &arg) ----------- memory of this
>>> section is not in buddy yet.
>>> |-->page_cgroup_callback()
>>> |-->online_page_cgroup()
>>> |-->init_section_page_cgroup()
>>> |-->alloc_page_cgroup() --------- allocate page_cgroup from buddy
>>> system.
>>>
>>> When onlining pages, we allocate page_cgroup from buddy. And the being
>>> onlined pages are not in
>>> buddy yet. I think we can reserve some memory in the section for
>>> page_cgroup, and return all the
>>> rest to the buddy.
>>>
>>> But when the system is booting,
>>>
>>> start_kernel()
>>> |-->setup_arch()
>>> |-->mm_init()
>>> | |-->mem_init()
>>> | |-->numa_free_all_bootmem() -------------- all the pages are in buddy
>>> system.
>>> |-->page_cgroup_init()
>>> |-->init_section_page_cgroup()
>>> |-->alloc_page_cgroup() ------------------ I don't know how to reserve
>>> memory in each section.
>>>
>>> So any idea about how to deal with it when the system is booting please?
>>>
>>
>> How about this way.
>>
>> 1) Add a new flag PAGE_CGROUP_INFO, like SECTION_INFO and
>> MIX_SECTION_INFO.
>> 2) In sparse_init(), reserve some beginning pages of each section as
>> bootmem.
> 
> Hi all,
> 
> After digging into bootmem code, I met another problem.
> 
> memblock allocates memory from high address to low address, using
> memblock.current_limit
> to remember where the upper limit is. What I am doing will produce a lot
> of fragments,
> and the memory will be non-contiguous. So we need to modify memblock again.
> 
> I don't think it's a good idea. How do you think ?
> 
> Thanks. :)
> 
>> 3) In register_page_bootmem_info_section(), set these pages as
>> page->lru.next = PAGE_CGROUP_INFO;
>>
>> Then these pages will not go to buddy system.
>>
>> But I do worry about the fragment problem because part of each section
>> will
>> be used in the very beginning.
>>
>> Thanks. :)
>>
>>>
>>> And one more question, a memory section is 128MB in Linux. If we reserve
>>> part of the them for page_cgroup,
>>> then anyone who wants to allocate a contiguous memory larger than 128MB,
>>> it will fail, right ?
>>> Is it OK ?
No, it is not.

Another take on this: Can't we free all the page_cgroup structure before
we actually start removing the sections ? If we do this, we would be
basically left with no problem at all, since when your code starts
running we would no longer have any page_cgroup allocated.

All you have to guarantee is that it happens after the memory block is
already isolated and allocations no longer can reach it.

What do you think ?

^ permalink raw reply

* ethtool occationally fails to communicate with with ucc_geth
From: Lennart Sorensen @ 2013-02-06 20:05 UTC (permalink / raw)
  To: Li Yang; +Cc: netdev, linuxppc-dev, linux-kernel, Len Sorensen

We are occationally seeing ethtool fail to communicate with ucc_geth.
I think I have tracked down why it happens, but I don't see a good way
to fix it.

When the phy state changes, adjust_link() checks if the state has changed
and if the link is up.  If it is it does:

                if (new_state) {
                        /*
                         * To change the MAC configuration we need to disable
                         * the controller. To do so, we have to either grab
                         * ugeth->lock, which is a bad idea since 'graceful
                         * stop' commands might take quite a while, or we can
                         * quiesce driver's activity.
                         */
                        ugeth_quiesce(ugeth);
                        ugeth_disable(ugeth, COMM_DIR_RX_AND_TX);

                        out_be32(&ug_regs->maccfg2, tempval);
                        out_be32(&uf_regs->upsmr, upsmr);

                        ugeth_enable(ugeth, COMM_DIR_RX_AND_TX);
                        ugeth_activate(ugeth);
                }

The problem I believe is that ugeth_quiesce() does netif_device_detach
which clears __LINK_STATE_PRESENT, and hence makes dev_ethtool fail
due to:

        if (!dev || !netif_device_present(dev))
                return -ENODEV;

So if ethtool happens to be run between ugeth_quiesce() and
ugeth_activate(), it fails as if the device simply doesn't exist, which
is of course not true, it's just temporarily disabled.

I don't see any obvious way to make the ethtool requests block while the
adjust_link does it's business.  It seems that that making the device
disappear is the wrong thing to do though.

I am able to make it happen if I do:

'while ethtool ifname; do :; done' while plugging and unplugging the
cable for a few minutes.

Any suggestions?

-- 
len Sorensen

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox