LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 2/5] IBM Akebono: Add support for a new PHY interface to the IBM emac driver
From: David Miller @ 2014-03-07 20:41 UTC (permalink / raw)
  To: alistair; +Cc: netdev, linuxppc-dev, linux-kernel, devicetree
In-Reply-To: <1394077948-8395-3-git-send-email-alistair@popple.id.au>

From: Alistair Popple <alistair@popple.id.au>
Date: Thu,  6 Mar 2014 14:52:25 +1100

> +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);

When an expression spans multiple lines, the lines should end with
operators rather than begin with them.

Also, it would be easier to read this call if parentheses were put
around the third argument and then indented to match, f.e.:

	out_be32(dev->reg, (in_be32(dev->reg) | WKUP_ETH_RGMIIEN |
			    WKUP_ETH_TX_OE | WKUP_ETH_RX_IE));

^ permalink raw reply
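
For comparison, the two layouts discussed above can be compiled side by
side. This is only a sketch: the WKUP_ETH_* bit values below are made-up
placeholders (the real definitions are not shown in this thread), and a
plain uint32_t argument stands in for the in_be32()/out_be32() register
access. The function names are hypothetical.

```c
#include <stdint.h>

/* Placeholder bit values, assumed for illustration only. */
#define WKUP_ETH_RGMIIEN	0x1u
#define WKUP_ETH_TX_OE		0x2u
#define WKUP_ETH_RX_IE		0x4u

/* Layout from the patch: the continuation line begins with an operator. */
static uint32_t enable_leading(uint32_t reg)
{
	return reg | WKUP_ETH_RGMIIEN
		   | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE;
}

/* Suggested layout: lines end with an operator, expression parenthesized. */
static uint32_t enable_trailing(uint32_t reg)
{
	return (reg | WKUP_ETH_RGMIIEN |
		WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
}
```

Both functions compute the same value; the difference is purely one of
layout.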

* rfc: checkpatch logical line continuations (was IBM Akebono: Add support for a new PHY interface to the IBM emac driver)
From: Joe Perches @ 2014-03-07 21:02 UTC (permalink / raw)
  To: David Miller
  Cc: Randy Dunlap, devicetree, Dan Carpenter, alistair, linux-kernel,
	Josh Triplett, netdev, Andrew Morton, linuxppc-dev
In-Reply-To: <20140307.154142.488351276799532264.davem@davemloft.net>

(added some cc's)

On Fri, 2014-03-07 at 15:41 -0500, David Miller wrote:
> From: Alistair Popple <alistair@popple.id.au>
> Date: Thu,  6 Mar 2014 14:52:25 +1100
> 
> > +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> > +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
> 
> When an expression spans multiple lines, the lines should end with
> operators rather than begin with them.

That's not in CodingStyle currently.

Right now, checkpatch emits a --strict only warning on "&&" or "||"
at the beginning of line but that could be changed to any "$Operators"

our $Arithmetic = qr{\+|-|\*|\/|%};
our $Operators	= qr{
			<=|>=|==|!=|
			=>|->|<<|>>|<|>|!|~|
			&&|\|\||,|\^|\+\+|--|&|\||$Arithmetic
		  }x;

The ones that likely have a too high false positive rates
are the negation "!" and bitwise "~".

Also, using perl, it's hard to distinguish between a
logical "&" and the address-of "&" as well as the
multiplication "*" and indirection "*" so maybe those
should be excluded too.

And I think it should only be added as a --strict test.

^ permalink raw reply
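
The exclusion idea above can be sketched in C. checkpatch itself is
Perl, so this is not its implementation, only an illustration under
stated assumptions: the function name is hypothetical, and every
operator that has a unary or otherwise ambiguous reading at the start
of a line ("!", "~", "&", "*", "+", "-", "++", "--", "->") is simply
left out of the list, as is ",".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if a line, after leading whitespace, starts with an
 * operator that can only be binary.  Comment openers are not operators.
 */
static bool starts_with_binary_operator(const char *line)
{
	/* Longer operators first so "<<" is not matched as "<". */
	static const char *const ops[] = {
		"&&", "||", "<<", ">>", "<=", ">=", "==", "!=",
		"|", "^", "/", "%", "<", ">",
	};
	size_t i;

	line += strspn(line, " \t");

	/* A leading "/" may open a comment rather than divide. */
	if (strncmp(line, "/*", 2) == 0 || strncmp(line, "//", 2) == 0)
		return false;

	for (i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
		if (strncmp(line, ops[i], strlen(ops[i])) == 0)
			return true;
	}
	return false;
}
```

With this list, the second line of the quoted patch would be reported,
while a line beginning with "!foo", "&bar" or "*ptr" would not.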

* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2014-03-07 21:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel list

Hi Linus !

Here are a couple of powerpc fixes for 3.14. One is (another !) nasty TM
problem, we can crash the kernel by forking inside a transaction. The
other one is a simple fix for an alignment issue which can hurt in LE
mode.

Cheers,
Ben.

The following changes since commit e0cf957614976896111e676e5134ac98ee227d3d:

  powerpc/powernv: Fix indirect XSCOM unmangling (2014-02-28 19:15:49 +1100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to a5b2cf5b1af424ee3dd9e3ce6d5cea18cb927e67:

  powerpc: Align p_dyn, p_rela and p_st symbols (2014-03-07 13:50:19 +1100)

----------------------------------------------------------------
Anton Blanchard (1):
      powerpc: Align p_dyn, p_rela and p_st symbols

Michael Neuling (1):
      powerpc/tm: Fix crash when forking inside a transaction

 arch/powerpc/kernel/process.c  | 9 +++++++++
 arch/powerpc/kernel/reloc_64.S | 1 +
 2 files changed, 10 insertions(+)

^ permalink raw reply

* Re: rfc: checkpatch logical line continuations
From: David Miller @ 2014-03-07 21:23 UTC (permalink / raw)
  To: joe
  Cc: randy.dunlap, devicetree, error27, alistair, linux-kernel, josh,
	netdev, akpm, linuxppc-dev
In-Reply-To: <1394226164.16156.96.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Fri, 07 Mar 2014 13:02:44 -0800

> Right now, checkpatch emits a --strict only warning on "&&" or "||"
> at the beginning of line but that could be changed to any "$Operators"
> 
> our $Arithmetic = qr{\+|-|\*|\/|%};
> our $Operators	= qr{
> 			<=|>=|==|!=|
> 			=>|->|<<|>>|<|>|!|~|
> 			&&|\|\||,|\^|\+\+|--|&|\||$Arithmetic
> 		  }x;
> 
> The ones that likely have a too high false positive rates
> are the negation "!" and bitwise "~".

Unary operators at the beginning of a line are perfectly fine,
it's the other ones that are the problem.

^ permalink raw reply

* Re: rfc: checkpatch logical line continuations (was IBM Akebono: Add support for a new PHY interface to the IBM emac driver)
From: Joe Perches @ 2014-03-07 21:45 UTC (permalink / raw)
  To: josh
  Cc: devicetree, Dan Carpenter, alistair, Randy Dunlap, linux-kernel,
	netdev, Andrew Morton, linuxppc-dev, David Miller
In-Reply-To: <20140307213017.GA18769@cloud>

On Fri, 2014-03-07 at 13:30 -0800, josh@joshtriplett.org wrote:
> On Fri, Mar 07, 2014 at 01:02:44PM -0800, Joe Perches wrote:
> > On Fri, 2014-03-07 at 15:41 -0500, David Miller wrote:
> > > From: Alistair Popple <alistair@popple.id.au>
> > > Date: Thu,  6 Mar 2014 14:52:25 +1100
> > > 
> > > > +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> > > > +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
> > > 
> > > When an expression spans multiple lines, the lines should end with
> > > operators rather than begin with them.
> > 
> > That's not in CodingStyle currently.
> 
> It's also not even remotely consistent across existing kernel code, and
> it isn't obvious that there's a general developer consensus on the
> "right" way to write it.

I agree with that.  Stuff that's not in CodingStyle generally
doesn't have a developer consensus.

> > Right now, checkpatch emits a --strict only warning on "&&" or "||"
> > at the beginning of line but that could be changed to any "$Operators"
> > 
> > our $Arithmetic = qr{\+|-|\*|\/|%};
> > our $Operators	= qr{
> > 			<=|>=|==|!=|
> > 			=>|->|<<|>>|<|>|!|~|
> > 			&&|\|\||,|\^|\+\+|--|&|\||$Arithmetic
> > 		  }x;
> > 
> > The ones that likely have a too high false positive rates
> > are the negation "!" and bitwise "~".
> 
> I don't think warning about operators at start of line seems like a good
> idea at all.  There are plenty of cases where putting the operator at
> the start of the line will produce a better result.  (I'd actually
> suggest that in *most* cases.)
> 
> > Also, using perl, it's hard to distinguish between a
> > logical "&" and the address-of "&" as well as the
> > multiplication "*" and indirection "*" so maybe those
> > should be excluded too.
> > 
> > And I think it should only be added as a --strict test.
> 
> Agreed, if even that.

And probably made specific to net/ and drivers/net like
a few other comment style tests until such time as a
consensus exists.

^ permalink raw reply

* Re: rfc: checkpatch logical line continuations (was IBM Akebono: Add support for a new PHY interface to the IBM emac driver)
From: josh @ 2014-03-07 21:30 UTC (permalink / raw)
  To: Joe Perches
  Cc: Randy Dunlap, devicetree, Dan Carpenter, alistair, linux-kernel,
	netdev, Andrew Morton, linuxppc-dev, David Miller
In-Reply-To: <1394226164.16156.96.camel@joe-AO722>

On Fri, Mar 07, 2014 at 01:02:44PM -0800, Joe Perches wrote:
> On Fri, 2014-03-07 at 15:41 -0500, David Miller wrote:
> > From: Alistair Popple <alistair@popple.id.au>
> > Date: Thu,  6 Mar 2014 14:52:25 +1100
> > 
> > > +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> > > +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
> > 
> > When an expression spans multiple lines, the lines should end with
> > operators rather than begin with them.
> 
> That's not in CodingStyle currently.

It's also not even remotely consistent across existing kernel code, and
it isn't obvious that there's a general developer consensus on the
"right" way to write it.

> Right now, checkpatch emits a --strict only warning on "&&" or "||"
> at the beginning of line but that could be changed to any "$Operators"
> 
> our $Arithmetic = qr{\+|-|\*|\/|%};
> our $Operators	= qr{
> 			<=|>=|==|!=|
> 			=>|->|<<|>>|<|>|!|~|
> 			&&|\|\||,|\^|\+\+|--|&|\||$Arithmetic
> 		  }x;
> 
> The ones that likely have a too high false positive rates
> are the negation "!" and bitwise "~".

I don't think warning about operators at start of line seems like a good
idea at all.  There are plenty of cases where putting the operator at
the start of the line will produce a better result.  (I'd actually
suggest that in *most* cases.)

> Also, using perl, it's hard to distinguish between a
> logical "&" and the address-of "&" as well as the
> multiplication "*" and indirection "*" so maybe those
> should be excluded too.
> 
> And I think it should only be added as a --strict test.

Agreed, if even that.

- Josh Triplett

^ permalink raw reply

* Re: rfc: checkpatch logical line continuations (was IBM Akebono: Add support for a new PHY interface to the IBM emac driver)
From: Dan Carpenter @ 2014-03-07 23:04 UTC (permalink / raw)
  To: josh
  Cc: Randy Dunlap, devicetree, Dan Carpenter, alistair, linux-kernel,
	netdev, Joe Perches, Andrew Morton, linuxppc-dev, David Miller
In-Reply-To: <20140307213017.GA18769@cloud>

On Fri, Mar 07, 2014 at 01:30:17PM -0800, josh@joshtriplett.org wrote:
> On Fri, Mar 07, 2014 at 01:02:44PM -0800, Joe Perches wrote:
> > On Fri, 2014-03-07 at 15:41 -0500, David Miller wrote:
> > > From: Alistair Popple <alistair@popple.id.au>
> > > Date: Thu,  6 Mar 2014 14:52:25 +1100
> > > 
> > > > +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> > > > +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
> > > 
> > > When an expression spans multiple lines, the lines should end with
> > > operators rather than begin with them.
> > 
> > That's not in CodingStyle currently.
> 
> It's also not even remotely consistent across existing kernel code, and
> it isn't obvious that there's a general developer consensus on the
> "right" way to write it.
> 

We just had this discussion in staging and Greg modified the patch to
put the operator at the end.

https://lkml.org/lkml/2014/2/25/125

It's like logical && and || operators which go at the end these days.
I don't really want to have a lot of checkpatch churn to convert
everything...

regards,
dan carpenter

^ permalink raw reply

* Re: rfc: checkpatch logical line continuations (was IBM Akebono: Add support for a new PHY interface to the IBM emac driver)
From: Joe Perches @ 2014-03-07 23:15 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Randy Dunlap, devicetree, Dan Carpenter, alistair, josh,
	linux-kernel, netdev, Andrew Morton, linuxppc-dev, David Miller
In-Reply-To: <20140307230420.GH29018@mwanda>

On Sat, 2014-03-08 at 02:04 +0300, Dan Carpenter wrote:
> On Fri, Mar 07, 2014 at 01:30:17PM -0800, josh@joshtriplett.org wrote:
> > On Fri, Mar 07, 2014 at 01:02:44PM -0800, Joe Perches wrote:
> > > On Fri, 2014-03-07 at 15:41 -0500, David Miller wrote:
> > > > From: Alistair Popple <alistair@popple.id.au>
> > > > Date: Thu,  6 Mar 2014 14:52:25 +1100
> > > > 
> > > > > +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> > > > > +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
> > > > 
> > > > When an expression spans multiple lines, the lines should end with
> > > > operators rather than begin with them.
> > > 
> > > That's not in CodingStyle currently.
> > 
> > It's also not even remotely consistent across existing kernel code, and
> > it isn't obvious that there's a general developer consensus on the
> > "right" way to write it.
> > 
> 
> We just had this discussion in staging and Greg modified the patch to
> put the operator at the end.
> 
> https://lkml.org/lkml/2014/2/25/125

I remember and it's the reason I bring it up in a
more public way.

> It's like logical && and || operators which go at the end these days.
> I don't really want to have a lot of checkpatch churn to convert
> everything...

Nor I really.  I simply would like a tool that lets
more core maintainers like David M avoid sending out
"do this, not that" type emails about patches.

I don't mind adding style checking that emits something
for patches and is quieter when scanning files.

^ permalink raw reply

* [PATCH] eeh_pseries: Missing break?
From: Joe Perches @ 2014-03-08  0:31 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev, linux-kernel

Looks like this is unintentional as the
result = EEH_STATE_UNAVAILABLE is being
overwritten by EEH_STATE_NOT_SUPPORT in the
fallthrough to the default case.
---
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 8a8f047..83da53f 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -460,14 +460,15 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
 		case 5:
 			if (rets[2]) {
 				if (state) *state = rets[2];
 				result = EEH_STATE_UNAVAILABLE;
 			} else {
 				result = EEH_STATE_NOT_SUPPORT;
 			}
+			break;
 		default:
 			result = EEH_STATE_NOT_SUPPORT;
 		}
 	} else {
 		result = EEH_STATE_NOT_SUPPORT;
 	}
 

^ permalink raw reply related
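
The effect of the missing break can be reproduced outside the kernel.
The sketch below is a reduced stand-in, not the pseries code: the
STATE_* values, the function names and the "busy" parameter are all
invented for illustration.

```c
/* Stand-in result codes; the kernel's EEH_STATE_* values differ. */
enum { STATE_NOT_SUPPORT = -1, STATE_UNAVAILABLE = 1 };

/* Without the break, case 5 falls through and its result is lost. */
static int state_without_break(int code, int busy)
{
	int result;

	switch (code) {
	case 5:
		if (busy)
			result = STATE_UNAVAILABLE;
		else
			result = STATE_NOT_SUPPORT;
		/* no break here: control continues into default */
	default:
		result = STATE_NOT_SUPPORT;
	}
	return result;
}

/* With the break, the value chosen in case 5 is returned. */
static int state_with_break(int code, int busy)
{
	int result;

	switch (code) {
	case 5:
		if (busy)
			result = STATE_UNAVAILABLE;
		else
			result = STATE_NOT_SUPPORT;
		break;
	default:
		result = STATE_NOT_SUPPORT;
	}
	return result;
}
```

For code 5 with busy set, the first variant returns STATE_NOT_SUPPORT
and the second returns STATE_UNAVAILABLE.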

* Re: [PATCH] eeh_pseries: Missing break?
From: Gavin Shan @ 2014-03-08 16:16 UTC (permalink / raw)
  To: Joe Perches; +Cc: linuxppc-dev, Gavin Shan, linux-kernel
In-Reply-To: <1394238692.16156.115.camel@joe-AO722>

On Fri, Mar 07, 2014 at 04:31:32PM -0800, Joe Perches wrote:
>Looks like this is unintentional as the
>result = EEH_STATE_UNAVAILABLE is being
>overwritten by EEH_STATE_NOT_SUPPORT in the
>fallthrough to the default case.

Thanks, Joe. It wasn't unintentional. Could you have better commit log
and subject, then repost it?

The format looks like:

---

powerpc/eeh: Fix overwritten PE state

In pseries_eeh_get_state(), we always have EEH_STATE_UNAVAILABLE
overwritten by EEH_STATE_NOT_SUPPORT because of the missed "break"
the patch fixes the issue.

Signed-off-by: Joe Perches <joe@perches.com>

---

With the better commit log/subject, please have:

Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>

>---
>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>index 8a8f047..83da53f 100644
>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>@@ -460,14 +460,15 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
> 		case 5:
> 			if (rets[2]) {
> 				if (state) *state = rets[2];
> 				result = EEH_STATE_UNAVAILABLE;
> 			} else {
> 				result = EEH_STATE_NOT_SUPPORT;
> 			}
>+			break;
> 		default:
> 			result = EEH_STATE_NOT_SUPPORT;
> 		}
> 	} else {
> 		result = EEH_STATE_NOT_SUPPORT;
> 	}
>

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH] eeh_pseries: Missing break?
From: Joe Perches @ 2014-03-08 16:26 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20140308161647.GA24296@shangw.(null)>

On Sun, 2014-03-09 at 00:16 +0800, Gavin Shan wrote:
> On Fri, Mar 07, 2014 at 04:31:32PM -0800, Joe Perches wrote:
> >Looks like this is unintentional as the
> >result = EEH_STATE_UNAVAILABLE is being
> >overwritten by EEH_STATE_NOT_SUPPORT in the
> >fallthrough to the default case.
> 
> Thanks, Joe. It wasn't unintentional.

Hi Gavin.

English usages of "double negatives" are different
than other languages.  "it wasn't unintentional"
means the same thing as "it was intentional".

> Could you have better commit log
> and subject, then repost it?
> 
> The format looks like:
> 
> ---
> 
> powerpc/eeh: Fix overwritten PE state
> 
> In pseries_eeh_get_state(), we always have EEH_STATE_UNAVAILABLE
> overwritten by EEH_STATE_NOT_SUPPORT because of the missed "break"
> the patch fixes the issue.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

From my perspective, you should write up a commit
message of your own choice (I wouldn't use "we",
but the rest seems OK) and add a Reported-by:

All I did was notice it and bring it to your
attention.

> ---
> 
> With the better commit log/subject, please have:
> 
> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> 
> >---
> >diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >index 8a8f047..83da53f 100644
> >--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> >+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >@@ -460,14 +460,15 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
> > 		case 5:
> > 			if (rets[2]) {
> > 				if (state) *state = rets[2];
> > 				result = EEH_STATE_UNAVAILABLE;
> > 			} else {
> > 				result = EEH_STATE_NOT_SUPPORT;
> > 			}
> >+			break;
> > 		default:
> > 			result = EEH_STATE_NOT_SUPPORT;
> > 		}
> > 	} else {
> > 		result = EEH_STATE_NOT_SUPPORT;
> > 	}
> >
> 
> Thanks,
> Gavin
> 

^ permalink raw reply

* Re: [PATCH] eeh_pseries: Missing break?
From: Gavin Shan @ 2014-03-08 16:37 UTC (permalink / raw)
  To: Joe Perches; +Cc: linuxppc-dev, Gavin Shan, linux-kernel
In-Reply-To: <1394296003.6972.26.camel@joe-AO722>

On Sat, Mar 08, 2014 at 08:26:43AM -0800, Joe Perches wrote:
>On Sun, 2014-03-09 at 00:16 +0800, Gavin Shan wrote:
>> On Fri, Mar 07, 2014 at 04:31:32PM -0800, Joe Perches wrote:

.../...

>English usages of "double negatives" are different
>than other languages.  "it wasn't unintentional"
>means the same thing as "it was intentional".
>

Sorry, typo :)

>> Could you have better commit log
>> and subject, then repost it?
>> 

.../...

>From my perspective, you should write up a commit
>message of your own choice (I wouldn't use "we",
>but the rest seems OK) and add a Reported-by:
>
>All I did was notice it and bring it to your
>attention.
>

Ok. I will post it. Thanks!

Thanks,
Gavin


>> >---
>> >diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> >index 8a8f047..83da53f 100644
>> >--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>> >+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> >@@ -460,14 +460,15 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>> > 		case 5:
>> > 			if (rets[2]) {
>> > 				if (state) *state = rets[2];
>> > 				result = EEH_STATE_UNAVAILABLE;
>> > 			} else {
>> > 				result = EEH_STATE_NOT_SUPPORT;
>> > 			}
>> >+			break;
>> > 		default:
>> > 			result = EEH_STATE_NOT_SUPPORT;
>> > 		}
>> > 	} else {
>> > 		result = EEH_STATE_NOT_SUPPORT;
>> > 	}
>> >
>> 
>> Thanks,
>> Gavin
>> 
>
>
>

^ permalink raw reply

* RE: rfc: checkpatch logical line continuations (was IBM Akebono: Add support for a new PHY interface to the IBM emac driver)
From: David Laight @ 2014-03-10  9:53 UTC (permalink / raw)
  To: 'josh@joshtriplett.org', Joe Perches
  Cc: Randy Dunlap, devicetree@vger.kernel.org, Dan Carpenter,
	alistair@popple.id.au, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org, David Miller
In-Reply-To: <20140307213017.GA18769@cloud>

From: josh@joshtriplett.org
> On Fri, Mar 07, 2014 at 01:02:44PM -0800, Joe Perches wrote:
> > On Fri, 2014-03-07 at 15:41 -0500, David Miller wrote:
> > > From: Alistair Popple <alistair@popple.id.au>
> > > Date: Thu,  6 Mar 2014 14:52:25 +1100
> > >
> > > > +	out_be32(dev->reg, in_be32(dev->reg) | WKUP_ETH_RGMIIEN
> > > > +		 | WKUP_ETH_TX_OE | WKUP_ETH_RX_IE);
> > >
> > > When an expression spans multiple lines, the lines should end with
> > > operators rather than begin with them.
> >
> > That's not in CodingStyle currently.
>
> It's also not even remotely consistent across existing kernel code, and
> it isn't obvious that there's a general developer consensus on the
> "right" way to write it.

My personal preference (which counts for nothing here) is to put
the operators at the start of the continuation line in order to
make it more obvious that it is a continuation.

The netdev rules are particularly problematical for code like:
        if (tst(foo, foo2, foo3, ...) && ....... &&
                tst2(......) && tst3()) {
                baz(....);
where a scan read of the LHS gives the wrong logic.

At least we don't have a coding style that allows very long lines
and puts } and { on their own lines - leading to:
                ...
        }
        while (foo(...) && bar(...) && ..... /* very long line falls off screen */
        {
                int x;
Is that the top or bottom of a loop?

	David

^ permalink raw reply
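
The readability point about multi-line conditions can be tried with a
compilable example. This is a sketch only: the predicates and function
names are hypothetical, and both functions are logically identical.
What differs is whether the left edge of the continuation line shows an
operator or something that looks like a statement.

```c
#include <stdbool.h>

/* Hypothetical predicates, present only to give the condition a shape. */
static bool in_range(int v) { return v >= 0 && v < 100; }
static bool is_even(int v)  { return (v % 2) == 0; }
static bool is_small(int v) { return v < 10; }

/* Trailing operators: the continuation starts with a function call. */
static int count_trailing(int a, int b, int c)
{
	int hits = 0;

	if (in_range(a) && is_even(b) &&
	    is_small(c))
		hits++;
	return hits;
}

/* Leading operators: the continuation visibly starts with "&&". */
static int count_leading(int a, int b, int c)
{
	int hits = 0;

	if (in_range(a) && is_even(b)
	    && is_small(c))
		hits++;
	return hits;
}
```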

* Re: [PATCH 1/2] Revert "KVM: PPC: Book3S HV: Add new state for transactional memory"
From: Paul Mackerras @ 2014-03-10 10:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Paolo Bonzini, Scott Wood
  Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1394102170-22126-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Thu, Mar 06, 2014 at 04:06:09PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This reverts commit 7b490411c37f7ab7965cbdfe5e3ec28eadb6db5b which cause
> the below crash in the host.

OK, I understand now what happened, which is this: when I sent out
that patch, I inadvertently included a hunk of extra code as a result
of not cleaning up a rebase properly.  The next patch in the series
removed the extraneous hunk, but Alex didn't apply the next patch.

We can either do this revert, or apply a patch removing the extra
hunk, but one or the other should go in for 3.14 since it's quite
broken as it is (that is, HV-mode KVM on powerpc is broken).

Paolo, do you have a preference about revert vs. fix?  Are you happy
to take what Aneesh sent (in which case please add my acked-by and
perhaps edit the commentary to say how the problem arose), or do you
want a freshly-prepared patch, and if so against which branch?

Thanks,
Paul.

^ permalink raw reply

* Re: [PATCH 1/2] Revert "KVM: PPC: Book3S HV: Add new state for transactional memory"
From: Paolo Bonzini @ 2014-03-10 10:51 UTC (permalink / raw)
  To: Paul Mackerras, Aneesh Kumar K.V, Scott Wood
  Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <20140310105028.GA5934@iris.ozlabs.ibm.com>

Il 10/03/2014 11:50, Paul Mackerras ha scritto:
> We can either do this revert, or apply a patch removing the extra
> hunk, but one or the other should go in for 3.14 since it's quite
> broken as it is (that is, HV-mode KVM on powerpc is broken).
>
> Paolo, do you have a preference about revert vs. fix?  Are you happy
> to take what Aneesh sent (in which case please add my acked-by and
> perhaps edit the commentary to say how the problem arose), or do you
> want a freshly-prepared patch, and if so against which branch?

I prefer a fix.

Paolo

^ permalink raw reply

* [PATCH v2 0/6] powernv:cpufreq: Dynamic cpu-frequency scaling
From: Gautham R. Shenoy @ 2014-03-10 11:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srivatsa.bhat, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

This is v2 of the consolidated patchset, consisting of
patches for enabling cpufreq on IBM POWERNV platforms
along with some enhancements.

The v1 patches were previously
submitted on linuxppc-dev [1][2].

- This patchset contains code for the platform driver to support CPU
  frequency scaling on IBM POWERNV platforms.

- In addition to the standard control and status files exposed by the
  cpufreq core, the patchset exposes the nominal frequency through the
  file named "cpuinfo_nominal_freq".

The patchset is based against commit c3bebc71c4bcdafa24b506adf0c1de3c1f77e2e0
of the mainline tree.

[1]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-February/115244.html
[2]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-March/115703.html

Gautham R. Shenoy (3):
  powernv:cpufreq: Create pstate_id_to_freq() helper
  powernv:cpufreq: Export nominal frequency via sysfs.
  powernv:cpufreq: Implement the driver->get() method

Srivatsa S. Bhat (2):
  powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.
  powernv,cpufreq:Add per-core locking to serialize frequency
    transitions

Vaidyanathan Srinivasan (1):
  powernv: cpufreq driver for powernv platform

 arch/powerpc/include/asm/reg.h         |   4 +
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 drivers/cpufreq/Kconfig                |   1 +
 drivers/cpufreq/Kconfig.powerpc        |  13 ++
 drivers/cpufreq/Makefile               |   1 +
 drivers/cpufreq/powernv-cpufreq.c      | 397 +++++++++++++++++++++++++++++++++
 6 files changed, 417 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

-- 
1.8.3.1

^ permalink raw reply
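
As a rough picture of what a pstate-id to frequency mapping involves,
here is a standalone sketch. It assumes, as patch 1/6 does, that
firmware supplies two parallel lists (PState ids and frequencies in
MHz) and that the table is stored in kHz with an end marker. Only the
name pstate_id_to_freq() comes from the series; its body, the struct,
FREQ_TABLE_END and build_pstate_table() are assumptions made here, and
the actual helper may look quite different.

```c
#include <stddef.h>

#define FREQ_TABLE_END	(~0u)	/* stand-in for CPUFREQ_TABLE_END */
#define MAX_PSTATES	256

struct pstate_entry {
	int id;			/* firmware PState id */
	unsigned int khz;	/* frequency in kHz */
};

static struct pstate_entry pstate_table[MAX_PSTATES + 1];

/* Fill the table from parallel id/MHz arrays and append the end marker. */
static size_t build_pstate_table(const int *ids, const unsigned int *mhz,
				 size_t n)
{
	size_t i;

	if (n > MAX_PSTATES)
		n = MAX_PSTATES;
	for (i = 0; i < n; i++) {
		pstate_table[i].id = ids[i];
		pstate_table[i].khz = mhz[i] * 1000;	/* MHz -> kHz */
	}
	pstate_table[i].id = 0;
	pstate_table[i].khz = FREQ_TABLE_END;
	return n;
}

/* Linear lookup; returns 0 when the id is not in the table. */
static unsigned int pstate_id_to_freq(int id)
{
	size_t i;

	for (i = 0; pstate_table[i].khz != FREQ_TABLE_END; i++) {
		if (pstate_table[i].id == id)
			return pstate_table[i].khz;
	}
	return 0;
}
```

A linear scan is used because the table holds at most 256 entries;
nothing in the cover letter says how the real helper searches.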

* [PATCH v2 1/6] powernv: cpufreq driver for powernv platform
From: Gautham R. Shenoy @ 2014-03-10 11:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anton Blanchard, srivatsa.bhat, Gautham R. Shenoy
In-Reply-To: <1394449861-8688-1-git-send-email-ego@linux.vnet.ibm.com>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
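The PMCR update performed by set_pstate() in the patch below can be
checked in isolation. This is a minimal userspace sketch, assuming the
field layout stated in the driver comment (global PState in bits
56..63, local PState in bits 48..55); the function name is hypothetical
and no SPR is touched.

```c
#include <stdint.h>

/* Return pmcr with both PState fields replaced by pstate. */
static uint64_t pmcr_with_pstate(uint64_t pmcr, uint8_t pstate)
{
	/* Clear the two PState fields held in the top 16 bits. */
	pmcr &= 0x0000ffffffffffffULL;
	/* Global PState in bits 56..63, local PState in bits 48..55. */
	pmcr |= ((uint64_t)pstate << 56) | ((uint64_t)pstate << 48);
	return pmcr;
}
```

The low 48 bits of the register value pass through unchanged.
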
 arch/powerpc/include/asm/reg.h         |   4 +
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 drivers/cpufreq/Kconfig                |   1 +
 drivers/cpufreq/Kconfig.powerpc        |  13 ++
 drivers/cpufreq/Makefile               |   1 +
 drivers/cpufreq/powernv-cpufreq.c      | 277 +++++++++++++++++++++++++++++++++
 6 files changed, 297 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 90c06ec..84f92ca 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -271,6 +271,10 @@
 #define SPRN_HSRR1	0x13B	/* Hypervisor Save/Restore 1 */
 #define SPRN_IC		0x350	/* Virtual Instruction Count */
 #define SPRN_VTB	0x351	/* Virtual Time Base */
+#define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
+#define SPRN_PMSR	0x355   /* Power Management Status Reg */
+#define SPRN_PMCR	0x374	/* Power Management Control Register */
+
 /* HFSCR and FSCR bit numbers are the same */
 #define FSCR_TAR_LG	8	/* Enable Target Address Register */
 #define FSCR_EBB_LG	7	/* Enable Event Based Branching */
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 895e8a2..1fe12b1 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -11,6 +11,7 @@ config PPC_POWERNV
 	select PPC_UDBG_16550
 	select PPC_SCOM
 	select ARCH_RANDOM
+	select CPU_FREQ
 	default y
 
 config PPC_POWERNV_RTAS
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 4b029c0..4ba1632 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -48,6 +48,7 @@ config CPU_FREQ_STAT_DETAILS
 choice
 	prompt "Default CPUFreq governor"
 	default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
+	default CPU_FREQ_DEFAULT_GOV_ONDEMAND if POWERNV_CPUFREQ
 	default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
 	help
 	  This option sets which CPUFreq governor shall be loaded at
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
index ca0021a..93f8689 100644
--- a/drivers/cpufreq/Kconfig.powerpc
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -54,3 +54,16 @@ config PPC_PASEMI_CPUFREQ
 	help
 	  This adds the support for frequency switching on PA Semi
 	  PWRficient processors.
+
+config POWERNV_CPUFREQ
+       tristate "CPU frequency scaling for IBM POWERNV platform"
+       depends on PPC_POWERNV
+       select CPU_FREQ_GOV_PERFORMANCE
+       select CPU_FREQ_GOV_POWERSAVE
+       select CPU_FREQ_GOV_USERSPACE
+       select CPU_FREQ_GOV_ONDEMAND
+       select CPU_FREQ_GOV_CONSERVATIVE
+       default y
+       help
+	 This adds support for CPU frequency switching on IBM POWERNV
+	 platform
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 7494565..0dbb963 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_PPC_CORENET_CPUFREQ)   += ppc-corenet-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC)		+= pmac32-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC64)		+= pmac64-cpufreq.o
 obj-$(CONFIG_PPC_PASEMI_CPUFREQ)	+= pasemi-cpufreq.o
+obj-$(CONFIG_POWERNV_CPUFREQ)		+= powernv-cpufreq.o
 
 ##################################################################################
 # Other platform drivers
diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
new file mode 100644
index 0000000..ab1551f
--- /dev/null
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -0,0 +1,277 @@
+/*
+ * POWERNV cpufreq driver for the IBM POWER processors
+ *
+ * (C) Copyright IBM 2014
+ *
+ * Author: Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt)	"powernv-cpufreq: " fmt
+
+#include <linux/module.h>
+#include <linux/cpufreq.h>
+#include <linux/of.h>
+#include <asm/cputhreads.h>
+
+/* FIXME: Make this per-core */
+static DEFINE_MUTEX(freq_switch_mutex);
+
+#define POWERNV_MAX_PSTATES	256
+
+static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
+static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
+
+/*
+ * Initialize the freq table based on data obtained
+ * from the firmware passed via device-tree
+ */
+
+static int init_powernv_pstates(void)
+{
+	struct device_node *power_mgt;
+	int nr_pstates = 0;
+	int pstate_min, pstate_max, pstate_nominal;
+	const __be32 *pstate_ids, *pstate_freqs;
+	int i;
+	u32 len_ids, len_freqs;
+
+	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+	if (!power_mgt) {
+		pr_warn("power-mgt node not found\n");
+		return -ENODEV;
+	}
+
+	if (of_property_read_u32(power_mgt, "ibm,pstate-min", &pstate_min)) {
+		pr_warn("ibm,pstate-min node not found\n");
+		return -ENODEV;
+	}
+
+	if (of_property_read_u32(power_mgt, "ibm,pstate-max", &pstate_max)) {
+		pr_warn("ibm,pstate-max node not found\n");
+		return -ENODEV;
+	}
+
+	if (of_property_read_u32(power_mgt, "ibm,pstate-nominal",
+				 &pstate_nominal)) {
+		pr_warn("ibm,pstate-nominal not found\n");
+		return -ENODEV;
+	}
+	pr_info("cpufreq pstate min %d nominal %d max %d\n", pstate_min,
+		pstate_nominal, pstate_max);
+
+	pstate_ids = of_get_property(power_mgt, "ibm,pstate-ids", &len_ids);
+	if (!pstate_ids) {
+		pr_warn("ibm,pstate-ids not found\n");
+		return -ENODEV;
+	}
+
+	pstate_freqs = of_get_property(power_mgt, "ibm,pstate-frequencies-mhz",
+				      &len_freqs);
+	if (!pstate_freqs) {
+		pr_warn("ibm,pstate-frequencies-mhz not found\n");
+		return -ENODEV;
+	}
+
+	WARN_ON(len_ids != len_freqs);
+	nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
+	WARN_ON(!nr_pstates);
+
+	pr_debug("NR PStates %d\n", nr_pstates);
+	for (i = 0; i < nr_pstates; i++) {
+		u32 id = be32_to_cpu(pstate_ids[i]);
+		u32 freq = be32_to_cpu(pstate_freqs[i]);
+
+		pr_debug("PState id %d freq %d MHz\n", id, freq);
+		powernv_freqs[i].driver_data = i;
+		powernv_freqs[i].frequency = freq * 1000; /* kHz */
+		powernv_pstate_ids[i] = id;
+	}
+	/* End of list marker entry */
+	powernv_freqs[i].driver_data = 0;
+	powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
+
+	/* Print frequency table */
+	for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
+		pr_debug("%d: %d\n", i, powernv_freqs[i].frequency);
+
+	return 0;
+}
+
+static struct freq_attr *powernv_cpu_freq_attr[] = {
+	&cpufreq_freq_attr_scaling_available_freqs,
+	NULL,
+};
+
+/* Helper routines */
+
+/* Access helpers to power mgt SPR */
+
+static inline unsigned long get_pmspr(unsigned long sprn)
+{
+	switch (sprn) {
+	case SPRN_PMCR:
+		return mfspr(SPRN_PMCR);
+
+	case SPRN_PMICR:
+		return mfspr(SPRN_PMICR);
+
+	case SPRN_PMSR:
+		return mfspr(SPRN_PMSR);
+	}
+	BUG();
+}
+
+static inline void set_pmspr(unsigned long sprn, unsigned long val)
+{
+	switch (sprn) {
+	case SPRN_PMCR:
+		mtspr(SPRN_PMCR, val);
+		return;
+
+	case SPRN_PMICR:
+		mtspr(SPRN_PMICR, val);
+		return;
+
+	case SPRN_PMSR:
+		mtspr(SPRN_PMSR, val);
+		return;
+	}
+	BUG();
+}
+
+static void set_pstate(void *pstate)
+{
+	unsigned long val;
+	unsigned long pstate_ul = *(unsigned long *) pstate;
+
+	val = get_pmspr(SPRN_PMCR);
+	val = val & 0x0000ffffffffffffULL;
+	/* Set both global(bits 56..63) and local(bits 48..55) PStates */
+	val = val | (pstate_ul << 56) | (pstate_ul << 48);
+	pr_debug("Setting cpu %d pmcr to %016lX\n", smp_processor_id(), val);
+	set_pmspr(SPRN_PMCR, val);
+}
+
+static int powernv_set_freq(cpumask_var_t cpus, unsigned int new_index)
+{
+	unsigned long val = (unsigned long) powernv_pstate_ids[new_index];
+
+	/*
+	 * Use smp_call_function to send IPI and execute the
+	 * mtspr on target cpu.  We could do that without IPI
+	 * if current CPU is within policy->cpus (core)
+	 */
+
+	val = val & 0xFF;
+	smp_call_function_any(cpus, set_pstate, &val, 1);
+	return 0;
+}
+
+static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+	int base, i;
+
+#ifdef CONFIG_SMP
+	base = cpu_first_thread_sibling(policy->cpu);
+
+	for (i = 0; i < threads_per_core; i++)
+		cpumask_set_cpu(base + i, policy->cpus);
+#endif
+	policy->cpuinfo.transition_latency = 25000;
+
+	policy->cur = powernv_freqs[0].frequency;
+	cpufreq_frequency_table_get_attr(powernv_freqs, policy->cpu);
+	return cpufreq_frequency_table_cpuinfo(policy, powernv_freqs);
+}
+
+static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+	cpufreq_frequency_table_put_attr(policy->cpu);
+	return 0;
+}
+
+static int powernv_cpufreq_verify(struct cpufreq_policy *policy)
+{
+	return cpufreq_frequency_table_verify(policy, powernv_freqs);
+}
+
+static int powernv_cpufreq_target(struct cpufreq_policy *policy,
+			      unsigned int target_freq,
+			      unsigned int relation)
+{
+	int rc;
+	struct cpufreq_freqs freqs;
+	unsigned int new_index;
+
+	cpufreq_frequency_table_target(policy, powernv_freqs, target_freq,
+				       relation, &new_index);
+
+	freqs.old = policy->cur;
+	freqs.new = powernv_freqs[new_index].frequency;
+	freqs.cpu = policy->cpu;
+
+	mutex_lock(&freq_switch_mutex);
+	cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
+
+	pr_debug("setting frequency for cpu %d to %d kHz index %d pstate %d",
+		 policy->cpu,
+		 powernv_freqs[new_index].frequency,
+		 new_index,
+		 powernv_pstate_ids[new_index]);
+
+	rc = powernv_set_freq(policy->cpus, new_index);
+
+	cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
+	mutex_unlock(&freq_switch_mutex);
+
+	return rc;
+}
+
+static struct cpufreq_driver powernv_cpufreq_driver = {
+	.verify		= powernv_cpufreq_verify,
+	.target		= powernv_cpufreq_target,
+	.init		= powernv_cpufreq_cpu_init,
+	.exit		= powernv_cpufreq_cpu_exit,
+	.name		= "powernv-cpufreq",
+	.flags		= CPUFREQ_CONST_LOOPS,
+	.attr		= powernv_cpu_freq_attr,
+};
+
+static int __init powernv_cpufreq_init(void)
+{
+	int rc = 0;
+
+	/* Discover pstates from device tree and init */
+
+	rc = init_powernv_pstates();
+
+	if (rc) {
+		pr_info("powernv-cpufreq disabled\n");
+		return rc;
+	}
+
+	rc = cpufreq_register_driver(&powernv_cpufreq_driver);
+	return rc;
+}
+
+static void __exit powernv_cpufreq_exit(void)
+{
+	cpufreq_unregister_driver(&powernv_cpufreq_driver);
+}
+
+module_init(powernv_cpufreq_init);
+module_exit(powernv_cpufreq_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>");
-- 
1.8.3.1

* [PATCH v2 5/6] powernv:cpufreq: Export nominal frequency via sysfs.
From: Gautham R. Shenoy @ 2014-03-10 11:11 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srivatsa.bhat, Gautham R. Shenoy
In-Reply-To: <1394449861-8688-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Create a driver attribute named cpuinfo_nominal_freq, which adds a
read-only sysfs file of the same name. Export the frequency
corresponding to the nominal_pstate through this interface.

Nominal frequency is the highest non-turbo frequency for the
platform.  This is generally used for setting governor policies from
user space for optimal energy efficiency.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 0ecd163..183bbc4 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -142,8 +142,30 @@ static unsigned int pstate_id_to_freq(int pstate_id)
 	return powernv_freqs[i].frequency;
 }
 
+/**
+ * show_cpuinfo_nominal_freq - Show the nominal CPU frequency as indicated by
+ * the firmware
+ */
+static ssize_t show_cpuinfo_nominal_freq(struct cpufreq_policy *policy,
+					char *buf)
+{
+	unsigned int nominal_freq;
+	nominal_freq = pstate_id_to_freq(powernv_pstate_info.pstate_nominal_id);
+	return sprintf(buf, "%u\n", nominal_freq);
+}
+
+
+struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq = {
+	.attr = { .name = "cpuinfo_nominal_freq",
+		  .mode = 0444,
+		},
+	.show = show_cpuinfo_nominal_freq,
+};
+
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
 	&cpufreq_freq_attr_scaling_available_freqs,
+	&cpufreq_freq_attr_cpuinfo_nominal_freq,
 	NULL,
 };
 
-- 
1.8.3.1

* [PATCH v2 3/6] powernv, cpufreq:Add per-core locking to serialize frequency transitions
From: Gautham R. Shenoy @ 2014-03-10 11:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srivatsa.bhat, Gautham R. Shenoy
In-Reply-To: <1394449861-8688-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>

On POWER systems, the CPU frequency is controlled at a core-level and
hence we need to serialize so that only one of the threads in the core
switches the core's frequency at a time.

Using a global mutex lock would needlessly serialize _all_ frequency
transitions in the system (across all cores). So introduce per-core
locking to enable finer-grained synchronization and thereby enhance
the speed and responsiveness of the cpufreq driver to varying workload
demands.

The design of per-core locking is simple and straightforward: we
first define a per-CPU lock and use the one that belongs to the first
thread sibling of the core.

The cpu_first_thread_sibling() macro is used to find the *common* lock for
all thread siblings belonging to a core.
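
As a rough user-space sketch of the lock-selection rule described above,
every thread of a core maps to the lock slot of the core's first thread.
THREADS_PER_CORE, NR_CPUS, struct core_lock and the helper names are
assumptions made for illustration, and the sketch assumes the thread
count is a power of two; the patch itself uses a per-CPU struct mutex
and cpu_first_thread_sibling().

```c
#include <assert.h>
#include <stddef.h>

#define THREADS_PER_CORE 8u	/* assumption: SMT8, must be a power of two */
#define NR_CPUS 64u		/* assumption: illustrative system size */

/* Stand-in for struct mutex; only its address matters in this sketch. */
struct core_lock {
	int locked;
};

/* One slot per CPU, but only the first thread's slot of each core is used. */
static struct core_lock freq_switch_lock[NR_CPUS];

/* Number of the first hardware thread in the core that 'cpu' belongs to. */
static unsigned int first_thread_sibling(unsigned int cpu)
{
	return cpu & ~(THREADS_PER_CORE - 1u);
}

/* Lock shared by all thread siblings of the core containing 'cpu'. */
static struct core_lock *core_lock_for(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return &freq_switch_lock[first_thread_sibling(cpu)];
}
```

With this mapping, CPUs 8..15 contend on one lock while CPUs 0..7 use
another, so transitions on different cores do not serialize against
each other.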

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 4cad727..4c2e8ca 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -24,8 +24,15 @@
 #include <linux/of.h>
 #include <asm/cputhreads.h>
 
-/* FIXME: Make this per-core */
-static DEFINE_MUTEX(freq_switch_mutex);
+/* Per-Core locking for frequency transitions */
+static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
+
+#define lock_core_freq(cpu)				\
+			mutex_lock(&per_cpu(freq_switch_lock,\
+				cpu_first_thread_sibling(cpu)))
+#define unlock_core_freq(cpu)				\
+			mutex_unlock(&per_cpu(freq_switch_lock,\
+				cpu_first_thread_sibling(cpu)))
 
 #define POWERNV_MAX_PSTATES	256
 
@@ -233,7 +240,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy *policy,
 	freqs.new = powernv_freqs[new_index].frequency;
 	freqs.cpu = policy->cpu;
 
-	mutex_lock(&freq_switch_mutex);
+	lock_core_freq(policy->cpu);
 	cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
 
 	pr_debug("setting frequency for cpu %d to %d kHz index %d pstate %d",
@@ -245,7 +252,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy *policy,
 	rc = powernv_set_freq(policy->cpus, new_index);
 
 	cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
-	mutex_unlock(&freq_switch_mutex);
+	unlock_core_freq(policy->cpu);
 
 	return rc;
 }
@@ -262,7 +269,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 
 static int __init powernv_cpufreq_init(void)
 {
-	int rc = 0;
+	int cpu, rc = 0;
 
 	/* Discover pstates from device tree and init */
 
@@ -272,6 +279,10 @@ static int __init powernv_cpufreq_init(void)
 		pr_info("powernv-cpufreq disabled\n");
 		return rc;
 	}
+	/* Init per-core mutex */
+	for_each_possible_cpu(cpu) {
+		mutex_init(&per_cpu(freq_switch_lock, cpu));
+	}
 
 	rc = cpufreq_register_driver(&powernv_cpufreq_driver);
 	return rc;
-- 
1.8.3.1

* [PATCH v2 2/6] powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.
From: Gautham R. Shenoy @ 2014-03-10 11:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srivatsa.bhat, Gautham R. Shenoy
In-Reply-To: <1394449861-8688-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>

Create a helper method that computes the cpumask corresponding to the
thread-siblings of a cpu. Use this for initializing the policy->cpus
mask for a given cpu.

(Original code written by Srivatsa S. Bhat. Gautham moved this to a
helper function!)
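
The sibling-mask computation can be sketched outside the kernel with a
plain 64-bit bitmap in place of a cpumask. THREADS_PER_CORE and
sketch_core_mask() are hypothetical names, and the sketch assumes a
power-of-two thread count and at most 64 CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define THREADS_PER_CORE 4u	/* assumption: SMT4, must be a power of two */

/* Bitmap with one bit set for each thread sibling of 'cpu'. */
static uint64_t sketch_core_mask(unsigned int cpu)
{
	unsigned int base = cpu & ~(THREADS_PER_CORE - 1u);
	uint64_t mask = 0;
	unsigned int i;

	/* The bitmap in this sketch only covers CPUs 0..63. */
	assert(base + THREADS_PER_CORE <= 64u);

	for (i = 0; i < THREADS_PER_CORE; i++)
		mask |= UINT64_C(1) << (base + i);

	return mask;
}
```

Any thread of a core yields the same mask, which is why the result can
be used directly to initialize policy->cpus.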

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ab1551f..4cad727 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -115,6 +115,23 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
 
 /* Helper routines */
 
+/**
+ * Sets the bits corresponding to the thread-siblings of cpu in its core
+ * in 'cpus'.
+ */
+static void powernv_cpu_to_core_mask(unsigned int cpu, cpumask_var_t cpus)
+{
+	int base, i;
+
+	base = cpu_first_thread_sibling(cpu);
+
+	for (i = 0; i < threads_per_core; i++) {
+		cpumask_set_cpu(base + i, cpus);
+	}
+
+	return;
+}
+
 /* Access helpers to power mgt SPR */
 
 static inline unsigned long get_pmspr(unsigned long sprn)
@@ -180,13 +197,8 @@ static int powernv_set_freq(cpumask_var_t cpus, unsigned int new_index)
 
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-	int base, i;
-
 #ifdef CONFIG_SMP
-	base = cpu_first_thread_sibling(policy->cpu);
-
-	for (i = 0; i < threads_per_core; i++)
-		cpumask_set_cpu(base + i, policy->cpus);
+	powernv_cpu_to_core_mask(policy->cpu, policy->cpus);
 #endif
 	policy->cpuinfo.transition_latency = 25000;
 
-- 
1.8.3.1

* [PATCH v2 6/6] powernv:cpufreq: Implement the driver->get() method
From: Gautham R. Shenoy @ 2014-03-10 11:11 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srivatsa.bhat, Gautham R. Shenoy
In-Reply-To: <1394449861-8688-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The current frequency of a cpu is reported through the sysfs file
cpuinfo_cur_freq. This requires the driver to implement a
"->get(unsigned int cpu)" method which will return the current
operating frequency.

Implement a function named powernv_cpufreq_get() which reads the local
pstate from the PMSR and returns the corresponding frequency.

Set the powernv_cpufreq_driver.get hook to powernv_cpufreq_get().
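
The field extraction and sign handling can be sketched in plain C as
follows. This assumes, as the s8 in the patch suggests, that pstate ids
are signed 8-bit values and that the local pstate is obtained with a
right shift by 48; pmsr_local_pstate() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the local pstate id from a PMSR value and sign-extend it. */
static int pmsr_local_pstate(uint64_t pmsr)
{
	uint8_t raw = (uint8_t)((pmsr >> 48) & 0xFF);
	int id;

	/* Sign-extend by hand to avoid implementation-defined conversions. */
	id = raw >= 0x80 ? (int)raw - 256 : (int)raw;

	assert(id >= -128 && id <= 127);
	return id;
}
```

A raw field of 0xFE therefore decodes to pstate id -2 rather than 254,
which matters when the id is used to look up a frequency.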

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 48 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 183bbc4..6f3b6e1 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -223,6 +223,53 @@ static inline void set_pmspr(unsigned long sprn, unsigned long val)
 	BUG();
 }
 
+/*
+ * Computes the current frequency on this cpu
+ * and stores the result in *ret_freq.
+ */
+static void powernv_read_cpu_freq(void *ret_freq)
+{
+	unsigned long pmspr_val;
+	s8 local_pstate_id;
+	int *cur_freq, freq, pstate_id;
+
+	cur_freq = (int *)ret_freq;
+	pmspr_val = get_pmspr(SPRN_PMSR);
+
+	/* The local pstate id corresponds to bits 48..55 in the PMSR.
+	 * Note: Watch out for the sign! */
+	local_pstate_id = (pmspr_val >> 48) & 0xFF;
+	pstate_id = local_pstate_id;
+
+	freq = pstate_id_to_freq(pstate_id);
+	pr_debug("cpu %d pmsr %lx pstate_id %d frequency %d \n",
+		smp_processor_id(), pmspr_val, pstate_id, freq);
+	*cur_freq = freq;
+}
+
+/*
+ * Returns the cpu frequency as reported by the firmware for 'cpu'.
+ * This value is reported through the sysfs file cpuinfo_cur_freq.
+ */
+unsigned int powernv_cpufreq_get(unsigned int cpu)
+{
+	int ret_freq;
+	cpumask_var_t sibling_mask;
+
+	if (unlikely(!zalloc_cpumask_var(&sibling_mask, GFP_KERNEL))) {
+		smp_call_function_single(cpu, powernv_read_cpu_freq,
+					&ret_freq, 1);
+		return ret_freq;
+	}
+
+	powernv_cpu_to_core_mask(cpu, sibling_mask);
+	smp_call_function_any(sibling_mask, powernv_read_cpu_freq,
+			&ret_freq, 1);
+
+	free_cpumask_var(sibling_mask);
+	return ret_freq;
+}
+
 static void set_pstate(void *pstate)
 {
 	unsigned long val;
@@ -309,6 +356,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy *policy,
 static struct cpufreq_driver powernv_cpufreq_driver = {
 	.verify		= powernv_cpufreq_verify,
 	.target		= powernv_cpufreq_target,
+	.get		= powernv_cpufreq_get,
 	.init		= powernv_cpufreq_cpu_init,
 	.exit		= powernv_cpufreq_cpu_exit,
 	.name		= "powernv-cpufreq",
-- 
1.8.3.1

* [PATCH v2 4/6] powernv:cpufreq: Create pstate_id_to_freq() helper
From: Gautham R. Shenoy @ 2014-03-10 11:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srivatsa.bhat, Gautham R. Shenoy
In-Reply-To: <1394449861-8688-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Create a helper routine that can return the cpu-frequency for the
corresponding pstate_id.

Also, cache the values of pstate_max, pstate_min,
pstate_nominal and nr_pstates in a static structure so that they can
be reused in the future to perform any validations.
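
The lookup relies on the table being ordered from the highest pstate id
downwards in steps of one, so that the index is pstate_max minus the
id. A minimal stand-alone sketch of that idea follows; the table
contents are made-up values, struct pstate_table and
sketch_pstate_id_to_freq() are hypothetical names, and the sketch
returns 0 for an unknown id where the patch uses BUG_ON() and WARN_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative table; the ids and frequencies are made-up values. */
static const int example_ids[] = { 0, -1, -2, -3 };
static const unsigned int example_khz[] = {
	4000000, 3800000, 3600000, 3400000,
};

struct pstate_table {
	int max_id;			/* id stored at index 0 */
	int nr_pstates;			/* number of valid entries */
	const int *ids;
	const unsigned int *freq_khz;
};

static const struct pstate_table example_table = {
	.max_id = 0,
	.nr_pstates = 4,
	.ids = example_ids,
	.freq_khz = example_khz,
};

/* Returns the frequency in kHz, or 0 if the id is not in the table. */
static unsigned int sketch_pstate_id_to_freq(const struct pstate_table *t,
					     int pstate_id)
{
	int i;

	assert(t != NULL);

	/* Ids descend by one per entry, so the distance from max is the index. */
	i = t->max_id - pstate_id;
	if (i < 0 || i >= t->nr_pstates || t->ids[i] != pstate_id)
		return 0;

	return t->freq_khz[i];
}
```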

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 4c2e8ca..0ecd163 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -39,6 +39,14 @@ static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
+struct powernv_pstate_info {
+	int pstate_min_id;
+	int pstate_max_id;
+	int pstate_nominal_id;
+	int nr_pstates;
+};
+static struct powernv_pstate_info powernv_pstate_info;
+
 /*
  * Initialize the freq table based on data obtained
  * from the firmware passed via device-tree
@@ -112,9 +120,28 @@ static int init_powernv_pstates(void)
 	for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
 		pr_debug("%d: %d\n", i, powernv_freqs[i].frequency);
 
+	powernv_pstate_info.pstate_min_id = pstate_min;
+	powernv_pstate_info.pstate_max_id = pstate_max;
+	powernv_pstate_info.pstate_nominal_id = pstate_nominal;
+	powernv_pstate_info.nr_pstates = nr_pstates;
+
 	return 0;
 }
 
+/**
+ * Returns the cpu frequency corresponding to the pstate_id.
+ */
+static unsigned int pstate_id_to_freq(int pstate_id)
+{
+	int i;
+
+	i = powernv_pstate_info.pstate_max_id - pstate_id;
+
+	BUG_ON(i >= powernv_pstate_info.nr_pstates || i < 0);
+	WARN_ON(powernv_pstate_ids[i] != pstate_id);
+	return powernv_freqs[i].frequency;
+}
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
 	&cpufreq_freq_attr_scaling_available_freqs,
 	NULL,
-- 
1.8.3.1

* Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
From: Sudeep Holla @ 2014-03-10 11:12 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras,
	linux-kernel@vger.kernel.org, Sudeep Holla
In-Reply-To: <531963D9.5040701@linux.vnet.ibm.com>

Hi Anshuman,

On 07/03/14 06:14, Anshuman Khandual wrote:
> On 03/07/2014 09:36 AM, Anshuman Khandual wrote:
>> On 02/19/2014 09:36 PM, Sudeep Holla wrote:
>>> From: Sudeep Holla <sudeep.holla@arm.com>
>>>
>>> This patch removes the redundant sysfs cacheinfo code by making use of
>>> the newly introduced generic cacheinfo infrastructure.
>>>
>>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> Cc: Paul Mackerras <paulus@samba.org>
>>> Cc: linuxppc-dev@lists.ozlabs.org
>>> ---
>>>   arch/powerpc/kernel/cacheinfo.c | 831 ++++++----------------------------------
>>>   arch/powerpc/kernel/cacheinfo.h |   8 -
>>>   arch/powerpc/kernel/sysfs.c     |   4 -
>>>   3 files changed, 109 insertions(+), 734 deletions(-)
>>>   delete mode 100644 arch/powerpc/kernel/cacheinfo.h
>>>
>>> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
>>> index 2912b87..05b7580 100644
>>> --- a/arch/powerpc/kernel/cacheinfo.c
>>> +++ b/arch/powerpc/kernel/cacheinfo.c
>>> @@ -10,38 +10,10 @@
>>>    * 2 as published by the Free Software Foundation.
>>>    */
>>>
>>> +#include <linux/cacheinfo.h>
>>>   #include <linux/cpu.h>
>>> -#include <linux/cpumask.h>
>>>   #include <linux/kernel.h>
>>> -#include <linux/kobject.h>
>>> -#include <linux/list.h>
>>> -#include <linux/notifier.h>
>>>   #include <linux/of.h>
>>> -#include <linux/percpu.h>
>>> -#include <linux/slab.h>
>>> -#include <asm/prom.h>
>>> -
>>> -#include "cacheinfo.h"
>>> -
>>> -/* per-cpu object for tracking:
>>> - * - a "cache" kobject for the top-level directory
>>> - * - a list of "index" objects representing the cpu's local cache hierarchy
>>> - */
>>> -struct cache_dir {
>>> -	struct kobject *kobj; /* bare (not embedded) kobject for cache
>>> -			       * directory */
>>> -	struct cache_index_dir *index; /* list of index objects */
>>> -};
>>> -
>>> -/* "index" object: each cpu's cache directory has an index
>>> - * subdirectory corresponding to a cache object associated with the
>>> - * cpu.  This object's lifetime is managed via the embedded kobject.
>>> - */
>>> -struct cache_index_dir {
>>> -	struct kobject kobj;
>>> -	struct cache_index_dir *next; /* next index in parent directory */
>>> -	struct cache *cache;
>>> -};
>>>
>>>   /* Template for determining which OF properties to query for a given
>>>    * cache type */
>>> @@ -60,11 +32,6 @@ struct cache_type_info {
>>>   	const char *nr_sets_prop;
>>>   };
>>>
>>> -/* These are used to index the cache_type_info array. */
>>> -#define CACHE_TYPE_UNIFIED     0
>>> -#define CACHE_TYPE_INSTRUCTION 1
>>> -#define CACHE_TYPE_DATA        2
>>> -
>>>   static const struct cache_type_info cache_type_info[] = {
>>>   	{
>>>   		/* PowerPC Processor binding says the [di]-cache-*
>>> @@ -77,246 +44,115 @@ static const struct cache_type_info cache_type_info[] = {
>>>   		.nr_sets_prop    = "d-cache-sets",
>>>   	},
>>>   	{
>>> -		.name            = "Instruction",
>>> -		.size_prop       = "i-cache-size",
>>> -		.line_size_props = { "i-cache-line-size",
>>> -				     "i-cache-block-size", },
>>> -		.nr_sets_prop    = "i-cache-sets",
>>> -	},
>>> -	{
>>>   		.name            = "Data",
>>>   		.size_prop       = "d-cache-size",
>>>   		.line_size_props = { "d-cache-line-size",
>>>   				     "d-cache-block-size", },
>>>   		.nr_sets_prop    = "d-cache-sets",
>>>   	},
>>> +	{
>>> +		.name            = "Instruction",
>>> +		.size_prop       = "i-cache-size",
>>> +		.line_size_props = { "i-cache-line-size",
>>> +				     "i-cache-block-size", },
>>> +		.nr_sets_prop    = "i-cache-sets",
>>> +	},
>>>   };
>>
>>
>> Hey Sudeep,
>>
>> After applying this patch, the cache_type_info array looks like this.
>>
>> static const struct cache_type_info cache_type_info[] = {
>>          {
>>                  /*
>>                   * PowerPC Processor binding says the [di]-cache-*
>>                   * must be equal on unified caches, so just use
>>                   * d-cache properties.
>>                   */
>>                  .name            = "Unified",
>>                  .size_prop       = "d-cache-size",
>>                  .line_size_props = { "d-cache-line-size",
>>                                       "d-cache-block-size", },
>>                  .nr_sets_prop    = "d-cache-sets",
>>          },
>>          {
>>                  .name            = "Data",
>>                  .size_prop       = "d-cache-size",
>>                  .line_size_props = { "d-cache-line-size",
>>                                       "d-cache-block-size", },
>>                  .nr_sets_prop    = "d-cache-sets",
>>          },
>>          {
>>                  .name            = "Instruction",
>>                  .size_prop       = "i-cache-size",
>>                  .line_size_props = { "i-cache-line-size",
>>                                       "i-cache-block-size", },
>>                  .nr_sets_prop    = "i-cache-sets",
>>          },
>> };
>>
>> and this function computes the array index for any given cache type
>> defined for PowerPC.
>>
>> static inline int get_cacheinfo_idx(enum cache_type type)
>> {
>>          if (type == CACHE_TYPE_UNIFIED)
>>                  return 0;
>>          else
>>                  return type;
>> }
>>
>> These types are defined in include/linux/cacheinfo.h as
>>
>> enum cache_type {
>>          CACHE_TYPE_NOCACHE = 0,
>>          CACHE_TYPE_INST = BIT(0),		---> 1
>>          CACHE_TYPE_DATA = BIT(1),		---> 2
>>          CACHE_TYPE_SEPARATE = CACHE_TYPE_INST | CACHE_TYPE_DATA,
>>          CACHE_TYPE_UNIFIED = BIT(2),
>> };
>>
>> When it is UNIFIED we return index 0, which is correct. But the indices
>> for the instruction and data caches seem to be swapped, which is wrong.
>> This will fetch the wrong properties for those cache types.
>>

Ah, that's a silly mistake on my side, will fix it.
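
The index mismatch pointed out above can be reproduced with a small
stand-alone sketch. The enum values follow the quoted
include/linux/cacheinfo.h excerpt; table_name[], idx_as_posted() and
idx_explicit() are hypothetical names, and the explicit switch is only
one possible way to fix the mapping, not necessarily what the next
version of the patch does.

```c
#include <assert.h>
#include <string.h>

#define BIT(n) (1u << (n))

enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = BIT(0),
	CACHE_TYPE_DATA = BIT(1),
	CACHE_TYPE_SEPARATE = CACHE_TYPE_INST | CACHE_TYPE_DATA,
	CACHE_TYPE_UNIFIED = BIT(2),
};

/* Order of the entries in the array after the patch is applied. */
static const char *const table_name[] = { "Unified", "Data", "Instruction" };

/* Index computation as posted: the enum value is reused as the index. */
static int idx_as_posted(enum cache_type type)
{
	return type == CACHE_TYPE_UNIFIED ? 0 : (int)type;
}

/* Hypothetical alternative: map each type to its slot explicitly. */
static int idx_explicit(enum cache_type type)
{
	switch (type) {
	case CACHE_TYPE_UNIFIED:
		return 0;
	case CACHE_TYPE_DATA:
		return 1;
	case CACHE_TYPE_INST:
		return 2;
	default:
		return -1;	/* no table entry for this type */
	}
}
```

With the posted computation, CACHE_TYPE_INST (value 1) selects the
"Data" entry and CACHE_TYPE_DATA (value 2) selects "Instruction".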

>> I have done some initial review and testing for this patch's impact on
>> PowerPC (ppc64 POWER specifically). I am trying to do some code clean-up
>> and re-arrangements. Will post out soon. Thanks !

Thanks for taking the time to test and review these patches.

>
> It does not work correctly on POWER.
>
> The new patchset adds some more attributes for every cache entry apart from
> what we used to have on PowerPC before. From the ABI perspective, the old ones
> should reflect the correct value in the same manner as before. Looks like
> the generic code will mark any attribute as "Unknown" if the arch code does
> not populate them in the respective callback.
>

Yes, this is on my list. I need to avoid populating the sysfs files with
"Unknown" as the value; I will do that in the next version.

> Here are some problems found on a POWER7 system
>
> (1) L1 instruction cache (cpu<N>/cache/index1/)
>
> 	====== Before patch ======
>
> 	coherency_line_size: 	128
> 	level:			1
> 	shared_cpu_map:		00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
>          			00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 				00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 				00000000,00000000,00000000,00000000,00000f00
> 	size:			32K
> 	type:			Instruction
>
> 	===== After patch ========
>
> 	coherency_line_size:	Unknown						----> Wrong
> 	level:			1
> 	shared_cpu_map:		00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
>          			00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 				00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 				00000000,00000000,00000000,00000000,00ffffff	----> Wrong
> 	size:			0K						----> Wrong
> 	type:			Instruction
>
> (2) L3 cache (cpu<N>/cache/index3/)
>
> 	====== Before patch ======
>
> 	number_of_sets:		1
> 	size:			4096K
> 	ways_of_associativity:	0
>
> 	===== After patch ========
>
> 	number_of_sets:		1
> 	size:			4096K
> 	ways_of_associativity:	Unknown		----> Wrong
>
> Need to revisit this implementation on PowerPC and figure out the cause of these problems.
>

Yes, based on the logs you have provided, I will check for the root
cause of these issues. I will get back with questions if I need
clarifications.

Regards,
Sudeep

* [PATCH v3 00/52] CPU hotplug: Fix issues with callback registration
From: Srivatsa S. Bhat @ 2014-03-10 20:33 UTC (permalink / raw)
  To: paulus, oleg, mingo, rjw, rusty, peterz, tglx, akpm
  Cc: linux-arch, ego, walken, linux, linux-pm, linux-kernel,
	linuxppc-dev, srivatsa.bhat, tj, paulmck

Hi,

Many subsystems and drivers have the need to register CPU hotplug callbacks
from their init routines and also perform initialization for the CPUs that are
already online. But unfortunately there is no race-free way to achieve this
today.

For example, consider this piece of code:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is not safe because there is a possibility of an ABBA deadlock involving
the cpu_add_remove_lock and the cpu_hotplug.lock.

          CPU 0                                         CPU 1
          -----                                         -----

   Acquire cpu_hotplug.lock
   [via get_online_cpus()]

                                              CPU online/offline operation
                                              takes cpu_add_remove_lock
                                              [via cpu_maps_update_begin()]

   Try to acquire
   cpu_add_remove_lock
   [via register_cpu_notifier()]

                                              CPU online/offline operation
                                              tries to acquire cpu_hotplug.lock
                                              [via cpu_hotplug_begin()]

                            *** DEADLOCK! ***


Other combinations of callback registration also don't work correctly.
Examples:

	register_cpu_notifier(&foobar_cpu_notifier);

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	put_online_cpus();

This can lead to double initialization if a hotplug operation occurs after
registering the notifier and before invoking get_online_cpus().

On the other hand, the following piece of code can miss hotplug events
altogether:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	put_online_cpus();
              ^
              |   Race window; Can miss hotplug events here
              v
	register_cpu_notifier(&foobar_cpu_notifier);


To solve these issues and provide a race-free method to register CPU hotplug
callbacks, this patchset introduces new variants of the callback registration
APIs that don't hold the cpu_add_remove_lock, and exports the
cpu_add_remove_lock via 2 new APIs cpu_notifier_register_begin/done() for use
by various subsystems. With this in place, the following code snippet will
register a hotplug callback as well as initialize already online CPUs without
any race conditions.

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* This doesn't take the cpu_add_remove_lock */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();
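
The three orderings discussed above can be compared in a small
single-threaded model. This is only an illustrative sketch: struct
model, hotplug_online() and run() are made-up names, the CPU that comes
online at the worst moment is injected by hand rather than by a real
race, and the begin/done lock is represented by simply not injecting an
event while it would be held.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct model {
	bool online[NR_CPUS];
	int init_count[NR_CPUS];	/* how often init_cpu() ran per CPU */
	bool notifier_registered;
};

static void init_cpu(struct model *m, int cpu)
{
	m->init_count[cpu]++;
}

/* A CPU-online event: the notifier runs only if it is registered. */
static void hotplug_online(struct model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->online[cpu] = true;
	if (m->notifier_registered)
		init_cpu(m, cpu);
}

static void init_online_cpus(struct model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (m->online[cpu])
			init_cpu(m, cpu);
}

enum ordering { REGISTER_FIRST, REGISTER_LAST, REGISTER_UNDER_LOCK };

/* Returns how many times the late CPU ended up being initialized. */
static int run(enum ordering o)
{
	struct model m = { .online = { true } };	/* only CPU 0 is online */
	const int late_cpu = 1;

	switch (o) {
	case REGISTER_FIRST:
		m.notifier_registered = true;
		hotplug_online(&m, late_cpu);	/* window before the init loop */
		init_online_cpus(&m);
		break;
	case REGISTER_LAST:
		init_online_cpus(&m);
		hotplug_online(&m, late_cpu);	/* window before registration */
		m.notifier_registered = true;
		break;
	case REGISTER_UNDER_LOCK:
		/* Hotplug is excluded for the whole init-and-register step. */
		init_online_cpus(&m);
		m.notifier_registered = true;
		/* Lock dropped; the event is delivered afterwards. */
		hotplug_online(&m, late_cpu);
		break;
	}
	return m.init_count[late_cpu];
}
```

In this model, registering first initializes the late CPU twice,
registering last never initializes it, and performing both steps with
hotplug excluded initializes it exactly once.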


In this patchset, patch 1 adds lockdep annotations to catch the above-mentioned
deadlock scenario. Patch 2 introduces the new APIs and infrastructure necessary
for race-free callback registration. The remaining patches perform tree-wide
conversions (to use this model).

This patchset has been hosted in the below git tree. It applies cleanly on
v3.14-rc6.

git://github.com/srivatsabhat/linux.git cpuhp-registration-fixes-v3


Changes from v2:
* Collected more Acks from subsystem maintainers.
* Updated the xen-balloon patch and got Ack from Boris Ostrovsky.


Gautham R. Shenoy (1):
      CPU hotplug: Add lockdep annotations to get/put_online_cpus()

Srivatsa S. Bhat (51):
      CPU hotplug: Provide lockless versions of callback registration functions
      Doc/cpu-hotplug: Specify race-free way to register CPU hotplug callbacks
      CPU hotplug, perf: Fix CPU hotplug callback registration
      ia64, salinfo: Fix hotplug callback registration
      ia64, palinfo: Fix CPU hotplug callback registration
      ia64, topology: Fix CPU hotplug callback registration
      ia64, err-inject: Fix CPU hotplug callback registration
      arm, hw-breakpoint: Fix CPU hotplug callback registration
      arm, kvm: Fix CPU hotplug callback registration
      s390, cacheinfo: Fix CPU hotplug callback registration
      s390, smp: Fix CPU hotplug callback registration
      sparc, sysfs: Fix CPU hotplug callback registration
      powerpc, sysfs: Fix CPU hotplug callback registration
      x86, msr: Fix CPU hotplug callback registration
      x86, cpuid: Fix CPU hotplug callback registration
      x86, vsyscall: Fix CPU hotplug callback registration
      x86, intel, uncore: Fix CPU hotplug callback registration
      x86, mce: Fix CPU hotplug callback registration
      x86, therm_throt.c: Fix CPU hotplug callback registration
      x86, therm_throt.c: Remove unused therm_cpu_lock
      x86, amd, ibs: Fix CPU hotplug callback registration
      x86, intel, cacheinfo: Fix CPU hotplug callback registration
      x86, intel, rapl: Fix CPU hotplug callback registration
      x86, amd, uncore: Fix CPU hotplug callback registration
      x86, hpet: Fix CPU hotplug callback registration
      x86, pci, amd-bus: Fix CPU hotplug callback registration
      x86, oprofile, nmi: Fix CPU hotplug callback registration
      x86, kvm: Fix CPU hotplug callback registration
      arm64, hw_breakpoint.c: Fix CPU hotplug callback registration
      arm64, debug-monitors: Fix CPU hotplug callback registration
      powercap, intel-rapl: Fix CPU hotplug callback registration
      scsi, bnx2i: Fix CPU hotplug callback registration
      scsi, bnx2fc: Fix CPU hotplug callback registration
      scsi, fcoe: Fix CPU hotplug callback registration
      zsmalloc: Fix CPU hotplug callback registration
      acpi-cpufreq: Fix CPU hotplug callback registration
      drivers/base/topology.c: Fix CPU hotplug callback registration
      clocksource, dummy-timer: Fix CPU hotplug callback registration
      intel-idle: Fix CPU hotplug callback registration
      oprofile, nmi-timer: Fix CPU hotplug callback registration
      octeon, watchdog: Fix CPU hotplug callback registration
      thermal, x86-pkg-temp: Fix CPU hotplug callback registration
      hwmon, coretemp: Fix CPU hotplug callback registration
      hwmon, via-cputemp: Fix CPU hotplug callback registration
      xen, balloon: Fix CPU hotplug callback registration
      trace, ring-buffer: Fix CPU hotplug callback registration
      profile: Fix CPU hotplug callback registration
      mm, vmstat: Fix CPU hotplug callback registration
      mm, zswap: Fix CPU hotplug callback registration
      net/core/flow.c: Fix CPU hotplug callback registration
      net/iucv/iucv.c: Fix CPU hotplug callback registration

 Documentation/cpu-hotplug.txt                 |   45 +++++++++
 arch/arm/kernel/hw_breakpoint.c               |    8 +-
 arch/arm/kvm/arm.c                            |    7 +
 arch/arm64/kernel/debug-monitors.c            |    6 +
 arch/arm64/kernel/hw_breakpoint.c             |    7 +
 arch/ia64/kernel/err_inject.c                 |   15 +++
 arch/ia64/kernel/palinfo.c                    |    6 +
 arch/ia64/kernel/salinfo.c                    |    6 +
 arch/ia64/kernel/topology.c                   |    6 +
 arch/powerpc/kernel/sysfs.c                   |    8 +-
 arch/s390/kernel/cache.c                      |    5 +
 arch/s390/kernel/smp.c                        |   13 ++-
 arch/sparc/kernel/sysfs.c                     |    6 +
 arch/x86/kernel/cpu/intel_cacheinfo.c         |   13 ++-
 arch/x86/kernel/cpu/mcheck/mce.c              |    8 +-
 arch/x86/kernel/cpu/mcheck/therm_throt.c      |   18 +---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c      |    6 +
 arch/x86/kernel/cpu/perf_event_amd_uncore.c   |    7 +
 arch/x86/kernel/cpu/perf_event_intel_rapl.c   |    9 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    6 +
 arch/x86/kernel/cpuid.c                       |   15 ++-
 arch/x86/kernel/hpet.c                        |    4 +
 arch/x86/kernel/msr.c                         |   16 ++-
 arch/x86/kernel/vsyscall_64.c                 |    6 +
 arch/x86/kvm/x86.c                            |    7 +
 arch/x86/oprofile/nmi_int.c                   |   15 +++
 arch/x86/pci/amd_bus.c                        |    5 +
 drivers/base/topology.c                       |   12 ++
 drivers/clocksource/dummy_timer.c             |   11 ++
 drivers/cpufreq/acpi-cpufreq.c                |    7 +
 drivers/hwmon/coretemp.c                      |   14 +--
 drivers/hwmon/via-cputemp.c                   |   14 +--
 drivers/idle/intel_idle.c                     |   12 ++
 drivers/oprofile/nmi_timer_int.c              |   23 +++--
 drivers/powercap/intel_rapl.c                 |   10 ++
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c             |   12 ++
 drivers/scsi/bnx2i/bnx2i_init.c               |   12 ++
 drivers/scsi/fcoe/fcoe.c                      |   15 +++
 drivers/thermal/x86_pkg_temp_thermal.c        |   14 +--
 drivers/watchdog/octeon-wdt-main.c            |   11 ++
 drivers/xen/balloon.c                         |   36 +++++--
 include/linux/cpu.h                           |   47 ++++++++++
 include/linux/perf_event.h                    |   16 +++
 kernel/cpu.c                                  |   38 +++++++-
 kernel/profile.c                              |   20 +++-
 kernel/trace/ring_buffer.c                    |   19 ++--
 mm/vmstat.c                                   |    6 +
 mm/zsmalloc.c                                 |   17 +++-
 mm/zswap.c                                    |    8 +-
 net/core/flow.c                               |    8 +-
 net/iucv/iucv.c                               |  121 ++++++++++++-------------
 51 files changed, 550 insertions(+), 226 deletions(-)


Regards,
Srivatsa S. Bhat
IBM Linux Technology Center


* [PATCH v3 02/52] CPU hotplug: Provide lockless versions of callback registration functions
From: Srivatsa S. Bhat @ 2014-03-10 20:34 UTC (permalink / raw)
  To: paulus, oleg, mingo, rjw, rusty, peterz, tglx, akpm
  Cc: linux-arch, ego, walken, linux, linux-pm, Peter Zijlstra,
	Rafael J. Wysocki, linux-kernel, Ingo Molnar, linuxppc-dev,
	Srivatsa S. Bhat, Oleg Nesterov, tj, Toshi Kani, Thomas Gleixner,
	paulmck, Andrew Morton
In-Reply-To: <20140310203312.10746.310.stgit@srivatsabhat.in.ibm.com>

The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

The deadlock is shown below:

          CPU 0                                         CPU 1
          -----                                         -----

   Acquire cpu_hotplug.lock
   [via get_online_cpus()]

                                              CPU online/offline operation
                                              takes cpu_add_remove_lock
                                              [via cpu_maps_update_begin()]


   Try to acquire
   cpu_add_remove_lock
   [via register_cpu_notifier()]


                                              CPU online/offline operation
                                              tries to acquire cpu_hotplug.lock
                                              [via cpu_hotplug_begin()]


                            *** DEADLOCK! ***
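
The interleaving in the diagram can be replayed in userspace as a rough
illustration. The sketch below is an assumption-laden model, not kernel code:
both kernel locks are reduced to plain pthread mutexes (the real
cpu_hotplug.lock works together with a reader refcount), the two "CPUs" are
replayed on one thread, and the names hotplug_lock, add_remove_lock and
replay_abba_interleaving() are hypothetical. pthread_mutex_trylock() stands in
for the blocking acquisitions so that the replay can report the deadlock
instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for the two kernel locks. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;    /* models cpu_hotplug.lock */
static pthread_mutex_t add_remove_lock = PTHREAD_MUTEX_INITIALIZER; /* models cpu_add_remove_lock */

/*
 * Replay the diagram's interleaving on a single thread.
 * Returns 1 if each side ends up wanting the lock the other side holds,
 * i.e. the ABBA deadlock would occur with blocking lock calls.
 */
int replay_abba_interleaving(void)
{
	int registrar_blocked, hotplug_blocked;

	/* "CPU 0": get_online_cpus() takes the hotplug lock. */
	assert(pthread_mutex_lock(&hotplug_lock) == 0);

	/* "CPU 1": cpu_maps_update_begin() takes the add/remove lock. */
	assert(pthread_mutex_lock(&add_remove_lock) == 0);

	/* "CPU 0": register_cpu_notifier() now wants the add/remove lock. */
	registrar_blocked = (pthread_mutex_trylock(&add_remove_lock) == EBUSY);

	/* "CPU 1": cpu_hotplug_begin() now wants the hotplug lock. */
	hotplug_blocked = (pthread_mutex_trylock(&hotplug_lock) == EBUSY);

	/* Undo any trylock that unexpectedly succeeded, then release both. */
	if (!registrar_blocked)
		pthread_mutex_unlock(&add_remove_lock);
	if (!hotplug_blocked)
		pthread_mutex_unlock(&hotplug_lock);
	pthread_mutex_unlock(&add_remove_lock);
	pthread_mutex_unlock(&hotplug_lock);

	return registrar_blocked && hotplug_blocked;
}
```

With blocking pthread_mutex_lock() calls on two real threads, neither side
could make progress once both trylock attempts above would have failed.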

The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that register the callbacks without
acquiring the cpu_add_remove_lock themselves. That way, we can avoid the ABBA
scenario. However, the caller must then hold the cpu_add_remove_lock throughout
the entire critical section, to protect updates to the callback/notifier chain.

This can be achieved by writing the callback registration code as follows:

	cpu_maps_update_begin(); /* or cpu_notifier_register_begin(); see below */

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* This doesn't take the cpu_add_remove_lock */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_maps_update_done();  /* or cpu_notifier_register_done(); see below */

Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.

Also, since cpu_maps_update_begin/done() is effectively a superset of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.

Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().

In summary, introduce the lockless variants of (un)register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.
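
The registration pattern summarized above can be modelled in userspace as a
minimal sketch. Everything here is an assumption made for illustration: a
pthread mutex stands in for cpu_add_remove_lock, a singly linked list stands in
for the raw notifier chain, and the names notifier_register_begin/done(),
register_notifier_nolock() (modelling __register_cpu_notifier()),
foobar_register() and model_cpu_up() are hypothetical. The point it shows is
that initialization of already-online CPUs and the lockless registration happen
inside one critical section, and that the hotplug side runs its callbacks under
the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_CPUS 4

struct notifier {
	void (*call)(int cpu);
	struct notifier *next;
};

/* Models cpu_add_remove_lock; protects the chain and the online map. */
static pthread_mutex_t add_remove_lock = PTHREAD_MUTEX_INITIALIZER;
static struct notifier *chain;
static int cpu_online[NR_CPUS] = { 1, 1, 0, 0 };
static int cpu_initialized[NR_CPUS];

void notifier_register_begin(void) { pthread_mutex_lock(&add_remove_lock); }
void notifier_register_done(void)  { pthread_mutex_unlock(&add_remove_lock); }

/* Lockless variant: the caller must already hold add_remove_lock. */
void register_notifier_nolock(struct notifier *n)
{
	n->next = chain;
	chain = n;
}

static void init_cpu(int cpu) { cpu_initialized[cpu] = 1; }

static struct notifier foobar_notifier = { .call = init_cpu, .next = NULL };

/* Initialize online CPUs and register the callback in one critical section. */
void foobar_register(void)
{
	int cpu;

	notifier_register_begin();
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			init_cpu(cpu);
	register_notifier_nolock(&foobar_notifier);
	notifier_register_done();
}

/* Hotplug side: the whole operation runs under the same lock. */
void model_cpu_up(int cpu)
{
	struct notifier *n;

	pthread_mutex_lock(&add_remove_lock);
	cpu_online[cpu] = 1;
	for (n = chain; n != NULL; n = n->next)
		n->call(cpu);
	pthread_mutex_unlock(&add_remove_lock);
}

int cpu_is_initialized(int cpu) { return cpu_initialized[cpu]; }
```

Because only one lock is involved on both sides in this model, no ordering
between two locks exists to be inverted, and a CPU coming online can never slip
in between the initialization loop and the registration.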

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/cpu.c        |   21 +++++++++++++++++++--
 2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 03e235ad..488d6eb 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -122,26 +122,46 @@ enum {
 		{ .notifier_call = fn, .priority = pri };	\
 	register_cpu_notifier(&fn##_nb);			\
 }
+
+#define __cpu_notifier(fn, pri) {				\
+	static struct notifier_block fn##_nb =			\
+		{ .notifier_call = fn, .priority = pri };	\
+	__register_cpu_notifier(&fn##_nb);			\
+}
 #else /* #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */
 #define cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
+#define __cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 #endif /* #else #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) */
+
 #ifdef CONFIG_HOTPLUG_CPU
 extern int register_cpu_notifier(struct notifier_block *nb);
+extern int __register_cpu_notifier(struct notifier_block *nb);
 extern void unregister_cpu_notifier(struct notifier_block *nb);
+extern void __unregister_cpu_notifier(struct notifier_block *nb);
 #else
 
 #ifndef MODULE
 extern int register_cpu_notifier(struct notifier_block *nb);
+extern int __register_cpu_notifier(struct notifier_block *nb);
 #else
 static inline int register_cpu_notifier(struct notifier_block *nb)
 {
 	return 0;
 }
+
+static inline int __register_cpu_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
 #endif
 
 static inline void unregister_cpu_notifier(struct notifier_block *nb)
 {
 }
+
+static inline void __unregister_cpu_notifier(struct notifier_block *nb)
+{
+}
 #endif
 
 int cpu_up(unsigned int cpu);
@@ -149,19 +169,32 @@ void notify_cpu_starting(unsigned int cpu);
 extern void cpu_maps_update_begin(void);
 extern void cpu_maps_update_done(void);
 
+#define cpu_notifier_register_begin	cpu_maps_update_begin
+#define cpu_notifier_register_done	cpu_maps_update_done
+
 #else	/* CONFIG_SMP */
 
 #define cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
+#define __cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 
 static inline int register_cpu_notifier(struct notifier_block *nb)
 {
 	return 0;
 }
 
+static inline int __register_cpu_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
 static inline void unregister_cpu_notifier(struct notifier_block *nb)
 {
 }
 
+static inline void __unregister_cpu_notifier(struct notifier_block *nb)
+{
+}
+
 static inline void cpu_maps_update_begin(void)
 {
 }
@@ -170,6 +203,14 @@ static inline void cpu_maps_update_done(void)
 {
 }
 
+static inline void cpu_notifier_register_begin(void)
+{
+}
+
+static inline void cpu_notifier_register_done(void)
+{
+}
+
 #endif /* CONFIG_SMP */
 extern struct bus_type cpu_subsys;
 
@@ -183,8 +224,11 @@ extern void put_online_cpus(void);
 extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
+#define __hotcpu_notifier(fn, pri)	__cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
+#define __register_hotcpu_notifier(nb)	__register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
+#define __unregister_hotcpu_notifier(nb)	__unregister_cpu_notifier(nb)
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
 
@@ -197,9 +241,12 @@ static inline void cpu_hotplug_done(void) {}
 #define cpu_hotplug_disable()	do { } while (0)
 #define cpu_hotplug_enable()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
+#define __hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
+#define __register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
 #define unregister_hotcpu_notifier(nb)	({ (void)(nb); })
+#define __unregister_hotcpu_notifier(nb)	({ (void)(nb); })
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_PM_SLEEP_SMP
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 33caf5e..a9e710e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -28,18 +28,23 @@
 static DEFINE_MUTEX(cpu_add_remove_lock);
 
 /*
- * The following two API's must be used when attempting
- * to serialize the updates to cpu_online_mask, cpu_present_mask.
+ * The following two APIs (cpu_maps_update_begin/done) must be used when
+ * attempting to serialize the updates to cpu_online_mask & cpu_present_mask.
+ * The APIs cpu_notifier_register_begin/done() must be used to protect CPU
+ * hotplug callback (un)registration performed using __register_cpu_notifier()
+ * or __unregister_cpu_notifier().
  */
 void cpu_maps_update_begin(void)
 {
 	mutex_lock(&cpu_add_remove_lock);
 }
+EXPORT_SYMBOL(cpu_notifier_register_begin);
 
 void cpu_maps_update_done(void)
 {
 	mutex_unlock(&cpu_add_remove_lock);
 }
+EXPORT_SYMBOL(cpu_notifier_register_done);
 
 static RAW_NOTIFIER_HEAD(cpu_chain);
 
@@ -183,6 +188,11 @@ int __ref register_cpu_notifier(struct notifier_block *nb)
 	return ret;
 }
 
+int __ref __register_cpu_notifier(struct notifier_block *nb)
+{
+	return raw_notifier_chain_register(&cpu_chain, nb);
+}
+
 static int __cpu_notify(unsigned long val, void *v, int nr_to_call,
 			int *nr_calls)
 {
@@ -206,6 +216,7 @@ static void cpu_notify_nofail(unsigned long val, void *v)
 	BUG_ON(cpu_notify(val, v));
 }
 EXPORT_SYMBOL(register_cpu_notifier);
+EXPORT_SYMBOL(__register_cpu_notifier);
 
 void __ref unregister_cpu_notifier(struct notifier_block *nb)
 {
@@ -215,6 +226,12 @@ void __ref unregister_cpu_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_cpu_notifier);
 
+void __ref __unregister_cpu_notifier(struct notifier_block *nb)
+{
+	raw_notifier_chain_unregister(&cpu_chain, nb);
+}
+EXPORT_SYMBOL(__unregister_cpu_notifier);
+
 /**
  * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
  * @cpu: a CPU id


