Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH v2 2/3] Documentation/i2c: sync docs with current state of i2c-tools
From: Wolfram Sang @ 2018-04-13 16:50 UTC (permalink / raw)
  To: Sam Hansen; +Cc: linux-i2c, corbet, linux-doc, linux-kernel
In-Reply-To: <20180413164405.127522-2-hansens@google.com>

[-- Attachment #1: Type: text/plain, Size: 578 bytes --]

On Fri, Apr 13, 2018 at 09:44:04AM -0700, Sam Hansen wrote:
> Currently, Documentation/i2c/dev-interface describes the use of
> i2c_smbus_* helper routines as static inlined functions provided by
> linux/i2c-dev.h.  Work has been done to refactor the linux/i2c-dev.h file
> in the i2c-tools project out into its own library.  As a result, these
> docs have become stale.
> 
> This patch corrects the discrepancy and directs the reader to the
> i2c-tools project for more information.
> 
> Signed-off-by: Sam Hansen <hansens@google.com>

Looks good to me as well.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/3] Documentation/i2c: whitespace cleanup
From: Wolfram Sang @ 2018-04-13 16:50 UTC (permalink / raw)
  To: Sam Hansen; +Cc: linux-i2c, corbet, linux-doc, linux-kernel
In-Reply-To: <20180413164405.127522-1-hansens@google.com>

[-- Attachment #1: Type: text/plain, Size: 257 bytes --]

On Fri, Apr 13, 2018 at 09:44:03AM -0700, Sam Hansen wrote:
> This strips trailing whitespace in Documentation/i2c/dev-interface.
> 
> Signed-off-by: Sam Hansen <hansens@google.com>

Looks good to me. But please send new series as seperate threads.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 3/3] Documentation/i2c: adopt kernel commenting style in examples
From: Wolfram Sang @ 2018-04-13 16:50 UTC (permalink / raw)
  To: Sam Hansen; +Cc: linux-i2c, corbet, linux-doc, linux-kernel
In-Reply-To: <20180413164405.127522-3-hansens@google.com>

[-- Attachment #1: Type: text/plain, Size: 428 bytes --]


> -  int adapter_nr = 2; /* probably dynamically determined */

Such comments are actually OK.

> -    /* ERROR HANDLING; you can check errno to see what went wrong */

Such as well.

> -  /* Using I2C Write, equivalent of
> -     i2c_smbus_write_word_data(file, reg, 0x6543) */
> +  /*
> +   * Using I2C Write, equivalent of
> +   * i2c_smbus_write_word_data(file, reg, 0x6543).
> +   */

This is the only broken one AFAICT.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 3/3] Documentation/i2c: adopt kernel commenting style in examples
From: Sam Hansen @ 2018-04-13 16:44 UTC (permalink / raw)
  To: linux-i2c, hansens, wsa, corbet, linux-doc, linux-kernel
  Cc: Sam Hansen, wsa, corbet, linux-doc, linux-kernel
In-Reply-To: <20180413164405.127522-1-hansens@google.com>

The example I2C code is rewritten to adopt the preferred kernel block
commenting style.

Signed-off-by: Sam Hansen <hansens@google.com>
---
 Documentation/i2c/dev-interface | 57 +++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/Documentation/i2c/dev-interface b/Documentation/i2c/dev-interface
index f92ee1f59914..e17f5814c199 100644
--- a/Documentation/i2c/dev-interface
+++ b/Documentation/i2c/dev-interface
@@ -30,24 +30,34 @@ assume much about them. They can even change from one boot to the next.
 
 Next thing, open the device file, as follows:
 
+  /*
+   * The adapter is probably dynamically determined.
+   */
   int file;
-  int adapter_nr = 2; /* probably dynamically determined */
+  int adapter_nr = 2;
   char filename[20];
 
   snprintf(filename, 19, "/dev/i2c-%d", adapter_nr);
   file = open(filename, O_RDWR);
   if (file < 0) {
-    /* ERROR HANDLING; you can check errno to see what went wrong */
+    /*
+     * ERROR HANDLING; you can check errno to see what went wrong.
+     */
     exit(1);
   }
 
 When you have opened the device, you must specify with what device
 address you want to communicate:
 
-  int addr = 0x40; /* The I2C address */
+  /*
+   * The I2C address.
+   */
+  int addr = 0x40;
 
   if (ioctl(file, I2C_SLAVE, addr) < 0) {
-    /* ERROR HANDLING; you can check errno to see what went wrong */
+    /*
+     * ERROR HANDLING; you can check errno to see what went wrong.
+     */
     exit(1);
   }
 
@@ -55,32 +65,51 @@ Well, you are all set up now. You can now use SMBus commands or plain
 I2C to communicate with your device. SMBus commands are preferred if
 the device supports them. Both are illustrated below.
 
-  __u8 reg = 0x10; /* Device register to access */
+  /*
+   * The device register to access.
+   */
+  __u8 reg = 0x10;
   __s32 res;
   char buf[10];
 
-  /* Using SMBus commands */
+  /*
+   * Using SMBus commands.
+   */
   res = i2c_smbus_read_word_data(file, reg);
   if (res < 0) {
-    /* ERROR HANDLING: i2c transaction failed */
+    /*
+     * ERROR HANDLING: i2c transaction failed.
+     */
   } else {
-    /* res contains the read word */
+    /*
+     * res contains the read word.
+     */
   }
 
-  /* Using I2C Write, equivalent of
-     i2c_smbus_write_word_data(file, reg, 0x6543) */
+  /*
+   * Using I2C Write, equivalent of
+   * i2c_smbus_write_word_data(file, reg, 0x6543).
+   */
   buf[0] = reg;
   buf[1] = 0x43;
   buf[2] = 0x65;
   if (write(file, buf, 3) != 3) {
-    /* ERROR HANDLING: i2c transaction failed */
+    /*
+     * ERROR HANDLING: i2c transaction failed.
+     */
   }
 
-  /* Using I2C Read, equivalent of i2c_smbus_read_byte(file) */
+  /*
+   * Using I2C Read, equivalent of i2c_smbus_read_byte(file).
+   */
   if (read(file, buf, 1) != 1) {
-    /* ERROR HANDLING: i2c transaction failed */
+    /*
+     * ERROR HANDLING: i2c transaction failed.
+     */
   } else {
-    /* buf[0] contains the read byte */
+    /*
+     * buf[0] contains the read byte.
+     */
   }
 
 Note that only a subset of the I2C and SMBus protocols can be achieved by
-- 
2.17.0.484.g0c8726318c-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 2/3] Documentation/i2c: sync docs with current state of i2c-tools
From: Sam Hansen @ 2018-04-13 16:44 UTC (permalink / raw)
  To: linux-i2c, hansens, wsa, corbet, linux-doc, linux-kernel
  Cc: Sam Hansen, wsa, corbet, linux-doc, linux-kernel
In-Reply-To: <20180413164405.127522-1-hansens@google.com>

Currently, Documentation/i2c/dev-interface describes the use of
i2c_smbus_* helper routines as static inlined functions provided by
linux/i2c-dev.h.  Work has been done to refactor the linux/i2c-dev.h file
in the i2c-tools project out into its own library.  As a result, these
docs have become stale.

This patch corrects the discrepancy and directs the reader to the
i2c-tools project for more information.

Signed-off-by: Sam Hansen <hansens@google.com>
---
 Documentation/i2c/dev-interface | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/Documentation/i2c/dev-interface b/Documentation/i2c/dev-interface
index c8737d502791..f92ee1f59914 100644
--- a/Documentation/i2c/dev-interface
+++ b/Documentation/i2c/dev-interface
@@ -23,11 +23,6 @@ First, you need to include these two headers:
   #include <linux/i2c-dev.h>
   #include <i2c/smbus.h>
 
-(Please note that there are two files named "i2c-dev.h" out there. One is
-distributed with the Linux kernel and the other one is included in the
-source tree of i2c-tools. They used to be different in content but since 2012
-they're identical. You should use "linux/i2c-dev.h").
-
 Now, you have to decide which adapter you want to access. You should
 inspect /sys/class/i2c-dev/ or run "i2cdetect -l" to decide this.
 Adapter numbers are assigned somewhat dynamically, so you can not
@@ -140,8 +135,8 @@ ioctl(file, I2C_RDWR, struct i2c_rdwr_ioctl_data *msgset)
   set in each message, overriding the values set with the above ioctl's.
 
 ioctl(file, I2C_SMBUS, struct i2c_smbus_ioctl_data *args)
-  Not meant to be called  directly; instead, use the access functions
-  below.
+  If possible, use the provided i2c_smbus_* methods described below instead
+  of issuing direct ioctls.
 
 You can do plain i2c transactions by using read(2) and write(2) calls.
 You do not need to pass the address byte; instead, set it through
@@ -166,10 +161,9 @@ what happened. The 'write' transactions return 0 on success; the
 returns the number of values read. The block buffers need not be longer
 than 32 bytes.
 
-The above functions are all inline functions, that resolve to calls to
-the i2c_smbus_access function, that on its turn calls a specific ioctl
-with the data in a specific format. Read the source code if you
-want to know what happens behind the screens.
+The above functions are made available by linking against the libi2c library,
+which is provided by the i2c-tools project.  See:
+https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.
 
 
 Implementation details
-- 
2.17.0.484.g0c8726318c-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 1/3] Documentation/i2c: whitespace cleanup
From: Sam Hansen @ 2018-04-13 16:44 UTC (permalink / raw)
  To: linux-i2c, hansens, wsa, corbet, linux-doc, linux-kernel
  Cc: Sam Hansen, wsa, corbet, linux-doc, linux-kernel
In-Reply-To: <CAP9bUy7rONNHYeRkYtaWHcfqoZcGKXJz-JocED26zBj9n+-ZbQ@mail.gmail.com>

This strips trailing whitespace in Documentation/i2c/dev-interface.

Signed-off-by: Sam Hansen <hansens@google.com>
---
 Documentation/i2c/dev-interface | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/i2c/dev-interface b/Documentation/i2c/dev-interface
index d04e6e4964ee..c8737d502791 100644
--- a/Documentation/i2c/dev-interface
+++ b/Documentation/i2c/dev-interface
@@ -9,8 +9,8 @@ i2c adapters present on your system at a given time. i2cdetect is part of
 the i2c-tools package.
 
 I2C device files are character device files with major device number 89
-and a minor device number corresponding to the number assigned as 
-explained above. They should be called "i2c-%d" (i2c-0, i2c-1, ..., 
+and a minor device number corresponding to the number assigned as
+explained above. They should be called "i2c-%d" (i2c-0, i2c-1, ...,
 i2c-10, ...). All 256 minor device numbers are reserved for i2c.
 
 
@@ -38,7 +38,7 @@ Next thing, open the device file, as follows:
   int file;
   int adapter_nr = 2; /* probably dynamically determined */
   char filename[20];
-  
+
   snprintf(filename, 19, "/dev/i2c-%d", adapter_nr);
   file = open(filename, O_RDWR);
   if (file < 0) {
@@ -72,7 +72,7 @@ the device supports them. Both are illustrated below.
     /* res contains the read word */
   }
 
-  /* Using I2C Write, equivalent of 
+  /* Using I2C Write, equivalent of
      i2c_smbus_write_word_data(file, reg, 0x6543) */
   buf[0] = reg;
   buf[1] = 0x43;
@@ -147,7 +147,7 @@ You can do plain i2c transactions by using read(2) and write(2) calls.
 You do not need to pass the address byte; instead, set it through
 ioctl I2C_SLAVE before you try to access the device.
 
-You can do SMBus level transactions (see documentation file smbus-protocol 
+You can do SMBus level transactions (see documentation file smbus-protocol
 for details) through the following functions:
   __s32 i2c_smbus_write_quick(int file, __u8 value);
   __s32 i2c_smbus_read_byte(int file);
@@ -158,7 +158,7 @@ for details) through the following functions:
   __s32 i2c_smbus_write_word_data(int file, __u8 command, __u16 value);
   __s32 i2c_smbus_process_call(int file, __u8 command, __u16 value);
   __s32 i2c_smbus_read_block_data(int file, __u8 command, __u8 *values);
-  __s32 i2c_smbus_write_block_data(int file, __u8 command, __u8 length, 
+  __s32 i2c_smbus_write_block_data(int file, __u8 command, __u8 length,
                                    __u8 *values);
 All these transactions return -1 on failure; you can read errno to see
 what happened. The 'write' transactions return 0 on success; the
-- 
2.17.0.484.g0c8726318c-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* RE: [RFC 02/10] PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
From: Alan Douglas @ 2018-04-13 16:05 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <52314006c63f8d21907dfaa03207b39e36bb8171.1523379766.git.gustavo.pimentel@synopsys.com>

On Tue, Apr 10 at 18.15, Gustavo Pimentel wrote:
> From: Gustavo Pimentel [mailto:gustavo.pimentel@synopsys.com]
> Changes the cdns_pcie_ep_raise_irq function signature, namely the
> interrupt_num variable type from u8 to u16 to accommodate the MSI-X
> maximum interrupts of 2048.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> ---
>  drivers/pci/cadence/pcie-cadence-ep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/cadence/pcie-cadence-ep.c
> b/drivers/pci/cadence/pcie-cadence-ep.c
> index 3d8283e..6d6322c 100644
> --- a/drivers/pci/cadence/pcie-cadence-ep.c
> +++ b/drivers/pci/cadence/pcie-cadence-ep.c
> @@ -363,7 +363,7 @@ static int cdns_pcie_ep_send_msi_irq(struct
> cdns_pcie_ep *ep, u8 fn,  }
> 
>  static int cdns_pcie_ep_raise_irq(struct pci_epc *epc, u8 fn,
> -				  enum pci_epc_irq_type type, u8
> interrupt_num)
> +				  enum pci_epc_irq_type type, u16
> interrupt_num)
>  {
>  	struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
> 
> --
> 2.7.4
> 
Acked-by: Alan Douglas <adouglas@cadence.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Documentation/i2c: sync docs with current state of i2c-tools.
From: Sam Hansen @ 2018-04-13 16:02 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Wolfram Sang, linux-i2c, corbet, linux-doc, linux-kernel
In-Reply-To: <20180413141359.793919fb@endymion>

On Fri, Apr 13, 2018 at 5:13 AM, Jean Delvare <jdelvare@suse.de> wrote:
> Hi Wolfram, Sam,
>
> On Fri, 13 Apr 2018 00:24:57 +0200, Wolfram Sang wrote:
>> On Thu, Apr 12, 2018 at 02:33:42PM -0700, Sam Hansen wrote:
>> > Currently, Documentation/i2c/dev-interface describes the use of i2c_smbus_*
>> > helper routines as static inlined functions provided by linux/i2c-dev.h.  Work
>> > has been done to refactor the linux/i2c-dev.h file in the i2c-tools project
>> > out into its own library.  As a result, these docs have become stale.
>>
>> Thanks for fixing this!
>>
>> > This patch corrects the discrepancy and directs the reader to the i2c-tools
>> > project for more information.  Additionally, some trailing-whitespace cleanups
>> > were made.
>>
>> Minor nit: Having the whitespace changes in a seperate patch is a tad
>> easier to review.
>>
>> > -  /* Using I2C Write, equivalent of
>> > +  /* Using I2C Write, equivalent of
>> >       i2c_smbus_write_word_data(file, reg, 0x6543) */
>>
>> Maybe change to Kernel coding style comments while here?
>>
>> > -  Not meant to be called  directly; instead, use the access functions
>> > -  below.
>> > +  If possible, use the provided i2c_smbus_* methods described below in favor
>> > +  of issuing direct ioctls.
>>
>> Why this change?
>
> I'm also not sure if "in favor of" is right. "instead of" would sound
> better to me, but I'm no native English speaker, I could be wrong.

Sounds good, I'll adopt "instead of".  Regarding Wolfram's earlier
comment, as an engineer, requiring an out-of-tree library to build
drivers felt a little off.  I can revert this section if you want,
just let me know.

>
>> > -The above functions are all inline functions, that resolve to calls to
>> > -the i2c_smbus_access function, that on its turn calls a specific ioctl
>> > -with the data in a specific format. Read the source code if you
>> > -want to know what happens behind the screens.
>> > +The above functions are made available by linking against the libi2c library,
>> > +which is provided by the i2c-tools project.  See:
>> > +https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.
>>
>> This is fine with me. Maybe Jean has a comment on this?
>
> Fixing the documentation is always welcome, thanks Sam for stepping in.
> However we really want separate patches for whitespace fixes and actual
> contents change, as Wolfram already mentioned above.

Will do!  I'll send a v2 patch set, thanks.

>
> --
> Jean Delvare
> SUSE L3 Support
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] Documentation: ftrace: clarify filters with dynamic ftrace and graph
From: Steven Rostedt @ 2018-04-13 15:54 UTC (permalink / raw)
  To: Steffen Maier; +Cc: linux-doc, Ingo Molnar, Jonathan Corbet
In-Reply-To: <20180413153915.26931-1-maier@linux.ibm.com>

On Fri, 13 Apr 2018 17:39:15 +0200
Steffen Maier <maier@linux.ibm.com> wrote:

> I fell into the trap of having set up function tracer with a very
> limited filter and then switched over to function_graph and was
> erroneously wondering why the latter did not trace what I expected,
> which was the full unabridged graph recursion.
> 
> Signed-off-by: Steffen Maier <maier@linux.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>

Looks good.

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>

Jon, you want to take it in your tree?

-- Steve

> Cc: linux-doc@vger.kernel.org
> ---
> 
> Changes since v1:
> Integrated review comments by Steven Rostedt
> * less emotional phrasing
> * also mention function profiling with set_ftrace_filter
>   (hopefully I got that right as I don't have experience with it)
> 
>  Documentation/trace/ftrace.rst | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
> index e45f0786f3f9..9bbd3aefadb2 100644
> --- a/Documentation/trace/ftrace.rst
> +++ b/Documentation/trace/ftrace.rst
> @@ -224,6 +224,8 @@ of ftrace. Here is a list of some of the key files:
>  	has a side effect of enabling or disabling specific functions
>  	to be traced. Echoing names of functions into this file
>  	will limit the trace to only those functions.
> +	This influences the tracers "function" and "function_graph"
> +	and thus also function profiling (see "function_profile_enabled").
>  
>  	The functions listed in "available_filter_functions" are what
>  	can be written into this file.
> @@ -265,6 +267,8 @@ of ftrace. Here is a list of some of the key files:
>  	Functions listed in this file will cause the function graph
>  	tracer to only trace these functions and the functions that
>  	they call. (See the section "dynamic ftrace" for more details).
> +	Note, set_ftrace_filter and set_ftrace_notrace still affects
> +	what functions are being traced.
>  
>    set_graph_notrace:
>  
> @@ -277,7 +281,8 @@ of ftrace. Here is a list of some of the key files:
>  
>  	This lists the functions that ftrace has processed and can trace.
>  	These are the function names that you can pass to
> -	"set_ftrace_filter" or "set_ftrace_notrace".
> +	"set_ftrace_filter", "set_ftrace_notrace",
> +	"set_graph_function", or "set_graph_notrace".
>  	(See the section "dynamic ftrace" below for more details.)
>  
>    dyn_ftrace_total_info:

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2] Documentation: ftrace: clarify filters with dynamic ftrace and graph
From: Steffen Maier @ 2018-04-13 15:39 UTC (permalink / raw)
  To: linux-doc; +Cc: Steffen Maier, Steven Rostedt, Ingo Molnar, Jonathan Corbet
In-Reply-To: <20180413095745.4662eb8f@gandalf.local.home>

I fell into the trap of having set up function tracer with a very
limited filter and then switched over to function_graph and was
erroneously wondering why the latter did not trace what I expected,
which was the full unabridged graph recursion.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
---

Changes since v1:
Integrated review comments by Steven Rostedt
* less emotional phrasing
* also mention function profiling with set_ftrace_filter
  (hopefully I got that right as I don't have experience with it)

 Documentation/trace/ftrace.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index e45f0786f3f9..9bbd3aefadb2 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -224,6 +224,8 @@ of ftrace. Here is a list of some of the key files:
 	has a side effect of enabling or disabling specific functions
 	to be traced. Echoing names of functions into this file
 	will limit the trace to only those functions.
+	This influences the tracers "function" and "function_graph"
+	and thus also function profiling (see "function_profile_enabled").
 
 	The functions listed in "available_filter_functions" are what
 	can be written into this file.
@@ -265,6 +267,8 @@ of ftrace. Here is a list of some of the key files:
 	Functions listed in this file will cause the function graph
 	tracer to only trace these functions and the functions that
 	they call. (See the section "dynamic ftrace" for more details).
+	Note, set_ftrace_filter and set_ftrace_notrace still affects
+	what functions are being traced.
 
   set_graph_notrace:
 
@@ -277,7 +281,8 @@ of ftrace. Here is a list of some of the key files:
 
 	This lists the functions that ftrace has processed and can trace.
 	These are the function names that you can pass to
-	"set_ftrace_filter" or "set_ftrace_notrace".
+	"set_ftrace_filter", "set_ftrace_notrace",
+	"set_graph_function", or "set_graph_notrace".
 	(See the section "dynamic ftrace" below for more details.)
 
   dyn_ftrace_total_info:
-- 
2.13.5

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] Documentation: ftrace: clarify filters with dynamic ftrace and graph
From: Steven Rostedt @ 2018-04-13 13:57 UTC (permalink / raw)
  To: Steffen Maier; +Cc: linux-doc, Ingo Molnar, Jonathan Corbet
In-Reply-To: <20180413092616.9239-1-maier@linux.ibm.com>

On Fri, 13 Apr 2018 11:26:16 +0200
Steffen Maier <maier@linux.ibm.com> wrote:

> I fell into the trap of having set up function tracer with a very
> limited filter and then switched over to function_graph and was
> erroneously wondering why the latter did not trace what I expected,
> which was the full unabridged graph recursion.

I didn't realize that the documentation doesn't state this. I mention
it in all my talks that talk about function and function_graph tracer
and set_ftrace_filter.

> 
> Signed-off-by: Steffen Maier <maier@linux.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: linux-doc@vger.kernel.org
> ---
>  Documentation/trace/ftrace.rst | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
> index e45f0786f3f9..68835d062344 100644
> --- a/Documentation/trace/ftrace.rst
> +++ b/Documentation/trace/ftrace.rst
> @@ -224,6 +224,7 @@ of ftrace. Here is a list of some of the key files:
>  	has a side effect of enabling or disabling specific functions
>  	to be traced. Echoing names of functions into this file
>  	will limit the trace to only those functions.
> +	This influences both the tracers "function" and "function_graph"!

I know you probably were a bit upset about the documentation not
emphasizing that this affects function_graph, but there's no reason to
have an exclamation point at the end of that statement.

It states that it affects what functions are traced without mentioning
tracers. The assumption was that any "function type" tracing will be
affected by it. Which is no longer true, because stack tracer has its
own filter files.

We should also mention that function profiling is affected by that too.


>  
>  	The functions listed in "available_filter_functions" are what
>  	can be written into this file.
> @@ -265,6 +266,9 @@ of ftrace. Here is a list of some of the key files:
>  	Functions listed in this file will cause the function graph
>  	tracer to only trace these functions and the functions that
>  	they call. (See the section "dynamic ftrace" for more details).
> +	With dynamic ftrace, the graph tracer can only trace functions that
> +	set_ftrace_filter minus set_ftrace_notrace armed; this also influences
> +	the ability to chase function call chains.

Again, the above seems a bit emotional ;-) Just something like this:

	Note, set_ftrace_filter and set_ftrace_notrace still affects
	what functions are being traced.


>  
>    set_graph_notrace:
>  
> @@ -277,7 +281,8 @@ of ftrace. Here is a list of some of the key files:
>  
>  	This lists the functions that ftrace has processed and can trace.
>  	These are the function names that you can pass to
> -	"set_ftrace_filter" or "set_ftrace_notrace".
> +	"set_ftrace_filter", "set_ftrace_notrace",
> +	"set_graph_function", or "set_graph_notrace".
>  	(See the section "dynamic ftrace" below for more details.)

This part is fine.

-- Steve

-- Steve

>  
>    dyn_ftrace_total_info:

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8
From: gengdongjiu @ 2018-04-13 13:50 UTC (permalink / raw)
  To: James Morse, gengdongjiu
  Cc: lishuo1, merry.libing, linux-arm-kernel@lists.infradead.org,
	Liujun (Jun Liu), linux-kernel@vger.kernel.org, corbet@lwn.net,
	marc.zyngier@arm.com, catalin.marinas@arm.com,
	linux-doc@vger.kernel.org, rjw@rjwysocki.net,
	linux@armlinux.org.uk, will.deacon@arm.com,
	robert.moore@intel.com, linux-acpi@vger.kernel.org, bp@alien8.de,
	lv.zheng@intel.com, Huangshaoyu, kvmarm@lists.cs.columbia.edu,
	devel@acpica.org
In-Reply-To: <649fa2f6-1823-116e-fe69-aeaaa6a508af@arm.com>

James,
   Thanks for this mail.

On 2018/4/13 0:14, James Morse wrote:
> Hi gengdongjiu,
> 
> On 12/04/18 06:00, gengdongjiu wrote:
>> 2018-02-16 1:55 GMT+08:00 James Morse <james.morse@arm.com>:
>>> On 05/02/18 11:24, gengdongjiu wrote:
>>>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>>>> TGE}?
>>>>
>>>> Yes, it is.
>>>
>>> ... and yet ...
>>>
>>>
>>>>> What does your firmware do when it wants to emulate SError but its masked?
>>>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>>>> PSTATE.A  set.
>>>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>>>> emulated  SError should go to EL1. This effectively masks SError.)
>>>>
>>>> Currently we does not consider much about the mask status(SPSR).
>>>
>>> .. this is a problem.
>>>
>>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>>> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret to
>>> EL2. This should never happen, SError is effectively masked if you are running
>>> at an EL higher than the one its routed to.
>>>
>>> More obviously: if the exception came from the EL that SError should be routed
>>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only
> 
>> James, I  summarized the masking and routing rules for SError to
>> confirm with you for the firmware first solution,
> 
> You also said "Currently we does not consider much about the mask status(SPSR)."
Yes, we currently do not consider much it. After clarification with you, we want to modify the EL3 firmware to follow this rule.

> 
> 
>> 1. If the HCR_EL2.{AMO,TGE} is set,
> 
> If one or the other of these bits is set: (AMO==1 || TGE==1)
> 
>> which means the SError should route to EL2,
>> When system happens SError and trap to EL3,   If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
>> and find this SError come from EL2, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; but if
>> it find that this SError come from EL1 or EL0, it also need to deliver
>> an SError, right?
> 
> Yes.
> 
> 
>> 2. If the HCR_EL2.{AMO,TGE} is not set,
> 
> If neither of these bits is set: (AMO==0 && TGE == 0)
> 
>> which means the SError should route to EL1,
>> When system happens SError and trap to EL3, If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
> 
> (I'm reading this as all three of these bits are clear)
sorry, it is a typo issue.
it should be HCR_EL2.AMO and HCR_EL2.TGE are both clear, but SPSR_EL3.A is set.

> 
>> and find this SError come from EL1, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; 
> 
> No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
> interrupted EL1 and the A bit was clear, so EL1 can take an SError.

Agree.

> 
> The two cases here are:
> AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
> exception interrupted EL1 and the A bit was set, you need to do the BERT trick.

> 
> If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
"BERT trick" is storing the RAS error in the BERT and 'reboot, right?

> regardless of the A bit, as SError is implicitly masked by running at a higher
> exception level than it was routed to.


> 
> 
>>From your v11 reply:
>> 2. The exception came from the EL that SError should not be routed
>> to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3
>> firmware still deliver SError
> 
> (this is re-iterating the two-cases above:)
> 'not be routed to' is one of two things: Route-to-EL2+interruted-EL1, or
> Route-to-EL1+interrupted-EL2.
> 
> Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
> SError can be delivered to EL2, as EL2 can't mask SError when executing at a
> lower EL.
Agree.

> 
> Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
> running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not be
> delivered.
"can not be delivered" means storing the RAS error in the BERT and 'reboot, right?
In the Table D1-15 in "D1.14.2 Asynchronous exception masking", for the case, it is "C"
"C"means SError is not taken regardless of the value of the Process state interrupt mask.
for this case, whether it will be unsafe if  BIOS directly reboot?


> KVM does this on the way out of a guest, if an SError occurs during this time
> the CPU will wait until execution returns to EL1 before delivering the SError.
> Your firmware has to do the same.
> 
> Table D1-15 in "D1.14.2 Asynchronous exception masking" has a table with all the
> combinations. The ARM-ARM is what we need to match with this behaviour.
> 
> 
>> but if it find that this SError come from EL0, it also need to deliver an
>> SError, right?
> 
> I thought interrupted-EL0 could always be delivered: but re-reading the
> ARM-ARM's "D1.14.2 Asynchronous exception masking", if asynchronous exceptions
> are routed to EL1 then EL0&EL1 are treated the same.
> So if SError is routed to EL1, the exception interrupted EL0, and SPSR_EL3.A was
> set, you still can't deliver the emulated-SError you have to do the BERT-trick.
> Linux doesn't do this today, but another OS might (e.g. UEFI), and we might do
> this in the future.
For this case, whether it will be unsafe if  BIOS directly reboot?
For example, for some test purpose, EL0 set PSTATE.A, just right happen SError, then BIOS will reboot system.
I am afraid that system will become unsafe because BIOS will reboot system.

> 
> This is really tricky for firmware to get right. Another alternative would be to
> put the CPER records in a Polled buffer, unless something needs doing right now,
> in which case a BERT-reboot is probably best.

In summary:

[1]:
Route-to-EL1 + interrupted-EL1, if SPSR_EL3.A is set, EL3 firmware can't deliver the emulated-SError, store the RAS error in the BERT and 'reboot.
Route-to-EL2 + interrupted-EL2, if SPSR_EL3.A is set, EL3 firmware can't deliver the emulated-SError, store the RAS error in the BERT and 'reboot.

I agree above two cases, but maybe we need to ensure that only in EL2 SError handler and EL1 SError exception handler the PSTATE.A is set, for other places, the PSTATE.A is not set.
then BIOS can know this is nested-SError when find the SPSR_EL3.A is set, can we ensure that in the Linux kernel code and KVM code?

[2]:
Route-to-EL2 + interrupted-EL1, regardless of SPSR_EL3.A the emulated SError can be delivered to EL2.
Route-to-EL2 + interrupted-EL0, regardless of SPSR_EL3.A the emulated SError can be delivered to EL2.

I agree above two cases.

[3]:
Route-to-EL1+interrupted-EL0, if SPSR_EL3.A is set, EL3 firmware can't deliver the emulated-SError, store the RAS error in the BERT and 'reboot
Route-to-EL1+interrupted-EL2, EL3 firmware store the RAS error in the BERT and 'reboot regardless of SPSR_EL3.A.

For above two cases, I am worried system will become unsafe because BIOS will reboot system.


> 
> 
> Thanks,
> 
> James
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Documentation/i2c: sync docs with current state of i2c-tools.
From: Jean Delvare @ 2018-04-13 12:13 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: Sam Hansen, linux-i2c, corbet, linux-doc, linux-kernel
In-Reply-To: <20180412222457.x4jdwt27ylgi74o3@katana>

Hi Wolfram, Sam,

On Fri, 13 Apr 2018 00:24:57 +0200, Wolfram Sang wrote:
> On Thu, Apr 12, 2018 at 02:33:42PM -0700, Sam Hansen wrote:
> > Currently, Documentation/i2c/dev-interface describes the use of i2c_smbus_*
> > helper routines as static inlined functions provided by linux/i2c-dev.h.  Work
> > has been done to refactor the linux/i2c-dev.h file in the i2c-tools project
> > out into its own library.  As a result, these docs have become stale.  
> 
> Thanks for fixing this!
> 
> > This patch corrects the discrepancy and directs the reader to the i2c-tools
> > project for more information.  Additionally, some trailing-whitespace cleanups
> > were made.  
> 
> Minor nit: Having the whitespace changes in a seperate patch is a tad
> easier to review.
> 
> > -  /* Using I2C Write, equivalent of 
> > +  /* Using I2C Write, equivalent of
> >       i2c_smbus_write_word_data(file, reg, 0x6543) */  
> 
> Maybe change to Kernel coding style comments while here?
> 
> > -  Not meant to be called  directly; instead, use the access functions
> > -  below.
> > +  If possible, use the provided i2c_smbus_* methods described below in favor
> > +  of issuing direct ioctls.  
> 
> Why this change?

I'm also not sure if "in favor of" is right. "instead of" would sound
better to me, but I'm no native English speaker, I could be wrong.

> > -The above functions are all inline functions, that resolve to calls to
> > -the i2c_smbus_access function, that on its turn calls a specific ioctl
> > -with the data in a specific format. Read the source code if you
> > -want to know what happens behind the screens.
> > +The above functions are made available by linking against the libi2c library,
> > +which is provided by the i2c-tools project.  See:
> > +https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.  
> 
> This is fine with me. Maybe Jean has a comment on this?

Fixing the documentation is always welcome, thanks Sam for stepping in.
However we really want separate patches for whitespace fixes and actual
contents change, as Wolfram already mentioned above.

-- 
Jean Delvare
SUSE L3 Support
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] Documentation: ftrace: clarify filters with dynamic ftrace and graph
From: Steffen Maier @ 2018-04-13  9:26 UTC (permalink / raw)
  To: linux-doc; +Cc: Steffen Maier, Steven Rostedt, Ingo Molnar, Jonathan Corbet

I fell into the trap of having set up function tracer with a very
limited filter and then switched over to function_graph and was
erroneously wondering why the latter did not trace what I expected,
which was the full unabridged graph recursion.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
---
 Documentation/trace/ftrace.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index e45f0786f3f9..68835d062344 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -224,6 +224,7 @@ of ftrace. Here is a list of some of the key files:
 	has a side effect of enabling or disabling specific functions
 	to be traced. Echoing names of functions into this file
 	will limit the trace to only those functions.
+	This influences both the tracers "function" and "function_graph"!
 
 	The functions listed in "available_filter_functions" are what
 	can be written into this file.
@@ -265,6 +266,9 @@ of ftrace. Here is a list of some of the key files:
 	Functions listed in this file will cause the function graph
 	tracer to only trace these functions and the functions that
 	they call. (See the section "dynamic ftrace" for more details).
+	With dynamic ftrace, the graph tracer can only trace functions that
+	set_ftrace_filter minus set_ftrace_notrace armed; this also influences
+	the ability to chase function call chains.
 
   set_graph_notrace:
 
@@ -277,7 +281,8 @@ of ftrace. Here is a list of some of the key files:
 
 	This lists the functions that ftrace has processed and can trace.
 	These are the function names that you can pass to
-	"set_ftrace_filter" or "set_ftrace_notrace".
+	"set_ftrace_filter", "set_ftrace_notrace",
+	"set_graph_function", or "set_graph_notrace".
 	(See the section "dynamic ftrace" below for more details.)
 
   dyn_ftrace_total_info:
-- 
2.13.5

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [RFC bpf-next v2 4/8] bpf: add documentation for eBPF helpers (23-32)
From: Alexei Starovoitov @ 2018-04-13  0:28 UTC (permalink / raw)
  To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180410144157.4831-5-quentin.monnet@netronome.com>

On Tue, Apr 10, 2018 at 03:41:53PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
> 
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
> 
> This patch contains descriptions for the following helper functions, all
> written by Daniel:
> 
> - bpf_get_prandom_u32()
> - bpf_get_smp_processor_id()
> - bpf_get_cgroup_classid()
> - bpf_get_route_realm()
> - bpf_skb_load_bytes()
> - bpf_csum_diff()
> - bpf_skb_get_tunnel_opt()
> - bpf_skb_set_tunnel_opt()
> - bpf_skb_change_proto()
> - bpf_skb_change_type()
> 
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
>  include/uapi/linux/bpf.h | 125 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 125 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f3ea8824efbc..d147d9dd6a83 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -473,6 +473,14 @@ union bpf_attr {
>   * 		The number of bytes written to the buffer, or a negative error
>   * 		in case of failure.
>   *
> + * u32 bpf_prandom_u32(void)
> + * 	Return
> + * 		A random 32-bit unsigned value.

there is no such helper.
It's called bpf_get_prandom_u32().
I'd also add a note that it's using its own random state and cannot be
used to infer seed of other random functions in the kernel.

> + *
> + * u32 bpf_get_smp_processor_id(void)
> + * 	Return
> + * 		The SMP (Symmetric multiprocessing) processor id.

probably worth adding a note to explain that all bpf programs run
with preemption disabled, so processor id is stable for the run of the program.

> + *
>   * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
>   * 	Description
>   * 		Store *len* bytes from address *from* into the packet
> @@ -604,6 +612,13 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
> + * 	Description
> + * 		Retrieve the classid for the current task, i.e. for the
> + * 		net_cls (network classifier) cgroup to which *skb* belongs.

please add that kernel should be configured with CONFIG_NET_CLS_CGROUP=y|m
and mention Documentation/cgroup-v1/net_cls.txt
Otherwise 'network classifier' is way too generic.
I'd also mention that placing a task into net_cls controller
disables all of cgroup-bpf.

> + * 	Return
> + * 		The classid, or 0 for the default unconfigured classid.
> + *
>   * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
>   * 	Description
>   * 		Push a *vlan_tci* (VLAN tag control information) of protocol
> @@ -703,6 +718,14 @@ union bpf_attr {
>   * 		are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
>   * 		error.
>   *
> + * u32 bpf_get_route_realm(struct sk_buff *skb)
> + * 	Description
> + * 		Retrieve the realm or the route, that is to say the
> + * 		**tclassid** field of the destination for the *skb*.

Similarly this only works if CONFIG_IP_ROUTE_CLASSID is on.

> + * 	Return
> + * 		The realm of the route for the packet associated to *sdb*, or 0
> + * 		if none was found.
> + *
>   * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
>   * 	Description
>   * 		Write perf raw sample into a perf event held by *map* of type
> @@ -779,6 +802,21 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len)
> + * 	Description
> + * 		This helper was provided as an easy way to load data from a
> + * 		packet. It can be used to load *len* bytes from *offset* from
> + * 		the packet associated to *skb*, into the buffer pointed by
> + * 		*to*.
> + *
> + * 		Since Linux 4.7, this helper is deprecated in favor of
> + * 		"direct packet access", enabling packet data to be manipulated
> + * 		with *skb*\ **->data** and *skb*\ **->data_end** pointing
> + * 		respectively to the first byte of packet data and to the byte
> + * 		after the last byte of packet data.

I wouldn't call it deprecated.
It's still useful when programmer wants to read large quantities of
data from the packet

> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
> + *
>   * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
>   * 	Description
>   * 		Walk a user or a kernel stack and return its id. To achieve
> @@ -814,6 +852,93 @@ union bpf_attr {
>   * 		The positive or null stack id on success, or a negative error
>   * 		in case of failure.
>   *
> + * s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size, __wsum seed)
> + * 	Description
> + * 		Compute a checksum difference, from the raw buffer pointed by
> + * 		*from*, of length *from_size* (that must be a multiple of 4),
> + * 		towards the raw buffer pointed by *to*, of size *to_size*
> + * 		(same remark). An optional *seed* can be added to the value.
> + *
> + * 		This is flexible enough to be used in several ways:
> + *
> + * 		* With *from_size* == 0, *to_size* > 0 and *seed* set to
> + * 		  checksum, it can be used when pushing new data.
> + * 		* With *from_size* > 0, *to_size* == 0 and *seed* set to
> + * 		  checksum, it can be used when removing data from a packet.
> + * 		* With *from_size* > 0, *to_size* > 0 and *seed* set to 0, it
> + * 		  can be used to compute a diff. Note that *from_size* and
> + * 		  *to_size* do not need to be equal.
> + * 	Return
> + * 		The checksum result, or a negative error code in case of
> + * 		failure.
> + *
> + * int bpf_skb_get_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
> + * 	Description
> + * 		Retrieve tunnel options metadata for the packet associated to
> + * 		*skb*, and store the raw tunnel option data to the buffer *opt*
> + * 		of *size*.
> + * 	Return
> + * 		The size of the option data retrieved.
> + *
> + * int bpf_skb_set_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
> + * 	Description
> + * 		Set tunnel options metadata for the packet associated to *skb*
> + * 		to the option data contained in the raw buffer *opt* of *size*.
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
> + * 	Description
> + * 		Change the protocol of the *skb* to *proto*. Currently
> + * 		supported are transition from IPv4 to IPv6, and from IPv6 to
> + * 		IPv4. The helper takes care of the groundwork for the
> + * 		transition, including resizing the socket buffer. The eBPF
> + * 		program is expected to fill the new headers, if any, via
> + * 		**skb_store_bytes**\ () and to recompute the checksums with
> + * 		**bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\
> + * 		().
> + *
> + * 		Internally, the GSO type is marked as dodgy so that headers are
> + * 		checked and segments are recalculated by the GSO/GRO engine.
> + * 		The size for GSO target is adapted as well.
> + *
> + * 		All values for *flags* are reserved for future usage, and must
> + * 		be left at zero.
> + *
> + * 		A call to this helper is susceptible to change data from the
> + * 		packet. Therefore, at load time, all checks on pointers
> + * 		previously done by the verifier are invalidated and must be
> + * 		performed again.
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_change_type(struct sk_buff *skb, u32 type)
> + * 	Description
> + * 		Change the packet type for the packet associated to *skb*. This
> + * 		comes down to setting *skb*\ **->pkt_type** to *type*, except
> + * 		the eBPF program does not have a write access to *skb*\
> + * 		**->pkt_type** beside this helper. Using a helper here allows
> + * 		for graceful handling of errors.
> + *
> + * 		The major use case is to change incoming *skb*s to
> + * 		**PACKET_HOST** in a programmatic way instead of having to
> + * 		recirculate via **redirect**\ (..., **BPF_F_INGRESS**), for
> + * 		example.
> + *
> + * 		Note that *type* only allows certain values. At this time, they
> + * 		are:
> + *
> + * 		**PACKET_HOST**
> + * 			Packet is for us.
> + * 		**PACKET_BROADCAST**
> + * 			Send packet to all.
> + * 		**PACKET_MULTICAST**
> + * 			Send packet to group.
> + * 		**PACKET_OTHERHOST**
> + * 			Send packet to someone else.
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
> + *
>   * u64 bpf_get_current_task(void)
>   * 	Return
>   * 		A pointer to the current task struct.
> -- 
> 2.14.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Documentation/i2c: sync docs with current state of i2c-tools.
From: Wolfram Sang @ 2018-04-12 22:24 UTC (permalink / raw)
  To: Sam Hansen; +Cc: linux-i2c, corbet, linux-doc, linux-kernel, Jean Delvare
In-Reply-To: <20180412213342.138010-1-hansens@google.com>

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

Hi,

On Thu, Apr 12, 2018 at 02:33:42PM -0700, Sam Hansen wrote:
> Currently, Documentation/i2c/dev-interface describes the use of i2c_smbus_*
> helper routines as static inlined functions provided by linux/i2c-dev.h.  Work
> has been done to refactor the linux/i2c-dev.h file in the i2c-tools project
> out into its own library.  As a result, these docs have become stale.

Thanks for fixing this!

> This patch corrects the discrepancy and directs the reader to the i2c-tools
> project for more information.  Additionally, some trailing-whitespace cleanups
> were made.

Minor nit: Having the whitespace changes in a seperate patch is a tad
easier to review.

> -  /* Using I2C Write, equivalent of 
> +  /* Using I2C Write, equivalent of
>       i2c_smbus_write_word_data(file, reg, 0x6543) */

Maybe change to Kernel coding style comments while here?

> -  Not meant to be called  directly; instead, use the access functions
> -  below.
> +  If possible, use the provided i2c_smbus_* methods described below in favor
> +  of issuing direct ioctls.

Why this change?

> -The above functions are all inline functions, that resolve to calls to
> -the i2c_smbus_access function, that on its turn calls a specific ioctl
> -with the data in a specific format. Read the source code if you
> -want to know what happens behind the screens.
> +The above functions are made available by linking against the libi2c library,
> +which is provided by the i2c-tools project.  See:
> +https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.

This is fine with me. Maybe Jean has a comment on this?

Kind regards,

   Wolfram


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH] Documentation/i2c: sync docs with current state of i2c-tools.
From: Sam Hansen @ 2018-04-12 21:33 UTC (permalink / raw)
  To: linux-i2c, hansens, wsa, corbet, linux-doc, linux-kernel
  Cc: Sam Hansen, wsa, corbet, linux-doc, linux-kernel

Currently, Documentation/i2c/dev-interface describes the use of i2c_smbus_*
helper routines as static inlined functions provided by linux/i2c-dev.h.  Work
has been done to refactor the linux/i2c-dev.h file in the i2c-tools project
out into its own library.  As a result, these docs have become stale.

This patch corrects the discrepancy and directs the reader to the i2c-tools
project for more information.  Additionally, some trailing-whitespace cleanups
were made.

Signed-off-by: Sam Hansen <hansens@google.com>
---
 Documentation/i2c/dev-interface | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/Documentation/i2c/dev-interface b/Documentation/i2c/dev-interface
index d04e6e4964ee..5323588fe99d 100644
--- a/Documentation/i2c/dev-interface
+++ b/Documentation/i2c/dev-interface
@@ -9,8 +9,8 @@ i2c adapters present on your system at a given time. i2cdetect is part of
 the i2c-tools package.
 
 I2C device files are character device files with major device number 89
-and a minor device number corresponding to the number assigned as 
-explained above. They should be called "i2c-%d" (i2c-0, i2c-1, ..., 
+and a minor device number corresponding to the number assigned as
+explained above. They should be called "i2c-%d" (i2c-0, i2c-1, ...,
 i2c-10, ...). All 256 minor device numbers are reserved for i2c.
 
 
@@ -23,11 +23,6 @@ First, you need to include these two headers:
   #include <linux/i2c-dev.h>
   #include <i2c/smbus.h>
 
-(Please note that there are two files named "i2c-dev.h" out there. One is
-distributed with the Linux kernel and the other one is included in the
-source tree of i2c-tools. They used to be different in content but since 2012
-they're identical. You should use "linux/i2c-dev.h").
-
 Now, you have to decide which adapter you want to access. You should
 inspect /sys/class/i2c-dev/ or run "i2cdetect -l" to decide this.
 Adapter numbers are assigned somewhat dynamically, so you can not
@@ -38,7 +33,7 @@ Next thing, open the device file, as follows:
   int file;
   int adapter_nr = 2; /* probably dynamically determined */
   char filename[20];
-  
+
   snprintf(filename, 19, "/dev/i2c-%d", adapter_nr);
   file = open(filename, O_RDWR);
   if (file < 0) {
@@ -72,7 +67,7 @@ the device supports them. Both are illustrated below.
     /* res contains the read word */
   }
 
-  /* Using I2C Write, equivalent of 
+  /* Using I2C Write, equivalent of
      i2c_smbus_write_word_data(file, reg, 0x6543) */
   buf[0] = reg;
   buf[1] = 0x43;
@@ -140,14 +135,14 @@ ioctl(file, I2C_RDWR, struct i2c_rdwr_ioctl_data *msgset)
   set in each message, overriding the values set with the above ioctl's.
 
 ioctl(file, I2C_SMBUS, struct i2c_smbus_ioctl_data *args)
-  Not meant to be called  directly; instead, use the access functions
-  below.
+  If possible, use the provided i2c_smbus_* methods described below in favor
+  of issuing direct ioctls.
 
 You can do plain i2c transactions by using read(2) and write(2) calls.
 You do not need to pass the address byte; instead, set it through
 ioctl I2C_SLAVE before you try to access the device.
 
-You can do SMBus level transactions (see documentation file smbus-protocol 
+You can do SMBus level transactions (see documentation file smbus-protocol
 for details) through the following functions:
   __s32 i2c_smbus_write_quick(int file, __u8 value);
   __s32 i2c_smbus_read_byte(int file);
@@ -158,7 +153,7 @@ for details) through the following functions:
   __s32 i2c_smbus_write_word_data(int file, __u8 command, __u16 value);
   __s32 i2c_smbus_process_call(int file, __u8 command, __u16 value);
   __s32 i2c_smbus_read_block_data(int file, __u8 command, __u8 *values);
-  __s32 i2c_smbus_write_block_data(int file, __u8 command, __u8 length, 
+  __s32 i2c_smbus_write_block_data(int file, __u8 command, __u8 length,
                                    __u8 *values);
 All these transactions return -1 on failure; you can read errno to see
 what happened. The 'write' transactions return 0 on success; the
@@ -166,10 +161,9 @@ what happened. The 'write' transactions return 0 on success; the
 returns the number of values read. The block buffers need not be longer
 than 32 bytes.
 
-The above functions are all inline functions, that resolve to calls to
-the i2c_smbus_access function, that on its turn calls a specific ioctl
-with the data in a specific format. Read the source code if you
-want to know what happens behind the screens.
+The above functions are made available by linking against the libi2c library,
+which is provided by the i2c-tools project.  See:
+https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.
 
 
 Implementation details
-- 
2.17.0.484.g0c8726318c-goog

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v4] mm: remove odd HAVE_PTE_SPECIAL
From: David Rientjes @ 2018-04-12 20:46 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: linux-kernel, linux-mm, mhocko, akpm, linuxppc-dev, x86,
	linux-doc, linux-snps-arc, linux-arm-kernel, linux-riscv,
	linux-s390, linux-sh, sparclinux, Jerome Glisse, aneesh.kumar,
	benh, paulus, Jonathan Corbet, Catalin Marinas, Will Deacon,
	Yoshinori Sato, Rich Felker, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Vineet Gupta, Palmer Dabbelt, Albert Ou,
	Martin Schwidefsky, Heiko Carstens, Robin Murphy,
	Christophe LEROY
In-Reply-To: <1523533733-25437-1-git-send-email-ldufour@linux.vnet.ibm.com>

On Thu, 12 Apr 2018, Laurent Dufour wrote:

> Remove the additional define HAVE_PTE_SPECIAL and rely directly on
> CONFIG_ARCH_HAS_PTE_SPECIAL.
> 
> There is no functional change introduced by this patch
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Acked-by: David Rientjes <rientjes@google.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] gpiolib: add hogs support for machine code
From: Christian Lamparter @ 2018-04-12 20:00 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Linus Walleij, Jonathan Corbet, linux-gpio, linux-doc,
	linux-kernel
In-Reply-To: <20180410203028.11412-1-brgl@bgdev.pl>

On Dienstag, 10. April 2018 22:30:28 CEST Bartosz Golaszewski wrote:
> Board files constitute a significant part of the users of the legacy
> GPIO framework. In many cases they only export a line and set its
> desired value. We could use GPIO hogs for that like we do for DT and
> ACPI but there's no support for that in machine code.
> 
> This patch proposes to extend the machine.h API with support for
> registering hog tables in board files.
> 
> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
> ---
> @@ -1326,6 +1364,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
>  
>  	acpi_gpiochip_add(chip);
>  
> +	machine_gpiochip_add(chip);
> +
>  	/*
>  	 * By first adding the chardev, and then adding the device,
>  	 * we get a device node entry in sysfs under
> @@ -3462,6 +3502,33 @@ void gpiod_remove_lookup_table(struct gpiod_lookup_table *table)

I think I see the same problem right here in regards to pinctrls
and gpiohogs that have with DeviceTree:
<https://patchwork.kernel.org/patch/10313767/>

The problem is that unlike native gpio-controllers, pinctrls need 
to have a "pin/gpio range" defined before any gpio-hogs can be added.

If this is not the case the generic pinctrl_gpio_reguest() [0] will
fail with -EPROBE_DEFER at this point. (see the call chain in the
"pinctrl: msm: fix gpio-hog related boot issueslogin register" mail
starting from gpiod_hog).

And now the crux of the matter is that currently in order for pinctrl
drivers to register the range they have to call gpiochip_add_pin_range() [1]. 
But they only can do it after the gpiochip_add_data_with_key() [2], since
this function initializes the pin_ranges list [3].

So what will happen is that you'll get an
"gpiochip_machine_hog: unable to hog GPIO line $LABEL $GPIONR -517" error
for every single gpio-hog and wonder why :(.

Regards,
Christian

[0] <https://elixir.bootlin.com/linux/v4.16.2/source/drivers/pinctrl/core.c#L743>
[1] <https://elixir.bootlin.com/linux/latest/source/drivers/gpio/gpiolib.c#L2078>
[2] <https://elixir.bootlin.com/linux/latest/source/drivers/gpio/gpiolib.c#L1136>
[3] <https://elixir.bootlin.com/linux/latest/source/drivers/gpio/gpiolib.c#L1253>



--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
From: Jae Hyun Yoo @ 2018-04-12 19:51 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
	Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
	Haiyue Wang, James Feist, Jason M Biils, Jean Delvare,
	Joel Stanley, Julia Cartwright, Miguel Ojeda, Milton Miller II,
	Pavel Machek, Randy Dunlap, Stef van Os, Sumeet R Pawnikar,
	Vernon Mauery, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc
In-Reply-To: <20180412173714.GA13496@roeck-us.net>

On 4/12/2018 10:37 AM, Guenter Roeck wrote:
> On Thu, Apr 12, 2018 at 10:09:51AM -0700, Jae Hyun Yoo wrote:
> [ ... ]
>>>>>>>> +static int find_core_index(struct peci_cputemp *priv, int channel)
>>>>>>>> +{
>>>>>>>> +    int core_channel = channel - DEFAULT_CHANNEL_NUMS;
>>>>>>>> +    int idx, found = 0;
>>>>>>>> +
>>>>>>>> +    for (idx = 0; idx < priv->gen_info->core_max; idx++) {
>>>>>>>> +        if (priv->core_mask & BIT(idx)) {
>>>>>>>> +            if (core_channel == found)
>>>>>>>> +                break;
>>>>>>>> +
>>>>>>>> +            found++;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return idx;
>>>>>>>
>>>>>>> What if nothing is found ?
>>>>>>>
>>>>>>
>>>>>> Core temperature group will be registered only when it detects at
>>>>>> least one core checked by check_resolved_cores(), so
>>>>>> find_core_index() can be called only when priv->core_mask has a
>>>>>> non-zero value. The 'nothing is found' case will not happen.
>>>>>>
>>>>> That doesn't guarantee a match. If what you are saying is correct
>>>>> there should always be
>>>>> a well defined match of channel -> idx, and the search should be
>>>>> unnecessary.
>>>>>
>>>>
>>>> There could be some disabled cores in the resolved core mask bit
>>>> sequence also it should remove indexing gap in channel numbering so it
>>>> is the reason why this search function is needed. Well defined match of
>>>> channel -> idx would not be always satisfied.
>>>>
>>> Are you saying that each call to the function, with the same parameters,
>>> can return a different result ?
>>>
>>
>> No, the result will be consistent. After reading the priv->core_mask once in
>> check_resolved_cores(), the value will not be changed. I'm saying about this
>> case, for example if core number 2 is unresolved in total 4 cores, then the
>> idx order will be '0, 1, 3' but channel order will be '5, 6, 7' without
>> making any indexing gap.
>>
> 
> And you yet you claim that this is not well defined ? Or are you concerned
> about the amount of memory consumed by providing an array for the mapping ?
> 
> Note that an indexing gap is acceptable and, in many cases, preferred.
> 

If the indexing gap is acceptable, the index search function isn't 
needed anymore. I'll fix all relating code to make that use direct 
mapping of channel -> idx then. Thanks!

> [ ... ]
> 
>>>>>>>> +
>>>>>>>> +    dev_dbg(dev, "%s: sensor '%s'\n", dev_name(hwmon_dev),
>>>>>>>> priv->name);
>>>>>>>> +
>>>>>
>>>>> Why does this message display the device name twice ?
>>>>>
>>>>
>>>> For an example, dev_name(hwmon_dev) shows 'hwmon5' and priv->name shows
>>>> 'peci-cputemp0'.
>>>>
>>> And dev_dbg() shows another device name. So you'll have something like
>>>
>>> peci-cputemp0: hwmon5: sensor 'peci-cputemp0'
>>>
>>
>> Practically it shows like
>>
>> peci-cputemp 0-30:00: hwmon10: sensor 'peci_cputemp.cpu0'
>>
>> where 0-30:00 is assigned by peci core.
>>
> 
> And what message would you see for cpu1 ?
> 

It shows like

peci-cputemp 0-31:00: hwmon10: sensor 'peci_cputemp.cpu1'
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] sched/fair: add support to tune PELT ramp/decay timings
From: Peter Zijlstra @ 2018-04-12 19:13 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Juri Lelli, Joel Fernandes,
	Steve Muckle, Dietmar Eggemann, Morten Rasmussen, Jonathan Corbet,
	Paul Turner, linux-doc
In-Reply-To: <20180409165134.707-1-patrick.bellasi@arm.com>

On Mon, Apr 09, 2018 at 05:51:34PM +0100, Patrick Bellasi wrote:
> The PELT half-life is the time [ms] required by the PELT signal to build
> up a 50% load/utilization, starting from zero. This time is currently
> hardcoded to be 32ms, a value which seems to make sense for most of the
> workloads.
> 
> However, 32ms has been verified to be too long for certain classes of
> workloads. For example, in the mobile space many tasks affecting the
> user-experience run with a 16ms or 8ms cadence, since they need to match
> the common 60Hz or 120Hz refresh rate of the graphics pipeline.
> This contributed so fare to the idea that "PELT is too slow" to properly
> track the utilization of interactive mobile workloads, especially
> compared to alternative load tracking solutions which provides a
> better representation of tasks demand in the range of 10-20ms.

Initially the 32 was chosen to more or less correspond to the effective
scheduling period (sysctl_sched_latency based). The thinking was that if
you pick a PELT window shorter than the period, the result becomes
unstable due to not all tasks getting an equal go at things.

(of course, stuffing enough tasks on a rq will break this, but at that
point you have worse problems to deal with)

Should we retain this? Esp. with the lower end (8ms) I worry we'll see
more of those effects.


> Fortunately, since the integration of the utilization estimation
> support in mainline kernel:
> 
>    commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
> 
> a fast decay time is no longer an issue for tasks utilization estimation.
> Although estimated utilization does not slow down the decay of blocked
> utilization on idle CPUs, for mobile workloads this seems not to be a
> major concern compared to the benefits in interactivity responsiveness.

By picking a smaller PELT window, the util_est window shrinks
correspondingly; is that intentional or do we want to modify
UTIL_EST_WEIGHT_SHIFT to negate the PELT window changes?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
From: Guenter Roeck @ 2018-04-12 17:37 UTC (permalink / raw)
  To: Jae Hyun Yoo
  Cc: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
	Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
	Haiyue Wang, James Feist, Jason M Biils, Jean Delvare,
	Joel Stanley, Julia Cartwright, Miguel Ojeda, Milton Miller II,
	Pavel Machek, Randy Dunlap, Stef van Os, Sumeet R Pawnikar,
	Vernon Mauery, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc
In-Reply-To: <89c8c847-87e7-a849-e05d-d32b787e9305@linux.intel.com>

On Thu, Apr 12, 2018 at 10:09:51AM -0700, Jae Hyun Yoo wrote:
[ ... ]
> >>>>>>+static int find_core_index(struct peci_cputemp *priv, int channel)
> >>>>>>+{
> >>>>>>+    int core_channel = channel - DEFAULT_CHANNEL_NUMS;
> >>>>>>+    int idx, found = 0;
> >>>>>>+
> >>>>>>+    for (idx = 0; idx < priv->gen_info->core_max; idx++) {
> >>>>>>+        if (priv->core_mask & BIT(idx)) {
> >>>>>>+            if (core_channel == found)
> >>>>>>+                break;
> >>>>>>+
> >>>>>>+            found++;
> >>>>>>+        }
> >>>>>>+    }
> >>>>>>+
> >>>>>>+    return idx;
> >>>>>
> >>>>>What if nothing is found ?
> >>>>>
> >>>>
> >>>>Core temperature group will be registered only when it detects at
> >>>>least one core checked by check_resolved_cores(), so
> >>>>find_core_index() can be called only when priv->core_mask has a
> >>>>non-zero value. The 'nothing is found' case will not happen.
> >>>>
> >>>That doesn't guarantee a match. If what you are saying is correct
> >>>there should always be
> >>>a well defined match of channel -> idx, and the search should be
> >>>unnecessary.
> >>>
> >>
> >>There could be some disabled cores in the resolved core mask bit
> >>sequence also it should remove indexing gap in channel numbering so it
> >>is the reason why this search function is needed. Well defined match of
> >>channel -> idx would not be always satisfied.
> >>
> >Are you saying that each call to the function, with the same parameters,
> >can return a different result ?
> >
> 
> No, the result will be consistent. After reading the priv->core_mask once in
> check_resolved_cores(), the value will not be changed. I'm saying about this
> case, for example if core number 2 is unresolved in total 4 cores, then the
> idx order will be '0, 1, 3' but channel order will be '5, 6, 7' without
> making any indexing gap.
> 

And you yet you claim that this is not well defined ? Or are you concerned
about the amount of memory consumed by providing an array for the mapping ?

Note that an indexing gap is acceptable and, in many cases, preferred.

[ ... ]

> >>>>>>+
> >>>>>>+    dev_dbg(dev, "%s: sensor '%s'\n", dev_name(hwmon_dev),
> >>>>>>priv->name);
> >>>>>>+
> >>>
> >>>Why does this message display the device name twice ?
> >>>
> >>
> >>For an example, dev_name(hwmon_dev) shows 'hwmon5' and priv->name shows
> >>'peci-cputemp0'.
> >>
> >And dev_dbg() shows another device name. So you'll have something like
> >
> >peci-cputemp0: hwmon5: sensor 'peci-cputemp0'
> >
> 
> Practically it shows like
> 
> peci-cputemp 0-30:00: hwmon10: sensor 'peci_cputemp.cpu0'
> 
> where 0-30:00 is assigned by peci core.
> 

And what message would you see for cpu1 ?

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
From: Jae Hyun Yoo @ 2018-04-12 17:09 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
	Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
	Haiyue Wang, James Feist, Jason M Biils, Jean Delvare,
	Joel Stanley, Julia Cartwright, Miguel Ojeda, Milton Miller II,
	Pavel Machek, Randy Dunlap, Stef van Os, Sumeet R Pawnikar,
	Vernon Mauery, linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc
In-Reply-To: <13d4c632-a20e-3c40-066a-1e9df02a0595@roeck-us.net>

On 4/11/2018 8:40 PM, Guenter Roeck wrote:
> On 04/11/2018 07:51 PM, Jae Hyun Yoo wrote:
>> On 4/11/2018 5:34 PM, Guenter Roeck wrote:
>>> On 04/11/2018 02:59 PM, Jae Hyun Yoo wrote:
>>>> Hi Guenter,
>>>>
>>>> Thanks a lot for sharing your time. Please see my inline answers.
>>>>
>>>> On 4/10/2018 3:28 PM, Guenter Roeck wrote:
>>>>> On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote:
>>>>>> This commit adds PECI cputemp and dimmtemp hwmon drivers.
>>>>>>
>>>>>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
>>>>>> Reviewed-by: Haiyue Wang <haiyue.wang@linux.intel.com>
>>>>>> Reviewed-by: James Feist <james.feist@linux.intel.com>
>>>>>> Reviewed-by: Vernon Mauery <vernon.mauery@linux.intel.com>
>>>>>> Cc: Alan Cox <alan@linux.intel.com>
>>>>>> Cc: Andrew Jeffery <andrew@aj.id.au>
>>>>>> Cc: Andrew Lunn <andrew@lunn.ch>
>>>>>> Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
>>>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>>>> Cc: Fengguang Wu <fengguang.wu@intel.com>
>>>>>> Cc: Greg KH <gregkh@linuxfoundation.org>
>>>>>> Cc: Guenter Roeck <linux@roeck-us.net>
>>>>>> Cc: Jason M Biils <jason.m.bills@linux.intel.com>
>>>>>> Cc: Jean Delvare <jdelvare@suse.com>
>>>>>> Cc: Joel Stanley <joel@jms.id.au>
>>>>>> Cc: Julia Cartwright <juliac@eso.teric.us>
>>>>>> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
>>>>>> Cc: Milton Miller II <miltonm@us.ibm.com>
>>>>>> Cc: Pavel Machek <pavel@ucw.cz>
>>>>>> Cc: Randy Dunlap <rdunlap@infradead.org>
>>>>>> Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
>>>>>> Cc: Sumeet R Pawnikar <sumeet.r.pawnikar@intel.com>
>>>>>> ---
>>>>>>   drivers/hwmon/Kconfig         |  28 ++
>>>>>>   drivers/hwmon/Makefile        |   2 +
>>>>>>   drivers/hwmon/peci-cputemp.c  | 783 
>>>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>>>   drivers/hwmon/peci-dimmtemp.c | 432 +++++++++++++++++++++++
>>>>>>   4 files changed, 1245 insertions(+)
>>>>>>   create mode 100644 drivers/hwmon/peci-cputemp.c
>>>>>>   create mode 100644 drivers/hwmon/peci-dimmtemp.c
>>>>>>
>>>>>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>>>>>> index f249a4428458..c52f610f81d0 100644
>>>>>> --- a/drivers/hwmon/Kconfig
>>>>>> +++ b/drivers/hwmon/Kconfig
>>>>>> @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904
>>>>>>         This driver can also be built as a module.  If so, the module
>>>>>>         will be called nct7904.
>>>>>> +config SENSORS_PECI_CPUTEMP
>>>>>> +    tristate "PECI CPU temperature monitoring support"
>>>>>> +    depends on OF
>>>>>> +    depends on PECI
>>>>>> +    help
>>>>>> +      If you say yes here you get support for the generic Intel PECI
>>>>>> +      cputemp driver which provides Digital Thermal Sensor (DTS) 
>>>>>> thermal
>>>>>> +      readings of the CPU package and CPU cores that are 
>>>>>> accessible using
>>>>>> +      the PECI Client Command Suite via the processor PECI client.
>>>>>> +      Check Documentation/hwmon/peci-cputemp for details.
>>>>>> +
>>>>>> +      This driver can also be built as a module.  If so, the module
>>>>>> +      will be called peci-cputemp.
>>>>>> +
>>>>>> +config SENSORS_PECI_DIMMTEMP
>>>>>> +    tristate "PECI DIMM temperature monitoring support"
>>>>>> +    depends on OF
>>>>>> +    depends on PECI
>>>>>> +    help
>>>>>> +      If you say yes here you get support for the generic Intel 
>>>>>> PECI hwmon
>>>>>> +      driver which provides Digital Thermal Sensor (DTS) thermal 
>>>>>> readings of
>>>>>> +      DIMM components that are accessible using the PECI Client 
>>>>>> Command
>>>>>> +      Suite via the processor PECI client.
>>>>>> +      Check Documentation/hwmon/peci-dimmtemp for details.
>>>>>> +
>>>>>> +      This driver can also be built as a module.  If so, the module
>>>>>> +      will be called peci-dimmtemp.
>>>>>> +
>>>>>>   config SENSORS_NSA320
>>>>>>       tristate "ZyXEL NSA320 and compatible fan speed and 
>>>>>> temperature sensors"
>>>>>>       depends on GPIOLIB && OF
>>>>>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>>>>>> index e7d52a36e6c4..48d9598fcd3a 100644
>>>>>> --- a/drivers/hwmon/Makefile
>>>>>> +++ b/drivers/hwmon/Makefile
>>>>>> @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802)    += nct7802.o
>>>>>>   obj-$(CONFIG_SENSORS_NCT7904)    += nct7904.o
>>>>>>   obj-$(CONFIG_SENSORS_NSA320)    += nsa320-hwmon.o
>>>>>>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)    += ntc_thermistor.o
>>>>>> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP)    += peci-cputemp.o
>>>>>> +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP)    += peci-dimmtemp.o
>>>>>>   obj-$(CONFIG_SENSORS_PC87360)    += pc87360.o
>>>>>>   obj-$(CONFIG_SENSORS_PC87427)    += pc87427.o
>>>>>>   obj-$(CONFIG_SENSORS_PCF8591)    += pcf8591.o
>>>>>> diff --git a/drivers/hwmon/peci-cputemp.c 
>>>>>> b/drivers/hwmon/peci-cputemp.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..f0bc92687512
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/hwmon/peci-cputemp.c
>>>>>> @@ -0,0 +1,783 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>> +// Copyright (c) 2018 Intel Corporation
>>>>>> +
>>>>>> +#include <linux/delay.h>
>>>>>> +#include <linux/hwmon.h>
>>>>>> +#include <linux/hwmon-sysfs.h>
>>>>>
>>>>> Is this include needed ?
>>>>>
>>>>
>>>> No it isn't. Will drop the line.
>>>>
>>>>>> +#include <linux/jiffies.h>
>>>>>> +#include <linux/module.h>
>>>>>> +#include <linux/of_device.h>
>>>>>> +#include <linux/peci.h>
>>>>>> +
>>>>>> +#define TEMP_TYPE_PECI        6  /* Sensor type 6: Intel PECI */
>>>>>> +
>>>>>> +#define CORE_MAX_ON_HSX       18 /* Max number of cores on 
>>>>>> Haswell */
>>>>>> +#define CORE_MAX_ON_BDX       24 /* Max number of cores on 
>>>>>> Broadwell */
>>>>>> +#define CORE_MAX_ON_SKX       28 /* Max number of cores on 
>>>>>> Skylake */
>>>>>> +
>>>>>> +#define DEFAULT_CHANNEL_NUMS  5
>>>>>> +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX
>>>>>> +#define CPUTEMP_CHANNEL_NUMS  (DEFAULT_CHANNEL_NUMS + 
>>>>>> CORETEMP_CHANNEL_NUMS)
>>>>>> +
>>>>>> +#define CLIENT_CPU_ID_MASK    0xf0ff0  /* Mask for Family / Model 
>>>>>> info */
>>>>>> +
>>>>>> +#define UPDATE_INTERVAL_MIN   HZ
>>>>>> +
>>>>>> +enum cpu_gens {
>>>>>> +    CPU_GEN_HSX, /* Haswell Xeon */
>>>>>> +    CPU_GEN_BRX, /* Broadwell Xeon */
>>>>>> +    CPU_GEN_SKX, /* Skylake Xeon */
>>>>>> +    CPU_GEN_MAX
>>>>>> +};
>>>>>> +
>>>>>> +struct cpu_gen_info {
>>>>>> +    u32 type;
>>>>>> +    u32 cpu_id;
>>>>>> +    u32 core_max;
>>>>>> +};
>>>>>> +
>>>>>> +struct temp_data {
>>>>>> +    bool valid;
>>>>>> +    s32  value;
>>>>>> +    unsigned long last_updated;
>>>>>> +};
>>>>>> +
>>>>>> +struct temp_group {
>>>>>> +    struct temp_data die;
>>>>>> +    struct temp_data dts_margin;
>>>>>> +    struct temp_data tcontrol;
>>>>>> +    struct temp_data tthrottle;
>>>>>> +    struct temp_data tjmax;
>>>>>> +    struct temp_data core[CORETEMP_CHANNEL_NUMS];
>>>>>> +};
>>>>>> +
>>>>>> +struct peci_cputemp {
>>>>>> +    struct peci_client *client;
>>>>>> +    struct device *dev;
>>>>>> +    char name[PECI_NAME_SIZE];
>>>>>> +    struct temp_group temp;
>>>>>> +    u8 addr;
>>>>>> +    uint cpu_no;
>>>>>> +    const struct cpu_gen_info *gen_info;
>>>>>> +    u32 core_mask;
>>>>>> +    u32 temp_config[CPUTEMP_CHANNEL_NUMS + 1];
>>>>>> +    uint config_idx;
>>>>>> +    struct hwmon_channel_info temp_info;
>>>>>> +    const struct hwmon_channel_info *info[2];
>>>>>> +    struct hwmon_chip_info chip;
>>>>>> +};
>>>>>> +
>>>>>> +enum cputemp_channels {
>>>>>> +    channel_die,
>>>>>> +    channel_dts_mrgn,
>>>>>> +    channel_tcontrol,
>>>>>> +    channel_tthrottle,
>>>>>> +    channel_tjmax,
>>>>>> +    channel_core,
>>>>>> +};
>>>>>> +
>>>>>> +static const struct cpu_gen_info cpu_gen_info_table[] = {
>>>>>> +    { .type = CPU_GEN_HSX,
>>>>>> +      .cpu_id = 0x306f0, /* Family code: 6, Model number: 63 
>>>>>> (0x3f) */
>>>>>> +      .core_max = CORE_MAX_ON_HSX },
>>>>>> +    { .type = CPU_GEN_BRX,
>>>>>> +      .cpu_id = 0x406f0, /* Family code: 6, Model number: 79 
>>>>>> (0x4f) */
>>>>>> +      .core_max = CORE_MAX_ON_BDX },
>>>>>> +    { .type = CPU_GEN_SKX,
>>>>>> +      .cpu_id = 0x50650, /* Family code: 6, Model number: 85 
>>>>>> (0x55) */
>>>>>> +      .core_max = CORE_MAX_ON_SKX },
>>>>>> +};
>>>>>> +
>>>>>> +static const u32 config_table[DEFAULT_CHANNEL_NUMS + 1] = {
>>>>>> +    /* Die temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
>>>>>> +    HWMON_T_CRIT_HYST,
>>>>>> +
>>>>>> +    /* DTS margin temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MIN | HWMON_T_LCRIT,
>>>>>> +
>>>>>> +    /* Tcontrol temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
>>>>>> +
>>>>>> +    /* Tthrottle temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT,
>>>>>> +
>>>>>> +    /* Tjmax temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT,
>>>>>> +
>>>>>> +    /* Core temperature - for all core channels */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
>>>>>> +    HWMON_T_CRIT_HYST,
>>>>>> +};
>>>>>> +
>>>>>> +static const char *cputemp_label[CPUTEMP_CHANNEL_NUMS] = {
>>>>>> +    "Die",
>>>>>> +    "DTS margin",
>>>>>> +    "Tcontrol",
>>>>>> +    "Tthrottle",
>>>>>> +    "Tjmax",
>>>>>> +    "Core 0", "Core 1", "Core 2", "Core 3",
>>>>>> +    "Core 4", "Core 5", "Core 6", "Core 7",
>>>>>> +    "Core 8", "Core 9", "Core 10", "Core 11",
>>>>>> +    "Core 12", "Core 13", "Core 14", "Core 15",
>>>>>> +    "Core 16", "Core 17", "Core 18", "Core 19",
>>>>>> +    "Core 20", "Core 21", "Core 22", "Core 23",
>>>>>> +};
>>>>>> +
>>>>>> +static int send_peci_cmd(struct peci_cputemp *priv,
>>>>>> +             enum peci_cmd cmd,
>>>>>> +             void *msg)
>>>>>> +{
>>>>>> +    return peci_command(priv->client->adapter, cmd, msg);
>>>>>> +}
>>>>>> +
>>>>>> +static int need_update(struct temp_data *temp)
>>>>>
>>>>> Please use bool.
>>>>>
>>>>
>>>> Okay. I'll use bool instead of int.
>>>>
>>>>>> +{
>>>>>> +    if (temp->valid &&
>>>>>> +        time_before(jiffies, temp->last_updated + 
>>>>>> UPDATE_INTERVAL_MIN))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return 1;
>>>>>> +}
>>>>>> +
>>>>>> +static void mark_updated(struct temp_data *temp)
>>>>>> +{
>>>>>> +    temp->valid = true;
>>>>>> +    temp->last_updated = jiffies;
>>>>>> +}
>>>>>> +
>>>>>> +static s32 ten_dot_six_to_millidegree(s32 val)
>>>>>> +{
>>>>>> +    return ((val ^ 0x8000) - 0x8000) * 1000 / 64;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_tjmax(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!priv->temp.tjmax.valid) {
>>>>>> +        msg.addr = priv->addr;
>>>>>> +        msg.index = MBX_INDEX_TEMP_TARGET;
>>>>>> +        msg.param = 0;
>>>>>> +        msg.rx_len = 4;
>>>>>> +
>>>>>> +        rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
>>>>>> +        priv->temp.tjmax.valid = true;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_tcontrol(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 tcontrol_margin;
>>>>>> +    s32 tthrottle_offset;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.tcontrol))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_TEMP_TARGET;
>>>>>> +    msg.param = 0;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    tcontrol_margin = msg.pkg_config[1];
>>>>>> +    tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>>>>>> +    priv->temp.tcontrol.value = priv->temp.tjmax.value - 
>>>>>> tcontrol_margin;
>>>>>> +
>>>>>> +    tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>>>>>> +    priv->temp.tthrottle.value = priv->temp.tjmax.value - 
>>>>>> tthrottle_offset;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.tcontrol);
>>>>>> +    mark_updated(&priv->temp.tthrottle);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_tthrottle(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 tcontrol_margin;
>>>>>> +    s32 tthrottle_offset;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.tthrottle))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_TEMP_TARGET;
>>>>>> +    msg.param = 0;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>>>>>> +    priv->temp.tthrottle.value = priv->temp.tjmax.value - 
>>>>>> tthrottle_offset;
>>>>>> +
>>>>>> +    tcontrol_margin = msg.pkg_config[1];
>>>>>> +    tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>>>>>> +    priv->temp.tcontrol.value = priv->temp.tjmax.value - 
>>>>>> tcontrol_margin;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.tthrottle);
>>>>>> +    mark_updated(&priv->temp.tcontrol);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>
>>>>> I am quite completely missing how the two functions above are 
>>>>> different.
>>>>>
>>>>
>>>> The two above functions are slightly different but uses the same 
>>>> PECI command which provides both Tthrottle and Tcontrol values in 
>>>> pkg_config array so it updates the values to reduce duplicate PECI 
>>>> transactions. Probably, combining these two functions into 
>>>> get_ttrottle_and_tcontrol() would look better. I'll rewrite it.
>>>>
>>>>>> +
>>>>>> +static int get_die_temp(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_get_temp_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.die))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    priv->temp.die.value = priv->temp.tjmax.value +
>>>>>> +                   ((s32)msg.temp_raw * 1000 / 64);
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.die);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_dts_margin(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 dts_margin;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.dts_margin))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_DTS_MARGIN;
>>>>>> +    msg.param = 0;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>>>>>> +
>>>>>> +    /**
>>>>>> +     * Processors return a value of DTS reading in 10.6 format
>>>>>> +     * (10 bits signed decimal, 6 bits fractional).
>>>>>> +     * Error codes:
>>>>>> +     *   0x8000: General sensor error
>>>>>> +     *   0x8001: Reserved
>>>>>> +     *   0x8002: Underflow on reading value
>>>>>> +     *   0x8003-0x81ff: Reserved
>>>>>> +     */
>>>>>> +    if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
>>>>>> +        return -EIO;
>>>>>> +
>>>>>> +    dts_margin = ten_dot_six_to_millidegree(dts_margin);
>>>>>> +
>>>>>> +    priv->temp.dts_margin.value = dts_margin;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.dts_margin);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_core_temp(struct peci_cputemp *priv, int core_index)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 core_dts_margin;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.core[core_index]))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
>>>>>> +    msg.param = core_index;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>>>>>> +
>>>>>> +    /**
>>>>>> +     * Processors return a value of the core DTS reading in 10.6 
>>>>>> format
>>>>>> +     * (10 bits signed decimal, 6 bits fractional).
>>>>>> +     * Error codes:
>>>>>> +     *   0x8000: General sensor error
>>>>>> +     *   0x8001: Reserved
>>>>>> +     *   0x8002: Underflow on reading value
>>>>>> +     *   0x8003-0x81ff: Reserved
>>>>>> +     */
>>>>>> +    if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>>>>>> +        return -EIO;
>>>>>> +
>>>>>> +    core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
>>>>>> +
>>>>>> +    priv->temp.core[core_index].value = priv->temp.tjmax.value +
>>>>>> +                        core_dts_margin;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.core[core_index]);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>
>>>>> There is a lot of duplication in those functions. Would it be possible
>>>>> to find common code and use functions for it instead of duplicating
>>>>> everything several times ?
>>>>>
>>>>
>>>> Are you pointing out this code?
>>>> /**
>>>>   * Processors return a value of the core DTS reading in 10.6 format
>>>>   * (10 bits signed decimal, 6 bits fractional).
>>>>   * Error codes:
>>>>   *   0x8000: General sensor error
>>>>   *   0x8001: Reserved
>>>>   *   0x8002: Underflow on reading value
>>>>   *   0x8003-0x81ff: Reserved
>>>>   */
>>>> if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>>>>      return -EIO;
>>>>
>>>> Then I'll rewrite it as a function. If not, please point out the 
>>>> duplication.
>>>>
>>>
>>> There is lots of other duplication.
>>>
>>
>> Sorry but can you point out the duplication?
>>
> write a python script to do a semantic comparison.
> 

Okay. I'll try to simplify this code again.

>>>>>> +static int find_core_index(struct peci_cputemp *priv, int channel)
>>>>>> +{
>>>>>> +    int core_channel = channel - DEFAULT_CHANNEL_NUMS;
>>>>>> +    int idx, found = 0;
>>>>>> +
>>>>>> +    for (idx = 0; idx < priv->gen_info->core_max; idx++) {
>>>>>> +        if (priv->core_mask & BIT(idx)) {
>>>>>> +            if (core_channel == found)
>>>>>> +                break;
>>>>>> +
>>>>>> +            found++;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return idx;
>>>>>
>>>>> What if nothing is found ?
>>>>>
>>>>
>>>> Core temperature group will be registered only when it detects at 
>>>> least one core checked by check_resolved_cores(), so 
>>>> find_core_index() can be called only when priv->core_mask has a 
>>>> non-zero value. The 'nothing is found' case will not happen.
>>>>
>>> That doesn't guarantee a match. If what you are saying is correct 
>>> there should always be
>>> a well defined match of channel -> idx, and the search should be 
>>> unnecessary.
>>>
>>
>> There could be some disabled cores in the resolved core mask bit 
>> sequence also it should remove indexing gap in channel numbering so it 
>> is the reason why this search function is needed. Well defined match 
>> of channel -> idx would not be always satisfied.
>>
> Are you saying that each call to the function, with the same parameters,
> can return a different result ?
> 

No, the result will be consistent. After reading the priv->core_mask 
once in check_resolved_cores(), the value will not be changed. I'm 
saying about this case, for example if core number 2 is unresolved in 
total 4 cores, then the idx order will be '0, 1, 3' but channel order 
will be '5, 6, 7' without making any indexing gap.

>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_string(struct device *dev,
>>>>>> +                   enum hwmon_sensor_types type,
>>>>>> +                   u32 attr, int channel, const char **str)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int core_index;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_label:
>>>>>> +        if (channel < DEFAULT_CHANNEL_NUMS) {
>>>>>> +            *str = cputemp_label[channel];
>>>>>> +        } else {
>>>>>> +            core_index = find_core_index(priv, channel);
>>>>>
>>>>> FWIW, it might be better to pass channel - DEFAULT_CHANNEL_NUMS
>>>>> as parameter.
>>>>>
>>>>
>>>> cputemp_read_string() is mapped to read_string member of hwmon_ops 
>>>> struct, so hwmon susbsystem passes the channel parameter based on 
>>>> the registered channel order. Should I modify hwmon subsystem code?
>>>>
>>>
>>> Huh ? Changing
>>>      f(x) { y = x - const; }
>>> ...
>>>      f(x);
>>>
>>> to
>>>      f(y) { }
>>> ...
>>>      f(x - const);
>>>
>>> requires a hwmon core change ? Really ?
>>>
>>
>> Sorry for my misunderstanding. You are right. I'll change the 
>> parameter passing of find_core_index() from 'channel' to 'channel - 
>> DEFAULT_CHANNEL_NUMS'.
>>
>>>>> What if find_core_index() returns priv->gen_info->core_max, ie
>>>>> if it didn't find a core ?
>>>>>
>>>>
>>>> As explained above, find_core index() returns a correct index always.
>>>>
>>>>>> +            *str = cputemp_label[DEFAULT_CHANNEL_NUMS + core_index];
>>>>>> +        }
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_die(struct device *dev,
>>>>>> +                enum hwmon_sensor_types type,
>>>>>> +                u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_die_temp(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.die.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_max:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit_hyst:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_dts_margin(struct device *dev,
>>>>>> +                   enum hwmon_sensor_types type,
>>>>>> +                   u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_dts_margin(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.dts_margin.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_min:
>>>>>> +        *val = 0;
>>>>>> +        return 0;
>>>>>
>>>>> This attribute should not exist.
>>>>>
>>>>
>>>> This is an attribute of DTS margin temperature which reflects 
>>>> thermal margin to Tcontrol of the CPU package. If it shows '0' means 
>>>> it reached to Tcontrol, the first level of thermal warning. If the 
>>>> CPU keeps getting hot then this DTS margin shows a negative value 
>>>> until it reaches to Tjmax. When the temperature reaches to Tjmax at 
>>>> last then it shows the lower critcal value which lcrit indicates as 
>>>> the second level of thermal warning.
>>>>
>>>
>>> The hwmon ABI reports chip values, not constants. Even though some 
>>> drivers do
>>> it, reporting a constant is always wrong.
>>>
>>>>>> +    case hwmon_temp_lcrit:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value - priv->temp.tjmax.value;
>>>>>
>>>>> lcrit is tcontrol - tjmax, and crit_hyst above is
>>>>> tjmax - tcontrol ? How does this make sense ?
>>>>>
>>>>
>>>> Both Tjmax and Tcontrol have positive values and Tjmax is greater 
>>>> than Tcontrol always. As explained above, lcrit of DTS margin should 
>>>> show a negative value means the margin goes down across '0'. On the 
>>>> other hand, crit_hyst of Die temperature should show absolute 
>>>> hyterisis value between Tcontrol and Tjmax.
>>>>
>>> The hwmon ABI requires reporting of absolute temperatures in 
>>> milli-degrees C.
>>> Your statements make it very clear that this driver does not report
>>> absolute temperatures. This is not acceptable.
>>>
>>
>> Okay. I'll remove the 'DTS margin' temperature. All others are 
>> reporting absolute temperatures.
>>
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_tcontrol(struct device *dev,
>>>>>> +                 enum hwmon_sensor_types type,
>>>>>> +                 u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>
>>>>> Am I missing something, or is the same temperature reported several 
>>>>> times ?
>>>>> tjmax is also reported as temp_crit cputemp_read_die(), for example.
>>>>>
>>>>
>>>> This driver provides multiple channels and each channel has its own 
>>>> supplement attributes. As you mentioned, Die temperature channel and 
>>>> Core temperature channel have their individual crit attributes and 
>>>> they reflect the same value, Tjmax. It is not reporting several 
>>>> times but reporting the same value.
>>>>
>>> Then maybe fold the functions accordingly ?
>>>
>>
>> I'll use a single function for 'Die temperature' and 'Core 
>> temperature' that have the same attributes set. It would simplify this 
>> code a bit.
>>
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_tthrottle(struct device *dev,
>>>>>> +                  enum hwmon_sensor_types type,
>>>>>> +                  u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_tthrottle(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tthrottle.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_tjmax(struct device *dev,
>>>>>> +                  enum hwmon_sensor_types type,
>>>>>> +                  u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_core(struct device *dev,
>>>>>> +                 enum hwmon_sensor_types type,
>>>>>> +                 u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int core_index = find_core_index(priv, channel);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_core_temp(priv, core_index);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.core[core_index].value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_max:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit_hyst:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>
>>>>> There is again a lot of duplication in those functions.
>>>>>
>>>>
>>>> Each function is called from cputemp_read() which is mapped to read 
>>>> function pointer of hwmon_ops struct. Since each channel has 
>>>> different set of attributes so the cputemp_read() calls an 
>>>> individual channel handler after checking the channel type. Of 
>>>> course, we can handle all attributes of all channels in a single 
>>>> function but the way also needs channel type checking code on each 
>>>> attribute.
>>>>
>>>>>> +
>>>>>> +static int cputemp_read(struct device *dev,
>>>>>> +            enum hwmon_sensor_types type,
>>>>>> +            u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    switch (channel) {
>>>>>> +    case channel_die:
>>>>>> +        return cputemp_read_die(dev, type, attr, channel, val);
>>>>>> +    case channel_dts_mrgn:
>>>>>> +        return cputemp_read_dts_margin(dev, type, attr, channel, 
>>>>>> val);
>>>>>> +    case channel_tcontrol:
>>>>>> +        return cputemp_read_tcontrol(dev, type, attr, channel, val);
>>>>>> +    case channel_tthrottle:
>>>>>> +        return cputemp_read_tthrottle(dev, type, attr, channel, 
>>>>>> val);
>>>>>> +    case channel_tjmax:
>>>>>> +        return cputemp_read_tjmax(dev, type, attr, channel, val);
>>>>>> +    default:
>>>>>> +        if (channel < CPUTEMP_CHANNEL_NUMS)
>>>>>> +            return cputemp_read_core(dev, type, attr, channel, val);
>>>>>> +
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static umode_t cputemp_is_visible(const void *data,
>>>>>> +                  enum hwmon_sensor_types type,
>>>>>> +                  u32 attr, int channel)
>>>>>> +{
>>>>>> +    const struct peci_cputemp *priv = data;
>>>>>> +
>>>>>> +    if (priv->temp_config[channel] & BIT(attr))
>>>>>> +        return 0444;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct hwmon_ops cputemp_ops = {
>>>>>> +    .is_visible = cputemp_is_visible,
>>>>>> +    .read_string = cputemp_read_string,
>>>>>> +    .read = cputemp_read,
>>>>>> +};
>>>>>> +
>>>>>> +static int check_resolved_cores(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pci_cfg_local_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!(priv->client->adapter->cmd_mask & 
>>>>>> BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    /* Get the RESOLVED_CORES register value */
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.bus = 1;
>>>>>> +    msg.device = 30;
>>>>>> +    msg.function = 3;
>>>>>> +    msg.reg = 0xB4;
>>>>>
>>>>> Can this be made less magic with some defines ?
>>>>>
>>>>
>>>> Sure, will use defines instead.
>>>>
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    priv->core_mask = msg.pci_config[3] << 24 |
>>>>>> +              msg.pci_config[2] << 16 |
>>>>>> +              msg.pci_config[1] << 8 |
>>>>>> +              msg.pci_config[0];
>>>>>> +
>>>>>> +    if (!priv->core_mask)
>>>>>> +        return -EAGAIN;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", 
>>>>>> priv->core_mask);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int create_core_temp_info(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    int rc, i;
>>>>>> +
>>>>>> +    rc = check_resolved_cores(priv);
>>>>>> +    if (!rc) {
>>>>>> +        for (i = 0; i < priv->gen_info->core_max; i++) {
>>>>>> +            if (priv->core_mask & BIT(i)) {
>>>>>> +                priv->temp_config[priv->config_idx++] =
>>>>>> +                             config_table[channel_core];
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static int check_cpu_id(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    u32 cpu_id;
>>>>>> +    int i, rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_CPU_ID;
>>>>>> +    msg.param = PKG_ID_CPU_ID;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    cpu_id = ((msg.pkg_config[2] << 16) | (msg.pkg_config[1] << 8) |
>>>>>> +          msg.pkg_config[0]) & CLIENT_CPU_ID_MASK;
>>>>>> +
>>>>>> +    for (i = 0; i < CPU_GEN_MAX; i++) {
>>>>>> +        if (cpu_id == cpu_gen_info_table[i].cpu_id) {
>>>>>> +            priv->gen_info = &cpu_gen_info_table[i];
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!priv->gen_info)
>>>>>> +        return -ENODEV;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "CPU_ID: 0x%x\n", cpu_id);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int peci_cputemp_probe(struct peci_client *client)
>>>>>> +{
>>>>>> +    struct device *dev = &client->dev;
>>>>>> +    struct peci_cputemp *priv;
>>>>>> +    struct device *hwmon_dev;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if ((client->adapter->cmd_mask &
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>>>>>> +        dev_err(dev, "Client doesn't support temperature 
>>>>>> monitoring\n");
>>>>>> +        return -EINVAL;
>>>>>
>>>>> Does this mean there will be an error message for each 
>>>>> non-supported CPU ?
>>>>> Why ?
>>>>>
>>>>
>>>> For proper operation of this driver, PECI_CMD_GET_TEMP and 
>>>> PECI_CMD_RD_PKG_CFG have to be supported by a client CPU. 
>>>> PECI_CMD_GET_TEMP is provided as a default command but 
>>>> PECI_CMD_RD_PKG_CFG depends on PECI minor revision of a CPU package 
>>>> so this checking is needed.
>>>>
>>>
>>> I do not question the check. I question the error message and error 
>>> return value.
>>> Why is it an _error_ if the CPU does not support the functionality, 
>>> and why does
>>> it have to be reported in the kernel log ?
>>>
>>
>> Got it. I'll change that to dev_dbg.
>>
>>>>>> +    }
>>>>>> +
>>>>>> +    priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>>>>> +    if (!priv)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    dev_set_drvdata(dev, priv);
>>>>>> +    priv->client = client;
>>>>>> +    priv->dev = dev;
>>>>>> +    priv->addr = client->addr;
>>>>>> +    priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>>>>>> +
>>>>>> +    snprintf(priv->name, PECI_NAME_SIZE, "peci_cputemp.cpu%d",
>>>>>> +         priv->cpu_no);
>>>>>> +
>>>>>> +    rc = check_cpu_id(priv);
>>>>>> +    if (rc) {
>>>>>> +        dev_err(dev, "Client CPU is not supported\n");
>>>>>
>>>>> -ENODEV is not an error, and should not result in an error message.
>>>>> Besides, the error can also be propagated from peci core code,
>>>>> and may well be something else.
>>>>>
>>>>
>>>> Got it. I'll remove the error message and will add a proper handling 
>>>> code into PECI core.
>>>>
>>>>>> +        return rc;
>>>>>> +    }
>>>>>> +
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_die];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_dts_mrgn];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_tcontrol];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_tthrottle];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_tjmax];
>>>>>> +
>>>>>> +    rc = create_core_temp_info(priv);
>>>>>> +    if (rc)
>>>>>> +        dev_dbg(dev, "Failed to create core temp info\n");
>>>>>
>>>>> Then what ? Shouldn't this result in probe deferral or something 
>>>>> more useful
>>>>> instead of just being ignored ?
>>>>>
>>>>
>>>> This driver can't support core temperature monitoring if a CPU 
>>>> doesn't support PECI_CMD_RD_PCI_CFG_LOCAL command. In that case, it 
>>>> skips core temperature group creation and supports only basic 
>>>> temperature monitoring of Die, DTS margin and etc. I'll add this 
>>>> description as a comment.
>>>>
>>>
>>> The message says "Failed to ...". It does not say "This CPU does not 
>>> support ...".
>>>
>>
>> Got it. Will correct the message.
>>
>>>>>> +
>>>>>> +    priv->chip.ops = &cputemp_ops;
>>>>>> +    priv->chip.info = priv->info;
>>>>>> +
>>>>>> +    priv->info[0] = &priv->temp_info;
>>>>>> +
>>>>>> +    priv->temp_info.type = hwmon_temp;
>>>>>> +    priv->temp_info.config = priv->temp_config;
>>>>>> +
>>>>>> +    hwmon_dev = devm_hwmon_device_register_with_info(priv->dev,
>>>>>> +                             priv->name,
>>>>>> +                             priv,
>>>>>> +                             &priv->chip,
>>>>>> +                             NULL);
>>>>>> +
>>>>>> +    if (IS_ERR(hwmon_dev))
>>>>>> +        return PTR_ERR(hwmon_dev);
>>>>>> +
>>>>>> +    dev_dbg(dev, "%s: sensor '%s'\n", dev_name(hwmon_dev), 
>>>>>> priv->name);
>>>>>> +
>>>
>>> Why does this message display the device name twice ?
>>>
>>
>> For an example, dev_name(hwmon_dev) shows 'hwmon5' and priv->name 
>> shows 'peci-cputemp0'.
>>
> And dev_dbg() shows another device name. So you'll have something like
> 
> peci-cputemp0: hwmon5: sensor 'peci-cputemp0'
> 

Practically it shows like

peci-cputemp 0-30:00: hwmon10: sensor 'peci_cputemp.cpu0'

where 0-30:00 is assigned by peci core.

>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct of_device_id peci_cputemp_of_table[] = {
>>>>>> +    { .compatible = "intel,peci-cputemp" },
>>>>>> +    { }
>>>>>> +};
>>>>>> +MODULE_DEVICE_TABLE(of, peci_cputemp_of_table);
>>>>>> +
>>>>>> +static struct peci_driver peci_cputemp_driver = {
>>>>>> +    .probe  = peci_cputemp_probe,
>>>>>> +    .driver = {
>>>>>> +        .name           = "peci-cputemp",
>>>>>> +        .of_match_table = of_match_ptr(peci_cputemp_of_table),
>>>>>> +    },
>>>>>> +};
>>>>>> +module_peci_driver(peci_cputemp_driver);
>>>>>> +
>>>>>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
>>>>>> +MODULE_DESCRIPTION("PECI cputemp driver");
>>>>>> +MODULE_LICENSE("GPL v2");
>>>>>> diff --git a/drivers/hwmon/peci-dimmtemp.c 
>>>>>> b/drivers/hwmon/peci-dimmtemp.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..78bf29cb2c4c
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/hwmon/peci-dimmtemp.c
>>>>>
>>>>> FWIW, this should be two separate patches.
>>>>>
>>>>
>>>> Should I split out hwmon documents and dt bindings too?
>>>>
>>>>>> @@ -0,0 +1,432 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>> +// Copyright (c) 2018 Intel Corporation
>>>>>> +
>>>>>> +#include <linux/delay.h>
>>>>>> +#include <linux/hwmon.h>
>>>>>> +#include <linux/hwmon-sysfs.h>
>>>>>
>>>>> Needed ?
>>>>>
>>>>
>>>> No. Will drop the line.
>>>>
>>>>>> +#include <linux/jiffies.h>
>>>>>> +#include <linux/module.h>
>>>>>> +#include <linux/of_device.h>
>>>>>> +#include <linux/peci.h>
>>>>>> +#include <linux/workqueue.h>
>>>>>> +
>>>>>> +#define TEMP_TYPE_PECI       6  /* Sensor type 6: Intel PECI */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX_ON_HSX 8  /* Max number of channel ranks on 
>>>>>> Haswell */
>>>>>> +#define DIMM_IDX_MAX_ON_HSX  3  /* Max DIMM index per channel on 
>>>>>> Haswell */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX_ON_BDX 4  /* Max number of channel ranks on 
>>>>>> Broadwell */
>>>>>> +#define DIMM_IDX_MAX_ON_BDX  3  /* Max DIMM index per channel on 
>>>>>> Broadwell */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX_ON_SKX 6  /* Max number of channel ranks on 
>>>>>> Skylake */
>>>>>> +#define DIMM_IDX_MAX_ON_SKX  2  /* Max DIMM index per channel on 
>>>>>> Skylake */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX        CHAN_RANK_MAX_ON_HSX
>>>>>> +#define DIMM_IDX_MAX         DIMM_IDX_MAX_ON_HSX
>>>>>> +
>>>>>> +#define DIMM_NUMS_MAX        (CHAN_RANK_MAX * DIMM_IDX_MAX)
>>>>>> +
>>>>>> +#define CLIENT_CPU_ID_MASK   0xf0ff0  /* Mask for Family / Model 
>>>>>> info */
>>>>>> +
>>>>>> +#define UPDATE_INTERVAL_MIN  HZ
>>>>>> +
>>>>>> +#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
>>>>>> +#define DIMM_MASK_CHECK_RETRY_MAX     60 /* 60 x 5 secs = 5 
>>>>>> minutes */
>>>>>> +
>>>>>> +enum cpu_gens {
>>>>>> +    CPU_GEN_HSX, /* Haswell Xeon */
>>>>>> +    CPU_GEN_BRX, /* Broadwell Xeon */
>>>>>> +    CPU_GEN_SKX, /* Skylake Xeon */
>>>>>> +    CPU_GEN_MAX
>>>>>> +};
>>>>>> +
>>>>>> +struct cpu_gen_info {
>>>>>> +    u32 type;
>>>>>> +    u32 cpu_id;
>>>>>> +    u32 chan_rank_max;
>>>>>> +    u32 dimm_idx_max;
>>>>>> +};
>>>>>> +
>>>>>> +struct temp_data {
>>>>>> +    bool valid;
>>>>>> +    s32  value;
>>>>>> +    unsigned long last_updated;
>>>>>> +};
>>>>>> +
>>>>>> +struct peci_dimmtemp {
>>>>>> +    struct peci_client *client;
>>>>>> +    struct device *dev;
>>>>>> +    struct workqueue_struct *work_queue;
>>>>>> +    struct delayed_work work_handler;
>>>>>> +    char name[PECI_NAME_SIZE];
>>>>>> +    struct temp_data temp[DIMM_NUMS_MAX];
>>>>>> +    u8 addr;
>>>>>> +    uint cpu_no;
>>>>>> +    const struct cpu_gen_info *gen_info;
>>>>>> +    u32 dimm_mask;
>>>>>> +    int retry_count;
>>>>>> +    int channels;
>>>>>> +    u32 temp_config[DIMM_NUMS_MAX + 1];
>>>>>> +    struct hwmon_channel_info temp_info;
>>>>>> +    const struct hwmon_channel_info *info[2];
>>>>>> +    struct hwmon_chip_info chip;
>>>>>> +};
>>>>>> +
>>>>>> +static const struct cpu_gen_info cpu_gen_info_table[] = {
>>>>>> +    { .type  = CPU_GEN_HSX,
>>>>>> +      .cpu_id = 0x306f0, /* Family code: 6, Model number: 63 
>>>>>> (0x3f) */
>>>>>> +      .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
>>>>>> +      .dimm_idx_max  = DIMM_IDX_MAX_ON_HSX },
>>>>>> +    { .type  = CPU_GEN_BRX,
>>>>>> +      .cpu_id = 0x406f0, /* Family code: 6, Model number: 79 
>>>>>> (0x4f) */
>>>>>> +      .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
>>>>>> +      .dimm_idx_max  = DIMM_IDX_MAX_ON_BDX },
>>>>>> +    { .type  = CPU_GEN_SKX,
>>>>>> +      .cpu_id = 0x50650, /* Family code: 6, Model number: 85 
>>>>>> (0x55) */
>>>>>> +      .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
>>>>>> +      .dimm_idx_max  = DIMM_IDX_MAX_ON_SKX },
>>>>>> +};
>>>>>> +
>>>>>> +static const char *dimmtemp_label[CHAN_RANK_MAX][DIMM_IDX_MAX] = {
>>>>>> +    { "DIMM A0", "DIMM A1", "DIMM A2" },
>>>>>> +    { "DIMM B0", "DIMM B1", "DIMM B2" },
>>>>>> +    { "DIMM C0", "DIMM C1", "DIMM C2" },
>>>>>> +    { "DIMM D0", "DIMM D1", "DIMM D2" },
>>>>>> +    { "DIMM E0", "DIMM E1", "DIMM E2" },
>>>>>> +    { "DIMM F0", "DIMM F1", "DIMM F2" },
>>>>>> +    { "DIMM G0", "DIMM G1", "DIMM G2" },
>>>>>> +    { "DIMM H0", "DIMM H1", "DIMM H2" },
>>>>>> +};
>>>>>> +
>>>>>> +static int send_peci_cmd(struct peci_dimmtemp *priv, enum 
>>>>>> peci_cmd cmd,
>>>>>> +             void *msg)
>>>>>> +{
>>>>>> +    return peci_command(priv->client->adapter, cmd, msg);
>>>>>> +}
>>>>>> +
>>>>>> +static int need_update(struct temp_data *temp)
>>>>>> +{
>>>>>> +    if (temp->valid &&
>>>>>> +        time_before(jiffies, temp->last_updated + 
>>>>>> UPDATE_INTERVAL_MIN))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return 1;
>>>>>> +}
>>>>>> +
>>>>>> +static void mark_updated(struct temp_data *temp)
>>>>>> +{
>>>>>> +    temp->valid = true;
>>>>>> +    temp->last_updated = jiffies;
>>>>>> +}
>>>>>
>>>>> It might make sense to provide the duplicate functions in a core file.
>>>>>
>>>>
>>>> It is temperature monitoring specific function and it touches module 
>>>> specific variables. Do you really think that this non-generic 
>>>> function should be moved to PECI core?
>>>>
>>>>>> +
>>>>>> +static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
>>>>>> +{
>>>>>> +    int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
>>>>>> +    int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp[dimm_no]))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>>>>>> +    msg.param = chan_rank;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    priv->temp[dimm_no].value = msg.pkg_config[dimm_order] * 1000;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp[dimm_no]);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int find_dimm_number(struct peci_dimmtemp *priv, int channel)
>>>>>> +{
>>>>>> +    int dimm_nums_max = priv->gen_info->chan_rank_max *
>>>>>> +                priv->gen_info->dimm_idx_max;
>>>>>> +    int idx, found = 0;
>>>>>> +
>>>>>> +    for (idx = 0; idx < dimm_nums_max; idx++) {
>>>>>> +        if (priv->dimm_mask & BIT(idx)) {
>>>>>> +            if (channel == found)
>>>>>> +                break;
>>>>>> +
>>>>>> +            found++;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return idx;
>>>>>> +}
>>>>>
>>>>> This again looks like duplicate code.
>>>>>
>>>>
>>>> find_dimm_number()? I'm sure it isn't.
>>>>
>>>>>> +
>>>>>> +static int dimmtemp_read_string(struct device *dev,
>>>>>> +                enum hwmon_sensor_types type,
>>>>>> +                u32 attr, int channel, const char **str)
>>>>>> +{
>>>>>> +    struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>>>>>> +    u32 dimm_idx_max = priv->gen_info->dimm_idx_max;
>>>>>> +    int dimm_no, chan_rank, dimm_idx;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_label:
>>>>>> +        dimm_no = find_dimm_number(priv, channel);
>>>>>> +        chan_rank = dimm_no / dimm_idx_max;
>>>>>> +        dimm_idx = dimm_no % dimm_idx_max;
>>>>>> +        *str = dimmtemp_label[chan_rank][dimm_idx];
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int dimmtemp_read(struct device *dev, enum 
>>>>>> hwmon_sensor_types type,
>>>>>> +             u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>>>>>> +    int dimm_no = find_dimm_number(priv, channel);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_dimm_temp(priv, dimm_no);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp[dimm_no].value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static umode_t dimmtemp_is_visible(const void *data,
>>>>>> +                   enum hwmon_sensor_types type,
>>>>>> +                   u32 attr, int channel)
>>>>>> +{
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_label:
>>>>>> +    case hwmon_temp_input:
>>>>>> +        return 0444;
>>>>>> +    default:
>>>>>> +        return 0;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static const struct hwmon_ops dimmtemp_ops = {
>>>>>> +    .is_visible = dimmtemp_is_visible,
>>>>>> +    .read_string = dimmtemp_read_string,
>>>>>> +    .read = dimmtemp_read,
>>>>>> +};
>>>>>> +
>>>>>> +static int check_populated_dimms(struct peci_dimmtemp *priv)
>>>>>> +{
>>>>>> +    u32 chan_rank_max = priv->gen_info->chan_rank_max;
>>>>>> +    u32 dimm_idx_max = priv->gen_info->dimm_idx_max;
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    int chan_rank, dimm_idx;
>>>>>> +    int rc, channels = 0;
>>>>>> +
>>>>>> +    for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
>>>>>> +        msg.addr = priv->addr;
>>>>>> +        msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>>>>>> +        msg.param = chan_rank;
>>>>>> +        msg.rx_len = 4;
>>>>>> +
>>>>>> +        rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +        if (rc) {
>>>>>> +            priv->dimm_mask = 0;
>>>>>> +            return rc;
>>>>>> +        }
>>>>>> +
>>>>>> +        for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++) {
>>>>>> +            if (msg.pkg_config[dimm_idx]) {
>>>>>> +                priv->dimm_mask |= BIT(chan_rank *
>>>>>> +                               chan_rank_max +
>>>>>> +                               dimm_idx);
>>>>>> +                channels++;
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!priv->dimm_mask)
>>>>>> +        return -EAGAIN;
>>>>>> +
>>>>>> +    priv->channels = channels;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", 
>>>>>> priv->dimm_mask);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int create_dimm_temp_info(struct peci_dimmtemp *priv)
>>>>>> +{
>>>>>> +    struct device *hwmon_dev;
>>>>>> +    int rc, i;
>>>>>> +
>>>>>> +    rc = check_populated_dimms(priv);
>>>>>> +    if (!rc) {
>>>>>
>>>>> Please handle error cases first.
>>>>>
>>>>
>>>> Sure, I'll rewrite it.
>>>>
>>>>>> +        for (i = 0; i < priv->channels; i++)
>>>>>> +            priv->temp_config[i] = HWMON_T_LABEL | HWMON_T_INPUT;
>>>>>> +
>>>>>> +        priv->chip.ops = &dimmtemp_ops;
>>>>>> +        priv->chip.info = priv->info;
>>>>>> +
>>>>>> +        priv->info[0] = &priv->temp_info;
>>>>>> +
>>>>>> +        priv->temp_info.type = hwmon_temp;
>>>>>> +        priv->temp_info.config = priv->temp_config;
>>>>>> +
>>>>>> +        hwmon_dev = devm_hwmon_device_register_with_info(priv->dev,
>>>>>> +                                 priv->name,
>>>>>> +                                 priv,
>>>>>> +                                 &priv->chip,
>>>>>> +                                 NULL);
>>>>>> +        rc = PTR_ERR_OR_ZERO(hwmon_dev);
>>>>>> +        if (!rc)
>>>>>> +            dev_dbg(priv->dev, "%s: sensor '%s'\n",
>>>>>> +                dev_name(hwmon_dev), priv->name);
>>>>>> +    } else if (rc == -EAGAIN) {
>>>>>> +        if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
>>>>>> +            queue_delayed_work(priv->work_queue,
>>>>>> +                       &priv->work_handler,
>>>>>> +                       DIMM_MASK_CHECK_DELAY_JIFFIES);
>>>>>> +            priv->retry_count++;
>>>>>> +            dev_dbg(priv->dev,
>>>>>> +                "Deferred DIMM temp info creation\n");
>>>>>> +        } else {
>>>>>> +            rc = -ETIMEDOUT;
>>>>>> +            dev_err(priv->dev,
>>>>>> +                "Timeout retrying DIMM temp info creation\n");
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static void create_dimm_temp_info_delayed(struct work_struct *work)
>>>>>> +{
>>>>>> +    struct delayed_work *dwork = to_delayed_work(work);
>>>>>> +    struct peci_dimmtemp *priv = container_of(dwork, struct 
>>>>>> peci_dimmtemp,
>>>>>> +                          work_handler);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    rc = create_dimm_temp_info(priv);
>>>>>> +    if (rc && rc != -EAGAIN)
>>>>>> +        dev_dbg(priv->dev, "Failed to create DIMM temp info\n");
>>>>>> +}
>>>>>> +
>>>>>> +static int check_cpu_id(struct peci_dimmtemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    u32 cpu_id;
>>>>>> +    int i, rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_CPU_ID;
>>>>>> +    msg.param = PKG_ID_CPU_ID;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    cpu_id = ((msg.pkg_config[2] << 16) | (msg.pkg_config[1] << 8) |
>>>>>> +          msg.pkg_config[0]) & CLIENT_CPU_ID_MASK;
>>>>>> +
>>>>>> +    for (i = 0; i < CPU_GEN_MAX; i++) {
>>>>>> +        if (cpu_id == cpu_gen_info_table[i].cpu_id) {
>>>>>> +            priv->gen_info = &cpu_gen_info_table[i];
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!priv->gen_info)
>>>>>> +        return -ENODEV;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "CPU_ID: 0x%x\n", cpu_id);
>>>>>> +    return 0;
>>>>>> +}
>>>>>
>>>>> More duplicate code.
>>>>>
>>>>
>>>> Okay. In case of check_cpu_id(), it could be used as a generic PECI 
>>>> function. I'll move it into PECI core.
>>>>
>>>>>> +
>>>>>> +static int peci_dimmtemp_probe(struct peci_client *client)
>>>>>> +{
>>>>>> +    struct device *dev = &client->dev;
>>>>>> +    struct peci_dimmtemp *priv;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if ((client->adapter->cmd_mask &
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>>>>>
>>>>> One set of ( ) is unnecessary on each side of the expression.
>>>>>
>>>>
>>>> '&' has a precedence over '!=' but '|' doesn't. I'll rewrite it to:
>>>>
>>>
>>> Actually, that is wrong. You refer to address-of. Bit operations do 
>>> have lower
>>> precedence that comparisons. I stand corrected.
>>>
>>>>      if (client->adapter->cmd_mask &
>>>>          (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG)) !=
>>>>          (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG)))
>>>>
>>>>>> +        dev_err(dev, "Client doesn't support temperature 
>>>>>> monitoring\n");
>>>>>> +        return -EINVAL;
>>>>>
>>>>> Why is this "invalid", and why does it warrant an error message ?
>>>>>
>>>>
>>>> Should I use -EPERM? Any suggestion?
>>>>
>>>
>>> Is it an _error_ if the CPU does not support this functionality ?
>>>
>>
>> Actually, it returns from this probe() function without making any 
>> hwmon info creation so I intended to handle this case as an error. Am 
>> I wrong?
>>
> 
> If the functionality or HW supported by the driver isn't available, it 
> is customary
> to return -ENODEV and no error message. Otherwise the kernel log would 
> drown in
> "not supported" error messages. I don't see where it would add any value 
> to handle
> this driver differently.
> 
> EINVAL    Invalid argument
> EPERM    Operation not permitted
> 
> You'll have to work hard to convince me that any of those makes sense, 
> and that
> 
> ENODEV    No such device
> 
> doesn't. More specifically, if EINVAL makes sense, the caller did 
> something wrong,
> meaning there is a problem in the infrastructure which should get fixed.
> The same is true for EPERM.
> 

Now I fully understood what you pointed out. Thanks for the detailed 
explanation. I'll change the error return value to -ENODEV and will use 
dev_dbg for the message printing. Thanks!

>>>>>> +    }
>>>>>> +
>>>>>> +    priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>>>>> +    if (!priv)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    dev_set_drvdata(dev, priv);
>>>>>> +    priv->client = client;
>>>>>> +    priv->dev = dev;
>>>>>> +    priv->addr = client->addr;
>>>>>> +    priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>>>>>
>>>>> Is priv->addr guaranteed to be >= PECI_BASE_ADDR ?
>>>>
>>>> Client address range validation will be done in 
>>>> peci_check_addr_validity() in PECI core before probing a device driver.
>>>>
>>>>>> +
>>>>>> +    snprintf(priv->name, PECI_NAME_SIZE, "peci_dimmtemp.cpu%d",
>>>>>> +         priv->cpu_no);
>>>>>> +
>>>>>> +    rc = check_cpu_id(priv);
>>>>>> +    if (rc) {
>>>>>> +        dev_err(dev, "Client CPU is not supported\n");
>>>>>
>>>>> Or the peci command failed.
>>>>>
>>>>
>>>> I'll remove the error message and will add a proper handling code 
>>>> into PECI core on each error type.
>>>>
>>>>>> +        return rc;
>>>>>> +    }
>>>>>> +
>>>>>> +    priv->work_queue = alloc_ordered_workqueue(priv->name, 0);
>>>>>> +    if (!priv->work_queue)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    INIT_DELAYED_WORK(&priv->work_handler, 
>>>>>> create_dimm_temp_info_delayed);
>>>>>> +
>>>>>> +    rc = create_dimm_temp_info(priv);
>>>>>> +    if (rc && rc != -EAGAIN) {
>>>>>> +        dev_err(dev, "Failed to create DIMM temp info\n");
>>>>>> +        goto err_free_wq;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +
>>>>>> +err_free_wq:
>>>>>> +    destroy_workqueue(priv->work_queue);
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static int peci_dimmtemp_remove(struct peci_client *client)
>>>>>> +{
>>>>>> +    struct peci_dimmtemp *priv = dev_get_drvdata(&client->dev);
>>>>>> +
>>>>>> +    cancel_delayed_work(&priv->work_handler);
>>>>>
>>>>> cancel_delayed_work_sync() ?
>>>>>
>>>>
>>>> Yes, it would be safer. Will fix it.
>>>>
>>>>>> +    destroy_workqueue(priv->work_queue);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct of_device_id peci_dimmtemp_of_table[] = {
>>>>>> +    { .compatible = "intel,peci-dimmtemp" },
>>>>>> +    { }
>>>>>> +};
>>>>>> +MODULE_DEVICE_TABLE(of, peci_dimmtemp_of_table);
>>>>>> +
>>>>>> +static struct peci_driver peci_dimmtemp_driver = {
>>>>>> +    .probe  = peci_dimmtemp_probe,
>>>>>> +    .remove = peci_dimmtemp_remove,
>>>>>> +    .driver = {
>>>>>> +        .name           = "peci-dimmtemp",
>>>>>> +        .of_match_table = of_match_ptr(peci_dimmtemp_of_table),
>>>>>> +    },
>>>>>> +};
>>>>>> +module_peci_driver(peci_dimmtemp_driver);
>>>>>> +
>>>>>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
>>>>>> +MODULE_DESCRIPTION("PECI dimmtemp driver");
>>>>>> +MODULE_LICENSE("GPL v2");
>>>>>> -- 
>>>>>> 2.16.2
>>>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>> linux-hwmon" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification
From: James Morse @ 2018-04-12 16:14 UTC (permalink / raw)
  To: gengdongjiu
  Cc: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, rjw, bp, lenb, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	zhengxiang9
In-Reply-To: <5161fb31-e5bb-872c-3e12-d7d38326aaca@huawei.com>

Hi gengdongjiu,

On 12/04/18 07:09, gengdongjiu wrote:
> On 2018/4/10 22:15, James Morse wrote:
>> On 09/04/18 22:36, Dongjiu Geng wrote:
>>> 1. Detect whether KVM can set set guest SError syndrome
>>> 2. Support to Set VSESR_EL2 and inject SError by user space.
>>> 3. Support live migration to keep SError pending state and VSESR_EL2 value.
>>> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so support this
>>>    notification in software, KVM or kernel ARCH code call handle_guest_sei() to let ACP driver
>>>    to handle this notification.
>>
>> Please don't post code during the merge-window, will this apply to v4.17-rc1? We
>> can't know until its tagged.

Posting code during the merge-window isn't helpful as the kernel is a moving
target, its better to wait for an 'rc' to base it on.

> I do not know when it is merge-window. About the apply version, it does not have limited.

'git fetch' Linus' tree and look at the tags. 'v4.16' lost its '-rc' suffixes,
and there isn't a 'v4.17-rc1' yet, so we are still in the merge window.

Linus sends a message to LKML. eg:
https://lkml.org/lkml/2018/4/1/175

net-next closes shortly before the merge window, and re-opens afterwards. There
is a handy web page:
http://vger.kernel.org/~davem/net-next.html


>> This series is doing two separate things, please split it into two series.
> OK, thanks!
> 
>>
>> But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when
>> firmware is ignoring the normal world's PSTATE.A.
>>
>> The latest lobe of that discussion was on the list here:
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html
> I have replied the mail.
> I still have some questions that need to clarify with you.
> After clarification, we will follow that.
> The question is in the reply of this mail "https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html"

Lets keep that discussion on v9 then.


>> As it is, we would need to spot SError being delivered while SError is masked,
>> spray nasty messages about firmware being horrifically buggy, then panic(). For
>> a corrected error, this looks bad, but its preferable to letting firmware
>> silently overwrite the exception registers, causing linux to spin through the
>> vectors 'eret' with all exceptions masked.
>> I still think its best to wait for firmware that does the right thing.

> Let us  discuss that in another mail.
> In a summary, I think firmware follow below rule can be OK, right?
> 1. The exception came from the EL that SError should be routed to(according to hcr_EL2.{AMO, TGE}),but PSTATE.A was set, EL3 firmware can't deliver SError;

> 2. The exception came from the EL that SError should not be routed to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3 firmware still deliver SError

Problem here, more on v9.


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8
From: James Morse @ 2018-04-12 16:14 UTC (permalink / raw)
  To: gengdongjiu
  Cc: lishuo1, merry.libing, gengdongjiu,
	linux-arm-kernel@lists.infradead.org, Liujun (Jun Liu),
	linux-kernel@vger.kernel.org, corbet@lwn.net,
	marc.zyngier@arm.com, catalin.marinas@arm.com,
	linux-doc@vger.kernel.org, rjw@rjwysocki.net,
	linux@armlinux.org.uk, will.deacon@arm.com,
	robert.moore@intel.com, linux-acpi@vger.kernel.org, bp@alien8.de,
	lv.zheng@intel.com, Huangshaoyu, kvmarm@lists.cs.columbia.edu,
	devel@acpica.org
In-Reply-To: <CAMj-D2CEcNjdi8VkSMw0aTqeb678nFnBKiq5ggix3gJhzkgSEA@mail.gmail.com>

Hi gengdongjiu,

On 12/04/18 06:00, gengdongjiu wrote:
> 2018-02-16 1:55 GMT+08:00 James Morse <james.morse@arm.com>:
>> On 05/02/18 11:24, gengdongjiu wrote:
>>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>>> TGE}?
>>>
>>> Yes, it is.
>>
>> ... and yet ...
>>
>>
>>>> What does your firmware do when it wants to emulate SError but its masked?
>>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>>> PSTATE.A  set.
>>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>>> emulated  SError should go to EL1. This effectively masks SError.)
>>>
>>> Currently we does not consider much about the mask status(SPSR).
>>
>> .. this is a problem.
>>
>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret to
>> EL2. This should never happen, SError is effectively masked if you are running
>> at an EL higher than the one its routed to.
>>
>> More obviously: if the exception came from the EL that SError should be routed
>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

> James, I  summarized the masking and routing rules for SError to
> confirm with you for the firmware first solution,

You also said "Currently we does not consider much about the mask status(SPSR)."

> 1. If the HCR_EL2.{AMO,TGE} is set,

If one or the other of these bits is set: (AMO==1 || TGE==1)

> which means the SError should route to EL2,
> When system happens SError and trap to EL3,   If EL3 find
> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
> and find this SError come from EL2, it will not deliver an SError:
> store the RAS error in the BERT and 'reboot'; but if
> it find that this SError come from EL1 or EL0, it also need to deliver
> an SError, right?

Yes.

> 2. If the HCR_EL2.{AMO,TGE} is not set,

If neither of these bits is set: (AMO==0 && TGE == 0)

> which means the SError should route to EL1,
> When system happens SError and trap to EL3, If EL3 find
> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,

(I'm reading this as all three of these bits are clear)

> and find this SError come from EL1, it will not deliver an SError:
> store the RAS error in the BERT and 'reboot'; 

No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
interrupted EL1 and the A bit was clear, so EL1 can take an SError.

The two cases here are:
AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
exception interrupted EL1 and the A bit was set, you need to do the BERT trick.

If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
regardless of the A bit, as SError is implicitly masked by running at a higher
exception level than it was routed to.

From your v11 reply:
> 2. The exception came from the EL that SError should not be routed
> to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3
> firmware still deliver SError

(this is re-iterating the two-cases above:)
'not be routed to' is one of two things: Route-to-EL2+interruted-EL1, or
Route-to-EL1+interrupted-EL2.

Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
SError can be delivered to EL2, as EL2 can't mask SError when executing at a
lower EL.

Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not be
delivered.
KVM does this on the way out of a guest, if an SError occurs during this time
the CPU will wait until execution returns to EL1 before delivering the SError.
Your firmware has to do the same.

Table D1-15 in "D1.14.2 Asynchronous exception masking" has a table with all the
combinations. The ARM-ARM is what we need to match with this behaviour.

> but if it find that this SError come from EL0, it also need to deliver an
> SError, right?

I thought interrupted-EL0 could always be delivered: but re-reading the
ARM-ARM's "D1.14.2 Asynchronous exception masking", if asynchronous exceptions
are routed to EL1 then EL0&EL1 are treated the same.
So if SError is routed to EL1, the exception interrupted EL0, and SPSR_EL3.A was
set, you still can't deliver the emulated-SError you have to do the BERT-trick.
Linux doesn't do this today, but another OS might (e.g. UEFI), and we might do
this in the future.

This is really tricky for firmware to get right. Another alternative would be to
put the CPER records in a Polled buffer, unless something needs doing right now,
in which case a BERT-reboot is probably best.

Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox