Linux Documentation
 help / color / mirror / Atom feed
* Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings: Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
From: Jae Hyun Yoo @ 2018-03-09 23:47 UTC (permalink / raw)
  To: Milton Miller II, Pavel Machek
  Cc: linux-hwmon, andrew, jdelvare, arnd, linux-doc, andrew, gregkh,
	openbmc, linux-kernel, devicetree, linux, linux-arm-kernel
In-Reply-To: <OF08D8261E.8F72E5DC-ON0025824B.008218BB-0025824B.008218C6@notes.na.collabserv.com>

Hi Milton,

Thanks for sharing your time to review this patch. Please see my answer 
inline.

Jae

On 3/9/2018 3:41 PM, Milton Miller II wrote:
> About  03/07/2018 04:12PM in some time zone, Pavel Machek wrote:
>> Subject: Re: [PATCH v2 2/8] [PATCH 2/8] Documentations: dt-bindings:
>> Add a document of PECI adapter driver for Aspeed AST24xx/25xx SoCs
>>
>> Hi!
>>
>>>> Are these SoCs x86-based?
>>>
>>> Yes, these are ARM SoCs. Please see Andrew's answer as well.
>>
>> Understood, thanks.
>>
>>>>> +	Read sampling point selection. The whole period of a bit time
>> will be
>>>>> +	divided into 16 time frames. This value will determine which
>> time frame
>>>>> +	this controller will sample PECI signal for data read back.
>> Usually in
>>>>> +	the middle of a bit time is the best.
>>>>
>>>> English? "This value will determine when this controller"?
>>>>
>>>
>>> Could I change it like below?:
>>>
>>> "This value will determine in which time frame this controller
>> samples PECI
>>> signal for data read back"
>>
>> I guess... I'm not native speaker, I guess this could be improved
>> some
>> more.
>>
> 
> I agree this wording is still confusing.
> 
> The problem is that the key subject, the time of the sampling, is in the descriptive clause "in which time frame".
> 
> "This value will determine the time frame in which the controller will sample"
> 
> or perhaps phrase it as saving a specific sample from the over-clock, or a phase of the clock.
> 

Yes, that looks more better. I'll change the wording as you suggested. 
Thanks a lot!

Jae

>> Best regards,
>> 									Pavel
>>
>> -- 
>> (english) http://www.livejournal.com/~pavelmachek
>> (cesky, pictures)
>> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>>
> 
> milton
> --
> Speaking for myself not IBM.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
From: Mike Galbraith @ 2018-03-10  3:47 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin
In-Reply-To: <f0fca497-e4e7-3dbd-1e5c-a7f502c05dcb@redhat.com>

On Fri, 2018-03-09 at 18:06 -0500, Waiman Long wrote:
> On 03/09/2018 05:17 PM, Peter Zijlstra wrote:
> > On Fri, Mar 09, 2018 at 03:43:34PM -0500, Waiman Long wrote:
> >> The isolcpus= parameter just reduce the cpus available to the rests of
> >> the system. The cpuset controller does look at that value and make
> >> adjustment accordingly, but it has no dependence on exclusive cpu/mem
> >> features of cpuset.
> > The isolcpus= boot param is donkey shit and needs to die. cpuset _used_
> > to be able to fully replace it, but with the advent of cgroup 'feature'
> > this got lost.
> >
> > And instead of fixing it, you're making it _far_ worse. You completely
> > removed all the bits that allow repartitioning the scheduler domains.
> >
> > Mike is completely right, full NAK on any such approach.
> 
> So you are talking about sched_relax_domain_level and
> sched_load_balance. I have not removed any bits. I just haven't exposed
> them yet. It does seem like these 2 control knobs are useful from the
> scheduling perspective. Do we also need cpu_exclusive or just the two
> sched control knobs are enough?

Some form of cpu_exclusive (preferably exactly that, but something else
could replace it) is needed to define sets that must not overlap any
other set at creation time or any time thereafter.  A set with property
'exclusive' is the enabler for fundamentally exclusive (but dynamic!)
set properties such as 'isolated' (etc etc).

	-Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/8] Move most GPIO documentation to driver-api/gpio/ and ReST
From: Jonathan Neuschäfer @ 2018-03-10 11:18 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Jonathan Neuschäfer, linux-gpio, linux-kernel, linux-doc,
	Linus Walleij
In-Reply-To: <20180309104120.40244f8a@lwn.net>

[-- Attachment #1: Type: text/plain, Size: 5444 bytes --]

On Fri, Mar 09, 2018 at 10:41:20AM -0700, Jonathan Corbet wrote:
> On Fri,  9 Mar 2018 00:40:16 +0100
> Jonathan Neuschäfer <j.neuschaefer@gmx.net> wrote:
> 
> > The aim of this patchset is to move the GPIO subsystem's documentation
> > under Documentation/driver-api/gpio/ such that it is picked up by Sphinx
> > and compiled into HTML. I moved everything except for sysfs.txt, because
> > this file describes the legacy sysfs ABI, and doesn't seem to serve much
> > (non-historical) purpose anymore.
> > 
> > There are still some rough edges:
> > 
> > * I think the API documentation (kernel-doc) should be moved out of
> >   index.rst into more appropriate files.
> > * The headings are arguably wrong, because driver.rst, consumer.rst,
> >   etc. use the "document title" style, even though they are part of the
> >   GPIO chapter. But the resulting TOC tree is consistent, and I did not
> >   want to change almost all headings.
> > * Some of the files could use more :c:func:`...` references and general
> >   ReST polish.
> > 
> > But I don't want to make this patchset too large.
> 
> [For future reference, if you're going to CC me on most of a patch series,
> I'd really rather get the whole thing so I don't have to go looking for
> it.]

Noted.

> From a quick look, it seems like a reasonable RST conversion to me.  I do
> wonder if sysfs.txt should just be removed, since it documents something
> we don't want people to use at this point?  Historical purposes are well
> served by the repository history.
> 
> I'd say changing the headings is worthwhile if it produces a better
> result.

I just tried this on one file (driver.rst) and it made no difference in
the output (neither in the TOC in index.html nor in driver.html).

Thanks,
Jonathan Neuschäfer


> OTOH I'd be wary of adding too much "polish"; we really want to
> retain the readability of the plain-text files.
> 
> Anyway, I'm happy to take these through the docs tree or see them go via
> GPIO; Linus, what's your preference?
> 
> Thanks,
> 
> jon


----

diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst
index 505ee906d7d9..8eb08005be55 100644
--- a/Documentation/driver-api/gpio/driver.rst
+++ b/Documentation/driver-api/gpio/driver.rst
@@ -1,4 +1,3 @@
-================================
 GPIO Descriptor Driver Interface
 ================================
 
@@ -13,7 +12,7 @@ the structures used to define a GPIO driver:
 
 
 Internal Representation of GPIOs
-================================
+--------------------------------
 
 Inside a GPIO driver, individual GPIOs are identified by their hardware number,
 which is a unique number between 0 and n, n being the number of GPIOs managed by
@@ -36,7 +35,7 @@ identify GPIOs in a bank of I2C GPIO expanders.
 
 
 Controller Drivers: gpio_chip
-=============================
+-----------------------------
 
 In the gpiolib framework each GPIO controller is packaged as a "struct
 gpio_chip" (see linux/gpio/driver.h for its complete definition) with members
@@ -74,7 +73,7 @@ not be required.
 
 
 GPIO electrical configuration
------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 GPIOs can be configured for several electrical modes of operation by using the
 .set_config() callback. Currently this API supports setting debouncing and
@@ -95,7 +94,7 @@ numbers on the pin controller so they can properly cross-reference each other.
 
 
 GPIOs with debounce support
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Debouncing is a configuration set to a pin indicating that it is connected to
 a mechanical switch or button, or similar that may bounce. Bouncing means the
@@ -112,7 +111,7 @@ is not configurable.
 
 
 GPIOs with open drain/source support
-------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Open drain (CMOS) or open collector (TTL) means the line is not actively driven
 high: instead you provide the drain/collector as output, so when the transistor
@@ -209,7 +208,7 @@ of actively driving the line low, it is set to input.
 
 
 GPIO drivers providing IRQs
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 It is custom that GPIO drivers (GPIO chips) are also providing interrupts,
 most often cascaded off a parent interrupt controller, and in some special
 cases the GPIO logic is melded with a SoC's primary interrupt controller.
@@ -359,7 +358,7 @@ below exists.
 
 
 Locking IRQ usage
------------------
+~~~~~~~~~~~~~~~~~
 Input GPIOs can be used as IRQ signals. When this happens, a driver is requested
 to mark the GPIO as being used as an IRQ::
 
@@ -378,7 +377,7 @@ When using the gpiolib irqchip helpers, these callback are automatically
 assigned.
 
 Real-Time compliance for GPIO IRQ chips
----------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Any provider of irqchips needs to be carefully tailored to support Real Time
 preemption. It is desirable that all irqchips in the GPIO subsystem keep this
@@ -404,7 +403,7 @@ time-compliance:
 
 
 Requesting self-owned GPIO pins
--------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Sometimes it is useful to allow a GPIO chip driver to request its own GPIO
 descriptors through the gpiolib API. Using gpio_request() for this purpose

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply related

* Re: [PATCH 0/8] Move most GPIO documentation to driver-api/gpio/ and ReST
From: Linus Walleij @ 2018-03-10 11:50 UTC (permalink / raw)
  To: Jonathan Neuschäfer
  Cc: open list:GPIO SUBSYSTEM, linux-kernel@vger.kernel.org, linux-doc
In-Reply-To: <20180308234024.24145-1-j.neuschaefer@gmx.net>

On Fri, Mar 9, 2018 at 12:40 AM, Jonathan Neuschäfer
<j.neuschaefer@gmx.net> wrote:

> The aim of this patchset is to move the GPIO subsystem's documentation
> under Documentation/driver-api/gpio/ such that it is picked up by Sphinx
> and compiled into HTML.

Awesome!

> I moved everything except for sysfs.txt, because
> this file describes the legacy sysfs ABI, and doesn't seem to serve much
> (non-historical) purpose anymore.
>
> There are still some rough edges:
>
> * I think the API documentation (kernel-doc) should be moved out of
>   index.rst into more appropriate files.
> * The headings are arguably wrong, because driver.rst, consumer.rst,
>   etc. use the "document title" style, even though they are part of the
>   GPIO chapter. But the resulting TOC tree is consistent, and I did not
>   want to change almost all headings.
> * Some of the files could use more :c:func:`...` references and general
>   ReST polish.
>
> But I don't want to make this patchset too large.

It's fine, we take it one step at a time.

On Fri, Mar 09, 2018 at 10:41:20AM -0700, Jonathan Corbet wrote:

> Anyway, I'm happy to take these through the docs tree or see them go via
> GPIO; Linus, what's your preference?

I might have some doc patches that could collide so I can take them.

I take it there will be a v2?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
From: Peter Zijlstra @ 2018-03-10 13:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin
In-Reply-To: <f0fca497-e4e7-3dbd-1e5c-a7f502c05dcb@redhat.com>

On Fri, Mar 09, 2018 at 06:06:29PM -0500, Waiman Long wrote:
> So you are talking about sched_relax_domain_level and

That one I wouldn't be sad to see the back of.

> sched_load_balance.

This one, that's critical. And this is the perfect time to try and fix
the whole isolcpus issue.

The primary issue is that to make equivalent functionality available
through cpuset, we need to basically start all tasks outside the root
group.

The equivalent of isolcpus=xxx is a cgroup setup like:

        root
	/  \
  system    other

Where other has the @xxx cpus and system the remainder and
root.sched_load_balance = 0.

Back before cgroups (and the new workqueue stuff), we could've started
everything in the !root group, no worry. But now that doesn't work,
because a bunch of controllers can't deal with that and everything
cgroup expects the cgroupfs to be empty on boot.

It's one of my biggest regrets that I didn't 'fix' this before cgroups
came along.

> I have not removed any bits. I just haven't exposed
> them yet. It does seem like these 2 control knobs are useful from the
> scheduling perspective. Do we also need cpu_exclusive or just the two
> sched control knobs are enough?

I always forget if we need exclusive for load_balance to work; I'll
peruse the document/code.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/8] Move most GPIO documentation to driver-api/gpio/ and ReST
From: Jonathan Neuschäfer @ 2018-03-10 13:29 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Jonathan Neuschäfer, open list:GPIO SUBSYSTEM,
	linux-kernel@vger.kernel.org, linux-doc
In-Reply-To: <CACRpkda=5i-j=W8+yosF9WiZRKFF=iRC-djja7MJE+k6b1xtaQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

On Sat, Mar 10, 2018 at 12:50:46PM +0100, Linus Walleij wrote:
> On Fri, Mar 9, 2018 at 12:40 AM, Jonathan Neuschäfer
> <j.neuschaefer@gmx.net> wrote:
> 
> > The aim of this patchset is to move the GPIO subsystem's documentation
> > under Documentation/driver-api/gpio/ such that it is picked up by Sphinx
> > and compiled into HTML.
> 
> Awesome!
> 
> > I moved everything except for sysfs.txt, because
> > this file describes the legacy sysfs ABI, and doesn't seem to serve much
> > (non-historical) purpose anymore.
> >
> > There are still some rough edges:
> >
> > * I think the API documentation (kernel-doc) should be moved out of
> >   index.rst into more appropriate files.
> > * The headings are arguably wrong, because driver.rst, consumer.rst,
> >   etc. use the "document title" style, even though they are part of the
> >   GPIO chapter. But the resulting TOC tree is consistent, and I did not
> >   want to change almost all headings.
> > * Some of the files could use more :c:func:`...` references and general
> >   ReST polish.
> >
> > But I don't want to make this patchset too large.
> 
> It's fine, we take it one step at a time.
> 
> On Fri, Mar 09, 2018 at 10:41:20AM -0700, Jonathan Corbet wrote:
> 
> > Anyway, I'm happy to take these through the docs tree or see them go via
> > GPIO; Linus, what's your preference?
> 
> I might have some doc patches that could collide so I can take them.
> 
> I take it there will be a v2?

Not necessarily, but if I need to rebase on a different branch, or on
future linux-next, I'll do that. (Or if some other issue is found.)

I thought that patch 8/8 currently requires commit 93db446a424c ("mtd:
nand: move raw NAND related code to the raw/ subdir"), which is in next
through Boris Brezillon's nand tree, but this isn't true. The patchset
(currently) applies to both v4.16-rc4 and next, AFAICS.


Thanks,
Jonathan Neuschäfer

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH v2 2/2] iio: chemical: sgp30: Support Sensirion SGPxx sensors
From: Andreas Brauchli @ 2018-03-10 22:07 UTC (permalink / raw)
  To: Jonathan Cameron, Hartmut Knaack, Lars-Peter Clausen,
	Peter Meerwald-Stadler, Rob Herring, Mark Rutland,
	Jonathan Corbet
  Cc: linux-iio, linux-kernel, devicetree, linux-doc

Support Sensirion SGP30 and SGPC3 multi-pixel I2C gas sensors

Supported Features:

* Indoor Air Quality (IAQ) concentrations for
  - tVOC (in_concentration_voc_input)
  - CO2eq (in_concentration_co2_input) - SGP30 only
  IAQ concentrations are periodically read out by a background thread
  to allow the sensor to maintain its internal baseline.
* Baseline support for IAQ (in_iaq_baseline, set_iaq_baseline)
* Gas concentration signals
  - Ethanol (in_concentration_ethanol_raw)
  - H2 (in_concentration_h2_raw) - SGP30 only
* On-chip self test (in_selftest)
* Sensor interface version (in_feature_set_version)
* Sensor serial number (in_serial_id)
* Humidity compensation
  With the help of an external humidity signal, the gas signals can be
  humidity-compensated. The sensor performs the compensation on-chip.
* Power mode switching between low power mode (1/2Hz) and
  ultra low power mode (1/30Hz) sampling - SGPC3 only
* Checksummed I2C communication

For all features, refer to the data sheet or the documentation in
Documentation/iio/chemical/sgp30.txt for more details.

The ABI is documented in
Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30

Signed-off-by: Andreas Brauchli <andreas.brauchli@sensirion.com>
---
 .../ABI/testing/sysfs-bus-iio-chemical-sgp30       |   62 ++
 .../bindings/iio/chemical/sensirion,sgp30.txt      |   15 +
 Documentation/iio/chemical/sgp30.txt               |   97 ++
 drivers/iio/chemical/Kconfig                       |   13 +
 drivers/iio/chemical/Makefile                      |    1 +
 drivers/iio/chemical/sgp30.c                       | 1120 ++++++++++++++++++++
 6 files changed, 1308 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
 create mode 100644 Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt
 create mode 100644 Documentation/iio/chemical/sgp30.txt
 create mode 100644 drivers/iio/chemical/sgp30.c

diff --git a/Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30 b/Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
new file mode 100644
index 000000000000..3b97c0bd5878
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
@@ -0,0 +1,62 @@
+what:		/sys/bus/iio/devices/iio:devicex/in_featureset_version
+date:		March 2018
+kernelversion:	4.17
+contact:	andreas brauchli <andreas.brauchli@sensirion.com>
+description:
+		Retrieve the sensor interface version. The sensor-interface should
+		remain forward-compatible across minor versions.
+
+what:		/sys/bus/iio/devices/iio:devicex/in_iaq_baseline
+date:		March 2018
+kernelversion:	4.17
+contact:	andreas brauchli <andreas.brauchli@sensirion.com>
+description:
+		Retrieve the on-chip iaq baseline state. the baseline is an internal
+		state that should not be modified.
+
+What:		/sys/bus/iio/devices/iio:deviceX/set_iaq_baseline
+Date:		March 2018
+KernelVersion:	4.17
+Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
+Description:
+		Restore the on-chip IAQ baseline state to a previous state. Useful for
+		fast-resuming after a restart. A baseline is valid for one week when the
+		sensor is not operated.
+
+What:		/sys/bus/iio/devices/iio:deviceX/in_selftest
+Date:		March 2018
+KernelVersion:	4.17
+Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
+Description:
+		Run the on-chip self test. Output values:
+		* OK
+		* FAILED
+
+What:		/sys/bus/iio/devices/iio:deviceX/in_serial_id
+Date:		March 2018
+KernelVersion:	4.17
+Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
+Description:
+		Retrieve the unique sensor serial number. May be useful to associate
+		with the IAQ baseline when the baseline is stored remotely or the sensor
+		is on a plug-and-play device.
+
+Date:		March 2018
+KernelVersion:	4.17
+Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
+Description:
+		Set the absolute humidity in milligrams per cubic meter (mg/m^3) to
+		enable on-sensor humidity compensation. Affects all gas signals and IAQ
+		values. Accepted values:
+		* 0 (disables humidity compensation)
+		* 1..256000 (mg/m^3)
+
+What:		/sys/bus/iio/devices/iio:deviceX/set_power_mode
+Date:		March 2018
+KernelVersion:	4.17
+Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
+Description:
+		Change the power mode on an SGPC3 with ultra-low power mode support.
+		Accepted values:
+		* "low" (default/startup mode)
+		* "ultra-low" (default/startup mode)
diff --git a/Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt b/Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt
new file mode 100644
index 000000000000..bb909b0f1aba
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt
@@ -0,0 +1,15 @@
+* Sensirion SGPxx multi-pixel Gas Sensor
+
+Required properties:
+
+  - compatible: must be one of
+    "sensirion,sgp30"
+    "sensirion,sgpc3"
+  - reg: the I2C address of the sensor
+
+Example:
+
+sgpxx@58 {
+	compatible = "sensirion,sgp30";
+	reg = <0x58>;
+};
diff --git a/Documentation/iio/chemical/sgp30.txt b/Documentation/iio/chemical/sgp30.txt
new file mode 100644
index 000000000000..c236f7d0fa80
--- /dev/null
+++ b/Documentation/iio/chemical/sgp30.txt
@@ -0,0 +1,97 @@
+sgp30: Industrial IO driver for Sensirion I2C Multi-Pixel Gas Sensors
+
+1. Overview
+
+The sgp30 driver supports the Sensirion SGP30 and SGPC3 multi-pixel gas sensors.
+
+2. Modes of Operation
+
+Postprocessed Indoor Air Quality (IAQ) gas concentrations as well as raw gas
+signals are available. Both signals can be humidity-compensated by the sensor.
+
+2.1. IAQ Gas Concentrations
+
+* tVOC (in_concentration_voc_input) at ppb precision (1e-9)
+* CO2eq (in_concentration_co2_input) at ppm precision (1e-6) -- SGP30 only
+
+2.1.1. IAQ Initialization
+Before Indoor Air Quality (IAQ) values can be read, the IAQ mode must be
+initialized by sending the "iaq_init" command from the data sheet to the sensor.
+The sgp30 driver performs this initialization automatically on instantiation.
+
+After initializing IAQ, at least one IAQ signal must be read out every second
+(SGP30) / every two seconds (SGPC3) for the sensor to correctly maintain its
+internal baseline. The driver continuously performs this task in a background
+thread and caches the latest measurement for retrieval.
+
+For the first 15s of operation after "iaq_init", default values are returned by
+the sensor. Default values are not returned by the driver, instead an -EBUSY
+error is returned.
+
+2.1.2. Pausing and Resuming IAQ
+
+For best performance and faster subsequent startup times, the baseline should be
+saved once every hour, after 12h of operation. The baseline is restored by
+writing the last known baseline back to set_iaq_baseline.
+
+    Saving the baseline:
+    $ baseline=$(cat in_iaq_baseline)
+
+    Restoring the baseline:
+    $ echo -n $baseline > set_iaq_baseline
+
+2.2. Humidity Compensation
+
+The SGP features an on-chip humidity compensation that requires the (in-device)
+environment's absolute humidity.
+
+Set the absolute humidity by writing the absolute humidity concentration (in
+mg/m^3) to set_absolute_humidity. The absolute humidity is obtained by
+converting the relative humidity and temperature. The following units are used:
+AH in mg/m^3, RH in percent (0..100), T in degrees Celsius, and exp() being the
+base-e exponential function.
+
+                  RH            exp(17.62 * T)
+                ----- * 6.112 * --------------
+                100.0            243.12 + T
+  AH = 216.7 * ------------------------------- * 1000
+                        273.15 + T
+
+Writing a value of 0 to set_absolute_humidity disables the humidity
+compensation. Devices not supporting humidity compensation (SGPC3 with feature
+set < 0.6) will report an error (-EINVAL) when setting the humidity.
+
+2.3. On-chip self test
+
+    $ cat in_selftest
+
+in_selftest returns OK or FAILED.
+
+On the SGP30, self test interferes with IAQ operations and requires a
+re-initialization thereof. If a valid baseline is available (after 12h of
+operation), the driver will take care of resuming the operation after the
+selftest, otherwise the "iaq_init" procedure will be restarted.
+
+2.4 SGPC3 Low Power Gas Sensor Features
+
+2.4.1 Accelerated Startup
+
+The SGPC3 can be pre-heated differently by writing a duration to the
+set_iaq_preheat_seconds attribute. Pre-heating causes the sensor to draw more
+power but results in more accurate readings, thus also marketed as accelerated
+startup mode. Sensors prior to feature set 0.6 allow pre-heating durations of 0,
+16, 64 and 184s (other values rounded to the next higher setting). Feature set
+0.6 allows pre-heating for all integer durations up to 300s (a software limit).
+
+Pre-heating occurs on IAQ initialization, thus after changing the power mode or
+setting an IAQ baseline. When the sensor has been operated for a duration of
+12h or more, it is not necessary to pre-heat after short down periods (e.g.
+reboots of < 5min.)
+
+2.4.2 Ultra Low Power Mode (SGPC3 Feature Set 0.6+)
+
+The SGPC3 with feature set 0.6+ features an ultra-low-power mode in which the
+sensor is only polled every 30s instead of every 2s. The polling interval in the
+background thread is adjusted by the driver when changing the power mode with
+set_power_mode. Supported options: {"low", "ultra-low"}. Changing the power mode
+on sensors without power-mode switching results in an -EINVAL error.
diff --git a/drivers/iio/chemical/Kconfig b/drivers/iio/chemical/Kconfig
index 5cb5be7612b4..6f0e2d1240bd 100644
--- a/drivers/iio/chemical/Kconfig
+++ b/drivers/iio/chemical/Kconfig
@@ -38,6 +38,19 @@ config IAQCORE
 	  iAQ-Core Continuous/Pulsed VOC (Volatile Organic Compounds)
 	  sensors
 
+config SENSIRION_SGP30
+	tristate "Sensirion SGPxx gas sensors"
+	depends on I2C
+	select CRC8
+	help
+	  Say Y here to build I2C interface support for the following
+	  Sensirion SGP gas sensors:
+	    * SGP30 gas sensor
+	    * SGPC3 low power gas sensor
+
+	  To compile this driver as module, choose M here: the
+	  module will be called sgp30.
+
 config VZ89X
 	tristate "SGX Sensortech MiCS VZ89X VOC sensor"
 	depends on I2C
diff --git a/drivers/iio/chemical/Makefile b/drivers/iio/chemical/Makefile
index a629b29d1e0b..9aeb3fce1408 100644
--- a/drivers/iio/chemical/Makefile
+++ b/drivers/iio/chemical/Makefile
@@ -6,4 +6,5 @@
 obj-$(CONFIG_ATLAS_PH_SENSOR)	+= atlas-ph-sensor.o
 obj-$(CONFIG_CCS811)		+= ccs811.o
 obj-$(CONFIG_IAQCORE)		+= ams-iaq-core.o
+obj-$(CONFIG_SENSIRION_SGP30)	+= sgp30.o
 obj-$(CONFIG_VZ89X)		+= vz89x.o
diff --git a/drivers/iio/chemical/sgp30.c b/drivers/iio/chemical/sgp30.c
new file mode 100644
index 000000000000..f336a89ad906
--- /dev/null
+++ b/drivers/iio/chemical/sgp30.c
@@ -0,0 +1,1120 @@
+/*
+ * sgp30.c - Support for Sensirion SGP Gas Sensors
+ *
+ * Copyright (C) 2017 Andreas Brauchli <andreas.brauchli@sensirion.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Datasheets:
+ * https://www.sensirion.com/file/datasheet_sgp30
+ * https://www.sensirion.com/file/datasheet_sgpc3
+ */
+
+#include <linux/crc8.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/init.h>
+#include <linux/i2c.h>
+#include <linux/of_device.h>
+#include <linux/iio/iio.h>
+#include <linux/iio/buffer.h>
+#include <linux/iio/sysfs.h>
+#include <linux/string.h>
+#include <linux/version.h>
+
+#define SGP_WORD_LEN				2
+#define SGP_CRC8_POLYNOMIAL			0x31
+#define SGP_CRC8_INIT				0xff
+#define SGP_CRC8_LEN				1
+#define SGP_CMD(cmd_word)			cpu_to_be16(cmd_word)
+#define SGP_CMD_DURATION_US			12000
+#define SGP_MEASUREMENT_DURATION_US		50000
+#define SGP_SELFTEST_DURATION_US		220000
+#define SGP_CMD_HANDLING_DURATION_US		10000
+#define SGP_CMD_LEN				SGP_WORD_LEN
+#define SGP_CMD_MAX_BUF_SIZE			(SGP_CMD_LEN + 2 * SGP_WORD_LEN)
+#define SGP_MEASUREMENT_LEN			2
+#define SGP30_MEASURE_INTERVAL_HZ		1
+#define SGPC3_MEASURE_INTERVAL_HZ		2
+#define SGPC3_MEASURE_ULP_INTERVAL_HZ		30
+#define SGPC3_POWER_MODE_ULTRA_LOW_POWER	0
+#define SGPC3_POWER_MODE_LOW_POWER		1
+#define SGPC3_DEFAULT_IAQ_INIT_DURATION_HZ	64
+#define SGP_SELFTEST_OK				0xd400
+#define SGP_VERS_PRODUCT(data)	((((data)->feature_set) & 0xf000) >> 12)
+#define SGP_VERS_RESERVED(data)	((((data)->feature_set) & 0x0e00) >> 9)
+#define SGP_VERS_ENG_BIT(data)	((((data)->feature_set) & 0x0100) >> 8)
+#define SGP_VERS_MAJOR(data)	((((data)->feature_set) & 0x00e0) >> 5)
+#define SGP_VERS_MINOR(data)	(((data)->feature_set) & 0x001f)
+
+DECLARE_CRC8_TABLE(sgp_crc8_table);
+
+enum sgp_product_id {
+	SGP30 = 0,
+	SGPC3,
+};
+
+enum sgp30_channel_idx {
+	SGP30_IAQ_TVOC_IDX = 0,
+	SGP30_IAQ_CO2EQ_IDX,
+	SGP30_SIG_ETOH_IDX,
+	SGP30_SIG_H2_IDX,
+};
+
+enum sgpc3_channel_idx {
+	SGPC3_IAQ_TVOC_IDX = 10,
+	SGPC3_SIG_ETOH_IDX,
+};
+
+enum sgp_cmd {
+	SGP_CMD_IAQ_INIT		= SGP_CMD(0x2003),
+	SGP_CMD_IAQ_MEASURE		= SGP_CMD(0x2008),
+	SGP_CMD_GET_BASELINE		= SGP_CMD(0x2015),
+	SGP_CMD_SET_BASELINE		= SGP_CMD(0x201e),
+	SGP_CMD_GET_FEATURE_SET		= SGP_CMD(0x202f),
+	SGP_CMD_GET_SERIAL_ID		= SGP_CMD(0x3682),
+	SGP_CMD_SELFTEST		= SGP_CMD(0x2032),
+	SGP_CMD_SET_ABSOLUTE_HUMIDITY	= SGP_CMD(0x2061),
+
+	SGP30_CMD_MEASURE_SIGNAL	= SGP_CMD(0x2050),
+
+	SGPC3_CMD_IAQ_INIT_0		= SGP_CMD(0x2089),
+	SGPC3_CMD_IAQ_INIT_16		= SGP_CMD(0x2024),
+	SGPC3_CMD_IAQ_INIT_64		= SGP_CMD(0x2003),
+	SGPC3_CMD_IAQ_INIT_184		= SGP_CMD(0x206a),
+	SGPC3_CMD_IAQ_INIT_CONTINUOUS	= SGP_CMD(0x20ae),
+	SGPC3_CMD_MEASURE_RAW		= SGP_CMD(0x2046),
+	SGPC3_CMD_SET_POWER_MODE	= SGP_CMD(0x209f),
+};
+
+struct sgp_version {
+	u8 major;
+	u8 minor;
+};
+
+struct sgp_crc_word {
+	__be16 value;
+	u8 crc8;
+} __attribute__((__packed__));
+
+union sgp_reading {
+	u8 start;
+	struct sgp_crc_word raw_words[4];
+};
+
+enum _iaq_buffer_state {
+	IAQ_BUFFER_EMPTY = 0,
+	IAQ_BUFFER_DEFAULT_VALS,
+	IAQ_BUFFER_VALID,
+};
+
+struct sgp_data {
+	struct i2c_client *client;
+	struct task_struct *iaq_thread;
+	struct mutex data_lock;
+	unsigned long iaq_init_start_jiffies;
+	unsigned long iaq_init_duration_jiffies;
+	unsigned long iaq_defval_skip_jiffies;
+	u64 serial_id;
+	void (*iaq_init_fn)(struct sgp_data *data);
+	u32 iaq_init_user_duration;
+	u16 product_id;
+	u16 feature_set;
+	unsigned long measure_interval_jiffies;
+	enum sgp_cmd iaq_init_cmd;
+	enum sgp_cmd measure_iaq_cmd;
+	enum sgp_cmd measure_gas_signals_cmd;
+	u8 baseline_len;
+	u16 user_baseline[2];
+	u16 power_mode;
+	union sgp_reading buffer;
+	union sgp_reading iaq_buffer;
+	enum _iaq_buffer_state iaq_buffer_state;
+	bool supports_humidity_compensation;
+	bool supports_power_mode;
+};
+
+struct sgp_device {
+	const struct iio_chan_spec *channels;
+	int num_channels;
+};
+
+static const struct sgp_version supported_versions_sgp30[] = {
+	{
+		.major = 1,
+		.minor = 0,
+	},
+};
+
+static const struct sgp_version supported_versions_sgpc3[] = {
+	{
+		.major = 0,
+		.minor = 4,
+	},
+};
+
+static const struct iio_chan_spec sgp30_channels[] = {
+	{
+		.type = IIO_CONCENTRATION,
+		.channel2 = IIO_MOD_VOC,
+		.datasheet_name = "TVOC signal",
+		.modified = 1,
+		.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
+		.address = SGP30_IAQ_TVOC_IDX,
+	},
+	{
+		.type = IIO_CONCENTRATION,
+		.channel2 = IIO_MOD_CO2,
+		.datasheet_name = "CO2eq signal",
+		.modified = 1,
+		.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
+		.address = SGP30_IAQ_CO2EQ_IDX,
+	},
+	{
+		.type = IIO_CONCENTRATION,
+		.channel2 = IIO_MOD_ETHANOL,
+		.info_mask_separate =
+			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
+		.address = SGP30_SIG_ETOH_IDX,
+		.datasheet_name = "Ethanol signal",
+		.modified = 1,
+		.scan_type = {
+			.endianness = IIO_BE,
+		},
+	},
+	{
+		.type = IIO_CONCENTRATION,
+		.channel2 = IIO_MOD_H2,
+		.info_mask_separate =
+			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
+		.address = SGP30_SIG_H2_IDX,
+		.datasheet_name = "H2 signal",
+		.modified = 1,
+		.scan_type = {
+			.endianness = IIO_BE,
+		},
+	},
+};
+
+static const struct iio_chan_spec sgpc3_channels[] = {
+	{
+		.type = IIO_CONCENTRATION,
+		.channel2 = IIO_MOD_VOC,
+		.datasheet_name = "TVOC signal",
+		.modified = 1,
+		.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
+		.address = SGPC3_IAQ_TVOC_IDX,
+	},
+	{
+		.type = IIO_CONCENTRATION,
+		.channel2 = IIO_MOD_ETHANOL,
+		.info_mask_separate =
+			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
+		.address = SGPC3_SIG_ETOH_IDX,
+		.datasheet_name = "Ethanol signal",
+		.modified = 1,
+		.scan_type = {
+			.endianness = IIO_BE,
+		},
+	},
+};
+
+static const struct sgp_device sgp_devices[] = {
+	[SGP30] = {
+		.channels = sgp30_channels,
+		.num_channels = ARRAY_SIZE(sgp30_channels),
+	},
+	[SGPC3] = {
+		.channels = sgpc3_channels,
+		.num_channels = ARRAY_SIZE(sgpc3_channels),
+	},
+};
+
+/**
+ * sgp_verify_buffer() - verify the checksums of the data buffer words
+ *
+ * @data:       SGP data
+ * @buf:        Raw data buffer
+ * @word_count: Num data words stored in the buffer, excluding CRC bytes
+ *
+ * Return:      0 on success, negative error otherwise.
+ */
+static int sgp_verify_buffer(const struct sgp_data *data,
+			     union sgp_reading *buf, size_t word_count)
+{
+	size_t size = word_count * (SGP_WORD_LEN + SGP_CRC8_LEN);
+	int i;
+	u8 crc;
+	u8 *data_buf = &buf->start;
+
+	for (i = 0; i < size; i += SGP_WORD_LEN + SGP_CRC8_LEN) {
+		crc = crc8(sgp_crc8_table, &data_buf[i], SGP_WORD_LEN,
+			   SGP_CRC8_INIT);
+		if (crc != data_buf[i + SGP_WORD_LEN]) {
+			dev_err(&data->client->dev, "CRC error\n");
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+/**
+ * sgp_read_cmd() - reads data from sensor after issuing a command
+ * The caller must hold data->data_lock for the duration of the call.
+ * @data:        SGP data
+ * @cmd:         SGP Command to issue
+ * @buf:         Raw data buffer to use
+ * @word_count:  Num words to read, excluding CRC bytes
+ *
+ * Return:       0 on success, negative error otherwise.
+ */
+static int sgp_read_cmd(struct sgp_data *data, enum sgp_cmd cmd,
+			union sgp_reading *buf, size_t word_count,
+			unsigned long duration_us)
+{
+	int ret;
+	struct i2c_client *client = data->client;
+	size_t size = word_count * (SGP_WORD_LEN + SGP_CRC8_LEN);
+	u8 *data_buf;
+
+	ret = i2c_master_send(client, (const char *)&cmd, SGP_CMD_LEN);
+	if (ret != SGP_CMD_LEN)
+		return -EIO;
+	usleep_range(duration_us, duration_us + 1000);
+
+	if (word_count == 0)
+		return 0;
+
+	data_buf = &buf->start;
+	ret = i2c_master_recv(client, data_buf, size);
+	if (ret < 0)
+		return ret;
+	if (ret != size)
+		return -EIO;
+
+	return sgp_verify_buffer(data, buf, word_count);
+}
+
+/**
+ * sgp_i2c_write_cmd() - write data to SGP sensor with a command
+ * @data:       SGP data
+ * @cmd:        SGP Command to issue
+ * @buf:        Data to write
+ * @buf_size:   Data size of the buffer
+ *
+ * Return:      0 on success, negative error otherwise.
+ */
+static int sgp_write_cmd(struct sgp_data *data, enum sgp_cmd cmd, u16 *buf,
+			 size_t buf_size, unsigned long duration_us)
+{
+	int ret, ix;
+	u16 buf_idx;
+	u8 buffer[SGP_CMD_MAX_BUF_SIZE];
+
+	/* assemble buffer */
+	*((__be16 *)&buffer[0]) = cmd;
+	buf_idx = SGP_CMD_LEN;
+	for (ix = 0; ix < buf_size; ix++) {
+		*((__be16 *)&buffer[buf_idx]) = cpu_to_be16(buf[ix]);
+		buf_idx += SGP_WORD_LEN;
+		buffer[buf_idx] = crc8(sgp_crc8_table,
+				       &buffer[buf_idx - SGP_WORD_LEN],
+				       SGP_WORD_LEN, SGP_CRC8_INIT);
+		buf_idx += SGP_CRC8_LEN;
+	}
+	ret = i2c_master_send(data->client, buffer, buf_idx);
+	if (ret != buf_idx)
+		return -EIO;
+	ret = 0;
+	usleep_range(duration_us, duration_us + 1000);
+
+	return ret;
+}
+
+/**
+ * sgp_measure_iaq() - measure and retrieve IAQ values from sensor
+ * The caller must hold data->data_lock for the duration of the call.
+ * @data:       SGP data
+ *
+ * Return:      0 on success, -EBUSY on default values, negative error
+ *              otherwise.
+ */
+
+static int sgp_measure_iaq(struct sgp_data *data)
+{
+	int ret;
+	/* data contains default values */
+	bool default_vals = !time_after(jiffies, data->iaq_init_start_jiffies +
+						 data->iaq_defval_skip_jiffies);
+
+	ret = sgp_read_cmd(data, data->measure_iaq_cmd, &data->iaq_buffer,
+			   SGP_MEASUREMENT_LEN, SGP_MEASUREMENT_DURATION_US);
+	if (ret < 0)
+		return ret;
+
+	data->iaq_buffer_state = IAQ_BUFFER_DEFAULT_VALS;
+
+	if (default_vals)
+		return -EBUSY;
+
+	data->iaq_buffer_state = IAQ_BUFFER_VALID;
+	return 0;
+}
+
+static void sgp_iaq_thread_sleep_until(const struct sgp_data *data,
+				       unsigned long sleep_jiffies)
+{
+	const long IAQ_POLL = 50000;
+
+	while (!time_after(jiffies, sleep_jiffies)) {
+		usleep_range(IAQ_POLL, IAQ_POLL + 10000);
+		if (kthread_should_stop() || data->iaq_init_start_jiffies == 0)
+			return;
+	}
+}
+
+static int sgp_iaq_threadfn(void *p)
+{
+	struct sgp_data *data = (struct sgp_data *)p;
+	unsigned long next_update_jiffies;
+	int ret;
+
+	while (!kthread_should_stop()) {
+		mutex_lock(&data->data_lock);
+		if (data->iaq_init_start_jiffies == 0) {
+			if (data->iaq_init_fn)
+				data->iaq_init_fn(data);
+			ret = sgp_read_cmd(data, data->iaq_init_cmd, NULL, 0,
+					   SGP_CMD_DURATION_US);
+			if (ret < 0)
+				goto unlock_sleep_continue;
+			data->iaq_init_start_jiffies = jiffies;
+			if (data->user_baseline[0]) {
+				ret = sgp_write_cmd(data, SGP_CMD_SET_BASELINE,
+						    data->user_baseline,
+						    data->baseline_len,
+						    SGP_CMD_DURATION_US);
+				if (ret < 0) {
+					dev_err(&data->client->dev,
+						"IAQ set baseline failed [%d]\n",
+						ret);
+					goto unlock_sleep_continue;
+				}
+			}
+			if (data->iaq_init_duration_jiffies) {
+				mutex_unlock(&data->data_lock);
+				sgp_iaq_thread_sleep_until(data,
+					    data->iaq_init_start_jiffies +
+					    data->iaq_init_duration_jiffies);
+				continue;
+			}
+		}
+
+		ret = sgp_measure_iaq(data);
+		if (ret && ret != -EBUSY) {
+			dev_warn(&data->client->dev, "IAQ measurement error [%d]\n",
+				 ret);
+		}
+unlock_sleep_continue:
+		next_update_jiffies = jiffies + data->measure_interval_jiffies;
+		mutex_unlock(&data->data_lock);
+		sgp_iaq_thread_sleep_until(data, next_update_jiffies);
+	}
+	return 0;
+}
+
+static ssize_t sgp_absolute_humidity_store(struct device *dev,
+					   __attribute__((unused))
+					   struct device_attribute *attr,
+					   const char *buf, size_t count)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+	u32 ah;
+	u16 ah_scaled;
+	int ret;
+
+	if (!data->supports_humidity_compensation)
+		return -EINVAL;
+
+	ret = kstrtou32(buf, 10, &ah);
+	if (ret != 1 || ah < 0 || ah > 256000)
+		return -EINVAL;
+
+	/* ah_scaled = (u16)((ah / 1000.0) * 256.0) */
+	ah_scaled = (u16)(((u64)ah * 256 * 16777) >> 24);
+
+	/* don't disable AH compensation due to rounding */
+	if (ah > 0 && ah_scaled == 0)
+		ah_scaled = 1;
+
+	mutex_lock(&data->data_lock);
+	ret = sgp_write_cmd(data, SGP_CMD_SET_ABSOLUTE_HUMIDITY,
+			    &ah_scaled, 1, SGP_CMD_DURATION_US);
+	mutex_unlock(&data->data_lock);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static int sgp_read_raw(struct iio_dev *indio_dev,
+			struct iio_chan_spec const *chan, int *val,
+			int *val2, long mask)
+{
+	struct sgp_data *data = iio_priv(indio_dev);
+	struct sgp_crc_word *words;
+	int ret;
+
+	switch (mask) {
+	case IIO_CHAN_INFO_PROCESSED:
+		mutex_lock(&data->data_lock);
+		if (data->iaq_buffer_state != IAQ_BUFFER_VALID) {
+			mutex_unlock(&data->data_lock);
+			return -EBUSY;
+		}
+		words = data->iaq_buffer.raw_words;
+		switch (chan->address) {
+		case SGP30_IAQ_TVOC_IDX:
+		case SGPC3_IAQ_TVOC_IDX:
+			*val = 0;
+			*val2 = be16_to_cpu(words[1].value);
+			ret = IIO_VAL_INT_PLUS_NANO;
+			break;
+		case SGP30_IAQ_CO2EQ_IDX:
+			*val = 0;
+			*val2 = be16_to_cpu(words[0].value);
+			ret = IIO_VAL_INT_PLUS_MICRO;
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+		mutex_unlock(&data->data_lock);
+		break;
+	case IIO_CHAN_INFO_RAW:
+		mutex_lock(&data->data_lock);
+		if (chan->address == SGPC3_SIG_ETOH_IDX) {
+			if (data->iaq_buffer_state == IAQ_BUFFER_EMPTY)
+				ret = -EBUSY;
+			else
+				ret = 0;
+			words = data->iaq_buffer.raw_words;
+		} else {
+			ret = sgp_read_cmd(data, data->measure_gas_signals_cmd,
+					   &data->buffer, SGP_MEASUREMENT_LEN,
+					   SGP_MEASUREMENT_DURATION_US);
+			words = data->buffer.raw_words;
+		}
+		if (ret) {
+			mutex_unlock(&data->data_lock);
+			return ret;
+		}
+
+		switch (chan->address) {
+		case SGP30_SIG_ETOH_IDX:
+			*val = be16_to_cpu(words[1].value);
+			ret = IIO_VAL_INT;
+			break;
+		case SGPC3_SIG_ETOH_IDX:
+		case SGP30_SIG_H2_IDX:
+			*val = be16_to_cpu(words[0].value);
+			ret = IIO_VAL_INT;
+			break;
+		}
+		mutex_unlock(&data->data_lock);
+		break;
+	case IIO_CHAN_INFO_SCALE:
+		switch (chan->address) {
+		case SGP30_SIG_ETOH_IDX:
+		case SGPC3_SIG_ETOH_IDX:
+		case SGP30_SIG_H2_IDX:
+			*val = 0;
+			*val2 = 1953125;
+			return IIO_VAL_INT_PLUS_NANO;
+		default:
+			return -EINVAL;
+		}
+	default:
+		return -EINVAL;
+	}
+	return ret;
+}
+
+static void sgp_restart_iaq_thread(struct sgp_data *data)
+{
+	mutex_lock(&data->data_lock);
+	data->iaq_buffer_state = IAQ_BUFFER_EMPTY;
+	data->iaq_init_start_jiffies = 0;
+	mutex_unlock(&data->data_lock);
+}
+
+static void sgpc3_iaq_init(struct sgp_data *data)
+{
+	int skip_cycles;
+
+	data->iaq_init_duration_jiffies = data->iaq_init_user_duration * HZ;
+	if (data->supports_power_mode) {
+		data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_CONTINUOUS;
+
+		if (data->power_mode == SGPC3_POWER_MODE_ULTRA_LOW_POWER) {
+			data->measure_interval_jiffies =
+				SGPC3_MEASURE_ULP_INTERVAL_HZ * HZ;
+		} else {
+			data->measure_interval_jiffies =
+				SGPC3_MEASURE_INTERVAL_HZ * HZ;
+		}
+
+		if (data->user_baseline[0])
+			skip_cycles = 1;
+		else if (data->power_mode == SGPC3_POWER_MODE_ULTRA_LOW_POWER)
+			skip_cycles = 2;
+		else
+			skip_cycles = 11;
+	} else {
+		switch (data->iaq_init_user_duration) {
+		case 0:
+			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_0;
+			skip_cycles = 1;
+			break;
+		case 16:
+			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_16;
+			skip_cycles = 9;
+			break;
+		case 64:
+			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_64;
+			skip_cycles = 33;
+			break;
+		case 184:
+			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_184;
+			skip_cycles = 93;
+			break;
+		default:
+			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_64;
+			skip_cycles = 33;
+			break;
+		}
+		if (!data->user_baseline[0])
+			skip_cycles += 10;
+	}
+	data->iaq_defval_skip_jiffies = data->iaq_init_duration_jiffies +
+					(skip_cycles *
+					 data->measure_interval_jiffies);
+}
+
+static ssize_t sgp_iaq_preheat_store(struct device *dev,
+				     __attribute__((unused))
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+	u32 init_duration;
+	int ret;
+
+	if (SGP_VERS_PRODUCT(data) != SGPC3)
+		return -EINVAL;
+
+	mutex_lock(&data->data_lock);
+	ret = kstrtou32(buf, 10, &init_duration);
+	if (ret) {
+		ret = -EINVAL;
+		goto unlock_fail;
+	}
+
+	if (data->supports_power_mode) {
+		if (init_duration > 300) {
+			ret = -EINVAL;
+			goto unlock_fail;
+		}
+	} else {
+		if (init_duration == 0) {
+		} else if (init_duration <= 16) {
+			init_duration = 16;
+		} else if (init_duration <= 64) {
+			init_duration = 64;
+		} else if (init_duration <= 184) {
+			init_duration = 184;
+		} else {
+			ret = -EINVAL;
+			goto unlock_fail;
+		}
+	}
+
+	data->iaq_init_user_duration = init_duration;
+	mutex_unlock(&data->data_lock);
+
+	return count;
+
+unlock_fail:
+	mutex_unlock(&data->data_lock);
+	return ret;
+}
+
+static ssize_t sgp_power_mode_store(struct device *dev,
+				    __attribute__((unused))
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+	u16 power_mode;
+	int ret;
+	const char ULTRA_LOW[] = "ultra-low";
+	const char LOW[] = "low";
+
+	if (!data->supports_power_mode)
+		return -EINVAL;
+
+	if (strncmp(buf, ULTRA_LOW, sizeof(ULTRA_LOW) - 1) == 0)
+		power_mode = SGPC3_POWER_MODE_ULTRA_LOW_POWER;
+	else if (strncmp(buf, LOW, sizeof(LOW) - 1) == 0)
+		power_mode = SGPC3_POWER_MODE_LOW_POWER;
+	else
+		return -EINVAL;
+
+	mutex_lock(&data->data_lock);
+	if (power_mode == data->power_mode) {
+		mutex_unlock(&data->data_lock);
+		return count;
+	}
+
+	/* Switching power mode invalidates any set baselines */
+	if (power_mode != data->power_mode) {
+		data->user_baseline[0] = 0;
+		data->user_baseline[1] = 0;
+	}
+
+	ret = sgp_write_cmd(data, SGPC3_CMD_SET_POWER_MODE,
+			    &power_mode, 1, SGP_CMD_DURATION_US);
+	if (ret) {
+		mutex_unlock(&data->data_lock);
+		return ret;
+	}
+	data->power_mode = power_mode;
+	mutex_unlock(&data->data_lock);
+
+	sgp_restart_iaq_thread(data);
+	return count;
+}
+
+static int sgp_get_baseline(struct sgp_data *data, u16 *baseline_words)
+{
+	u16 baseline_word;
+	int ret, ix, jx;
+
+	mutex_lock(&data->data_lock);
+	ret = sgp_read_cmd(data, SGP_CMD_GET_BASELINE, &data->buffer,
+			   data->baseline_len, SGP_CMD_DURATION_US);
+	if (ret < 0)
+		goto unlock_fail;
+
+	for (ix = 0, jx = data->baseline_len - 1; ix < data->baseline_len;
+	     ix++, jx--) {
+		baseline_word = be16_to_cpu(data->buffer.raw_words[jx].value);
+		baseline_words[ix] = baseline_word;
+	}
+
+	if (!baseline_words[0])
+		ret = -EBUSY;
+
+unlock_fail:
+	mutex_unlock(&data->data_lock);
+	return ret;
+}
+
+static ssize_t sgp_iaq_baseline_show(struct device *dev,
+				     __attribute__((unused))
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	int ret;
+	u16 baseline_words[2];
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+
+	ret = sgp_get_baseline(data, baseline_words);
+	if (ret < 0)
+		return ret;
+
+	if (data->baseline_len == 1)
+		return sprintf(buf, "%04hx\n", baseline_words[0]);
+
+	return sprintf(buf, "%04hx%04hx\n", baseline_words[0],
+		       baseline_words[1]);
+}
+
+/**
+ * Retrieve the sensor's baseline and report whether it's valid for persistence
+ * @baseline:   sensor's current baseline
+ *
+ * Return:      1 if the baseline was set manually or the sensor has been
+ *              operating for at least 12h, -EBUSY if the baseline is not yet
+ *              valid, another negative error otherwise.
+ */
+static int sgp_is_baseline_valid(struct sgp_data *data, u16 *baseline_words)
+{
+	int ret;
+
+	ret = sgp_get_baseline(data, baseline_words);
+	if (ret < 0)
+		return ret;
+
+	mutex_lock(&data->data_lock);
+	ret = (data->user_baseline[0] ||
+	       time_after(jiffies,
+			  data->iaq_init_start_jiffies + 60 * 60 * 12 * HZ));
+	mutex_unlock(&data->data_lock);
+
+	return ret;
+}
+
+static void sgp_set_baseline(struct sgp_data *data, u16 *baseline_words)
+{
+	mutex_lock(&data->data_lock);
+	data->user_baseline[0] = baseline_words[0];
+	if (data->baseline_len == 2)
+		data->user_baseline[1] = baseline_words[1];
+	mutex_unlock(&data->data_lock);
+	sgp_restart_iaq_thread(data);
+}
+
+static ssize_t sgp_iaq_baseline_store(struct device *dev,
+				      __attribute__((unused))
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+	int newline = (count > 0 && buf[count - 1] == '\n');
+	u16 words[2];
+	int ret = 0;
+
+	if (count - newline == (data->baseline_len * 4)) {
+		if (data->baseline_len == 1)
+			ret = sscanf(buf, "%04hx", &words[0]);
+		else
+			ret = sscanf(buf, "%04hx%04hx", &words[0], &words[1]);
+	}
+
+	if (ret != data->baseline_len) {
+		dev_err(&data->client->dev, "invalid baseline format\n");
+		return -EINVAL;
+	}
+
+	sgp_set_baseline(data, words);
+	return count;
+}
+
+static ssize_t sgp_selftest_show(struct device *dev,
+				 __attribute__((unused))
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+	u16 baseline_words[2];
+	u16 measure_test;
+	int baseline_valid = 0;
+	int ret;
+
+	if (data->product_id == SGP30) {
+		/* On the SGP30, the self-test interferes with iaq_init */
+		baseline_valid = sgp_is_baseline_valid(data, baseline_words);
+	}
+
+	mutex_lock(&data->data_lock);
+	ret = sgp_read_cmd(data, SGP_CMD_SELFTEST, &data->buffer,
+			   1, SGP_SELFTEST_DURATION_US);
+	if (ret < 0) {
+		mutex_unlock(&data->data_lock);
+		return ret;
+	}
+
+	measure_test = be16_to_cpu(data->buffer.raw_words[0].value);
+	mutex_unlock(&data->data_lock);
+
+	if (data->product_id == SGP30) {
+		if (baseline_valid > 0)
+			sgp_set_baseline(data, baseline_words);
+		else
+			sgp_restart_iaq_thread(data);
+	}
+
+	return sprintf(buf, "%s\n",
+		       measure_test ^ SGP_SELFTEST_OK ? "FAILED" : "OK");
+}
+
+static ssize_t sgp_serial_id_show(struct device *dev,
+				 __attribute__((unused))
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+
+	return sprintf(buf, "%llu\n", data->serial_id);
+}
+
+static ssize_t sgp_feature_set_version_show(struct device *dev,
+					    __attribute__((unused))
+					    struct device_attribute *attr,
+					    char *buf)
+{
+	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
+
+	return sprintf(buf, "%hu.%hu\n", (data->feature_set & 0x00e0) >> 5,
+		       data->feature_set & 0x001f);
+}
+
+static int sgp_get_serial_id(struct sgp_data *data)
+{
+	int ret;
+	struct sgp_crc_word *words;
+
+	mutex_lock(&data->data_lock);
+	ret = sgp_read_cmd(data, SGP_CMD_GET_SERIAL_ID, &data->buffer, 3,
+			   SGP_CMD_DURATION_US);
+	if (ret < 0)
+		goto unlock_fail;
+
+	words = data->buffer.raw_words;
+	data->serial_id = (u64)(be16_to_cpu(words[2].value) & 0xffff)       |
+			  (u64)(be16_to_cpu(words[1].value) & 0xffff) << 16 |
+			  (u64)(be16_to_cpu(words[0].value) & 0xffff) << 32;
+
+unlock_fail:
+	mutex_unlock(&data->data_lock);
+	return ret;
+}
+
+static int sgp_check_compat(struct sgp_data *data,
+			    unsigned int product_id)
+{
+	struct sgp_version *supported_versions;
+	u16 ix, num_fs;
+	u16 product = SGP_VERS_PRODUCT(data);
+	u16 reserved = SGP_VERS_RESERVED(data);
+	u16 major = SGP_VERS_MAJOR(data);
+	u16 minor = SGP_VERS_MINOR(data);
+
+	/* driver does not match product */
+	if (product != product_id) {
+		dev_err(&data->client->dev,
+			"sensor reports a different product: 0x%04hx\n",
+			product);
+		return -ENODEV;
+	}
+	if (reserved) {
+		dev_warn(&data->client->dev, "reserved bits set: 0x%04hx\n",
+			 reserved);
+	}
+	/* engineering samples are not supported: no interface guarantees */
+	if (SGP_VERS_ENG_BIT(data))
+		return -ENODEV;
+
+	switch (product) {
+	case SGP30:
+		supported_versions =
+			(struct sgp_version *)supported_versions_sgp30;
+		num_fs = ARRAY_SIZE(supported_versions_sgp30);
+		break;
+	case SGPC3:
+		supported_versions =
+			(struct sgp_version *)supported_versions_sgpc3;
+		num_fs = ARRAY_SIZE(supported_versions_sgpc3);
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	for (ix = 0; ix < num_fs; ix++) {
+		if (major == supported_versions[ix].major &&
+		    minor >= supported_versions[ix].minor)
+			return 0;
+	}
+	dev_err(&data->client->dev, "unsupported sgp version: %d.%d\n",
+		major, minor);
+	return -ENODEV;
+}
+
+static void sgp_init(struct sgp_data *data)
+{
+	data->iaq_init_cmd = SGP_CMD_IAQ_INIT;
+	data->iaq_init_start_jiffies = 0;
+	data->iaq_init_duration_jiffies = 0;
+	data->iaq_init_user_duration = 0;
+	data->iaq_init_fn = NULL;
+	data->iaq_buffer_state = IAQ_BUFFER_EMPTY;
+	data->user_baseline[0] = 0;
+	data->user_baseline[1] = 0;
+	switch (SGP_VERS_PRODUCT(data)) {
+	case SGP30:
+		data->measure_interval_jiffies = SGP30_MEASURE_INTERVAL_HZ * HZ;
+		data->measure_iaq_cmd = SGP_CMD_IAQ_MEASURE;
+		data->measure_gas_signals_cmd = SGP30_CMD_MEASURE_SIGNAL;
+		data->product_id = SGP30;
+		data->baseline_len = 2;
+		data->supports_humidity_compensation = true;
+		data->iaq_defval_skip_jiffies = 15 * HZ;
+		break;
+	case SGPC3:
+		data->measure_interval_jiffies = SGPC3_MEASURE_INTERVAL_HZ * HZ;
+		data->measure_iaq_cmd = SGPC3_CMD_MEASURE_RAW;
+		data->measure_gas_signals_cmd = SGPC3_CMD_MEASURE_RAW;
+		data->product_id = SGPC3;
+		data->baseline_len = 1;
+		data->power_mode = SGPC3_POWER_MODE_LOW_POWER;
+		data->iaq_init_user_duration =
+			SGPC3_DEFAULT_IAQ_INIT_DURATION_HZ;
+		if (SGP_VERS_MAJOR(data) == 0 && SGP_VERS_MINOR(data) >= 6) {
+			data->supports_humidity_compensation = true;
+			data->supports_power_mode = true;
+		}
+		data->iaq_init_fn = &sgpc3_iaq_init;
+		break;
+	};
+	data->iaq_thread = kthread_run(sgp_iaq_threadfn, data,
+				       "%s-iaq", data->client->name);
+}
+
+static IIO_DEVICE_ATTR(in_serial_id, 0444, sgp_serial_id_show, NULL, 0);
+static IIO_DEVICE_ATTR(in_feature_set_version, 0444,
+		       sgp_feature_set_version_show, NULL, 0);
+static IIO_DEVICE_ATTR(in_selftest, 0444, sgp_selftest_show, NULL, 0);
+static IIO_DEVICE_ATTR(set_iaq_preheat_seconds, 0220, NULL,
+		       sgp_iaq_preheat_store, 0);
+static IIO_DEVICE_ATTR(in_iaq_baseline, 0444, sgp_iaq_baseline_show, NULL, 0);
+static IIO_DEVICE_ATTR(set_iaq_baseline, 0220, NULL, sgp_iaq_baseline_store, 0);
+static IIO_DEVICE_ATTR(set_absolute_humidity, 0220, NULL,
+		       sgp_absolute_humidity_store, 0);
+static IIO_DEVICE_ATTR(set_power_mode, 0220, NULL,
+		       sgp_power_mode_store, 0);
+
+static struct attribute *sgp_attributes[] = {
+	&iio_dev_attr_in_serial_id.dev_attr.attr,
+	&iio_dev_attr_in_feature_set_version.dev_attr.attr,
+	&iio_dev_attr_in_selftest.dev_attr.attr,
+	&iio_dev_attr_in_iaq_baseline.dev_attr.attr,
+	&iio_dev_attr_set_iaq_baseline.dev_attr.attr,
+	&iio_dev_attr_set_iaq_preheat_seconds.dev_attr.attr,
+	&iio_dev_attr_set_absolute_humidity.dev_attr.attr,
+	&iio_dev_attr_set_power_mode.dev_attr.attr,
+	NULL
+};
+
+static const struct attribute_group sgp_attr_group = {
+	.attrs = sgp_attributes,
+};
+
+static const struct iio_info sgp_info = {
+	.attrs		= &sgp_attr_group,
+	.read_raw	= sgp_read_raw,
+};
+
+static const struct of_device_id sgp_dt_ids[] = {
+	{ .compatible = "sensirion,sgp30", .data = (void *)SGP30 },
+	{ .compatible = "sensirion,sgpc3", .data = (void *)SGPC3 },
+	{ }
+};
+
+static int sgp_probe(struct i2c_client *client,
+		     const struct i2c_device_id *id)
+{
+	struct iio_dev *indio_dev;
+	struct sgp_data *data;
+	const struct sgp_device *chip;
+	const struct of_device_id *of_id;
+	unsigned long product_id;
+	int ret;
+
+	indio_dev = devm_iio_device_alloc(&client->dev, sizeof(*data));
+	if (!indio_dev)
+		return -ENOMEM;
+
+	of_id = of_match_device(sgp_dt_ids, &client->dev);
+	if (of_id)
+		product_id = (unsigned long)of_id->data;
+	else
+		product_id = id->driver_data;
+
+	chip = &sgp_devices[product_id];
+	data = iio_priv(indio_dev);
+	i2c_set_clientdata(client, indio_dev);
+	data->client = client;
+	crc8_populate_msb(sgp_crc8_table, SGP_CRC8_POLYNOMIAL);
+	mutex_init(&data->data_lock);
+
+	/* get serial id and write it to client data */
+	ret = sgp_get_serial_id(data);
+	if (ret < 0)
+		return ret;
+
+	/* get feature set version and write it to client data */
+	ret = sgp_read_cmd(data, SGP_CMD_GET_FEATURE_SET, &data->buffer, 1,
+			   SGP_CMD_DURATION_US);
+	if (ret < 0)
+		return ret;
+
+	data->feature_set = be16_to_cpu(data->buffer.raw_words[0].value);
+
+	ret = sgp_check_compat(data, product_id);
+	if (ret)
+		return ret;
+
+	indio_dev->dev.parent = &client->dev;
+	indio_dev->info = &sgp_info;
+	indio_dev->name = id->name;
+	indio_dev->modes = INDIO_DIRECT_MODE;
+
+	indio_dev->channels = chip->channels;
+	indio_dev->num_channels = chip->num_channels;
+
+	ret = devm_iio_device_register(&client->dev, indio_dev);
+	if (ret) {
+		dev_err(&client->dev, "failed to register iio device\n");
+		return ret;
+	}
+
+	sgp_init(data);
+	return ret;
+}
+
+static int sgp_remove(struct i2c_client *client)
+{
+	struct iio_dev *indio_dev = i2c_get_clientdata(client);
+	struct sgp_data *data = iio_priv(indio_dev);
+
+	if (data->iaq_thread)
+		kthread_stop(data->iaq_thread);
+	return 0;
+}
+
+static const struct i2c_device_id sgp_id[] = {
+	{ "sgp30", SGP30 },
+	{ "sgpc3", SGPC3 },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(i2c, sgp_id);
+MODULE_DEVICE_TABLE(of, sgp_dt_ids);
+
+static struct i2c_driver sgp_driver = {
+	.driver = {
+		.name	= "sgp30",
+		.of_match_table = of_match_ptr(sgp_dt_ids),
+	},
+	.probe = sgp_probe,
+	.remove = sgp_remove,
+	.id_table = sgp_id,
+};
+module_i2c_driver(sgp_driver);
+
+MODULE_AUTHOR("Andreas Brauchli <andreas.brauchli@sensirion.com>");
+MODULE_AUTHOR("Pascal Sachs <pascal.sachs@sensirion.com>");
+MODULE_DESCRIPTION("Sensirion SGP gas sensors");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION("0.6.0");
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v3 14/15] selinux: allow setxattr on rootfs so initramfs code can set them
From: Victor Kamensky @ 2018-03-11  3:07 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Taras Kondratiuk, H. Peter Anvin, Al Viro, Arnd Bergmann,
	Rob Landley, Mimi Zohar, Jonathan Corbet, James McMechan,
	initramfs, linux-doc, linux-kernel, linux-security-module,
	xe-linux-external, Paul Moore, Eric Paris
In-Reply-To: <1519153284.14218.18.camel@tycho.nsa.gov>



On Tue, 20 Feb 2018, Stephen Smalley wrote:

> On Fri, 2018-02-16 at 20:33 +0000, Taras Kondratiuk wrote:
>> From: Victor Kamensky <kamensky@cisco.com>
>>
>> initramfs code supporting extended cpio format have ability to
>> fill extended attributes from cpio archive, but if SELinux enabled
>> and security server is not initialized yet, selinux callback would
>> refuse setxattr made by initramfs code.
>>
>> Solution enable SBLABEL_MNT on rootfs even if secrurity server is
>> not initialized yet.
>
> What if we were to instead skip the SBLABEL_MNT check in
> selinux_inode_setxattr() if !ss_initialized?  Not dependent on
> filesystem type.

Stephen, thank you for looking into this. Sorry, for dealyed reponse -
I needed to find time to require context about these changes.

As you suggested I've tried this and it works:

>From 6bf35bd055fdb12e94f3d5188eccfdbaa30dbcf4 Mon Sep 17 00:00:00 2001
From: Victor Kamensky <kamensky@cisco.com>
Date: Fri, 9 Mar 2018 23:01:20 -0800
Subject: [PATCH 1/2] selinux: allow setxattr on file systems if policy is not
  loaded

initramfs code supporting extended cpio format have ability to
fill extended attributes from cpio archive, but if SELinux enabled
and security server is not initialized yet, selinux callback would
refuse setxattr made by initramfs code because file system is not
yet marked as one that support labeling (SBLABEL_MNT flag).

Solution do not refuse setxattr even if SBLABEL_MNT is not set
for file systems when policy is not loaded yet.

Signed-off-by: Victor Kamensky <kamensky@cisco.com>
---
  security/selinux/hooks.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 819fd68..31303ed 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3120,7 +3120,7 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
  		return selinux_inode_setotherxattr(dentry, name);

  	sbsec = inode->i_sb->s_security;
-	if (!(sbsec->flags & SBLABEL_MNT))
+	if (!(sbsec->flags & SBLABEL_MNT) && ss_initialized)
  		return -EOPNOTSUPP;

  	if (!inode_owner_or_capable(inode))
-- 
2.7.4

But with this change it would mean for that filesystem types, that
never are supposed to get SBLABEL_MNT flag, code may go through
if !ss_initialized. I have hard time evaluating impication of this,
but it is not existing case or not a big deal.

Generally I agree with your concern that the issue is not "rootfs"
specific. Other thought that it could be solved with use of
selinux_is_sblabel_mnt instead of "rootfs" specific check inside
of selinux_set_mnt_opts, in addition to similar code
in sb_finish_set_opts function. I.e something like this:

>From a94548b5ecde43ccc9c2b02b29becc086b4343a3 Mon Sep 17 00:00:00 2001
From: Victor Kamensky <kamensky@cisco.com>
Date: Fri, 9 Mar 2018 23:01:20 -0800
Subject: [PATCH 1/2] selinux: allow setxattr on rootfs so initramfs code can
  set them

initramfs code supporting extended cpio format have ability to
fill extended attributes from cpio archive, but if SELinux enabled
and security server is not initialized yet, selinux callback would
refuse setxattr made by initramfs code.

Solution enable set SBLABEL_MNT on file systems for which we can
figure out that they support securit labels even if secrurity
server is not initialized yet.

Signed-off-by: Victor Kamensky <kamensky@cisco.com>
---
  security/selinux/hooks.c | 12 ++++++++++++
  1 file changed, 12 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 819fd68..326aca9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -701,6 +701,18 @@ static int selinux_set_mnt_opts(struct super_block *sb,

  	if (!ss_initialized) {
  		if (!num_opts) {
+                        /*
+                         * For some of file systems we can mark them as
+                         * supporting security labels even before policy
+                         * loaded. It may be used by code that may want
+                         * to do setxatts before polict load.
+                         *
+                         * Note after policy loaded this check and marking
+                         * happens again.
+                         */
+                        if (selinux_is_sblabel_mnt(sb))
+                                sbsec->flags |= SBLABEL_MNT;
+
  			/* Defer initialization until selinux_complete_init,
  			   after the initial policy is loaded and the security
  			   server is ready to handle calls. */
-- 
2.7.4

Thanks,
Victor

>>
>> Signed-off-by: Victor Kamensky <kamensky@cisco.com>
>> ---
>>  security/selinux/hooks.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 8644d864e3c1..f3fe65589f02 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -706,6 +706,18 @@ static int selinux_set_mnt_opts(struct
>> super_block *sb,
>>
>>  	if (!ss_initialized) {
>>  		if (!num_opts) {
>> +			/*
>> +			 * Special handling for rootfs. Is genfs but
>> supports
>> +			 * setting SELinux context on in-core
>> inodes.
>> +			 *
>> +			 * Chicken and egg problem: policy may
>> reside in rootfs
>> +			 * but for initramfs code to fill in
>> attributes, it
>> +			 * needs selinux to allow that.
>> +			 */
>> +			if (!strncmp(sb->s_type->name, "rootfs",
>> +				     sizeof("rootfs")))
>> +				sbsec->flags |= SBLABEL_MNT;
>> +
>>  			/* Defer initialization until
>> selinux_complete_init,
>>  			   after the initial policy is loaded and
>> the security
>>  			   server is ready to handle calls. */
>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v3 15/15] selinux: delay sid population for rootfs till init is complete
From: Victor Kamensky @ 2018-03-11  3:08 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Taras Kondratiuk, H. Peter Anvin, Al Viro, Arnd Bergmann,
	Rob Landley, Mimi Zohar, Jonathan Corbet, James McMechan,
	initramfs, linux-doc, linux-kernel, linux-security-module,
	xe-linux-external, Paul Moore, Eric Paris
In-Reply-To: <1519152994.14218.15.camel@tycho.nsa.gov>



On Tue, 20 Feb 2018, Stephen Smalley wrote:

> On Fri, 2018-02-16 at 20:33 +0000, Taras Kondratiuk wrote:
>> From: Victor Kamensky <kamensky@cisco.com>
>>
>> With initramfs cpio format that supports extended attributes
>> we need to skip sid population on sys_lsetxattr call from
>> initramfs for rootfs if security server is not initialized yet.
>>
>> Otherwise callback in selinux_inode_post_setxattr will try to
>> translate give security.selinux label into sid context and since
>> security server is not available yet inode will receive default
>> sid (typically kernel_t). Note that in the same time proper
>> label will be stored in inode xattrs. Later, since inode sid
>> would be already populated system will never look back at
>> actual xattrs. But if we skip sid population for rootfs and
>> we have policy that direct use of xattrs for rootfs, proper
>> sid will be filled in from extended attributes one node is
>> accessed and server is initialized.
>>
>> Note new DELAYAFTERINIT_MNT super block flag is introduced
>> to only mark rootfs for such behavior. For other types of
>> tmpfs original logic is still used.
>
> (cc selinux maintainers)
>
> Wondering if we shouldn't just do this always, for all filesystem
> types.

Ok, I think it makes sense. The one that do not support xattrs
will not reach selinux_inode_post_setxattr anyway. And try
to cache sid while !ss_initialized is not good idea for any
filesystem types.

> Also, I think this should likely also be done in
> selinux_inode_setsecurity() for consistency.

I am not sure that I follow selinux_inode_setsecurity suggestion.
selinux_inode_setsecurity is about permission check. And
selinux_inode_post_setxattr deals with processing and setting
side effects if xattr was "security.selinux", it does not
matter what happens in selinux_inode_setsecurity if it
returns access_ok, LSM will still call selinux_inode_post_setxattr
and we would need to check and not produce any sid caching
side effects if !ss_initialized.

Sitll keeping logic in selinux_inode_post_setxattr, checked
that the following with much simple code works too:

>From bfc54e4805f3059671417ff2cda1266bc68e18f9 Mon Sep 17 00:00:00 2001
From: Victor Kamensky <kamensky@cisco.com>
Date: Fri, 9 Mar 2018 23:06:08 -0800
Subject: [PATCH 2/2] selinux: delay sid population in setxattr till policy
  loaded

With initramfs cpio format that supports extended attributes
we need to skip sid population on sys_lsetxattr call from
initramfs for rootfs if security server is not initialized yet.

Otherwise callback in selinux_inode_post_setxattr will try to
translate give security.selinux label into sid context and since
security server is not available yet inode will receive default
sid (typically kernel_t). Note that in the same time proper
label will be stored in inode xattrs. Later, since inode sid
would be already populated system will never look back at
actual xattrs. But if we skip sid population for rootfs and
we have policy that direct use of xattrs for rootfs, proper
sid will be filled in from extended attributes one node is
accessed and server is initialized.

Signed-off-by: Victor Kamensky <kamensky@cisco.com>
---
  security/selinux/hooks.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 31303ed..4c13759 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3197,6 +3197,10 @@ static void selinux_inode_post_setxattr(struct dentry *dentry, const char *name,
  		return;
  	}

+        if (!ss_initialized) {
+                return;
+        }
+
  	rc = security_context_to_sid_force(value, size, &newsid);
  	if (rc) {
  		printk(KERN_ERR "SELinux:  unable to map context to SID"
-- 
2.7.4

Thanks,
Victor

>>
>> Signed-off-by: Victor Kamensky <kamensky@cisco.com>
>> ---
>>  security/selinux/hooks.c            | 9 ++++++++-
>>  security/selinux/include/security.h | 1 +
>>  2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index f3fe65589f02..bb25268f734e 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -716,7 +716,7 @@ static int selinux_set_mnt_opts(struct
>> super_block *sb,
>>  			 */
>>  			if (!strncmp(sb->s_type->name, "rootfs",
>>  				     sizeof("rootfs")))
>> -				sbsec->flags |= SBLABEL_MNT;
>> +				sbsec->flags |=
>> SBLABEL_MNT|DELAYAFTERINIT_MNT;
>>
>>  			/* Defer initialization until
>> selinux_complete_init,
>>  			   after the initial policy is loaded and
>> the security
>> @@ -3253,6 +3253,7 @@ static void selinux_inode_post_setxattr(struct
>> dentry *dentry, const char *name,
>>  {
>>  	struct inode *inode = d_backing_inode(dentry);
>>  	struct inode_security_struct *isec;
>> +	struct superblock_security_struct *sbsec;
>>  	u32 newsid;
>>  	int rc;
>>
>> @@ -3261,6 +3262,12 @@ static void selinux_inode_post_setxattr(struct
>> dentry *dentry, const char *name,
>>  		return;
>>  	}
>>
>> +	if (!ss_initialized) {
>> +		sbsec = inode->i_sb->s_security;
>> +		if (sbsec->flags & DELAYAFTERINIT_MNT)
>> +			return;
>> +	}
>> +
>>  	rc = security_context_to_sid_force(value, size, &newsid);
>>  	if (rc) {
>>  		printk(KERN_ERR "SELinux:  unable to map context to
>> SID"
>> diff --git a/security/selinux/include/security.h
>> b/security/selinux/include/security.h
>> index 02f0412d42f2..585acfd6cbcf 100644
>> --- a/security/selinux/include/security.h
>> +++ b/security/selinux/include/security.h
>> @@ -52,6 +52,7 @@
>>  #define ROOTCONTEXT_MNT	0x04
>>  #define DEFCONTEXT_MNT	0x08
>>  #define SBLABEL_MNT	0x10
>> +#define DELAYAFTERINIT_MNT 0x20
>>  /* Non-mount related flags */
>>  #define SE_SBINITIALIZED	0x0100
>>  #define SE_SBPROC		0x0200
>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v2 2/2] iio: chemical: sgp30: Support Sensirion SGPxx sensors
From: Jonathan Cameron @ 2018-03-11 11:42 UTC (permalink / raw)
  To: Andreas Brauchli
  Cc: Hartmut Knaack, Lars-Peter Clausen, Peter Meerwald-Stadler,
	Rob Herring, Mark Rutland, Jonathan Corbet, linux-iio,
	linux-kernel, devicetree, linux-doc
In-Reply-To: <58a6dcdc4097d60e7bcee61ec1d0cbdf8ad3ae49.camel@elementarea.net>

On Sat, 10 Mar 2018 23:07:30 +0100
Andreas Brauchli <a.brauchli@elementarea.net> wrote:

> Support Sensirion SGP30 and SGPC3 multi-pixel I2C gas sensors
> 
> Supported Features:
> 
> * Indoor Air Quality (IAQ) concentrations for
>   - tVOC (in_concentration_voc_input)
>   - CO2eq (in_concentration_co2_input) - SGP30 only
>   IAQ concentrations are periodically read out by a background thread
>   to allow the sensor to maintain its internal baseline.
> * Baseline support for IAQ (in_iaq_baseline, set_iaq_baseline)
> * Gas concentration signals
>   - Ethanol (in_concentration_ethanol_raw)
>   - H2 (in_concentration_h2_raw) - SGP30 only
> * On-chip self test (in_selftest)
> * Sensor interface version (in_feature_set_version)
> * Sensor serial number (in_serial_id)
> * Humidity compensation
>   With the help of an external humidity signal, the gas signals can be
>   humidity-compensated. The sensor performs the compensation on-chip.
> * Power mode switching between low power mode (1/2Hz) and
>   ultra low power mode (1/30Hz) sampling - SGPC3 only
> * Checksummed I2C communication
> 
> For all features, refer to the data sheet or the documentation in
> Documentation/iio/chemical/sgp30.txt for more details.
> 
> The ABI is documented in
> Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
> 
> Signed-off-by: Andreas Brauchli <andreas.brauchli@sensirion.com>
Hi Andreas,

This is a fairly superficial review still as we still need to pin down
some of the userspace ABI elements rather more.
Find details in the code can wait for a later revision.

Jonathan
> ---
>  .../ABI/testing/sysfs-bus-iio-chemical-sgp30       |   62 ++
>  .../bindings/iio/chemical/sensirion,sgp30.txt      |   15 +
>  Documentation/iio/chemical/sgp30.txt               |   97 ++
>  drivers/iio/chemical/Kconfig                       |   13 +
>  drivers/iio/chemical/Makefile                      |    1 +
>  drivers/iio/chemical/sgp30.c                       | 1120 ++++++++++++++++++++
>  6 files changed, 1308 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
>  create mode 100644 Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt
>  create mode 100644 Documentation/iio/chemical/sgp30.txt
>  create mode 100644 drivers/iio/chemical/sgp30.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30 b/Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
> new file mode 100644
> index 000000000000..3b97c0bd5878
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-iio-chemical-sgp30
> @@ -0,0 +1,62 @@
> +what:		/sys/bus/iio/devices/iio:devicex/in_featureset_version
> +date:		March 2018
> +kernelversion:	4.17
> +contact:	andreas brauchli <andreas.brauchli@sensirion.com>
> +description:
> +		Retrieve the sensor interface version. The sensor-interface should
> +		remain forward-compatible across minor versions.
I'm really not keen on this.  What features are available should be apparent
from the userspace interface.

We are always interested in being able to use generic userspace code.
As such anything that requires checking a version number will inherently
mean we can't do generic code.

> +
> +what:		/sys/bus/iio/devices/iio:devicex/in_iaq_baseline
> +date:		March 2018
> +kernelversion:	4.17
> +contact:	andreas brauchli <andreas.brauchli@sensirion.com>
> +description:
> +		Retrieve the on-chip iaq baseline state. the baseline is an internal
> +		state that should not be modified.
Please indicate what use this is (becomes somewhat apparent from the below).

> +
> +What:		/sys/bus/iio/devices/iio:deviceX/set_iaq_baseline
> +Date:		March 2018
> +KernelVersion:	4.17
> +Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
> +Description:
> +		Restore the on-chip IAQ baseline state to a previous state. Useful for
> +		fast-resuming after a restart. A baseline is valid for one week when the
> +		sensor is not operated.
Why separate attributes to set this from that used to read it?  Just have one and
call it simply iaq_baseline.

> +
> +What:		/sys/bus/iio/devices/iio:deviceX/in_selftest
> +Date:		March 2018
> +KernelVersion:	4.17
> +Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
> +Description:
> +		Run the on-chip self test. Output values:
> +		* OK
> +		* FAILED
Usual approach on this is to do it on module probe and not otherwise.
The advantage is that we don't have a userspace interface that is difficult
for userspace applications to interpret (self tests are extremely varied
in what the return).   If there is a strong reason to do this during
normal use then let use know.
Also, it's not an input channel so selftest would be the correct name.

> +
> +What:		/sys/bus/iio/devices/iio:deviceX/in_serial_id
> +Date:		March 2018
> +KernelVersion:	4.17
> +Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
> +Description:
> +		Retrieve the unique sensor serial number. May be useful to associate
> +		with the IAQ baseline when the baseline is stored remotely or the sensor
> +		is on a plug-and-play device.
Ok, but it's not an input. Mostly we have previously had these accessible via
debugfs or printed to the log.
Probably go with the more common terminology of serial_number though.

> +
Umm. I think this one lost the What part ;)

> +Date:		March 2018
> +KernelVersion:	4.17
> +Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
> +Description:
> +		Set the absolute humidity in milligrams per cubic meter (mg/m^3) to
> +		enable on-sensor humidity compensation. Affects all gas signals and IAQ
> +		values. Accepted values:
> +		* 0 (disables humidity compensation)
> +		* 1..256000 (mg/m^3)
> +
> +What:		/sys/bus/iio/devices/iio:deviceX/set_power_mode
> +Date:		March 2018
> +KernelVersion:	4.17
> +Contact:	Andreas Brauchli <andreas.brauchli@sensirion.com>
> +Description:
> +		Change the power mode on an SGPC3 with ultra-low power mode support.
> +		Accepted values:
> +		* "low" (default/startup mode)
> +		* "ultra-low" (default/startup mode)
As ever with power modes (this one comes up a lot) how does userspace know
what to do with it?

I discuss this below - I think what we actually care about is the resulting
sampling frequency (from a userspace point of view)

> diff --git a/Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt b/Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt
> new file mode 100644
> index 000000000000..bb909b0f1aba
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iio/chemical/sensirion,sgp30.txt
> @@ -0,0 +1,15 @@
> +* Sensirion SGPxx multi-pixel Gas Sensor
> +
> +Required properties:
> +
> +  - compatible: must be one of
> +    "sensirion,sgp30"
> +    "sensirion,sgpc3"
> +  - reg: the I2C address of the sensor
> +
> +Example:
> +
> +sgpxx@58 {
> +	compatible = "sensirion,sgp30";
> +	reg = <0x58>;
> +};
Device tree bindings should be a separate patch to make it easy for the devicetree maintainers
to review.

> diff --git a/Documentation/iio/chemical/sgp30.txt b/Documentation/iio/chemical/sgp30.txt
> new file mode 100644
> index 000000000000..c236f7d0fa80
> --- /dev/null
> +++ b/Documentation/iio/chemical/sgp30.txt
> @@ -0,0 +1,97 @@
> +sgp30: Industrial IO driver for Sensirion I2C Multi-Pixel Gas Sensors
> +
> +1. Overview
> +
> +The sgp30 driver supports the Sensirion SGP30 and SGPC3 multi-pixel gas sensors.
> +
> +2. Modes of Operation
> +
> +Postprocessed Indoor Air Quality (IAQ) gas concentrations as well as raw gas
> +signals are available. Both signals can be humidity-compensated by the sensor.
> +
> +2.1. IAQ Gas Concentrations
> +
> +* tVOC (in_concentration_voc_input) at ppb precision (1e-9)
> +* CO2eq (in_concentration_co2_input) at ppm precision (1e-6) -- SGP30 only

> +
> +2.1.1. IAQ Initialization
> +Before Indoor Air Quality (IAQ) values can be read, the IAQ mode must be
> +initialized by sending the "iaq_init" command from the data sheet to the sensor.
> +The sgp30 driver performs this initialization automatically on instantiation.
> +
> +After initializing IAQ, at least one IAQ signal must be read out every second
> +(SGP30) / every two seconds (SGPC3) for the sensor to correctly maintain its
> +internal baseline. The driver continuously performs this task in a background
> +thread and caches the latest measurement for retrieval.
> +
> +For the first 15s of operation after "iaq_init", default values are returned by
> +the sensor. Default values are not returned by the driver, instead an -EBUSY
> +error is returned.
Good - that's a sensible move.

> +
> +2.1.2. Pausing and Resuming IAQ
> +
> +For best performance and faster subsequent startup times, the baseline should be
> +saved once every hour, after 12h of operation. The baseline is restored by
> +writing the last known baseline back to set_iaq_baseline.
> +
> +    Saving the baseline:
> +    $ baseline=$(cat in_iaq_baseline)
> +
> +    Restoring the baseline:
> +    $ echo -n $baseline > set_iaq_baseline
As mentioned above, separate files for read and write makes no sense.
Use one for both.

> +
> +2.2. Humidity Compensation
> +
> +The SGP features an on-chip humidity compensation that requires the (in-device)
> +environment's absolute humidity.
> +
> +Set the absolute humidity by writing the absolute humidity concentration (in
> +mg/m^3) to set_absolute_humidity. The absolute humidity is obtained by
> +converting the relative humidity and temperature. The following units are used:
> +AH in mg/m^3, RH in percent (0..100), T in degrees Celsius, and exp() being the
> +base-e exponential function.
> +
> +                  RH            exp(17.62 * T)
> +                ----- * 6.112 * --------------
> +                100.0            243.12 + T
> +  AH = 216.7 * ------------------------------- * 1000
> +                        273.15 + T
> +
> +Writing a value of 0 to set_absolute_humidity disables the humidity
> +compensation. Devices not supporting humidity compensation (SGPC3 with feature
> +set < 0.6) will report an error (-EINVAL) when setting the humidity.
Devices not supporting humidity should not provide a sysfs file to write it.
That should be all the indication needed to establish this function doesn't
exist.

> +
> +2.3. On-chip self test
> +
> +    $ cat in_selftest
> +
> +in_selftest returns OK or FAILED.
> +
> +On the SGP30, self test interferes with IAQ operations and requires a
> +re-initialization thereof. If a valid baseline is available (after 12h of
> +operation), the driver will take care of resuming the operation after the
> +selftest, otherwise the "iaq_init" procedure will be restarted.
> +
> +2.4 SGPC3 Low Power Gas Sensor Features
> +
> +2.4.1 Accelerated Startup
> +
> +The SGPC3 can be pre-heated differently by writing a duration to the
> +set_iaq_preheat_seconds attribute. Pre-heating causes the sensor to draw more
Get rid of the set - writing to a file makes this obvious.  Reading that file
should tell us what it is set to.


> +power but results in more accurate readings, thus also marketed as accelerated
> +startup mode. Sensors prior to feature set 0.6 allow pre-heating durations of 0,
> +16, 64 and 184s (other values rounded to the next higher setting). Feature set
> +0.6 allows pre-heating for all integer durations up to 300s (a software limit).
I hope there is an available attribute to let userspace know what options exist?

> +
> +Pre-heating occurs on IAQ initialization, thus after changing the power mode or
> +setting an IAQ baseline. When the sensor has been operated for a duration of
> +12h or more, it is not necessary to pre-heat after short down periods (e.g.
> +reboots of < 5min.)
> +
> +2.4.2 Ultra Low Power Mode (SGPC3 Feature Set 0.6+)
> +
> +The SGPC3 with feature set 0.6+ features an ultra-low-power mode in which the
> +sensor is only polled every 30s instead of every 2s. The polling interval in the
> +background thread is adjusted by the driver when changing the power mode with
> +set_power_mode. Supported options: {"low", "ultra-low"}. Changing the power mode
> +on sensors without power-mode switching results in an -EINVAL error.

That is hard to support cleanly as any generic userspace will have no idea of
what to do with this.  Hmm. We'll have to think about how to do this cleanly.
Could do it via sampling_frequency as I think that is what userspace will care
about.

> diff --git a/drivers/iio/chemical/Kconfig b/drivers/iio/chemical/Kconfig
> index 5cb5be7612b4..6f0e2d1240bd 100644
> --- a/drivers/iio/chemical/Kconfig
> +++ b/drivers/iio/chemical/Kconfig
> @@ -38,6 +38,19 @@ config IAQCORE
>  	  iAQ-Core Continuous/Pulsed VOC (Volatile Organic Compounds)
>  	  sensors
>  
> +config SENSIRION_SGP30
> +	tristate "Sensirion SGPxx gas sensors"
> +	depends on I2C
> +	select CRC8
> +	help
> +	  Say Y here to build I2C interface support for the following
> +	  Sensirion SGP gas sensors:
> +	    * SGP30 gas sensor
> +	    * SGPC3 low power gas sensor
> +
> +	  To compile this driver as module, choose M here: the
> +	  module will be called sgp30.
> +
>  config VZ89X
>  	tristate "SGX Sensortech MiCS VZ89X VOC sensor"
>  	depends on I2C
> diff --git a/drivers/iio/chemical/Makefile b/drivers/iio/chemical/Makefile
> index a629b29d1e0b..9aeb3fce1408 100644
> --- a/drivers/iio/chemical/Makefile
> +++ b/drivers/iio/chemical/Makefile
> @@ -6,4 +6,5 @@
>  obj-$(CONFIG_ATLAS_PH_SENSOR)	+= atlas-ph-sensor.o
>  obj-$(CONFIG_CCS811)		+= ccs811.o
>  obj-$(CONFIG_IAQCORE)		+= ams-iaq-core.o
> +obj-$(CONFIG_SENSIRION_SGP30)	+= sgp30.o
>  obj-$(CONFIG_VZ89X)		+= vz89x.o
> diff --git a/drivers/iio/chemical/sgp30.c b/drivers/iio/chemical/sgp30.c
> new file mode 100644
> index 000000000000..f336a89ad906
> --- /dev/null
> +++ b/drivers/iio/chemical/sgp30.c
> @@ -0,0 +1,1120 @@
> +/*
> + * sgp30.c - Support for Sensirion SGP Gas Sensors
> + *
> + * Copyright (C) 2017 Andreas Brauchli <andreas.brauchli@sensirion.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
Consider changing to SPDX license identifiers rather than the standard block
of text.

> + *
> + * Datasheets:
> + * https://www.sensirion.com/file/datasheet_sgp30
> + * https://www.sensirion.com/file/datasheet_sgpc3
> + */
> +
> +#include <linux/crc8.h>
> +#include <linux/delay.h>
> +#include <linux/kthread.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/init.h>
> +#include <linux/i2c.h>
> +#include <linux/of_device.h>
> +#include <linux/iio/iio.h>
> +#include <linux/iio/buffer.h>
> +#include <linux/iio/sysfs.h>
> +#include <linux/string.h>
> +#include <linux/version.h>
> +
> +#define SGP_WORD_LEN				2
> +#define SGP_CRC8_POLYNOMIAL			0x31
> +#define SGP_CRC8_INIT				0xff
> +#define SGP_CRC8_LEN				1
> +#define SGP_CMD(cmd_word)			cpu_to_be16(cmd_word)
> +#define SGP_CMD_DURATION_US			12000
> +#define SGP_MEASUREMENT_DURATION_US		50000
> +#define SGP_SELFTEST_DURATION_US		220000
> +#define SGP_CMD_HANDLING_DURATION_US		10000
> +#define SGP_CMD_LEN				SGP_WORD_LEN
> +#define SGP_CMD_MAX_BUF_SIZE			(SGP_CMD_LEN + 2 * SGP_WORD_LEN)
> +#define SGP_MEASUREMENT_LEN			2
> +#define SGP30_MEASURE_INTERVAL_HZ		1
> +#define SGPC3_MEASURE_INTERVAL_HZ		2
> +#define SGPC3_MEASURE_ULP_INTERVAL_HZ		30
> +#define SGPC3_POWER_MODE_ULTRA_LOW_POWER	0
> +#define SGPC3_POWER_MODE_LOW_POWER		1
> +#define SGPC3_DEFAULT_IAQ_INIT_DURATION_HZ	64
> +#define SGP_SELFTEST_OK				0xd400
> +#define SGP_VERS_PRODUCT(data)	((((data)->feature_set) & 0xf000) >> 12)
> +#define SGP_VERS_RESERVED(data)	((((data)->feature_set) & 0x0e00) >> 9)
> +#define SGP_VERS_ENG_BIT(data)	((((data)->feature_set) & 0x0100) >> 8)
> +#define SGP_VERS_MAJOR(data)	((((data)->feature_set) & 0x00e0) >> 5)
> +#define SGP_VERS_MINOR(data)	(((data)->feature_set) & 0x001f)
> +
> +DECLARE_CRC8_TABLE(sgp_crc8_table);
> +
> +enum sgp_product_id {
> +	SGP30 = 0,
> +	SGPC3,
> +};
> +
> +enum sgp30_channel_idx {
> +	SGP30_IAQ_TVOC_IDX = 0,
> +	SGP30_IAQ_CO2EQ_IDX,
> +	SGP30_SIG_ETOH_IDX,
> +	SGP30_SIG_H2_IDX,
> +};
> +
> +enum sgpc3_channel_idx {
> +	SGPC3_IAQ_TVOC_IDX = 10,
> +	SGPC3_SIG_ETOH_IDX,
> +};
> +
> +enum sgp_cmd {
> +	SGP_CMD_IAQ_INIT		= SGP_CMD(0x2003),
> +	SGP_CMD_IAQ_MEASURE		= SGP_CMD(0x2008),
> +	SGP_CMD_GET_BASELINE		= SGP_CMD(0x2015),
> +	SGP_CMD_SET_BASELINE		= SGP_CMD(0x201e),
> +	SGP_CMD_GET_FEATURE_SET		= SGP_CMD(0x202f),
> +	SGP_CMD_GET_SERIAL_ID		= SGP_CMD(0x3682),
> +	SGP_CMD_SELFTEST		= SGP_CMD(0x2032),
> +	SGP_CMD_SET_ABSOLUTE_HUMIDITY	= SGP_CMD(0x2061),
> +
> +	SGP30_CMD_MEASURE_SIGNAL	= SGP_CMD(0x2050),
> +
> +	SGPC3_CMD_IAQ_INIT_0		= SGP_CMD(0x2089),
> +	SGPC3_CMD_IAQ_INIT_16		= SGP_CMD(0x2024),
> +	SGPC3_CMD_IAQ_INIT_64		= SGP_CMD(0x2003),
> +	SGPC3_CMD_IAQ_INIT_184		= SGP_CMD(0x206a),
> +	SGPC3_CMD_IAQ_INIT_CONTINUOUS	= SGP_CMD(0x20ae),
> +	SGPC3_CMD_MEASURE_RAW		= SGP_CMD(0x2046),
> +	SGPC3_CMD_SET_POWER_MODE	= SGP_CMD(0x209f),
> +};
> +
> +struct sgp_version {
> +	u8 major;
> +	u8 minor;
> +};
> +
> +struct sgp_crc_word {
> +	__be16 value;
> +	u8 crc8;
> +} __attribute__((__packed__));
> +
> +union sgp_reading {
> +	u8 start;
> +	struct sgp_crc_word raw_words[4];
> +};
> +
> +enum _iaq_buffer_state {
> +	IAQ_BUFFER_EMPTY = 0,
> +	IAQ_BUFFER_DEFAULT_VALS,
> +	IAQ_BUFFER_VALID,
> +};
> +
> +struct sgp_data {
> +	struct i2c_client *client;
> +	struct task_struct *iaq_thread;
> +	struct mutex data_lock;
> +	unsigned long iaq_init_start_jiffies;
> +	unsigned long iaq_init_duration_jiffies;
> +	unsigned long iaq_defval_skip_jiffies;
> +	u64 serial_id;
> +	void (*iaq_init_fn)(struct sgp_data *data);
> +	u32 iaq_init_user_duration;
> +	u16 product_id;
> +	u16 feature_set;
> +	unsigned long measure_interval_jiffies;
> +	enum sgp_cmd iaq_init_cmd;
> +	enum sgp_cmd measure_iaq_cmd;
> +	enum sgp_cmd measure_gas_signals_cmd;
> +	u8 baseline_len;
> +	u16 user_baseline[2];
> +	u16 power_mode;
> +	union sgp_reading buffer;
> +	union sgp_reading iaq_buffer;
> +	enum _iaq_buffer_state iaq_buffer_state;
> +	bool supports_humidity_compensation;
> +	bool supports_power_mode;
> +};
> +
> +struct sgp_device {
> +	const struct iio_chan_spec *channels;
> +	int num_channels;
> +};
> +
> +static const struct sgp_version supported_versions_sgp30[] = {
> +	{
> +		.major = 1,
> +		.minor = 0,
> +	},
> +};
> +
> +static const struct sgp_version supported_versions_sgpc3[] = {
> +	{
> +		.major = 0,
> +		.minor = 4,
> +	},
> +};
> +
> +static const struct iio_chan_spec sgp30_channels[] = {
> +	{
> +		.type = IIO_CONCENTRATION,
> +		.channel2 = IIO_MOD_VOC,
> +		.datasheet_name = "TVOC signal",
> +		.modified = 1,
> +		.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
> +		.address = SGP30_IAQ_TVOC_IDX,
> +	},
> +	{
> +		.type = IIO_CONCENTRATION,
> +		.channel2 = IIO_MOD_CO2,
> +		.datasheet_name = "CO2eq signal",
> +		.modified = 1,
> +		.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
> +		.address = SGP30_IAQ_CO2EQ_IDX,
> +	},
> +	{
> +		.type = IIO_CONCENTRATION,
> +		.channel2 = IIO_MOD_ETHANOL,
> +		.info_mask_separate =
> +			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
> +		.address = SGP30_SIG_ETOH_IDX,
> +		.datasheet_name = "Ethanol signal",
> +		.modified = 1,
> +		.scan_type = {
> +			.endianness = IIO_BE,
> +		},
> +	},
> +	{
> +		.type = IIO_CONCENTRATION,
> +		.channel2 = IIO_MOD_H2,
> +		.info_mask_separate =
> +			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
> +		.address = SGP30_SIG_H2_IDX,
> +		.datasheet_name = "H2 signal",
> +		.modified = 1,
> +		.scan_type = {
> +			.endianness = IIO_BE,
No need to specify this unless you are supporting buffered output
(which makes no sense for a slow sensor)
> +		},
> +	},
> +};
> +
> +static const struct iio_chan_spec sgpc3_channels[] = {
> +	{
> +		.type = IIO_CONCENTRATION,
> +		.channel2 = IIO_MOD_VOC,
> +		.datasheet_name = "TVOC signal",
This is usually only worth providing if you are expecting in kernel
users which I doubt will occur here.  Name doesn't really provide
any additional information.

> +		.modified = 1,
> +		.info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
> +		.address = SGPC3_IAQ_TVOC_IDX,
> +	},
> +	{
> +		.type = IIO_CONCENTRATION,
> +		.channel2 = IIO_MOD_ETHANOL,
> +		.info_mask_separate =
> +			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
> +		.address = SGPC3_SIG_ETOH_IDX,
> +		.datasheet_name = "Ethanol signal",
> +		.modified = 1,
> +		.scan_type = {
> +			.endianness = IIO_BE,
> +		},
> +	},
> +};
> +
> +static const struct sgp_device sgp_devices[] = {
> +	[SGP30] = {
> +		.channels = sgp30_channels,
> +		.num_channels = ARRAY_SIZE(sgp30_channels),
> +	},
> +	[SGPC3] = {
> +		.channels = sgpc3_channels,
> +		.num_channels = ARRAY_SIZE(sgpc3_channels),
> +	},
> +};
> +
> +/**
> + * sgp_verify_buffer() - verify the checksums of the data buffer words
> + *
> + * @data:       SGP data
> + * @buf:        Raw data buffer
> + * @word_count: Num data words stored in the buffer, excluding CRC bytes
> + *
> + * Return:      0 on success, negative error otherwise.
> + */
> +static int sgp_verify_buffer(const struct sgp_data *data,
> +			     union sgp_reading *buf, size_t word_count)
> +{
> +	size_t size = word_count * (SGP_WORD_LEN + SGP_CRC8_LEN);
> +	int i;
> +	u8 crc;
> +	u8 *data_buf = &buf->start;
> +
> +	for (i = 0; i < size; i += SGP_WORD_LEN + SGP_CRC8_LEN) {
> +		crc = crc8(sgp_crc8_table, &data_buf[i], SGP_WORD_LEN,
> +			   SGP_CRC8_INIT);
> +		if (crc != data_buf[i + SGP_WORD_LEN]) {
> +			dev_err(&data->client->dev, "CRC error\n");
> +			return -EIO;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
> + * sgp_read_cmd() - reads data from sensor after issuing a command
> + * The caller must hold data->data_lock for the duration of the call.
> + * @data:        SGP data
> + * @cmd:         SGP Command to issue
> + * @buf:         Raw data buffer to use
> + * @word_count:  Num words to read, excluding CRC bytes
> + *
> + * Return:       0 on success, negative error otherwise.
> + */
> +static int sgp_read_cmd(struct sgp_data *data, enum sgp_cmd cmd,
> +			union sgp_reading *buf, size_t word_count,
> +			unsigned long duration_us)
> +{
> +	int ret;
> +	struct i2c_client *client = data->client;
> +	size_t size = word_count * (SGP_WORD_LEN + SGP_CRC8_LEN);
> +	u8 *data_buf;
> +
> +	ret = i2c_master_send(client, (const char *)&cmd, SGP_CMD_LEN);
> +	if (ret != SGP_CMD_LEN)
> +		return -EIO;
> +	usleep_range(duration_us, duration_us + 1000);
> +
> +	if (word_count == 0)
> +		return 0;
> +
> +	data_buf = &buf->start;
> +	ret = i2c_master_recv(client, data_buf, size);
> +	if (ret < 0)
> +		return ret;
> +	if (ret != size)
> +		return -EIO;
> +
> +	return sgp_verify_buffer(data, buf, word_count);
> +}
> +
> +/**
> + * sgp_i2c_write_cmd() - write data to SGP sensor with a command
> + * @data:       SGP data
> + * @cmd:        SGP Command to issue
> + * @buf:        Data to write
> + * @buf_size:   Data size of the buffer
> + *
> + * Return:      0 on success, negative error otherwise.
> + */
> +static int sgp_write_cmd(struct sgp_data *data, enum sgp_cmd cmd, u16 *buf,
> +			 size_t buf_size, unsigned long duration_us)
> +{
> +	int ret, ix;
> +	u16 buf_idx;
> +	u8 buffer[SGP_CMD_MAX_BUF_SIZE];
> +
> +	/* assemble buffer */
> +	*((__be16 *)&buffer[0]) = cmd;
> +	buf_idx = SGP_CMD_LEN;
> +	for (ix = 0; ix < buf_size; ix++) {
> +		*((__be16 *)&buffer[buf_idx]) = cpu_to_be16(buf[ix]);
> +		buf_idx += SGP_WORD_LEN;
> +		buffer[buf_idx] = crc8(sgp_crc8_table,
> +				       &buffer[buf_idx - SGP_WORD_LEN],
> +				       SGP_WORD_LEN, SGP_CRC8_INIT);
> +		buf_idx += SGP_CRC8_LEN;
> +	}
> +	ret = i2c_master_send(data->client, buffer, buf_idx);
> +	if (ret != buf_idx)
> +		return -EIO;
> +	ret = 0;
> +	usleep_range(duration_us, duration_us + 1000);
> +
> +	return ret;
> +}
> +
> +/**
> + * sgp_measure_iaq() - measure and retrieve IAQ values from sensor
> + * The caller must hold data->data_lock for the duration of the call.
> + * @data:       SGP data
> + *
> + * Return:      0 on success, -EBUSY on default values, negative error
> + *              otherwise.
> + */
> +
> +static int sgp_measure_iaq(struct sgp_data *data)
> +{
> +	int ret;
> +	/* data contains default values */
> +	bool default_vals = !time_after(jiffies, data->iaq_init_start_jiffies +
> +						 data->iaq_defval_skip_jiffies);
> +
> +	ret = sgp_read_cmd(data, data->measure_iaq_cmd, &data->iaq_buffer,
> +			   SGP_MEASUREMENT_LEN, SGP_MEASUREMENT_DURATION_US);
> +	if (ret < 0)
> +		return ret;
> +
> +	data->iaq_buffer_state = IAQ_BUFFER_DEFAULT_VALS;
> +
> +	if (default_vals)
> +		return -EBUSY;
> +
> +	data->iaq_buffer_state = IAQ_BUFFER_VALID;
> +	return 0;
> +}
> +
> +static void sgp_iaq_thread_sleep_until(const struct sgp_data *data,
> +				       unsigned long sleep_jiffies)
> +{
> +	const long IAQ_POLL = 50000;
> +
> +	while (!time_after(jiffies, sleep_jiffies)) {
> +		usleep_range(IAQ_POLL, IAQ_POLL + 10000);
> +		if (kthread_should_stop() || data->iaq_init_start_jiffies == 0)
> +			return;
> +	}
> +}
> +
> +static int sgp_iaq_threadfn(void *p)
> +{
> +	struct sgp_data *data = (struct sgp_data *)p;
> +	unsigned long next_update_jiffies;
> +	int ret;
> +
> +	while (!kthread_should_stop()) {
> +		mutex_lock(&data->data_lock);
> +		if (data->iaq_init_start_jiffies == 0) {
> +			if (data->iaq_init_fn)
> +				data->iaq_init_fn(data);
> +			ret = sgp_read_cmd(data, data->iaq_init_cmd, NULL, 0,
> +					   SGP_CMD_DURATION_US);
> +			if (ret < 0)
> +				goto unlock_sleep_continue;
> +			data->iaq_init_start_jiffies = jiffies;
> +			if (data->user_baseline[0]) {
> +				ret = sgp_write_cmd(data, SGP_CMD_SET_BASELINE,
> +						    data->user_baseline,
> +						    data->baseline_len,
> +						    SGP_CMD_DURATION_US);
> +				if (ret < 0) {
> +					dev_err(&data->client->dev,
> +						"IAQ set baseline failed [%d]\n",
> +						ret);
> +					goto unlock_sleep_continue;
> +				}
> +			}
> +			if (data->iaq_init_duration_jiffies) {
> +				mutex_unlock(&data->data_lock);
> +				sgp_iaq_thread_sleep_until(data,
> +					    data->iaq_init_start_jiffies +
> +					    data->iaq_init_duration_jiffies);
> +				continue;
> +			}
> +		}
> +
> +		ret = sgp_measure_iaq(data);
> +		if (ret && ret != -EBUSY) {
> +			dev_warn(&data->client->dev, "IAQ measurement error [%d]\n",
> +				 ret);
> +		}
> +unlock_sleep_continue:
> +		next_update_jiffies = jiffies + data->measure_interval_jiffies;
> +		mutex_unlock(&data->data_lock);
> +		sgp_iaq_thread_sleep_until(data, next_update_jiffies);
> +	}
> +	return 0;
> +}
> +
> +static ssize_t sgp_absolute_humidity_store(struct device *dev,
> +					   __attribute__((unused))
> +					   struct device_attribute *attr,
> +					   const char *buf, size_t count)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +	u32 ah;
> +	u16 ah_scaled;
> +	int ret;
> +
> +	if (!data->supports_humidity_compensation)
> +		return -EINVAL;
> +
> +	ret = kstrtou32(buf, 10, &ah);
> +	if (ret != 1 || ah < 0 || ah > 256000)
> +		return -EINVAL;
> +
> +	/* ah_scaled = (u16)((ah / 1000.0) * 256.0) */
> +	ah_scaled = (u16)(((u64)ah * 256 * 16777) >> 24);
> +
> +	/* don't disable AH compensation due to rounding */
> +	if (ah > 0 && ah_scaled == 0)
> +		ah_scaled = 1;
> +
> +	mutex_lock(&data->data_lock);
> +	ret = sgp_write_cmd(data, SGP_CMD_SET_ABSOLUTE_HUMIDITY,
> +			    &ah_scaled, 1, SGP_CMD_DURATION_US);
> +	mutex_unlock(&data->data_lock);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
> +
> +static int sgp_read_raw(struct iio_dev *indio_dev,
> +			struct iio_chan_spec const *chan, int *val,
> +			int *val2, long mask)
> +{
> +	struct sgp_data *data = iio_priv(indio_dev);
> +	struct sgp_crc_word *words;
> +	int ret;
> +
> +	switch (mask) {
> +	case IIO_CHAN_INFO_PROCESSED:
> +		mutex_lock(&data->data_lock);
> +		if (data->iaq_buffer_state != IAQ_BUFFER_VALID) {
> +			mutex_unlock(&data->data_lock);
> +			return -EBUSY;
> +		}
> +		words = data->iaq_buffer.raw_words;
> +		switch (chan->address) {
> +		case SGP30_IAQ_TVOC_IDX:
> +		case SGPC3_IAQ_TVOC_IDX:
> +			*val = 0;
> +			*val2 = be16_to_cpu(words[1].value);
> +			ret = IIO_VAL_INT_PLUS_NANO;
> +			break;
> +		case SGP30_IAQ_CO2EQ_IDX:
> +			*val = 0;
> +			*val2 = be16_to_cpu(words[0].value);
> +			ret = IIO_VAL_INT_PLUS_MICRO;
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			break;
> +		}
> +		mutex_unlock(&data->data_lock);
> +		break;
> +	case IIO_CHAN_INFO_RAW:
> +		mutex_lock(&data->data_lock);
> +		if (chan->address == SGPC3_SIG_ETOH_IDX) {
> +			if (data->iaq_buffer_state == IAQ_BUFFER_EMPTY)
> +				ret = -EBUSY;
> +			else
> +				ret = 0;
> +			words = data->iaq_buffer.raw_words;
> +		} else {
> +			ret = sgp_read_cmd(data, data->measure_gas_signals_cmd,
> +					   &data->buffer, SGP_MEASUREMENT_LEN,
> +					   SGP_MEASUREMENT_DURATION_US);
> +			words = data->buffer.raw_words;
> +		}
> +		if (ret) {
> +			mutex_unlock(&data->data_lock);
> +			return ret;
> +		}
> +
> +		switch (chan->address) {
> +		case SGP30_SIG_ETOH_IDX:
> +			*val = be16_to_cpu(words[1].value);
> +			ret = IIO_VAL_INT;
> +			break;
> +		case SGPC3_SIG_ETOH_IDX:
> +		case SGP30_SIG_H2_IDX:
> +			*val = be16_to_cpu(words[0].value);
> +			ret = IIO_VAL_INT;
> +			break;
> +		}
> +		mutex_unlock(&data->data_lock);
> +		break;
Might as well return ret here rather than going to the end.

> +	case IIO_CHAN_INFO_SCALE:
> +		switch (chan->address) {
> +		case SGP30_SIG_ETOH_IDX:
> +		case SGPC3_SIG_ETOH_IDX:
> +		case SGP30_SIG_H2_IDX:
> +			*val = 0;
> +			*val2 = 1953125;
> +			return IIO_VAL_INT_PLUS_NANO;
> +		default:
> +			return -EINVAL;
> +		}
> +	default:
> +		return -EINVAL;
> +	}
> +	return ret;
> +}
> +
> +static void sgp_restart_iaq_thread(struct sgp_data *data)
> +{
> +	mutex_lock(&data->data_lock);
> +	data->iaq_buffer_state = IAQ_BUFFER_EMPTY;
> +	data->iaq_init_start_jiffies = 0;
> +	mutex_unlock(&data->data_lock);
> +}
> +
> +static void sgpc3_iaq_init(struct sgp_data *data)
> +{
> +	int skip_cycles;
> +
> +	data->iaq_init_duration_jiffies = data->iaq_init_user_duration * HZ;
> +	if (data->supports_power_mode) {
> +		data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_CONTINUOUS;
> +
> +		if (data->power_mode == SGPC3_POWER_MODE_ULTRA_LOW_POWER) {
No need for {} around single statements.
> +			data->measure_interval_jiffies =
> +				SGPC3_MEASURE_ULP_INTERVAL_HZ * HZ;
> +		} else {
> +			data->measure_interval_jiffies =
> +				SGPC3_MEASURE_INTERVAL_HZ * HZ;
> +		}
> +
> +		if (data->user_baseline[0])
> +			skip_cycles = 1;
> +		else if (data->power_mode == SGPC3_POWER_MODE_ULTRA_LOW_POWER)
> +			skip_cycles = 2;
> +		else
> +			skip_cycles = 11;
> +	} else {
> +		switch (data->iaq_init_user_duration) {
> +		case 0:
> +			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_0;
> +			skip_cycles = 1;
> +			break;
> +		case 16:
> +			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_16;
> +			skip_cycles = 9;
> +			break;
> +		case 64:
> +			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_64;
> +			skip_cycles = 33;
> +			break;
> +		case 184:
> +			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_184;
> +			skip_cycles = 93;
> +			break;
> +		default:
> +			data->iaq_init_cmd = SGPC3_CMD_IAQ_INIT_64;
> +			skip_cycles = 33;
> +			break;
> +		}
> +		if (!data->user_baseline[0])
> +			skip_cycles += 10;
> +	}
> +	data->iaq_defval_skip_jiffies = data->iaq_init_duration_jiffies +
> +					(skip_cycles *
> +					 data->measure_interval_jiffies);
> +}
> +
> +static ssize_t sgp_iaq_preheat_store(struct device *dev,
> +				     __attribute__((unused))
> +				     struct device_attribute *attr,
> +				     const char *buf, size_t count)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +	u32 init_duration;
> +	int ret;
> +
> +	if (SGP_VERS_PRODUCT(data) != SGPC3)
> +		return -EINVAL;
> +
> +	mutex_lock(&data->data_lock);
> +	ret = kstrtou32(buf, 10, &init_duration);
> +	if (ret) {
> +		ret = -EINVAL;
> +		goto unlock_fail;
> +	}
> +
> +	if (data->supports_power_mode) {
> +		if (init_duration > 300) {
> +			ret = -EINVAL;
> +			goto unlock_fail;
> +		}
> +	} else {
> +		if (init_duration == 0) {
Given the wide and non obvious jumps, I'd have
an available attribute to let userspace know which values will work.

> +		} else if (init_duration <= 16) {
> +			init_duration = 16;
> +		} else if (init_duration <= 64) {
> +			init_duration = 64;
> +		} else if (init_duration <= 184) {
> +			init_duration = 184;
> +		} else {
> +			ret = -EINVAL;
> +			goto unlock_fail;
> +		}
> +	}
> +
> +	data->iaq_init_user_duration = init_duration;
> +	mutex_unlock(&data->data_lock);
> +
> +	return count;
> +
> +unlock_fail:
> +	mutex_unlock(&data->data_lock);
> +	return ret;
> +}
> +
> +static ssize_t sgp_power_mode_store(struct device *dev,
> +				    __attribute__((unused))
> +				    struct device_attribute *attr,
> +				    const char *buf, size_t count)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +	u16 power_mode;
> +	int ret;
> +	const char ULTRA_LOW[] = "ultra-low";
> +	const char LOW[] = "low";
> +
> +	if (!data->supports_power_mode)
> +		return -EINVAL;
> +
> +	if (strncmp(buf, ULTRA_LOW, sizeof(ULTRA_LOW) - 1) == 0)
> +		power_mode = SGPC3_POWER_MODE_ULTRA_LOW_POWER;
> +	else if (strncmp(buf, LOW, sizeof(LOW) - 1) == 0)
> +		power_mode = SGPC3_POWER_MODE_LOW_POWER;
> +	else
> +		return -EINVAL;
> +
> +	mutex_lock(&data->data_lock);
> +	if (power_mode == data->power_mode) {
> +		mutex_unlock(&data->data_lock);
> +		return count;
> +	}
> +
> +	/* Switching power mode invalidates any set baselines */
> +	if (power_mode != data->power_mode) {
> +		data->user_baseline[0] = 0;
> +		data->user_baseline[1] = 0;
> +	}
> +
> +	ret = sgp_write_cmd(data, SGPC3_CMD_SET_POWER_MODE,
> +			    &power_mode, 1, SGP_CMD_DURATION_US);
> +	if (ret) {
> +		mutex_unlock(&data->data_lock);
> +		return ret;
> +	}
> +	data->power_mode = power_mode;
> +	mutex_unlock(&data->data_lock);
> +
> +	sgp_restart_iaq_thread(data);
> +	return count;
> +}
> +
> +static int sgp_get_baseline(struct sgp_data *data, u16 *baseline_words)
> +{
> +	u16 baseline_word;
> +	int ret, ix, jx;
> +
> +	mutex_lock(&data->data_lock);
> +	ret = sgp_read_cmd(data, SGP_CMD_GET_BASELINE, &data->buffer,
> +			   data->baseline_len, SGP_CMD_DURATION_US);
> +	if (ret < 0)
> +		goto unlock_fail;
> +
> +	for (ix = 0, jx = data->baseline_len - 1; ix < data->baseline_len;
> +	     ix++, jx--) {
> +		baseline_word = be16_to_cpu(data->buffer.raw_words[jx].value);
> +		baseline_words[ix] = baseline_word;
> +	}
> +
> +	if (!baseline_words[0])
> +		ret = -EBUSY;
> +
> +unlock_fail:
> +	mutex_unlock(&data->data_lock);
> +	return ret;
> +}
> +
> +static ssize_t sgp_iaq_baseline_show(struct device *dev,
> +				     __attribute__((unused))
> +				     struct device_attribute *attr,
> +				     char *buf)
> +{
> +	int ret;
> +	u16 baseline_words[2];
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +
> +	ret = sgp_get_baseline(data, baseline_words);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (data->baseline_len == 1)
> +		return sprintf(buf, "%04hx\n", baseline_words[0]);
> +
> +	return sprintf(buf, "%04hx%04hx\n", baseline_words[0],
> +		       baseline_words[1]);
> +}
> +
> +/**
> + * Retrieve the sensor's baseline and report whether it's valid for persistence
> + * @baseline:   sensor's current baseline
> + *
> + * Return:      1 if the baseline was set manually or the sensor has been
> + *              operating for at least 12h, -EBUSY if the baseline is not yet
> + *              valid, another negative error otherwise.
> + */
> +static int sgp_is_baseline_valid(struct sgp_data *data, u16 *baseline_words)
> +{
> +	int ret;
> +
> +	ret = sgp_get_baseline(data, baseline_words);
> +	if (ret < 0)
> +		return ret;
> +
> +	mutex_lock(&data->data_lock);
> +	ret = (data->user_baseline[0] ||
> +	       time_after(jiffies,
> +			  data->iaq_init_start_jiffies + 60 * 60 * 12 * HZ));
> +	mutex_unlock(&data->data_lock);
> +
> +	return ret;
> +}
> +
> +static void sgp_set_baseline(struct sgp_data *data, u16 *baseline_words)
> +{
> +	mutex_lock(&data->data_lock);
> +	data->user_baseline[0] = baseline_words[0];
> +	if (data->baseline_len == 2)
> +		data->user_baseline[1] = baseline_words[1];
> +	mutex_unlock(&data->data_lock);
> +	sgp_restart_iaq_thread(data);
> +}
> +
> +static ssize_t sgp_iaq_baseline_store(struct device *dev,
> +				      __attribute__((unused))
> +				      struct device_attribute *attr,
> +				      const char *buf, size_t count)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +	int newline = (count > 0 && buf[count - 1] == '\n');
> +	u16 words[2];
> +	int ret = 0;
> +
> +	if (count - newline == (data->baseline_len * 4)) {
> +		if (data->baseline_len == 1)
> +			ret = sscanf(buf, "%04hx", &words[0]);
> +		else
> +			ret = sscanf(buf, "%04hx%04hx", &words[0], &words[1]);
> +	}
> +
> +	if (ret != data->baseline_len) {
> +		dev_err(&data->client->dev, "invalid baseline format\n");
> +		return -EINVAL;
> +	}
> +
> +	sgp_set_baseline(data, words);
blank line before all simple returns at the end of functions.

> +	return count;
> +}
> +
> +static ssize_t sgp_selftest_show(struct device *dev,
> +				 __attribute__((unused))
> +				 struct device_attribute *attr,
> +				 char *buf)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +	u16 baseline_words[2];
> +	u16 measure_test;
> +	int baseline_valid = 0;
> +	int ret;
> +
> +	if (data->product_id == SGP30) {
> +		/* On the SGP30, the self-test interferes with iaq_init */
> +		baseline_valid = sgp_is_baseline_valid(data, baseline_words);
> +	}
> +
> +	mutex_lock(&data->data_lock);
> +	ret = sgp_read_cmd(data, SGP_CMD_SELFTEST, &data->buffer,
> +			   1, SGP_SELFTEST_DURATION_US);
> +	if (ret < 0) {
> +		mutex_unlock(&data->data_lock);
> +		return ret;
> +	}
> +
> +	measure_test = be16_to_cpu(data->buffer.raw_words[0].value);
> +	mutex_unlock(&data->data_lock);
> +
> +	if (data->product_id == SGP30) {
> +		if (baseline_valid > 0)
> +			sgp_set_baseline(data, baseline_words);
> +		else
> +			sgp_restart_iaq_thread(data);
> +	}
> +
> +	return sprintf(buf, "%s\n",
> +		       measure_test ^ SGP_SELFTEST_OK ? "FAILED" : "OK");
> +}
> +
> +static ssize_t sgp_serial_id_show(struct device *dev,
> +				 __attribute__((unused))
> +				 struct device_attribute *attr,
> +				 char *buf)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +
> +	return sprintf(buf, "%llu\n", data->serial_id);
> +}
> +
> +static ssize_t sgp_feature_set_version_show(struct device *dev,
> +					    __attribute__((unused))
> +					    struct device_attribute *attr,
> +					    char *buf)
> +{
> +	struct sgp_data *data = iio_priv(dev_to_iio_dev(dev));
> +
> +	return sprintf(buf, "%hu.%hu\n", (data->feature_set & 0x00e0) >> 5,
> +		       data->feature_set & 0x001f);
> +}
> +
> +static int sgp_get_serial_id(struct sgp_data *data)
> +{
> +	int ret;
> +	struct sgp_crc_word *words;
> +
> +	mutex_lock(&data->data_lock);
> +	ret = sgp_read_cmd(data, SGP_CMD_GET_SERIAL_ID, &data->buffer, 3,
> +			   SGP_CMD_DURATION_US);
> +	if (ret < 0)
> +		goto unlock_fail;
> +
> +	words = data->buffer.raw_words;
> +	data->serial_id = (u64)(be16_to_cpu(words[2].value) & 0xffff)       |
> +			  (u64)(be16_to_cpu(words[1].value) & 0xffff) << 16 |
> +			  (u64)(be16_to_cpu(words[0].value) & 0xffff) << 32;
> +
> +unlock_fail:
> +	mutex_unlock(&data->data_lock);
blank line here.

> +	return ret;
> +}
> +
> +static int sgp_check_compat(struct sgp_data *data,
> +			    unsigned int product_id)
> +{
> +	struct sgp_version *supported_versions;
> +	u16 ix, num_fs;
> +	u16 product = SGP_VERS_PRODUCT(data);
> +	u16 reserved = SGP_VERS_RESERVED(data);
> +	u16 major = SGP_VERS_MAJOR(data);
> +	u16 minor = SGP_VERS_MINOR(data);
> +
> +	/* driver does not match product */
> +	if (product != product_id) {
> +		dev_err(&data->client->dev,
> +			"sensor reports a different product: 0x%04hx\n",
> +			product);
> +		return -ENODEV;
> +	}
> +	if (reserved) {
Would be easier if reserved was set locally or local variable not used.
> +		dev_warn(&data->client->dev, "reserved bits set: 0x%04hx\n",
> +			 reserved);
No need for {} as only one statement.

> +	}
> +	/* engineering samples are not supported: no interface guarantees */
> +	if (SGP_VERS_ENG_BIT(data))
> +		return -ENODEV;
> +
> +	switch (product) {
> +	case SGP30:
> +		supported_versions =
> +			(struct sgp_version *)supported_versions_sgp30;
Shouldn't need to cast away the const - it's saying the pointer is to
constant data not that it is a constant pointer.

> +		num_fs = ARRAY_SIZE(supported_versions_sgp30);
> +		break;
> +	case SGPC3:
> +		supported_versions =
> +			(struct sgp_version *)supported_versions_sgpc3;
> +		num_fs = ARRAY_SIZE(supported_versions_sgpc3);
> +		break;
> +	default:
> +		return -ENODEV;
> +	}
> +
Again, easier to follow if you either set major and minor here, or
you don't use local variables.

> +	for (ix = 0; ix < num_fs; ix++) {
> +		if (major == supported_versions[ix].major &&
> +		    minor >= supported_versions[ix].minor)
> +			return 0;
> +	}
> +	dev_err(&data->client->dev, "unsupported sgp version: %d.%d\n",
> +		major, minor);
blank line here.
> +	return -ENODEV;
> +}
> +
> +static void sgp_init(struct sgp_data *data)
> +{
> +	data->iaq_init_cmd = SGP_CMD_IAQ_INIT;
> +	data->iaq_init_start_jiffies = 0;
> +	data->iaq_init_duration_jiffies = 0;
> +	data->iaq_init_user_duration = 0;
> +	data->iaq_init_fn = NULL;
> +	data->iaq_buffer_state = IAQ_BUFFER_EMPTY;
> +	data->user_baseline[0] = 0;
> +	data->user_baseline[1] = 0;
> +	switch (SGP_VERS_PRODUCT(data)) {
> +	case SGP30:
> +		data->measure_interval_jiffies = SGP30_MEASURE_INTERVAL_HZ * HZ;
> +		data->measure_iaq_cmd = SGP_CMD_IAQ_MEASURE;
> +		data->measure_gas_signals_cmd = SGP30_CMD_MEASURE_SIGNAL;
> +		data->product_id = SGP30;
> +		data->baseline_len = 2;
> +		data->supports_humidity_compensation = true;
> +		data->iaq_defval_skip_jiffies = 15 * HZ;
> +		break;
> +	case SGPC3:
> +		data->measure_interval_jiffies = SGPC3_MEASURE_INTERVAL_HZ * HZ;
> +		data->measure_iaq_cmd = SGPC3_CMD_MEASURE_RAW;
> +		data->measure_gas_signals_cmd = SGPC3_CMD_MEASURE_RAW;
> +		data->product_id = SGPC3;
> +		data->baseline_len = 1;
> +		data->power_mode = SGPC3_POWER_MODE_LOW_POWER;
> +		data->iaq_init_user_duration =
> +			SGPC3_DEFAULT_IAQ_INIT_DURATION_HZ;
> +		if (SGP_VERS_MAJOR(data) == 0 && SGP_VERS_MINOR(data) >= 6) {
> +			data->supports_humidity_compensation = true;
> +			data->supports_power_mode = true;
> +		}
> +		data->iaq_init_fn = &sgpc3_iaq_init;
> +		break;
default in here to suppress build warnings. 
> +	};
> +	data->iaq_thread = kthread_run(sgp_iaq_threadfn, data,
> +				       "%s-iaq", data->client->name);
> +}
> +
> +static IIO_DEVICE_ATTR(in_serial_id, 0444, sgp_serial_id_show, NULL, 0);
> +static IIO_DEVICE_ATTR(in_feature_set_version, 0444,
> +		       sgp_feature_set_version_show, NULL, 0);
> +static IIO_DEVICE_ATTR(in_selftest, 0444, sgp_selftest_show, NULL, 0);
> +static IIO_DEVICE_ATTR(set_iaq_preheat_seconds, 0220, NULL,
> +		       sgp_iaq_preheat_store, 0);
> +static IIO_DEVICE_ATTR(in_iaq_baseline, 0444, sgp_iaq_baseline_show, NULL, 0);
> +static IIO_DEVICE_ATTR(set_iaq_baseline, 0220, NULL, sgp_iaq_baseline_store, 0);
> +static IIO_DEVICE_ATTR(set_absolute_humidity, 0220, NULL,
> +		       sgp_absolute_humidity_store, 0);
> +static IIO_DEVICE_ATTR(set_power_mode, 0220, NULL,
> +		       sgp_power_mode_store, 0);
> +
> +static struct attribute *sgp_attributes[] = {
> +	&iio_dev_attr_in_serial_id.dev_attr.attr,
> +	&iio_dev_attr_in_feature_set_version.dev_attr.attr,
> +	&iio_dev_attr_in_selftest.dev_attr.attr,
> +	&iio_dev_attr_in_iaq_baseline.dev_attr.attr,
> +	&iio_dev_attr_set_iaq_baseline.dev_attr.attr,
> +	&iio_dev_attr_set_iaq_preheat_seconds.dev_attr.attr,
> +	&iio_dev_attr_set_absolute_humidity.dev_attr.attr,
> +	&iio_dev_attr_set_power_mode.dev_attr.attr,
I commented on most of these above so will ignore here.
Key think is if it isn't in the core IIO ABI most userspace
code will have no idea what to do with it.

> +	NULL
> +};
> +
> +static const struct attribute_group sgp_attr_group = {
> +	.attrs = sgp_attributes,
> +};
> +
> +static const struct iio_info sgp_info = {
> +	.attrs		= &sgp_attr_group,
> +	.read_raw	= sgp_read_raw,
> +};
> +
> +static const struct of_device_id sgp_dt_ids[] = {
> +	{ .compatible = "sensirion,sgp30", .data = (void *)SGP30 },
> +	{ .compatible = "sensirion,sgpc3", .data = (void *)SGPC3 },
> +	{ }
> +};
> +
> +static int sgp_probe(struct i2c_client *client,
> +		     const struct i2c_device_id *id)
> +{
> +	struct iio_dev *indio_dev;
> +	struct sgp_data *data;
> +	const struct sgp_device *chip;
> +	const struct of_device_id *of_id;
> +	unsigned long product_id;
> +	int ret;
> +
> +	indio_dev = devm_iio_device_alloc(&client->dev, sizeof(*data));
> +	if (!indio_dev)
> +		return -ENOMEM;
> +
> +	of_id = of_match_device(sgp_dt_ids, &client->dev);
> +	if (of_id)
> +		product_id = (unsigned long)of_id->data;
> +	else
> +		product_id = id->driver_data;
> +
> +	chip = &sgp_devices[product_id];
> +	data = iio_priv(indio_dev);
> +	i2c_set_clientdata(client, indio_dev);
> +	data->client = client;
> +	crc8_populate_msb(sgp_crc8_table, SGP_CRC8_POLYNOMIAL);
> +	mutex_init(&data->data_lock);
> +
> +	/* get serial id and write it to client data */
> +	ret = sgp_get_serial_id(data);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* get feature set version and write it to client data */
> +	ret = sgp_read_cmd(data, SGP_CMD_GET_FEATURE_SET, &data->buffer, 1,
> +			   SGP_CMD_DURATION_US);
> +	if (ret < 0)
> +		return ret;
> +
> +	data->feature_set = be16_to_cpu(data->buffer.raw_words[0].value);
> +
> +	ret = sgp_check_compat(data, product_id);
> +	if (ret)
> +		return ret;
> +
> +	indio_dev->dev.parent = &client->dev;
> +	indio_dev->info = &sgp_info;
> +	indio_dev->name = id->name;
> +	indio_dev->modes = INDIO_DIRECT_MODE;
> +
> +	indio_dev->channels = chip->channels;
> +	indio_dev->num_channels = chip->num_channels;
> +
> +	ret = devm_iio_device_register(&client->dev, indio_dev);
> +	if (ret) {
> +		dev_err(&client->dev, "failed to register iio device\n");
> +		return ret;
> +	}
> +
> +	sgp_init(data);
Blank line here.
> +	return ret;
> +}
> +
> +static int sgp_remove(struct i2c_client *client)
> +{
> +	struct iio_dev *indio_dev = i2c_get_clientdata(client);
> +	struct sgp_data *data = iio_priv(indio_dev);
> +
> +	if (data->iaq_thread)
> +		kthread_stop(data->iaq_thread);
I would introduce a sg_exit to make it clear that this
is reversing things that were in sgp_init.

> +	return 0;
> +}
> +
> +static const struct i2c_device_id sgp_id[] = {
> +	{ "sgp30", SGP30 },
> +	{ "sgpc3", SGPC3 },
> +	{ }
> +};
> +
> +MODULE_DEVICE_TABLE(i2c, sgp_id);
> +MODULE_DEVICE_TABLE(of, sgp_dt_ids);
> +
> +static struct i2c_driver sgp_driver = {
> +	.driver = {
> +		.name	= "sgp30",
> +		.of_match_table = of_match_ptr(sgp_dt_ids),
> +	},
> +	.probe = sgp_probe,
> +	.remove = sgp_remove,
> +	.id_table = sgp_id,
> +};
> +module_i2c_driver(sgp_driver);
> +
> +MODULE_AUTHOR("Andreas Brauchli <andreas.brauchli@sensirion.com>");
> +MODULE_AUTHOR("Pascal Sachs <pascal.sachs@sensirion.com>");
> +MODULE_DESCRIPTION("Sensirion SGP gas sensors");
> +MODULE_LICENSE("GPL v2");
> +MODULE_VERSION("0.6.0");

Please drop module version.  Given userspace is not obliged to read this
anyway an change to ABI that this is protecting against is breakage
of userspace ABI and hence not acceptable.  Result of this is that
having a module version exported is entirely useless.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V4] thermal: Add cooling device's statistics in sysfs
From: Viresh Kumar @ 2018-03-12  4:32 UTC (permalink / raw)
  To: Zhang Rui, Eduardo Valentin
  Cc: Vincent Guittot, linux-doc, linux-kernel, linux-pm
In-Reply-To: <6e557b90950723065d49c9a545146ce31d7dc6e1.1516096153.git.viresh.kumar@linaro.org>

On 16-01-18, 15:22, Viresh Kumar wrote:
> This extends the sysfs interface for thermal cooling devices and exposes
> some pretty useful statistics. These statistics have proven to be quite
> useful specially while doing benchmarks related to the task scheduler,
> where we want to make sure that nothing has disrupted the test,
> specially the cooling device which may have put constraints on the CPUs.
> The information exposed here tells us to what extent the CPUs were
> constrained by the thermal framework.

@Eduardo/Zhang: Are you going to merge this for 4.17 ?

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-03-12 14:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin
In-Reply-To: <20180310131647.GB4043@hirez.programming.kicks-ass.net>

On 03/10/2018 08:16 AM, Peter Zijlstra wrote:
> On Fri, Mar 09, 2018 at 06:06:29PM -0500, Waiman Long wrote:
>> So you are talking about sched_relax_domain_level and
> That one I wouldn't be sad to see the back of.
>
>> sched_load_balance.
> This one, that's critical. And this is the perfect time to try and fix
> the whole isolcpus issue.
>
> The primary issue is that to make equivalent functionality available
> through cpuset, we need to basically start all tasks outside the root
> group.
>
> The equivalent of isolcpus=xxx is a cgroup setup like:
>
>         root
> 	/  \
>   system    other
>
> Where other has the @xxx cpus and system the remainder and
> root.sched_load_balance = 0.

I saw in the kernel-parameters.txt file that the isolcpus option was
deprecated - use cpusets instead. However, there doesn't seem to have
document on the right way to do it. Of course, we can achieve similar
results with what you have outlined above, but the process is more
complex than just adding another boot command line argument with
isolcpus. So I doubt isolcpus will die anytime soon unless we can make
the alternative as easy to use.

> Back before cgroups (and the new workqueue stuff), we could've started
> everything in the !root group, no worry. But now that doesn't work,
> because a bunch of controllers can't deal with that and everything
> cgroup expects the cgroupfs to be empty on boot.

AFAIK, all the processes belong to the root cgroup on boot. And the root
cgroup is usually special that the controller may not exert any control
for processes in the root cgroup. Many controllers become active for
processes in the child cgroups only. Would you mind elaborating what
doesn't quite work currently?

 
> It's one of my biggest regrets that I didn't 'fix' this before cgroups
> came along.
>
>> I have not removed any bits. I just haven't exposed
>> them yet. It does seem like these 2 control knobs are useful from the
>> scheduling perspective. Do we also need cpu_exclusive or just the two
>> sched control knobs are enough?
> I always forget if we need exclusive for load_balance to work; I'll
> peruse the document/code.

I think the cpu_exclusive feature can be useful to enforce that CPUs
allocated to the "other" isolated cgroup cannot be used by the processes
under the "system" parent.

I know that there are special code to handle the isolcpus option. How
about changing it to create a exclusive cpuset automatically instead.
Applications that need to run in those isolated CPUs can then use the
standard cgroup process to be moved into the isolated cgroup. For example,

isolcpus=<cpuset-name>,<cpu-id-list>

or

isolcpuset=<cpuset-name>[,cpu:<cpu-id-list>][,mem:<memory-node-list>]

We can then retire the old usage and encourage users to use the cgroup
API to manage it.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
From: Mike Galbraith @ 2018-03-12 15:21 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin
In-Reply-To: <12ef7f60-5b14-8e09-2478-1453a3071f21@redhat.com>

On Mon, 2018-03-12 at 10:20 -0400, Waiman Long wrote:
> On 03/10/2018 08:16 AM, Peter Zijlstra wrote:
> 
> > The equivalent of isolcpus=xxx is a cgroup setup like:
> >
> >         root
> > 	/  \
> >   system    other
> >
> > Where other has the @xxx cpus and system the remainder and
> > root.sched_load_balance = 0.
> 
> I saw in the kernel-parameters.txt file that the isolcpus option was
> deprecated - use cpusets instead. However, there doesn't seem to have
> document on the right way to do it.

I use cset shield (cpuset package) in a script to create a set and
migrate everything that's permitted into the system set.

setup:
cset shield --userset=rtcpus --cpu=4-63 --kthread=on
<poke this/that>

teardown:
cset shield --userset=rtcpus --reset
<un-poke this/that>

Non-sexy, but works for simple stuff.

	-Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] Doc: misc-devices: move lcd-panel-cgram.txt to auxdisplay/
From: Miguel Ojeda @ 2018-03-13  0:07 UTC (permalink / raw)
  To: w, corbet, linux-doc, linux-kernel

Commit 7005b58458e4beecaf5efacb872c456bc7d3541a ("Staging: add lcd-panel
driver") introduced the panel driver, which is now in
drivers/auxdisplay.

Cc: Willy Tarreau <w@1wt.eu>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
---
 Documentation/{misc-devices => auxdisplay}/lcd-panel-cgram.txt | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/{misc-devices => auxdisplay}/lcd-panel-cgram.txt (100%)

diff --git a/Documentation/misc-devices/lcd-panel-cgram.txt b/Documentation/auxdisplay/lcd-panel-cgram.txt
similarity index 100%
rename from Documentation/misc-devices/lcd-panel-cgram.txt
rename to Documentation/auxdisplay/lcd-panel-cgram.txt
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V4] thermal: Add cooling device's statistics in sysfs
From: Zhang Rui @ 2018-03-13  7:02 UTC (permalink / raw)
  To: Viresh Kumar, Eduardo Valentin
  Cc: Vincent Guittot, linux-doc, linux-kernel, linux-pm
In-Reply-To: <6e557b90950723065d49c9a545146ce31d7dc6e1.1516096153.git.viresh.kumar@linaro.org>

Hi, Viresh,

I will queue it for 4.17, with just one minor fix below.

On 二, 2018-01-16 at 15:22 +0530, Viresh Kumar wrote:
> This extends the sysfs interface for thermal cooling devices and
> exposes
> some pretty useful statistics. These statistics have proven to be
> quite
> useful specially while doing benchmarks related to the task
> scheduler,
> where we want to make sure that nothing has disrupted the test,
> specially the cooling device which may have put constraints on the
> CPUs.
> The information exposed here tells us to what extent the CPUs were
> constrained by the thermal framework.
> 
> The write-only "reset" file is used to reset the statistics.
> 
> The read-only "time_in_state" file shows the clock_t time spent by
> the
> device in the respective cooling states, and it prints one line per
> cooling state.
> 
> The read-only "total_trans" file shows single positive integer value
> showing the total number of cooling state transitions the device has
> gone through since the time the cooling device is registered or the
> time
> when statistics were reset last.
> 
> The read-only "trans_table" file shows a two dimensional matrix,
> where
> an entry <i,j> (row i, column j) represents the number of transitions
> from State_i to State_j.
> 
> This is how the directory structure looks like for a single cooling
> device:
> 
> $ ls -R /sys/class/thermal/cooling_device0/
> /sys/class/thermal/cooling_device0/:
> cur_state  max_state  power  stats  subsystem  type  uevent
> 
> /sys/class/thermal/cooling_device0/power:
> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> control               runtime_status
> 
> /sys/class/thermal/cooling_device0/stats:
> reset  time_in_state  total_trans  trans_table
> 
> This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu
> and
> ARM 64-bit Hisilicon hikey960 board running Android.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

[snip]

> +static void cooling_device_stats_setup(struct thermal_cooling_device
> *cdev)
> +{
> +	struct cooling_dev_stats *stats;
> +	unsigned long states;
> +	int var;
> +
> +	if (cdev->ops->get_max_state(cdev, &states))
> +		return;
> +
> +	states++; /* Total number of states is highest state + 1 */
> +
> +	var = sizeof(*stats);
> +	var += sizeof(*stats->time_in_state) * states;
> +	var += sizeof(*stats->trans_table) * states * states;
> +
> +	stats = kzalloc(var, GFP_KERNEL);
> +	if (!stats)
> +		return;
> +
> +	stats->time_in_state = (ktime_t *)(stats + 1);
> +	stats->trans_table = (unsigned int *)(stats->time_in_state +
> states);
> +	cdev->stats = stats;
> +	stats->last_time = ktime_get();
> +	stats->max_states = states;
> +	cdev->stats = stats;
> +

cdev->stats is set twice here, I will remove the first one.

thanks,
rui
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V4] thermal: Add cooling device's statistics in sysfs
From: Viresh Kumar @ 2018-03-13  8:36 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Eduardo Valentin, Vincent Guittot, linux-doc, linux-kernel,
	linux-pm
In-Reply-To: <1520924571.3766.8.camel@intel.com>

On 13-03-18, 15:02, Zhang Rui wrote:
> Hi, Viresh,
> 
> I will queue it for 4.17, with just one minor fix below.
> 
> On 二, 2018-01-16 at 15:22 +0530, Viresh Kumar wrote:

> > +	cdev->stats = stats;
> > +	stats->last_time = ktime_get();
> > +	stats->max_states = states;
> > +	cdev->stats = stats;
> > +
> 
> cdev->stats is set twice here, I will remove the first one.

Thanks for catching that :)

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
From: Stef van Os @ 2018-03-13  9:32 UTC (permalink / raw)
  To: Jae Hyun Yoo, joel, andrew, arnd, gregkh, jdelvare, linux, benh,
	andrew
  Cc: linux-hwmon, devicetree, linux-doc, openbmc, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20180221161606.32247-8-jae.hyun.yoo@linux.intel.com>

Hi Jae,

I tried version 1 and 2 of your PECI patch on our (AST2500 / Xeon E5 v4) 
system. The V1 patchset works as expected (reading back temperature 0 
until PECI is up), but the hwmon driver probe fails with version 2. It 
communicates with the Xeon and assumes during kernel boot of the Aspeed 
that PECI to the Xeon's is already up and running, but our system 
enables the main Xeon supplies from AST2500 userspace.

If I load the hwmon driver as a module to load later on, the driver does 
not call probe like e.g. a I2C driver on the I2C bus does. Am I using V2 
wrongly?

BR,
Stef

On 02/21/2018 05:16 PM, Jae Hyun Yoo wrote:
> This commit adds a generic PECI hwmon client driver implementation.
> 
> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> ---
>   drivers/hwmon/Kconfig      |  10 +
>   drivers/hwmon/Makefile     |   1 +
>   drivers/hwmon/peci-hwmon.c | 928 +++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 939 insertions(+)
>   create mode 100644 drivers/hwmon/peci-hwmon.c
> 
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index ef23553ff5cb..f22e0c31f597 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
>   	  This driver can also be built as a module.  If so, the module
>   	  will be called nct7904.
>   
> +config SENSORS_PECI_HWMON
> +	tristate "PECI hwmon support"
> +	depends on PECI
> +	help
> +	  If you say yes here you get support for the generic PECI hwmon
> +	  driver.
> +
> +	  This driver can also be built as a module.  If so, the module
> +	  will be called peci-hwmon.
> +
>   config SENSORS_NSA320
>   	tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
>   	depends on GPIOLIB && OF
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index f814b4ace138..946f54b168e5 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)	+= nct7802.o
>   obj-$(CONFIG_SENSORS_NCT7904)	+= nct7904.o
>   obj-$(CONFIG_SENSORS_NSA320)	+= nsa320-hwmon.o
>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)	+= ntc_thermistor.o
> +obj-$(CONFIG_SENSORS_PECI_HWMON)	+= peci-hwmon.o
>   obj-$(CONFIG_SENSORS_PC87360)	+= pc87360.o
>   obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
>   obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
> diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
> new file mode 100644
> index 000000000000..edd27744adcb
> --- /dev/null
> +++ b/drivers/hwmon/peci-hwmon.c
> @@ -0,0 +1,928 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2018 Intel Corporation
> +
> +#include <linux/delay.h>
> +#include <linux/hwmon.h>
> +#include <linux/hwmon-sysfs.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/of_device.h>
> +#include <linux/peci.h>
> +#include <linux/workqueue.h>
> +
> +#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks x 2) */
> +#define CORE_NUMS_MAX         28  /* Max core numbers (max on SKX Platinum) */
> +#define TEMP_TYPE_PECI        6   /* Sensor type 6: Intel PECI */
> +
> +#define CORE_TEMP_ATTRS       5
> +#define DIMM_TEMP_ATTRS       2
> +#define ATTR_NAME_LEN         24
> +
> +#define DEFAULT_ATTR_GRP_NUMS 5
> +
> +#define UPDATE_INTERVAL_MIN   HZ
> +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
> +
> +enum sign {
> +	POS,
> +	NEG
> +};
> +
> +struct temp_data {
> +	bool valid;
> +	s32  value;
> +	unsigned long last_updated;
> +};
> +
> +struct temp_group {
> +	struct temp_data tjmax;
> +	struct temp_data tcontrol;
> +	struct temp_data tthrottle;
> +	struct temp_data dts_margin;
> +	struct temp_data die;
> +	struct temp_data core[CORE_NUMS_MAX];
> +	struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
> +};
> +
> +struct core_temp_group {
> +	struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
> +	char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
> +	struct attribute *attrs[CORE_TEMP_ATTRS + 1];
> +	struct attribute_group attr_group;
> +};
> +
> +struct dimm_temp_group {
> +	struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
> +	char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
> +	struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
> +	struct attribute_group attr_group;
> +};
> +
> +struct peci_hwmon {
> +	struct peci_client *client;
> +	struct device *dev;
> +	struct device *hwmon_dev;
> +	struct workqueue_struct *work_queue;
> +	struct delayed_work work_handler;
> +	char name[PECI_NAME_SIZE];
> +	struct temp_group temp;
> +	u8 addr;
> +	uint cpu_no;
> +	u32 core_mask;
> +	u32 dimm_mask;
> +	const struct attribute_group *core_attr_groups[CORE_NUMS_MAX + 1];
> +	const struct attribute_group *dimm_attr_groups[DIMM_SLOT_NUMS_MAX + 1];
> +	uint global_idx;
> +	uint core_idx;
> +	uint dimm_idx;
> +};
> +
> +enum label {
> +	L_DIE,
> +	L_DTS,
> +	L_TCONTROL,
> +	L_TTHROTTLE,
> +	L_TJMAX,
> +	L_MAX
> +};
> +
> +static const char *peci_label[L_MAX] = {
> +	"Die\n",
> +	"DTS margin to Tcontrol\n",
> +	"Tcontrol\n",
> +	"Tthrottle\n",
> +	"Tjmax\n",
> +};
> +
> +static int send_peci_cmd(struct peci_hwmon *priv, enum peci_cmd cmd, void *msg)
> +{
> +	return peci_command(priv->client->adapter, cmd, msg);
> +}
> +
> +static int need_update(struct temp_data *temp)
> +{
> +	if (temp->valid &&
> +	    time_before(jiffies, temp->last_updated + UPDATE_INTERVAL_MIN))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static s32 ten_dot_six_to_millidegree(s32 x)
> +{
> +	return ((((x) ^ 0x8000) - 0x8000) * 1000 / 64);
> +}
> +
> +static int get_tjmax(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int rc;
> +
> +	if (!priv->temp.tjmax.valid) {
> +		msg.addr = priv->addr;
> +		msg.index = MBX_INDEX_TEMP_TARGET;
> +		msg.param = 0;
> +		msg.rx_len = 4;
> +
> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +		if (rc < 0)
> +			return rc;
> +
> +		priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
> +		priv->temp.tjmax.valid = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_tcontrol(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 tcontrol_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.tcontrol))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_TEMP_TARGET;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	tcontrol_margin = msg.pkg_config[1];
> +	tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
> +
> +	priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> +
> +	if (!priv->temp.tcontrol.valid) {
> +		priv->temp.tcontrol.last_updated = INITIAL_JIFFIES;
> +		priv->temp.tcontrol.valid = true;
> +	} else {
> +		priv->temp.tcontrol.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_tthrottle(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 tthrottle_offset;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.tthrottle))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_TEMP_TARGET;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
> +	priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> +
> +	if (!priv->temp.tthrottle.valid) {
> +		priv->temp.tthrottle.last_updated = INITIAL_JIFFIES;
> +		priv->temp.tthrottle.valid = true;
> +	} else {
> +		priv->temp.tthrottle.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_die_temp(struct peci_hwmon *priv)
> +{
> +	struct peci_get_temp_msg msg;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.die))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->temp.die.value = priv->temp.tjmax.value +
> +			       ((s32)msg.temp_raw * 1000 / 64);
> +
> +	if (!priv->temp.die.valid) {
> +		priv->temp.die.last_updated = INITIAL_JIFFIES;
> +		priv->temp.die.valid = true;
> +	} else {
> +		priv->temp.die.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_dts_margin(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 dts_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.dts_margin))
> +		return 0;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_DTS_MARGIN;
> +	msg.param = 0;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> +
> +	/**
> +	 * Processors return a value of DTS reading in 10.6 format
> +	 * (10 bits signed decimal, 6 bits fractional).
> +	 * Error codes:
> +	 *   0x8000: General sensor error
> +	 *   0x8001: Reserved
> +	 *   0x8002: Underflow on reading value
> +	 *   0x8003-0x81ff: Reserved
> +	 */
> +	if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
> +		return -1;
> +
> +	dts_margin = ten_dot_six_to_millidegree(dts_margin);
> +
> +	priv->temp.dts_margin.value = dts_margin;
> +
> +	if (!priv->temp.dts_margin.valid) {
> +		priv->temp.dts_margin.last_updated = INITIAL_JIFFIES;
> +		priv->temp.dts_margin.valid = true;
> +	} else {
> +		priv->temp.dts_margin.last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_core_temp(struct peci_hwmon *priv, int core_index)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	s32 core_dts_margin;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.core[core_index]))
> +		return 0;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
> +	msg.param = core_index;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
> +
> +	/**
> +	 * Processors return a value of the core DTS reading in 10.6 format
> +	 * (10 bits signed decimal, 6 bits fractional).
> +	 * Error codes:
> +	 *   0x8000: General sensor error
> +	 *   0x8001: Reserved
> +	 *   0x8002: Underflow on reading value
> +	 *   0x8003-0x81ff: Reserved
> +	 */
> +	if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
> +		return -1;
> +
> +	core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
> +
> +	priv->temp.core[core_index].value = priv->temp.tjmax.value +
> +					    core_dts_margin;
> +
> +	if (!priv->temp.core[core_index].valid) {
> +		priv->temp.core[core_index].last_updated = INITIAL_JIFFIES;
> +		priv->temp.core[core_index].valid = true;
> +	} else {
> +		priv->temp.core[core_index].last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static int get_dimm_temp(struct peci_hwmon *priv, int dimm_index)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int channel = dimm_index / 2;
> +	int dimm_order = dimm_index % 2;
> +	int rc;
> +
> +	if (!need_update(&priv->temp.dimm[dimm_index]))
> +		return 0;
> +
> +	msg.addr = priv->addr;
> +	msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> +	msg.param = channel;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->temp.dimm[dimm_index].value = msg.pkg_config[dimm_order] * 1000;
> +
> +	if (!priv->temp.dimm[dimm_index].valid) {
> +		priv->temp.dimm[dimm_index].last_updated = INITIAL_JIFFIES;
> +		priv->temp.dimm[dimm_index].valid = true;
> +	} else {
> +		priv->temp.dimm[dimm_index].last_updated = jiffies;
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t show_tcontrol(struct device *dev,
> +			     struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tcontrol(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tcontrol.value);
> +}
> +
> +static ssize_t show_tcontrol_margin(struct device *dev,
> +				    struct device_attribute *attr,
> +				    char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int rc;
> +
> +	rc = get_tcontrol(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", sensor_attr->index == POS ?
> +				    priv->temp.tjmax.value -
> +				    priv->temp.tcontrol.value :
> +				    priv->temp.tcontrol.value -
> +				    priv->temp.tjmax.value);
> +}
> +
> +static ssize_t show_tthrottle(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tthrottle(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tthrottle.value);
> +}
> +
> +static ssize_t show_tjmax(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_tjmax(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.tjmax.value);
> +}
> +
> +static ssize_t show_die_temp(struct device *dev,
> +			     struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_die_temp(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.die.value);
> +}
> +
> +static ssize_t show_dts_margin(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	int rc;
> +
> +	rc = get_dts_margin(priv);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.dts_margin.value);
> +}
> +
> +static ssize_t show_core_temp(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int core_index = sensor_attr->index;
> +	int rc;
> +
> +	rc = get_core_temp(priv, core_index);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.core[core_index].value);
> +}
> +
> +static ssize_t show_dimm_temp(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(dev);
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +	int dimm_index = sensor_attr->index;
> +	int rc;
> +
> +	rc = get_dimm_temp(priv, dimm_index);
> +	if (rc < 0)
> +		return rc;
> +
> +	return sprintf(buf, "%d\n", priv->temp.dimm[dimm_index].value);
> +}
> +
> +static ssize_t show_value(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, "%d\n", sensor_attr->index);
> +}
> +
> +static ssize_t show_label(struct device *dev,
> +			  struct device_attribute *attr,
> +			  char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, peci_label[sensor_attr->index]);
> +}
> +
> +static ssize_t show_core_label(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	return sprintf(buf, "Core %d\n", sensor_attr->index);
> +}
> +
> +static ssize_t show_dimm_label(struct device *dev,
> +			       struct device_attribute *attr,
> +			       char *buf)
> +{
> +	struct sensor_device_attribute *sensor_attr = to_sensor_dev_attr(attr);
> +
> +	char channel = 'A' + (sensor_attr->index / 2);
> +	int index = sensor_attr->index % 2;
> +
> +	return sprintf(buf, "DIMM %d (%c%d)\n",
> +		       sensor_attr->index, channel, index);
> +}
> +
> +/* Die temperature */
> +static SENSOR_DEVICE_ATTR(temp1_label, 0444, show_label, NULL, L_DIE);
> +static SENSOR_DEVICE_ATTR(temp1_input, 0444, show_die_temp, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_max, 0444, show_tcontrol, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_crit, 0444, show_tjmax, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp1_crit_hyst, 0444, show_tcontrol_margin, NULL,
> +			  POS);
> +
> +static struct attribute *die_temp_attrs[] = {
> +	&sensor_dev_attr_temp1_label.dev_attr.attr,
> +	&sensor_dev_attr_temp1_input.dev_attr.attr,
> +	&sensor_dev_attr_temp1_max.dev_attr.attr,
> +	&sensor_dev_attr_temp1_crit.dev_attr.attr,
> +	&sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group die_temp_attr_group = {
> +	.attrs = die_temp_attrs,
> +};
> +
> +/* DTS margin temperature */
> +static SENSOR_DEVICE_ATTR(temp2_label, 0444, show_label, NULL, L_DTS);
> +static SENSOR_DEVICE_ATTR(temp2_input, 0444, show_dts_margin, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp2_min, 0444, show_value, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp2_lcrit, 0444, show_tcontrol_margin, NULL, NEG);
> +
> +static struct attribute *dts_margin_temp_attrs[] = {
> +	&sensor_dev_attr_temp2_label.dev_attr.attr,
> +	&sensor_dev_attr_temp2_input.dev_attr.attr,
> +	&sensor_dev_attr_temp2_min.dev_attr.attr,
> +	&sensor_dev_attr_temp2_lcrit.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group dts_margin_temp_attr_group = {
> +	.attrs = dts_margin_temp_attrs,
> +};
> +
> +/* Tcontrol temperature */
> +static SENSOR_DEVICE_ATTR(temp3_label, 0444, show_label, NULL, L_TCONTROL);
> +static SENSOR_DEVICE_ATTR(temp3_input, 0444, show_tcontrol, NULL, 0);
> +static SENSOR_DEVICE_ATTR(temp3_crit, 0444, show_tjmax, NULL, 0);
> +
> +static struct attribute *tcontrol_temp_attrs[] = {
> +	&sensor_dev_attr_temp3_label.dev_attr.attr,
> +	&sensor_dev_attr_temp3_input.dev_attr.attr,
> +	&sensor_dev_attr_temp3_crit.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tcontrol_temp_attr_group = {
> +	.attrs = tcontrol_temp_attrs,
> +};
> +
> +/* Tthrottle temperature */
> +static SENSOR_DEVICE_ATTR(temp4_label, 0444, show_label, NULL, L_TTHROTTLE);
> +static SENSOR_DEVICE_ATTR(temp4_input, 0444, show_tthrottle, NULL, 0);
> +
> +static struct attribute *tthrottle_temp_attrs[] = {
> +	&sensor_dev_attr_temp4_label.dev_attr.attr,
> +	&sensor_dev_attr_temp4_input.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tthrottle_temp_attr_group = {
> +	.attrs = tthrottle_temp_attrs,
> +};
> +
> +/* Tjmax temperature */
> +static SENSOR_DEVICE_ATTR(temp5_label, 0444, show_label, NULL, L_TJMAX);
> +static SENSOR_DEVICE_ATTR(temp5_input, 0444, show_tjmax, NULL, 0);
> +
> +static struct attribute *tjmax_temp_attrs[] = {
> +	&sensor_dev_attr_temp5_label.dev_attr.attr,
> +	&sensor_dev_attr_temp5_input.dev_attr.attr,
> +	NULL
> +};
> +
> +static struct attribute_group tjmax_temp_attr_group = {
> +	.attrs = tjmax_temp_attrs,
> +};
> +
> +static const struct attribute_group *
> +default_attr_groups[DEFAULT_ATTR_GRP_NUMS + 1] = {
> +	&die_temp_attr_group,
> +	&dts_margin_temp_attr_group,
> +	&tcontrol_temp_attr_group,
> +	&tthrottle_temp_attr_group,
> +	&tjmax_temp_attr_group,
> +	NULL
> +};
> +
> +/* Core temperature */
> +static ssize_t (*const core_show_fn[CORE_TEMP_ATTRS]) (struct device *dev,
> +		struct device_attribute *devattr, char *buf) = {
> +	show_core_label,
> +	show_core_temp,
> +	show_tcontrol,
> +	show_tjmax,
> +	show_tcontrol_margin,
> +};
> +
> +static const char *const core_suffix[CORE_TEMP_ATTRS] = {
> +	"label",
> +	"input",
> +	"max",
> +	"crit",
> +	"crit_hyst",
> +};
> +
> +static int check_resolved_cores(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pci_cfg_local_msg msg;
> +	int rc;
> +
> +	if (!(priv->client->adapter->cmd_mask & BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
> +		return -EINVAL;
> +
> +	/* Get the RESOLVED_CORES register value */
> +	msg.addr = priv->addr;
> +	msg.bus = 1;
> +	msg.device = 30;
> +	msg.function = 3;
> +	msg.reg = 0xB4;
> +	msg.rx_len = 4;
> +
> +	rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, (void *)&msg);
> +	if (rc < 0)
> +		return rc;
> +
> +	priv->core_mask = msg.pci_config[3] << 24 |
> +			  msg.pci_config[2] << 16 |
> +			  msg.pci_config[1] << 8 |
> +			  msg.pci_config[0];
> +
> +	if (!priv->core_mask)
> +		return -EAGAIN;
> +
> +	dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", priv->core_mask);
> +	return 0;
> +}
> +
> +static int create_core_temp_group(struct peci_hwmon *priv, int core_no)
> +{
> +	struct core_temp_group *data;
> +	int i;
> +
> +	data = devm_kzalloc(priv->dev, sizeof(struct core_temp_group),
> +			    GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < CORE_TEMP_ATTRS; i++) {
> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> +			 "temp%d_%s", priv->global_idx, core_suffix[i]);
> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> +		data->sd_attrs[i].dev_attr.show = core_show_fn[i];
> +		if (i == 0 || i == 1) /* label or temp */
> +			data->sd_attrs[i].index = core_no;
> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> +	}
> +
> +	data->attr_group.attrs = data->attrs;
> +	priv->core_attr_groups[priv->core_idx++] = &data->attr_group;
> +	priv->global_idx++;
> +
> +	return 0;
> +}
> +
> +static int create_core_temp_groups(struct peci_hwmon *priv)
> +{
> +	int rc, i;
> +
> +	rc = check_resolved_cores(priv);
> +	if (!rc) {
> +		for (i = 0; i < CORE_NUMS_MAX; i++) {
> +			if (priv->core_mask & BIT(i)) {
> +				rc = create_core_temp_group(priv, i);
> +				if (rc)
> +					return rc;
> +			}
> +		}
> +
> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> +					 priv->core_attr_groups);
> +	}
> +
> +	return rc;
> +}
> +
> +/* DIMM temperature */
> +static ssize_t (*const dimm_show_fn[DIMM_TEMP_ATTRS]) (struct device *dev,
> +		struct device_attribute *devattr, char *buf) = {
> +	show_dimm_label,
> +	show_dimm_temp,
> +};
> +
> +static const char *const dimm_suffix[DIMM_TEMP_ATTRS] = {
> +	"label",
> +	"input",
> +};
> +
> +static int check_populated_dimms(struct peci_hwmon *priv)
> +{
> +	struct peci_rd_pkg_cfg_msg msg;
> +	int i, rc, pass = 0;
> +
> +do_scan:
> +	for (i = 0; i < (DIMM_SLOT_NUMS_MAX / 2); i++) {
> +		msg.addr = priv->addr;
> +		msg.index = MBX_INDEX_DDR_DIMM_TEMP;
> +		msg.param = i; /* channel */
> +		msg.rx_len = 4;
> +
> +		rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, (void *)&msg);
> +		if (rc < 0)
> +			return rc;
> +
> +		if (msg.pkg_config[0]) /* DIMM #0 on the channel */
> +			priv->dimm_mask |= BIT(i);
> +
> +		if (msg.pkg_config[1]) /* DIMM #1 on the channel */
> +			priv->dimm_mask |= BIT(i + 1);
> +	}
> +
> +	/* Do 2-pass scanning */
> +	if (priv->dimm_mask && pass == 0) {
> +		pass++;
> +		goto do_scan;
> +	}
> +
> +	if (!priv->dimm_mask)
> +		return -EAGAIN;
> +
> +	dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", priv->dimm_mask);
> +	return 0;
> +}
> +
> +static int create_dimm_temp_group(struct peci_hwmon *priv, int dimm_no)
> +{
> +	struct dimm_temp_group *data;
> +	int i;
> +
> +	data = devm_kzalloc(priv->dev, sizeof(struct dimm_temp_group),
> +			    GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < DIMM_TEMP_ATTRS; i++) {
> +		snprintf(data->attr_name[i], ATTR_NAME_LEN,
> +			 "temp%d_%s", priv->global_idx, dimm_suffix[i]);
> +		sysfs_attr_init(&data->sd_attrs[i].dev_attr.attr);
> +		data->sd_attrs[i].dev_attr.attr.name = data->attr_name[i];
> +		data->sd_attrs[i].dev_attr.attr.mode = 0444;
> +		data->sd_attrs[i].dev_attr.show = dimm_show_fn[i];
> +		data->sd_attrs[i].index = dimm_no;
> +		data->attrs[i] = &data->sd_attrs[i].dev_attr.attr;
> +	}
> +
> +	data->attr_group.attrs = data->attrs;
> +	priv->dimm_attr_groups[priv->dimm_idx++] = &data->attr_group;
> +	priv->global_idx++;
> +
> +	return 0;
> +}
> +
> +static int create_dimm_temp_groups(struct peci_hwmon *priv)
> +{
> +	int rc, i;
> +
> +	rc = check_populated_dimms(priv);
> +	if (!rc) {
> +		for (i = 0; i < DIMM_SLOT_NUMS_MAX; i++) {
> +			if (priv->dimm_mask & BIT(i)) {
> +				rc = create_dimm_temp_group(priv, i);
> +				if (rc)
> +					return rc;
> +			}
> +		}
> +
> +		rc = sysfs_create_groups(&priv->hwmon_dev->kobj,
> +					 priv->dimm_attr_groups);
> +		if (!rc)
> +			dev_dbg(priv->dev, "Done DIMM temp group creation\n");
> +	} else if (rc == -EAGAIN) {
> +		queue_delayed_work(priv->work_queue, &priv->work_handler,
> +				   DIMM_MASK_CHECK_DELAY);
> +		dev_dbg(priv->dev, "Diferred DIMM temp group creation\n");
> +	}
> +
> +	return rc;
> +}
> +
> +static void create_dimm_temp_groups_delayed(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct peci_hwmon *priv = container_of(dwork, struct peci_hwmon,
> +					       work_handler);
> +	int rc;
> +
> +	rc = create_dimm_temp_groups(priv);
> +	if (rc && rc != -EAGAIN)
> +		dev_dbg(priv->dev, "Skipped to creat DIMM temp groups\n");
> +}
> +
> +static int peci_hwmon_probe(struct peci_client *client)
> +{
> +	struct device *dev = &client->dev;
> +	struct peci_hwmon *priv;
> +	int rc;
> +
> +	if ((client->adapter->cmd_mask &
> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
> +	    (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
> +		dev_err(dev, "Client doesn't support temperature monitoring\n");
> +		return -EINVAL;
> +	}
> +
> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	dev_set_drvdata(dev, priv);
> +	priv->client = client;
> +	priv->dev = dev;
> +	priv->addr = client->addr;
> +	priv->cpu_no = priv->addr - PECI_BASE_ADDR;
> +
> +	snprintf(priv->name, PECI_NAME_SIZE, "peci_hwmon.cpu%d", priv->cpu_no);
> +
> +	priv->work_queue = create_singlethread_workqueue(priv->name);
> +	if (!priv->work_queue)
> +		return -ENOMEM;
> +
> +	priv->hwmon_dev = hwmon_device_register_with_groups(priv->dev,
> +							    priv->name,
> +							    priv,
> +							   default_attr_groups);
> +
> +	rc = PTR_ERR_OR_ZERO(priv->hwmon_dev);
> +	if (rc) {
> +		dev_err(dev, "Failed to register peci hwmon\n");
> +		return rc;
> +	}
> +
> +	priv->global_idx = DEFAULT_ATTR_GRP_NUMS + 1;
> +
> +	rc = create_core_temp_groups(priv);
> +	if (rc) {
> +		dev_err(dev, "Failed to create core groups\n");
> +		return rc;
> +	}
> +
> +	INIT_DELAYED_WORK(&priv->work_handler, create_dimm_temp_groups_delayed);
> +
> +	rc = create_dimm_temp_groups(priv);
> +	if (rc && rc != -EAGAIN)
> +		dev_dbg(dev, "Skipped to creat DIMM temp groups\n");
> +
> +	dev_dbg(dev, "peci hwmon for CPU at 0x%x registered\n", priv->addr);
> +
> +	return 0;
> +}
> +
> +static int peci_hwmon_remove(struct peci_client *client)
> +{
> +	struct peci_hwmon *priv = dev_get_drvdata(&client->dev);
> +
> +	cancel_delayed_work(&priv->work_handler);
> +	destroy_workqueue(priv->work_queue);
> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->core_attr_groups);
> +	sysfs_remove_groups(&priv->hwmon_dev->kobj, priv->dimm_attr_groups);
> +	hwmon_device_unregister(priv->hwmon_dev);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id peci_of_table[] = {
> +	{ .compatible = "intel,peci-hwmon", },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(of, peci_of_table);
> +
> +static struct peci_driver peci_hwmon_driver = {
> +	.probe  = peci_hwmon_probe,
> +	.remove = peci_hwmon_remove,
> +	.driver = {
> +		.name           = "peci-hwmon",
> +		.of_match_table = of_match_ptr(peci_of_table),
> +	},
> +};
> +module_peci_driver(peci_hwmon_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> +MODULE_DESCRIPTION("PECI hwmon driver");
> +MODULE_LICENSE("GPL v2");
> 

-- 
Stef van Os
Designer
Prodrive Technologies B.V.
Mobile: +31 63 17 76 319
Phone:  +31 40 26 76 200
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [trivial PATCH] Documentation/sparse: fix typo
From: Eric Engestrom @ 2018-03-13 11:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jonathan Corbet, Jiri Kosina, linux-doc

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
---
 Documentation/dev-tools/sparse.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/sparse.rst b/Documentation/dev-tools/sparse.rst
index 78aa00a604a009071361..c401c952a340a50fa769 100644
--- a/Documentation/dev-tools/sparse.rst
+++ b/Documentation/dev-tools/sparse.rst
@@ -67,7 +67,7 @@ __releases - The specified lock is held on function entry, but not exit.
 
 If the function enters and exits without the lock held, acquiring and
 releasing the lock inside the function in a balanced way, no
-annotation is needed.  The tree annotations above are for cases where
+annotation is needed.  The three annotations above are for cases where
 sparse would otherwise report a context imbalance.
 
 Getting sparse
-- 
Cheers,
  Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 0/8] trace_uprobe: Support SDT markers having reference count (semaphore)
From: Ravi Bangoria @ 2018-03-13 12:55 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. These markers are added by developer at
important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }   

Default value of reference counter is 0. Tracer has to increment the 
reference counter before tracing on a marker and decrement it when
done with the tracing.

Currently, perf tool has limited supports for SDT markers. I.e. it
can not trace markers surrounded by reference counter. Also, it's
not easy to add reference counter logic in userspace tool like perf,
so basic idea for this patchset is to add reference counter logic in
the trace_uprobe infrastructure. Ex,[2]

  # cat tick.c
    ... 
    for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i); 
        }
        printf("hi: %d\n", i); 
        sleep(1);
    }   
    ... 

Here tick:loop1 is marker without reference counter where as tick:loop2
is surrounded by reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed


Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C  
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
[3] https://lkml.org/lkml/2017/12/6/976


Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the 
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are referring it as 'reference
counter' in kernel / perf code.

RFC series can be found at:
  https://lkml.org/lkml/2018/2/28/76

Ravi Bangoria (8):
  Uprobe: Export vaddr <-> offset conversion functions
  mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
  Uprobe: Rename map_info to uprobe_map_info
  Uprobe: Export uprobe_map_info along with
    uprobe_{build/free}_map_info()
  trace_uprobe: Support SDT markers having reference count (semaphore)
  trace_uprobe/sdt: Fix multiple update of same reference counter
  perf probe: Support SDT markers having reference counter (semaphore)
  trace_uprobe/sdt: Document about reference counter

 Documentation/trace/uprobetracer.txt |  16 +-
 include/linux/mm.h                   |  12 ++
 include/linux/uprobes.h              |  11 ++
 kernel/events/uprobes.c              |  62 ++++----
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          | 273 ++++++++++++++++++++++++++++++++++-
 tools/perf/util/probe-event.c        |  21 ++-
 tools/perf/util/probe-event.h        |   1 +
 tools/perf/util/probe-file.c         |  22 ++-
 tools/perf/util/symbol-elf.c         |  10 ++
 tools/perf/util/symbol.h             |   1 +
 11 files changed, 382 insertions(+), 49 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 1/8] Uprobe: Export vaddr <-> offset conversion functions
From: Ravi Bangoria @ 2018-03-13 12:55 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria
In-Reply-To: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>

No functionality changes.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 include/linux/mm.h      | 12 ++++++++++++
 kernel/events/uprobes.c | 10 ----------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..95909f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2274,6 +2274,18 @@ struct vm_unmapped_area_info {
 		return unmapped_area(info);
 }
 
+static inline unsigned long
+offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
+{
+	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+}
+
+static inline loff_t
+vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
+{
+	return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
+}
+
 /* truncate.c */
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ce6848e..bd6f230 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -130,16 +130,6 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
 	return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
 }
 
-static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
-{
-	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
-}
-
-static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
-{
-	return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
-}
-
 /**
  * __replace_page - replace page in vma by new page.
  * based on replace_page in mm/ksm.c
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 4/8] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
From: Ravi Bangoria @ 2018-03-13 12:55 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria
In-Reply-To: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>

These exported data structure and functions will be used by other
files in later patches.

No functionality changes.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 include/linux/uprobes.h |  9 +++++++++
 kernel/events/uprobes.c | 14 +++-----------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e9..7bd2760 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -109,12 +109,19 @@ enum rp_check {
 	RP_CHECK_RET,
 };
 
+struct address_space;
 struct xol_area;
 
 struct uprobes_state {
 	struct xol_area		*xol_area;
 };
 
+struct uprobe_map_info {
+	struct uprobe_map_info *next;
+	struct mm_struct *mm;
+	unsigned long vaddr;
+};
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
@@ -149,6 +156,8 @@ struct uprobes_state {
 extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 					 void *src, unsigned long len);
+extern struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info);
+extern struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping, loff_t offset, bool is_register);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 081b88c1..e7830b8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -695,23 +695,15 @@ static void delete_uprobe(struct uprobe *uprobe)
 	put_uprobe(uprobe);
 }
 
-struct uprobe_map_info {
-	struct uprobe_map_info *next;
-	struct mm_struct *mm;
-	unsigned long vaddr;
-};
-
-static inline struct uprobe_map_info *
-uprobe_free_map_info(struct uprobe_map_info *info)
+struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info)
 {
 	struct uprobe_map_info *next = info->next;
 	kfree(info);
 	return next;
 }
 
-static struct uprobe_map_info *
-uprobe_build_map_info(struct address_space *mapping, loff_t offset,
-		      bool is_register)
+struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping,
+					      loff_t offset, bool is_register)
 {
 	unsigned long pgoff = offset >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
From: Ravi Bangoria @ 2018-03-13 12:56 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria
In-Reply-To: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. These markers are added by developer at
important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
ommited by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.

Implement the reference counter logic in trace_uprobe, leaving core
uprobe infrastructure as is, except one new callback from uprobe_mmap()
to trace_uprobe.

trace_uprobe definition with reference counter will now be:

  <path>:<offset>[(ref_ctr_offset)]

There are two different cases while enabling the marker,
 1. Trace existing process. In this case, find all suitable processes
    and increment the reference counter in them.
 2. Enable trace before running target binary. In this case, all mmaps
    will get notified to trace_uprobe and trace_uprobe will increment
    the reference counter if corresponding uprobe is enabled.

At the time of disabling probes, decrement reference counter in all
existing target processes.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are referring it as 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Fengguang reported/fixed build failure in RFC patch]
---
 include/linux/uprobes.h     |   2 +
 kernel/events/uprobes.c     |   6 ++
 kernel/trace/trace_uprobe.c | 172 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 7bd2760..2d4df65 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -122,6 +122,8 @@ struct uprobe_map_info {
 	unsigned long vaddr;
 };
 
+extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e7830b8..06821bb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1041,6 +1041,9 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+/* Rightnow the only user of this is trace_uprobe. */
+void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1053,6 +1056,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
+	if (uprobe_mmap_callback)
+		uprobe_mmap_callback(vma);
+
 	if (no_uprobe_events() || !valid_vma(vma, true))
 		return 0;
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2014f43..b6c9b48 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -25,6 +25,8 @@
 #include <linux/namei.h>
 #include <linux/string.h>
 #include <linux/rculist.h>
+#include <linux/sched/mm.h>
+#include <linux/highmem.h>
 
 #include "trace_probe.h"
 
@@ -58,6 +60,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -362,10 +365,10 @@ static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
 	struct inode *inode;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -375,6 +378,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -454,6 +458,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(arg, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = 0;
+		*rctr_end = 0;
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -488,6 +512,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->inode = inode;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -620,6 +645,8 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			break;
 		}
 	}
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
 
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
@@ -894,6 +921,139 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
+{
+	unsigned long vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
+
+	return tu->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == tu->inode &&
+		vma->vm_flags & VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *
+sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu)
+{
+	struct vm_area_struct *tmp;
+
+	for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next)
+		if (sdt_valid_vma(tu, tmp))
+			return tmp;
+
+	return NULL;
+}
+
+/*
+ * Reference count gate the invocation of probe. If present,
+ * by default reference count is 0. One needs to increment
+ * it before tracing the probe and decrement it when done.
+ */
+static int
+sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	unsigned short orig = 0;
+
+	if (vaddr == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
+	if (ret <= 0)
+		return ret;
+
+	kaddr = kmap_atomic(page);
+	memcpy(&orig, kaddr + (vaddr & ~PAGE_MASK), sizeof(orig));
+	orig += d;
+	memcpy(kaddr + (vaddr & ~PAGE_MASK), &orig, sizeof(orig));
+	kunmap_atomic(kaddr);
+
+	put_page(page);
+	return 0;
+}
+
+static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
+{
+	struct uprobe_map_info *info;
+	struct vm_area_struct *vma;
+	unsigned long vaddr;
+
+	uprobe_start_dup_mmap();
+	info = uprobe_build_map_info(tu->inode->i_mapping,
+				tu->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		vma = sdt_find_vma(info->mm, tu);
+		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
+		sdt_update_ref_ctr(info->mm, vaddr, 1);
+
+		up_write(&info->mm->mmap_sem);
+		mmput(info->mm);
+		info = uprobe_free_map_info(info);
+	}
+
+out:
+	uprobe_end_dup_mmap();
+}
+
+/* Called with down_write(&vma->vm_mm->mmap_sem) */
+void trace_uprobe_mmap_callback(struct vm_area_struct *vma)
+{
+	struct trace_uprobe *tu;
+	unsigned long vaddr;
+
+	if (!(vma->vm_flags & VM_WRITE))
+		return;
+
+	mutex_lock(&uprobe_lock);
+	list_for_each_entry(tu, &uprobe_list, list) {
+		if (!sdt_valid_vma(tu, vma) ||
+		    !trace_probe_is_enabled(&tu->tp))
+			continue;
+
+		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
+		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
+static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
+{
+	struct vm_area_struct *vma;
+	unsigned long vaddr;
+	struct uprobe_map_info *info;
+
+	uprobe_start_dup_mmap();
+	info = uprobe_build_map_info(tu->inode->i_mapping,
+				tu->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		vma = sdt_find_vma(info->mm, tu);
+		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
+		sdt_update_ref_ctr(info->mm, vaddr, -1);
+
+		up_write(&info->mm->mmap_sem);
+		mmput(info->mm);
+		info = uprobe_free_map_info(info);
+	}
+
+out:
+	uprobe_end_dup_mmap();
+}
+
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
@@ -939,6 +1099,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 	if (ret)
 		goto err_buffer;
 
+	if (tu->ref_ctr_offset)
+		sdt_increment_ref_ctr(tu);
+
 	return 0;
 
  err_buffer:
@@ -979,6 +1142,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
+	if (tu->ref_ctr_offset)
+		sdt_decrement_ref_ctr(tu);
+
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
 	tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
 
@@ -1423,6 +1589,8 @@ static __init int init_uprobe_trace(void)
 	/* Profile interface */
 	trace_create_file("uprobe_profile", 0444, d_tracer,
 				    NULL, &uprobe_profile_ops);
+
+	uprobe_mmap_callback = trace_uprobe_mmap_callback;
 	return 0;
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 7/8] perf probe: Support SDT markers having reference counter (semaphore)
From: Ravi Bangoria @ 2018-03-13 12:56 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria
In-Reply-To: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>

With this, perf buildid-cache will save SDT markers with reference
counter in probe cache. Perf probe will be able to probe markers
having reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 21 +++++++++++++++++----
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 22 +++++++++++++++++++---
 tools/perf/util/symbol-elf.c  | 10 ++++++++++
 tools/perf/util/symbol.h      |  1 +
 5 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index e1dbc98..2cbe68a 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1832,6 +1832,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2054,15 +2060,22 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		if (tp->ref_ctr_offset)
+			err = strbuf_addf(&buf, "%s:0x%lx(0x%lx)", tp->module,
+					  tp->address, tp->ref_ctr_offset);
+		else
+			err = strbuf_addf(&buf, "%s:0x%lx", tp->module,
+					  tp->address);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f0..15a98c3 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4ae1123..08ba3a6 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -701,6 +701,12 @@ static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 		 : (unsigned long long)note->addr.a64[0];
 }
 
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ? (unsigned long long)note->addr.a32[2]
+		: (unsigned long long)note->addr.a64[2];
+}
+
 static const char * const type_to_suffix[] = {
 	":s64", "", "", "", ":s32", "", ":s16", ":s8",
 	"", ":u8", ":u16", "", ":u32", "", "", "", ":u64"
@@ -776,14 +782,24 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+
+	if (ref_ctr_offset)
+		err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx(0x%llx)",
 				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+				sdt_note__get_addr(note), ref_ctr_offset);
+	else
+		err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+				sdtgrp, note->name, pathname,
+				sdt_note__get_addr(note));
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 2de7705..76c7b54 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1928,6 +1928,16 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 		}
 	}
 
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL)) {
+		if (shdr.sh_offset) {
+			if (tmp->bit32)
+				tmp->addr.a32[2] -= (shdr.sh_addr - shdr.sh_offset);
+			else
+				tmp->addr.a64[2] -= (shdr.sh_addr - shdr.sh_offset);
+		}
+	}
+
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 70c16741..ad0c4f2 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -384,6 +384,7 @@ struct sdt_note {
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 8/8] trace_uprobe/sdt: Document about reference counter
From: Ravi Bangoria @ 2018-03-13 12:56 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria
In-Reply-To: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>

No functionality changes.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 Documentation/trace/uprobetracer.txt | 16 +++++++++++++---
 kernel/trace/trace.c                 |  2 +-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index bf526a7c..8fb13b0 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -19,15 +19,25 @@ user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
 -------------------------
-  p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
-  r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
-  -:[GRP/]EVENT                           : Clear uprobe or uretprobe event
+  p[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  r[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  -:[GRP/]EVENT
+
+  p : Set a uprobe
+  r : Set a return uprobe (uretprobe)
+  - : Clear uprobe or uretprobe event
 
   GRP           : Group name. If omitted, "uprobes" is the default value.
   EVENT         : Event name. If omitted, the event name is generated based
                   on PATH+OFFSET.
   PATH          : Path to an executable or a library.
   OFFSET        : Offset where the probe is inserted.
+  REF_CTR_OFFSET: Reference counter offset. Optional field. Reference count
+                  gate the invocation of probe. If present, by default
+                  reference count is 0. Kernel needs to increment it before
+                  tracing the probe and decrement it when done. This is
+                  identical to semaphore in Userspace Statically Defined
+                  Tracepoints (USDT).
 
   FETCHARGS     : Arguments. Each probe can have up to 128 args.
    %REG         : Fetch register REG
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 20a2300..2104d03 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4604,7 +4604,7 @@ static int tracing_trace_options_open(struct inode *inode, struct file *file)
   "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
-	"\t    place: <path>:<offset>\n"
+  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
 #endif
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
From: Ravi Bangoria @ 2018-03-13 12:56 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, gregkh, huawei.libin, hughd, jack, jglisse, jolsa,
	kan.liang, kirill.shutemov, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, mhocko, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, pombredanne, rostedt, tglx, tmricht, willy,
	yao.jin, fengguang.wu, Ravi Bangoria
In-Reply-To: <20180313125603.19819-1-ravi.bangoria@linux.vnet.ibm.com>

For tiny binaries/libraries, different mmap regions points to the
same file portion. In such cases, we may increment reference counter
multiple times. But while de-registration, reference counter will get
decremented only by once leaving reference counter > 0 even if no one
is tracing on that marker.

Ensure increment and decrement happens in sync by keeping list of
mms in trace_uprobe. Increment reference counter only if mm is not
present in the list and decrement only if mm is present in the list.

Example

  # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events

Before patch:

  # perf stat -a -e sdt_tick:loop2
  # /tmp/tick
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
   0000000: 02                                       .

  # pkill perf
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  0000000: 01                                       .

After patch:

  # perf stat -a -e sdt_tick:loop2
  # /tmp/tick
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  0000000: 01                                       .

  # pkill perf
  # dd if=/proc/`pgrep tick`/mem bs=1 count=1 skip=$(( 0x10020036 )) 2>/dev/null | xxd
  0000000: 00                                       .

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 kernel/trace/trace_uprobe.c | 105 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index b6c9b48..9bf3f7a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -50,6 +50,11 @@ struct trace_uprobe_filter {
 	struct list_head	perf_events;
 };
 
+struct sdt_mm_list {
+	struct mm_struct *mm;
+	struct sdt_mm_list *next;
+};
+
 /*
  * uprobe event core functions
  */
@@ -61,6 +66,8 @@ struct trace_uprobe {
 	char				*filename;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
+	struct sdt_mm_list		*sml;
+	struct rw_semaphore		sml_rw_sem;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -274,6 +281,7 @@ static inline bool is_ret_probe(struct trace_uprobe *tu)
 	if (is_ret)
 		tu->consumer.ret_handler = uretprobe_dispatcher;
 	init_trace_uprobe_filter(&tu->filter);
+	init_rwsem(&tu->sml_rw_sem);
 	return tu;
 
 error:
@@ -921,6 +929,74 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *tmp = tu->sml;
+
+	if (!tu->sml || !mm)
+		return false;
+
+	while (tmp) {
+		if (tmp->mm == mm)
+			return true;
+		tmp = tmp->next;
+	}
+
+	return false;
+}
+
+static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *tmp;
+
+	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
+	if (!tmp)
+		return;
+
+	tmp->mm = mm;
+	tmp->next = tu->sml;
+	tu->sml = tmp;
+}
+
+static void sdt_del_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *prev, *curr;
+
+	if (!tu->sml)
+		return;
+
+	if (tu->sml->mm == mm) {
+		curr = tu->sml;
+		tu->sml = tu->sml->next;
+		kfree(curr);
+		return;
+	}
+
+	prev = tu->sml;
+	curr = tu->sml->next;
+	while (curr) {
+		if (curr->mm == mm) {
+			prev->next = curr->next;
+			kfree(curr);
+			return;
+		}
+		prev = curr;
+		curr = curr->next;
+	}
+}
+
+static void sdt_flush_mm_list(struct trace_uprobe *tu)
+{
+	struct sdt_mm_list *next, *curr = tu->sml;
+
+	while (curr) {
+		next = curr->next;
+		kfree(curr);
+		curr = next;
+	}
+	tu->sml = NULL;
+}
+
 static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
 {
 	unsigned long vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
@@ -989,17 +1065,25 @@ static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	down_write(&tu->sml_rw_sem);
 	while (info) {
+		if (sdt_check_mm_list(tu, info->mm))
+			goto cont;
+
 		down_write(&info->mm->mmap_sem);
 
 		vma = sdt_find_vma(info->mm, tu);
 		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
-		sdt_update_ref_ctr(info->mm, vaddr, 1);
+		if (!sdt_update_ref_ctr(info->mm, vaddr, 1))
+			sdt_add_mm_list(tu, info->mm);
 
 		up_write(&info->mm->mmap_sem);
+
+cont:
 		mmput(info->mm);
 		info = uprobe_free_map_info(info);
 	}
+	up_write(&tu->sml_rw_sem);
 
 out:
 	uprobe_end_dup_mmap();
@@ -1020,8 +1104,16 @@ void trace_uprobe_mmap_callback(struct vm_area_struct *vma)
 		    !trace_probe_is_enabled(&tu->tp))
 			continue;
 
+		down_write(&tu->sml_rw_sem);
+		if (sdt_check_mm_list(tu, vma->vm_mm))
+			goto cont;
+
 		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
-		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+		if (!sdt_update_ref_ctr(vma->vm_mm, vaddr, 1))
+			sdt_add_mm_list(tu, vma->vm_mm);
+
+cont:
+		up_write(&tu->sml_rw_sem);
 	}
 	mutex_unlock(&uprobe_lock);
 }
@@ -1038,7 +1130,11 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	down_write(&tu->sml_rw_sem);
 	while (info) {
+		if (!sdt_check_mm_list(tu, info->mm))
+			goto cont;
+
 		down_write(&info->mm->mmap_sem);
 
 		vma = sdt_find_vma(info->mm, tu);
@@ -1046,9 +1142,14 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 		sdt_update_ref_ctr(info->mm, vaddr, -1);
 
 		up_write(&info->mm->mmap_sem);
+		sdt_del_mm_list(tu, info->mm);
+
+cont:
 		mmput(info->mm);
 		info = uprobe_free_map_info(info);
 	}
+	sdt_flush_mm_list(tu);
+	up_write(&tu->sml_rw_sem);
 
 out:
 	uprobe_end_dup_mmap();
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox