Linux Documentation
 help / color / mirror / Atom feed
* Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers
From: Andy Shevchenko @ 2018-04-24 15:56 UTC (permalink / raw)
  To: Jae Hyun Yoo, Alan Cox, Andrew Jeffery, Andrew Lunn,
	Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
	Guenter Roeck, Haiyue Wang, James Feist, Jason M Biils,
	Jean Delvare, Joel Stanley, Julia Cartwright, Miguel Ojeda,
	Milton Miller II, Pavel Machek, Randy Dunlap, Stef van Os,
	Sumeet R Pawnikar, Vernon Mauery
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, openbmc
In-Reply-To: <20180410183212.16787-10-jae.hyun.yoo@linux.intel.com>

On Tue, 2018-04-10 at 11:32 -0700, Jae Hyun Yoo wrote:

>  drivers/hwmon/peci-cputemp.c  | 783
> ++++++++++++++++++++++++++++++++++++++++++
>  drivers/hwmon/peci-dimmtemp.c | 432 +++++++++++++++++++++++

Does it make sense one driver per patch?

> +#define CLIENT_CPU_ID_MASK    0xf0ff0  /* Mask for Family / Model
> info */

> +struct cpu_gen_info {
> +	u32 type;
> +	u32 cpu_id;
> +	u32 core_max;
> +};
> 

> +static const struct cpu_gen_info cpu_gen_info_table[] = {
> +	{ .type = CPU_GEN_HSX,
> +	  .cpu_id = 0x306f0, /* Family code: 6, Model number: 63
> (0x3f) */
> +	  .core_max = CORE_MAX_ON_HSX },
> +	{ .type = CPU_GEN_BRX,
> +	  .cpu_id = 0x406f0, /* Family code: 6, Model number: 79
> (0x4f) */
> +	  .core_max = CORE_MAX_ON_BDX },
> +	{ .type = CPU_GEN_SKX,
> +	  .cpu_id = 0x50650, /* Family code: 6, Model number: 85
> (0x55) */
> +	  .core_max = CORE_MAX_ON_SKX },
> +};

Are we talking about x86 CPU IDs here?
If so, why x86 corresponding headers, including intel-family.h are not
used?

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v1 1/1] misc: IBM Virtual Management Channel Driver
From: Randy Dunlap @ 2018-04-24 15:21 UTC (permalink / raw)
  To: Greg KH
  Cc: Bryant G. Ly, benh, mpe, arnd, corbet, seroyer, mrochs, adreznec,
	fbarrat, davem, linus.walleij, akpm, mikey, pombredanne, tlfalcon,
	msuchanek, linux-doc, linuxppc-dev
In-Reply-To: <20180424142918.GA2459@kroah.com>

On 04/24/2018 07:29 AM, Greg KH wrote:
> On Mon, Apr 23, 2018 at 02:17:28PM -0700, Randy Dunlap wrote:
>> On 04/23/18 12:53, Greg KH wrote:
>>> On Mon, Apr 23, 2018 at 11:38:18AM -0700, Randy Dunlap wrote:
>>>> On 04/23/18 07:46, Bryant G. Ly wrote:
>>>>> This driver is a logical device which provides an
>>>>> interface between the hypervisor and a management
>>>>> partition.
>>>>>
>>>>> This driver is to be used for the POWER Virtual
>>>>> Management Channel Virtual Adapter on the PowerVM
>>>>> platform. It provides both request/response and
>>>>> async message support through the /dev/ibmvmc node.
>>>>>
>>>>> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
>>>>> Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
>>>>> Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com>
>>>>> Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
>>>>> Tested-by: Brad Warrum <bwarrum@us.ibm.com>
>>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>>>> ---
>>>>>  Documentation/ioctl/ioctl-number.txt  |    1 +
>>>>>  Documentation/misc-devices/ibmvmc.txt |  161 +++
>>>>>  MAINTAINERS                           |    6 +
>>>>>  arch/powerpc/include/asm/hvcall.h     |    1 +
>>>>>  drivers/misc/Kconfig                  |   14 +
>>>>>  drivers/misc/Makefile                 |    1 +
>>>>>  drivers/misc/ibmvmc.c                 | 2415 +++++++++++++++++++++++++++++++++
>>>>>  drivers/misc/ibmvmc.h                 |  209 +++
>>>>>  8 files changed, 2808 insertions(+)
>>>>>  create mode 100644 Documentation/misc-devices/ibmvmc.txt
>>>>>  create mode 100644 drivers/misc/ibmvmc.c
>>>>>  create mode 100644 drivers/misc/ibmvmc.h
>>>>
>>>>> diff --git a/Documentation/misc-devices/ibmvmc.txt b/Documentation/misc-devices/ibmvmc.txt
>>>>> new file mode 100644
>>>>> index 0000000..bae1064
>>>>> --- /dev/null
>>>>> +++ b/Documentation/misc-devices/ibmvmc.txt
>>>
>>> Aren't we doing new documentation in .rst format instead of .txt?
>>
>> I am not aware that .rst format is a *requirement* for new documentation.
>>
>> ??
> 
> Why wouldn't you create new documentation in that format?  Just saves
> the step of having to change it again in the future.

Not everything needs to be in that format.
If a document contains graphics or lots of tables, then sure, use RST.

Do you do kernel Documentation builds yourself?  Do you know how slow they are?

-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v1 1/1] misc: IBM Virtual Management Channel Driver
From: Greg KH @ 2018-04-24 14:29 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Bryant G. Ly, benh, mpe, arnd, corbet, seroyer, mrochs, adreznec,
	fbarrat, davem, linus.walleij, akpm, mikey, pombredanne, tlfalcon,
	msuchanek, linux-doc, linuxppc-dev
In-Reply-To: <4142fd3b-05eb-d934-bcd3-f8f564e24fa7@infradead.org>

On Mon, Apr 23, 2018 at 02:17:28PM -0700, Randy Dunlap wrote:
> On 04/23/18 12:53, Greg KH wrote:
> > On Mon, Apr 23, 2018 at 11:38:18AM -0700, Randy Dunlap wrote:
> >> On 04/23/18 07:46, Bryant G. Ly wrote:
> >>> This driver is a logical device which provides an
> >>> interface between the hypervisor and a management
> >>> partition.
> >>>
> >>> This driver is to be used for the POWER Virtual
> >>> Management Channel Virtual Adapter on the PowerVM
> >>> platform. It provides both request/response and
> >>> async message support through the /dev/ibmvmc node.
> >>>
> >>> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
> >>> Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
> >>> Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com>
> >>> Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
> >>> Tested-by: Brad Warrum <bwarrum@us.ibm.com>
> >>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>> Cc: Arnd Bergmann <arnd@arndb.de>
> >>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >>> Cc: Michael Ellerman <mpe@ellerman.id.au>
> >>> ---
> >>>  Documentation/ioctl/ioctl-number.txt  |    1 +
> >>>  Documentation/misc-devices/ibmvmc.txt |  161 +++
> >>>  MAINTAINERS                           |    6 +
> >>>  arch/powerpc/include/asm/hvcall.h     |    1 +
> >>>  drivers/misc/Kconfig                  |   14 +
> >>>  drivers/misc/Makefile                 |    1 +
> >>>  drivers/misc/ibmvmc.c                 | 2415 +++++++++++++++++++++++++++++++++
> >>>  drivers/misc/ibmvmc.h                 |  209 +++
> >>>  8 files changed, 2808 insertions(+)
> >>>  create mode 100644 Documentation/misc-devices/ibmvmc.txt
> >>>  create mode 100644 drivers/misc/ibmvmc.c
> >>>  create mode 100644 drivers/misc/ibmvmc.h
> >>
> >>> diff --git a/Documentation/misc-devices/ibmvmc.txt b/Documentation/misc-devices/ibmvmc.txt
> >>> new file mode 100644
> >>> index 0000000..bae1064
> >>> --- /dev/null
> >>> +++ b/Documentation/misc-devices/ibmvmc.txt
> > 
> > Aren't we doing new documentation in .rst format instead of .txt?
> 
> I am not aware that .rst format is a *requirement* for new documentation.
> 
> ??

Why wouldn't you create new documentation in that format?  Just saves
the step of having to change it again in the future.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC 01/10] PCI: dwc: Add MSI-X callbacks handler
From: Alan Douglas @ 2018-04-24 14:05 UTC (permalink / raw)
  To: Gustavo Pimentel, Kishon Vijay Abraham I, bhelgaas@google.com,
	lorenzo.pieralisi@arm.com, Joao.Pinto@synopsys.com,
	jingoohan1@gmail.com, niklas.cassel@axis.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <16d237b2-0052-447e-13cb-bc2f15848be4@synopsys.com>

Hi Kishon,

On 24 April 2018 10:36 Gustavo Pimentel wrote:
> Hi Kishon,
> 
> On 24/04/2018 08:07, Kishon Vijay Abraham I wrote:
> > Hi,
> >
> > On Monday 23 April 2018 03:06 PM, Gustavo Pimentel wrote:
> >> Hi Kishon,
> >>
> >> On 16/04/2018 10:29, Kishon Vijay Abraham I wrote:
> >>> Hi Gustavo,
> >>>
> >>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
> >>>> Changes the pcie_raise_irq function signature, namely the
> >>>> interrupt_num variable type from u8 to u16 to accommodate the MSI-X
> >>>> maximum interrupts of 2048.
> >>>>
> >>>> Implements a PCIe config space capability iterator function to
> >>>> search and save the MSI and MSI-X pointers. With this method the
> >>>> code becomes more generic and flexible.
> >>>>
> >>>> Implements MSI-X set/get functions for sysfs interface in order to
> >>>> change the EP entries number.
> >>>>
> >>>> Implements EP MSI-X interface for triggering interruptions.
> >>>>
> >>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> >>>> ---
> >>>>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
> >>>>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
> >>>>  drivers/pci/dwc/pcie-designware-ep.c   | 145
> ++++++++++++++++++++++++++++++++-
> >>>>  drivers/pci/dwc/pcie-designware-plat.c |   6 +-
> >>>>  drivers/pci/dwc/pcie-designware.h      |  23 +++++-
> >>>>  5 files changed, 173 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c
> >>>> b/drivers/pci/dwc/pci-dra7xx.c index ed8558d..5265725 100644
> >>>> --- a/drivers/pci/dwc/pci-dra7xx.c
> >>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
> >>>> @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct
> >>>> dra7xx_pcie *dra7xx,  }
> >>>>
> >>>>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> >>>> -				 enum pci_epc_irq_type type, u8
> interrupt_num)
> >>>> +				 enum pci_epc_irq_type type, u16
> interrupt_num)
> >>>>  {
> >>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >>>>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci); diff --git
> >>>> a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
> >>>> index e66cede..96dc259 100644
> >>>> --- a/drivers/pci/dwc/pcie-artpec6.c
> >>>> +++ b/drivers/pci/dwc/pcie-artpec6.c
> >>>> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct
> >>>> dw_pcie_ep *ep)  }
> >>>>
> >>>>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> >>>> -				  enum pci_epc_irq_type type, u8
> interrupt_num)
> >>>> +				  enum pci_epc_irq_type type, u16
> interrupt_num)
> >>>>  {
> >>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >>>>
> >>>> diff --git a/drivers/pci/dwc/pcie-designware-ep.c
> >>>> b/drivers/pci/dwc/pcie-designware-ep.c
> >>>> index 15b22a6..874d4c2 100644
> >>>> --- a/drivers/pci/dwc/pcie-designware-ep.c
> >>>> +++ b/drivers/pci/dwc/pcie-designware-ep.c
> >>>> @@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie
> *pci, enum pci_barno bar)
> >>>>  	__dw_pcie_ep_reset_bar(pci, bar, 0);  }
> >>>>
> >>>> +void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep) {
> >>>
> >>> This should be implemented in a generic way similar to
> pci_find_capability().
> >>> It'll be useful when we try to implement other capabilities as well.
> >>
> >> Hum, what you suggest? Something implemented on the pci-epf-core?
> >
> > yeah, Initially thought it could be implemented as a helper function
> > in pci-epc-core so that both designware and cadence can use it.
> 
> That would be nice, however I couldn't find out how to access the config
> space, through the pci_epf or pci_epc structs.
> 
> So, I reworked the functions like this:
> 
> (on pcie-designware-ep.c)
> 
> u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr,
>                               u8 cap)
> {
>         u8 cap_id, next_cap_ptr;
>         u16 reg;
> 
>         reg = dw_pcie_readw_dbi(pci, cap_ptr);
>         next_cap_ptr = (reg & 0xff00) >> 8;
>         cap_id = (reg & 0x00ff);
> 
>         if (!next_cap_ptr || cap_id > PCI_CAP_ID_MAX)
>                 return 0;
> 
>         if (cap_id == cap)
>                 return cap_ptr;
> 
>         return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap); }
> 
> u8 dw_pcie_ep_find_capability(struct dw_pcie *pci, u8 cap) {
>         u8 next_cap_ptr;
>         u16 reg;
> 
>         reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
>         next_cap_ptr = (reg & 0x00ff);
> 
>         if (!next_cap_ptr)
>                 return 0;
> 
>         return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap); }
> 
> int dw_pcie_ep_init(struct dw_pcie_ep *ep) { [...]
>         ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
>         ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX); [...]
> }
> 
> >
> > But do we really have to find the address like this? since all
> > designware IP's will have a particular capability at a fixed address
> > offset, why not follow use existing mechanism in dw_pcie_ep_get_msi?
> 
> The capabilities are not fixed to a specific address offset by default they
> assume those values, but they can be easily change at design stage.
> 
> >
> > Or is it possible for a particular capability to have address offsets
> > for different vendors? How is it for cadence?
> 
> Yes, it's possible to have different address offset for different vendors.
> 
For cadence the offsets are fixed for each specific hw (if a capability is 
disabled it will just be removed from the linked list) but it would be 
better to allow for the possibility for them to vary between different hw.

> >
> > Thanks
> > Kishon
> >
> 
> Thanks,
> Gustavo
Regards,
Alan

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] thermal: core: fix emulation of subzero temps
From: Zhang Rui @ 2018-04-24 13:01 UTC (permalink / raw)
  To: Kevin DuBois, linux-pm, linux-doc, corbet, edubezval
In-Reply-To: <20180423175340.32598-1-kevindubois@google.com>

On 一, 2018-04-23 at 10:53 -0700, Kevin DuBois wrote:
> The current implementation casted away its sign.
> It was also using '0' to disable emulation, but 0C is a valid
> thermal reading. A large negative value (below absolute zero) now
> disables the emulation.

Makes sense to me.
> 
> Test: Build kernel with CONFIG_THERMAL_EMULATION=y, then write
> negative
> values to the emul_temp sysfs entry. Check the temp entry, and see
> the
> negative value. Write -INT_MAX to emul_temp, and see the proper
> sensor
> reading appear.
> Bug: 77599739

can you paste the bug link here?

> Signed-off-by: Kevin DuBois <kevindubois@google.com>
> 
> Change-Id: Iee1a4f0aacf7b4299b67c7f8f4effedf2d7a8425
> ---
>  Documentation/thermal/sysfs-api.txt |  2 +-
>  drivers/thermal/thermal_core.c      | 13 ++++++++-----
>  2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/thermal/sysfs-api.txt
> b/Documentation/thermal/sysfs-api.txt
> index 10f062ea6bc2..e240d9e165a0 100644
> --- a/Documentation/thermal/sysfs-api.txt
> +++ b/Documentation/thermal/sysfs-api.txt
> @@ -312,7 +312,7 @@ emul_temp
>  	this temperature to platform emulation function if
> registered or
>  	cache it locally. This is useful in debugging different
> temperature
>  	threshold and its associated cooling action. This is write
> only node
> -	and writing 0 on this node should disable emulation.
> +	and writing -INT_MAX on this node should disable emulation.

This is an ABI change. I'm not sure if there is any userspace tool
depends on this.
could you please resend the bug with a real bug link, and I will keep
it in my tree for a longer period to see if there is any objections.

thanks,
rui
>  	Unit: millidegree Celsius
>  	WO, Optional
>  
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 768783ab3dac..650a6b124fc7 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -636,8 +636,8 @@ int get_tz_trend(struct thermal_zone_device *tz,
> int trip)
>  {
>  	enum thermal_trend trend;
>  
> -	if (tz->emul_temperature || !tz->ops->get_trend ||
> -	    tz->ops->get_trend(tz, trip, &trend)) {
> +	if ((tz->emul_temperature > THERMAL_TEMP_INVALID) ||
> +		!tz->ops->get_trend || tz->ops->get_trend(tz, trip,
> &trend)) {
>  		if (tz->temperature > tz->last_temperature)
>  			trend = THERMAL_TREND_RAISING;
>  		else if (tz->temperature < tz->last_temperature)
> @@ -904,7 +904,8 @@ int thermal_zone_get_temp(struct
> thermal_zone_device *tz, int *temp)
>  
>  	ret = tz->ops->get_temp(tz, temp);
>  
> -	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz-
> >emul_temperature) {
> +	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) &&
> +		(tz->emul_temperature > THERMAL_TEMP_INVALID)) {
>  		for (count = 0; count < tz->trips; count++) {
>  			ret = tz->ops->get_trip_type(tz, count,
> &type);
>  			if (!ret && type == THERMAL_TRIP_CRITICAL) {
> @@ -961,6 +962,7 @@ static void thermal_zone_device_reset(struct
> thermal_zone_device *tz)
>  	struct thermal_instance *pos;
>  
>  	tz->temperature = THERMAL_TEMP_INVALID;
> +	tz->emul_temperature = THERMAL_TEMP_INVALID;
>  	tz->passive = 0;
>  	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
>  		pos->initialized = false;
> @@ -1351,9 +1353,9 @@ emul_temp_store(struct device *dev, struct
> device_attribute *attr,
>  {
>  	struct thermal_zone_device *tz = to_thermal_zone(dev);
>  	int ret = 0;
> -	unsigned long temperature;
> +	int temperature;
>  
> -	if (kstrtoul(buf, 10, &temperature))
> +	if (kstrtoint(buf, 10, &temperature))
>  		return -EINVAL;
>  
>  	if (!tz->ops->set_emul_temp) {
> @@ -2296,6 +2298,7 @@ struct thermal_zone_device
> *thermal_zone_device_register(const char *type,
>  	tz->trips = trips;
>  	tz->passive_delay = passive_delay;
>  	tz->polling_delay = polling_delay;
> +	tz->emul_temperature = THERMAL_TEMP_INVALID;
>  	/* A new thermal zone needs to be updated anyway. */
>  	atomic_set(&tz->need_update, 1);
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 01/10] PCI: dwc: Add MSI-X callbacks handler
From: Gustavo Pimentel @ 2018-04-24 11:43 UTC (permalink / raw)
  To: Alan Douglas, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR07MB4512414ADA94E98C0E4EA61DD8880@SN6PR07MB4512.namprd07.prod.outlook.com>

On 24/04/2018 10:15, Alan Douglas wrote:
> Hi,
> 
> On 10 April 2018 18:15 Gustavo Pimentel wrote:
>> Changes the pcie_raise_irq function signature, namely the interrupt_num
>> variable type from u8 to u16 to accommodate the MSI-X maximum interrupts
>> of 2048.
>>
>> Implements a PCIe config space capability iterator function to search and save
>> the MSI and MSI-X pointers. With this method the code becomes more
>> generic and flexible.
>>
>> Implements MSI-X set/get functions for sysfs interface in order to change the
>> EP entries number.
>>
>> Implements EP MSI-X interface for triggering interruptions.
>>
>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>> ---
>>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
>>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
>>  drivers/pci/dwc/pcie-designware-ep.c   | 145
>> ++++++++++++++++++++++++++++++++-
>>  drivers/pci/dwc/pcie-designware-plat.c |   6 +-
>>  drivers/pci/dwc/pcie-designware.h      |  23 +++++-
>>  5 files changed, 173 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c index
>> ed8558d..5265725 100644
>> --- a/drivers/pci/dwc/pci-dra7xx.c
>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>> @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct
>> dra7xx_pcie *dra7xx,  }
>>
>>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>> -				 enum pci_epc_irq_type type, u8
>> interrupt_num)
>> +				 enum pci_epc_irq_type type, u16
>> interrupt_num)
>>  {
>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci); diff --git
>> a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c index
>> e66cede..96dc259 100644
>> --- a/drivers/pci/dwc/pcie-artpec6.c
>> +++ b/drivers/pci/dwc/pcie-artpec6.c
>> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep
>> *ep)  }
>>
>>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>> -				  enum pci_epc_irq_type type, u8
>> interrupt_num)
>> +				  enum pci_epc_irq_type type, u16
>> interrupt_num)
>>  {
>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>
>> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-
>> designware-ep.c
>> index 15b22a6..874d4c2 100644
>> --- a/drivers/pci/dwc/pcie-designware-ep.c
>> +++ b/drivers/pci/dwc/pcie-designware-ep.c
>> @@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci,
>> enum pci_barno bar)
>>  	__dw_pcie_ep_reset_bar(pci, bar, 0);
>>  }
>>
>> +void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep) {
>> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>> +	u8 next_ptr, curr_ptr, cap_id;
>> +	u16 reg;
>> +
>> +	memset(&ep->cap_addr, 0, sizeof(ep->cap_addr));
>> +
>> +	reg = dw_pcie_readw_dbi(pci, PCI_STATUS);
>> +	if (!(reg & PCI_STATUS_CAP_LIST))
>> +		return;
>> +
>> +	reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
>> +	next_ptr = (reg & 0x00ff);
>> +	if (!next_ptr)
>> +		return;
>> +
>> +	reg = dw_pcie_readw_dbi(pci, next_ptr);
>> +	curr_ptr = next_ptr;
>> +	next_ptr = (reg & 0xff00) >> 8;
>> +	cap_id = (reg & 0x00ff);
>> +
>> +	while (next_ptr && (cap_id <= PCI_CAP_ID_MAX)) {
>> +		switch (cap_id) {
>> +		case PCI_CAP_ID_MSI:
>> +			ep->cap_addr.msi_addr = curr_ptr;
>> +			break;
>> +		case PCI_CAP_ID_MSIX:
>> +			ep->cap_addr.msix_addr = curr_ptr;
>> +			break;
>> +		}
>> +		reg = dw_pcie_readw_dbi(pci, next_ptr);
>> +		curr_ptr = next_ptr;
>> +		next_ptr = (reg & 0xff00) >> 8;
>> +		cap_id = (reg & 0x00ff);
>> +	}
>> +}
>> +
>>  static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
>>  				   struct pci_epf_header *hdr)
>>  {
>> @@ -241,8 +279,47 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc,
>> u8 func_no, u8 encode_int)
>>  	return 0;
>>  }
>>
>> +static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no) {
>> +	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>> +	u32 val, reg;
>> +
>> +	if (ep->cap_addr.msix_addr == 0)
>> +		return 0;
>> +
>> +	reg = ep->cap_addr.msix_addr + PCI_MSIX_FLAGS;
>> +	val = dw_pcie_readw_dbi(pci, reg);
>> +	if (!(val & PCI_MSIX_FLAGS_ENABLE))
>> +		return -EINVAL;
>> +
>> +	val &= PCI_MSIX_FLAGS_QSIZE;
>> +
>> +	return val;
>> +}
>> +
>> +static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16
>> +interrupts) {
>> +	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>> +	u32 val, reg;
>> +
>> +	if (ep->cap_addr.msix_addr == 0)
>> +		return 0;
>> +
>> +	reg = ep->cap_addr.msix_addr + PCI_MSIX_FLAGS;
>> +	val = dw_pcie_readw_dbi(pci, reg);
>> +	val &= ~PCI_MSIX_FLAGS_QSIZE;
>> +	val |= interrupts;
>> +	dw_pcie_dbi_ro_wr_en(pci);
>> +	dw_pcie_writew_dbi(pci, reg, val);
>> +	dw_pcie_dbi_ro_wr_dis(pci);
>> +
>> +	return 0;
>> +}
>> +
>>  static int dw_pcie_ep_raise_irq(struct pci_epc *epc, u8 func_no,
>> -				enum pci_epc_irq_type type, u8
>> interrupt_num)
>> +				enum pci_epc_irq_type type, u16
>> interrupt_num)
>>  {
>>  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>>
>> @@ -282,6 +359,8 @@ static const struct pci_epc_ops epc_ops = {
>>  	.unmap_addr		= dw_pcie_ep_unmap_addr,
>>  	.set_msi		= dw_pcie_ep_set_msi,
>>  	.get_msi		= dw_pcie_ep_get_msi,
>> +	.set_msix		= dw_pcie_ep_set_msix,
>> +	.get_msix		= dw_pcie_ep_get_msix,
>>  	.raise_irq		= dw_pcie_ep_raise_irq,
>>  	.start			= dw_pcie_ep_start,
>>  	.stop			= dw_pcie_ep_stop,
>> @@ -322,6 +401,60 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep
>> *ep, u8 func_no,
>>  	return 0;
>>  }
>>
>> +int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
>> +			     u16 interrupt_num)
>> +{
>> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>> +	struct pci_epc *epc = ep->epc;
>> +	u16 tbl_offset, bir;
>> +	u32 bar_addr_upper, bar_addr_lower;
>> +	u32 msg_addr_upper, msg_addr_lower;
>> +	u32 reg, msg_data;
>> +	u64 tbl_addr, msg_addr, reg_u64;
>> +	void __iomem *msix_tbl;
>> +	int ret;
>> +
>> +	reg = ep->cap_addr.msix_addr + PCI_MSIX_TABLE;
>> +	tbl_offset = dw_pcie_readl_dbi(pci, reg);
>> +	bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
>> +	tbl_offset &= PCI_MSIX_TABLE_OFFSET;
>> +	tbl_offset >>= 3;
>> +
>> +	reg = PCI_BASE_ADDRESS_0 + (4 * bir);
>> +	bar_addr_lower = dw_pcie_readl_dbi(pci, reg);
>> +	reg_u64 = (bar_addr_lower & PCI_BASE_ADDRESS_MEM_TYPE_MASK);
>> +	if (reg_u64 == PCI_BASE_ADDRESS_MEM_TYPE_64)
>> +		bar_addr_upper = dw_pcie_readl_dbi(pci, reg + 4);
>> +	else
>> +		bar_addr_upper = 0;
>> +
>> +	tbl_addr = ((u64) bar_addr_upper) << 32 | bar_addr_lower;
>> +	tbl_addr += (tbl_offset + ((interrupt_num - 1) * PCI_MSIX_ENTRY_SIZE));
>> +	tbl_addr &= PCI_BASE_ADDRESS_MEM_MASK;
>> +
>> +	msix_tbl = ioremap_nocache(ep->phys_base + tbl_addr, ep->addr_size);
>> +	if (!msix_tbl)
>> +		return -EINVAL;
>> +
> I think you need to check the mask bit in  vector control for the requested IRQ.
> You could set the pending bit if masked, but would then need some output 
> signal to inform when the mask bit is cleared (or poll it) so the message can be sent
> later.

I have to analyze this careful and with detail to provide you a valid and good
feedback as soon as possible.

> 
> Also, do you need to check PCI_MSIX_FLAGS_ENABLE here as well, or is it checked earlier?

I applied the same logic for MSI-X already implemented before for the MSI.

Before calling the pci_epc_raise_irq(), the number of interrupts is checked by
calling the pci_epc_get_msi() first, which leads to one of
dw_pcie_ep_get_msi()/cdns_pcie_ep_get_msi().

From what I could see the PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE flags are
check on the dw_pcie_ep_get_msi()/dw_pcie_ep_get_msix() funcitons, the same
logic is applied on cdns_pcie_ep_get_msi().

> 
> Regards,
> Alan
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 06/10] misc: pci_endpoint_test: Add MSI-X support
From: Kishon Vijay Abraham I @ 2018-04-24 11:43 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com,
	adouglas@cadence.com, niklas.cassel@axis.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <2bd77bf2-1d64-6043-422a-6a96c1a3d3a4@synopsys.com>

Hi,

On Tuesday 24 April 2018 04:27 PM, Gustavo Pimentel wrote:
> Hi Kishon,
> 
> On 24/04/2018 08:19, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On Tuesday 17 April 2018 11:08 PM, Gustavo Pimentel wrote:
>>> Hi Kishon,
>>>
>>> On 17/04/2018 11:33, Kishon Vijay Abraham I wrote:
>>>> Hi,
>>>>
>>>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
>>>>> Adds the MSI-X support and updates driver documentation accordingly.
>>>>>
>>>>> Changes the driver parameter in order to allow the interruption type
>>>>> selection.
>>>>>
>>>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>>>>> ---
>>>>>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>>>>>  drivers/misc/pci_endpoint_test.c                 | 102 +++++++++++++++++------
>>>>>  2 files changed, 79 insertions(+), 26 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/misc-devices/pci-endpoint-test.txt b/Documentation/misc-devices/pci-endpoint-test.txt
>>>>> index 4ebc359..fdfa0f6 100644
>>>>> --- a/Documentation/misc-devices/pci-endpoint-test.txt
>>>>> +++ b/Documentation/misc-devices/pci-endpoint-test.txt
>>>>> @@ -10,6 +10,7 @@ The PCI driver for the test device performs the following tests
>>>>>  	*) verifying addresses programmed in BAR
>>>>>  	*) raise legacy IRQ
>>>>>  	*) raise MSI IRQ
>>>>> +	*) raise MSI-X IRQ
>>>>>  	*) read data
>>>>>  	*) write data
>>>>>  	*) copy data
>>>>> @@ -25,6 +26,8 @@ ioctl
>>>>>   PCITEST_LEGACY_IRQ: Tests legacy IRQ
>>>>>   PCITEST_MSI: Tests message signalled interrupts. The MSI number
>>>>>  	      to be tested should be passed as argument.
>>>>> + PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number
>>>>> +	      to be tested should be passed as argument.
>>>>>   PCITEST_WRITE: Perform write tests. The size of the buffer should be passed
>>>>>  		as argument.
>>>>>   PCITEST_READ: Perform read tests. The size of the buffer should be passed
>>>>> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
>>>>> index 37db0fc..a7d9354 100644
>>>>> --- a/drivers/misc/pci_endpoint_test.c
>>>>> +++ b/drivers/misc/pci_endpoint_test.c
>>>>> @@ -42,11 +42,16 @@
>>>>>  #define PCI_ENDPOINT_TEST_COMMAND	0x4
>>>>>  #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
>>>>>  #define COMMAND_RAISE_MSI_IRQ		BIT(1)
>>>>> -#define MSI_NUMBER_SHIFT		2
>>>>> -/* 6 bits for MSI number */
>>>>> -#define COMMAND_READ                    BIT(8)
>>>>> -#define COMMAND_WRITE                   BIT(9)
>>>>> -#define COMMAND_COPY                    BIT(10)
>>>>> +#define COMMAND_RAISE_MSIX_IRQ		BIT(2)
>>>>> +#define IRQ_TYPE_SHIFT			3
>>>>> +#define IRQ_TYPE_LEGACY			0
>>>>> +#define IRQ_TYPE_MSI			1
>>>>> +#define IRQ_TYPE_MSIX			2
>>>>> +#define MSI_NUMBER_SHIFT		5
>>>>
>>>> Now that you are anyways fixing this, add a new register entry for MSI numbers.
>>>> Let's not keep COMMAND and MSI's together.
>>>
>>> What you suggest?
>>
>> #define PCI_ENDPOINT_TEST_COMMAND	0x4
>> #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
>> #define COMMAND_RAISE_MSI_IRQ		BIT(1)
>> #define COMMAND_RAISE_MSIX_IRQ		BIT(2)
>> #define COMMAND_READ                    BIT(3)
>> #define COMMAND_WRITE                   BIT(4)
>> #define COMMAND_COPY                    BIT(5)
>>
>> #define PCI_ENDPOINT_TEST_STATUS	0x8
>> #define STATUS_READ_SUCCESS             BIT(0)
>> #define STATUS_READ_FAIL                BIT(1)
>> #define STATUS_WRITE_SUCCESS            BIT(2)
>> #define STATUS_WRITE_FAIL               BIT(3)
>> #define STATUS_COPY_SUCCESS             BIT(4)
>> #define STATUS_COPY_FAIL                BIT(5)
>> #define STATUS_IRQ_RAISED               BIT(6)
>> #define STATUS_SRC_ADDR_INVALID         BIT(7)
>> #define STATUS_DST_ADDR_INVALID         BIT(8)
>>
>> #define PCI_ENDPOINT_TEST_LOWER_SRC_ADDR	0xc
>> #define PCI_ENDPOINT_TEST_UPPER_SRC_ADDR	0x10
>>
>> #define PCI_ENDPOINT_TEST_LOWER_DST_ADDR	0x14
>> #define PCI_ENDPOINT_TEST_UPPER_DST_ADDR	0x18
>>
>> #define PCI_ENDPOINT_TEST_SIZE		0x1c
>> #define PCI_ENDPOINT_TEST_CHECKSUM	0x20
>>
>> #define PCI_ENDPOINT_TEST_MSI_NUMBER	0x24
> 
> Ok. I will do it.
> 
>>
>> We should try not to modify either the existing register offsets or the bit
>> fields within these registers in the future as EP and RC will be running on
>> different systems and it is possible one of them might not have the updated
>> kernel.
> 
> I totally agree.
> 
>>>
>>>>> +/* 12 bits for MSI number */
>>>>> +#define COMMAND_READ                    BIT(17)
>>>>> +#define COMMAND_WRITE                   BIT(18)
>>>>> +#define COMMAND_COPY                    BIT(19)
>>>>
>>>> This change should be done along with the pci-epf-test in a single patch.
>>>
>>> To be clear, you're saying is this patch should be just be squashed into the
>>> patch number 8 [1], because there is a lot of dependencies namely the defines,
>>> that is used on the alter functions.
>>>
>>> [1] -> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_896841_&d=DwIC-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=8urVwHCybXa1XMxsEbwHZAzzaEI_HJGXqmWgXpXb9TY&s=MRVr2YPY2Bk_WNFOxBfU4FGrFReTKdPhfzNDLiVxDbs&e=
>>
>> yeah. We have to make sure git bisect doesn't break functionality.
> 
> Ok, it'll be squashed.
> 
>>>
>>>>>  
>>>>>  #define PCI_ENDPOINT_TEST_STATUS	0x8
>>>>>  #define STATUS_READ_SUCCESS             BIT(0)
>>>>> @@ -73,9 +78,9 @@ static DEFINE_IDA(pci_endpoint_test_ida);
>>>>>  #define to_endpoint_test(priv) container_of((priv), struct pci_endpoint_test, \
>>>>>  					    miscdev)
>>>>>  
>>>>> -static bool no_msi;
>>>>> -module_param(no_msi, bool, 0444);
>>>>> -MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in pci_endpoint_test");
>>>>
>>>> Let's not remove this just to make sure existing users doesn't get affected.
>>>
>>> Hum, by making an internal conversion? Like this
>>> no_msi = false <=> irq_type = 1
>>> no_msi = true <=> irq_type = 0
>>
> Disregard previous comment, it doesn't make sense. I don't know where my head was.
> 
> It will be like this on probe:
> 
> if (no_msi)
> 	irq_type = IRQ_TYPE_LEGACY;
> 
> However since we are breaking the compatibility on terms of MSI/MSI-X
> bits/registers (discussion on the top), it makes sense to keep the compatibility
> on this parameter?

This is userspace compatibility, so lets not break it.
Btw can we have a sysfs entry per device for defining irq_type. Having a sysfs
entry might be helpful instead of insmod/rmmod with different irq_type values?
It will also help if a system has enumerated multiple PCI_ENDPOINT_TEST EP devices.

Thanks
Kishon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 01/10] PCI: dwc: Add MSI-X callbacks handler
From: Kishon Vijay Abraham I @ 2018-04-24 11:24 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com,
	adouglas@cadence.com, niklas.cassel@axis.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <16d237b2-0052-447e-13cb-bc2f15848be4@synopsys.com>

Hi,

On Tuesday 24 April 2018 03:06 PM, Gustavo Pimentel wrote:
> Hi Kishon,
> 
> On 24/04/2018 08:07, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On Monday 23 April 2018 03:06 PM, Gustavo Pimentel wrote:
>>> Hi Kishon,
>>>
>>> On 16/04/2018 10:29, Kishon Vijay Abraham I wrote:
>>>> Hi Gustavo,
>>>>
>>>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
>>>>> Changes the pcie_raise_irq function signature, namely the interrupt_num
>>>>> variable type from u8 to u16 to accommodate the MSI-X maximum interrupts
>>>>> of 2048.
>>>>>
>>>>> Implements a PCIe config space capability iterator function to search and
>>>>> save the MSI and MSI-X pointers. With this method the code becomes more
>>>>> generic and flexible.
>>>>>
>>>>> Implements MSI-X set/get functions for sysfs interface in order to change
>>>>> the EP entries number.
>>>>>
>>>>> Implements EP MSI-X interface for triggering interruptions.
>>>>>
>>>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>>>>> ---
>>>>>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
>>>>>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
>>>>>  drivers/pci/dwc/pcie-designware-ep.c   | 145 ++++++++++++++++++++++++++++++++-
>>>>>  drivers/pci/dwc/pcie-designware-plat.c |   6 +-
>>>>>  drivers/pci/dwc/pcie-designware.h      |  23 +++++-
>>>>>  5 files changed, 173 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>>>>> index ed8558d..5265725 100644
>>>>> --- a/drivers/pci/dwc/pci-dra7xx.c
>>>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>>>>> @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct dra7xx_pcie *dra7xx,
>>>>>  }
>>>>>  
>>>>>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>>>> -				 enum pci_epc_irq_type type, u8 interrupt_num)
>>>>> +				 enum pci_epc_irq_type type, u16 interrupt_num)
>>>>>  {
>>>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>>>>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
>>>>> diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
>>>>> index e66cede..96dc259 100644
>>>>> --- a/drivers/pci/dwc/pcie-artpec6.c
>>>>> +++ b/drivers/pci/dwc/pcie-artpec6.c
>>>>> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep *ep)
>>>>>  }
>>>>>  
>>>>>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>>>> -				  enum pci_epc_irq_type type, u8 interrupt_num)
>>>>> +				  enum pci_epc_irq_type type, u16 interrupt_num)
>>>>>  {
>>>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>>>>  
>>>>> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
>>>>> index 15b22a6..874d4c2 100644
>>>>> --- a/drivers/pci/dwc/pcie-designware-ep.c
>>>>> +++ b/drivers/pci/dwc/pcie-designware-ep.c
>>>>> @@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
>>>>>  	__dw_pcie_ep_reset_bar(pci, bar, 0);
>>>>>  }
>>>>>  
>>>>> +void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep)
>>>>> +{
>>>>
>>>> This should be implemented in a generic way similar to pci_find_capability().
>>>> It'll be useful when we try to implement other capabilities as well.
>>>
>>> Hum, what you suggest? Something implemented on the pci-epf-core?
>>
>> yeah, Initially thought it could be implemented as a helper function in
>> pci-epc-core so that both designware and cadence can use it.
> 
> That would be nice, however I couldn't find out how to access the config space,
> through the pci_epf or pci_epc structs.

It's just a helper function so it can directly take the base address of the
configuration space as argument (in our case, it should be dbi_base).

Thanks
Kishon

> 
> So, I reworked the functions like this:
> 
> (on pcie-designware-ep.c)
> 
> u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr,
>                               u8 cap)
> {
>         u8 cap_id, next_cap_ptr;
>         u16 reg;
> 
>         reg = dw_pcie_readw_dbi(pci, cap_ptr);
>         next_cap_ptr = (reg & 0xff00) >> 8;
>         cap_id = (reg & 0x00ff);
> 
>         if (!next_cap_ptr || cap_id > PCI_CAP_ID_MAX)
>                 return 0;
> 
>         if (cap_id == cap)
>                 return cap_ptr;
> 
>         return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap);
> }
> 
> u8 dw_pcie_ep_find_capability(struct dw_pcie *pci, u8 cap)
> {
>         u8 next_cap_ptr;
>         u16 reg;
> 
>         reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
>         next_cap_ptr = (reg & 0x00ff);
> 
>         if (!next_cap_ptr)
>                 return 0;
> 
>         return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap);
> }
> 
> int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> {
> [...]
>         ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
>         ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX);
> [...]
> }
> 
>>
>> But do we really have to find the address like this? since all designware IP's
>> will have a particular capability at a fixed address offset, why not follow use
>> existing mechanism in dw_pcie_ep_get_msi?
> 
> The capabilities are not fixed to a specific address offset by default they
> assume those values, but they can be easily change at design stage.
> 
>>
>> Or is it possible for a particular capability to have address offsets for
>> different vendors? How is it for cadence?
> 
> Yes, it's possible to have different address offset for different vendors.
> 
>>
>> Thanks
>> Kishon
>>
> 
> Thanks,
> Gustavo
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 06/10] misc: pci_endpoint_test: Add MSI-X support
From: Gustavo Pimentel @ 2018-04-24 11:11 UTC (permalink / raw)
  To: Alan Douglas, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR07MB45121288F5ACECF3AB3A6D0AD8880@SN6PR07MB4512.namprd07.prod.outlook.com>

Hi Alan,

On 24/04/2018 07:59, Alan Douglas wrote:
> Hi Gustavo,
> 
> On 10 April 2018 18:15 Gustavo Pimentel wrote:
>>
>> Adds the MSI-X support and updates driver documentation accordingly.
>>
>> Changes the driver parameter in order to allow the interruption type
>> selection.
>>
>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>> ---
>>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>>  drivers/misc/pci_endpoint_test.c                 | 102 +++++++++++++++++------
>>  2 files changed, 79 insertions(+), 26 deletions(-)
>>
>> diff --git a/Documentation/misc-devices/pci-endpoint-test.txt
>> b/Documentation/misc-devices/pci-endpoint-test.txt
>> index 4ebc359..fdfa0f6 100644
>> --- a/Documentation/misc-devices/pci-endpoint-test.txt
>> +++ b/Documentation/misc-devices/pci-endpoint-test.txt
>> @@ -10,6 +10,7 @@ The PCI driver for the test device performs the
>> following tests
>>  	*) verifying addresses programmed in BAR
>>  	*) raise legacy IRQ
>>  	*) raise MSI IRQ
>> +	*) raise MSI-X IRQ
>>  	*) read data
>>  	*) write data
>>  	*) copy data
>> @@ -25,6 +26,8 @@ ioctl
>>   PCITEST_LEGACY_IRQ: Tests legacy IRQ
>>   PCITEST_MSI: Tests message signalled interrupts. The MSI number
>>  	      to be tested should be passed as argument.
>> + PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number
>> +	      to be tested should be passed as argument.
>>   PCITEST_WRITE: Perform write tests. The size of the buffer should be passed
>>  		as argument.
>>   PCITEST_READ: Perform read tests. The size of the buffer should be passed
>> diff --git a/drivers/misc/pci_endpoint_test.c
>> b/drivers/misc/pci_endpoint_test.c
>> index 37db0fc..a7d9354 100644
>> --- a/drivers/misc/pci_endpoint_test.c
>> +++ b/drivers/misc/pci_endpoint_test.c
>> @@ -42,11 +42,16 @@
>>  #define PCI_ENDPOINT_TEST_COMMAND	0x4
>>  #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
>>  #define COMMAND_RAISE_MSI_IRQ		BIT(1)
>> -#define MSI_NUMBER_SHIFT		2
>> -/* 6 bits for MSI number */
>> -#define COMMAND_READ                    BIT(8)
>> -#define COMMAND_WRITE                   BIT(9)
>> -#define COMMAND_COPY                    BIT(10)
>> +#define COMMAND_RAISE_MSIX_IRQ		BIT(2)
>> +#define IRQ_TYPE_SHIFT			3
>> +#define IRQ_TYPE_LEGACY			0
>> +#define IRQ_TYPE_MSI			1
>> +#define IRQ_TYPE_MSIX			2
>> +#define MSI_NUMBER_SHIFT		5
>> +/* 12 bits for MSI number */
>> +#define COMMAND_READ                    BIT(17)
>> +#define COMMAND_WRITE                   BIT(18)
>> +#define COMMAND_COPY                    BIT(19)
>>
>>  #define PCI_ENDPOINT_TEST_STATUS	0x8
>>  #define STATUS_READ_SUCCESS             BIT(0)
>> @@ -73,9 +78,9 @@ static DEFINE_IDA(pci_endpoint_test_ida);
>>  #define to_endpoint_test(priv) container_of((priv), struct pci_endpoint_test,
>> \
>>  					    miscdev)
>>
>> -static bool no_msi;
>> -module_param(no_msi, bool, 0444);
>> -MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in
>> pci_endpoint_test");
>> +static int irq_type = IRQ_TYPE_MSIX;
>> +module_param(irq_type, int, 0444);
>> +MODULE_PARM_DESC(irq_type, "IRQ mode selection in pci_endpoint_test
>> (0
>> +- Legacy, 1 - MSI, 2 - MSI-X)");
>>
>>  enum pci_barno {
>>  	BAR_0,
>> @@ -103,7 +108,7 @@ struct pci_endpoint_test {  struct
>> pci_endpoint_test_data {
>>  	enum pci_barno test_reg_bar;
>>  	size_t alignment;
>> -	bool no_msi;
>> +	int irq_type;
>>  };
>>
>>  static inline u32 pci_endpoint_test_readl(struct pci_endpoint_test *test,
>> @@ -177,10 +182,10 @@ static bool pci_endpoint_test_bar(struct
>> pci_endpoint_test *test,
>>
>>  static bool pci_endpoint_test_legacy_irq(struct pci_endpoint_test *test)  {
>> -	u32 val;
>> +	u32 val = COMMAND_RAISE_LEGACY_IRQ;
>>
>> -	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
>> -				 COMMAND_RAISE_LEGACY_IRQ);
>> +	val |= (IRQ_TYPE_LEGACY << IRQ_TYPE_SHIFT);
>> +	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
>>  	val = wait_for_completion_timeout(&test->irq_raised,
>>  					  msecs_to_jiffies(1000));
>>  	if (!val)
>> @@ -192,12 +197,12 @@ static bool pci_endpoint_test_legacy_irq(struct
>> pci_endpoint_test *test)  static bool pci_endpoint_test_msi_irq(struct
>> pci_endpoint_test *test,
>>  				      u8 msi_num)
>>  {
>> -	u32 val;
>> +	u32 val = COMMAND_RAISE_MSI_IRQ;
>>  	struct pci_dev *pdev = test->pdev;
>>
>> -	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
>> -				 msi_num << MSI_NUMBER_SHIFT |
>> -				 COMMAND_RAISE_MSI_IRQ);
>> +	val |= (msi_num << MSI_NUMBER_SHIFT);
>> +	val |= (IRQ_TYPE_MSI << IRQ_TYPE_SHIFT);
>> +	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
>>  	val = wait_for_completion_timeout(&test->irq_raised,
>>  					  msecs_to_jiffies(1000));
>>  	if (!val)
>> @@ -209,6 +214,26 @@ static bool pci_endpoint_test_msi_irq(struct
>> pci_endpoint_test *test,
>>  	return false;
>>  }
>>
>> +static bool pci_endpoint_test_msix_irq(struct pci_endpoint_test *test,
>> +				       u16 msix_num)
>> +{
>> +	u32 val = COMMAND_RAISE_MSIX_IRQ;
>> +	struct pci_dev *pdev = test->pdev;
>> +
>> +	val |= (msix_num << MSI_NUMBER_SHIFT);
>> +	val |= (IRQ_TYPE_MSIX << IRQ_TYPE_SHIFT);
>> +	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
>> +	val = wait_for_completion_timeout(&test->irq_raised,
>> +					  msecs_to_jiffies(1000));
>> +	if (!val)
>> +		return false;
>> +
>> +	if (test->last_irq - pdev->irq == msix_num - 1)
>> +		return true;
> I think we should use pci_irq_vector() to convert from device vector to Linux IRQ.
> It can be done in pci_endpoint_test_irqhandler()
> (With MSI, pdev->irq will be updated during MSI init, but it is not for MSI-X.)
> 
> pci_irq_vector()  should also be used for devm_request_irq() and devm_free_irq()
> so some changes required in probe and remove.

We could try it, sounds good.
Let's see if Kishon also agrees.

> 
> Regards,
> Alan
> 

Regards,
Gustavo

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 06/10] misc: pci_endpoint_test: Add MSI-X support
From: Gustavo Pimentel @ 2018-04-24 10:57 UTC (permalink / raw)
  To: Kishon Vijay Abraham I, bhelgaas@google.com,
	lorenzo.pieralisi@arm.com, Joao.Pinto@synopsys.com,
	jingoohan1@gmail.com, adouglas@cadence.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <a982ca92-d34c-12ea-30bf-9a9b1661d50d@ti.com>

Hi Kishon,

On 24/04/2018 08:19, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On Tuesday 17 April 2018 11:08 PM, Gustavo Pimentel wrote:
>> Hi Kishon,
>>
>> On 17/04/2018 11:33, Kishon Vijay Abraham I wrote:
>>> Hi,
>>>
>>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
>>>> Adds the MSI-X support and updates driver documentation accordingly.
>>>>
>>>> Changes the driver parameter in order to allow the interruption type
>>>> selection.
>>>>
>>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>>>> ---
>>>>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>>>>  drivers/misc/pci_endpoint_test.c                 | 102 +++++++++++++++++------
>>>>  2 files changed, 79 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/Documentation/misc-devices/pci-endpoint-test.txt b/Documentation/misc-devices/pci-endpoint-test.txt
>>>> index 4ebc359..fdfa0f6 100644
>>>> --- a/Documentation/misc-devices/pci-endpoint-test.txt
>>>> +++ b/Documentation/misc-devices/pci-endpoint-test.txt
>>>> @@ -10,6 +10,7 @@ The PCI driver for the test device performs the following tests
>>>>  	*) verifying addresses programmed in BAR
>>>>  	*) raise legacy IRQ
>>>>  	*) raise MSI IRQ
>>>> +	*) raise MSI-X IRQ
>>>>  	*) read data
>>>>  	*) write data
>>>>  	*) copy data
>>>> @@ -25,6 +26,8 @@ ioctl
>>>>   PCITEST_LEGACY_IRQ: Tests legacy IRQ
>>>>   PCITEST_MSI: Tests message signalled interrupts. The MSI number
>>>>  	      to be tested should be passed as argument.
>>>> + PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number
>>>> +	      to be tested should be passed as argument.
>>>>   PCITEST_WRITE: Perform write tests. The size of the buffer should be passed
>>>>  		as argument.
>>>>   PCITEST_READ: Perform read tests. The size of the buffer should be passed
>>>> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
>>>> index 37db0fc..a7d9354 100644
>>>> --- a/drivers/misc/pci_endpoint_test.c
>>>> +++ b/drivers/misc/pci_endpoint_test.c
>>>> @@ -42,11 +42,16 @@
>>>>  #define PCI_ENDPOINT_TEST_COMMAND	0x4
>>>>  #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
>>>>  #define COMMAND_RAISE_MSI_IRQ		BIT(1)
>>>> -#define MSI_NUMBER_SHIFT		2
>>>> -/* 6 bits for MSI number */
>>>> -#define COMMAND_READ                    BIT(8)
>>>> -#define COMMAND_WRITE                   BIT(9)
>>>> -#define COMMAND_COPY                    BIT(10)
>>>> +#define COMMAND_RAISE_MSIX_IRQ		BIT(2)
>>>> +#define IRQ_TYPE_SHIFT			3
>>>> +#define IRQ_TYPE_LEGACY			0
>>>> +#define IRQ_TYPE_MSI			1
>>>> +#define IRQ_TYPE_MSIX			2
>>>> +#define MSI_NUMBER_SHIFT		5
>>>
>>> Now that you are anyways fixing this, add a new register entry for MSI numbers.
>>> Let's not keep COMMAND and MSI's together.
>>
>> What you suggest?
> 
> #define PCI_ENDPOINT_TEST_COMMAND	0x4
> #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
> #define COMMAND_RAISE_MSI_IRQ		BIT(1)
> #define COMMAND_RAISE_MSIX_IRQ		BIT(2)
> #define COMMAND_READ                    BIT(3)
> #define COMMAND_WRITE                   BIT(4)
> #define COMMAND_COPY                    BIT(5)
> 
> #define PCI_ENDPOINT_TEST_STATUS	0x8
> #define STATUS_READ_SUCCESS             BIT(0)
> #define STATUS_READ_FAIL                BIT(1)
> #define STATUS_WRITE_SUCCESS            BIT(2)
> #define STATUS_WRITE_FAIL               BIT(3)
> #define STATUS_COPY_SUCCESS             BIT(4)
> #define STATUS_COPY_FAIL                BIT(5)
> #define STATUS_IRQ_RAISED               BIT(6)
> #define STATUS_SRC_ADDR_INVALID         BIT(7)
> #define STATUS_DST_ADDR_INVALID         BIT(8)
> 
> #define PCI_ENDPOINT_TEST_LOWER_SRC_ADDR	0xc
> #define PCI_ENDPOINT_TEST_UPPER_SRC_ADDR	0x10
> 
> #define PCI_ENDPOINT_TEST_LOWER_DST_ADDR	0x14
> #define PCI_ENDPOINT_TEST_UPPER_DST_ADDR	0x18
> 
> #define PCI_ENDPOINT_TEST_SIZE		0x1c
> #define PCI_ENDPOINT_TEST_CHECKSUM	0x20
> 
> #define PCI_ENDPOINT_TEST_MSI_NUMBER	0x24

Ok. I will do it.

> 
> We should try not to modify either the existing register offsets or the bit
> fields within these registers in the future as EP and RC will be running on
> different systems and it is possible one of them might not have the updated
> kernel.

I totally agree.

>>
>>>> +/* 12 bits for MSI number */
>>>> +#define COMMAND_READ                    BIT(17)
>>>> +#define COMMAND_WRITE                   BIT(18)
>>>> +#define COMMAND_COPY                    BIT(19)
>>>
>>> This change should be done along with the pci-epf-test in a single patch.
>>
>> To be clear, you're saying is this patch should be just be squashed into the
>> patch number 8 [1], because there is a lot of dependencies namely the defines,
>> that is used on the alter functions.
>>
>> [1] -> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_896841_&d=DwIC-g&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=8urVwHCybXa1XMxsEbwHZAzzaEI_HJGXqmWgXpXb9TY&s=MRVr2YPY2Bk_WNFOxBfU4FGrFReTKdPhfzNDLiVxDbs&e=
> 
> yeah. We have to make sure git bisect doesn't break functionality.

Ok, it'll be squashed.

>>
>>>>  
>>>>  #define PCI_ENDPOINT_TEST_STATUS	0x8
>>>>  #define STATUS_READ_SUCCESS             BIT(0)
>>>> @@ -73,9 +78,9 @@ static DEFINE_IDA(pci_endpoint_test_ida);
>>>>  #define to_endpoint_test(priv) container_of((priv), struct pci_endpoint_test, \
>>>>  					    miscdev)
>>>>  
>>>> -static bool no_msi;
>>>> -module_param(no_msi, bool, 0444);
>>>> -MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in pci_endpoint_test");
>>>
>>> Let's not remove this just to make sure existing users doesn't get affected.
>>
>> Hum, by making an internal conversion? Like this
>> no_msi = false <=> irq_type = 1
>> no_msi = true <=> irq_type = 0
> 
Disregard previous comment, it doesn't make sense. I don't know where my head was.

It will be like this on probe:

if (no_msi)
	irq_type = IRQ_TYPE_LEGACY;

However since we are breaking the compatibility on terms of MSI/MSI-X
bits/registers (discussion on the top), it makes sense to keep the compatibility
on this parameter?

> yeah..
> 
> Thanks
> Kishon
> 

Regards,
Gustavo

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC 10/10] tools: PCI: Add MSI-X support
From: Alan Douglas @ 2018-04-24  9:57 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <9865822fb89ea84cb55fe93e151295afcd5c96b3.1523379766.git.gustavo.pimentel@synopsys.com>

Hi Gustavo,

On 10 April 2018 18:15, Gustavo Pimentel wrote:
> Adds MSI-X support to the pcitest tool and modified the pcitest.sh script to
> accomodate this new type of interruption test.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> ---
>  include/uapi/linux/pcitest.h |  1 +
>  tools/pci/pcitest.c          | 18 +++++++++++++++++-
>  tools/pci/pcitest.sh         | 25 +++++++++++++++++++++++++
>  3 files changed, 43 insertions(+), 1 deletion(-)
I found some possible problems when testing with the Cadence EP driver.  The problem
is that pcitest uses the BARs for tests, but we also use one for the MSI-X tables

In Cadence core the MSI-X table is in BAR0 by default, but this is configured to a size
of 0x80 in the test driver, since it is used as the test_reg_bar.  So, I changed the 
configuration to use BAR4 instead, which is configured to a size of 131072 
in pci-efp-test.c, and this gives me enough space.

However, if I run the BAR tests in pcitest before running the MSI-X tests, the
MSI-X tests fail, since the BAR content is overwritten.  It's not a problem with the 
scenario in pcitest.sh, but it would be if the module wasn't re-loaded.

So, wondering if we need to come up with some mechanism to specify that a specific
BAR will be used for MSI-X, and that its size and content shouldn't be modified by
pcitest?

Regards,
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 01/10] PCI: dwc: Add MSI-X callbacks handler
From: Gustavo Pimentel @ 2018-04-24  9:36 UTC (permalink / raw)
  To: Kishon Vijay Abraham I, bhelgaas@google.com,
	lorenzo.pieralisi@arm.com, Joao.Pinto@synopsys.com,
	jingoohan1@gmail.com, adouglas@cadence.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <0e8b8ce9-db12-1e11-3eb5-62f3fa686d59@ti.com>

Hi Kishon,

On 24/04/2018 08:07, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On Monday 23 April 2018 03:06 PM, Gustavo Pimentel wrote:
>> Hi Kishon,
>>
>> On 16/04/2018 10:29, Kishon Vijay Abraham I wrote:
>>> Hi Gustavo,
>>>
>>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
>>>> Changes the pcie_raise_irq function signature, namely the interrupt_num
>>>> variable type from u8 to u16 to accommodate the MSI-X maximum interrupts
>>>> of 2048.
>>>>
>>>> Implements a PCIe config space capability iterator function to search and
>>>> save the MSI and MSI-X pointers. With this method the code becomes more
>>>> generic and flexible.
>>>>
>>>> Implements MSI-X set/get functions for sysfs interface in order to change
>>>> the EP entries number.
>>>>
>>>> Implements EP MSI-X interface for triggering interruptions.
>>>>
>>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>>>> ---
>>>>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
>>>>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
>>>>  drivers/pci/dwc/pcie-designware-ep.c   | 145 ++++++++++++++++++++++++++++++++-
>>>>  drivers/pci/dwc/pcie-designware-plat.c |   6 +-
>>>>  drivers/pci/dwc/pcie-designware.h      |  23 +++++-
>>>>  5 files changed, 173 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>>>> index ed8558d..5265725 100644
>>>> --- a/drivers/pci/dwc/pci-dra7xx.c
>>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>>>> @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct dra7xx_pcie *dra7xx,
>>>>  }
>>>>  
>>>>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>>> -				 enum pci_epc_irq_type type, u8 interrupt_num)
>>>> +				 enum pci_epc_irq_type type, u16 interrupt_num)
>>>>  {
>>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>>>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
>>>> diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
>>>> index e66cede..96dc259 100644
>>>> --- a/drivers/pci/dwc/pcie-artpec6.c
>>>> +++ b/drivers/pci/dwc/pcie-artpec6.c
>>>> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep *ep)
>>>>  }
>>>>  
>>>>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>>> -				  enum pci_epc_irq_type type, u8 interrupt_num)
>>>> +				  enum pci_epc_irq_type type, u16 interrupt_num)
>>>>  {
>>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>>>  
>>>> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
>>>> index 15b22a6..874d4c2 100644
>>>> --- a/drivers/pci/dwc/pcie-designware-ep.c
>>>> +++ b/drivers/pci/dwc/pcie-designware-ep.c
>>>> @@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
>>>>  	__dw_pcie_ep_reset_bar(pci, bar, 0);
>>>>  }
>>>>  
>>>> +void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep)
>>>> +{
>>>
>>> This should be implemented in a generic way similar to pci_find_capability().
>>> It'll be useful when we try to implement other capabilities as well.
>>
>> Hum, what you suggest? Something implemented on the pci-epf-core?
> 
> yeah, Initially thought it could be implemented as a helper function in
> pci-epc-core so that both designware and cadence can use it.

That would be nice, however I couldn't find out how to access the config space,
through the pci_epf or pci_epc structs.

So, I reworked the functions like this:

(on pcie-designware-ep.c)

u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr,
                              u8 cap)
{
        u8 cap_id, next_cap_ptr;
        u16 reg;

        reg = dw_pcie_readw_dbi(pci, cap_ptr);
        next_cap_ptr = (reg & 0xff00) >> 8;
        cap_id = (reg & 0x00ff);

        if (!next_cap_ptr || cap_id > PCI_CAP_ID_MAX)
                return 0;

        if (cap_id == cap)
                return cap_ptr;

        return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap);
}

u8 dw_pcie_ep_find_capability(struct dw_pcie *pci, u8 cap)
{
        u8 next_cap_ptr;
        u16 reg;

        reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
        next_cap_ptr = (reg & 0x00ff);

        if (!next_cap_ptr)
                return 0;

        return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap);
}

int dw_pcie_ep_init(struct dw_pcie_ep *ep)
{
[...]
        ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
        ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX);
[...]
}

> 
> But do we really have to find the address like this? since all designware IP's
> will have a particular capability at a fixed address offset, why not follow use
> existing mechanism in dw_pcie_ep_get_msi?

The capabilities are not fixed to a specific address offset by default they
assume those values, but they can be easily change at design stage.

> 
> Or is it possible for a particular capability to have address offsets for
> different vendors? How is it for cadence?

Yes, it's possible to have different address offset for different vendors.

> 
> Thanks
> Kishon
> 

Thanks,
Gustavo

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC 00/10] Adds pcitest tool support for MSI-X
From: Alan Douglas @ 2018-04-24  9:28 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <767b50e3-8890-c241-bc97-1687bd0661c1@synopsys.com>

On 24 April 2018 09:50, Gustavo Pimentel wrote:
> Hi Alan,
> 
> On 24/04/2018 07:48, Alan Douglas wrote:
> > Hi Gustavo,
> >
> > On 10 April 2018 18:15 Gustavo Pimentel wrote:
> >>  https://lkml.org/lkml/2018/4/10/421
> >>  This series aims to add pcitest tool support for MSI-X.
> >> Includes new callbacks methods and handlers to trigger the MSI-X
> >> interruptions on the EP Designware IP driver.
> >>
> >> Provides new methods on pci_epf_test driver that allows to set/get EP
> >> maximum number of MSI-X entries (similar to set/get MSI methods).
> >>
> >> Reworks on MSI set/get and triggering methods on EP Designware IP
> >> driver to be more generic and flexible.
> >>
> >> Adds a new input parameter (msix) and replicates the whole MSI
> >> mechanism applied to the MSI-X feature on pcitest tool. Also updates
> >> the pcitest script with the new test set applied to this new feature.
> >>
> >> Gustavo Pimentel (10):
> >>   PCI: dwc: Add MSI-X callbacks handler
> >>   PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
> >>   PCI: endpoint: Add MSI-X interfaces
> >>   PCI: dwc: MSI callbacks handler rework
> >>   PCI: dwc: Add legacy interrupt callback handler
> >>   misc: pci_endpoint_test: Add MSI-X support
> >>   misc: pci_endpoint_test: Replace lower into upper case characters
> >>   PCI: endpoint: functions/pci-epf-test: Add MSI-X support
> >>   PCI: endpoint: functions/pci-epf-test: Replace lower into upper case
> >>     characters
> >>   tools: PCI: Add MSI-X support
> >>
> >>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
> >>  drivers/misc/pci_endpoint_test.c                 | 120 ++++++++++----
> >>  drivers/pci/cadence/pcie-cadence-ep.c            |   2 +-
> >>  drivers/pci/dwc/pci-dra7xx.c                     |   2 +-
> >>  drivers/pci/dwc/pcie-artpec6.c                   |   2 +-
> >>  drivers/pci/dwc/pcie-designware-ep.c             | 201
> >> +++++++++++++++++++++--
> >>  drivers/pci/dwc/pcie-designware-plat.c           |   9 +-
> >>  drivers/pci/dwc/pcie-designware.h                |  40 +++--
> >>  drivers/pci/endpoint/functions/pci-epf-test.c    | 113 +++++++++----
> >>  drivers/pci/endpoint/pci-ep-cfs.c                |  24 +++
> >>  drivers/pci/endpoint/pci-epc-core.c              |  60 ++++++-
> >>  include/linux/pci-epc.h                          |  11 +-
> >>  include/linux/pci-epf.h                          |   1 +
> >>  include/uapi/linux/pcitest.h                     |   1 +
> >>  tools/pci/pcitest.c                              |  18 +-
> >>  tools/pci/pcitest.sh                             |  25 +++
> >>  16 files changed, 528 insertions(+), 104 deletions(-)
> >>
> >> --
> >> 2.7.4
> >>
> > Nice set of patches.  I have tested this with the Cadence EP driver after
> adding MSI-X support, and found a few changes required.
> > I will send you comments.
> 
> Ok, great news!
> 
> Maybe after this patch series submission we could start a new thread about
> new features that could be tested/verified using pcitest. I think this could be
> helpful for everybody.
> 
Great, I think that would definitely be of benefit.
> >
> > Thanks,
> > Alan
> >
> 
> Thanks,
> Gustavo
Thanks,
Alan



^ permalink raw reply

* RE: [RFC 01/10] PCI: dwc: Add MSI-X callbacks handler
From: Alan Douglas @ 2018-04-24  9:15 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <77b7b2687e9618d3f7d1f11c3fc6ecec9a9442ef.1523379766.git.gustavo.pimentel@synopsys.com>

Hi,

On 10 April 2018 18:15 Gustavo Pimentel wrote:
> Changes the pcie_raise_irq function signature, namely the interrupt_num
> variable type from u8 to u16 to accommodate the MSI-X maximum interrupts
> of 2048.
> 
> Implements a PCIe config space capability iterator function to search and save
> the MSI and MSI-X pointers. With this method the code becomes more
> generic and flexible.
> 
> Implements MSI-X set/get functions for sysfs interface in order to change the
> EP entries number.
> 
> Implements EP MSI-X interface for triggering interruptions.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> ---
>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
>  drivers/pci/dwc/pcie-designware-ep.c   | 145
> ++++++++++++++++++++++++++++++++-
>  drivers/pci/dwc/pcie-designware-plat.c |   6 +-
>  drivers/pci/dwc/pcie-designware.h      |  23 +++++-
>  5 files changed, 173 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c index
> ed8558d..5265725 100644
> --- a/drivers/pci/dwc/pci-dra7xx.c
> +++ b/drivers/pci/dwc/pci-dra7xx.c
> @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct
> dra7xx_pcie *dra7xx,  }
> 
>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> -				 enum pci_epc_irq_type type, u8
> interrupt_num)
> +				 enum pci_epc_irq_type type, u16
> interrupt_num)
>  {
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci); diff --git
> a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c index
> e66cede..96dc259 100644
> --- a/drivers/pci/dwc/pcie-artpec6.c
> +++ b/drivers/pci/dwc/pcie-artpec6.c
> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep
> *ep)  }
> 
>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> -				  enum pci_epc_irq_type type, u8
> interrupt_num)
> +				  enum pci_epc_irq_type type, u16
> interrupt_num)
>  {
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> 
> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-
> designware-ep.c
> index 15b22a6..874d4c2 100644
> --- a/drivers/pci/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/dwc/pcie-designware-ep.c
> @@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci,
> enum pci_barno bar)
>  	__dw_pcie_ep_reset_bar(pci, bar, 0);
>  }
> 
> +void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep) {
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u8 next_ptr, curr_ptr, cap_id;
> +	u16 reg;
> +
> +	memset(&ep->cap_addr, 0, sizeof(ep->cap_addr));
> +
> +	reg = dw_pcie_readw_dbi(pci, PCI_STATUS);
> +	if (!(reg & PCI_STATUS_CAP_LIST))
> +		return;
> +
> +	reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
> +	next_ptr = (reg & 0x00ff);
> +	if (!next_ptr)
> +		return;
> +
> +	reg = dw_pcie_readw_dbi(pci, next_ptr);
> +	curr_ptr = next_ptr;
> +	next_ptr = (reg & 0xff00) >> 8;
> +	cap_id = (reg & 0x00ff);
> +
> +	while (next_ptr && (cap_id <= PCI_CAP_ID_MAX)) {
> +		switch (cap_id) {
> +		case PCI_CAP_ID_MSI:
> +			ep->cap_addr.msi_addr = curr_ptr;
> +			break;
> +		case PCI_CAP_ID_MSIX:
> +			ep->cap_addr.msix_addr = curr_ptr;
> +			break;
> +		}
> +		reg = dw_pcie_readw_dbi(pci, next_ptr);
> +		curr_ptr = next_ptr;
> +		next_ptr = (reg & 0xff00) >> 8;
> +		cap_id = (reg & 0x00ff);
> +	}
> +}
> +
>  static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
>  				   struct pci_epf_header *hdr)
>  {
> @@ -241,8 +279,47 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc,
> u8 func_no, u8 encode_int)
>  	return 0;
>  }
> 
> +static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no) {
> +	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 val, reg;
> +
> +	if (ep->cap_addr.msix_addr == 0)
> +		return 0;
> +
> +	reg = ep->cap_addr.msix_addr + PCI_MSIX_FLAGS;
> +	val = dw_pcie_readw_dbi(pci, reg);
> +	if (!(val & PCI_MSIX_FLAGS_ENABLE))
> +		return -EINVAL;
> +
> +	val &= PCI_MSIX_FLAGS_QSIZE;
> +
> +	return val;
> +}
> +
> +static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16
> +interrupts) {
> +	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 val, reg;
> +
> +	if (ep->cap_addr.msix_addr == 0)
> +		return 0;
> +
> +	reg = ep->cap_addr.msix_addr + PCI_MSIX_FLAGS;
> +	val = dw_pcie_readw_dbi(pci, reg);
> +	val &= ~PCI_MSIX_FLAGS_QSIZE;
> +	val |= interrupts;
> +	dw_pcie_dbi_ro_wr_en(pci);
> +	dw_pcie_writew_dbi(pci, reg, val);
> +	dw_pcie_dbi_ro_wr_dis(pci);
> +
> +	return 0;
> +}
> +
>  static int dw_pcie_ep_raise_irq(struct pci_epc *epc, u8 func_no,
> -				enum pci_epc_irq_type type, u8
> interrupt_num)
> +				enum pci_epc_irq_type type, u16
> interrupt_num)
>  {
>  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> 
> @@ -282,6 +359,8 @@ static const struct pci_epc_ops epc_ops = {
>  	.unmap_addr		= dw_pcie_ep_unmap_addr,
>  	.set_msi		= dw_pcie_ep_set_msi,
>  	.get_msi		= dw_pcie_ep_get_msi,
> +	.set_msix		= dw_pcie_ep_set_msix,
> +	.get_msix		= dw_pcie_ep_get_msix,
>  	.raise_irq		= dw_pcie_ep_raise_irq,
>  	.start			= dw_pcie_ep_start,
>  	.stop			= dw_pcie_ep_stop,
> @@ -322,6 +401,60 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep
> *ep, u8 func_no,
>  	return 0;
>  }
> 
> +int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> +			     u16 interrupt_num)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	struct pci_epc *epc = ep->epc;
> +	u16 tbl_offset, bir;
> +	u32 bar_addr_upper, bar_addr_lower;
> +	u32 msg_addr_upper, msg_addr_lower;
> +	u32 reg, msg_data;
> +	u64 tbl_addr, msg_addr, reg_u64;
> +	void __iomem *msix_tbl;
> +	int ret;
> +
> +	reg = ep->cap_addr.msix_addr + PCI_MSIX_TABLE;
> +	tbl_offset = dw_pcie_readl_dbi(pci, reg);
> +	bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
> +	tbl_offset &= PCI_MSIX_TABLE_OFFSET;
> +	tbl_offset >>= 3;
> +
> +	reg = PCI_BASE_ADDRESS_0 + (4 * bir);
> +	bar_addr_lower = dw_pcie_readl_dbi(pci, reg);
> +	reg_u64 = (bar_addr_lower & PCI_BASE_ADDRESS_MEM_TYPE_MASK);
> +	if (reg_u64 == PCI_BASE_ADDRESS_MEM_TYPE_64)
> +		bar_addr_upper = dw_pcie_readl_dbi(pci, reg + 4);
> +	else
> +		bar_addr_upper = 0;
> +
> +	tbl_addr = ((u64) bar_addr_upper) << 32 | bar_addr_lower;
> +	tbl_addr += (tbl_offset + ((interrupt_num - 1) * PCI_MSIX_ENTRY_SIZE));
> +	tbl_addr &= PCI_BASE_ADDRESS_MEM_MASK;
> +
> +	msix_tbl = ioremap_nocache(ep->phys_base + tbl_addr, ep->addr_size);
> +	if (!msix_tbl)
> +		return -EINVAL;
> +
I think you need to check the mask bit in  vector control for the requested IRQ.
You could set the pending bit if masked, but would then need some output 
signal to inform when the mask bit is cleared (or poll it) so the message can be sent
later.

Also, do you need to check PCI_MSIX_FLAGS_ENABLE here as well, or is it checked earlier?

Regards,
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 00/10] Adds pcitest tool support for MSI-X
From: Gustavo Pimentel @ 2018-04-24  8:49 UTC (permalink / raw)
  To: Alan Douglas, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR07MB45123840F0C9487F1CD119A9D8880@SN6PR07MB4512.namprd07.prod.outlook.com>

Hi Alan,

On 24/04/2018 07:48, Alan Douglas wrote:
> Hi Gustavo,
> 
> On 10 April 2018 18:15 Gustavo Pimentel wrote:
>> This patch set depends the following series:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2018_4_10_421&d=DwIFAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=R3h7a4AkN9--FFv6jtHaknKIzx6NieDdkYxyCe4obB8&s=J2A11L2_WwzMdaBheYSTwfjPDA7sx3_DyoGPAYsJ9A4&e=> This series aims to add pcitest tool support for MSI-X.
>>
>> Includes new callbacks methods and handlers to trigger the MSI-X
>> interruptions on the EP Designware IP driver.
>>
>> Provides new methods on pci_epf_test driver that allows to set/get EP
>> maximum number of MSI-X entries (similar to set/get MSI methods).
>>
>> Reworks on MSI set/get and triggering methods on EP Designware IP driver to
>> be more generic and flexible.
>>
>> Adds a new input parameter (msix) and replicates the whole MSI mechanism
>> applied to the MSI-X feature on pcitest tool. Also updates the pcitest script
>> with the new test set applied to this new feature.
>>
>> Gustavo Pimentel (10):
>>   PCI: dwc: Add MSI-X callbacks handler
>>   PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
>>   PCI: endpoint: Add MSI-X interfaces
>>   PCI: dwc: MSI callbacks handler rework
>>   PCI: dwc: Add legacy interrupt callback handler
>>   misc: pci_endpoint_test: Add MSI-X support
>>   misc: pci_endpoint_test: Replace lower into upper case characters
>>   PCI: endpoint: functions/pci-epf-test: Add MSI-X support
>>   PCI: endpoint: functions/pci-epf-test: Replace lower into upper case
>>     characters
>>   tools: PCI: Add MSI-X support
>>
>>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>>  drivers/misc/pci_endpoint_test.c                 | 120 ++++++++++----
>>  drivers/pci/cadence/pcie-cadence-ep.c            |   2 +-
>>  drivers/pci/dwc/pci-dra7xx.c                     |   2 +-
>>  drivers/pci/dwc/pcie-artpec6.c                   |   2 +-
>>  drivers/pci/dwc/pcie-designware-ep.c             | 201
>> +++++++++++++++++++++--
>>  drivers/pci/dwc/pcie-designware-plat.c           |   9 +-
>>  drivers/pci/dwc/pcie-designware.h                |  40 +++--
>>  drivers/pci/endpoint/functions/pci-epf-test.c    | 113 +++++++++----
>>  drivers/pci/endpoint/pci-ep-cfs.c                |  24 +++
>>  drivers/pci/endpoint/pci-epc-core.c              |  60 ++++++-
>>  include/linux/pci-epc.h                          |  11 +-
>>  include/linux/pci-epf.h                          |   1 +
>>  include/uapi/linux/pcitest.h                     |   1 +
>>  tools/pci/pcitest.c                              |  18 +-
>>  tools/pci/pcitest.sh                             |  25 +++
>>  16 files changed, 528 insertions(+), 104 deletions(-)
>>
>> --
>> 2.7.4
>>
> Nice set of patches.  I have tested this with the Cadence EP driver after adding MSI-X support, and found a few changes required.
> I will send you comments.

Ok, great news!

Maybe after this patch series submission we could start a new thread about new
features that could be tested/verified using pcitest. I think this could be
helpful for everybody.

> 
> Thanks,
> Alan
> 

Thanks,
Gustavo

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 06/10] misc: pci_endpoint_test: Add MSI-X support
From: Kishon Vijay Abraham I @ 2018-04-24  7:19 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com,
	adouglas@cadence.com, niklas.cassel@axis.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <c6d1d492-38a0-6a03-7e61-b8b50226734b@synopsys.com>

Hi,

On Tuesday 17 April 2018 11:08 PM, Gustavo Pimentel wrote:
> Hi Kishon,
> 
> On 17/04/2018 11:33, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
>>> Adds the MSI-X support and updates driver documentation accordingly.
>>>
>>> Changes the driver parameter in order to allow the interruption type
>>> selection.
>>>
>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>>> ---
>>>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>>>  drivers/misc/pci_endpoint_test.c                 | 102 +++++++++++++++++------
>>>  2 files changed, 79 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/Documentation/misc-devices/pci-endpoint-test.txt b/Documentation/misc-devices/pci-endpoint-test.txt
>>> index 4ebc359..fdfa0f6 100644
>>> --- a/Documentation/misc-devices/pci-endpoint-test.txt
>>> +++ b/Documentation/misc-devices/pci-endpoint-test.txt
>>> @@ -10,6 +10,7 @@ The PCI driver for the test device performs the following tests
>>>  	*) verifying addresses programmed in BAR
>>>  	*) raise legacy IRQ
>>>  	*) raise MSI IRQ
>>> +	*) raise MSI-X IRQ
>>>  	*) read data
>>>  	*) write data
>>>  	*) copy data
>>> @@ -25,6 +26,8 @@ ioctl
>>>   PCITEST_LEGACY_IRQ: Tests legacy IRQ
>>>   PCITEST_MSI: Tests message signalled interrupts. The MSI number
>>>  	      to be tested should be passed as argument.
>>> + PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number
>>> +	      to be tested should be passed as argument.
>>>   PCITEST_WRITE: Perform write tests. The size of the buffer should be passed
>>>  		as argument.
>>>   PCITEST_READ: Perform read tests. The size of the buffer should be passed
>>> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
>>> index 37db0fc..a7d9354 100644
>>> --- a/drivers/misc/pci_endpoint_test.c
>>> +++ b/drivers/misc/pci_endpoint_test.c
>>> @@ -42,11 +42,16 @@
>>>  #define PCI_ENDPOINT_TEST_COMMAND	0x4
>>>  #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
>>>  #define COMMAND_RAISE_MSI_IRQ		BIT(1)
>>> -#define MSI_NUMBER_SHIFT		2
>>> -/* 6 bits for MSI number */
>>> -#define COMMAND_READ                    BIT(8)
>>> -#define COMMAND_WRITE                   BIT(9)
>>> -#define COMMAND_COPY                    BIT(10)
>>> +#define COMMAND_RAISE_MSIX_IRQ		BIT(2)
>>> +#define IRQ_TYPE_SHIFT			3
>>> +#define IRQ_TYPE_LEGACY			0
>>> +#define IRQ_TYPE_MSI			1
>>> +#define IRQ_TYPE_MSIX			2
>>> +#define MSI_NUMBER_SHIFT		5
>>
>> Now that you are anyways fixing this, add a new register entry for MSI numbers.
>> Let's not keep COMMAND and MSI's together.
> 
> What you suggest?

#define PCI_ENDPOINT_TEST_COMMAND	0x4
#define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
#define COMMAND_RAISE_MSI_IRQ		BIT(1)
#define COMMAND_RAISE_MSIX_IRQ		BIT(2)
#define COMMAND_READ                    BIT(3)
#define COMMAND_WRITE                   BIT(4)
#define COMMAND_COPY                    BIT(5)

#define PCI_ENDPOINT_TEST_STATUS	0x8
#define STATUS_READ_SUCCESS             BIT(0)
#define STATUS_READ_FAIL                BIT(1)
#define STATUS_WRITE_SUCCESS            BIT(2)
#define STATUS_WRITE_FAIL               BIT(3)
#define STATUS_COPY_SUCCESS             BIT(4)
#define STATUS_COPY_FAIL                BIT(5)
#define STATUS_IRQ_RAISED               BIT(6)
#define STATUS_SRC_ADDR_INVALID         BIT(7)
#define STATUS_DST_ADDR_INVALID         BIT(8)

#define PCI_ENDPOINT_TEST_LOWER_SRC_ADDR	0xc
#define PCI_ENDPOINT_TEST_UPPER_SRC_ADDR	0x10

#define PCI_ENDPOINT_TEST_LOWER_DST_ADDR	0x14
#define PCI_ENDPOINT_TEST_UPPER_DST_ADDR	0x18

#define PCI_ENDPOINT_TEST_SIZE		0x1c
#define PCI_ENDPOINT_TEST_CHECKSUM	0x20

#define PCI_ENDPOINT_TEST_MSI_NUMBER	0x24

We should try not to modify either the existing register offsets or the bit
fields within these registers in the future as EP and RC will be running on
different systems and it is possible one of them might not have the updated
kernel.
> 
>>> +/* 12 bits for MSI number */
>>> +#define COMMAND_READ                    BIT(17)
>>> +#define COMMAND_WRITE                   BIT(18)
>>> +#define COMMAND_COPY                    BIT(19)
>>
>> This change should be done along with the pci-epf-test in a single patch.
> 
> To be clear, you're saying is this patch should be just be squashed into the
> patch number 8 [1], because there is a lot of dependencies namely the defines,
> that is used on the alter functions.
> 
> [1] -> https://patchwork.ozlabs.org/patch/896841/

yeah. We have to make sure git bisect doesn't break functionality.
> 
>>>  
>>>  #define PCI_ENDPOINT_TEST_STATUS	0x8
>>>  #define STATUS_READ_SUCCESS             BIT(0)
>>> @@ -73,9 +78,9 @@ static DEFINE_IDA(pci_endpoint_test_ida);
>>>  #define to_endpoint_test(priv) container_of((priv), struct pci_endpoint_test, \
>>>  					    miscdev)
>>>  
>>> -static bool no_msi;
>>> -module_param(no_msi, bool, 0444);
>>> -MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in pci_endpoint_test");
>>
>> Let's not remove this just to make sure existing users doesn't get affected.
> 
> Hum, by making an internal conversion? Like this
> no_msi = false <=> irq_type = 1
> no_msi = true <=> irq_type = 0

yeah..

Thanks
Kishon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 01/10] PCI: dwc: Add MSI-X callbacks handler
From: Kishon Vijay Abraham I @ 2018-04-24  7:07 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com,
	adouglas@cadence.com, niklas.cassel@axis.com,
	jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <e2fe7d5f-9f57-0e67-d321-541ff4706792@synopsys.com>

Hi,

On Monday 23 April 2018 03:06 PM, Gustavo Pimentel wrote:
> Hi Kishon,
> 
> On 16/04/2018 10:29, Kishon Vijay Abraham I wrote:
>> Hi Gustavo,
>>
>> On Tuesday 10 April 2018 10:44 PM, Gustavo Pimentel wrote:
>>> Changes the pcie_raise_irq function signature, namely the interrupt_num
>>> variable type from u8 to u16 to accommodate the MSI-X maximum interrupts
>>> of 2048.
>>>
>>> Implements a PCIe config space capability iterator function to search and
>>> save the MSI and MSI-X pointers. With this method the code becomes more
>>> generic and flexible.
>>>
>>> Implements MSI-X set/get functions for sysfs interface in order to change
>>> the EP entries number.
>>>
>>> Implements EP MSI-X interface for triggering interruptions.
>>>
>>> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
>>> ---
>>>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
>>>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
>>>  drivers/pci/dwc/pcie-designware-ep.c   | 145 ++++++++++++++++++++++++++++++++-
>>>  drivers/pci/dwc/pcie-designware-plat.c |   6 +-
>>>  drivers/pci/dwc/pcie-designware.h      |  23 +++++-
>>>  5 files changed, 173 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>>> index ed8558d..5265725 100644
>>> --- a/drivers/pci/dwc/pci-dra7xx.c
>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>>> @@ -369,7 +369,7 @@ static void dra7xx_pcie_raise_msi_irq(struct dra7xx_pcie *dra7xx,
>>>  }
>>>  
>>>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>> -				 enum pci_epc_irq_type type, u8 interrupt_num)
>>> +				 enum pci_epc_irq_type type, u16 interrupt_num)
>>>  {
>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
>>> diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
>>> index e66cede..96dc259 100644
>>> --- a/drivers/pci/dwc/pcie-artpec6.c
>>> +++ b/drivers/pci/dwc/pcie-artpec6.c
>>> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep *ep)
>>>  }
>>>  
>>>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>>> -				  enum pci_epc_irq_type type, u8 interrupt_num)
>>> +				  enum pci_epc_irq_type type, u16 interrupt_num)
>>>  {
>>>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>>  
>>> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
>>> index 15b22a6..874d4c2 100644
>>> --- a/drivers/pci/dwc/pcie-designware-ep.c
>>> +++ b/drivers/pci/dwc/pcie-designware-ep.c
>>> @@ -40,6 +40,44 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
>>>  	__dw_pcie_ep_reset_bar(pci, bar, 0);
>>>  }
>>>  
>>> +void dw_pcie_ep_find_cap_addr(struct dw_pcie_ep *ep)
>>> +{
>>
>> This should be implemented in a generic way similar to pci_find_capability().
>> It'll be useful when we try to implement other capabilities as well.
> 
> Hum, what you suggest? Something implemented on the pci-epf-core?

yeah, Initially thought it could be implemented as a helper function in
pci-epc-core so that both designware and cadence can use it.

But do we really have to find the address like this? since all designware IP's
will have a particular capability at a fixed address offset, why not follow use
existing mechanism in dw_pcie_ep_get_msi?

Or is it possible for a particular capability to have address offsets for
different vendors? How is it for cadence?

Thanks
Kishon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC 06/10] misc: pci_endpoint_test: Add MSI-X support
From: Alan Douglas @ 2018-04-24  6:59 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <8b88f8c2b766f36c71659deb0fce635154b2b39f.1523379766.git.gustavo.pimentel@synopsys.com>

Hi Gustavo,

On 10 April 2018 18:15 Gustavo Pimentel wrote:
> 
> Adds the MSI-X support and updates driver documentation accordingly.
> 
> Changes the driver parameter in order to allow the interruption type
> selection.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> ---
>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>  drivers/misc/pci_endpoint_test.c                 | 102 +++++++++++++++++------
>  2 files changed, 79 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/misc-devices/pci-endpoint-test.txt
> b/Documentation/misc-devices/pci-endpoint-test.txt
> index 4ebc359..fdfa0f6 100644
> --- a/Documentation/misc-devices/pci-endpoint-test.txt
> +++ b/Documentation/misc-devices/pci-endpoint-test.txt
> @@ -10,6 +10,7 @@ The PCI driver for the test device performs the
> following tests
>  	*) verifying addresses programmed in BAR
>  	*) raise legacy IRQ
>  	*) raise MSI IRQ
> +	*) raise MSI-X IRQ
>  	*) read data
>  	*) write data
>  	*) copy data
> @@ -25,6 +26,8 @@ ioctl
>   PCITEST_LEGACY_IRQ: Tests legacy IRQ
>   PCITEST_MSI: Tests message signalled interrupts. The MSI number
>  	      to be tested should be passed as argument.
> + PCITEST_MSIX: Tests message signalled interrupts. The MSI-X number
> +	      to be tested should be passed as argument.
>   PCITEST_WRITE: Perform write tests. The size of the buffer should be passed
>  		as argument.
>   PCITEST_READ: Perform read tests. The size of the buffer should be passed
> diff --git a/drivers/misc/pci_endpoint_test.c
> b/drivers/misc/pci_endpoint_test.c
> index 37db0fc..a7d9354 100644
> --- a/drivers/misc/pci_endpoint_test.c
> +++ b/drivers/misc/pci_endpoint_test.c
> @@ -42,11 +42,16 @@
>  #define PCI_ENDPOINT_TEST_COMMAND	0x4
>  #define COMMAND_RAISE_LEGACY_IRQ	BIT(0)
>  #define COMMAND_RAISE_MSI_IRQ		BIT(1)
> -#define MSI_NUMBER_SHIFT		2
> -/* 6 bits for MSI number */
> -#define COMMAND_READ                    BIT(8)
> -#define COMMAND_WRITE                   BIT(9)
> -#define COMMAND_COPY                    BIT(10)
> +#define COMMAND_RAISE_MSIX_IRQ		BIT(2)
> +#define IRQ_TYPE_SHIFT			3
> +#define IRQ_TYPE_LEGACY			0
> +#define IRQ_TYPE_MSI			1
> +#define IRQ_TYPE_MSIX			2
> +#define MSI_NUMBER_SHIFT		5
> +/* 12 bits for MSI number */
> +#define COMMAND_READ                    BIT(17)
> +#define COMMAND_WRITE                   BIT(18)
> +#define COMMAND_COPY                    BIT(19)
> 
>  #define PCI_ENDPOINT_TEST_STATUS	0x8
>  #define STATUS_READ_SUCCESS             BIT(0)
> @@ -73,9 +78,9 @@ static DEFINE_IDA(pci_endpoint_test_ida);
>  #define to_endpoint_test(priv) container_of((priv), struct pci_endpoint_test,
> \
>  					    miscdev)
> 
> -static bool no_msi;
> -module_param(no_msi, bool, 0444);
> -MODULE_PARM_DESC(no_msi, "Disable MSI interrupt in
> pci_endpoint_test");
> +static int irq_type = IRQ_TYPE_MSIX;
> +module_param(irq_type, int, 0444);
> +MODULE_PARM_DESC(irq_type, "IRQ mode selection in pci_endpoint_test
> (0
> +- Legacy, 1 - MSI, 2 - MSI-X)");
> 
>  enum pci_barno {
>  	BAR_0,
> @@ -103,7 +108,7 @@ struct pci_endpoint_test {  struct
> pci_endpoint_test_data {
>  	enum pci_barno test_reg_bar;
>  	size_t alignment;
> -	bool no_msi;
> +	int irq_type;
>  };
> 
>  static inline u32 pci_endpoint_test_readl(struct pci_endpoint_test *test,
> @@ -177,10 +182,10 @@ static bool pci_endpoint_test_bar(struct
> pci_endpoint_test *test,
> 
>  static bool pci_endpoint_test_legacy_irq(struct pci_endpoint_test *test)  {
> -	u32 val;
> +	u32 val = COMMAND_RAISE_LEGACY_IRQ;
> 
> -	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
> -				 COMMAND_RAISE_LEGACY_IRQ);
> +	val |= (IRQ_TYPE_LEGACY << IRQ_TYPE_SHIFT);
> +	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
>  	val = wait_for_completion_timeout(&test->irq_raised,
>  					  msecs_to_jiffies(1000));
>  	if (!val)
> @@ -192,12 +197,12 @@ static bool pci_endpoint_test_legacy_irq(struct
> pci_endpoint_test *test)  static bool pci_endpoint_test_msi_irq(struct
> pci_endpoint_test *test,
>  				      u8 msi_num)
>  {
> -	u32 val;
> +	u32 val = COMMAND_RAISE_MSI_IRQ;
>  	struct pci_dev *pdev = test->pdev;
> 
> -	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
> -				 msi_num << MSI_NUMBER_SHIFT |
> -				 COMMAND_RAISE_MSI_IRQ);
> +	val |= (msi_num << MSI_NUMBER_SHIFT);
> +	val |= (IRQ_TYPE_MSI << IRQ_TYPE_SHIFT);
> +	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
>  	val = wait_for_completion_timeout(&test->irq_raised,
>  					  msecs_to_jiffies(1000));
>  	if (!val)
> @@ -209,6 +214,26 @@ static bool pci_endpoint_test_msi_irq(struct
> pci_endpoint_test *test,
>  	return false;
>  }
> 
> +static bool pci_endpoint_test_msix_irq(struct pci_endpoint_test *test,
> +				       u16 msix_num)
> +{
> +	u32 val = COMMAND_RAISE_MSIX_IRQ;
> +	struct pci_dev *pdev = test->pdev;
> +
> +	val |= (msix_num << MSI_NUMBER_SHIFT);
> +	val |= (IRQ_TYPE_MSIX << IRQ_TYPE_SHIFT);
> +	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND, val);
> +	val = wait_for_completion_timeout(&test->irq_raised,
> +					  msecs_to_jiffies(1000));
> +	if (!val)
> +		return false;
> +
> +	if (test->last_irq - pdev->irq == msix_num - 1)
> +		return true;
I think we should use pci_irq_vector() to convert from device vector to Linux IRQ.
It can be done in pci_endpoint_test_irqhandler()
(With MSI, pdev->irq will be updated during MSI init, but it is not for MSI-X.)

pci_irq_vector()  should also be used for devm_request_irq() and devm_free_irq()
so some changes required in probe and remove.

Regards,
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC 00/10] Adds pcitest tool support for MSI-X
From: Alan Douglas @ 2018-04-24  6:48 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	Joao.Pinto@synopsys.com, jingoohan1@gmail.com, kishon@ti.com,
	niklas.cassel@axis.com, jesper.nilsson@axis.com
  Cc: linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <cover.1523379766.git.gustavo.pimentel@synopsys.com>

Hi Gustavo,

On 10 April 2018 18:15 Gustavo Pimentel wrote:
> This patch set depends the following series:
> https://lkml.org/lkml/2018/4/10/421> This series aims to add pcitest tool support for MSI-X.
> 
> Includes new callbacks methods and handlers to trigger the MSI-X
> interruptions on the EP Designware IP driver.
> 
> Provides new methods on pci_epf_test driver that allows to set/get EP
> maximum number of MSI-X entries (similar to set/get MSI methods).
> 
> Reworks on MSI set/get and triggering methods on EP Designware IP driver to
> be more generic and flexible.
> 
> Adds a new input parameter (msix) and replicates the whole MSI mechanism
> applied to the MSI-X feature on pcitest tool. Also updates the pcitest script
> with the new test set applied to this new feature.
> 
> Gustavo Pimentel (10):
>   PCI: dwc: Add MSI-X callbacks handler
>   PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
>   PCI: endpoint: Add MSI-X interfaces
>   PCI: dwc: MSI callbacks handler rework
>   PCI: dwc: Add legacy interrupt callback handler
>   misc: pci_endpoint_test: Add MSI-X support
>   misc: pci_endpoint_test: Replace lower into upper case characters
>   PCI: endpoint: functions/pci-epf-test: Add MSI-X support
>   PCI: endpoint: functions/pci-epf-test: Replace lower into upper case
>     characters
>   tools: PCI: Add MSI-X support
> 
>  Documentation/misc-devices/pci-endpoint-test.txt |   3 +
>  drivers/misc/pci_endpoint_test.c                 | 120 ++++++++++----
>  drivers/pci/cadence/pcie-cadence-ep.c            |   2 +-
>  drivers/pci/dwc/pci-dra7xx.c                     |   2 +-
>  drivers/pci/dwc/pcie-artpec6.c                   |   2 +-
>  drivers/pci/dwc/pcie-designware-ep.c             | 201
> +++++++++++++++++++++--
>  drivers/pci/dwc/pcie-designware-plat.c           |   9 +-
>  drivers/pci/dwc/pcie-designware.h                |  40 +++--
>  drivers/pci/endpoint/functions/pci-epf-test.c    | 113 +++++++++----
>  drivers/pci/endpoint/pci-ep-cfs.c                |  24 +++
>  drivers/pci/endpoint/pci-epc-core.c              |  60 ++++++-
>  include/linux/pci-epc.h                          |  11 +-
>  include/linux/pci-epf.h                          |   1 +
>  include/uapi/linux/pcitest.h                     |   1 +
>  tools/pci/pcitest.c                              |  18 +-
>  tools/pci/pcitest.sh                             |  25 +++
>  16 files changed, 528 insertions(+), 104 deletions(-)
> 
> --
> 2.7.4
> 
Nice set of patches.  I have tested this with the Cadence EP driver after adding MSI-X support, and found a few changes required.
I will send you comments.

Thanks,
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 2/7] docs/vm: ksm: (mostly) formatting updates
From: Mike Rapoport @ 2018-04-24  6:40 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Andrea Arcangeli, linux-doc, linux-mm, lkml,
	Mike Rapoport
In-Reply-To: <1524552028-7017-1-git-send-email-rppt@linux.vnet.ibm.com>

Aside from the formatting:
* fixed typos
* added section and sub-section headers
* moved ksmd overview after the description of KSM origins

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/ksm.rst | 110 +++++++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 41 deletions(-)

diff --git a/Documentation/vm/ksm.rst b/Documentation/vm/ksm.rst
index 87e7eef..786d460 100644
--- a/Documentation/vm/ksm.rst
+++ b/Documentation/vm/ksm.rst
@@ -4,34 +4,50 @@
 Kernel Samepage Merging
 =======================
 
+Overview
+========
+
 KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
 added to the Linux kernel in 2.6.32.  See ``mm/ksm.c`` for its implementation,
 and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/
 
-The KSM daemon ksmd periodically scans those areas of user memory which
-have been registered with it, looking for pages of identical content which
-can be replaced by a single write-protected page (which is automatically
-copied if a process later wants to update its content).
-
 KSM was originally developed for use with KVM (where it was known as
 Kernel Shared Memory), to fit more virtual machines into physical memory,
 by sharing the data common between them.  But it can be useful to any
 application which generates many instances of the same data.
 
+The KSM daemon ksmd periodically scans those areas of user memory
+which have been registered with it, looking for pages of identical
+content which can be replaced by a single write-protected page (which
+is automatically copied if a process later wants to update its
+content). The amount of pages that KSM daemon scans in a single pass
+and the time between the passes are configured using :ref:`sysfs
+intraface <ksm_sysfs>`
+
 KSM only merges anonymous (private) pages, never pagecache (file) pages.
 KSM's merged pages were originally locked into kernel memory, but can now
 be swapped out just like other user pages (but sharing is broken when they
 are swapped back in: ksmd must rediscover their identity and merge again).
 
+Controlling KSM with madvise
+============================
+
 KSM only operates on those areas of address space which an application
 has advised to be likely candidates for merging, by using the madvise(2)
-system call: int madvise(addr, length, MADV_MERGEABLE).
+system call::
+
+	int madvise(addr, length, MADV_MERGEABLE)
+
+The app may call
+
+::
+
+	int madvise(addr, length, MADV_UNMERGEABLE)
 
-The app may call int madvise(addr, length, MADV_UNMERGEABLE) to cancel
-that advice and restore unshared pages: whereupon KSM unmerges whatever
-it merged in that range.  Note: this unmerging call may suddenly require
-more memory than is available - possibly failing with EAGAIN, but more
-probably arousing the Out-Of-Memory killer.
+to cancel that advice and restore unshared pages: whereupon KSM
+unmerges whatever it merged in that range.  Note: this unmerging call
+may suddenly require more memory than is available - possibly failing
+with EAGAIN, but more probably arousing the Out-Of-Memory killer.
 
 If KSM is not configured into the running kernel, madvise MADV_MERGEABLE
 and MADV_UNMERGEABLE simply fail with EINVAL.  If the running kernel was
@@ -43,7 +59,7 @@ MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
 
 If a region of memory must be split into at least one new MADV_MERGEABLE
 or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
-will exceed vm.max_map_count (see Documentation/sysctl/vm.txt).
+will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt).
 
 Like other madvise calls, they are intended for use on mapped areas of
 the user address space: they will report ENOMEM if the specified range
@@ -54,21 +70,28 @@ Applications should be considerate in their use of MADV_MERGEABLE,
 restricting its use to areas likely to benefit.  KSM's scans may use a lot
 of processing power: some installations will disable KSM for that reason.
 
+.. _ksm_sysfs:
+
+KSM daemon sysfs interface
+==========================
+
 The KSM daemon is controlled by sysfs files in ``/sys/kernel/mm/ksm/``,
 readable by all but writable only by root:
 
 pages_to_scan
-        how many present pages to scan before ksmd goes to sleep
-        e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan`` Default: 100
-        (chosen for demonstration purposes)
+        how many pages to scan before ksmd goes to sleep
+        e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``.
+
+        Default: 100 (chosen for demonstration purposes)
 
 sleep_millisecs
         how many milliseconds ksmd should sleep before next scan
-        e.g. ``echo 20 > /sys/kernel/mm/ksm/sleep_millisecs`` Default: 20
-        (chosen for demonstration purposes)
+        e.g. ``echo 20 > /sys/kernel/mm/ksm/sleep_millisecs``
+
+        Default: 20 (chosen for demonstration purposes)
 
 merge_across_nodes
-        specifies if pages from different numa nodes can be merged.
+        specifies if pages from different NUMA nodes can be merged.
         When set to 0, ksm merges only pages which physically reside
         in the memory area of same NUMA node. That brings lower
         latency to access of shared pages. Systems with more nodes, at
@@ -77,19 +100,21 @@ merge_across_nodes
         minimize memory usage, are likely to benefit from the greater
         sharing of setting 1 (default). You may wish to compare how
         your system performs under each setting, before deciding on
-        which to use. merge_across_nodes setting can be changed only
-        when there are no ksm shared pages in system: set run 2 to
+        which to use. ``merge_across_nodes`` setting can be changed only
+        when there are no ksm shared pages in the system: set run 2 to
         unmerge pages first, then to 1 after changing
-        merge_across_nodes, to remerge according to the new setting.
+        ``merge_across_nodes``, to remerge according to the new setting.
+
         Default: 1 (merging across nodes as in earlier releases)
 
 run
-        set 0 to stop ksmd from running but keep merged pages,
-        set 1 to run ksmd e.g. ``echo 1 > /sys/kernel/mm/ksm/run``,
-        set 2 to stop ksmd and unmerge all pages currently merged, but
-        leave mergeable areas registered for next run Default: 0 (must
-        be changed to 1 to activate KSM, except if CONFIG_SYSFS is
-        disabled)
+        * set to 0 to stop ksmd from running but keep merged pages,
+        * set to 1 to run ksmd e.g. ``echo 1 > /sys/kernel/mm/ksm/run``,
+        * set to 2 to stop ksmd and unmerge all pages currently merged, but
+	  leave mergeable areas registered for next run.
+
+        Default: 0 (must be changed to 1 to activate KSM, except if
+        CONFIG_SYSFS is disabled)
 
 use_zero_pages
         specifies whether empty pages (i.e. allocated pages that only
@@ -102,8 +127,9 @@ use_zero_pages
         KSM for some workloads, for example if the checksums of pages
         candidate for merging match the checksum of an empty
         page. This setting can be changed at any time, it is only
-        effective for pages merged after the change.  Default: 0
-        (normal KSM behaviour as in earlier releases)
+        effective for pages merged after the change.
+
+        Default: 0 (normal KSM behaviour as in earlier releases)
 
 max_page_sharing
         Maximum sharing allowed for each KSM page. This enforces a
@@ -112,7 +138,7 @@ max_page_sharing
         page will have at least two sharers. The rmap walk has O(N)
         complexity where N is the number of rmap_items (i.e. virtual
         mappings) that are sharing the page, which is in turn capped
-        by max_page_sharing. So this effectively spread the the linear
+        by ``max_page_sharing``. So this effectively spreads the linear
         O(N) computational complexity from rmap walk context over
         different KSM pages. The ksmd walk over the stable_node
         "chains" is also O(N), but N is the number of stable_node
@@ -140,7 +166,7 @@ stable_node_chains_prune_millisecs
         metadata with lower latency, but they will make ksmd use more
         CPU during the scan. This only applies to the stable_node
         chains so it's a noop if not a single KSM page hit the
-        max_page_sharing yet (there would be no stable_node chains in
+        ``max_page_sharing`` yet (there would be no stable_node chains in
         such case).
 
 The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
@@ -157,27 +183,29 @@ full_scans
         how many times all mergeable areas have been scanned
 stable_node_chains
         number of stable node chains allocated, this is effectively
-        the number of KSM pages that hit the max_page_sharing limit
+        the number of KSM pages that hit the ``max_page_sharing`` limit
 stable_node_dups
         number of stable node dups queued into the stable_node chains
 
-A high ratio of pages_sharing to pages_shared indicates good sharing, but
-a high ratio of pages_unshared to pages_sharing indicates wasted effort.
-pages_volatile embraces several different kinds of activity, but a high
-proportion there would also indicate poor use of madvise MADV_MERGEABLE.
+A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
+sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
+indicates wasted effort.  ``pages_volatile`` embraces several
+different kinds of activity, but a high proportion there would also
+indicate poor use of madvise MADV_MERGEABLE.
 
-The maximum possible page_sharing/page_shared ratio is limited by the
-max_page_sharing tunable. To increase the ratio max_page_sharing must
+The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
+``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
 be increased accordingly.
 
-The stable_node_dups/stable_node_chains ratio is also affected by the
-max_page_sharing tunable, and an high ratio may indicate fragmentation
+The ``stable_node_dups/stable_node_chains`` ratio is also affected by the
+``max_page_sharing`` tunable, and an high ratio may indicate fragmentation
 in the stable_node dups, which could be solved by introducing
 fragmentation algorithms in ksmd which would refile rmap_items from
-one stable_node dup to another stable_node dup, in order to freeup
+one stable_node dup to another stable_node dup, in order to free up
 stable_node "dups" with few rmap_items in them, but that may increase
 the ksmd CPU usage and possibly slowdown the readonly computations on
 the KSM pages of the applications.
 
+--
 Izik Eidus,
 Hugh Dickins, 17 Nov 2009
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 6/7] docs/vm: ksm: udpate description of stable_node_{dups,chains}
From: Mike Rapoport @ 2018-04-24  6:40 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Andrea Arcangeli, linux-doc, linux-mm, lkml,
	Mike Rapoport
In-Reply-To: <1524552028-7017-1-git-send-email-rppt@linux.vnet.ibm.com>

Remove implementation details from sysfs parameter descriptions.
Also move the paragraph discussing fragmentation issues and their possible
solution to the "Design" section.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/ksm.rst | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/vm/ksm.rst b/Documentation/vm/ksm.rst
index 18d7c71..afcf5a8 100644
--- a/Documentation/vm/ksm.rst
+++ b/Documentation/vm/ksm.rst
@@ -170,10 +170,9 @@ pages_volatile
 full_scans
         how many times all mergeable areas have been scanned
 stable_node_chains
-        number of stable node chains allocated, this is effectively
         the number of KSM pages that hit the ``max_page_sharing`` limit
 stable_node_dups
-        number of stable node dups queued into the stable_node chains
+        number of duplicated KSM pages
 
 A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
 sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
@@ -185,15 +184,6 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
 ``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
 be increased accordingly.
 
-The ``stable_node_dups/stable_node_chains`` ratio is also affected by the
-``max_page_sharing`` tunable, and an high ratio may indicate fragmentation
-in the stable_node dups, which could be solved by introducing
-fragmentation algorithms in ksmd which would refile rmap_items from
-one stable_node dup to another stable_node dup, in order to free up
-stable_node "dups" with few rmap_items in them, but that may increase
-the ksmd CPU usage and possibly slowdown the readonly computations on
-the KSM pages of the applications.
-
 Design
 ======
 
@@ -247,6 +237,15 @@ deduplication factor at the expense of slower worst case for rmap
 walks for any KSM page which can happen during swapping, compaction,
 NUMA balancing and page migration.
 
+The ``stable_node_dups/stable_node_chains`` ratio is also affected by the
+``max_page_sharing`` tunable, and an high ratio may indicate fragmentation
+in the stable_node dups, which could be solved by introducing
+fragmentation algorithms in ksmd which would refile rmap_items from
+one stable_node dup to another stable_node dup, in order to free up
+stable_node "dups" with few rmap_items in them, but that may increase
+the ksmd CPU usage and possibly slowdown the readonly computations on
+the KSM pages of the applications.
+
 The whole list of stable_node "dups" linked in the stable_node
 "chains" is scanned periodically in order to prune stale stable_nodes.
 The frequency of such scans is defined by
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 7/7] docs/vm: ksm: split userspace interface to admin-guide/mm/ksm.rst
From: Mike Rapoport @ 2018-04-24  6:40 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Andrea Arcangeli, linux-doc, linux-mm, lkml,
	Mike Rapoport
In-Reply-To: <1524552028-7017-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/admin-guide/mm/index.rst |   1 +
 Documentation/admin-guide/mm/ksm.rst   | 189 +++++++++++++++++++++++++++++++++
 Documentation/vm/ksm.rst               | 176 +-----------------------------
 3 files changed, 191 insertions(+), 175 deletions(-)
 create mode 100644 Documentation/admin-guide/mm/ksm.rst

diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 6c8b554..ad28644 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -23,6 +23,7 @@ the Linux memory management.
 
    hugetlbpage
    idle_page_tracking
+   ksm
    pagemap
    soft-dirty
    userfaultfd
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
new file mode 100644
index 0000000..9303786
--- /dev/null
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -0,0 +1,189 @@
+.. _admin_guide_ksm:
+
+=======================
+Kernel Samepage Merging
+=======================
+
+Overview
+========
+
+KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
+added to the Linux kernel in 2.6.32.  See ``mm/ksm.c`` for its implementation,
+and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/
+
+KSM was originally developed for use with KVM (where it was known as
+Kernel Shared Memory), to fit more virtual machines into physical memory,
+by sharing the data common between them.  But it can be useful to any
+application which generates many instances of the same data.
+
+The KSM daemon ksmd periodically scans those areas of user memory
+which have been registered with it, looking for pages of identical
+content which can be replaced by a single write-protected page (which
+is automatically copied if a process later wants to update its
+content). The amount of pages that KSM daemon scans in a single pass
+and the time between the passes are configured using :ref:`sysfs
+intraface <ksm_sysfs>`
+
+KSM only merges anonymous (private) pages, never pagecache (file) pages.
+KSM's merged pages were originally locked into kernel memory, but can now
+be swapped out just like other user pages (but sharing is broken when they
+are swapped back in: ksmd must rediscover their identity and merge again).
+
+Controlling KSM with madvise
+============================
+
+KSM only operates on those areas of address space which an application
+has advised to be likely candidates for merging, by using the madvise(2)
+system call::
+
+	int madvise(addr, length, MADV_MERGEABLE)
+
+The app may call
+
+::
+
+	int madvise(addr, length, MADV_UNMERGEABLE)
+
+to cancel that advice and restore unshared pages: whereupon KSM
+unmerges whatever it merged in that range.  Note: this unmerging call
+may suddenly require more memory than is available - possibly failing
+with EAGAIN, but more probably arousing the Out-Of-Memory killer.
+
+If KSM is not configured into the running kernel, madvise MADV_MERGEABLE
+and MADV_UNMERGEABLE simply fail with EINVAL.  If the running kernel was
+built with CONFIG_KSM=y, those calls will normally succeed: even if the
+the KSM daemon is not currently running, MADV_MERGEABLE still registers
+the range for whenever the KSM daemon is started; even if the range
+cannot contain any pages which KSM could actually merge; even if
+MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
+
+If a region of memory must be split into at least one new MADV_MERGEABLE
+or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
+will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt).
+
+Like other madvise calls, they are intended for use on mapped areas of
+the user address space: they will report ENOMEM if the specified range
+includes unmapped gaps (though working on the intervening mapped areas),
+and might fail with EAGAIN if not enough memory for internal structures.
+
+Applications should be considerate in their use of MADV_MERGEABLE,
+restricting its use to areas likely to benefit.  KSM's scans may use a lot
+of processing power: some installations will disable KSM for that reason.
+
+.. _ksm_sysfs:
+
+KSM daemon sysfs interface
+==========================
+
+The KSM daemon is controlled by sysfs files in ``/sys/kernel/mm/ksm/``,
+readable by all but writable only by root:
+
+pages_to_scan
+        how many pages to scan before ksmd goes to sleep
+        e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``.
+
+        Default: 100 (chosen for demonstration purposes)
+
+sleep_millisecs
+        how many milliseconds ksmd should sleep before next scan
+        e.g. ``echo 20 > /sys/kernel/mm/ksm/sleep_millisecs``
+
+        Default: 20 (chosen for demonstration purposes)
+
+merge_across_nodes
+        specifies if pages from different NUMA nodes can be merged.
+        When set to 0, ksm merges only pages which physically reside
+        in the memory area of same NUMA node. That brings lower
+        latency to access of shared pages. Systems with more nodes, at
+        significant NUMA distances, are likely to benefit from the
+        lower latency of setting 0. Smaller systems, which need to
+        minimize memory usage, are likely to benefit from the greater
+        sharing of setting 1 (default). You may wish to compare how
+        your system performs under each setting, before deciding on
+        which to use. ``merge_across_nodes`` setting can be changed only
+        when there are no ksm shared pages in the system: set run 2 to
+        unmerge pages first, then to 1 after changing
+        ``merge_across_nodes``, to remerge according to the new setting.
+
+        Default: 1 (merging across nodes as in earlier releases)
+
+run
+        * set to 0 to stop ksmd from running but keep merged pages,
+        * set to 1 to run ksmd e.g. ``echo 1 > /sys/kernel/mm/ksm/run``,
+        * set to 2 to stop ksmd and unmerge all pages currently merged, but
+	  leave mergeable areas registered for next run.
+
+        Default: 0 (must be changed to 1 to activate KSM, except if
+        CONFIG_SYSFS is disabled)
+
+use_zero_pages
+        specifies whether empty pages (i.e. allocated pages that only
+        contain zeroes) should be treated specially.  When set to 1,
+        empty pages are merged with the kernel zero page(s) instead of
+        with each other as it would happen normally. This can improve
+        the performance on architectures with coloured zero pages,
+        depending on the workload. Care should be taken when enabling
+        this setting, as it can potentially degrade the performance of
+        KSM for some workloads, for example if the checksums of pages
+        candidate for merging match the checksum of an empty
+        page. This setting can be changed at any time, it is only
+        effective for pages merged after the change.
+
+        Default: 0 (normal KSM behaviour as in earlier releases)
+
+max_page_sharing
+        Maximum sharing allowed for each KSM page. This enforces a
+        deduplication limit to avoid high latency for virtual memory
+        operations that involve traversal of the virtual mappings that
+        share the KSM page. The minimum value is 2 as a newly created
+        KSM page will have at least two sharers. The higher this value
+        the faster KSM will merge the memory and the higher the
+        deduplication factor will be, but the slower the worst case
+        virtual mappings traversal could be for any given KSM
+        page. Slowing down this traversal means there will be higher
+        latency for certain virtual memory operations happening during
+        swapping, compaction, NUMA balancing and page migration, in
+        turn decreasing responsiveness for the caller of those virtual
+        memory operations. The scheduler latency of other tasks not
+        involved with the VM operations doing the virtual mappings
+        traversal is not affected by this parameter as these
+        traversals are always schedule friendly themselves.
+
+stable_node_chains_prune_millisecs
+        specifies how frequently KSM checks the metadata of the pages
+        that hit the deduplication limit for stale information.
+        Smaller milllisecs values will free up the KSM metadata with
+        lower latency, but they will make ksmd use more CPU during the
+        scan. It's a noop if not a single KSM page hit the
+        ``max_page_sharing`` yet.
+
+The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
+
+pages_shared
+        how many shared pages are being used
+pages_sharing
+        how many more sites are sharing them i.e. how much saved
+pages_unshared
+        how many pages unique but repeatedly checked for merging
+pages_volatile
+        how many pages changing too fast to be placed in a tree
+full_scans
+        how many times all mergeable areas have been scanned
+stable_node_chains
+        the number of KSM pages that hit the ``max_page_sharing`` limit
+stable_node_dups
+        number of duplicated KSM pages
+
+A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
+sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
+indicates wasted effort.  ``pages_volatile`` embraces several
+different kinds of activity, but a high proportion there would also
+indicate poor use of madvise MADV_MERGEABLE.
+
+The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
+``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
+be increased accordingly.
+
+--
+Izik Eidus,
+Hugh Dickins, 17 Nov 2009
diff --git a/Documentation/vm/ksm.rst b/Documentation/vm/ksm.rst
index afcf5a8..d32016d 100644
--- a/Documentation/vm/ksm.rst
+++ b/Documentation/vm/ksm.rst
@@ -4,185 +4,11 @@
 Kernel Samepage Merging
 =======================
 
-Overview
-========
-
 KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
 added to the Linux kernel in 2.6.32.  See ``mm/ksm.c`` for its implementation,
 and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/
 
-KSM was originally developed for use with KVM (where it was known as
-Kernel Shared Memory), to fit more virtual machines into physical memory,
-by sharing the data common between them.  But it can be useful to any
-application which generates many instances of the same data.
-
-The KSM daemon ksmd periodically scans those areas of user memory
-which have been registered with it, looking for pages of identical
-content which can be replaced by a single write-protected page (which
-is automatically copied if a process later wants to update its
-content). The amount of pages that KSM daemon scans in a single pass
-and the time between the passes are configured using :ref:`sysfs
-intraface <ksm_sysfs>`
-
-KSM only merges anonymous (private) pages, never pagecache (file) pages.
-KSM's merged pages were originally locked into kernel memory, but can now
-be swapped out just like other user pages (but sharing is broken when they
-are swapped back in: ksmd must rediscover their identity and merge again).
-
-Controlling KSM with madvise
-============================
-
-KSM only operates on those areas of address space which an application
-has advised to be likely candidates for merging, by using the madvise(2)
-system call::
-
-	int madvise(addr, length, MADV_MERGEABLE)
-
-The app may call
-
-::
-
-	int madvise(addr, length, MADV_UNMERGEABLE)
-
-to cancel that advice and restore unshared pages: whereupon KSM
-unmerges whatever it merged in that range.  Note: this unmerging call
-may suddenly require more memory than is available - possibly failing
-with EAGAIN, but more probably arousing the Out-Of-Memory killer.
-
-If KSM is not configured into the running kernel, madvise MADV_MERGEABLE
-and MADV_UNMERGEABLE simply fail with EINVAL.  If the running kernel was
-built with CONFIG_KSM=y, those calls will normally succeed: even if the
-the KSM daemon is not currently running, MADV_MERGEABLE still registers
-the range for whenever the KSM daemon is started; even if the range
-cannot contain any pages which KSM could actually merge; even if
-MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
-
-If a region of memory must be split into at least one new MADV_MERGEABLE
-or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
-will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt).
-
-Like other madvise calls, they are intended for use on mapped areas of
-the user address space: they will report ENOMEM if the specified range
-includes unmapped gaps (though working on the intervening mapped areas),
-and might fail with EAGAIN if not enough memory for internal structures.
-
-Applications should be considerate in their use of MADV_MERGEABLE,
-restricting its use to areas likely to benefit.  KSM's scans may use a lot
-of processing power: some installations will disable KSM for that reason.
-
-.. _ksm_sysfs:
-
-KSM daemon sysfs interface
-==========================
-
-The KSM daemon is controlled by sysfs files in ``/sys/kernel/mm/ksm/``,
-readable by all but writable only by root:
-
-pages_to_scan
-        how many pages to scan before ksmd goes to sleep
-        e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``.
-
-        Default: 100 (chosen for demonstration purposes)
-
-sleep_millisecs
-        how many milliseconds ksmd should sleep before next scan
-        e.g. ``echo 20 > /sys/kernel/mm/ksm/sleep_millisecs``
-
-        Default: 20 (chosen for demonstration purposes)
-
-merge_across_nodes
-        specifies if pages from different NUMA nodes can be merged.
-        When set to 0, ksm merges only pages which physically reside
-        in the memory area of same NUMA node. That brings lower
-        latency to access of shared pages. Systems with more nodes, at
-        significant NUMA distances, are likely to benefit from the
-        lower latency of setting 0. Smaller systems, which need to
-        minimize memory usage, are likely to benefit from the greater
-        sharing of setting 1 (default). You may wish to compare how
-        your system performs under each setting, before deciding on
-        which to use. ``merge_across_nodes`` setting can be changed only
-        when there are no ksm shared pages in the system: set run 2 to
-        unmerge pages first, then to 1 after changing
-        ``merge_across_nodes``, to remerge according to the new setting.
-
-        Default: 1 (merging across nodes as in earlier releases)
-
-run
-        * set to 0 to stop ksmd from running but keep merged pages,
-        * set to 1 to run ksmd e.g. ``echo 1 > /sys/kernel/mm/ksm/run``,
-        * set to 2 to stop ksmd and unmerge all pages currently merged, but
-	  leave mergeable areas registered for next run.
-
-        Default: 0 (must be changed to 1 to activate KSM, except if
-        CONFIG_SYSFS is disabled)
-
-use_zero_pages
-        specifies whether empty pages (i.e. allocated pages that only
-        contain zeroes) should be treated specially.  When set to 1,
-        empty pages are merged with the kernel zero page(s) instead of
-        with each other as it would happen normally. This can improve
-        the performance on architectures with coloured zero pages,
-        depending on the workload. Care should be taken when enabling
-        this setting, as it can potentially degrade the performance of
-        KSM for some workloads, for example if the checksums of pages
-        candidate for merging match the checksum of an empty
-        page. This setting can be changed at any time, it is only
-        effective for pages merged after the change.
-
-        Default: 0 (normal KSM behaviour as in earlier releases)
-
-max_page_sharing
-        Maximum sharing allowed for each KSM page. This enforces a
-        deduplication limit to avoid high latency for virtual memory
-        operations that involve traversal of the virtual mappings that
-        share the KSM page. The minimum value is 2 as a newly created
-        KSM page will have at least two sharers. The higher this value
-        the faster KSM will merge the memory and the higher the
-        deduplication factor will be, but the slower the worst case
-        virtual mappings traversal could be for any given KSM
-        page. Slowing down this traversal means there will be higher
-        latency for certain virtual memory operations happening during
-        swapping, compaction, NUMA balancing and page migration, in
-        turn decreasing responsiveness for the caller of those virtual
-        memory operations. The scheduler latency of other tasks not
-        involved with the VM operations doing the virtual mappings
-        traversal is not affected by this parameter as these
-        traversals are always schedule friendly themselves.
-
-stable_node_chains_prune_millisecs
-        specifies how frequently KSM checks the metadata of the pages
-        that hit the deduplication limit for stale information.
-        Smaller milllisecs values will free up the KSM metadata with
-        lower latency, but they will make ksmd use more CPU during the
-        scan. It's a noop if not a single KSM page hit the
-        ``max_page_sharing`` yet.
-
-The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
-
-pages_shared
-        how many shared pages are being used
-pages_sharing
-        how many more sites are sharing them i.e. how much saved
-pages_unshared
-        how many pages unique but repeatedly checked for merging
-pages_volatile
-        how many pages changing too fast to be placed in a tree
-full_scans
-        how many times all mergeable areas have been scanned
-stable_node_chains
-        the number of KSM pages that hit the ``max_page_sharing`` limit
-stable_node_dups
-        number of duplicated KSM pages
-
-A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
-sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
-indicates wasted effort.  ``pages_volatile`` embraces several
-different kinds of activity, but a high proportion there would also
-indicate poor use of madvise MADV_MERGEABLE.
-
-The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
-``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
-be increased accordingly.
+The userspace interface of KSM is described in :ref:`Documentation/admin-guide/mm/ksm.rst <admin_guide_ksm>`
 
 Design
 ======
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 5/7] docs/vm: ksm: update stable_node_chains_prune_millisecs description
From: Mike Rapoport @ 2018-04-24  6:40 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Andrea Arcangeli, linux-doc, linux-mm, lkml,
	Mike Rapoport
In-Reply-To: <1524552028-7017-1-git-send-email-rppt@linux.vnet.ibm.com>

Make the description of stable_node_chains_prune_millisecs sysfs parameter
less implementation aware and add a few words about this parameter in the
"Design" section.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/ksm.rst | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/Documentation/vm/ksm.rst b/Documentation/vm/ksm.rst
index 00961b8..18d7c71 100644
--- a/Documentation/vm/ksm.rst
+++ b/Documentation/vm/ksm.rst
@@ -150,14 +150,12 @@ max_page_sharing
         traversals are always schedule friendly themselves.
 
 stable_node_chains_prune_millisecs
-        How frequently to walk the whole list of stable_node "dups"
-        linked in the stable_node "chains" in order to prune stale
-        stable_nodes. Smaller milllisecs values will free up the KSM
-        metadata with lower latency, but they will make ksmd use more
-        CPU during the scan. This only applies to the stable_node
-        chains so it's a noop if not a single KSM page hit the
-        ``max_page_sharing`` yet (there would be no stable_node chains in
-        such case).
+        specifies how frequently KSM checks the metadata of the pages
+        that hit the deduplication limit for stale information.
+        Smaller milllisecs values will free up the KSM metadata with
+        lower latency, but they will make ksmd use more CPU during the
+        scan. It's a noop if not a single KSM page hit the
+        ``max_page_sharing`` yet.
 
 The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
 
@@ -249,6 +247,11 @@ deduplication factor at the expense of slower worst case for rmap
 walks for any KSM page which can happen during swapping, compaction,
 NUMA balancing and page migration.
 
+The whole list of stable_node "dups" linked in the stable_node
+"chains" is scanned periodically in order to prune stale stable_nodes.
+The frequency of such scans is defined by
+``stable_node_chains_prune_millisecs`` sysfs tunable.
+
 Reference
 ---------
 .. kernel-doc:: mm/ksm.c
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 3/7] docs/vm: ksm: add "Design" section
From: Mike Rapoport @ 2018-04-24  6:40 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Andrea Arcangeli, linux-doc, linux-mm, lkml,
	Mike Rapoport
In-Reply-To: <1524552028-7017-1-git-send-email-rppt@linux.vnet.ibm.com>

Include the KSM description from the source code comment, add a subsection
about reverse mapping and include kernel-doc references for KSM data
structures.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/ksm.rst | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/vm/ksm.rst b/Documentation/vm/ksm.rst
index 786d460..0e5a085 100644
--- a/Documentation/vm/ksm.rst
+++ b/Documentation/vm/ksm.rst
@@ -206,6 +206,45 @@ stable_node "dups" with few rmap_items in them, but that may increase
 the ksmd CPU usage and possibly slowdown the readonly computations on
 the KSM pages of the applications.
 
+Design
+======
+
+Overview
+--------
+
+.. kernel-doc:: mm/ksm.c
+   :DOC: Overview
+
+Reverse mapping
+---------------
+KSM maintains reverse mapping information for KSM pages in the stable
+tree.
+
+If a KSM page is shared between less than ``max_page_sharing`` VMAs,
+the node of the stable tree that represents such KSM page points to a
+list of :c:type:`struct rmap_item` and the ``page->mapping`` of the
+KSM page points to the stable tree node.
+
+When the sharing passes this threshold, KSM adds a second dimension to
+the stable tree. The tree node becomes a "chain" that links one or
+more "dups". Each "dup" keeps reverse mapping information for a KSM
+page with ``page->mapping`` pointing to that "dup".
+
+Every "chain" and all "dups" linked into a "chain" enforce the
+invariant that they represent the same write protected memory content,
+even if each "dup" will be pointed by a different KSM page copy of
+that content.
+
+This way the stable tree lookup computational complexity is unaffected
+if compared to an unlimited list of reverse mappings. It is still
+enforced that there cannot be KSM page content duplicates in the
+stable tree itself.
+
+Reference
+---------
+.. kernel-doc:: mm/ksm.c
+   :functions: mm_slot ksm_scan stable_node rmap_item
+
 --
 Izik Eidus,
 Hugh Dickins, 17 Nov 2009
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 4/7] docs/vm: ksm: reshuffle text between "sysfs" and "design" sections
From: Mike Rapoport @ 2018-04-24  6:40 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, Andrea Arcangeli, linux-doc, linux-mm, lkml,
	Mike Rapoport
In-Reply-To: <1524552028-7017-1-git-send-email-rppt@linux.vnet.ibm.com>

The description of "max_page_sharing" sysfs attribute includes lots of
implementation details that more naturally belong in the "Design"
section.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/ksm.rst | 51 ++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/Documentation/vm/ksm.rst b/Documentation/vm/ksm.rst
index 0e5a085..00961b8 100644
--- a/Documentation/vm/ksm.rst
+++ b/Documentation/vm/ksm.rst
@@ -133,31 +133,21 @@ use_zero_pages
 
 max_page_sharing
         Maximum sharing allowed for each KSM page. This enforces a
-        deduplication limit to avoid the virtual memory rmap lists to
-        grow too large. The minimum value is 2 as a newly created KSM
-        page will have at least two sharers. The rmap walk has O(N)
-        complexity where N is the number of rmap_items (i.e. virtual
-        mappings) that are sharing the page, which is in turn capped
-        by ``max_page_sharing``. So this effectively spreads the linear
-        O(N) computational complexity from rmap walk context over
-        different KSM pages. The ksmd walk over the stable_node
-        "chains" is also O(N), but N is the number of stable_node
-        "dups", not the number of rmap_items, so it has not a
-        significant impact on ksmd performance. In practice the best
-        stable_node "dup" candidate will be kept and found at the head
-        of the "dups" list. The higher this value the faster KSM will
-        merge the memory (because there will be fewer stable_node dups
-        queued into the stable_node chain->hlist to check for pruning)
-        and the higher the deduplication factor will be, but the
-        slowest the worst case rmap walk could be for any given KSM
-        page. Slowing down the rmap_walk means there will be higher
+        deduplication limit to avoid high latency for virtual memory
+        operations that involve traversal of the virtual mappings that
+        share the KSM page. The minimum value is 2 as a newly created
+        KSM page will have at least two sharers. The higher this value
+        the faster KSM will merge the memory and the higher the
+        deduplication factor will be, but the slower the worst case
+        virtual mappings traversal could be for any given KSM
+        page. Slowing down this traversal means there will be higher
         latency for certain virtual memory operations happening during
         swapping, compaction, NUMA balancing and page migration, in
         turn decreasing responsiveness for the caller of those virtual
         memory operations. The scheduler latency of other tasks not
-        involved with the VM operations doing the rmap walk is not
-        affected by this parameter as the rmap walks are always
-        schedule friendly themselves.
+        involved with the VM operations doing the virtual mappings
+        traversal is not affected by this parameter as these
+        traversals are always schedule friendly themselves.
 
 stable_node_chains_prune_millisecs
         How frequently to walk the whole list of stable_node "dups"
@@ -240,6 +230,25 @@ if compared to an unlimited list of reverse mappings. It is still
 enforced that there cannot be KSM page content duplicates in the
 stable tree itself.
 
+The deduplication limit enforced by ``max_page_sharing`` is required
+to avoid the virtual memory rmap lists to grow too large. The rmap
+walk has O(N) complexity where N is the number of rmap_items
+(i.e. virtual mappings) that are sharing the page, which is in turn
+capped by ``max_page_sharing``. So this effectively spreads the linear
+O(N) computational complexity from rmap walk context over different
+KSM pages. The ksmd walk over the stable_node "chains" is also O(N),
+but N is the number of stable_node "dups", not the number of
+rmap_items, so it has not a significant impact on ksmd performance. In
+practice the best stable_node "dup" candidate will be kept and found
+at the head of the "dups" list.
+
+High values of ``max_page_sharing`` result in faster memory merging
+(because there will be fewer stable_node dups queued into the
+stable_node chain->hlist to check for pruning) and higher
+deduplication factor at the expense of slower worst case for rmap
+walks for any KSM page which can happen during swapping, compaction,
+NUMA balancing and page migration.
+
 Reference
 ---------
 .. kernel-doc:: mm/ksm.c
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox