Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
From: Conor Dooley @ 2024-04-02 18:23 UTC (permalink / raw)
  To: Rob Herring
  Cc: Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner, devicetree,
	linux-arm-kernel, linux-rockchip, linux-kernel
In-Reply-To: <20240401204959.1698106-1-robh@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 251 bytes --]

On Mon, Apr 01, 2024 at 03:49:58PM -0500, Rob Herring wrote:
> 'pcie-phy' is missing any type. Add 'type: object' to indicate it's a
> node.
> 
> Signed-off-by: Rob Herring <robh@kernel.org>

Acked-by: Conor Dooley <conor.dooley@microchip.com>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 2/3] dt-bindings: phy: Add i.MX8Q HSIO SerDes PHY binding
From: Conor Dooley @ 2024-04-02 18:16 UTC (permalink / raw)
  To: Frank Li
  Cc: Richard Zhu, vkoul, kishon, robh+dt, krzysztof.kozlowski+dt,
	conor+dt, linux-phy, devicetree, linux-arm-kernel, linux-kernel,
	kernel, imx
In-Reply-To: <ZgwU8edE3VFYngWR@lizhi-Precision-Tower-5810>


[-- Attachment #1.1: Type: text/plain, Size: 1310 bytes --]

On Tue, Apr 02, 2024 at 10:23:45AM -0400, Frank Li wrote:
> On Tue, Apr 02, 2024 at 01:45:03PM +0800, Richard Zhu wrote:
> > Add i.MX8QM and i.MX8QXP HSIO SerDes PHY binding.
> > - Use the controller ID to specify which controller is binded to the
> > PHY.
> > - Introduce one HSIO configuration, mandatory required to set
> > "PCIE_AB_SELECT" and "PHY_X1_EPCS_SEL" during the initialization.
> > 
> > Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
> 
> You missed all conor's comments. 
> Please double check v1's comments.

> > +  fsl,refclk-pad-mode:
> > +    description: |
> > +      Specifies the mode of the refclk pad used. It can be UNUSED(PHY
> > +      refclock is derived from SoC internal source), INPUT(PHY refclock
> > +      is provided externally via the refclk pad) or OUTPUT(PHY refclock
> > +      is derived from SoC internal source and provided on the refclk pad).
> > +      Refer include/dt-bindings/phy/phy-imx8-pcie.h for the constants
> > +      to be used.
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    enum: [ 0, 1, 2 ]
> 
> I remember needn't enum because there are header file.

Yah, specifically my comments about this property were missed and were
probably the most meaningful comments I left.

Thanks for the reminder Frank.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 17/18] dt-bindings: pci: rockchip,rk3399-pcie-ep: Add ep-gpios property
From: Krzysztof Kozlowski @ 2024-04-02 18:10 UTC (permalink / raw)
  To: Damien Le Moal, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Kishon Vijay Abraham I, Shawn Lin, Krzysztof Wilczyński,
	Bjorn Helgaas, Heiko Stuebner, linux-pci, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, devicetree
  Cc: linux-rockchip, linux-arm-kernel, Rick Wertenbroek,
	Wilfred Mallawa, Niklas Cassel
In-Reply-To: <be2a0fa0-9d5d-45c3-810a-56d6924c8891@kernel.org>

On 02/04/2024 09:55, Damien Le Moal wrote:
> On 4/2/24 16:38, Damien Le Moal wrote:
>> On 4/2/24 16:33, Krzysztof Kozlowski wrote:
>>> On 02/04/2024 01:36, Damien Le Moal wrote:
>>>> On 4/1/24 18:57, Krzysztof Kozlowski wrote:
>>>>> On 01/04/2024 01:06, Damien Le Moal wrote:
>>>>>> On 3/30/24 18:16, Krzysztof Kozlowski wrote:
>>>>>>> On 30/03/2024 05:19, Damien Le Moal wrote:
>>>>>>>> From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
>>>>>>>>
>>>>>>>> Describe the `ep-gpios` property which is used to map the PERST# input
>>>>>>>> signal for endpoint mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
>>>>>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>>>>>> ---
>>>>>>>>  .../devicetree/bindings/pci/rockchip,rk3399-pcie-ep.yaml       | 3 +++
>>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/Documentation/devicetree/bindings/pci/rockchip,rk3399-pcie-ep.yaml b/Documentation/devicetree/bindings/pci/rockchip,rk3399-pcie-ep.yaml
>>>>>>>> index 6b62f6f58efe..9331d44d6963 100644
>>>>>>>> --- a/Documentation/devicetree/bindings/pci/rockchip,rk3399-pcie-ep.yaml
>>>>>>>> +++ b/Documentation/devicetree/bindings/pci/rockchip,rk3399-pcie-ep.yaml
>>>>>>>> @@ -30,6 +30,9 @@ properties:
>>>>>>>>      maximum: 32
>>>>>>>>      default: 32
>>>>>>>>  
>>>>>>>> +  ep-gpios:
>>>>>>>> +    description: Input GPIO configured for the PERST# signal.
>>>>>>>
>>>>>>> Missing maxItems. But more important: why existing property perst-gpios,
>>>>>>> which you already have there in common schema, is not correct for this case?
>>>>>>
>>>>>> I am confused... Where do you find perst-gpios defined for the rk3399 ?
>>>>>> Under Documentation/devicetree/bindings/pci/, the only schema I see using
>>>>>> perst-gpios property are for the qcom (Qualcomm) controllers.
>>>>>
>>>>> You are right, it's so far only in Qualcomm.
>>>>>
>>>>>> The RC bindings for the rockchip rk3399 PCIe controller
>>>>>> (pci/rockchip,rk3399-pcie.yaml) already define the ep-gpios property. So if
>>>>>
>>>>> Any reason why this cannot be named like GPIO? Is there already a user
>>>>> of this in Linux kernel? Commit msg says nothing about this, so that's
>>>>> why I would expect name matching the signal.
>>>>
>>>> The RC-mode PCIe controller node of the rk3399 DTS already defines the ep-gpios
>>>> property for RC side PERST# signal handling. So we simply reused the exact same
>>>> name to be consistent between RC and EP. I personnally have no preferences. If
>>>> there is an effort to rename such signal with some preferred pattern, I will
>>>> follow. For the EP node, there was no PERST signal handling in the driver and
>>>> no property defined for it, so any name is fine. "perst-gpios" would indeed be
>>>> a better name, but again, given that the RC controller node has ep-gpios, we
>>>> reused that. What is your recommendation here ?
>>>
>>> Actually I don't know, perst and ep would work for me. If you do not
>>> have code for this in the driver yet (nothing is shared between ep and
>>> host), then maybe let's go with perst to match the actual name.
>>
>> That works for me. The other simple solution would be to move the RC node
>> ep-gpios description to the common schema pci/rockchip,rk3399-pcie-common.yaml,
>> maybe ? Otherwise, perst-gpios like the Qualcomm schemas would be nice too.
> 
> Thinking more about this, I think moving the ep-gpios description to the common
> schema is the right thing to do given that the driver uses common code between
> RC and EP to get that property. But if that is not acceptable, I can rename it
> and get that property in the controller EP mode initialization code. That will
> be add a little more code in the driver.

I forgot that it is actually the same hardware, so if host has
"ep-gpios" already then EP mode should have the same property. Common
schema is good idea.


Best regards,
Krzysztof


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] dt-bindings: watchdog: Convert Aspeed binding to DT schema
From: Rob Herring @ 2024-04-02 18:07 UTC (permalink / raw)
  To: Andrew Jeffery
  Cc: wim, linux, krzysztof.kozlowski+dt, conor+dt, joel, zev,
	linux-watchdog, devicetree, linux-arm-kernel, linux-aspeed,
	linux-kernel
In-Reply-To: <20240402120118.282035-1-andrew@codeconstruct.com.au>

On Tue, Apr 02, 2024 at 10:31:18PM +1030, Andrew Jeffery wrote:
> Squash warnings such as:
> 
> ```
> arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-galaxy100.dtb: /ahb/apb@1e600000/watchdog@1e785000: failed to match any schema with compatible: ['aspeed,ast2400-wdt']
> ```
> 
> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
> ---
>  .../bindings/watchdog/aspeed,ast2400-wdt.yaml | 130 ++++++++++++++++++
>  .../bindings/watchdog/aspeed-wdt.txt          |  73 ----------
>  2 files changed, 130 insertions(+), 73 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml
>  delete mode 100644 Documentation/devicetree/bindings/watchdog/aspeed-wdt.txt
> 
> diff --git a/Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml b/Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml
> new file mode 100644
> index 000000000000..10fcb50c4051
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml
> @@ -0,0 +1,130 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/watchdog/aspeed,ast2400-wdt.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Aspeed watchdog timer controllers
> +
> +maintainers:
> +  - Andrew Jeffery <andrew@codeconstruct.com.au>
> +
> +properties:
> +  compatible:
> +    enum:
> +      - aspeed,ast2400-wdt
> +      - aspeed,ast2500-wdt
> +      - aspeed,ast2600-wdt
> +
> +  reg:
> +    maxItems: 1
> +
> +  clocks: true

# and order/function if more than 1 must be defined.

Please note it was missing from the original binding in the commit 
message.

> +
> +  aspeed,reset-type:
> +    enum:
> +      - cpu
> +      - soc
> +      - system
> +      - none
> +    description: |
> +      Reset behaviour - The watchdog can be programmed to generate one of three
> +      different types of reset when a timeout occcurs.
> +
> +      Specifying 'cpu' will only reset the processor on a timeout event.
> +
> +      Specifying 'soc' will reset a configurable subset of the SoC's controllers
> +      on a timeout event. Controllers critical to the SoC's operation may remain untouched.
> +
> +      Specifying 'system' will reset all controllers on a timeout event, as if EXTRST had been asserted.
> +      Specifying "none" will cause the timeout event to have no reset effect.
> +      Another watchdog engine on the chip must be used for chip reset operations.
> +
> +      The default reset type is "system"

Express as schema:

default: system

> +
> +  aspeed,alt-boot:
> +    $ref: /schemas/types.yaml#/definitions/flag
> +    description: |

Don't need '|' if no formatting to preserve.

> +      Direct the watchdog to configure the SoC to boot from the alternative boot
> +      region if a timeout occurs.
> +
> +  aspeed,external-signal:
> +    $ref: /schemas/types.yaml#/definitions/flag
> +    description: |
> +      Assert the timeout event on an external signal pin associated with the
> +      watchdog controller instance. The pin must be muxed appropriately.
> +
> +  aspeed,ext-pulse-duration:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      The duration, in microseconds, of the pulse emitted on the external signal pin

Wrap at <80. Period at end needed.

> +
> +  aspeed,ext-push-pull:
> +    $ref: /schemas/types.yaml#/definitions/flag
> +    description: |
> +      If aspeed,external-signal is specified in the node, set the external
> +      signal pin's drive type to push-pull. If aspeed,ext-push-pull is not
> +      specified then the pin is configured as open-drain.
> +
> +  aspeed,ext-active-high:
> +    $ref: /schemas/types.yaml#/definitions/flag
> +    description: |
> +      If both aspeed,external-signal and aspeed,ext-push-pull are specified in
> +      the node, set the pulse polarity to active-high. If aspeed,ext-active-high
> +      is not specified then the pin is configured as active-low.
> +
> +  aspeed,reset-mask:
> +    $ref: /schemas/types.yaml#/definitions/uint32-array
> +    minItems: 1
> +    maxItems: 2
> +    description: |
> +      A bitmaks indicating which peripherals will be reset if the watchdog
> +      timer expires. On AST2500 SoCs this should be a single word defined using
> +      the AST2500_WDT_RESET_* macros; on AST2600 SoCs this should be a two-word
> +      array with the first word defined using the AST2600_WDT_RESET1_* macros,
> +      and the second word defined using the AST2600_WDT_RESET2_* macros.
> +
> +required:
> +  - compatible
> +  - reg
> +
> +allOf:
> +  - if:
> +      anyOf:
> +        - required:
> +            - aspeed,ext-push-pull
> +        - required:
> +            - aspeed,ext-active-high
> +        - required:
> +            - aspeed,reset-mask
> +    then:
> +      properties:
> +        compatible:
> +          enum:
> +            - aspeed,ast2500-wdt
> +            - aspeed,ast2600-wdt
> +  - if:
> +      required:
> +        - aspeed,ext-active-high
> +    then:
> +      required:
> +        - aspeed,ext-push-pull
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    wdt1: watchdog@1e785000 {

Drop unused labels.

> +        compatible = "aspeed,ast2400-wdt";
> +        reg = <0x1e785000 0x1c>;
> +        aspeed,reset-type = "system";
> +        aspeed,external-signal;
> +    };
> +  - |
> +    #include <dt-bindings/watchdog/aspeed-wdt.h>
> +    wdt2: watchdog@1e785040 {
> +        compatible = "aspeed,ast2600-wdt";
> +        reg = <0x1e785040 0x40>;
> +        aspeed,reset-mask = <AST2600_WDT_RESET1_DEFAULT
> +                            (AST2600_WDT_RESET2_DEFAULT & ~AST2600_WDT_RESET2_LPC)>;
> +    };
> diff --git a/Documentation/devicetree/bindings/watchdog/aspeed-wdt.txt b/Documentation/devicetree/bindings/watchdog/aspeed-wdt.txt
> deleted file mode 100644
> index 3208adb3e52e..000000000000
> --- a/Documentation/devicetree/bindings/watchdog/aspeed-wdt.txt
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -Aspeed Watchdog Timer
> -
> -Required properties:
> - - compatible: must be one of:
> -	- "aspeed,ast2400-wdt"
> -	- "aspeed,ast2500-wdt"
> -	- "aspeed,ast2600-wdt"
> -
> - - reg: physical base address of the controller and length of memory mapped
> -   region
> -
> -Optional properties:
> -
> - - aspeed,reset-type = "cpu|soc|system|none"
> -
> -   Reset behavior - Whenever a timeout occurs the watchdog can be programmed
> -   to generate one of three different, mutually exclusive, types of resets.
> -
> -   Type "none" can be specified to indicate that no resets are to be done.
> -   This is useful in situations where another watchdog engine on chip is
> -   to perform the reset.
> -
> -   If 'aspeed,reset-type=' is not specified the default is to enable system
> -   reset.
> -
> -   Reset types:
> -
> -        - cpu: Reset CPU on watchdog timeout
> -
> -        - soc: Reset 'System on Chip' on watchdog timeout
> -
> -        - system: Reset system on watchdog timeout
> -
> -        - none: No reset is performed on timeout. Assumes another watchdog
> -                engine is responsible for this.
> -
> - - aspeed,alt-boot:    If property is present then boot from alternate block.
> - - aspeed,external-signal: If property is present then signal is sent to
> -			external reset counter (only WDT1 and WDT2). If not
> -			specified no external signal is sent.
> - - aspeed,ext-pulse-duration: External signal pulse duration in microseconds
> -
> -Optional properties for AST2500-compatible watchdogs:
> - - aspeed,ext-push-pull: If aspeed,external-signal is present, set the pin's
> -			 drive type to push-pull. The default is open-drain.
> - - aspeed,ext-active-high: If aspeed,external-signal is present and and the pin
> -			   is configured as push-pull, then set the pulse
> -			   polarity to active-high. The default is active-low.
> -
> -Optional properties for AST2500- and AST2600-compatible watchdogs:
> - - aspeed,reset-mask: A bitmask indicating which peripherals will be reset if
> -		      the watchdog timer expires.  On AST2500 this should be a
> -		      single word defined using the AST2500_WDT_RESET_* macros;
> -		      on AST2600 this should be a two-word array with the first
> -		      word defined using the AST2600_WDT_RESET1_* macros and the
> -		      second word defined using the AST2600_WDT_RESET2_* macros.
> -
> -Examples:
> -
> -	wdt1: watchdog@1e785000 {
> -		compatible = "aspeed,ast2400-wdt";
> -		reg = <0x1e785000 0x1c>;
> -		aspeed,reset-type = "system";
> -		aspeed,external-signal;
> -	};
> -
> -	#include <dt-bindings/watchdog/aspeed-wdt.h>
> -	wdt2: watchdog@1e785040 {
> -		compatible = "aspeed,ast2600-wdt";
> -		reg = <0x1e785040 0x40>;
> -		aspeed,reset-mask = <AST2600_WDT_RESET1_DEFAULT
> -				     (AST2600_WDT_RESET2_DEFAULT & ~AST2600_WDT_RESET2_LPC)>;
> -	};
> -- 
> 2.39.2
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] dt-bindings: mfd: syscon: Add ti,am62p-cpsw-mac-efuse compatible
From: Krzysztof Kozlowski @ 2024-04-02 18:06 UTC (permalink / raw)
  To: Siddharth Vadapalli
  Cc: lee, robh, krzk+dt, conor+dt, devicetree, linux-kernel,
	linux-arm-kernel, srk
In-Reply-To: <30065bdc-ccef-4610-b1c1-7661f801b8e9@ti.com>

On 02/04/2024 14:30, Siddharth Vadapalli wrote:
> On Tue, Apr 02, 2024 at 02:08:32PM +0200, Krzysztof Kozlowski wrote:
>> On 02/04/2024 12:57, Siddharth Vadapalli wrote:
>>> The CTRLMMR_MAC_IDx registers within the CTRL_MMR space of TI's AM62p SoC
>>> contain the MAC Address programmed in the eFuse. Add compatible for
>>> allowing the CPSW driver to obtain a regmap for the CTRLMMR_MAC_IDx
>>> registers within the System Controller device-tree node. The default MAC
>>> Address for the interface corresponding to the first MAC port will be set
>>> to the value programmed in the eFuse.
>>>
>>> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
>>> ---
>>>
>>> This patch is based on linux-next tagged next-20240402.
>>
>> Where is the DTS using it?
> 
> The current implementation in the device-tree for older TI K3 SoCs is as
> follows:
> 
> 	cpsw_port1: port@1 {
> 		reg = <1>;
> 		ti,mac-only;
> 		label = "port1";
> 		phys = <&phy_gmii_sel 1>;
> 		mac-address = [00 00 00 00 00 00];
> 		ti,syscon-efuse = <&wkup_conf 0x200>;
> 	};
> 
> The "ti,syscon-efuse" property passes the reference to the System
> Controller node as well as the offset to the CTRLMMR_MAC_IDx registers
> within the CTRL_MMR space.

Please reference upstream DTS or lore link to patch under review.

Best regards,
Krzysztof


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: Peter Xu @ 2024-04-02 17:58 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: David Hildenbrand, linux-mm, linux-kernel, Yang Shi,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Michael Ellerman,
	Andrew Jones, Muchun Song, linux-riscv, linuxppc-dev,
	Christophe Leroy, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Matthew Wilcox, Rik van Riel, linux-arm-kernel,
	Andrea Arcangeli, Aneesh Kumar K . V, Vlastimil Babka,
	James Houghton, Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <f4aff3e9-1d78-4bb8-a6f3-2887b9928b54@arm.com>

On Tue, Apr 02, 2024 at 05:46:57PM +0100, Ryan Roberts wrote:
> I'll leave you to do the testing on this, if that's ok.

Definitely.  I'll test and send formal patches.

> 
> Just to make my config explicit, I have this kernel command line, which reserves
> hugetlbs of all sizes for the tests:
> 
> "transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2"

This helps, thanks.

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: Peter Xu @ 2024-04-02 17:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ryan Roberts, linux-mm, linux-kernel, Yang Shi,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Michael Ellerman,
	Andrew Jones, Muchun Song, linux-riscv, linuxppc-dev,
	Christophe Leroy, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Matthew Wilcox, Rik van Riel, linux-arm-kernel,
	Andrea Arcangeli, Aneesh Kumar K . V, Vlastimil Babka,
	James Houghton, Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <8b0b24bb-3c38-4f27-a2c9-f7d7adc4a115@redhat.com>

On Tue, Apr 02, 2024 at 06:39:31PM +0200, David Hildenbrand wrote:
> On 02.04.24 18:20, Peter Xu wrote:
> > On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
> > > On 02.04.24 16:48, Ryan Roberts wrote:
> > > > Hi Peter,
> > 
> > Hey, Ryan,
> > 
> > Thanks for the report!
> > 
> > > > 
> > > > On 27/03/2024 15:23, peterx@redhat.com wrote:
> > > > > From: Peter Xu <peterx@redhat.com>
> > > > > 
> > > > > Now follow_page() is ready to handle hugetlb pages in whatever form, and
> > > > > over all architectures.  Switch to the generic code path.
> > > > > 
> > > > > Time to retire hugetlb_follow_page_mask(), following the previous
> > > > > retirement of follow_hugetlb_page() in 4849807114b8.
> > > > > 
> > > > > There may be a slight difference of how the loops run when processing slow
> > > > > GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: each
> > > > > loop of __get_user_pages() will resolve one pgtable entry with the patch
> > > > > applied, rather than relying on the size of hugetlb hstate, the latter may
> > > > > cover multiple entries in one loop.
> > > > > 
> > > > > A quick performance test on an aarch64 VM on M1 chip shows 15% degrade over
> > > > > a tight loop of slow gup after the path switched.  That shouldn't be a
> > > > > problem because slow-gup should not be a hot path for GUP in general: when
> > > > > page is commonly present, fast-gup will already succeed, while when the
> > > > > page is indeed missing and require a follow up page fault, the slow gup
> > > > > degrade will probably buried in the fault paths anyway.  It also explains
> > > > > why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup:
> > > > > accelerate thp gup even for "pages != NULL"") lands, the latter not part of
> > > > > a performance analysis but a side benefit.  If the performance will be a
> > > > > concern, we can consider handle CONT_PTE in follow_page().
> > > > > 
> > > > > Before that is justified to be necessary, keep everything clean and simple.
> > > > > 
> > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > 
> > > > Afraid I'm seeing an oops when running gup_longterm test on arm64 with current mm-unstable. Git bisect blames this patch. The oops reproduces for me every time on 2 different machines:
> > > > 
> > > > 
> > > > [    9.340416] kernel BUG at mm/gup.c:778!
> > > > [    9.340746] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > > > [    9.341199] Modules linked in:
> > > > [    9.341481] CPU: 1 PID: 1159 Comm: gup_longterm Not tainted 6.9.0-rc2-00210-g910ff1a347e4 #11
> > > > [    9.342232] Hardware name: linux,dummy-virt (DT)
> > > > [    9.342647] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [    9.343195] pc : follow_page_mask+0x4d4/0x880
> > > > [    9.343580] lr : follow_page_mask+0x4d4/0x880
> > > > [    9.344018] sp : ffff8000898b3aa0
> > > > [    9.344345] x29: ffff8000898b3aa0 x28: fffffdffc53973e8 x27: 00003c0005d08000
> > > > [    9.345028] x26: ffff00014e5cfd08 x25: ffffd3513a40c000 x24: fffffdffc5d08000
> > > > [    9.345682] x23: ffffc1ffc0000000 x22: 0000000000080101 x21: ffff8000898b3ba8
> > > > [    9.346337] x20: 0000fffff4200000 x19: ffff00014e52d508 x18: 0000000000000010
> > > > [    9.347005] x17: 5f656e6f7a5f7369 x16: 2120262620296567 x15: 6170286461654865
> > > > [    9.347713] x14: 6761502128454741 x13: 2929656761702865 x12: 6761705f65636976
> > > > [    9.348371] x11: 65645f656e6f7a5f x10: ffffd3513b31d6e0 x9 : ffffd3513852f090
> > > > [    9.349062] x8 : 00000000ffffefff x7 : ffffd3513b31d6e0 x6 : 0000000000000000
> > > > [    9.349753] x5 : ffff00017ff98cc8 x4 : 0000000000000fff x3 : 0000000000000000
> > > > [    9.350397] x2 : 0000000000000000 x1 : ffff000190e8b480 x0 : 0000000000000052
> > > > [    9.351097] Call trace:
> > > > [    9.351312]  follow_page_mask+0x4d4/0x880
> > > > [    9.351700]  __get_user_pages+0xf4/0x3e8
> > > > [    9.352089]  __gup_longterm_locked+0x204/0xa70
> > > > [    9.352516]  pin_user_pages+0x88/0xc0
> > > > [    9.352873]  gup_test_ioctl+0x860/0xc40
> > > > [    9.353249]  __arm64_sys_ioctl+0xb0/0x100
> > > > [    9.353648]  invoke_syscall+0x50/0x128
> > > > [    9.354022]  el0_svc_common.constprop.0+0x48/0xf8
> > > > [    9.354488]  do_el0_svc+0x28/0x40
> > > > [    9.354822]  el0_svc+0x34/0xe0
> > > > [    9.355128]  el0t_64_sync_handler+0x13c/0x158
> > > > [    9.355489]  el0t_64_sync+0x190/0x198
> > > > [    9.355793] Code: aa1803e0 d000d8e1 91220021 97fff560 (d4210000)
> > > > [    9.356280] ---[ end trace 0000000000000000 ]---
> > > > [    9.356651] note: gup_longterm[1159] exited with irqs disabled
> > > > [    9.357141] note: gup_longterm[1159] exited with preempt_count 2
> > > > [    9.358033] ------------[ cut here ]------------
> > > > [    9.358800] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x108/0x120
> > > > [    9.360157] Modules linked in:
> > > > [    9.360541] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.9.0-rc2-00210-g910ff1a347e4 #11
> > > > [    9.361626] Hardware name: linux,dummy-virt (DT)
> > > > [    9.362087] pstate: 204003c5 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [    9.362758] pc : ct_kernel_exit.constprop.0+0x108/0x120
> > > > [    9.363306] lr : ct_idle_enter+0x10/0x20
> > > > [    9.363845] sp : ffff8000801abdc0
> > > > [    9.364222] x29: ffff8000801abdc0 x28: 0000000000000000 x27: 0000000000000000
> > > > [    9.364961] x26: 0000000000000000 x25: ffff00014149d780 x24: 0000000000000000
> > > > [    9.365557] x23: 0000000000000000 x22: ffffd3513b299d48 x21: ffffd3513a785730
> > > > [    9.366239] x20: ffffd3513b299c28 x19: ffff00017ffa7da0 x18: 0000fffff5ffffff
> > > > [    9.366869] x17: 0000000000000000 x16: 1fffe0002a21a8c1 x15: 0000000000000000
> > > > [    9.367524] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
> > > > [    9.368207] x11: 0000000000000001 x10: 0000000000000ad0 x9 : ffffd35138589230
> > > > [    9.369123] x8 : ffff00014149e2b0 x7 : 0000000000000000 x6 : 000000000f8c0fb2
> > > > [    9.370403] x5 : 4000000000000002 x4 : ffff2cb045825000 x3 : ffff8000801abdc0
> > > > [    9.371170] x2 : ffffd3513a782da0 x1 : 4000000000000000 x0 : ffffd3513a782da0
> > > > [    9.372279] Call trace:
> > > > [    9.372519]  ct_kernel_exit.constprop.0+0x108/0x120
> > > > [    9.373216]  ct_idle_enter+0x10/0x20
> > > > [    9.373562]  default_idle_call+0x3c/0x160
> > > > [    9.374055]  do_idle+0x21c/0x280
> > > > [    9.374394]  cpu_startup_entry+0x3c/0x50
> > > > [    9.374797]  secondary_start_kernel+0x140/0x168
> > > > [    9.375220]  __secondary_switched+0xb8/0xc0
> > > > [    9.375875] ---[ end trace 0000000000000000 ]---
> > > > 
> > > > 
> > > > The oops trigger is at mm/gup.c:778:
> > > > VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
> > > > 
> > > > 
> > > > This is the output of gup_longterm (last output is just before oops):
> > > > 
> > > > # [INFO] detected hugetlb page size: 2048 KiB
> > > > # [INFO] detected hugetlb page size: 32768 KiB
> > > > # [INFO] detected hugetlb page size: 64 KiB
> > > > # [INFO] detected hugetlb page size: 1048576 KiB
> > > > TAP version 13
> > > > 1..70
> > > > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
> > > > ok 1 Should have worked
> > > > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
> > > > ok 2 Should have failed
> > > > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
> > > > ok 3 Should have failed
> > > > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
> > > > ok 4 Should have worked
> > > > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
> > > > 
> > > > 
> > > > So 2M passed ok, and its failing for 32M, which is cont-pmd. I'm guessing you're trying to iterate 2M into a cont-pmd folio and ending up with an unexpected tail page?
> > > 
> > > I assume we find the expected tail page, it's just that the check
> > > 
> > > VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
> > > 
> > > Doesn't make sense with hugetlb folios. We might have a tail page mapped in
> > > a cont-pmd entry. As soon as we call follow_huge_pmd() on "not the first
> > > cont-pmd entry", we trigger this check.
> > > 
> > > Likely this sanity check must also allow for hugetlb folios. Or we should
> > > just remove it completely.
> > 
> > Right, IMHO it'll be easier we remove it, actually I see there's one more
> > at the end, so I think we need to remove both.
> > 
> > > 
> > > In the past, we wanted to make sure that we never get tail pages of THP from
> > > PMD entries, because something would currently be broken (we don't support
> > > THP > PMD).
> > 
> > There's probably one more thing we need to do, on allowing
> > PageAnonExclusive() to work with hugetlb tails. Even if we remove the
> > warnings and if I read the code right, we can BUG_ON again on checking tail
> > pages over anon-exclusive for PageHuge.
> > 
> > So I assume to fix it completely, we may need two changes: Patch 1 to
> > prepare PageAnonExclusive() to work on hugetlb tails, then patch 2 to be
> > squashed into the patch "mm/gup: handle huge pmd for follow_pmd_mask()".
> > Note: not this patch to fixup, as this patch only does the "switchover" to
> > the new path, the culprit should be the other patch..
> > 
> > I have them attached below first, before I'll also go and see whether I can
> > run some arm tests later today or tomorrow.  David, any comments from
> > anon-exclusive side?
> 
> I added the PageAnonExclusive checks for hugetlb back then, because calling
> it on a tail page indicated real trouble for hugetlb.
> 
> Well, and I didn't want to have runtime-hugetlb checks in PageAnonExclusive
> code called on certainly-not-hugetlb code paths.
> 
> Personally, I'd fixup the problematic callsite where we know nothing nasty
> is happening (like we did for gup_must_unshare(), because we don't expect
> hugetlb tail pages from arbitrary other code).
> 
> But as I'm getting closer to a folio_test_anon_exclusive() implementation as
> we speak (closer, but not done :) ... ), where I'd remove any such hugetlb
> special handling, I don't particularly care how we handle GUP here in the
> meantime.

That's what I was looking for and found missing just now, when I wanted to
allow follow_huge_pmd() pass page / folio (which will be the head then)
properly into different checks.  I think that patch 1 is the simplest I can
come up with that works mostly like what you said before a follow up
cleanup on top if possible.  It mostly pushed the existing runtime check in
gup_must_unshare() to be more generic.

IIUC it's also a matter of whether you'd want PageAnonExclusive() to take
care of both thp + hugetlb in one shot, rather than let callers handle it
by things like "if (PageHuge()) ... else ...", which I would try to avoid.
It seems so far cleaner to allow PageAnonExclusive() take whatever tail
pages, thp or hugetlb.  But maybe your ultimate patchset can be even better
than that.

Thanks,

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH RFT 01/10] arm64: dts: microchip: sparx5: fix mdio reg
From: Conor Dooley @ 2024-04-02 17:46 UTC (permalink / raw)
  To: Steen Hegelund
  Cc: Rob Herring, Daniel Machon, Bjarni Jonasson, linux-kernel,
	Krzysztof Kozlowski, Claudiu Beznea, devicetree, linux-arm-kernel,
	Krzysztof Kozlowski, UNGLinuxDriver, David S. Miller,
	Lars Povlsen
In-Reply-To: <b3d818df8819d2fb3e96fa61b277d49941d9b01b.camel@microchip.com>


[-- Attachment #1.1: Type: text/plain, Size: 2259 bytes --]

Hey,

On Tue, Apr 02, 2024 at 04:00:32PM +0200, Steen Hegelund wrote:
> On Mon, 2024-04-01 at 17:37 +0200, Krzysztof Kozlowski wrote:
> > [Some people who received this message don't often get email from
> > krzk@kernel.org. Learn why this is important at
> > https://aka.ms/LearnAboutSenderIdentification ]
> > 
> > EXTERNAL EMAIL: Do not click links or open attachments unless you
> > know the content is safe
> > 
> > Correct the reg address of mdio node to match unit address.  Assume
> > the
> > reg is not correct and unit address was correct, because there is
> > alerady node using the existing reg 0x110102d4.
> > 
> >   sparx5.dtsi:443.25-451.5: Warning (simple_bus_reg):
> > /axi@600000000/mdio@6110102f8: simple-bus unit address format error,
> > expected "6110102d4"
> > 
> > Fixes: d0f482bb06f9 ("arm64: dts: sparx5: Add the Sparx5 switch
> > node")
> > Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
> > 
> > ---
> > 
> > Not tested on hardware
> > ---
> >  arch/arm64/boot/dts/microchip/sparx5.dtsi | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/boot/dts/microchip/sparx5.dtsi
> > b/arch/arm64/boot/dts/microchip/sparx5.dtsi
> > index 24075cd91420..5d820da8c69d 100644
> > --- a/arch/arm64/boot/dts/microchip/sparx5.dtsi
> > +++ b/arch/arm64/boot/dts/microchip/sparx5.dtsi
> > @@ -447,7 +447,7 @@ mdio2: mdio@6110102f8 {
> >                         pinctrl-names = "default";
> >                         #address-cells = <1>;
> >                         #size-cells = <0>;
> > -                       reg = <0x6 0x110102d4 0x24>;
> > +                       reg = <0x6 0x110102f8 0x24>;
> >                 };
> > 
> >                 mdio3: mdio@61101031c {
> > --
> > 2.34.1
> > 
> 
> I did a check of our current Sparx5 EVBs and none of them uses
> controller 2 in any revision, so this is probably why it has not come
> up before, so as it stands we have no platform to test this change on
> currently.
> 
> Besides that the change looks good to me.
> 
> Best Regards
> Steen
> 
> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>

Are you okay with the rest of the series, or have you only looked at
this one patch?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 2/5] KVM: arm64: Add newly allocated ID registers to register descriptions
From: Mark Brown @ 2024-04-02 17:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, James Morse,
	Suzuki K Poulose, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
	linux-kernel, Dave Martin, kvmarm, linux-doc, linux-kselftest
In-Reply-To: <87le5ysm4l.wl-maz@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 1681 bytes --]

On Sun, Mar 31, 2024 at 11:59:06AM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > The 2023 architecture extensions have allocated some new ID registers, add
> > them to the KVM system register descriptions so that they are visible to
> > guests.

> > We make the newly introduced dpISA features writeable, as well as
> > allowing writes to ID_AA64ISAR3_EL1.CPA for FEAT_CPA which only
> > introduces straigforward new instructions with no additional
> > architectural state or traps.

> FPMR actively gets trapped by HCRX_EL2.

Sure, I'm not clear what you're trying to say here?  The "no additional"
bit is referring to FEAT_CPA.

> > -	ID_UNALLOCATED(6,3),
> > +	ID_WRITABLE(ID_AA64ISAR3_EL1, ~(ID_AA64ISAR2_EL1_RES0 |
> > +					ID_AA64ISAR3_EL1_PACM |
> > +					ID_AA64ISAR3_EL1_TLBIW)),
> >  	ID_UNALLOCATED(6,4),
> >  	ID_UNALLOCATED(6,5),
> >  	ID_UNALLOCATED(6,6),

> Where is the code that enforces the lack of support for MTEFAR,
> MTESTOREONLY, and MTEPERM for SCTLR_ELx, EnPACM and EnFPM in HCRX_EL2?

Could you please be more explicit regarding what you're expecting to see
here?  Other than the writeability mask for the ID register I would have
expected to need explicit code to enable new features rather than
explicit code to keep currently unsupported features unsupported.  I'm
sure what you're referencing will be obvious once I see it but I'm
drawing a blank.

> And I haven't checked whether TLBI VMALLWS2 can be trapped.

I didn't see anything but I might not be aware of where to look, there
doesn't seem to be anything for that specifically in HFGITR_EL2 or
HFGITR2_EL2 which would be the main places I'd expect to find something.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 1/3] dt-bindings: phy: phy-imx8-pcie: Add binding for i.MX8Q HSIO SerDes PHY
From: Rob Herring @ 2024-04-02 17:21 UTC (permalink / raw)
  To: Richard Zhu
  Cc: vkoul, kishon, krzysztof.kozlowski+dt, frank.li, conor+dt,
	linux-phy, devicetree, linux-arm-kernel, linux-kernel, kernel,
	imx
In-Reply-To: <1712036704-21064-2-git-send-email-hongxing.zhu@nxp.com>

On Tue, Apr 02, 2024 at 01:45:02PM +0800, Richard Zhu wrote:
> Add binding for controller ID and HSIO configuration setting of the
> i.MX8Q HSIO SerDes PHY.
> 
> Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> ---
>  include/dt-bindings/phy/phy-imx8-pcie.h | 29 +++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/dt-bindings/phy/phy-imx8-pcie.h b/include/dt-bindings/phy/phy-imx8-pcie.h
> index 8bbe2d6538d8..3292c8be3354 100644
> --- a/include/dt-bindings/phy/phy-imx8-pcie.h
> +++ b/include/dt-bindings/phy/phy-imx8-pcie.h
> @@ -11,4 +11,33 @@
>  #define IMX8_PCIE_REFCLK_PAD_INPUT	1
>  #define IMX8_PCIE_REFCLK_PAD_OUTPUT	2
>  
> +/*
> + * i.MX8QM HSIO subsystem has three lane PHYs and three controllers:
> + * PCIEA(2 lanes capapble PCIe controller), PCIEB (only support one

capable

> + * lane) and SATA.
> + *
> + * In the different use cases. PCIEA can be binded to PHY lane0, lane1

s/binded/bound/

And throughout your patches.

> + * or Lane0 and lane1. PCIEB can be binded to lane1 or lane2 PHY. SATA
> + * can only be binded to last lane2 PHY.
> + *
> + * Define i.MX8Q HSIO controller ID here to specify the controller
> + * binded to the PHY.
> + * Meanwhile, i.MX8QXP HSIO subsystem has one lane PHY and PCIEB(only
> + * support one lane) controller.
> + */
> +#define IMX8Q_HSIO_PCIEA_ID	0
> +#define IMX8Q_HSIO_PCIEB_ID	1
> +#define IMX8Q_HSIO_SATA_ID	2

Please use the standard phy mode defines.

> +
> +/*
> + * On i.MX8QM, PCIEA is mandatory required if the HSIO is enabled.
> + * Define configurations beside PCIEA is enabled.
> + *
> + * On i.MX8QXP, HSIO module only has PCIEB and one lane PHY.
> + * The "IMX8Q_HSIO_CFG_PCIEB" can be used on i.MX8QXP platforms.
> + */
> +#define IMX8Q_HSIO_CFG_SATA		1
> +#define IMX8Q_HSIO_CFG_PCIEB		2
> +#define IMX8Q_HSIO_CFG_PCIEBSATA	3

This seems somewhat redundant both as the 3rd define is just an OR of 
the first 2 and all 3 overlap with the prior defines.

Seems like with standard PHY modes, the only additional information you 
might need is PCIEB vs. PCIEA.

Rob

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] wifi: mt76: mt7603: add debugfs attr for disabling frames buffering
From: Shengyu Qu @ 2024-04-02 16:54 UTC (permalink / raw)
  To: Rafał Miłecki, Felix Fietkau, Lorenzo Bianconi,
	Ryder Lee, Shayne Chen, Sean Wang
  Cc: wiagn233, Kalle Valo, Matthias Brugger,
	AngeloGioacchino Del Regno, linux-wireless, linux-kernel,
	linux-arm-kernel, linux-mediatek, openwrt-devel,
	Rafał Miłecki
In-Reply-To: <20240325223319.30125-1-zajec5@gmail.com>


[-- Attachment #1.1.1.1: Type: text/plain, Size: 4047 bytes --]

Hi Rafal,

Maybe we could disable frames buffering by default until it is fixed?
Also, maybe we could do more tests on newer models such as mt7986/81 to
make this patch benefit more models?

Best regards,
Shengyu


在 2024/3/26 6:33, Rafał Miłecki 写道:
> From: Rafał Miłecki <rafal@milecki.pl>
> 
> MT7603EN and MT7628AN were reported by multiple users to be unstable
> under high traffic. Transmission of packets could stop for seconds often
> leading to disconnections.
> 
> Long research & debugging revelaed a close relation between MCU
> interrupts of type PKT_TYPE_TXS and slowdowns / stalls. That led to
> questioning frames buffering feature.
> 
> It turns out that disabling SKBs loopback code makes mt7603 devices much
> more stable under load. There are still some traffic hiccups but those
> happen like once every an hour and end up in recovery in most cases.
> 
> Add a debugfs option for disabling frames buffering so users can give it
> a try. If this solution yields a success we can make this feature
> disabled by default.
> 
> This change was successfully tested using 2 GHz AP interface on:
> 1. Netgear R6220 - MT7621ST (SoC) + MT7603EN (WiFi) + MT7612EN (WiFi)
> 2. Xiaomi Mi Router 4C - MT7628AN (Wi-Fi SoC)
> 
> Link: https://lore.kernel.org/linux-wireless/7c96d5ee-86c1-8068-1b58-40db6087a24f@gmail.com/
> Closes: https://github.com/openwrt/mt76/issues/865
> Fixes: c8846e101502 ("mt76: add driver for MT7603E and MT7628/7688")
> Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
> ---
>   drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c | 2 ++
>   drivers/net/wireless/mediatek/mt76/mt7603/dma.c     | 3 +++
>   drivers/net/wireless/mediatek/mt76/mt7603/init.c    | 1 +
>   drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h  | 2 ++
>   4 files changed, 8 insertions(+)
> 
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c b/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c
> index 3967f2f05774..c80baba7a402 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c
> @@ -115,4 +115,6 @@ void mt7603_init_debugfs(struct mt7603_dev *dev)
>   			    &dev->sensitivity_limit);
>   	debugfs_create_bool("dynamic_sensitivity", 0600, dir,
>   			    &dev->dynamic_sensitivity);
> +	debugfs_create_bool("frames_buffering", 0600, dir,
> +			    &dev->frames_buffering);
>   }
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c
> index 7a2f5d38562b..f5ab729dec31 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c
> @@ -27,6 +27,9 @@ mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb)
>   	u32 val;
>   	u8 tid = 0;
>   
> +	if (!dev->frames_buffering)
> +		goto free;
> +
>   	if (skb->len < MT_TXD_SIZE + sizeof(struct ieee80211_hdr))
>   		goto free;
>   
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/init.c b/drivers/net/wireless/mediatek/mt76/mt7603/init.c
> index 6c55c72f28a2..5abc2618ec8b 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7603/init.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7603/init.c
> @@ -515,6 +515,7 @@ int mt7603_register_device(struct mt7603_dev *dev)
>   	dev->slottime = 9;
>   	dev->sensitivity_limit = 28;
>   	dev->dynamic_sensitivity = true;
> +	dev->frames_buffering = true;
>   
>   	ret = mt7603_init_hardware(dev);
>   	if (ret)
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h b/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h
> index 9e58df7042ad..02c88334cdc0 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h
> +++ b/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h
> @@ -155,6 +155,8 @@ struct mt7603_dev {
>   	u32 reset_test;
>   
>   	unsigned int reset_cause[__RESET_CAUSE_MAX];
> +
> +	bool frames_buffering;
>   };
>   
>   extern const struct mt76_driver_ops mt7603_drv_ops;

[-- Attachment #1.1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 6977 bytes --]

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH net-next 2/3] net: stmmac: add support for RZ/N1 GMAC
From: Russell King (Oracle) @ 2024-04-02 15:48 UTC (permalink / raw)
  To: Romain Gantois
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, Clément Léger, Thomas Petazzoni,
	netdev, devicetree, linux-kernel, linux-renesas-soc, linux-stm32,
	linux-arm-kernel
In-Reply-To: <ZgwM/FIKTuN4vkQA@shell.armlinux.org.uk>

On Tue, Apr 02, 2024 at 02:49:48PM +0100, Russell King (Oracle) wrote:
> On Tue, Apr 02, 2024 at 02:37:01PM +0200, Romain Gantois wrote:
> > +	ret = stmmac_dvr_probe(dev, plat_dat, &stmmac_res);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ndev = platform_get_drvdata(pdev);
> > +	priv = netdev_priv(ndev);
> > +
> > +	pcs_node = of_parse_phandle(np, "pcs-handle", 0);
> > +	if (pcs_node) {
> > +		pcs = miic_create(dev, pcs_node);
> > +		of_node_put(pcs_node);
> > +		if (IS_ERR(pcs))
> > +			return PTR_ERR(pcs);
> > +
> > +		priv->hw->phylink_pcs = pcs;
> > +	}
> 
> I'm afraid that this fails at one of the most basic principles of kernel
> multi-threaded programming. stmmac_dvr_probe() as part of its work calls
> register_netdev() which publishes to userspace the network device.
> 
> Everything that is required must be setup _prior_ to publication to
> userspace to avoid races, because as soon as the network device is
> published, userspace can decide to bring that interface up. If one
> hasn't finished the initialisation, the interface can be brought up
> before that initialisation is complete.
> 
> I don't see anything obvious in the stmmac data structures that would
> allow you to hook in at an appropriate point before the
> register_netdev() but after the netdev has been created. The
> priv->hw data structure is created by stmmac_hwif_init()
> 
> I see that drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c is also
> guilty of this as well, and should be fixed. It's even worse because it
> does a truck load of stuff after stmmac_dvr_probe() which it most
> definitely should not be doing.
> 
> I definitely get the feeling that the structure of the stmmac driver
> is really getting out of hand, and is making stuff harder for people,
> and it's not improving over time - in fact, it's getting worse. It
> needs a *lot* of work to bring it back to a sane model.

I'm not going to say that the two patches threaded to this email are
"sane" but at least it avoids the problem. socfpga still has issues
with initialisation done after register_netdev() though.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] KVM: arm64: Limit stage2_apply_range() batch size to smallest block
From: Krister Johansen @ 2024-04-02 17:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Krister Johansen, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Ali Saidi, David Reaver,
	linux-arm-kernel, kvmarm, linux-kernel
In-Reply-To: <87r0fsrpko.wl-maz@kernel.org>

Hi Marc,

On Sat, Mar 30, 2024 at 10:17:43AM +0000, Marc Zyngier wrote:
> On Fri, 29 Mar 2024 19:15:37 +0000,
> Krister Johansen <kjlx@templeofstupid.com> wrote:
> > On Fri, Mar 29, 2024 at 06:48:38AM -0700, Oliver Upton wrote:
> > > On Thu, Mar 28, 2024 at 12:05:08PM -0700, Krister Johansen wrote:
> > > > stage2_apply_range() for unmap operations can interfere with the
> > > > performance of IO if the device's interrupts share the CPU where the
> > > > unmap operation is occurring.  commit 5994bc9e05c2 ("KVM: arm64: Limit
> > > > stage2_apply_range() batch size to largest block") improved this.  Prior
> > > > to that commit, workloads that were unfortunate enough to have their IO
> > > > interrupts pinned to the same CPU as the unmap operation would observe a
> > > > complete stall.  With the switch to using the largest block size, it is
> > > > possible for IO to make progress, albeit at a reduced speed.
> > > 
> > > Can you describe the workload a bit more? I'm having a hard time
> > > understanding how you're unmapping that much memory on the fly in
> > > your workload. Is guest memory getting swapped? Are VMs being torn
> > > down?
> > 
> > Sorry I wasn't clear here.  Yes, it's the VMs getting torn down that's
> > causing the problems.  The container VMs don't have long lifetimes, but
> > some may be up to 256Gb in size, depending on the user.  The workloads
> > running the VMs aren't especially performance sensitive, but their users
> > do notice when network connections time-out.  IOW, if the performance is
> > bad enough to temporarily prevent new TCP connections from being
> > established or requests / responses being recieved in a timely fashion,
> > we'll hear about it.  Users deploy their services a lot, so there's a
> > lot of container vm churn.  (Really it's automation redeploying the
> > services on behalf of the users in response to new commits to their
> > repos...)
> 
> I think this advocates for a teardown-specific code path rather than
> just relying on the usual S2 unmapping which is really designed for
> eviction. There are two things to consider here:
> 
> - TLB invalidation: this should only take a single VMALLS12E1, rather
>   than iterating over the PTs
> 
> - Cache maintenance: this could be elided with FWB, or *optionally*
>   elided if userspace buys in a "I don't need to see the memory of the
>   guest after teardown" type of behaviour

This approach would work for this workload, I think.  The hardware
supports FWB and AFAIK isn't looking at the guest memory after teardown.
This is also desirable because in the future we'd like to support
hotplug of VFIO devices. A separate path for unmap the memory used by
the device vs unmap all of the guest seems smart.

> > > Also, it seems a bit odd to steer interrupts *into* the workload you
> > > care about...
> > 
> > Ah, that was only intentionally done for the purposes of measuring the
> > impact.  That's not done on purpose in production.
> > 
> > Nevertheless, the example we tend to run into is that a box may have 2
> > NICs and each NIC has 32 Tx-Rx queues.  This means we've got 64 NIC
> > interrupts, each assigned to a different CPU.  Our systems have 64 CPUs.
> > What happens in practice is that a VM will get torn down, and that has a
> > 1-in-64 chance of impacting the performance of the subset of the flows
> > that are mapped via RSS to the interrupt that happens to be assigned to
> > the CPU where the VM is being torn down.
> > 
> > Of course, the obvious next question is why not just bind the VMs flows
> > to the CPUs the VM is running on?  We don't have a 1:1 mapping of
> > network device to VM, or VM to CPU right now, which frustrates this
> > approach.
> > 
> > > > Further reducing the stage2_apply_range() batch size has substantial
> > > > performance improvements for IO that share a CPU performing an unmap
> > > > operation.  By switching to a 2mb chunk, IO performance regressions were
> > > > no longer observed in this author's tests.  E.g. it was possible to
> > > > obtain the advertised device throughput despite an unmap operation
> > > > occurring on the CPU where the interrupt was running.  There is a
> > > > tradeoff, however.  No changes were observed in per-operation timings
> > > > when running the kvm_pagetable_test without an interrupt load.  However,
> > > > with a 64gb VM, 1 vcpu, and 4k pages and a IO load, map times increased
> > > > by about 15% and unmap times increased by about 58%.  In essence, this
> > > > trades slower map/unmap times for improved IO throughput.
> > > 
> > > There are other users of the range-based operations, like
> > > write-protection. Live migration is especially sensitive to the latency
> > > of page table updates as it can affect the VMM's ability to converge
> > > with the guest.
> > 
> > To be clear, the reduction in performance was observed when I
> > concurrently executed both the kvm_pagetable_test and a networking
> > benchmark where the NIC's interrupts were assigned to the same CPU where
> > the pagetable test was executing.  I didn't see a slowdown just running
> > the pagetable test.
> 
> Any chance you could share more details about your HW configuration
> (what CPU is that?)  and the type of traffic? This is the sort of
> things I'd like to be able to reproduce in order to experiment various
> strategies.

Sure, I only have access to documentation that is publicly available.

The hardware where we ran into this inititally was Graviton 3, which is
a Neoverse-V1 based core.  It does not support FEAT_TLBIRANGE.  I've
also tested on Graviton 4, which is Neoverse-V2 based.  It _does_
support FEAT_TLBIRANGE.  The deferred range based invalidation
support, was enough to allow us to teardown a large VM based on 4k pages
and not incur a visible performance penalty.  I haven't had a chance to
test to see if and how Will's patches change this, though.

The tests themselves were not especially fancy. The networking hardware
was a ENA device on an EC2 box with 30Gbps limit (5/10 Gbps per flow,
depending on the config).  The storage tested was a gp3 EBS device
configured to max IOPS/throughput (16,000 IOPS / 1000Mb/s).

Networking tests were iperf3 with a 9001 byte packet size.  The storage
tests were fio's randwrite workload in directio mode using the libaio
backend.  The "IOPS" test used a 4k blocksize and a queue depth of 128.
The "throughput" test used a blocksize of 64k and an iodepth of 32.  For
the fio tests, it was a 10gb file and 2 workers, mostly because the EBS
devices have two hardware queues for data.

I ran the kvm_page_table_test with a few different sizes, but settled on
64G with 1 vcpu for most tests.

Let me know if there's anything else I can share here.

-K

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] arm64: dts: ti: k3-j722s: Disable ethernet ports by default
From: Francesco Dolcini @ 2024-04-02 16:58 UTC (permalink / raw)
  To: Michael Walle
  Cc: Nishanth Menon, Vignesh Raghavendra, Tero Kristo, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, linux-arm-kernel, devicetree,
	linux-kernel
In-Reply-To: <20240402151802.3803708-1-mwalle@kernel.org>

Hello Michael,

On Tue, Apr 02, 2024 at 05:18:02PM +0200, Michael Walle wrote:
> Device tree best practice is to disable any external interface in the
> dtsi and just enable them if needed in the device tree. Thus, disable
> both ethernet ports by default and just enable the one used by the EVM
> in its device tree.
> 
> There is no functional change.
> 
> Signed-off-by: Michael Walle <mwalle@kernel.org>
> ---
> This should also be true for all the other SoCs. But I don't wanted to
> touch all the (older) device trees. j722s is pretty new, so there we
> should get it right.
> ---
>  arch/arm64/boot/dts/ti/k3-j722s-evm.dts | 5 +----
>  arch/arm64/boot/dts/ti/k3-j722s.dtsi    | 8 ++++++++
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/ti/k3-j722s-evm.dts b/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
> index d045dc7dde0c..afe7f68e6a4b 100644
> --- a/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
> +++ b/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
> @@ -224,14 +224,11 @@ cpsw3g_phy0: ethernet-phy@0 {
>  };
>  
>  &cpsw_port1 {
> +	status = "okay";

status should be the last property, according to the dts coding guidelines.

>  	phy-mode = "rgmii-rxid";
>  	phy-handle = <&cpsw3g_phy0>;
>  };

Francesco


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: Ryan Roberts @ 2024-04-02 16:46 UTC (permalink / raw)
  To: Peter Xu, David Hildenbrand
  Cc: linux-mm, linux-kernel, Yang Shi, Kirill A . Shutemov,
	Mike Kravetz, John Hubbard, Michael Ellerman, Andrew Jones,
	Muchun Song, linux-riscv, linuxppc-dev, Christophe Leroy,
	Andrew Morton, Christoph Hellwig, Lorenzo Stoakes, Matthew Wilcox,
	Rik van Riel, linux-arm-kernel, Andrea Arcangeli,
	Aneesh Kumar K . V, Vlastimil Babka, James Houghton,
	Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <ZgwwOq3XXKlS_7LQ@x1n>

On 02/04/2024 17:20, Peter Xu wrote:
> On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
>> On 02.04.24 16:48, Ryan Roberts wrote:
>>> Hi Peter,
> 
> Hey, Ryan,
> 
> Thanks for the report!
> 
>>>
>>> On 27/03/2024 15:23, peterx@redhat.com wrote:
>>>> From: Peter Xu <peterx@redhat.com>
>>>>
>>>> Now follow_page() is ready to handle hugetlb pages in whatever form, and
>>>> over all architectures.  Switch to the generic code path.
>>>>
>>>> Time to retire hugetlb_follow_page_mask(), following the previous
>>>> retirement of follow_hugetlb_page() in 4849807114b8.
>>>>
>>>> There may be a slight difference of how the loops run when processing slow
>>>> GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: each
>>>> loop of __get_user_pages() will resolve one pgtable entry with the patch
>>>> applied, rather than relying on the size of hugetlb hstate, the latter may
>>>> cover multiple entries in one loop.
>>>>
>>>> A quick performance test on an aarch64 VM on M1 chip shows 15% degrade over
>>>> a tight loop of slow gup after the path switched.  That shouldn't be a
>>>> problem because slow-gup should not be a hot path for GUP in general: when
>>>> page is commonly present, fast-gup will already succeed, while when the
>>>> page is indeed missing and require a follow up page fault, the slow gup
>>>> degrade will probably buried in the fault paths anyway.  It also explains
>>>> why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup:
>>>> accelerate thp gup even for "pages != NULL"") lands, the latter not part of
>>>> a performance analysis but a side benefit.  If the performance will be a
>>>> concern, we can consider handle CONT_PTE in follow_page().
>>>>
>>>> Before that is justified to be necessary, keep everything clean and simple.
>>>>
>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>>
>>> Afraid I'm seeing an oops when running gup_longterm test on arm64 with current mm-unstable. Git bisect blames this patch. The oops reproduces for me every time on 2 different machines:
>>>
>>>
>>> [    9.340416] kernel BUG at mm/gup.c:778!
>>> [    9.340746] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>> [    9.341199] Modules linked in:
>>> [    9.341481] CPU: 1 PID: 1159 Comm: gup_longterm Not tainted 6.9.0-rc2-00210-g910ff1a347e4 #11
>>> [    9.342232] Hardware name: linux,dummy-virt (DT)
>>> [    9.342647] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [    9.343195] pc : follow_page_mask+0x4d4/0x880
>>> [    9.343580] lr : follow_page_mask+0x4d4/0x880
>>> [    9.344018] sp : ffff8000898b3aa0
>>> [    9.344345] x29: ffff8000898b3aa0 x28: fffffdffc53973e8 x27: 00003c0005d08000
>>> [    9.345028] x26: ffff00014e5cfd08 x25: ffffd3513a40c000 x24: fffffdffc5d08000
>>> [    9.345682] x23: ffffc1ffc0000000 x22: 0000000000080101 x21: ffff8000898b3ba8
>>> [    9.346337] x20: 0000fffff4200000 x19: ffff00014e52d508 x18: 0000000000000010
>>> [    9.347005] x17: 5f656e6f7a5f7369 x16: 2120262620296567 x15: 6170286461654865
>>> [    9.347713] x14: 6761502128454741 x13: 2929656761702865 x12: 6761705f65636976
>>> [    9.348371] x11: 65645f656e6f7a5f x10: ffffd3513b31d6e0 x9 : ffffd3513852f090
>>> [    9.349062] x8 : 00000000ffffefff x7 : ffffd3513b31d6e0 x6 : 0000000000000000
>>> [    9.349753] x5 : ffff00017ff98cc8 x4 : 0000000000000fff x3 : 0000000000000000
>>> [    9.350397] x2 : 0000000000000000 x1 : ffff000190e8b480 x0 : 0000000000000052
>>> [    9.351097] Call trace:
>>> [    9.351312]  follow_page_mask+0x4d4/0x880
>>> [    9.351700]  __get_user_pages+0xf4/0x3e8
>>> [    9.352089]  __gup_longterm_locked+0x204/0xa70
>>> [    9.352516]  pin_user_pages+0x88/0xc0
>>> [    9.352873]  gup_test_ioctl+0x860/0xc40
>>> [    9.353249]  __arm64_sys_ioctl+0xb0/0x100
>>> [    9.353648]  invoke_syscall+0x50/0x128
>>> [    9.354022]  el0_svc_common.constprop.0+0x48/0xf8
>>> [    9.354488]  do_el0_svc+0x28/0x40
>>> [    9.354822]  el0_svc+0x34/0xe0
>>> [    9.355128]  el0t_64_sync_handler+0x13c/0x158
>>> [    9.355489]  el0t_64_sync+0x190/0x198
>>> [    9.355793] Code: aa1803e0 d000d8e1 91220021 97fff560 (d4210000)
>>> [    9.356280] ---[ end trace 0000000000000000 ]---
>>> [    9.356651] note: gup_longterm[1159] exited with irqs disabled
>>> [    9.357141] note: gup_longterm[1159] exited with preempt_count 2
>>> [    9.358033] ------------[ cut here ]------------
>>> [    9.358800] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x108/0x120
>>> [    9.360157] Modules linked in:
>>> [    9.360541] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.9.0-rc2-00210-g910ff1a347e4 #11
>>> [    9.361626] Hardware name: linux,dummy-virt (DT)
>>> [    9.362087] pstate: 204003c5 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [    9.362758] pc : ct_kernel_exit.constprop.0+0x108/0x120
>>> [    9.363306] lr : ct_idle_enter+0x10/0x20
>>> [    9.363845] sp : ffff8000801abdc0
>>> [    9.364222] x29: ffff8000801abdc0 x28: 0000000000000000 x27: 0000000000000000
>>> [    9.364961] x26: 0000000000000000 x25: ffff00014149d780 x24: 0000000000000000
>>> [    9.365557] x23: 0000000000000000 x22: ffffd3513b299d48 x21: ffffd3513a785730
>>> [    9.366239] x20: ffffd3513b299c28 x19: ffff00017ffa7da0 x18: 0000fffff5ffffff
>>> [    9.366869] x17: 0000000000000000 x16: 1fffe0002a21a8c1 x15: 0000000000000000
>>> [    9.367524] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
>>> [    9.368207] x11: 0000000000000001 x10: 0000000000000ad0 x9 : ffffd35138589230
>>> [    9.369123] x8 : ffff00014149e2b0 x7 : 0000000000000000 x6 : 000000000f8c0fb2
>>> [    9.370403] x5 : 4000000000000002 x4 : ffff2cb045825000 x3 : ffff8000801abdc0
>>> [    9.371170] x2 : ffffd3513a782da0 x1 : 4000000000000000 x0 : ffffd3513a782da0
>>> [    9.372279] Call trace:
>>> [    9.372519]  ct_kernel_exit.constprop.0+0x108/0x120
>>> [    9.373216]  ct_idle_enter+0x10/0x20
>>> [    9.373562]  default_idle_call+0x3c/0x160
>>> [    9.374055]  do_idle+0x21c/0x280
>>> [    9.374394]  cpu_startup_entry+0x3c/0x50
>>> [    9.374797]  secondary_start_kernel+0x140/0x168
>>> [    9.375220]  __secondary_switched+0xb8/0xc0
>>> [    9.375875] ---[ end trace 0000000000000000 ]---
>>>
>>>
>>> The oops trigger is at mm/gup.c:778:
>>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>>
>>>
>>> This is the output of gup_longterm (last output is just before oops):
>>>
>>> # [INFO] detected hugetlb page size: 2048 KiB
>>> # [INFO] detected hugetlb page size: 32768 KiB
>>> # [INFO] detected hugetlb page size: 64 KiB
>>> # [INFO] detected hugetlb page size: 1048576 KiB
>>> TAP version 13
>>> 1..70
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
>>> ok 1 Should have worked
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
>>> ok 2 Should have failed
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
>>> ok 3 Should have failed
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
>>> ok 4 Should have worked
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
>>>
>>>
>>> So 2M passed ok, and its failing for 32M, which is cont-pmd. I'm guessing you're trying to iterate 2M into a cont-pmd folio and ending up with an unexpected tail page?
>>
>> I assume we find the expected tail page, it's just that the check
>>
>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>
>> Doesn't make sense with hugetlb folios. We might have a tail page mapped in
>> a cont-pmd entry. As soon as we call follow_huge_pmd() on "not the first
>> cont-pmd entry", we trigger this check.
>>
>> Likely this sanity check must also allow for hugetlb folios. Or we should
>> just remove it completely.
> 
> Right, IMHO it'll be easier we remove it, actually I see there's one more
> at the end, so I think we need to remove both.
> 
>>
>> In the past, we wanted to make sure that we never get tail pages of THP from
>> PMD entries, because something would currently be broken (we don't support
>> THP > PMD).
> 
> There's probably one more thing we need to do, on allowing
> PageAnonExclusive() to work with hugetlb tails. Even if we remove the
> warnings and if I read the code right, we can BUG_ON again on checking tail
> pages over anon-exclusive for PageHuge.
> 
> So I assume to fix it completely, we may need two changes: Patch 1 to
> prepare PageAnonExclusive() to work on hugetlb tails, then patch 2 to be
> squashed into the patch "mm/gup: handle huge pmd for follow_pmd_mask()".
> Note: not this patch to fixup, as this patch only does the "switchover" to
> the new path, the culprit should be the other patch..

I'll leave you to do the testing on this, if that's ok.

Just to make my config explicit, I have this kernel command line, which reserves
hugetlbs of all sizes for the tests:

"transparent_hugepage=madvise earlycon root=/dev/vda2 secretmem.enable
hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2"

Thanks,
Ryan

> 
> I have them attached below first, before I'll also go and see whether I can
> run some arm tests later today or tomorrow.  David, any comments from
> anon-exclusive side?
> 
> Thanks,
> 
> ===8<===
> 
> From 26f0670acea948945222c97a9cab58428782ca69 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 2 Apr 2024 11:52:28 -0400
> Subject: [PATCH 1/2] mm: Allow anon exclusive check over hugetlb tail pages
> 
> PageAnonExclusive() used to forbid tail pages for hugetlbfs, as that used
> to be called mostly in hugetlb specific paths and the head page was
> guaranteed.
> 
> As we move forward towards merging hugetlb paths into generic mm, we may
> start to pass in tail hugetlb pages (when with cont-pte/cont-pmd huge
> pages) for such check.  Allow it to properly fetch the head, in which case
> the anon-exclusiveness of the head will always represents the tail page.
> 
> There's already a sign of it when we look at the fast-gup which already
> contain the hugetlb processing altogether: we used to have a specific
> commit 5805192c7b72 ("mm/gup: handle cont-PTE hugetlb pages correctly in
> gup_must_unshare() via GUP-fast") covering that area.  Now with this more
> generic change, that can also go away.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/page-flags.h |  8 +++++++-
>  mm/internal.h              | 10 ----------
>  2 files changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 888353c209c0..225357f48a79 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -1095,7 +1095,13 @@ PAGEFLAG(Isolated, isolated, PF_ANY);
>  static __always_inline int PageAnonExclusive(const struct page *page)
>  {
>  	VM_BUG_ON_PGFLAGS(!PageAnon(page), page);
> -	VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
> +	/*
> +	 * Allow the anon-exclusive check to work on hugetlb tail pages.
> +	 * Here hugetlb pages will always guarantee the anon-exclusiveness
> +	 * of the head page represents the tail pages.
> +	 */
> +	if (PageHuge(page) && !PageHead(page))
> +		page = compound_head(page);
>  	return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 9512de7398d5..87f6e4fd56a5 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1259,16 +1259,6 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
>  	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>  		smp_rmb();
>  
> -	/*
> -	 * During GUP-fast we might not get called on the head page for a
> -	 * hugetlb page that is mapped using cont-PTE, because GUP-fast does
> -	 * not work with the abstracted hugetlb PTEs that always point at the
> -	 * head page. For hugetlb, PageAnonExclusive only applies on the head
> -	 * page (as it cannot be partially COW-shared), so lookup the head page.
> -	 */
> -	if (unlikely(!PageHead(page) && PageHuge(page)))
> -		page = compound_head(page);
> -
>  	/*
>  	 * Note that PageKsm() pages cannot be exclusive, and consequently,
>  	 * cannot get pinned.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 3/4] firmware: arm_scmi: Add SCMI v3.2 pincontrol protocol basic support
From: Andy Shevchenko @ 2024-04-02 16:39 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Peng Fan, Peng Fan (OSS), Sudeep Holla, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Linus Walleij, Dan Carpenter,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	linux-gpio@vger.kernel.org, Oleksii Moisieiev
In-Reply-To: <ZgwrKnx3hb59OG77@pluto>

On Tue, Apr 2, 2024 at 6:58 PM Cristian Marussi
<cristian.marussi@arm.com> wrote:
> On Tue, Apr 02, 2024 at 04:06:06PM +0300, Andy Shevchenko wrote:
> > On Tue, Apr 2, 2024 at 10:48 AM Cristian Marussi
> > <cristian.marussi@arm.com> wrote:
> > > On Sun, Mar 31, 2024 at 01:44:28PM +0000, Peng Fan wrote:
> > > > > Sat, Mar 23, 2024 at 08:15:16PM +0800, Peng Fan (OSS) kirjoitti:

...

> > > > > > +#include <linux/module.h>
> > > > > > +#include <linux/scmi_protocol.h>
> > > > > > +#include <linux/slab.h>
> > > > >
> > > > > This is semi-random list of headers. Please, follow IWYU principle (include
> > > > > what you use). There are a lot of inclusions I see missing (just in the context of
> > > > > this page I see bits.h, types.h, and  asm/byteorder.h).
> > > >
> > > > Is there any documentation about this requirement?
> > > > Some headers are already included by others.
> >
> > The documentation here is called "a common sense".
> > The C language is built like this and we expect that nobody will
> > invest into the dependency hell that we have already, that's why IWYU
> > principle, please follow it.
>
> Yes, but given that we have a growing number of SCMI protocols there is a
> common local protocols.h header to group all includes needed by any
> protocols: the idea behind this (and the devm_ saga down below) was to ease
> development of protocols, since there are lots of them and growing, given
> the SCMI spec is extensible.

Yes, and what you are effectively suggesting is: "Technical debt? Oh,
fine, we do not care!" This is not good. I'm in a long term of
cleaning up the dependency hell in the kernel (my main focus is
kernel.h for now) and I am talking from my experience. I do not like
what people are doing in 95% of the code, that's why I really want to
stop the bad practices as soon as possible.

Last to add, but not least is that your code may be used as an example
for others, hence we really have to do our best in order to avoid bad
design, practices, and cargo cults. If this requires more refactoring
of the existing code, then do it sooner than later.

...

> > > Andy made (mostly) the same remarks on this same patch ~1-year ago on
> > > this same patch while it was posted by Oleksii.
> > >
> > > And I told that time that most of the remarks around devm_ usage were
> > > wrong due to how the SCMI core handles protocol initialization (using a
> > > devres group transparently).
> > >
> > > This is what I answered that time.
> > >
> > > https://lore.kernel.org/linux-arm-kernel/ZJ78hBcjAhiU+ZBO@e120937-lin/#t
> > >
> > > I wont repeat myself, but, in a nutshell the memory allocation like it
> > > is now is fine: a bit happens via devm_ at protocol initialization, the
> > > other is doe via explicit kmalloc at runtime and freed via kfree at
> > > remove time (if needed...i.e. checking the present flag of some structs)
> >
> > This sounds like a mess. devm_ is expected to be used only for the
> > ->probe() stage, otherwise you may consider cleanup.h (__free() macro)
> > to have automatic free at the paths where memory is not needed.
>
> Indeed, this protocol_init code is called by the SCMI core once for all when
> an SCMI driver tries at first to use this specific protocol by 'getting' its
> protocol_ops, so it is indeed called inside the probe chain of the driver:
> at this point you *can* decide to use devres to allocate memory and be assured
> that if the init fails, or when the driver cease to use this protocol (calling
> its remove()) and no other driver is using it, all the stuff that have been
> allocated related to this protocol will be released by the core for you.
> (using an internal devres group)
>
> Without this you should handle manually all the deallocation manually on
> the init error-paths AND also provide all the cleanup explicitly when
> the protocol is no more used by any driver (multiple users of the same
> protocol instance are possible)...for all protocols.

Yes. Is it a problem?

> This is/was handy since, till now, all the SCMI querying and resources
> allocation happened anyway all at once at init time...
>
> ...the mess, as you kindly called it, derives from the fact that this specific
> protocol is the first and only one that does NOT allocate all that it needs
> during the initialization (to minimize needless allocs for a lot of possibly
> unused resources) and this lazy-initialization phase, done after init at runtime,
> must be handled manually since it cannot be managed by the devres group that is
> open/clsoed around init by the SCMI core.
>
> I dont like particularly this split allocation but it has a reason and any
> other solution seems more messy to me at the moment.
>
> And I dont feel like changing all the SCMI protocol initialziation core code
> (that address a lot more under the hood) is a desirable solution to address a
> non-existent problem really.
>
> > And the function naming doesn't suggest that you have a probe-remove
> > pair. Moreover, if the init-deinit part is called in the probe-remove,
> > the devm_ must not be mixed with non-devm ones, as it breaks the order
> > and leads to subtle mistakes.
>
> Initialization order is enforced by SCMI core like this:
>
>  @driver_probe->get_protocol_ops()
>   @core/get_protocol_ops
>      -> devres_group_open()
>      -> protocol_init->devm_*()
>      -> devres_group_close()
>      -> driver_probing
>
>    @runtime optional explicit_lazy_kmallocs inside the protocol
>
>  @driver_remove->put_protocol_ops()
>    @core/put_protocol_ops()
>      -> protocol_denit->optional_explicit_kfree_of_the_above
>      -> devres_group_release()
>    -> driver_removing
>
> ... dont think there's an ordering problem.

The mess with devm_ vs. non-devm is quite easy to achieve. You are
probably out of the control of what the protocol driver wants to do in
the init. Is the usage of devm (which APIs and in which order can be
used)  WRT SCMI documented somewhere?

Misuse of devm is a common issue, I'm not surprised it will hit your
subsystem one day with such an approach.

> ...note that the ph->dev provided in the protocol_init and used by devm_
> is NOT the dev of the SCMI driver probe/remove that uses the get_protocol_ops,
> it is an internal SCMI device associated with the core SCMI stack probing and
> allocations, within which a devres group for the specific protocol is created
> when that specific protocol is initialized...protocols are not fully
> fledged drivers are just bits of the SCMI stack that are initialized when needed
> (and possibly also loaded when needed for vendor protocols) and
> de-initialzed when no more SCMI driver users exist for that protocol.

P.S. I guess from now on it's your call, but this code and in case
other drivers use similar, is badly written. I hope some documentation
exists to at least justify all this mess and explaining why there no
and (what's really important) will never be a problem.

-- 
With Best Regards,
Andy Shevchenko

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: David Hildenbrand @ 2024-04-02 16:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ryan Roberts, peterx, linux-mm, linux-kernel, Yang Shi,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Michael Ellerman,
	Andrew Jones, Muchun Song, linux-riscv, linuxppc-dev,
	Christophe Leroy, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Rik van Riel, linux-arm-kernel, Andrea Arcangeli,
	Aneesh Kumar K . V, Vlastimil Babka, James Houghton,
	Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <ZgwrhKuypBtSpKdI@casper.infradead.org>

On 02.04.24 18:00, Matthew Wilcox wrote:
> On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
>>> The oops trigger is at mm/gup.c:778:
>>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>>
>>> So 2M passed ok, and its failing for 32M, which is cont-pmd. I'm guessing you're trying to iterate 2M into a cont-pmd folio and ending up with an unexpected tail page?
>>
>> I assume we find the expected tail page, it's just that the check
>>
>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>
>> Doesn't make sense with hugetlb folios. We might have a tail page mapped in
>> a cont-pmd entry. As soon as we call follow_huge_pmd() on "not the first
>> cont-pmd entry", we trigger this check.
>>
>> Likely this sanity check must also allow for hugetlb folios. Or we should
>> just remove it completely.
>>
>> In the past, we wanted to make sure that we never get tail pages of THP from
>> PMD entries, because something would currently be broken (we don't support
>> THP > PMD).
> 
> That was a practical limitation on my part.  We have various parts of
> the MM which assume that pmd_page() returns a head page and until we
> get all of those fixed, adding support for folios larger than PMD_SIZE
> was only going to cause trouble for no significant wins.
> 
> I agree with you we should get rid of this assertion entirely.  We should
> fix all the places which assume that pmd_page() returns a head page,
> but that may take some time.
> 
> As an example, filemap_map_pmd() has:
> 
>         if (pmd_none(*vmf->pmd) && folio_test_pmd_mappable(folio)) {
>                  struct page *page = folio_file_page(folio, start);
>                  vm_fault_t ret = do_set_pmd(vmf, page);
> 
> and then do_set_pmd() has:
> 
>          if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
>                  return ret;
> 
> so we'd simply refuse to use a PMD to map a folio larger than PMD_SIZE.
> There's a lot of work to be done to make this work generally (not to
> mention figuring out how to handle mapcount for such folios ;-).

Yes :)

-- 
Cheers,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v7 4/4] pinctrl: Implementation of the generic scmi-pinctrl driver
From: Cristian Marussi @ 2024-04-02 16:40 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Andy Shevchenko, Peng Fan (OSS), Sudeep Holla, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Linus Walleij,
	linux-arm-kernel, linux-kernel, devicetree, linux-gpio, Peng Fan,
	Oleksii Moisieiev
In-Reply-To: <c5bdf039-c43b-4611-9f0b-81585e296206@moroto.mountain>

On Tue, Apr 02, 2024 at 05:09:34PM +0300, Dan Carpenter wrote:
> On Tue, Apr 02, 2024 at 04:22:45PM +0300, Andy Shevchenko wrote:
> > On Tue, Apr 02, 2024 at 10:22:24AM +0800, Peng Fan (OSS) wrote:
> > > +static int pinctrl_scmi_get_pins(struct scmi_pinctrl *pmx,
> > > +				 struct pinctrl_desc *desc)
> > > +{
> > > +	struct pinctrl_pin_desc *pins;
> > > +	unsigned int npins;
> > > +	int ret, i;
> > > +
> > > +	npins = pinctrl_ops->count_get(pmx->ph, PIN_TYPE);
> > > +	/*
> > > +	 * npins will never be zero, the scmi pinctrl driver has bailed out
> > > +	 * if npins is zero.
> > > +	 */
> > 
> > This is fragile, but at least it is documented.
> > 
> 
> It was never clear to me where the crash would happen if npins was zero.
> Does some part of pinctrl internals assume we have at least one pin?

Dont think there were any possible crashes since at the protoocl layer
(not here) kcalloc returns ZERO_SIZE_PTR into pinfo->pins for a zero-bytes
allocation BUT it is indeed never accessed since any attempt to access a
pin will be considerd invalid (any u32 index >= (nr_pins=0))...

...but what is the point of loading protocol and drivers with zero pins ?
You can have zero grouos and zero functions, but zero pins ?

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: David Hildenbrand @ 2024-04-02 16:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: Ryan Roberts, linux-mm, linux-kernel, Yang Shi,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Michael Ellerman,
	Andrew Jones, Muchun Song, linux-riscv, linuxppc-dev,
	Christophe Leroy, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Matthew Wilcox, Rik van Riel, linux-arm-kernel,
	Andrea Arcangeli, Aneesh Kumar K . V, Vlastimil Babka,
	James Houghton, Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <ZgwwOq3XXKlS_7LQ@x1n>

On 02.04.24 18:20, Peter Xu wrote:
> On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
>> On 02.04.24 16:48, Ryan Roberts wrote:
>>> Hi Peter,
> 
> Hey, Ryan,
> 
> Thanks for the report!
> 
>>>
>>> On 27/03/2024 15:23, peterx@redhat.com wrote:
>>>> From: Peter Xu <peterx@redhat.com>
>>>>
>>>> Now follow_page() is ready to handle hugetlb pages in whatever form, and
>>>> over all architectures.  Switch to the generic code path.
>>>>
>>>> Time to retire hugetlb_follow_page_mask(), following the previous
>>>> retirement of follow_hugetlb_page() in 4849807114b8.
>>>>
>>>> There may be a slight difference of how the loops run when processing slow
>>>> GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: each
>>>> loop of __get_user_pages() will resolve one pgtable entry with the patch
>>>> applied, rather than relying on the size of hugetlb hstate, the latter may
>>>> cover multiple entries in one loop.
>>>>
>>>> A quick performance test on an aarch64 VM on M1 chip shows 15% degrade over
>>>> a tight loop of slow gup after the path switched.  That shouldn't be a
>>>> problem because slow-gup should not be a hot path for GUP in general: when
>>>> page is commonly present, fast-gup will already succeed, while when the
>>>> page is indeed missing and require a follow up page fault, the slow gup
>>>> degrade will probably buried in the fault paths anyway.  It also explains
>>>> why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup:
>>>> accelerate thp gup even for "pages != NULL"") lands, the latter not part of
>>>> a performance analysis but a side benefit.  If the performance will be a
>>>> concern, we can consider handle CONT_PTE in follow_page().
>>>>
>>>> Before that is justified to be necessary, keep everything clean and simple.
>>>>
>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>>
>>> Afraid I'm seeing an oops when running gup_longterm test on arm64 with current mm-unstable. Git bisect blames this patch. The oops reproduces for me every time on 2 different machines:
>>>
>>>
>>> [    9.340416] kernel BUG at mm/gup.c:778!
>>> [    9.340746] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>> [    9.341199] Modules linked in:
>>> [    9.341481] CPU: 1 PID: 1159 Comm: gup_longterm Not tainted 6.9.0-rc2-00210-g910ff1a347e4 #11
>>> [    9.342232] Hardware name: linux,dummy-virt (DT)
>>> [    9.342647] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [    9.343195] pc : follow_page_mask+0x4d4/0x880
>>> [    9.343580] lr : follow_page_mask+0x4d4/0x880
>>> [    9.344018] sp : ffff8000898b3aa0
>>> [    9.344345] x29: ffff8000898b3aa0 x28: fffffdffc53973e8 x27: 00003c0005d08000
>>> [    9.345028] x26: ffff00014e5cfd08 x25: ffffd3513a40c000 x24: fffffdffc5d08000
>>> [    9.345682] x23: ffffc1ffc0000000 x22: 0000000000080101 x21: ffff8000898b3ba8
>>> [    9.346337] x20: 0000fffff4200000 x19: ffff00014e52d508 x18: 0000000000000010
>>> [    9.347005] x17: 5f656e6f7a5f7369 x16: 2120262620296567 x15: 6170286461654865
>>> [    9.347713] x14: 6761502128454741 x13: 2929656761702865 x12: 6761705f65636976
>>> [    9.348371] x11: 65645f656e6f7a5f x10: ffffd3513b31d6e0 x9 : ffffd3513852f090
>>> [    9.349062] x8 : 00000000ffffefff x7 : ffffd3513b31d6e0 x6 : 0000000000000000
>>> [    9.349753] x5 : ffff00017ff98cc8 x4 : 0000000000000fff x3 : 0000000000000000
>>> [    9.350397] x2 : 0000000000000000 x1 : ffff000190e8b480 x0 : 0000000000000052
>>> [    9.351097] Call trace:
>>> [    9.351312]  follow_page_mask+0x4d4/0x880
>>> [    9.351700]  __get_user_pages+0xf4/0x3e8
>>> [    9.352089]  __gup_longterm_locked+0x204/0xa70
>>> [    9.352516]  pin_user_pages+0x88/0xc0
>>> [    9.352873]  gup_test_ioctl+0x860/0xc40
>>> [    9.353249]  __arm64_sys_ioctl+0xb0/0x100
>>> [    9.353648]  invoke_syscall+0x50/0x128
>>> [    9.354022]  el0_svc_common.constprop.0+0x48/0xf8
>>> [    9.354488]  do_el0_svc+0x28/0x40
>>> [    9.354822]  el0_svc+0x34/0xe0
>>> [    9.355128]  el0t_64_sync_handler+0x13c/0x158
>>> [    9.355489]  el0t_64_sync+0x190/0x198
>>> [    9.355793] Code: aa1803e0 d000d8e1 91220021 97fff560 (d4210000)
>>> [    9.356280] ---[ end trace 0000000000000000 ]---
>>> [    9.356651] note: gup_longterm[1159] exited with irqs disabled
>>> [    9.357141] note: gup_longterm[1159] exited with preempt_count 2
>>> [    9.358033] ------------[ cut here ]------------
>>> [    9.358800] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x108/0x120
>>> [    9.360157] Modules linked in:
>>> [    9.360541] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.9.0-rc2-00210-g910ff1a347e4 #11
>>> [    9.361626] Hardware name: linux,dummy-virt (DT)
>>> [    9.362087] pstate: 204003c5 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [    9.362758] pc : ct_kernel_exit.constprop.0+0x108/0x120
>>> [    9.363306] lr : ct_idle_enter+0x10/0x20
>>> [    9.363845] sp : ffff8000801abdc0
>>> [    9.364222] x29: ffff8000801abdc0 x28: 0000000000000000 x27: 0000000000000000
>>> [    9.364961] x26: 0000000000000000 x25: ffff00014149d780 x24: 0000000000000000
>>> [    9.365557] x23: 0000000000000000 x22: ffffd3513b299d48 x21: ffffd3513a785730
>>> [    9.366239] x20: ffffd3513b299c28 x19: ffff00017ffa7da0 x18: 0000fffff5ffffff
>>> [    9.366869] x17: 0000000000000000 x16: 1fffe0002a21a8c1 x15: 0000000000000000
>>> [    9.367524] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
>>> [    9.368207] x11: 0000000000000001 x10: 0000000000000ad0 x9 : ffffd35138589230
>>> [    9.369123] x8 : ffff00014149e2b0 x7 : 0000000000000000 x6 : 000000000f8c0fb2
>>> [    9.370403] x5 : 4000000000000002 x4 : ffff2cb045825000 x3 : ffff8000801abdc0
>>> [    9.371170] x2 : ffffd3513a782da0 x1 : 4000000000000000 x0 : ffffd3513a782da0
>>> [    9.372279] Call trace:
>>> [    9.372519]  ct_kernel_exit.constprop.0+0x108/0x120
>>> [    9.373216]  ct_idle_enter+0x10/0x20
>>> [    9.373562]  default_idle_call+0x3c/0x160
>>> [    9.374055]  do_idle+0x21c/0x280
>>> [    9.374394]  cpu_startup_entry+0x3c/0x50
>>> [    9.374797]  secondary_start_kernel+0x140/0x168
>>> [    9.375220]  __secondary_switched+0xb8/0xc0
>>> [    9.375875] ---[ end trace 0000000000000000 ]---
>>>
>>>
>>> The oops trigger is at mm/gup.c:778:
>>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>>
>>>
>>> This is the output of gup_longterm (last output is just before oops):
>>>
>>> # [INFO] detected hugetlb page size: 2048 KiB
>>> # [INFO] detected hugetlb page size: 32768 KiB
>>> # [INFO] detected hugetlb page size: 64 KiB
>>> # [INFO] detected hugetlb page size: 1048576 KiB
>>> TAP version 13
>>> 1..70
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
>>> ok 1 Should have worked
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
>>> ok 2 Should have failed
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
>>> ok 3 Should have failed
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
>>> ok 4 Should have worked
>>> # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
>>>
>>>
>>> So 2M passed ok, and its failing for 32M, which is cont-pmd. I'm guessing you're trying to iterate 2M into a cont-pmd folio and ending up with an unexpected tail page?
>>
>> I assume we find the expected tail page, it's just that the check
>>
>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>
>> Doesn't make sense with hugetlb folios. We might have a tail page mapped in
>> a cont-pmd entry. As soon as we call follow_huge_pmd() on "not the first
>> cont-pmd entry", we trigger this check.
>>
>> Likely this sanity check must also allow for hugetlb folios. Or we should
>> just remove it completely.
> 
> Right, IMHO it'll be easier we remove it, actually I see there's one more
> at the end, so I think we need to remove both.
> 
>>
>> In the past, we wanted to make sure that we never get tail pages of THP from
>> PMD entries, because something would currently be broken (we don't support
>> THP > PMD).
> 
> There's probably one more thing we need to do, on allowing
> PageAnonExclusive() to work with hugetlb tails. Even if we remove the
> warnings and if I read the code right, we can BUG_ON again on checking tail
> pages over anon-exclusive for PageHuge.
> 
> So I assume to fix it completely, we may need two changes: Patch 1 to
> prepare PageAnonExclusive() to work on hugetlb tails, then patch 2 to be
> squashed into the patch "mm/gup: handle huge pmd for follow_pmd_mask()".
> Note: not this patch to fixup, as this patch only does the "switchover" to
> the new path, the culprit should be the other patch..
> 
> I have them attached below first, before I'll also go and see whether I can
> run some arm tests later today or tomorrow.  David, any comments from
> anon-exclusive side?

I added the PageAnonExclusive checks for hugetlb back then, because 
calling it on a tail page indicated real trouble for hugetlb.

Well, and I didn't want to have runtime-hugetlb checks in 
PageAnonExclusive code called on certainly-not-hugetlb code paths.

Personally, I'd fixup the problematic callsite where we know nothing 
nasty is happening (like we did for gup_must_unshare(), because we don't 
expect hugetlb tail pages from arbitrary other code).

But as I'm getting closer to a folio_test_anon_exclusive() 
implementation as we speak (closer, but not done :) ... ), where I'd 
remove any such hugetlb special handling, I don't particularly care how 
we handle GUP here in the meantime.

-- 
Cheers,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?
From: Robin Murphy @ 2024-04-02 16:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Tyler Hicks, Jason Gunthorpe, Jerry Snitselaar, linux-arm-kernel,
	iommu, linux-kernel, Dexuan Cui, Easwar Hariharan
In-Reply-To: <20240322155157.GD5634@willie-the-truck>

On 2024-03-22 3:51 pm, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 06:17:39PM +0000, Robin Murphy wrote:
>> In terms of the shutdown behaviour, I think it actually works out as-is. For
>> the normal case we haven't touched GBPA, so we are truly returning to the
>> boot-time condition; in the unexpected case where SMMUEN was already enabled
>> then we'll go into an explicit GPBA abort state, but that seems a
>> not-unreasonable compromise for not preserving the entire boot-time Stream
>> Table etc., whose presence kind of implies it wouldn't have been bypassing
>> everything anyway.
>>
>> The more I look at the remaining aspect of disable_bypass for controlling
>> broken-DT behaviour the more I suspect it can't actually be useful either
>> way, especially not since default domains. I have no memory of what my
>> original reasoning might have been, so I'm inclined to just rip that all out
>> and let probe fail. I see no reason these days not to expect a broken DT to
>> leads to a broken system, especially not now with DTSchema validation.
> 
> That sounds reasonable to me, although we may end up having to back it
> out if we regress systems with borked firmware :(
> 
>> Then there's just the kdump warning it suppresses, of which I also have no
>> idea why it's there either, but apparently that one's on you :P
> 
> I think _that_ one is because the previous (crashed) kernel won't have
> torn anything down, so there could be active DMA using translations in
> the SMMU. In that case, the crashkernel (which is running from some
> carveout) may find the SMMU enabled, but it really can't stick it into
> bypass mode because that's likely to corrupt random memory. So in that
> case, we do stick it into abort before we reinitialise it and then we
> disabling fault reporting altogether to avoid the log spam:
> 
> 	if (is_kdump_kernel())
> 		enables &= ~(CR0_EVTQEN | CR0_PRIQEN)

Oh, I know why we do what we do for the kdump situation in general - it 
was merely the matter of why we chose to demand that the user explicitly 
tells us to do what we know is the right thing (and scream at them if 
they don't), rather than to just go ahead and do the right thing anyway.

(the significance of disable_bypass for kdump is after we turn the SMMU 
back on from GBPA Abort state - we don't want any ongoing traffic being 
able to inadvertently bypass via an STE config either)

Cheers,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: Peter Xu @ 2024-04-02 16:26 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Matthew Wilcox, David Hildenbrand, linux-mm, linux-kernel,
	Yang Shi, Kirill A . Shutemov, Mike Kravetz, John Hubbard,
	Michael Ellerman, Andrew Jones, Muchun Song, linux-riscv,
	linuxppc-dev, Christophe Leroy, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Rik van Riel, linux-arm-kernel, Andrea Arcangeli,
	Aneesh Kumar K . V, Vlastimil Babka, James Houghton,
	Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <3f670f03-ee97-4368-94ca-e8d18a1b1a69@arm.com>

On Tue, Apr 02, 2024 at 05:18:36PM +0100, Ryan Roberts wrote:
> On 02/04/2024 17:00, Matthew Wilcox wrote:
> > On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
> >>> The oops trigger is at mm/gup.c:778:
> >>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
> >>>
> >>> So 2M passed ok, and its failing for 32M, which is cont-pmd. I'm guessing you're trying to iterate 2M into a cont-pmd folio and ending up with an unexpected tail page?
> >>
> >> I assume we find the expected tail page, it's just that the check
> >>
> >> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
> >>
> >> Doesn't make sense with hugetlb folios. We might have a tail page mapped in
> >> a cont-pmd entry. As soon as we call follow_huge_pmd() on "not the first
> >> cont-pmd entry", we trigger this check.
> >>
> >> Likely this sanity check must also allow for hugetlb folios. Or we should
> >> just remove it completely.
> >>
> >> In the past, we wanted to make sure that we never get tail pages of THP from
> >> PMD entries, because something would currently be broken (we don't support
> >> THP > PMD).
> > 
> > That was a practical limitation on my part.  We have various parts of
> > the MM which assume that pmd_page() returns a head page and until we
> > get all of those fixed, adding support for folios larger than PMD_SIZE
> > was only going to cause trouble for no significant wins.
> > 
> > I agree with you we should get rid of this assertion entirely.  We should
> > fix all the places which assume that pmd_page() returns a head page,
> > but that may take some time.
> > 
> > As an example, filemap_map_pmd() has:
> > 
> >        if (pmd_none(*vmf->pmd) && folio_test_pmd_mappable(folio)) {
> >                 struct page *page = folio_file_page(folio, start);
> >                 vm_fault_t ret = do_set_pmd(vmf, page);
> > 
> > and then do_set_pmd() has:
> > 
> >         if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
> >                 return ret;
> > 
> > so we'd simply refuse to use a PMD to map a folio larger than PMD_SIZE.
> > There's a lot of work to be done to make this work generally (not to
> > mention figuring out how to handle mapcount for such folios ;-).

Hmm, I think it means there're more work than I was thinking... but that's
okay, let's move one step at a time..

> > 
> > This particular case seems straightforward though.  Just remove the
> > assertion.
> 
> Removing the assertion gets me further, but then I end up with this:
> 
> [    9.748422] kernel BUG at include/linux/page-flags.h:1098!
> [    9.748897] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> [    9.749590] Modules linked in:
> [    9.749867] CPU: 2 PID: 1155 Comm: gup_longterm Not tainted 6.9.0-rc2-00210-g910ff1a347e4-dirty #12
> [    9.750682] Hardware name: linux,dummy-virt (DT)
> [    9.751095] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    9.751729] pc : follow_page_mask+0x730/0x850
> [    9.752152] lr : follow_page_mask+0x730/0x850
> [    9.752573] sp : ffff8000898f3aa0
> [    9.752882] x29: ffff8000898f3aa0 x28: fffffdffc52b91a8 x27: 0000000000000001
> [    9.753543] x26: ffff00014ae46d08 x25: 00003c0005d88000 x24: fffffdffc5d88000
> [    9.754221] x23: ffffc1ffc0000000 x22: 0000000000080101 x21: ffff8000898f3ba8
> [    9.754875] x20: 0000fffff4200000 x19: ffff0001a3d64450 x18: 0000000000000010
> [    9.755567] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
> [    9.756254] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
> [    9.756953] x11: 2026262029656761 x10: ffffaaac08f1d6e0 x9 : ffffaaac0612f090
> [    9.757671] x8 : 00000000ffffefff x7 : ffffaaac08f1d6e0 x6 : 0000000000000000
> [    9.758356] x5 : ffff00017ffb9cc8 x4 : 0000000000000fff x3 : 0000000000000000
> [    9.758983] x2 : 0000000000000000 x1 : ffff000189ecb480 x0 : 0000000000000046
> [    9.759663] Call trace:
> [    9.759901]  follow_page_mask+0x730/0x850
> [    9.760293]  __get_user_pages+0xf4/0x3e8
> [    9.760683]  __gup_longterm_locked+0x204/0xa70
> [    9.761110]  pin_user_pages+0x88/0xc0
> [    9.761486]  gup_test_ioctl+0x860/0xc40
> [    9.761866]  __arm64_sys_ioctl+0xb0/0x100
> [    9.762254]  invoke_syscall+0x50/0x128
> [    9.762630]  el0_svc_common.constprop.0+0x48/0xf8
> [    9.763104]  do_el0_svc+0x28/0x40
> [    9.763413]  el0_svc+0x34/0xe0
> [    9.763699]  el0t_64_sync_handler+0x13c/0x158
> [    9.764139]  el0t_64_sync+0x190/0x198
> [    9.764465] Code: aa1803e0 d000d8e1 911d6021 97fff4c9 (d4210000) 
> [    9.765053] ---[ end trace 0000000000000000 ]---
> [    9.765520] note: gup_longterm[1155] exited with irqs disabled
> [    9.766146] note: gup_longterm[1155] exited with preempt_count 2
> [    9.767366] ------------[ cut here ]------------
> [    9.768062] WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x108/0x120
> [    9.769146] Modules linked in:
> [    9.769429] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D            6.9.0-rc2-00210-g910ff1a347e4-dirty #12
> [    9.770338] Hardware name: linux,dummy-virt (DT)
> [    9.770837] pstate: 204003c5 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    9.771615] pc : ct_kernel_exit.constprop.0+0x108/0x120
> [    9.772150] lr : ct_idle_enter+0x10/0x20
> [    9.772539] sp : ffff8000801b3dc0
> [    9.772913] x29: ffff8000801b3dc0 x28: 0000000000000000 x27: 0000000000000000
> [    9.773769] x26: 0000000000000000 x25: ffff00014149e900 x24: 0000000000000000
> [    9.774526] x23: 0000000000000000 x22: ffffaaac08e99d48 x21: ffffaaac08385730
> [    9.775255] x20: ffffaaac08e99c28 x19: ffff00017ffc8da0 x18: 0000fffff5ffffff
> [    9.775924] x17: 0000000000000000 x16: 1fffe0002a57c9e1 x15: 0000000000000001
> [    9.776619] x14: ffffffffffffffff x13: 0000000000000000 x12: ffffaaac07a06968
> [    9.777246] x11: 000000ae44c42eec x10: 0000000000000ad0 x9 : ffffaaac06189230
> [    9.777942] x8 : ffff00014149f430 x7 : 02c9acb509db422c x6 : 000000001015a9f0
> [    9.778635] x5 : 4000000000000002 x4 : ffff555577c46000 x3 : ffff8000801b3dc0
> [    9.779671] x2 : ffffaaac08382da0 x1 : 4000000000000000 x0 : ffffaaac08382da0
> [    9.780703] Call trace:
> [    9.781150]  ct_kernel_exit.constprop.0+0x108/0x120
> [    9.781949]  ct_idle_enter+0x10/0x20
> [    9.782246]  default_idle_call+0x3c/0x160
> [    9.782624]  do_idle+0x21c/0x280
> [    9.782945]  cpu_startup_entry+0x3c/0x50
> [    9.783268]  secondary_start_kernel+0x140/0x168
> [    9.783818]  __secondary_switched+0xb8/0xc0
> [    9.784163] ---[ end trace 0000000000000000 ]---
> 
> 
> Which is caused by this:
> 
> static __always_inline int PageAnonExclusive(const struct page *page)
> {
> 	VM_BUG_ON_PGFLAGS(!PageAnon(page), page);
> 	VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); <<<<
> 	return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);
> }
> 
> Which is called from can_follow_write_pmd(), called just after the assert I just commented out.
> 
> 
> It's triggered by this test:
> 
> # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (32768 kB)
> 
> Which is the first MAP_PRIVATE test for cont-pmd mapped hugetlb. (All MAP_SHARED tests are passing).
> 
> 
> Looks like can_follow_write_pmd() returns early for VM_SHARED mappings.
> 
> I don't think we only keep the PAE flag in the head page for hugetlb pages? So we can't just remove this assert?
> 
> I tried just commenting it out and get assert further down follow_huge_pmd():
> 
> VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> 			!PageAnonExclusive(page), page);

I just replied in another email; we can try the two patches I attached, or
we can wait until I do some tests (but will be mostly unavailable this
afternoon).

Thanks,

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 1/5] KVM: arm64: Share all userspace hardened thread data with the hypervisor
From: Mark Brown @ 2024-04-02 16:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, James Morse,
	Suzuki K Poulose, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
	linux-kernel, Dave Martin, kvmarm, linux-doc, linux-kselftest
In-Reply-To: <86h6gju87m.wl-maz@kernel.org>

[-- Attachment #1.1: Type: text/plain, Size: 1471 bytes --]

On Tue, Apr 02, 2024 at 03:53:33PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > Sure, those patches are still in flight though.  It does seem reasonable
> > to target the current code.

> Sure, if your intent is for this code not to be merged.

> Because it means this series assumes a different data life cycle, and
> the review effort spent on it will be invalidated once you move to the
> per-CPU state.

I don't have any visibility on when those patches are likely to get
merged or the general practices with in flight serieses here, last time
around with some of the serieses that were in flight it was quite late
which did make it unclear if things would go in during that release
cycle at all.

The amount of churn in KVM recently and long periods where the relevant
patches are apparently pre accepted but for various not always clear
reasons not actually merged is making it quite hard to target, you're
obviously going to be a lot more in the loop so this is doubtless
clearer to you than to me.  It's also been a little unclear what the
expectations are for basing things on - some people do prefer to do
their own merging for example, and while you have mentioned your in
flight serieses your communication style means that it's not been
entirely clear if you're just noting the overlap.  Is it just that
refactoring series you want taking into account here or are there other
in flight serieses that should be rolled into a base?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: Peter Xu @ 2024-04-02 16:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ryan Roberts, linux-mm, linux-kernel, Yang Shi,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Michael Ellerman,
	Andrew Jones, Muchun Song, linux-riscv, linuxppc-dev,
	Christophe Leroy, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, Matthew Wilcox, Rik van Riel, linux-arm-kernel,
	Andrea Arcangeli, Aneesh Kumar K . V, Vlastimil Babka,
	James Houghton, Jason Gunthorpe, Mike Rapoport, Axel Rasmussen
In-Reply-To: <5d9dd9a7-e544-4741-944c-469b79c2c649@redhat.com>

On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
> On 02.04.24 16:48, Ryan Roberts wrote:
> > Hi Peter,

Hey, Ryan,

Thanks for the report!

> > 
> > On 27/03/2024 15:23, peterx@redhat.com wrote:
> > > From: Peter Xu <peterx@redhat.com>
> > > 
> > > Now follow_page() is ready to handle hugetlb pages in whatever form, and
> > > over all architectures.  Switch to the generic code path.
> > > 
> > > Time to retire hugetlb_follow_page_mask(), following the previous
> > > retirement of follow_hugetlb_page() in 4849807114b8.
> > > 
> > > There may be a slight difference of how the loops run when processing slow
> > > GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: each
> > > loop of __get_user_pages() will resolve one pgtable entry with the patch
> > > applied, rather than relying on the size of hugetlb hstate, the latter may
> > > cover multiple entries in one loop.
> > > 
> > > A quick performance test on an aarch64 VM on M1 chip shows 15% degrade over
> > > a tight loop of slow gup after the path switched.  That shouldn't be a
> > > problem because slow-gup should not be a hot path for GUP in general: when
> > > page is commonly present, fast-gup will already succeed, while when the
> > > page is indeed missing and require a follow up page fault, the slow gup
> > > degrade will probably buried in the fault paths anyway.  It also explains
> > > why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup:
> > > accelerate thp gup even for "pages != NULL"") lands, the latter not part of
> > > a performance analysis but a side benefit.  If the performance will be a
> > > concern, we can consider handle CONT_PTE in follow_page().
> > > 
> > > Before that is justified to be necessary, keep everything clean and simple.
> > > 
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > Afraid I'm seeing an oops when running gup_longterm test on arm64 with current mm-unstable. Git bisect blames this patch. The oops reproduces for me every time on 2 different machines:
> > 
> > 
> > [    9.340416] kernel BUG at mm/gup.c:778!
> > [    9.340746] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > [    9.341199] Modules linked in:
> > [    9.341481] CPU: 1 PID: 1159 Comm: gup_longterm Not tainted 6.9.0-rc2-00210-g910ff1a347e4 #11
> > [    9.342232] Hardware name: linux,dummy-virt (DT)
> > [    9.342647] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    9.343195] pc : follow_page_mask+0x4d4/0x880
> > [    9.343580] lr : follow_page_mask+0x4d4/0x880
> > [    9.344018] sp : ffff8000898b3aa0
> > [    9.344345] x29: ffff8000898b3aa0 x28: fffffdffc53973e8 x27: 00003c0005d08000
> > [    9.345028] x26: ffff00014e5cfd08 x25: ffffd3513a40c000 x24: fffffdffc5d08000
> > [    9.345682] x23: ffffc1ffc0000000 x22: 0000000000080101 x21: ffff8000898b3ba8
> > [    9.346337] x20: 0000fffff4200000 x19: ffff00014e52d508 x18: 0000000000000010
> > [    9.347005] x17: 5f656e6f7a5f7369 x16: 2120262620296567 x15: 6170286461654865
> > [    9.347713] x14: 6761502128454741 x13: 2929656761702865 x12: 6761705f65636976
> > [    9.348371] x11: 65645f656e6f7a5f x10: ffffd3513b31d6e0 x9 : ffffd3513852f090
> > [    9.349062] x8 : 00000000ffffefff x7 : ffffd3513b31d6e0 x6 : 0000000000000000
> > [    9.349753] x5 : ffff00017ff98cc8 x4 : 0000000000000fff x3 : 0000000000000000
> > [    9.350397] x2 : 0000000000000000 x1 : ffff000190e8b480 x0 : 0000000000000052
> > [    9.351097] Call trace:
> > [    9.351312]  follow_page_mask+0x4d4/0x880
> > [    9.351700]  __get_user_pages+0xf4/0x3e8
> > [    9.352089]  __gup_longterm_locked+0x204/0xa70
> > [    9.352516]  pin_user_pages+0x88/0xc0
> > [    9.352873]  gup_test_ioctl+0x860/0xc40
> > [    9.353249]  __arm64_sys_ioctl+0xb0/0x100
> > [    9.353648]  invoke_syscall+0x50/0x128
> > [    9.354022]  el0_svc_common.constprop.0+0x48/0xf8
> > [    9.354488]  do_el0_svc+0x28/0x40
> > [    9.354822]  el0_svc+0x34/0xe0
> > [    9.355128]  el0t_64_sync_handler+0x13c/0x158
> > [    9.355489]  el0t_64_sync+0x190/0x198
> > [    9.355793] Code: aa1803e0 d000d8e1 91220021 97fff560 (d4210000)
> > [    9.356280] ---[ end trace 0000000000000000 ]---
> > [    9.356651] note: gup_longterm[1159] exited with irqs disabled
> > [    9.357141] note: gup_longterm[1159] exited with preempt_count 2
> > [    9.358033] ------------[ cut here ]------------
> > [    9.358800] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x108/0x120
> > [    9.360157] Modules linked in:
> > [    9.360541] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.9.0-rc2-00210-g910ff1a347e4 #11
> > [    9.361626] Hardware name: linux,dummy-virt (DT)
> > [    9.362087] pstate: 204003c5 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    9.362758] pc : ct_kernel_exit.constprop.0+0x108/0x120
> > [    9.363306] lr : ct_idle_enter+0x10/0x20
> > [    9.363845] sp : ffff8000801abdc0
> > [    9.364222] x29: ffff8000801abdc0 x28: 0000000000000000 x27: 0000000000000000
> > [    9.364961] x26: 0000000000000000 x25: ffff00014149d780 x24: 0000000000000000
> > [    9.365557] x23: 0000000000000000 x22: ffffd3513b299d48 x21: ffffd3513a785730
> > [    9.366239] x20: ffffd3513b299c28 x19: ffff00017ffa7da0 x18: 0000fffff5ffffff
> > [    9.366869] x17: 0000000000000000 x16: 1fffe0002a21a8c1 x15: 0000000000000000
> > [    9.367524] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
> > [    9.368207] x11: 0000000000000001 x10: 0000000000000ad0 x9 : ffffd35138589230
> > [    9.369123] x8 : ffff00014149e2b0 x7 : 0000000000000000 x6 : 000000000f8c0fb2
> > [    9.370403] x5 : 4000000000000002 x4 : ffff2cb045825000 x3 : ffff8000801abdc0
> > [    9.371170] x2 : ffffd3513a782da0 x1 : 4000000000000000 x0 : ffffd3513a782da0
> > [    9.372279] Call trace:
> > [    9.372519]  ct_kernel_exit.constprop.0+0x108/0x120
> > [    9.373216]  ct_idle_enter+0x10/0x20
> > [    9.373562]  default_idle_call+0x3c/0x160
> > [    9.374055]  do_idle+0x21c/0x280
> > [    9.374394]  cpu_startup_entry+0x3c/0x50
> > [    9.374797]  secondary_start_kernel+0x140/0x168
> > [    9.375220]  __secondary_switched+0xb8/0xc0
> > [    9.375875] ---[ end trace 0000000000000000 ]---
> > 
> > 
> > The oops trigger is at mm/gup.c:778:
> > VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
> > 
> > 
> > This is the output of gup_longterm (last output is just before oops):
> > 
> > # [INFO] detected hugetlb page size: 2048 KiB
> > # [INFO] detected hugetlb page size: 32768 KiB
> > # [INFO] detected hugetlb page size: 64 KiB
> > # [INFO] detected hugetlb page size: 1048576 KiB
> > TAP version 13
> > 1..70
> > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd
> > ok 1 Should have worked
> > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile
> > ok 2 Should have failed
> > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile
> > ok 3 Should have failed
> > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB)
> > ok 4 Should have worked
> > # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (32768 kB)
> > 
> > 
> > So 2M passed ok, and its failing for 32M, which is cont-pmd. I'm guessing you're trying to iterate 2M into a cont-pmd folio and ending up with an unexpected tail page?
> 
> I assume we find the expected tail page, it's just that the check
> 
> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
> 
> Doesn't make sense with hugetlb folios. We might have a tail page mapped in
> a cont-pmd entry. As soon as we call follow_huge_pmd() on "not the first
> cont-pmd entry", we trigger this check.
> 
> Likely this sanity check must also allow for hugetlb folios. Or we should
> just remove it completely.

Right, IMHO it'll be easier we remove it, actually I see there's one more
at the end, so I think we need to remove both.

> 
> In the past, we wanted to make sure that we never get tail pages of THP from
> PMD entries, because something would currently be broken (we don't support
> THP > PMD).

There's probably one more thing we need to do, on allowing
PageAnonExclusive() to work with hugetlb tails. Even if we remove the
warnings and if I read the code right, we can BUG_ON again on checking tail
pages over anon-exclusive for PageHuge.

So I assume to fix it completely, we may need two changes: Patch 1 to
prepare PageAnonExclusive() to work on hugetlb tails, then patch 2 to be
squashed into the patch "mm/gup: handle huge pmd for follow_pmd_mask()".
Note: not this patch to fixup, as this patch only does the "switchover" to
the new path, the culprit should be the other patch..

I have them attached below first, before I'll also go and see whether I can
run some arm tests later today or tomorrow.  David, any comments from
anon-exclusive side?

Thanks,

===8<===

From 26f0670acea948945222c97a9cab58428782ca69 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 2 Apr 2024 11:52:28 -0400
Subject: [PATCH 1/2] mm: Allow anon exclusive check over hugetlb tail pages

PageAnonExclusive() used to forbid tail pages for hugetlbfs, as that used
to be called mostly in hugetlb specific paths and the head page was
guaranteed.

As we move forward towards merging hugetlb paths into generic mm, we may
start to pass in tail hugetlb pages (when with cont-pte/cont-pmd huge
pages) for such check.  Allow it to properly fetch the head, in which case
the anon-exclusiveness of the head will always represents the tail page.

There's already a sign of it when we look at the fast-gup which already
contain the hugetlb processing altogether: we used to have a specific
commit 5805192c7b72 ("mm/gup: handle cont-PTE hugetlb pages correctly in
gup_must_unshare() via GUP-fast") covering that area.  Now with this more
generic change, that can also go away.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/page-flags.h |  8 +++++++-
 mm/internal.h              | 10 ----------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 888353c209c0..225357f48a79 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1095,7 +1095,13 @@ PAGEFLAG(Isolated, isolated, PF_ANY);
 static __always_inline int PageAnonExclusive(const struct page *page)
 {
 	VM_BUG_ON_PGFLAGS(!PageAnon(page), page);
-	VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
+	/*
+	 * Allow the anon-exclusive check to work on hugetlb tail pages.
+	 * Here hugetlb pages will always guarantee the anon-exclusiveness
+	 * of the head page represents the tail pages.
+	 */
+	if (PageHuge(page) && !PageHead(page))
+		page = compound_head(page);
 	return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 9512de7398d5..87f6e4fd56a5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1259,16 +1259,6 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
 	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
 		smp_rmb();
 
-	/*
-	 * During GUP-fast we might not get called on the head page for a
-	 * hugetlb page that is mapped using cont-PTE, because GUP-fast does
-	 * not work with the abstracted hugetlb PTEs that always point at the
-	 * head page. For hugetlb, PageAnonExclusive only applies on the head
-	 * page (as it cannot be partially COW-shared), so lookup the head page.
-	 */
-	if (unlikely(!PageHead(page) && PageHuge(page)))
-		page = compound_head(page);
-
 	/*
 	 * Note that PageKsm() pages cannot be exclusive, and consequently,
 	 * cannot get pinned.
-- 
2.44.0

From 2411d1c6cc95a937c8b7c74e13a206c22c034fab Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 2 Apr 2024 12:04:35 -0400
Subject: [PATCH 2/2] fixup! mm/gup: handle huge pmd for follow_pmd_mask()

Allow follow_pmd_mask() to take hugetlb tail pages.  The old warnings do
not help now as hugetlb now allows it to happen, so drop them.

Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 91d70057aea0..d60b63fcfc82 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -775,8 +775,6 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
 	assert_spin_locked(pmd_lockptr(mm, pmd));
 
 	page = pmd_page(pmdval);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
-
 	if ((flags & FOLL_WRITE) &&
 	    !can_follow_write_pmd(pmdval, page, vma, flags))
 		return NULL;
@@ -805,7 +803,6 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
 
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	ctx->page_mask = HPAGE_PMD_NR - 1;
-	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
 
 	return page;
 }
-- 
2.44.0

-- 
Peter Xu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* Re: [PATCH] arm64: dts: ti: k3-j722s: Disable ethernet ports by default
From: Nishanth Menon @ 2024-04-02 16:19 UTC (permalink / raw)
  To: Michael Walle
  Cc: Vignesh Raghavendra, Tero Kristo, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, linux-arm-kernel, devicetree,
	linux-kernel
In-Reply-To: <D09RSRPKV9L9.2TS01DA84TEKK@kernel.org>

On 18:16-20240402, Michael Walle wrote:
> Hi,
> 
> > This is better at arch/arm64/boot/dts/ti/k3-am62p-main.dtsi.
> 
> I'm fine with that, but be aware that we'll also change the am62p
> SoC dtsi in that case.

This change is valid to me (sigh.. it's been a whack-a-mole as you can expect).
62p and j722s/am67 are too closely related anyways.. but, this will need
to percolate consistently to all k3 SoCs.. (different problem).

-- 
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3  1A34 DDB5 849D 1736 249D

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v1 6/6] clk: meson: a1: add Amlogic A1 CPU clock controller driver
From: Jerome Brunet @ 2024-04-02 16:17 UTC (permalink / raw)
  To: Dmitry Rokosov
  Cc: Jerome Brunet, neil.armstrong, mturquette, sboyd, robh+dt,
	krzysztof.kozlowski+dt, khilman, martin.blumenstingl, kernel,
	rockosov, linux-amlogic, linux-clk, devicetree, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20240402161110.v7t72s6t7m3bzifi@CAB-WSD-L081021>


On Tue 02 Apr 2024 at 19:11, Dmitry Rokosov <ddrokosov@salutedevices.com> wrote:

> On Tue, Apr 02, 2024 at 04:11:12PM +0200, Jerome Brunet wrote:
>> 
>> On Tue 02 Apr 2024 at 14:05, Dmitry Rokosov <ddrokosov@salutedevices.com> wrote:
>> 
>> > Hello Jerome,
>> >
>> > On Tue, Apr 02, 2024 at 11:35:49AM +0200, Jerome Brunet wrote:
>> >> 
>> >> On Fri 29 Mar 2024 at 23:58, Dmitry Rokosov <ddrokosov@salutedevices.com> wrote:
>> >> 
>> >> > The CPU clock controller plays a general role in the Amlogic A1 SoC
>> >> > family by generating CPU clocks. As an APB slave module, it offers the
>> >> > capability to inherit the CPU clock from two sources: the internal fixed
>> >> > clock known as 'cpu fixed clock' and the external input provided by the
>> >> > A1 PLL clock controller, referred to as 'syspll'.
>> >> >
>> >> > It is important for the driver to handle cpu_clk rate switching
>> >> > effectively by transitioning to the CPU fixed clock to avoid any
>> >> > potential execution freezes.
>> >> >
>> >> > Signed-off-by: Dmitry Rokosov <ddrokosov@salutedevices.com>
>> >> > ---
>> >> >  drivers/clk/meson/Kconfig  |  10 ++
>> >> >  drivers/clk/meson/Makefile |   1 +
>> >> >  drivers/clk/meson/a1-cpu.c | 324 +++++++++++++++++++++++++++++++++++++
>> >> >  drivers/clk/meson/a1-cpu.h |  16 ++
>> >> >  4 files changed, 351 insertions(+)
>> >> >  create mode 100644 drivers/clk/meson/a1-cpu.c
>> >> >  create mode 100644 drivers/clk/meson/a1-cpu.h
>> >> >
>> >> > diff --git a/drivers/clk/meson/Kconfig b/drivers/clk/meson/Kconfig
>> >> > index 80c4a18c83d2..148d4495eee3 100644
>> >> > --- a/drivers/clk/meson/Kconfig
>> >> > +++ b/drivers/clk/meson/Kconfig
>> >> > @@ -111,6 +111,16 @@ config COMMON_CLK_AXG_AUDIO
>> >> >  	  Support for the audio clock controller on AmLogic A113D devices,
>> >> >  	  aka axg, Say Y if you want audio subsystem to work.
>> >> >  
>> >> > +config COMMON_CLK_A1_CPU
>> >> > +	tristate "Amlogic A1 SoC CPU controller support"
>> >> > +	depends on ARM64
>> >> > +	select COMMON_CLK_MESON_REGMAP
>> >> > +	select COMMON_CLK_MESON_CLKC_UTILS
>> >> > +	help
>> >> > +	  Support for the CPU clock controller on Amlogic A113L based
>> >> > +	  device, A1 SoC Family. Say Y if you want A1 CPU clock controller
>> >> > +	  to work.
>> >> > +
>> >> >  config COMMON_CLK_A1_PLL
>> >> >  	tristate "Amlogic A1 SoC PLL controller support"
>> >> >  	depends on ARM64
>> >> > diff --git a/drivers/clk/meson/Makefile b/drivers/clk/meson/Makefile
>> >> > index 4968fc7ad555..2a06eb0303d6 100644
>> >> > --- a/drivers/clk/meson/Makefile
>> >> > +++ b/drivers/clk/meson/Makefile
>> >> > @@ -18,6 +18,7 @@ obj-$(CONFIG_COMMON_CLK_MESON_AUDIO_RSTC) += meson-audio-rstc.o
>> >> >  
>> >> >  obj-$(CONFIG_COMMON_CLK_AXG) += axg.o axg-aoclk.o
>> >> >  obj-$(CONFIG_COMMON_CLK_AXG_AUDIO) += axg-audio.o
>> >> > +obj-$(CONFIG_COMMON_CLK_A1_CPU) += a1-cpu.o
>> >> >  obj-$(CONFIG_COMMON_CLK_A1_PLL) += a1-pll.o
>> >> >  obj-$(CONFIG_COMMON_CLK_A1_PERIPHERALS) += a1-peripherals.o
>> >> >  obj-$(CONFIG_COMMON_CLK_A1_AUDIO) += a1-audio.o
>> >> > diff --git a/drivers/clk/meson/a1-cpu.c b/drivers/clk/meson/a1-cpu.c
>> >> > new file mode 100644
>> >> > index 000000000000..5f5d8ae112e5
>> >> > --- /dev/null
>> >> > +++ b/drivers/clk/meson/a1-cpu.c
>> >> > @@ -0,0 +1,324 @@
>> >> > +// SPDX-License-Identifier: GPL-2.0+
>> >> > +/*
>> >> > + * Amlogic A1 SoC family CPU Clock Controller driver.
>> >> > + *
>> >> > + * Copyright (c) 2024, SaluteDevices. All Rights Reserved.
>> >> > + * Author: Dmitry Rokosov <ddrokosov@salutedevices.com>
>> >> > + */
>> >> > +
>> >> > +#include <linux/clk.h>
>> >> > +#include <linux/clk-provider.h>
>> >> > +#include <linux/mod_devicetable.h>
>> >> > +#include <linux/platform_device.h>
>> >> > +#include "a1-cpu.h"
>> >> > +#include "clk-regmap.h"
>> >> > +#include "meson-clkc-utils.h"
>> >> > +
>> >> > +#include <dt-bindings/clock/amlogic,a1-cpu-clkc.h>
>> >> > +
>> >> > +static u32 cpu_fsource_sel_table[] = { 0, 1, 2 };
>> >> > +static const struct clk_parent_data cpu_fsource_sel_parents[] = {
>> >> > +	{ .fw_name = "xtal" },
>> >> > +	{ .fw_name = "fclk_div2" },
>> >> > +	{ .fw_name = "fclk_div3" },
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fsource_sel0 = {
>> >> > +	.data = &(struct clk_regmap_mux_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.mask = 0x3,
>> >> > +		.shift = 0,
>> >> > +		.table = cpu_fsource_sel_table,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fsource_sel0",
>> >> > +		.ops = &clk_regmap_mux_ops,
>> >> > +		.parent_data = cpu_fsource_sel_parents,
>> >> > +		.num_parents = ARRAY_SIZE(cpu_fsource_sel_parents),
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fsource_div0 = {
>> >> > +	.data = &(struct clk_regmap_div_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.shift = 4,
>> >> > +		.width = 6,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fsource_div0",
>> >> > +		.ops = &clk_regmap_divider_ops,
>> >> > +		.parent_hws = (const struct clk_hw *[]) {
>> >> > +			&cpu_fsource_sel0.hw
>> >> > +		},
>> >> > +		.num_parents = 1,
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fsel0 = {
>> >> > +	.data = &(struct clk_regmap_mux_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.mask = 0x1,
>> >> > +		.shift = 2,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fsel0",
>> >> > +		.ops = &clk_regmap_mux_ops,
>> >> > +		.parent_hws = (const struct clk_hw *[]) {
>> >> > +			&cpu_fsource_sel0.hw,
>> >> > +			&cpu_fsource_div0.hw,
>> >> > +		},
>> >> > +		.num_parents = 2,
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fsource_sel1 = {
>> >> > +	.data = &(struct clk_regmap_mux_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.mask = 0x3,
>> >> > +		.shift = 16,
>> >> > +		.table = cpu_fsource_sel_table,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fsource_sel1",
>> >> > +		.ops = &clk_regmap_mux_ops,
>> >> > +		.parent_data = cpu_fsource_sel_parents,
>> >> > +		.num_parents = ARRAY_SIZE(cpu_fsource_sel_parents),
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fsource_div1 = {
>> >> > +	.data = &(struct clk_regmap_div_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.shift = 20,
>> >> > +		.width = 6,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fsource_div1",
>> >> > +		.ops = &clk_regmap_divider_ops,
>> >> > +		.parent_hws = (const struct clk_hw *[]) {
>> >> > +			&cpu_fsource_sel1.hw
>> >> > +		},
>> >> > +		.num_parents = 1,
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fsel1 = {
>> >> > +	.data = &(struct clk_regmap_mux_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.mask = 0x1,
>> >> > +		.shift = 18,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fsel1",
>> >> > +		.ops = &clk_regmap_mux_ops,
>> >> > +		.parent_hws = (const struct clk_hw *[]) {
>> >> > +			&cpu_fsource_sel1.hw,
>> >> > +			&cpu_fsource_div1.hw,
>> >> > +		},
>> >> > +		.num_parents = 2,
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_fclk = {
>> >> > +	.data = &(struct clk_regmap_mux_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.mask = 0x1,
>> >> > +		.shift = 10,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_fclk",
>> >> > +		.ops = &clk_regmap_mux_ops,
>> >> > +		.parent_hws = (const struct clk_hw *[]) {
>> >> > +			&cpu_fsel0.hw,
>> >> > +			&cpu_fsel1.hw,
>> >> > +		},
>> >> > +		.num_parents = 2,
>> >> > +		.flags = CLK_SET_RATE_PARENT,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap cpu_clk = {
>> >> > +	.data = &(struct clk_regmap_mux_data) {
>> >> > +		.offset = CPUCTRL_CLK_CTRL0,
>> >> > +		.mask = 0x1,
>> >> > +		.shift = 11,
>> >> > +	},
>> >> > +	.hw.init = &(struct clk_init_data) {
>> >> > +		.name = "cpu_clk",
>> >> > +		.ops = &clk_regmap_mux_ops,
>> >> > +		.parent_data = (const struct clk_parent_data []) {
>> >> > +			{ .hw = &cpu_fclk.hw },
>> >> > +			{ .fw_name = "sys_pll", },
>> >> > +		},
>> >> > +		.num_parents = 2,
>> >> > +		.flags = CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +/* Array of all clocks registered by this provider */
>> >> > +static struct clk_hw *a1_cpu_hw_clks[] = {
>> >> > +	[CLKID_CPU_FSOURCE_SEL0]	= &cpu_fsource_sel0.hw,
>> >> > +	[CLKID_CPU_FSOURCE_DIV0]	= &cpu_fsource_div0.hw,
>> >> > +	[CLKID_CPU_FSEL0]		= &cpu_fsel0.hw,
>> >> > +	[CLKID_CPU_FSOURCE_SEL1]	= &cpu_fsource_sel1.hw,
>> >> > +	[CLKID_CPU_FSOURCE_DIV1]	= &cpu_fsource_div1.hw,
>> >> > +	[CLKID_CPU_FSEL1]		= &cpu_fsel1.hw,
>> >> > +	[CLKID_CPU_FCLK]		= &cpu_fclk.hw,
>> >> > +	[CLKID_CPU_CLK]			= &cpu_clk.hw,
>> >> > +};
>> >> > +
>> >> > +static struct clk_regmap *const a1_cpu_regmaps[] = {
>> >> > +	&cpu_fsource_sel0,
>> >> > +	&cpu_fsource_div0,
>> >> > +	&cpu_fsel0,
>> >> > +	&cpu_fsource_sel1,
>> >> > +	&cpu_fsource_div1,
>> >> > +	&cpu_fsel1,
>> >> > +	&cpu_fclk,
>> >> > +	&cpu_clk,
>> >> > +};
>> >> > +
>> >> > +static struct regmap_config a1_cpu_regmap_cfg = {
>> >> > +	.reg_bits   = 32,
>> >> > +	.val_bits   = 32,
>> >> > +	.reg_stride = 4,
>> >> > +	.max_register = CPUCTRL_CLK_CTRL1,
>> >> > +};
>> >> > +
>> >> > +static struct meson_clk_hw_data a1_cpu_clks = {
>> >> > +	.hws = a1_cpu_hw_clks,
>> >> > +	.num = ARRAY_SIZE(a1_cpu_hw_clks),
>> >> > +};
>> >> > +
>> >> > +struct a1_cpu_clk_nb_data {
>> >> > +	const struct clk_ops *mux_ops;
>> >> 
>> >> That's fishy ...
>> >> 
>> >> > +	struct clk_hw *cpu_clk;
>> >> > +	struct notifier_block nb;
>> >> > +	u8 parent;
>> >> > +};
>> >> > +
>> >> > +#define MESON_A1_CPU_CLK_GET_PARENT(nbd) \
>> >> > +	((nbd)->mux_ops->get_parent((nbd)->cpu_clk))
>> >> > +#define MESON_A1_CPU_CLK_SET_PARENT(nbd, index) \
>> >> > +	((nbd)->mux_ops->set_parent((nbd)->cpu_clk, index))
>> >> 
>> >> ... Directly going for the mux ops ??!?? No way !
>> >> 
>> >> We have a framework to handle the clocks, the whole point is to use it,
>> >> not bypass it ! 
>> >> 
>> >
>> > I suppose you understand my approach, which is quite similar to what is
>> > happening in the Mediatek driver:
>> >
>> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/mediatek/clk-mux.c#L295
>> >
>> > Initially, I attempted to set the parent using the clk_set_parent() API.
>> > However, I encountered a problem with recursive calling of the
>> > notifier_block. This issue arises because the parent triggers
>> > notifications for its children, leading to repeated calls to the
>> > notifier_block.
>> >
>> > I find it puzzling why I cannot call an internal function or callback
>> > within the internal driver context. After all, the notifier block is
>> > just a part of the set_rate() flow. From a global Clock Control
>> > Framework perspective, the context should not change.
>> 
>> I don't care what MTK is doing TBH. Forcefully calling ops on a clock,
>> hoping they are going to match with the clock type is wrong.
>> 
>> There is a clk_hw API. Please it.
>> 
>
> Yes, if sys_pll has a notifier_block instead of cpu_clk, there will be
> no problem with the clk_hw API.
>
> I will rework that point.
>
>> >
>> >> > +
>> >> > +static int meson_a1_cpu_clk_notifier_cb(struct notifier_block *nb,
>> >> > +					unsigned long event, void *data)
>> >> > +{
>> >> > +	struct a1_cpu_clk_nb_data *nbd;
>> >> > +	int ret = 0;
>> >> > +
>> >> > +	nbd = container_of(nb, struct a1_cpu_clk_nb_data, nb);
>> >> > +
>> >> > +	switch (event) {
>> >> > +	case PRE_RATE_CHANGE:
>> >> > +		nbd->parent = MESON_A1_CPU_CLK_GET_PARENT(nbd);
>> >> > +		/* Fallback to the CPU fixed clock */
>> >> > +		ret = MESON_A1_CPU_CLK_SET_PARENT(nbd, 0);
>> >> > +		/* Wait for clock propagation */
>> >> > +		udelay(100);
>> >> > +		break;
>> >> > +
>> >> > +	case POST_RATE_CHANGE:
>> >> > +	case ABORT_RATE_CHANGE:
>> >> > +		/* Back to the original parent clock */
>> >> > +		ret = MESON_A1_CPU_CLK_SET_PARENT(nbd, nbd->parent);
>> >> > +		/* Wait for clock propagation */
>> >> > +		udelay(100);
>> >> > +		break;
>> >> > +
>> >> > +	default:
>> >> > +		pr_warn("Unknown event %lu for %s notifier block\n",
>> >> > +			event, clk_hw_get_name(nbd->cpu_clk));
>> >> > +		break;
>> >> > +	}
>> >> > +
>> >> > +	return notifier_from_errno(ret);
>> >> > +}
>> >> > +
>> >> > +static struct a1_cpu_clk_nb_data a1_cpu_clk_nb_data = {
>> >> > +	.mux_ops = &clk_regmap_mux_ops,
>> >> > +	.cpu_clk = &cpu_clk.hw,
>> >> > +	.nb.notifier_call = meson_a1_cpu_clk_notifier_cb,
>> >> > +};
>> >> > +
>> >> > +static int meson_a1_dvfs_setup(struct platform_device *pdev)
>> >> > +{
>> >> > +	struct device *dev = &pdev->dev;
>> >> > +	struct clk *notifier_clk;
>> >> > +	int ret;
>> >> > +
>> >> > +	/* Setup clock notifier for cpu_clk */
>> >> > +	notifier_clk = devm_clk_hw_get_clk(dev, &cpu_clk.hw, "dvfs");
>> >> > +	if (IS_ERR(notifier_clk))
>> >> > +		return dev_err_probe(dev, PTR_ERR(notifier_clk),
>> >> > +				     "can't get cpu_clk as notifier clock\n");
>> >> > +
>> >> > +	ret = devm_clk_notifier_register(dev, notifier_clk,
>> >> > +					 &a1_cpu_clk_nb_data.nb);
>> >> > +	if (ret)
>> >> > +		return dev_err_probe(dev, ret,
>> >> > +				     "can't register cpu_clk notifier\n");
>> >> > +
>> >> > +	return ret;
>> >> > +}
>> >> > +
>> >> > +static int meson_a1_cpu_probe(struct platform_device *pdev)
>> >> > +{
>> >> > +	struct device *dev = &pdev->dev;
>> >> > +	void __iomem *base;
>> >> > +	struct regmap *map;
>> >> > +	int clkid, i, err;
>> >> > +
>> >> > +	base = devm_platform_ioremap_resource(pdev, 0);
>> >> > +	if (IS_ERR(base))
>> >> > +		return dev_err_probe(dev, PTR_ERR(base),
>> >> > +				     "can't ioremap resource\n");
>> >> > +
>> >> > +	map = devm_regmap_init_mmio(dev, base, &a1_cpu_regmap_cfg);
>> >> > +	if (IS_ERR(map))
>> >> > +		return dev_err_probe(dev, PTR_ERR(map),
>> >> > +				     "can't init regmap mmio region\n");
>> >> > +
>> >> > +	/* Populate regmap for the regmap backed clocks */
>> >> > +	for (i = 0; i < ARRAY_SIZE(a1_cpu_regmaps); i++)
>> >> > +		a1_cpu_regmaps[i]->map = map;
>> >> > +
>> >> > +	for (clkid = 0; clkid < a1_cpu_clks.num; clkid++) {
>> >> > +		err = devm_clk_hw_register(dev, a1_cpu_clks.hws[clkid]);
>> >> > +		if (err)
>> >> > +			return dev_err_probe(dev, err,
>> >> > +					     "clock[%d] registration failed\n",
>> >> > +					     clkid);
>> >> > +	}
>> >> > +
>> >> > +	err = devm_of_clk_add_hw_provider(dev, meson_clk_hw_get, &a1_cpu_clks);
>> >> > +	if (err)
>> >> > +		return dev_err_probe(dev, err, "can't add clk hw provider\n");
>> >> 
>> >> I wonder if there is a window of opportunity to poke the syspll without
>> >> your notifier here. That being said, the situation would be similar on g12.
>> >> 
>> >
>> > Yes, I have taken into account what you did in the G12A CPU clock
>> > relations. My thoughts were that it might not be applicable for the A1
>> > case. This is because the sys_pll should be located in a different
>> > driver from a logical perspective. Consequently, we cannot configure the
>> > sys_pll notifier block to manage the cpu_clk from a different driver.
>> > However, if I were to move the sys_pll clock object to the A1 CPU clock
>> > controller, I believe the g12a sys_pll notifier approach would work.
>> >
>> 
>> I fail to see the connection between the number of device and the
>> approach you took.
>> 
>> 
>> >> > +
>> >> > +	return meson_a1_dvfs_setup(pdev);
>> >> 
>> >> 
>> >> 
>> >> > +}
>> >> > +
>> >> > +static const struct of_device_id a1_cpu_clkc_match_table[] = {
>> >> > +	{ .compatible = "amlogic,a1-cpu-clkc", },
>> >> > +	{}
>> >> > +};
>> >> > +MODULE_DEVICE_TABLE(of, a1_cpu_clkc_match_table);
>> >> > +
>> >> > +static struct platform_driver a1_cpu_clkc_driver = {
>> >> > +	.probe = meson_a1_cpu_probe,
>> >> > +	.driver = {
>> >> > +		.name = "a1-cpu-clkc",
>> >> > +		.of_match_table = a1_cpu_clkc_match_table,
>> >> > +	},
>> >> > +};
>> >> > +
>> >> > +module_platform_driver(a1_cpu_clkc_driver);
>> >> > +MODULE_AUTHOR("Dmitry Rokosov <ddrokosov@salutedevices.com>");
>> >> > +MODULE_LICENSE("GPL");
>> >> > diff --git a/drivers/clk/meson/a1-cpu.h b/drivers/clk/meson/a1-cpu.h
>> >> > new file mode 100644
>> >> > index 000000000000..e9af4117e26f
>> >> > --- /dev/null
>> >> > +++ b/drivers/clk/meson/a1-cpu.h
>> >> 
>> >> There is not point putting the definition here in a header
>> >> These are clearly not going to be shared with another driver.
>> >> 
>> >> Please drop this file
>> >> 
>> >
>> > The same approach was applied to the Peripherals and PLL A1 drivers.
>> > Honestly, I am not a fan of having different file organization within a
>> > single logical code folder.
>> >
>> > Please refer to:
>> >
>> > drivers/clk/meson/a1-peripherals.h
>> > drivers/clk/meson/a1-pll.h
>> 
>> I understand. There was a time where it made sense, some definition
>> could have been shared back then. This is no longer the case and it
>> would probably welcome a rework.
>> 
>> ... and again, just pointing to another code does really invalidate my
>>  point.
>> 
>> >
>> >> > @@ -0,0 +1,16 @@
>> >> > +/* SPDX-License-Identifier: GPL-2.0+ */
>> >> > +/*
>> >> > + * Amlogic A1 CPU Clock Controller internals
>> >> > + *
>> >> > + * Copyright (c) 2024, SaluteDevices. All Rights Reserved.
>> >> > + * Author: Dmitry Rokosov <ddrokosov@salutedevices.com>
>> >> > + */
>> >> > +
>> >> > +#ifndef __A1_CPU_H
>> >> > +#define __A1_CPU_H
>> >> > +
>> >> > +/* cpu clock controller register offset */
>> >> > +#define CPUCTRL_CLK_CTRL0	0x80
>> >> > +#define CPUCTRL_CLK_CTRL1	0x84
>> >> 
>> >> You are claiming the registers from 0x00 to 0x84 (included), but only
>> >> using these 2 registers ? What is the rest ? Are you sure there is only
>> >> clocks in there ?
>> >> 
>> >
>> > Yes, unfortunately, the register map for this IP is not described in the
>> > A1 Datasheet. The only available information about it can be found in
>> > the vendor clock driver, which provides details for only two registers
>> > used to configure the CPU clock.
>> >
>> > From vendor kernel dtsi:
>> >
>> > 	clkc: clock-controller {
>> > 		compatible = "amlogic,a1-clkc";
>> > 		#clock-cells = <1>;
>> > 		reg = <0x0 0xfe000800 0x0 0x100>,
>> > 		      <0x0 0xfe007c00 0x0 0x21c>,
>> > 		      <0x0 0xfd000000 0x0 0x88>; <==== CPU clock regmap
>> > 		reg-names = "basic", "pll",
>> > 			    "cpu_clk";
>> > 		clocks = <&xtal>;
>> > 		clock-names = "core";
>> > 		status = "okay";
>> > 	};
>> >
>> > From vendor clkc driver:
>> >
>> > 	/*
>> > 	 * CPU clok register offset
>> > 	 * APB_BASE:  APB1_BASE_ADDR = 0xfd000000
>> > 	 */
>> >
>> > 	#define CPUCTRL_CLK_CTRL0		0x80
>> > 	#define CPUCTRL_CLK_CTRL1		0x84
>> 
>> If you had clock in 0x0 and 0x80, then I would agree claiming 0x0-0x88
>> is reasonable, even if you don't really know what is in between. It
>> would be a fair assumption.
>> 
>> It is not the case here.
>> For all we know it could resets, power domains, etc ...
>> 
>
> However, we do not have any information indicating that the gap from
> 0x00-0x80 contains additional registers. It is possible that there are
> internal registers, but they are not mentioned in the vendor datasheet
> or driver. Therefore, in this case, I personally prefer to rely on the
> vendor mapping.

I understand your preference. My request remains.

-- 
Jerome

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox