* [PATCH] Revert: "p54: Use skb_peek_tail() instead of direct head pointer accesses"
From: Matthew Whitehead @ 2019-02-11 20:20 UTC (permalink / raw)
To: whiteheadm, netdev, davem; +Cc: Matthew Whitehead
Commit e3554197fc8fbb9656f62c18f9c9edd396394e16 causes a null pointer error.
kernel: p54pci 0000:07:00.0: enabling device (0000 -> 0002)
kernel: ieee80211 phy1: p54 detected a LM86 firmware
kernel: p54: rx_mtu reduced from 3240 to 2376
kernel: ieee80211 phy1: FW rev 2.13.1.0 - Softmac protocol 5.5
kernel: ieee80211 phy1: cryptographic accelerator WEP:YES, TKIP:YES, CCMP:YES
kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000
kernel: *pde = 00000000
kernel: Oops: 0000 [#1] PREEMPT SMP
kernel: CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 4.19.0.bisect-14.#871
kernel: Hardware name: IBM 2378RVU/2378RVU, BIOS 1RETDKWW (3.16 ) 04/19/2005
kernel: Workqueue: events request_firmware_work_func
kernel: EIP: p54_tx_pending+0xff/0x128 [p54common]
kernel: Code: 8b 4d dc 89 7e 30 89 56 34 0f b6 53 56 01 d7 89 79 04 8b 96 a0 00 00 00 f6 42 01 80 75 0c 80 7a 28 00 75 06 89 bb d4 01 00 00 <8b> 10 89 46 04 89 16 89 30 8b 45 ec 89 72 04 8b 55 e8 ff 43 2c e8
kernel: EAX: 00000000 EBX: ec6a2d60 ECX: ed4de568 EDX: ed4de568
kernel: ESI: ec4e0980 EDI: 00020264 EBP: c0071eb8 ESP: c0071e94
kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010082
kernel: CR0: 80050033 CR2: 00000000 CR3: 2f715000 CR4: 00000690
kernel: Call Trace:
kernel: p54_tx+0x1a/0x1d [p54common]
kernel: p54_download_eeprom+0xa6/0xfb [p54common]
kernel: p54_read_eeprom+0x5c/0x99 [p54common]
kernel: p54p_firmware_step2+0x50/0xcd [p54pci]
kernel: request_firmware_work_func+0x2a/0x51
kernel: process_one_work+0x16b/0x28e
kernel: worker_thread+0x180/0x222
kernel: kthread+0xce/0xd0
kernel: ? cancel_delayed_work+0x5e/0x5e
kernel: ? kthread_create_worker_on_cpu+0x1c/0x1c
kernel: ret_from_fork+0x19/0x24
kernel: Modules linked in: p54pci p54common crc_ccitt mac80211 ipw2200 libipw lib80211 cfg80211 uhci_hcd pcmcia ehci_pci yenta_socket ehci_hcd rfkill i2c_i801 pcmcia_rsrc e1000 usbcore i2c_core pcmcia_core lpc_ich usb_common mfd_core floppy autofs4
kernel: CR2: 0000000000000000
kernel: ---[ end trace ddc1a265fd4f4bc6 ]---
kernel: EIP: p54_tx_pending+0xff/0x128 [p54common]
kernel: Code: 8b 4d dc 89 7e 30 89 56 34 0f b6 53 56 01 d7 89 79 04 8b 96 a0 00 00 00 f6 42 01 80 75 0c 80 7a 28 00 75 06 89 bb d4 01 00 00 <8b> 10 89 46 04 89 16 89 30 8b 45 ec 89 72 04 8b 55 e8 ff 43 2c e8
kernel: EAX: 00000000 EBX: ec6a2d60 ECX: ed4de568 EDX: ed4de568
kernel: ESI: ec4e0980 EDI: 00020264 EBP: c0071eb8 ESP: c16252e8
kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010082
kernel: CR0: 80050033 CR2: 00000000 CR3: 2f715000 CR4: 00000690
kernel: note: kworker/0:0[5] exited with preempt_count 1
Reverting the patch fixes the problem.
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
---
drivers/net/wireless/intersil/p54/txrx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intersil/p54/txrx.c b/drivers/net/wireless/intersil/p54/txrx.c
index 79078456..3a4214d 100644
--- a/drivers/net/wireless/intersil/p54/txrx.c
+++ b/drivers/net/wireless/intersil/p54/txrx.c
@@ -121,8 +121,8 @@ static int p54_assign_address(struct p54_common *priv, struct sk_buff *skb)
}
if (unlikely(!target_skb)) {
if (priv->rx_end - last_addr >= len) {
- target_skb = skb_peek_tail(&priv->tx_queue);
- if (target_skb) {
+ target_skb = priv->tx_queue.prev;
+ if (!skb_queue_empty(&priv->tx_queue)) {
info = IEEE80211_SKB_CB(target_skb);
range = (void *)info->rate_driver_data;
target_addr = range->end_addr;
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH mlx5-next 2/2] net/mlx5: Factor out HCA capabilities functions
From: Jason Gunthorpe @ 2019-02-11 20:32 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, RDMA mailing list, Moni Shoua, Saeed Mahameed,
linux-netdev
In-Reply-To: <20190211200207.GD21447@mtr-leonro.mtl.com>
On Mon, Feb 11, 2019 at 10:02:07PM +0200, Leon Romanovsky wrote:
> On Mon, Feb 11, 2019 at 07:50:55PM +0000, Jason Gunthorpe wrote:
> > On Mon, Feb 11, 2019 at 01:56:08PM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > >
> > > Combine all HCA capabilities setters under one function
> > > and compile out the ODP related function in case kernel
> > > was compiled without ODP support.
> > >
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > .../net/ethernet/mellanox/mlx5/core/main.c | 47 +++++++++++++------
> > > 1 file changed, 33 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > index 6d45518edbdc..d7145ab6105d 100644
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > @@ -459,6 +459,7 @@ static int handle_hca_cap_atomic(struct mlx5_core_dev *dev)
> > > return err;
> > > }
> > >
> > > +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> > > static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
> > > {
> > > void *set_hca_cap;
> > > @@ -502,6 +503,7 @@ static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
> > > kfree(set_ctx);
> > > return err;
> > > }
> > > +#endif
> > >
> > > static int handle_hca_cap(struct mlx5_core_dev *dev)
> > > {
> > > @@ -576,6 +578,35 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
> > > return err;
> > > }
> > >
> > > +static int set_hca_cap(struct mlx5_core_dev *dev)
> > > +{
> > > + struct pci_dev *pdev = dev->pdev;
> > > + int err;
> > > +
> > > + err = handle_hca_cap(dev);
> > > + if (err) {
> > > + dev_err(&pdev->dev, "handle_hca_cap failed\n");
> > > + goto out;
> > > + }
> > > +
> > > + err = handle_hca_cap_atomic(dev);
> > > + if (err) {
> > > + dev_err(&pdev->dev, "handle_hca_cap_atomic failed\n");
> > > + goto out;
> > > + }
> > > +
> > > +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> > > + err = handle_hca_cap_odp(dev);
> > > + if (err) {
> > > + dev_err(&pdev->dev, "handle_hca_cap_odp failed\n");
> > > + goto out;
> > > + }
> > > +#endif
> >
> > Adding
> > if (IS_ENABLED..)
> > return 0;
> >
> > To the top of handle_hca_cap_odp is alot better.
>
> Saeed gave comment that he prefers code to be compiled-out in case
> config is not set. In you suggestion, the code will exist and only
> with some optimizations enabled it will be thrown.
the kernel always compiles with optimizations that will throw away
this code, it is a widely used pattern.
#ifdef creates a compilation test matrix that is very undesired.
Jason
^ permalink raw reply
* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Davidlohr Bueso @ 2019-02-11 20:34 UTC (permalink / raw)
To: ira.weiny
Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann, netdev,
Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
Jason Gunthorpe, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211201643.7599-1-ira.weiny@intel.com>
On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
>Ira Weiny (3):
> mm/gup: Change "write" parameter to flags
> mm/gup: Introduce get_user_pages_fast_longterm()
> IB/HFI1: Use new get_user_pages_fast_longterm()
Out of curiosity, are you planning on having all rdma drivers
use get_user_pages_fast_longterm()? Ie:
hw/mthca/mthca_memfree.c: ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
hw/qib/qib_user_sdma.c: ret = get_user_pages_fast(addr, j, 0, pages);
Thanks,
Davidlohr
^ permalink raw reply
* Re: Kernel memory corruption in CIPSO labeled TCP packets processing.
From: Paul Moore @ 2019-02-11 20:37 UTC (permalink / raw)
To: Nazarov Sergey
Cc: linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
netdev@vger.kernel.org, Casey Schaufler
In-Reply-To: <11242361548940840@iva8-8d7a47df0521.qloud-c.yandex.net>
On Thu, Jan 31, 2019 at 8:20 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
> > This isn't how the rest of the stack works, look at
> > ip_local_deliver_finish() for one example. Perhaps the behavior you
> > are proposing is correct, but please show me where in the various RFC
> > specs it is defined so that I can better understand why it should work
> > this way.
> > --
> > paul moore
> > www.paul-moore.com
>
> Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
> skb is NULL. My last message could be ignored.
Hi Nazarov,
Do you plan on submitting these patches as a proper patchset for
review and merging?
--
paul moore
www.paul-moore.com
^ permalink raw reply
* Re: [PATCH] ipv6: fix icmp6_send() route lookup
From: David Miller @ 2019-02-11 20:38 UTC (permalink / raw)
To: alin.nastac; +Cc: netdev
In-Reply-To: <1549551931-11909-1-git-send-email-alin.nastac@gmail.com>
From: Alin Nastac <alin.nastac@gmail.com>
Date: Thu, 7 Feb 2019 16:05:31 +0100
> Original packet destination address must be used as saddr for the
> route lookup performed by icmp6_send() even when this address is
> not local. This fixes the IPv6 router ability to send back
> destination unreachable ICMPv6 errors for forwarded packets when
> the route toward the saddr of the original packet is source
> filtered (e.g. a default route with a "from PD" attribute, where
> PD is the delegated prefix).
>
> Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Yes, but however this will change behavior for a lot of situations
not just the one you are interested in.
The base ipv6_chk_addr() test has been there for more than a decade
and I'm not comfortable with changing this logic until I see you
write up a full audit of all of the use cases of icmp6_send() and
how they are impacted by your changes.
Thanks.
^ permalink raw reply
* Re: [PATCH] Documentation: bring operstate documentation up-to-date
From: David Miller @ 2019-02-11 20:39 UTC (permalink / raw)
To: j.witteveen; +Cc: netdev
In-Reply-To: <20190207161432.GA6060@Mindship-05.localdomain>
From: Jouke Witteveen <j.witteveen@gmail.com>
Date: Thu, 7 Feb 2019 17:14:32 +0100
> Netlink has moved from bitmasks to group numbers long ago.
>
> Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Jason Gunthorpe @ 2019-02-11 20:39 UTC (permalink / raw)
To: ira.weiny
Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann,
Davidlohr Bueso, netdev, Mike Marciniszyn, Dennis Dalessandro,
Doug Ledford, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211201643.7599-3-ira.weiny@intel.com>
On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> Users of get_user_pages_fast are not protected against mapping
> pages within FS DAX. Introduce a call which protects them.
>
> We do this by checking for DEVMAP pages during the fast walk and
> falling back to the longterm gup call to check for FS DAX if needed.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> include/linux/mm.h | 8 ++++
> mm/gup.c | 102 +++++++++++++++++++++++++++++++++++----------
> 2 files changed, 88 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 80bb6408fe73..8f831c823630 100644
> +++ b/include/linux/mm.h
> @@ -1540,6 +1540,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> struct vm_area_struct **vmas);
> +int get_user_pages_fast_longterm(unsigned long start, int nr_pages, bool write,
> + struct page **pages);
> #else
> static inline long get_user_pages_longterm(unsigned long start,
> unsigned long nr_pages, unsigned int gup_flags,
> @@ -1547,6 +1549,11 @@ static inline long get_user_pages_longterm(unsigned long start,
> {
> return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> }
> +static inline int get_user_pages_fast_longterm(unsigned long start, int nr_pages,
> + bool write, struct page **pages)
> +{
> + return get_user_pages_fast(start, nr_pages, write, pages);
> +}
> #endif /* CONFIG_FS_DAX */
>
> int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> @@ -2615,6 +2622,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
> #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
> #define FOLL_COW 0x4000 /* internal GUP flag */
> #define FOLL_ANON 0x8000 /* don't do file mappings */
> +#define FOLL_LONGTERM 0x10000 /* mapping is intended for a long term pin */
If we are adding a new flag, maybe we should get rid of the 'longterm'
entry points and just rely on the callers to pass the flag?
Jason
^ permalink raw reply
* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Jason Gunthorpe @ 2019-02-11 20:40 UTC (permalink / raw)
To: ira.weiny
Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann,
Davidlohr Bueso, netdev, Mike Marciniszyn, Dennis Dalessandro,
Doug Ledford, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211201643.7599-1-ira.weiny@intel.com>
On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> NOTE: This series depends on my clean up patch to remove the write parameter
> from gup_fast_permitted()[1]
>
> HFI1 uses get_user_pages_fast() due to it performance advantages. Like RDMA,
> HFI1 pages can be held for a significant time. But get_user_pages_fast() does
> not protect against mapping of FS DAX pages.
If HFI1 can use the _fast varient, can't all the general RDMA stuff
use it too?
What is the guidance on when fast vs not fast should be use?
Jason
^ permalink raw reply
* RE: [PATCH v3] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Leo Li @ 2019-02-11 20:43 UTC (permalink / raw)
To: Shawn Guo, Pankaj Bansal
Cc: Andrew Lunn, Florian Fainelli, netdev@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190211030010.GI22487@dragon>
> -----Original Message-----
> From: Shawn Guo <shawnguo@kernel.org>
> Sent: Sunday, February 10, 2019 9:00 PM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>
> Cc: Leo Li <leoyang.li@nxp.com>; Andrew Lunn <andrew@lunn.ch>; Florian
> Fainelli <f.fainelli@gmail.com>; netdev@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: [PATCH v3] arm64: dts: lx2160aqds: Add mdio mux nodes
>
> On Wed, Feb 06, 2019 at 09:40:33AM +0000, Pankaj Bansal wrote:
> > The two external MDIO buses used to communicate with phy devices that
> > are external to SOC are muxed in LX2160AQDS board.
> >
> > These buses can be routed to any one of the eight IO slots on
> > LX2160AQDS board depending on value in fpga register 0x54.
> >
> > Additionally the external MDIO1 is used to communicate to the onboard
> > RGMII phy devices.
> >
> > The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is
> > controlled by bits 0-3 of fpga register.
> >
> > Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> > ---
> >
> > Notes:
> > V3:
> > - Add status = disabled in soc file and status = okay in board file
> > for external MDIO nodes
> > - Add interrupts property in external mdio nodes in soc file
> > V2:
> > - removed unnecassary TODO statements
> > - removed device_type from mdio nodes
> > - change the case of hex number to lowercase
> > - removed board specific comments from soc file
> >
> > .../boot/dts/freescale/fsl-lx2160a-qds.dts | 123 +++++++++++++++++
> > .../boot/dts/freescale/fsl-lx2160a.dtsi | 22 +++
> > 2 files changed, 145 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > index 99a22abbe725..079264b391a2 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> > @@ -35,6 +35,14 @@
> > status = "okay";
> > };
> >
> > +&emdio1 {
> > + status = "okay";
> > +};
> > +
> > +&emdio2 {
> > + status = "okay";
> > +};
> > +
> > &esdhc0 {
> > status = "okay";
> > };
> > @@ -46,6 +54,121 @@
> > &i2c0 {
> > status = "okay";
> >
> > + fpga@66 {
> > + compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> > + reg = <0x66>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > +
> > + mdio-mux-1@54 {
> > + mdio-parent-bus = <&emdio1>;
> > + reg = <0x54>; /* BRDCFG4 */
> > + mux-mask = <0xf8>; /* EMI1_MDIO */
> > + #address-cells=<1>;
> > + #size-cells = <0>;
> > +
> > + mdio@0 {
> > + reg = <0x00>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
>
> Please have a newline between nodes. It doesn't deserve a respin though. I
> can fix them up when applying if Leo is fine with this version.
I think there should be a compatible string defined for the binding of parent node mdio-mux, probably "mdio-mux-regmap", and be used here in the device tree.
>
> Shawn
>
> > + mdio@40 {
> > + reg = <0x40>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@c0 {
> > + reg = <0xc0>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@c8 {
> > + reg = <0xc8>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@d0 {
> > + reg = <0xd0>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@d8 {
> > + reg = <0xd8>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@e0 {
> > + reg = <0xe0>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@e8 {
> > + reg = <0xe8>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@f0 {
> > + reg = <0xf0>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@f8 {
> > + reg = <0xf8>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + };
> > +
> > + mdio-mux-2@54 {
> > + mdio-parent-bus = <&emdio2>;
> > + reg = <0x54>; /* BRDCFG4 */
> > + mux-mask = <0x07>; /* EMI2_MDIO */
> > + #address-cells=<1>;
> > + #size-cells = <0>;
> > +
> > + mdio@0 {
> > + reg = <0x00>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@1 {
> > + reg = <0x01>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@2 {
> > + reg = <0x02>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@3 {
> > + reg = <0x03>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@4 {
> > + reg = <0x04>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@5 {
> > + reg = <0x05>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@6 {
> > + reg = <0x06>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + mdio@7 {
> > + reg = <0x07>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + };
> > + };
> > +
> > i2c-mux@77 {
> > compatible = "nxp,pca9547";
> > reg = <0x77>;
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > index a79f5c1ea56d..7def5252ac1a 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > @@ -762,5 +762,27 @@
> > <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
> > dma-coherent;
> > };
> > +
> > + /* WRIOP0: 0x8b8_0000, E-MDIO1: 0x1_6000 */
> > + emdio1: mdio@8b96000 {
> > + compatible = "fsl,fman-memac-mdio";
> > + reg = <0x0 0x8b96000 0x0 0x1000>;
> > + interrupts = <GIC_SPI 90 IRQ_TYPE_LEVEL_HIGH>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + little-endian; /* force the driver in LE mode */
> > + status = "disabled";
> > + };
> > +
> > + /* WRIOP0: 0x8b8_0000, E-MDIO2: 0x1_7000 */
> > + emdio2: mdio@8b97000 {
> > + compatible = "fsl,fman-memac-mdio";
> > + reg = <0x0 0x8b97000 0x0 0x1000>;
> > + interrupts = <GIC_SPI 91 IRQ_TYPE_LEVEL_HIGH>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + little-endian; /* force the driver in LE mode */
> > + status = "disabled";
> > + };
> > };
> > };
> > --
> > 2.17.1
> >
^ permalink raw reply
* Re: [PATCH net] vxlan: test dev->flags & IFF_UP before calling netif_rx()
From: David Miller @ 2019-02-11 20:44 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, petrm, idosch, roopa, sbrivio
In-Reply-To: <20190207202738.155940-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 7 Feb 2019 12:27:38 -0800
> netif_rx() must be called under a strict contract.
>
> At device dismantle phase, core networking clears IFF_UP
> and flush_all_backlogs() is called after rcu grace period
> to make sure no incoming packet might be in a cpu backlog
> and still referencing the device.
>
> Most drivers call netif_rx() from their interrupt handler,
> and since the interrupts are disabled at device dismantle,
> netif_rx() does not have to check dev->flags & IFF_UP
>
> Virtual drivers do not have this guarantee, and must
> therefore make the check themselves.
>
> Otherwise we risk use-after-free and/or crashes.
>
> Note this patch also fixes a small issue that came
> with commit ce6502a8f957 ("vxlan: fix a use after free
> in vxlan_encap_bypass"), since the dev->stats.rx_dropped
> change was done on the wrong device.
>
> Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
> Fixes: ce6502a8f957 ("vxlan: fix a use after free in vxlan_encap_bypass")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH net] af_key: unconditionally clone on broadcast
From: David Miller @ 2019-02-11 20:45 UTC (permalink / raw)
To: stranche; +Cc: eric.dumazet, netdev, steffen.klassert
In-Reply-To: <1549571601-395-1-git-send-email-stranche@codeaurora.org>
From: Sean Tranchetti <stranche@codeaurora.org>
Date: Thu, 7 Feb 2019 13:33:21 -0700
> Attempting to avoid cloning the skb when broadcasting by inflating
> the refcount with sock_hold/sock_put while under RCU lock is dangerous
> and violates RCU principles. It leads to subtle race conditions when
> attempting to free the SKB, as we may reference sockets that have
> already been freed by the stack.
...
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
> ---
> Realized I never actually sent this patch out after testing the changes
> Eric recommended. Whoops. Better late then never, I suppose...
Steffen, I assume you will review and pick this up.
Thanks.
^ permalink raw reply
* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Jason Gunthorpe @ 2019-02-11 20:47 UTC (permalink / raw)
To: ira.weiny, linux-rdma, linux-kernel, linux-mm, Daniel Borkmann,
netdev, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211203417.a2c2kbmjai43flyz@linux-r8p5>
On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > Ira Weiny (3):
> > mm/gup: Change "write" parameter to flags
> > mm/gup: Introduce get_user_pages_fast_longterm()
> > IB/HFI1: Use new get_user_pages_fast_longterm()
>
> Out of curiosity, are you planning on having all rdma drivers
> use get_user_pages_fast_longterm()? Ie:
>
> hw/mthca/mthca_memfree.c: ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
This one is certainly a mistake - this should be done with a umem.
Jason
^ permalink raw reply
* Re: [PATCH net] net: dsa: microchip: add switch offload forwarding support
From: David Miller @ 2019-02-11 20:49 UTC (permalink / raw)
To: Tristram.Ha
Cc: sergio.paracuellos, andrew, f.fainelli, pavel, UNGLinuxDriver,
netdev
In-Reply-To: <1549598758-25870-1-git-send-email-Tristram.Ha@microchip.com>
From: <Tristram.Ha@microchip.com>
Date: Thu, 7 Feb 2019 20:05:58 -0800
> From: Tristram Ha <Tristram.Ha@microchip.com>
>
> The flag offload_fwd_mark is set as the switch can forward frames by
> itself.
>
> This can be considered a fix to a problem introduced in commit
> c2e866911e254067 where the port membership are not set in sync. The flag
> offload_fwd_mark just needs to be set in tag_ksz.c to prevent the software
> bridge from forwarding duplicate multicast frames.
>
> Fixes: c2e866911e254067 ("microchip: break KSZ9477 DSA driver into two files")
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Applied, thank you.
^ permalink raw reply
* [PATCH net-next] net/skbuff: fix up kernel-doc placement
From: Brian Norris @ 2019-02-11 21:02 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, David S. Miller, Brian Norris
There are several skb_* functions where the locked and unlocked
functions are confusingly documented. For several of them, the
kernel-doc for the unlocked version is placed above the locked version,
which to the casual reader makes it seems like the locked version "takes
no locks and you must therefore hold required locks before calling it."
One can see, for example, that this link claims to document
skb_queue_head(), while instead describing __skb_queue_head().
https://www.kernel.org/doc/html/latest/networking/kapi.html#c.skb_queue_head
The correct documentation for skb_queue_head() is also included further
down the page.
This diff tested via:
$ scripts/kernel-doc -rst include/linux/skbuff.h net/core/skbuff.c
No new warnings were seen, and the output makes a little more sense.
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
include/linux/skbuff.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 831846617d07..a41e84f7730c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1889,12 +1889,12 @@ static inline void __skb_queue_before(struct sk_buff_head *list,
*
* A buffer cannot be placed on two lists at the same time.
*/
-void skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk);
static inline void __skb_queue_head(struct sk_buff_head *list,
struct sk_buff *newsk)
{
__skb_queue_after(list, (struct sk_buff *)list, newsk);
}
+void skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk);
/**
* __skb_queue_tail - queue a buffer at the list tail
@@ -1906,12 +1906,12 @@ static inline void __skb_queue_head(struct sk_buff_head *list,
*
* A buffer cannot be placed on two lists at the same time.
*/
-void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk);
static inline void __skb_queue_tail(struct sk_buff_head *list,
struct sk_buff *newsk)
{
__skb_queue_before(list, (struct sk_buff *)list, newsk);
}
+void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk);
/*
* remove sk_buff from list. _Must_ be called atomically, and with
@@ -1938,7 +1938,6 @@ static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
* so must be used with appropriate locks held only. The head item is
* returned or %NULL if the list is empty.
*/
-struct sk_buff *skb_dequeue(struct sk_buff_head *list);
static inline struct sk_buff *__skb_dequeue(struct sk_buff_head *list)
{
struct sk_buff *skb = skb_peek(list);
@@ -1946,6 +1945,7 @@ static inline struct sk_buff *__skb_dequeue(struct sk_buff_head *list)
__skb_unlink(skb, list);
return skb;
}
+struct sk_buff *skb_dequeue(struct sk_buff_head *list);
/**
* __skb_dequeue_tail - remove from the tail of the queue
@@ -1955,7 +1955,6 @@ static inline struct sk_buff *__skb_dequeue(struct sk_buff_head *list)
* so must be used with appropriate locks held only. The tail item is
* returned or %NULL if the list is empty.
*/
-struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list);
static inline struct sk_buff *__skb_dequeue_tail(struct sk_buff_head *list)
{
struct sk_buff *skb = skb_peek_tail(list);
@@ -1963,6 +1962,7 @@ static inline struct sk_buff *__skb_dequeue_tail(struct sk_buff_head *list)
__skb_unlink(skb, list);
return skb;
}
+struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list);
static inline bool skb_is_nonlinear(const struct sk_buff *skb)
@@ -2653,13 +2653,13 @@ static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
* the list and one reference dropped. This function does not take the
* list lock and the caller must hold the relevant locks to use it.
*/
-void skb_queue_purge(struct sk_buff_head *list);
static inline void __skb_queue_purge(struct sk_buff_head *list)
{
struct sk_buff *skb;
while ((skb = __skb_dequeue(list)) != NULL)
kfree_skb(skb);
}
+void skb_queue_purge(struct sk_buff_head *list);
unsigned int skb_rbtree_purge(struct rb_root *root);
@@ -3028,7 +3028,7 @@ static inline int skb_padto(struct sk_buff *skb, unsigned int len)
}
/**
- * skb_put_padto - increase size and pad an skbuff up to a minimal size
+ * __skb_put_padto - increase size and pad an skbuff up to a minimal size
* @skb: buffer to pad
* @len: minimal length
* @free_on_error: free buffer on error
--
2.20.1.791
^ permalink raw reply related
* [Patch net v2 0/3] net_sched: some fixes for cls_tcindex
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang
This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.
v2: fix a compile error in patch 2
drop netns refcnt in patch 1
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cong Wang (3):
net_sched: fix a race condition in tcindex_destroy()
net_sched: fix a memory leak in cls_tcindex
net_sched: fix two more memory leaks in cls_tcindex
net/sched/cls_tcindex.c | 80 ++++++++++++++++++++++++-----------------
1 file changed, 48 insertions(+), 32 deletions(-)
--
2.20.1
^ permalink raw reply
* [Patch net v2 1/3] net_sched: fix a race condition in tcindex_destroy()
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, Adrian, Ben Hutchings, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.
Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.
Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.
Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/sched/cls_tcindex.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 9ccc93f257db..79b52a637dda 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -48,7 +48,7 @@ struct tcindex_data {
u32 hash; /* hash table size; 0 if undefined */
u32 alloc_hash; /* allocated size */
u32 fall_through; /* 0: only classify if explicit match */
- struct rcu_head rcu;
+ struct rcu_work rwork;
};
static inline int tcindex_filter_is_set(struct tcindex_filter_result *r)
@@ -229,9 +229,11 @@ static int tcindex_destroy_element(struct tcf_proto *tp,
return tcindex_delete(tp, arg, &last, NULL);
}
-static void __tcindex_destroy(struct rcu_head *head)
+static void tcindex_destroy_work(struct work_struct *work)
{
- struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+ struct tcindex_data *p = container_of(to_rcu_work(work),
+ struct tcindex_data,
+ rwork);
kfree(p->perfect);
kfree(p->h);
@@ -258,9 +260,11 @@ static int tcindex_filter_result_init(struct tcindex_filter_result *r)
return tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
}
-static void __tcindex_partial_destroy(struct rcu_head *head)
+static void tcindex_partial_destroy_work(struct work_struct *work)
{
- struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+ struct tcindex_data *p = container_of(to_rcu_work(work),
+ struct tcindex_data,
+ rwork);
kfree(p->perfect);
kfree(p);
@@ -478,7 +482,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
if (oldp)
- call_rcu(&oldp->rcu, __tcindex_partial_destroy);
+ tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
return 0;
errout_alloc:
@@ -570,7 +574,7 @@ static void tcindex_destroy(struct tcf_proto *tp,
walker.fn = tcindex_destroy_element;
tcindex_walk(tp, &walker);
- call_rcu(&p->rcu, __tcindex_destroy);
+ tcf_queue_work(&p->rwork, tcindex_destroy_work);
}
--
2.20.1
^ permalink raw reply related
* [Patch net v2 2/3] net_sched: fix a memory leak in cls_tcindex
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.
This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.
As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/sched/cls_tcindex.c | 46 +++++++++++++++++++++++++++--------------
1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 79b52a637dda..70ea5b1a7889 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -221,14 +221,6 @@ static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
return 0;
}
-static int tcindex_destroy_element(struct tcf_proto *tp,
- void *arg, struct tcf_walker *walker)
-{
- bool last;
-
- return tcindex_delete(tp, arg, &last, NULL);
-}
-
static void tcindex_destroy_work(struct work_struct *work)
{
struct tcindex_data *p = container_of(to_rcu_work(work),
@@ -279,7 +271,7 @@ static void tcindex_free_perfect_hash(struct tcindex_data *cp)
kfree(cp->perfect);
}
-static int tcindex_alloc_perfect_hash(struct tcindex_data *cp)
+static int tcindex_alloc_perfect_hash(struct net *net, struct tcindex_data *cp)
{
int i, err = 0;
@@ -293,6 +285,9 @@ static int tcindex_alloc_perfect_hash(struct tcindex_data *cp)
TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
goto errout;
+#ifdef CONFIG_NET_CLS_ACT
+ cp->perfect[i].exts.net = net;
+#endif
}
return 0;
@@ -341,7 +336,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
if (p->perfect) {
int i;
- if (tcindex_alloc_perfect_hash(cp) < 0)
+ if (tcindex_alloc_perfect_hash(net, cp) < 0)
goto errout;
for (i = 0; i < cp->hash; i++)
cp->perfect[i].res = p->perfect[i].res;
@@ -410,7 +405,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
err = -ENOMEM;
if (!cp->perfect && !cp->h) {
if (valid_perfect_hash(cp)) {
- if (tcindex_alloc_perfect_hash(cp) < 0)
+ if (tcindex_alloc_perfect_hash(net, cp) < 0)
goto errout_alloc;
balloc = 1;
} else {
@@ -566,13 +561,32 @@ static void tcindex_destroy(struct tcf_proto *tp,
struct netlink_ext_ack *extack)
{
struct tcindex_data *p = rtnl_dereference(tp->root);
- struct tcf_walker walker;
+ int i;
pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
- walker.count = 0;
- walker.skip = 0;
- walker.fn = tcindex_destroy_element;
- tcindex_walk(tp, &walker);
+
+ if (p->perfect) {
+ for (i = 0; i < p->hash; i++) {
+ struct tcindex_filter_result *r = p->perfect + i;
+
+ tcf_unbind_filter(tp, &r->res);
+ if (tcf_exts_get_net(&r->exts))
+ tcf_queue_work(&r->rwork,
+ tcindex_destroy_rexts_work);
+ else
+ __tcindex_destroy_rexts(r);
+ }
+ }
+
+ for (i = 0; p->h && i < p->hash; i++) {
+ struct tcindex_filter *f, *next;
+ bool last;
+
+ for (f = rtnl_dereference(p->h[i]); f; f = next) {
+ next = rtnl_dereference(f->next);
+ tcindex_delete(tp, &f->result, &last, NULL);
+ }
+ }
tcf_queue_work(&p->rwork, tcindex_destroy_work);
}
--
2.20.1
^ permalink raw reply related
* [Patch net v2 3/3] net_sched: fix two more memory leaks in cls_tcindex
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.
For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.
For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/sched/cls_tcindex.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 70ea5b1a7889..38bb882bb958 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -304,9 +304,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
struct nlattr *est, bool ovr, struct netlink_ext_ack *extack)
{
struct tcindex_filter_result new_filter_result, *old_r = r;
- struct tcindex_filter_result cr;
struct tcindex_data *cp = NULL, *oldp;
struct tcindex_filter *f = NULL; /* make gcc behave */
+ struct tcf_result cr = {};
int err, balloc = 0;
struct tcf_exts e;
@@ -345,13 +345,10 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
cp->h = p->h;
err = tcindex_filter_result_init(&new_filter_result);
- if (err < 0)
- goto errout1;
- err = tcindex_filter_result_init(&cr);
if (err < 0)
goto errout1;
if (old_r)
- cr.res = r->res;
+ cr = r->res;
if (tb[TCA_TCINDEX_HASH])
cp->hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
@@ -442,8 +439,8 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
if (tb[TCA_TCINDEX_CLASSID]) {
- cr.res.classid = nla_get_u32(tb[TCA_TCINDEX_CLASSID]);
- tcf_bind_filter(tp, &cr.res, base);
+ cr.classid = nla_get_u32(tb[TCA_TCINDEX_CLASSID]);
+ tcf_bind_filter(tp, &cr, base);
}
if (old_r && old_r != r) {
@@ -455,7 +452,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
oldp = p;
- r->res = cr.res;
+ r->res = cr;
tcf_exts_change(&r->exts, &e);
rcu_assign_pointer(tp->root, cp);
@@ -474,6 +471,8 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
; /* nothing */
rcu_assign_pointer(*fp, f);
+ } else {
+ tcf_exts_destroy(&new_filter_result.exts);
}
if (oldp)
@@ -486,7 +485,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
else if (balloc == 2)
kfree(cp->h);
errout1:
- tcf_exts_destroy(&cr.exts);
tcf_exts_destroy(&new_filter_result.exts);
errout:
kfree(cp);
--
2.20.1
^ permalink raw reply related
* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: John Hubbard @ 2019-02-11 21:13 UTC (permalink / raw)
To: Jason Gunthorpe, ira.weiny
Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann,
Davidlohr Bueso, netdev, Mike Marciniszyn, Dennis Dalessandro,
Doug Ledford, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211203916.GA2771@ziepe.ca>
On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
>> From: Ira Weiny <ira.weiny@intel.com>
[...]
>> +static inline int get_user_pages_fast_longterm(unsigned long start, int nr_pages,
>> + bool write, struct page **pages)
>> +{
>> + return get_user_pages_fast(start, nr_pages, write, pages);
>> +}
>> #endif /* CONFIG_FS_DAX */
>>
>> int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>> @@ -2615,6 +2622,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>> #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
>> #define FOLL_COW 0x4000 /* internal GUP flag */
>> #define FOLL_ANON 0x8000 /* don't do file mappings */
>> +#define FOLL_LONGTERM 0x10000 /* mapping is intended for a long term pin */
>
> If we are adding a new flag, maybe we should get rid of the 'longterm'
> entry points and just rely on the callers to pass the flag?
>
> Jason
>
+1, I agree that the overall get_user_pages*() API family will be cleaner
*without* get_user_pages_longterm*() calls. And this new flag makes that possible.
So I'd like to see the "longerm" call replaced with just passing this flag. Maybe
even as part of this patchset, but either way.
Taking a moment to reflect on where I think this might go eventually (the notes
below do not need to affect your patchset here, but this seems like a good place
to mention this):
It seems to me that the longterm vs. short-term is of questionable value.
It's actually better to just call get_user_pages(), and then if it really is
long-term enough to matter internally, we'll see the pages marked as gup-pinned.
If the gup pages are released before anyone (filesystem, that is) notices, then
it must have been short term.
Doing it that way is self-maintaining. Of course, this assumes that we end up with
a design that doesn't require being told, by the call sites, that a given gup
call is intended for "long term" use. So I could be wrong about this direction, but
let's please consider the possibility.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
* RE: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Weiny, Ira @ 2019-02-11 21:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Daniel Borkmann, Davidlohr Bueso,
netdev@vger.kernel.org, Marciniszyn, Mike, Dalessandro, Dennis,
Doug Ledford, Andrew Morton, Kirill A. Shutemov, Williams, Dan J
In-Reply-To: <20190211204049.GB2771@ziepe.ca>
>
> On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > NOTE: This series depends on my clean up patch to remove the write
> > parameter from gup_fast_permitted()[1]
> >
> > HFI1 uses get_user_pages_fast() due to it performance advantages.
> > Like RDMA,
> > HFI1 pages can be held for a significant time. But
> > get_user_pages_fast() does not protect against mapping of FS DAX pages.
>
> If HFI1 can use the _fast varient, can't all the general RDMA stuff use it too?
>
> What is the guidance on when fast vs not fast should be use?
Right now it can't because it holds mmap_sem across the call. Once Shiraz's patches are accepted removing the umem->hugetlb flag I think we can change umem.c.
Also, it specifies FOLL_FORCE which can't currently be specified with gup fast. One idea I had was to change get_user_pages_fast() to use gup_flags instead of a single write flag. But that proved to be a very big cosmetic change across a lot of callers so I went this way.
Ira
>
> Jason
^ permalink raw reply
* [PATCH net-next 0/3] Remove getting SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
Hi all,
AFAICT there is no code that attempts to get the value of the attribute
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS while it is used with
switchdev_port_attr_set().
This is effectively no doing anything and it can slow down future work
that tries to make modifications in these areas so remove that.
David, there should be no dependency with previous patch series, but
again, feedback from Ido and Jiri would be welcome in case this was
added for a reason.
Thanks!
Florian Fainelli (3):
mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS
rocker: Remove getting PORT_BRIDGE_FLAGS
staging: fsl-dpaa2: ethsw: Remove getting PORT_BRIDGE_FLAGS
.../mellanox/mlxsw/spectrum_switchdev.c | 17 -----------------
drivers/net/ethernet/rocker/rocker.h | 2 --
drivers/net/ethernet/rocker/rocker_main.c | 15 ---------------
drivers/net/ethernet/rocker/rocker_ofdpa.c | 10 ----------
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 5 -----
5 files changed, 49 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH net-next 1/3] mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>
There is no code that will query the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
attribute remove support for that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
.../mellanox/mlxsw/spectrum_switchdev.c | 17 -----------------
1 file changed, 17 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 95e37de3e48f..4c5780f8f4b2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -431,19 +431,6 @@ static void mlxsw_sp_bridge_vlan_put(struct mlxsw_sp_bridge_vlan *bridge_vlan)
mlxsw_sp_bridge_vlan_destroy(bridge_vlan);
}
-static void mlxsw_sp_port_bridge_flags_get(struct mlxsw_sp_bridge *bridge,
- struct net_device *dev,
- unsigned long *brport_flags)
-{
- struct mlxsw_sp_bridge_port *bridge_port;
-
- bridge_port = mlxsw_sp_bridge_port_find(bridge, dev);
- if (WARN_ON(!bridge_port))
- return;
-
- memcpy(brport_flags, &bridge_port->flags, sizeof(*brport_flags));
-}
-
static int mlxsw_sp_port_attr_get(struct net_device *dev,
struct switchdev_attr *attr)
{
@@ -451,10 +438,6 @@ static int mlxsw_sp_port_attr_get(struct net_device *dev,
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
- mlxsw_sp_port_bridge_flags_get(mlxsw_sp->bridge, attr->orig_dev,
- &attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD |
BR_MCAST_FLOOD;
--
2.17.1
^ permalink raw reply related
* [PATCH net-next 2/3] rocker: Remove getting PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>
There is no code that attempts to get the
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS attribute, remove support for that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/rocker/rocker.h | 2 --
drivers/net/ethernet/rocker/rocker_main.c | 15 ---------------
drivers/net/ethernet/rocker/rocker_ofdpa.c | 10 ----------
3 files changed, 27 deletions(-)
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index 748fb12260a6..2b2e1c4f0dc3 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -109,8 +109,6 @@ struct rocker_world_ops {
int (*port_attr_bridge_flags_set)(struct rocker_port *rocker_port,
unsigned long brport_flags,
struct switchdev_trans *trans);
- int (*port_attr_bridge_flags_get)(const struct rocker_port *rocker_port,
- unsigned long *p_brport_flags);
int (*port_attr_bridge_flags_support_get)(const struct rocker_port *
rocker_port,
unsigned long *
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 66f72f8c46e5..5ce8d5aba603 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1582,17 +1582,6 @@ rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
trans);
}
-static int
-rocker_world_port_attr_bridge_flags_get(const struct rocker_port *rocker_port,
- unsigned long *p_brport_flags)
-{
- struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
- if (!wops->port_attr_bridge_flags_get)
- return -EOPNOTSUPP;
- return wops->port_attr_bridge_flags_get(rocker_port, p_brport_flags);
-}
-
static int
rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
rocker_port,
@@ -2061,10 +2050,6 @@ static int rocker_port_attr_get(struct net_device *dev,
int err = 0;
switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
- err = rocker_world_port_attr_bridge_flags_get(rocker_port,
- &attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
err = rocker_world_port_attr_bridge_flags_support_get(rocker_port,
&attr->u.brport_flags_support);
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index bea7895930f6..9c07f6040720 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2511,16 +2511,6 @@ static int ofdpa_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
return err;
}
-static int
-ofdpa_port_attr_bridge_flags_get(const struct rocker_port *rocker_port,
- unsigned long *p_brport_flags)
-{
- const struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
-
- *p_brport_flags = ofdpa_port->brport_flags;
- return 0;
-}
-
static int
ofdpa_port_attr_bridge_flags_support_get(const struct rocker_port *
rocker_port,
--
2.17.1
^ permalink raw reply related
* [PATCH net-next 3/3] staging: fsl-dpaa2: ethsw: Remove getting PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>
There is no code that tries to get the attribute
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS, remove support for doing that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index e559f4c25cf7..e5a14c7c777f 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -646,11 +646,6 @@ static int swdev_port_attr_get(struct net_device *netdev,
struct ethsw_port_priv *port_priv = netdev_priv(netdev);
switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
- attr->u.brport_flags =
- (port_priv->ethsw_data->learning ? BR_LEARNING : 0) |
- (port_priv->flood ? BR_FLOOD : 0);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD;
break;
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v3 0/3 net-next] net: phy: disregard "Clause 22 registers present" bit in get_phy_c45_devs_in_pkg
From: David Miller @ 2019-02-11 21:19 UTC (permalink / raw)
To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <75c9d8ee-582f-f247-7595-d8732ac26c20@gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 8 Feb 2019 20:19:17 +0100
> Bit 0 in register 1.5 doesn't represent a device but is a flag that
> Clause 22 registers are present. Therefore disregard this bit when
> populating the device list. If code needs this information it
> should read register 1.5 directly instead of accessing the device
> list.
> Because this bit doesn't represent a device don't define a
> MDIO_MMD_XYZ constant, just define a MDIO_DEVS_XYZ constant for
> the flag in the device list bitmap.
...
This doesn't apply cleanly to net-next, please respin.
Thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox