Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net/mlx5e: Move priv variable into case statement in mlx5e_setup_tc
From: Nathan Chancellor @ 2019-07-11  6:09 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller,
	Linux Netdev List, RDMA mailing list, linux-kernel,
	clang-built-linux
In-Reply-To: <CALzJLG9Aw=sVPDiewHr+4Jiuaod_1q=10vzMzCUVg-rCCXD6cQ@mail.gmail.com>

On Wed, Jul 10, 2019 at 11:02:00PM -0700, Saeed Mahameed wrote:
> On Wed, Jul 10, 2019 at 12:05 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > There is an unused variable warning on arm64 defconfig when
> > CONFIG_MLX5_ESWITCH is unset:
> >
> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: warning:
> > unused variable 'priv' [-Wunused-variable]
> >         struct mlx5e_priv *priv = netdev_priv(dev);
> >                            ^
> > 1 warning generated.
> >
> > Move it down into the case statement where it is used.
> >
> > Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
> > Link: https://github.com/ClangBuiltLinux/linux/issues/597
> > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 6d0ae87c8ded..651eb714eb5b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -3464,15 +3464,16 @@ static LIST_HEAD(mlx5e_block_cb_list);
> >  static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
> >                           void *type_data)
> >  {
> > -       struct mlx5e_priv *priv = netdev_priv(dev);
> > -
> >         switch (type) {
> >  #ifdef CONFIG_MLX5_ESWITCH
> > -       case TC_SETUP_BLOCK:
> > +       case TC_SETUP_BLOCK: {
> > +               struct mlx5e_priv *priv = netdev_priv(dev);
> > +
> >                 return flow_block_cb_setup_simple(type_data,
> >                                                   &mlx5e_block_cb_list,
> >                                                   mlx5e_setup_tc_block_cb,
> >                                                   priv, priv, true);
> > +       }
> 
> Hi Nathan,
> 
> We have another patch internally that fixes this, and it is already
> queued up in my queue.
> it works differently as we want to pass priv instead of netdev to
> mlx5e_setup_tc_mqprio below,
> which will also solve warning ..
> 
> So i would like to submit that patch if it is ok with you ?

Hi Saeed,

Whatever works best for you, I just care that the warning gets fixed,
not how it is done :) I wouldn't mind being put on CC so I can pick it
up for my local tests.

Thanks for the follow up!
Nathan

^ permalink raw reply

* Re: [PATCH] net/mlx5e: Move priv variable into case statement in mlx5e_setup_tc
From: Saeed Mahameed @ 2019-07-11  6:02 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller,
	Linux Netdev List, RDMA mailing list, linux-kernel,
	clang-built-linux
In-Reply-To: <20190710190502.104010-1-natechancellor@gmail.com>

On Wed, Jul 10, 2019 at 12:05 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> There is an unused variable warning on arm64 defconfig when
> CONFIG_MLX5_ESWITCH is unset:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: warning:
> unused variable 'priv' [-Wunused-variable]
>         struct mlx5e_priv *priv = netdev_priv(dev);
>                            ^
> 1 warning generated.
>
> Move it down into the case statement where it is used.
>
> Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
> Link: https://github.com/ClangBuiltLinux/linux/issues/597
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 6d0ae87c8ded..651eb714eb5b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3464,15 +3464,16 @@ static LIST_HEAD(mlx5e_block_cb_list);
>  static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
>                           void *type_data)
>  {
> -       struct mlx5e_priv *priv = netdev_priv(dev);
> -
>         switch (type) {
>  #ifdef CONFIG_MLX5_ESWITCH
> -       case TC_SETUP_BLOCK:
> +       case TC_SETUP_BLOCK: {
> +               struct mlx5e_priv *priv = netdev_priv(dev);
> +
>                 return flow_block_cb_setup_simple(type_data,
>                                                   &mlx5e_block_cb_list,
>                                                   mlx5e_setup_tc_block_cb,
>                                                   priv, priv, true);
> +       }

Hi Nathan,

We have another patch internally that fixes this, and it is already
queued up in my queue.
it works differently as we want to pass priv instead of netdev to
mlx5e_setup_tc_mqprio below,
which will also solve warning ..

So i would like to submit that patch if it is ok with you ?

>  #endif
>         case TC_SETUP_QDISC_MQPRIO:
>                 return mlx5e_setup_tc_mqprio(dev, type_data);
> --
> 2.22.0
>

^ permalink raw reply

* Re: Fw: [Bug 204099] New: systemd-networkd fails on 5.2 - same version works on 5.1.16
From: Leon Romanovsky @ 2019-07-11  5:54 UTC (permalink / raw)
  To: David Ahern; +Cc: Stephen Hemminger, netdev, Maxim Mikityanskiy
In-Reply-To: <37ee2993-f81b-6265-87b0-1179162f1a2d@gmail.com>

On Wed, Jul 10, 2019 at 04:43:18PM -0600, David Ahern wrote:
> On 7/9/19 8:43 AM, Stephen Hemminger wrote:
> > Looks like the stricter netlink validation broke userspace.
> > This is bad.

Actually, the initial bug in systemd and it is where it should be fixed.

>
> I believe other reports have traced this to
>
> commit 7dc2bccab0ee37ac28096b8fcdc390a679a15841
> Author: Maxim Mikityanskiy <maximmi@mellanox.com>
> Date:   Tue May 21 06:40:04 2019 +0000
>
>     Validate required parameters in inet6_validate_link_af

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Leon Romanovsky @ 2019-07-11  5:40 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Jason Gunthorpe, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131603.6b11b831@canb.auug.org.au>

On Thu, Jul 11, 2019 at 01:16:03PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> > >
> > > > So today this failed to build after I merged the rdma tree (previously
> > > > it didn;t until after the net-next tree was merged (I assume a
> > > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > > tree :-(
> > >
> > > ? I'm confused..
> > >
> > > rdma.git builds fine stand alone (I hope!)
> >
> > I have "Fixup to build SIW issue" from Leon (which switches to using
> > in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
> > without that the rdma tree would not build for me.  Are you saying that
> > I don't need that at all, now?
>
> Actually , I get it now, "Fixup to build SIW issue" is really just a
> fixup for the net-next and rdma trees merge ... OK, I will fix that up
> tomorrow.  Sorry for my confusion.

Yes, it was for build only.

>
> --
> Cheers,
> Stephen Rothwell



^ permalink raw reply

* Re: [PATCH v4 2/2] rtw88: pci: Use DMA sync instead of remapping in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:30 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, Linux Netdev List, Linux Kernel,
	Linux Upstreaming Team, Daniel Drake, stable
In-Reply-To: <20190711052427.5582-2-jian-hong@endlessm.com>

Jian-Hong Pan <jian-hong@endlessm.com> 於 2019年7月11日 週四 下午1:25寫道:
>
> Since each skb in RX ring is reused instead of new allocation, we can
> treat the DMA in a more efficient way by DMA synchronization.
>
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> Cc: <stable@vger.kernel.org>
> ---

Sorry, also forget to place the version difference here

v2:
 - New patch by following [PATCH v3 1/2] rtw88: pci: Rearrange the
   memory usage for skb in RX ISR.

v3:
 - Remove rtw_pci_sync_rx_desc_cpu and call dma_sync_single_for_cpu in
   rtw_pci_rx_isr directly.
 - Remove the return value of rtw_pci_sync_rx_desc_device.
 - Use DMA_FROM_DEVICE instead of PCI_DMA_FROMDEVICE.

v4:
 - Same as v3.

>  drivers/net/wireless/realtek/rtw88/pci.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> index c415f5e94fed..68fae52151dd 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -206,6 +206,23 @@ static int rtw_pci_reset_rx_desc(struct rtw_dev *rtwdev, struct sk_buff *skb,
>         return 0;
>  }
>
> +static void rtw_pci_sync_rx_desc_device(struct rtw_dev *rtwdev, dma_addr_t dma,
> +                                       struct rtw_pci_rx_ring *rx_ring,
> +                                       u32 idx, u32 desc_sz)
> +{
> +       struct device *dev = rtwdev->dev;
> +       struct rtw_pci_rx_buffer_desc *buf_desc;
> +       int buf_sz = RTK_PCI_RX_BUF_SIZE;
> +
> +       dma_sync_single_for_device(dev, dma, buf_sz, DMA_FROM_DEVICE);
> +
> +       buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head +
> +                                                    idx * desc_sz);
> +       memset(buf_desc, 0, sizeof(*buf_desc));
> +       buf_desc->buf_size = cpu_to_le16(RTK_PCI_RX_BUF_SIZE);
> +       buf_desc->dma = cpu_to_le32(dma);
> +}
> +
>  static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
>                                 struct rtw_pci_rx_ring *rx_ring,
>                                 u8 desc_size, u32 len)
> @@ -782,8 +799,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>                 rtw_pci_dma_check(rtwdev, ring, cur_rp);
>                 skb = ring->buf[cur_rp];
>                 dma = *((dma_addr_t *)skb->cb);
> -               pci_unmap_single(rtwpci->pdev, dma, RTK_PCI_RX_BUF_SIZE,
> -                                PCI_DMA_FROMDEVICE);
> +               dma_sync_single_for_cpu(rtwdev->dev, dma, RTK_PCI_RX_BUF_SIZE,
> +                                       DMA_FROM_DEVICE);
>                 rx_desc = skb->data;
>                 chip->ops->query_rx_desc(rtwdev, rx_desc, &pkt_stat, &rx_status);
>
> @@ -818,7 +835,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>
>  next_rp:
>                 /* new skb delivered to mac80211, re-enable original skb DMA */
> -               rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
> +               rtw_pci_sync_rx_desc_device(rtwdev, dma, ring, cur_rp,
> +                                           buf_desc_sz);
>
>                 /* host read next element in ring */
>                 if (++cur_rp >= ring->r.len)
> --
> 2.22.0
>

^ permalink raw reply

* Re: [PATCH v4 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:28 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, Linux Netdev List, Linux Kernel,
	Linux Upstreaming Team, Daniel Drake, stable
In-Reply-To: <20190711052427.5582-1-jian-hong@endlessm.com>

Jian-Hong Pan <jian-hong@endlessm.com> 於 2019年7月11日 週四 下午1:25寫道:
>
> Testing with RTL8822BE hardware, when available memory is low, we
> frequently see a kernel panic and system freeze.
>
> First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
>
> rx routine starvation
> WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> [ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
>
> Then we see a variety of different error conditions and kernel panics,
> such as this one (trimmed):
>
> rtw_pci 0000:02:00.0: pci bus timeout, check dma status
> skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:105!
> invalid opcode: 0000 [#1] SMP NOPTI
> RIP: 0010:skb_panic+0x43/0x45
>
> When skb allocation fails and the "rx routine starvation" is hit, the
> function returns immediately without updating the RX ring. At this
> point, the RX ring may continue referencing an old skb which was already
> handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
> bad things happen.
>
> This patch allocates a new, data-sized skb first in RX ISR. After
> copying the data in, we pass it to the upper layers. However, if skb
> allocation fails, we effectively drop the frame. In both cases, the
> original, full size ring skb is reused.
>
> In addition, to fixing the kernel crash, the RX routine should now
> generally behave better under low memory conditions.
>
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> Cc: <stable@vger.kernel.org>
> ---

Sorry, I forget to place the version difference here.

v2:
 - Allocate new data-sized skb and put data into it, then pass it to
   mac80211. Reuse the original skb in RX ring by DMA sync.
 - Modify the commit message.
 - Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
   of remapping in RX ISR.

v3:
 - Same as v2.

v4:
 - Fix comment: allocate a new skb for this frame, discard the frame
if none available

>  drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
>  1 file changed, 22 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> index cfe05ba7280d..c415f5e94fed 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>         u32 pkt_offset;
>         u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
>         u32 buf_desc_sz = chip->rx_buf_desc_sz;
> +       u32 new_len;
>         u8 *rx_desc;
>         dma_addr_t dma;
>
> @@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
>                 pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
>                              pkt_stat.shift;
>
> -               if (pkt_stat.is_c2h) {
> -                       /* keep rx_desc, halmac needs it */
> -                       skb_put(skb, pkt_stat.pkt_len + pkt_offset);
> +               /* allocate a new skb for this frame,
> +                * discard the frame if none available
> +                */
> +               new_len = pkt_stat.pkt_len + pkt_offset;
> +               new = dev_alloc_skb(new_len);
> +               if (WARN_ONCE(!new, "rx routine starvation\n"))
> +                       goto next_rp;
> +
> +               /* put the DMA data including rx_desc from phy to new skb */
> +               skb_put_data(new, skb->data, new_len);
>
> -                       /* pass offset for further operation */
> -                       *((u32 *)skb->cb) = pkt_offset;
> -                       skb_queue_tail(&rtwdev->c2h_queue, skb);
> +               if (pkt_stat.is_c2h) {
> +                        /* pass rx_desc & offset for further operation */
> +                       *((u32 *)new->cb) = pkt_offset;
> +                       skb_queue_tail(&rtwdev->c2h_queue, new);
>                         ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
>                 } else {
> -                       /* remove rx_desc, maybe use skb_pull? */
> -                       skb_put(skb, pkt_stat.pkt_len);
> -                       skb_reserve(skb, pkt_offset);
> -
> -                       /* alloc a smaller skb to mac80211 */
> -                       new = dev_alloc_skb(pkt_stat.pkt_len);
> -                       if (!new) {
> -                               new = skb;
> -                       } else {
> -                               skb_put_data(new, skb->data, skb->len);
> -                               dev_kfree_skb_any(skb);
> -                       }
> -                       /* TODO: merge into rx.c */
> -                       rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
> +                       /* remove rx_desc */
> +                       skb_pull(new, pkt_offset);
> +
> +                       rtw_rx_stats(rtwdev, pkt_stat.vif, new);
>                         memcpy(new->cb, &rx_status, sizeof(rx_status));
>                         ieee80211_rx_irqsafe(rtwdev->hw, new);
>                 }
>
> -               /* skb delivered to mac80211, alloc a new one in rx ring */
> -               new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
> -               if (WARN(!new, "rx routine starvation\n"))
> -                       return;
> -
> -               ring->buf[cur_rp] = new;
> -               rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
> +next_rp:
> +               /* new skb delivered to mac80211, re-enable original skb DMA */
> +               rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
>
>                 /* host read next element in ring */
>                 if (++cur_rp >= ring->r.len)
> --
> 2.22.0
>

^ permalink raw reply

* [PATCH v4 2/2] rtw88: pci: Use DMA sync instead of remapping in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:24 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <20190711052427.5582-1-jian-hong@endlessm.com>

Since each skb in RX ring is reused instead of new allocation, we can
treat the DMA in a more efficient way by DMA synchronization.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index c415f5e94fed..68fae52151dd 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -206,6 +206,23 @@ static int rtw_pci_reset_rx_desc(struct rtw_dev *rtwdev, struct sk_buff *skb,
 	return 0;
 }
 
+static void rtw_pci_sync_rx_desc_device(struct rtw_dev *rtwdev, dma_addr_t dma,
+					struct rtw_pci_rx_ring *rx_ring,
+					u32 idx, u32 desc_sz)
+{
+	struct device *dev = rtwdev->dev;
+	struct rtw_pci_rx_buffer_desc *buf_desc;
+	int buf_sz = RTK_PCI_RX_BUF_SIZE;
+
+	dma_sync_single_for_device(dev, dma, buf_sz, DMA_FROM_DEVICE);
+
+	buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head +
+						     idx * desc_sz);
+	memset(buf_desc, 0, sizeof(*buf_desc));
+	buf_desc->buf_size = cpu_to_le16(RTK_PCI_RX_BUF_SIZE);
+	buf_desc->dma = cpu_to_le32(dma);
+}
+
 static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
 				struct rtw_pci_rx_ring *rx_ring,
 				u8 desc_size, u32 len)
@@ -782,8 +799,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		rtw_pci_dma_check(rtwdev, ring, cur_rp);
 		skb = ring->buf[cur_rp];
 		dma = *((dma_addr_t *)skb->cb);
-		pci_unmap_single(rtwpci->pdev, dma, RTK_PCI_RX_BUF_SIZE,
-				 PCI_DMA_FROMDEVICE);
+		dma_sync_single_for_cpu(rtwdev->dev, dma, RTK_PCI_RX_BUF_SIZE,
+					DMA_FROM_DEVICE);
 		rx_desc = skb->data;
 		chip->ops->query_rx_desc(rtwdev, rx_desc, &pkt_stat, &rx_status);
 
@@ -818,7 +835,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 
 next_rp:
 		/* new skb delivered to mac80211, re-enable original skb DMA */
-		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
+		rtw_pci_sync_rx_desc_device(rtwdev, dma, ring, cur_rp,
+					    buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* [PATCH v4 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  5:24 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight, Christoph Hellwig
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <CAPpJ_edDcaBq+0DocPmS-yYM10B4MkWvBn=f6wwbYdqzSGmp_g@mail.gmail.com>

Testing with RTL8822BE hardware, when available memory is low, we
frequently see a kernel panic and system freeze.

First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):

rx routine starvation
WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
[ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]

Then we see a variety of different error conditions and kernel panics,
such as this one (trimmed):

rtw_pci 0000:02:00.0: pci bus timeout, check dma status
skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:105!
invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:skb_panic+0x43/0x45

When skb allocation fails and the "rx routine starvation" is hit, the
function returns immediately without updating the RX ring. At this
point, the RX ring may continue referencing an old skb which was already
handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
bad things happen.

This patch allocates a new, data-sized skb first in RX ISR. After
copying the data in, we pass it to the upper layers. However, if skb
allocation fails, we effectively drop the frame. In both cases, the
original, full size ring skb is reused.

In addition, to fixing the kernel crash, the RX routine should now
generally behave better under low memory conditions.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index cfe05ba7280d..c415f5e94fed 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 	u32 pkt_offset;
 	u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
 	u32 buf_desc_sz = chip->rx_buf_desc_sz;
+	u32 new_len;
 	u8 *rx_desc;
 	dma_addr_t dma;
 
@@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
 			     pkt_stat.shift;
 
-		if (pkt_stat.is_c2h) {
-			/* keep rx_desc, halmac needs it */
-			skb_put(skb, pkt_stat.pkt_len + pkt_offset);
+		/* allocate a new skb for this frame,
+		 * discard the frame if none available
+		 */
+		new_len = pkt_stat.pkt_len + pkt_offset;
+		new = dev_alloc_skb(new_len);
+		if (WARN_ONCE(!new, "rx routine starvation\n"))
+			goto next_rp;
+
+		/* put the DMA data including rx_desc from phy to new skb */
+		skb_put_data(new, skb->data, new_len);
 
-			/* pass offset for further operation */
-			*((u32 *)skb->cb) = pkt_offset;
-			skb_queue_tail(&rtwdev->c2h_queue, skb);
+		if (pkt_stat.is_c2h) {
+			 /* pass rx_desc & offset for further operation */
+			*((u32 *)new->cb) = pkt_offset;
+			skb_queue_tail(&rtwdev->c2h_queue, new);
 			ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
 		} else {
-			/* remove rx_desc, maybe use skb_pull? */
-			skb_put(skb, pkt_stat.pkt_len);
-			skb_reserve(skb, pkt_offset);
-
-			/* alloc a smaller skb to mac80211 */
-			new = dev_alloc_skb(pkt_stat.pkt_len);
-			if (!new) {
-				new = skb;
-			} else {
-				skb_put_data(new, skb->data, skb->len);
-				dev_kfree_skb_any(skb);
-			}
-			/* TODO: merge into rx.c */
-			rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
+			/* remove rx_desc */
+			skb_pull(new, pkt_offset);
+
+			rtw_rx_stats(rtwdev, pkt_stat.vif, new);
 			memcpy(new->cb, &rx_status, sizeof(rx_status));
 			ieee80211_rx_irqsafe(rtwdev->hw, new);
 		}
 
-		/* skb delivered to mac80211, alloc a new one in rx ring */
-		new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
-		if (WARN(!new, "rx routine starvation\n"))
-			return;
-
-		ring->buf[cur_rp] = new;
-		rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
+next_rp:
+		/* new skb delivered to mac80211, re-enable original skb DMA */
+		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* RE: [PATCH v6 0/5] net: macb: cover letter
From: Parshuram Raju Thombare @ 2019-07-11  5:20 UTC (permalink / raw)
  To: David Miller
  Cc: andrew@lunn.ch, nicolas.ferre@microchip.com, f.fainelli@gmail.com,
	linux@armlinux.org.uk, netdev@vger.kernel.org,
	hkallweit1@gmail.com, linux-kernel@vger.kernel.org,
	Rafal Ciepiela, Piotr Sroka, Anil Joy Varughese, Arthur Marris,
	Steven Ho, Milind Parab
In-Reply-To: <20190710.114707.1137811182536299673.davem@davemloft.net>

Hi David,

Ok, I will resubmit it.

Regards,
Parshuram Thombare

^ permalink raw reply

* Re: [PATCH net-next] net: mlx5: Fix compiling error in tls.c
From: Saeed Mahameed @ 2019-07-11  5:07 UTC (permalink / raw)
  To: Mao Wenan
  Cc: David S. Miller, Saeed Mahameed, Linux Netdev List, linux-kernel
In-Reply-To: <20190710093852.34549-1-maowenan@huawei.com>

On Wed, Jul 10, 2019 at 2:33 AM Mao Wenan <maowenan@huawei.com> wrote:
>
> There are some errors while compiling tls.c if
> CONFIG_MLX5_FPGA_TLS is not obvious on.
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c: In function mlx5e_tls_set_ipv4_flow:
> ./include/linux/mlx5/device.h:61:39: error: invalid application of sizeof to incomplete type struct mlx5_ifc_tls_flow_bits
>  #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
>                                        ^
> ./include/linux/compiler.h:330:9: note: in definition of macro __compiletime_assert
>    if (!(condition))     \
>          ^~~~~~~~~
> ...
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c: In function mlx5e_tls_build_netdev:
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:202:13: error: MLX5_ACCEL_TLS_TX undeclared (first use in this function); did you mean __MLX5_ACCEL_TLS_H__?
>   if (caps & MLX5_ACCEL_TLS_TX) {
>              ^~~~~~~~~~~~~~~~~
>              __MLX5_ACCEL_TLS_H__
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:207:13: error: MLX5_ACCEL_TLS_RX undeclared (first use in this function); did you mean MLX5_ACCEL_TLS_TX?
>   if (caps & MLX5_ACCEL_TLS_RX) {
>              ^~~~~~~~~~~~~~~~~
>              MLX5_ACCEL_TLS_TX
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:212:15: error: MLX5_ACCEL_TLS_LRO undeclared (first use in this function); did you mean MLX5_ACCEL_TLS_RX?
>   if (!(caps & MLX5_ACCEL_TLS_LRO)) {
>                ^~~~~~~~~~~~~~~~~~
>                MLX5_ACCEL_TLS_RX
> make[5]: *** [drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> make[4]: *** [drivers/net/ethernet/mellanox/mlx5/core] Error 2
> make[3]: *** [drivers/net/ethernet/mellanox] Error 2
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [drivers/net/ethernet] Error 2
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [drivers/net] Error 2
> make[1]: *** Waiting for unfinished jobs....
> make: *** [drivers] Error 2
> make: *** Waiting for unfinished jobs....
>
> this patch is to fix this error using 'depends on MLX5_FPGA_TLS' when MLX5_TLS is set.
>

Hi Mao, Thanks for the patch. sorry for the delayed response, I was
out of office.

Actually MLX5_TLS doesn't depend on MLX5_FPGA_TLS anymore.
Tariq prepared a patch to fix this, we will submit it this week.


> Fixes: e2869fb2068b ("net/mlx5: Kconfig, Better organize compilation flags")
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index 37fef8c..1da2770 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -139,6 +139,7 @@ config MLX5_TLS
>         depends on MLX5_CORE_EN
>         depends on TLS_DEVICE
>         depends on TLS=y || MLX5_CORE=m
> +       depends on MLX5_FPGA_TLS
>         select MLX5_ACCEL
>         default n
>         help
> --
> 2.7.4
>

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  4:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <05db3afa-b94e-d0ba-7d61-ec1bf9a82777@fb.com>

On Wed, Jul 10, 2019 at 9:14 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> >>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>>>> BTF verifier has Different logic depending on whether we are following
> >>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>>>> stop early in DFS traversal while resolving BTF types. But it also
> >>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>>>> size won't be resolved, as it is considered to be a sink for pointer,
> >>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>>>> completely wrong.
> >>>>>
> >>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>>>> over all BTF types anyways, so the only saving is a potentially slightly
> >>>>> shorter stack. But correctness is more important that tiny savings.
> >>>>>
> >>>>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>>>> typedef as a value type:
> >>>>>
> >>>>> typedef int array_t[16];
> >>>>>
> >>>>> struct {
> >>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>>>         __type(value, array_t); /* i.e., array_t *value; */
> >>>>> } test_map SEC(".maps");
> >>>>>
> >>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>>>
> >>>> The change seems okay to me. Currently, looks like intermediate
> >>>> modifier type will carry size = 0 (in the internal data structure).
> >>>
> >>> Yes, which is totally wrong, especially that we use that size in some
> >>> cases to reject map with specified BTF.
> >>>
> >>>>
> >>>> If we remove RESOLVE logic, we probably want to double check
> >>>> whether we handle circular types correctly or not. Maybe we will
> >>>> be okay if all self tests pass.
> >>>
> >>> I checked, it does. We'll attempt to add referenced type unless it's a
> >>> "resolve sink" (where size is immediately known) or is already
> >>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
> >>> env_stack_push(), which check that the state of that type is
> >>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> >>> type is added into the stack, it's resolve state goes from NOT_VISITED
> >>> to VISITED.
> >>>
> >>> So, if there is a loop, then we'll detect it as soon as we'll attempt
> >>> to add the same type onto the stack second time.
> >>>
> >>>>
> >>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >>>> before removing it.
> >>>
> >>> I don't think there is any, because every type will be visited exactly
> >>> once, due to DFS nature of algorithm. The only difference is that if
> >>> we have a long chain of modifiers, we can technically reach the max
> >>> limit and fail. But at 32 I think it's pretty unrealistic to have such
> >>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >>>
> >>>>
> >>>> Another possible change is, for external usage, removing
> >>>> modifiers, before checking the size, something like below.
> >>>> Note that I am not strongly advocating my below patch as
> >>>> it has the same shortcoming that maintained modifier type
> >>>> size may not be correct.
> >>>
> >>> I don't think your patch helps, it can actually confuse things even
> >>> more. It skips modifiers until underlying type is found, but you still
> >>> don't guarantee that at that time that underlying type will have its
> >>> size resolved.
> >>
> >> It actually does help. It does not change the internal btf type
> >> traversal algorithms. It only change the implementation of
> >> an external API btf_type_id_size(). Previously, this function
> >> is used by externals and internal btf.c. I broke it into two,
> >> one internal __btf_type_id_size(), and another external
> >> btf_type_id_size(). The external one removes modifier before
> >> finding type size. The external one is typically used only
> >> after btf is validated.
> >
> > Sure, for external callers yes, it solves the problem. But there is
> > deeper problem: we mark modifier types RESOLVED before types they
> > ultimately point to are resolved. Then in all those btf_xxx_resolve()
> > functions we have check:
> >
> > if (!env_type_is_resolve_sink && !env_type_is_resolved)
> >    return env_stack_push();
> > else {
> >
> >    /* here we assume that we can calculate size of the type */
> >    /* so even if we traverse through all the modifiers and find
> > underlying type */
> >    /* that type will have resolved_size = 0, because we haven't
> > processed it yet */
> >    /* but we will just incorrectly assume that zero is *final* size */
> > }
> >
> > So I think that your patch is still just hiding the problem, not solving it.
> >
> > BTW, I've also identified part of btf_ptr_resolve() logic that can be
> > now safely removed (it's a special case that "restarts" DFS traversal
> > for modifiers, because they could have been prematurely marked
> > resolved). This is another sign that there is something wrong in an
> > algorithm.
> >
> > I'd rather remove unnecessary complexity and fix underlying problem,
> > especially given that there is no performance or correctness penalty.
>
> Could you create a special btf with type like
> typedef int a1;
> typedef a1 a2;
> ...
> typedef a65533 a65532;
> (maximum kernel allowed number of types is 64KB)
>
> In the BTF, the typedef order is reverse
> 1: typedef a65533 to 2
> 2: typedef ... to 3
> 3 ...
>
> So kernel won't run into deep recursion or panic?

Yeah I was just thinking about the need to generate artificially
constructed BTFs to stress-test BTF verification. Will add something.

>
> Thanks.
>
> >
> > I'll post v2 soon.
> >
> >>
> >> Will go through your other comments later.
> >>
> >>>
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 546ebee39e2a..6f927c3e0a89 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >>>> btf_type *t)
> >>>>            return true;
> >>>>     }
> >>>>
> >>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >>>> +                                                u32 *type_id, u32
> >>>> *ret_size,
> >>>> +                                                bool skip_modifier)
> >>>> +{
> >>>> +       const struct btf_type *size_type;
> >>>> +       u32 size_type_id = *type_id;
> >>>> +       u32 size = 0;
> >>>> +
> >>>> +       size_type = btf_type_by_id(btf, size_type_id);
> >>>> +       if (size_type && skip_modifier) {
> >>>> +               while (btf_type_is_modifier(size_type))
> >>>> +                       size_type = btf_type_by_id(btf, size_type->type);
> >>>> +       }
> >>>> +
> >>>> +       if (btf_type_nosize_or_null(size_type))
> >>>> +               return NULL;
> >>>> +
> >>>> +       if (btf_type_has_size(size_type)) {
> >>>> +               size = size_type->size;
> >>>> +       } else if (btf_type_is_array(size_type)) {
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +       } else if (btf_type_is_ptr(size_type)) {
> >>>> +               size = sizeof(void *);
> >>>> +       } else {
> >>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >>>> +                                !btf_type_is_var(size_type)))
> >>>> +                       return NULL;
> >>>> +
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +               size_type_id = btf->resolved_ids[size_type_id];
> >>>> +               size_type = btf_type_by_id(btf, size_type_id);
> >>>> +               if (btf_type_nosize_or_null(size_type))
> >>>> +                       return NULL;
> >>>> +       }
> >>>> +
> >>>> +       *type_id = size_type_id;
> >>>> +       if (ret_size)
> >>>> +               *ret_size = size;
> >>>> +
> >>>> +       return size_type;
> >>>> +}
> >>>> +
> >> [...]
> >

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  4:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <eebd6ac9-d968-9efb-db07-e5d877f7ae4c@fb.com>

On Wed, Jul 10, 2019 at 6:53 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> >>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>>>> BTF verifier has Different logic depending on whether we are following
> >>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>>>> stop early in DFS traversal while resolving BTF types. But it also
> >>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>>>> size won't be resolved, as it is considered to be a sink for pointer,
> >>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>>>> completely wrong.
> >>>>>
> >>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>>>> over all BTF types anyways, so the only saving is a potentially slightly
> >>>>> shorter stack. But correctness is more important that tiny savings.
> >>>>>
> >>>>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>>>> typedef as a value type:
> >>>>>
> >>>>> typedef int array_t[16];
> >>>>>
> >>>>> struct {
> >>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>>>         __type(value, array_t); /* i.e., array_t *value; */
> >>>>> } test_map SEC(".maps");
> >>>>>
> >>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>>>
> >>>> The change seems okay to me. Currently, looks like intermediate
> >>>> modifier type will carry size = 0 (in the internal data structure).
> >>>
> >>> Yes, which is totally wrong, especially that we use that size in some
> >>> cases to reject map with specified BTF.
> >>>
> >>>>
> >>>> If we remove RESOLVE logic, we probably want to double check
> >>>> whether we handle circular types correctly or not. Maybe we will
> >>>> be okay if all self tests pass.
> >>>
> >>> I checked, it does. We'll attempt to add referenced type unless it's a
> >>> "resolve sink" (where size is immediately known) or is already
> >>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
> >>> env_stack_push(), which check that the state of that type is
> >>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> >>> type is added into the stack, it's resolve state goes from NOT_VISITED
> >>> to VISITED.
> >>>
> >>> So, if there is a loop, then we'll detect it as soon as we'll attempt
> >>> to add the same type onto the stack second time.
> >>>
> >>>>
> >>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >>>> before removing it.
> >>>
> >>> I don't think there is any, because every type will be visited exactly
> >>> once, due to DFS nature of algorithm. The only difference is that if
> >>> we have a long chain of modifiers, we can technically reach the max
> >>> limit and fail. But at 32 I think it's pretty unrealistic to have such
> >>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >>>
> >>>>
> >>>> Another possible change is, for external usage, removing
> >>>> modifiers, before checking the size, something like below.
> >>>> Note that I am not strongly advocating my below patch as
> >>>> it has the same shortcoming that maintained modifier type
> >>>> size may not be correct.
> >>>
> >>> I don't think your patch helps, it can actually confuse things even
> >>> more. It skips modifiers until underlying type is found, but you still
> >>> don't guarantee that at that time that underlying type will have its
> >>> size resolved.
> >>
> >> It actually does help. It does not change the internal btf type
> >> traversal algorithms. It only change the implementation of
> >> an external API btf_type_id_size(). Previously, this function
> >> is used by externals and internal btf.c. I broke it into two,
> >> one internal __btf_type_id_size(), and another external
> >> btf_type_id_size(). The external one removes modifier before
> >> finding type size. The external one is typically used only
> >> after btf is validated.
> >
> > Sure, for external callers yes, it solves the problem. But there is
> > deeper problem: we mark modifier types RESOLVED before types they
> > ultimately point to are resolved. Then in all those btf_xxx_resolve()
> > functions we have check:
> >
> > if (!env_type_is_resolve_sink && !env_type_is_resolved)
> >    return env_stack_push();
> > else {
> >
> >    /* here we assume that we can calculate size of the type */
> >    /* so even if we traverse through all the modifiers and find
> > underlying type */
> >    /* that type will have resolved_size = 0, because we haven't
> > processed it yet */
> >    /* but we will just incorrectly assume that zero is *final* size */
> > }
> >
> > So I think that your patch is still just hiding the problem, not solving it.
>
> That is why I am not advocating it.
>
> The really long modifier chain (const volatile restrict ...) is rare.
> So I agree removing this RESOLVE logic is okay.

So :) thinking about this a bit more. Stack size is proportional not
to a longest chain of pointers and modifiers, but actually could be as
long as entire type graph (O(N)). So for this approach we'll need to
dynamically resize stack. This is easy to do, but I'm not sure how
much push back I'll get for such change.

But I'll think about doing it differently. The problem is with
resolved_sizes array, we assume it's filled for some types too early.
I'll see if I can get rid of it completely and instead just calculate
that on the fly by relying on resolved_ids. Will post v2 with one of
those approaches.

>
> >
> > BTW, I've also identified part of btf_ptr_resolve() logic that can be
> > now safely removed (it's a special case that "restarts" DFS traversal
> > for modifiers, because they could have been prematurely marked
> > resolved). This is another sign that there is something wrong in an
> > algorithm.
> >
> > I'd rather remove unnecessary complexity and fix underlying problem,
> > especially given that there is no performance or correctness penalty.
> >
> > I'll post v2 soon.
>
> Sounds good.
>
> >
> >>
> >> Will go through your other comments later.
> >>
> >>>
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 546ebee39e2a..6f927c3e0a89 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >>>> btf_type *t)
> >>>>            return true;
> >>>>     }
> >>>>
> >>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >>>> +                                                u32 *type_id, u32
> >>>> *ret_size,
> >>>> +                                                bool skip_modifier)
> >>>> +{
> >>>> +       const struct btf_type *size_type;
> >>>> +       u32 size_type_id = *type_id;
> >>>> +       u32 size = 0;
> >>>> +
> >>>> +       size_type = btf_type_by_id(btf, size_type_id);
> >>>> +       if (size_type && skip_modifier) {
> >>>> +               while (btf_type_is_modifier(size_type))
> >>>> +                       size_type = btf_type_by_id(btf, size_type->type);
> >>>> +       }
> >>>> +
> >>>> +       if (btf_type_nosize_or_null(size_type))
> >>>> +               return NULL;
> >>>> +
> >>>> +       if (btf_type_has_size(size_type)) {
> >>>> +               size = size_type->size;
> >>>> +       } else if (btf_type_is_array(size_type)) {
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +       } else if (btf_type_is_ptr(size_type)) {
> >>>> +               size = sizeof(void *);
> >>>> +       } else {
> >>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >>>> +                                !btf_type_is_var(size_type)))
> >>>> +                       return NULL;
> >>>> +
> >>>> +               size = btf->resolved_sizes[size_type_id];
> >>>> +               size_type_id = btf->resolved_ids[size_type_id];
> >>>> +               size_type = btf_type_by_id(btf, size_type_id);
> >>>> +               if (btf_type_nosize_or_null(size_type))
> >>>> +                       return NULL;
> >>>> +       }
> >>>> +
> >>>> +       *type_id = size_type_id;
> >>>> +       if (ret_size)
> >>>> +               *ret_size = size;
> >>>> +
> >>>> +       return size_type;
> >>>> +}
> >>>> +
> >> [...]

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  4:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4Bza6Y87C2_Fobj9CwU-2YRTU32S61f8_8CQdhMPenJiJZQ@mail.gmail.com>



On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
>>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>>>> BTF verifier has Different logic depending on whether we are following
>>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>>>> stop early in DFS traversal while resolving BTF types. But it also
>>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>>>> size won't be resolved, as it is considered to be a sink for pointer,
>>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>>>> completely wrong.
>>>>>
>>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>>>> over all BTF types anyways, so the only saving is a potentially slightly
>>>>> shorter stack. But correctness is more important that tiny savings.
>>>>>
>>>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>>>> typedef as a value type:
>>>>>
>>>>> typedef int array_t[16];
>>>>>
>>>>> struct {
>>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
>>>>>         __type(value, array_t); /* i.e., array_t *value; */
>>>>> } test_map SEC(".maps");
>>>>>
>>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>>>
>>>> The change seems okay to me. Currently, looks like intermediate
>>>> modifier type will carry size = 0 (in the internal data structure).
>>>
>>> Yes, which is totally wrong, especially that we use that size in some
>>> cases to reject map with specified BTF.
>>>
>>>>
>>>> If we remove RESOLVE logic, we probably want to double check
>>>> whether we handle circular types correctly or not. Maybe we will
>>>> be okay if all self tests pass.
>>>
>>> I checked, it does. We'll attempt to add referenced type unless it's a
>>> "resolve sink" (where size is immediately known) or is already
>>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
>>> env_stack_push(), which check that the state of that type is
>>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
>>> type is added into the stack, it's resolve state goes from NOT_VISITED
>>> to VISITED.
>>>
>>> So, if there is a loop, then we'll detect it as soon as we'll attempt
>>> to add the same type onto the stack second time.
>>>
>>>>
>>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
>>>> before removing it.
>>>
>>> I don't think there is any, because every type will be visited exactly
>>> once, due to DFS nature of algorithm. The only difference is that if
>>> we have a long chain of modifiers, we can technically reach the max
>>> limit and fail. But at 32 I think it's pretty unrealistic to have such
>>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
>>>
>>>>
>>>> Another possible change is, for external usage, removing
>>>> modifiers, before checking the size, something like below.
>>>> Note that I am not strongly advocating my below patch as
>>>> it has the same shortcoming that maintained modifier type
>>>> size may not be correct.
>>>
>>> I don't think your patch helps, it can actually confuse things even
>>> more. It skips modifiers until underlying type is found, but you still
>>> don't guarantee that at that time that underlying type will have its
>>> size resolved.
>>
>> It actually does help. It does not change the internal btf type
>> traversal algorithms. It only change the implementation of
>> an external API btf_type_id_size(). Previously, this function
>> is used by externals and internal btf.c. I broke it into two,
>> one internal __btf_type_id_size(), and another external
>> btf_type_id_size(). The external one removes modifier before
>> finding type size. The external one is typically used only
>> after btf is validated.
> 
> Sure, for external callers yes, it solves the problem. But there is
> deeper problem: we mark modifier types RESOLVED before types they
> ultimately point to are resolved. Then in all those btf_xxx_resolve()
> functions we have check:
> 
> if (!env_type_is_resolve_sink && !env_type_is_resolved)
>    return env_stack_push();
> else {
> 
>    /* here we assume that we can calculate size of the type */
>    /* so even if we traverse through all the modifiers and find
> underlying type */
>    /* that type will have resolved_size = 0, because we haven't
> processed it yet */
>    /* but we will just incorrectly assume that zero is *final* size */
> }
> 
> So I think that your patch is still just hiding the problem, not solving it.
> 
> BTW, I've also identified part of btf_ptr_resolve() logic that can be
> now safely removed (it's a special case that "restarts" DFS traversal
> for modifiers, because they could have been prematurely marked
> resolved). This is another sign that there is something wrong in an
> algorithm.
> 
> I'd rather remove unnecessary complexity and fix underlying problem,
> especially given that there is no performance or correctness penalty.

Could you create a special btf with type like
typedef int a1;
typedef a1 a2;
...
typedef a65533 a65532;
(maximum kernel allowed number of types is 64KB)

In the BTF, the typedef order is reverse
1: typedef a65533 to 2
2: typedef ... to 3
3 ...

So kernel won't run into deep recursion or panic?

Thanks.

> 
> I'll post v2 soon.
> 
>>
>> Will go through your other comments later.
>>
>>>
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index 546ebee39e2a..6f927c3e0a89 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
>>>> btf_type *t)
>>>>            return true;
>>>>     }
>>>>
>>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>>>> +                                                u32 *type_id, u32
>>>> *ret_size,
>>>> +                                                bool skip_modifier)
>>>> +{
>>>> +       const struct btf_type *size_type;
>>>> +       u32 size_type_id = *type_id;
>>>> +       u32 size = 0;
>>>> +
>>>> +       size_type = btf_type_by_id(btf, size_type_id);
>>>> +       if (size_type && skip_modifier) {
>>>> +               while (btf_type_is_modifier(size_type))
>>>> +                       size_type = btf_type_by_id(btf, size_type->type);
>>>> +       }
>>>> +
>>>> +       if (btf_type_nosize_or_null(size_type))
>>>> +               return NULL;
>>>> +
>>>> +       if (btf_type_has_size(size_type)) {
>>>> +               size = size_type->size;
>>>> +       } else if (btf_type_is_array(size_type)) {
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +       } else if (btf_type_is_ptr(size_type)) {
>>>> +               size = sizeof(void *);
>>>> +       } else {
>>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>>>> +                                !btf_type_is_var(size_type)))
>>>> +                       return NULL;
>>>> +
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +               size_type_id = btf->resolved_ids[size_type_id];
>>>> +               size_type = btf_type_by_id(btf, size_type_id);
>>>> +               if (btf_type_nosize_or_null(size_type))
>>>> +                       return NULL;
>>>> +       }
>>>> +
>>>> +       *type_id = size_type_id;
>>>> +       if (ret_size)
>>>> +               *ret_size = size;
>>>> +
>>>> +       return size_type;
>>>> +}
>>>> +
>> [...]
> 

^ permalink raw reply

* Re: [PATCH v3 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-11  3:50 UTC (permalink / raw)
  To: David Laight
  Cc: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	Christoph Hellwig, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@endlessm.com, Daniel Drake, stable@vger.kernel.org
In-Reply-To: <81a2b91c4b084617bab8656fca932f6d@AcuMS.aculab.com>

David Laight <David.Laight@aculab.com> 於 2019年7月10日 週三 下午4:57寫道:
>
> From: Jian-Hong Pan
> > Sent: 10 July 2019 09:38
> >
> > Testing with RTL8822BE hardware, when available memory is low, we
> > frequently see a kernel panic and system freeze.
> >
> > First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
> >
> > rx routine starvation
> > WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822
> > rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> > [ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
> >
> > Then we see a variety of different error conditions and kernel panics,
> > such as this one (trimmed):
> >
> > rtw_pci 0000:02:00.0: pci bus timeout, check dma status
> > skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f
> > data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
> > ------------[ cut here ]------------
> > kernel BUG at net/core/skbuff.c:105!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > RIP: 0010:skb_panic+0x43/0x45
> >
> > When skb allocation fails and the "rx routine starvation" is hit, the
> > function returns immediately without updating the RX ring. At this
> > point, the RX ring may continue referencing an old skb which was already
> > handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
> > bad things happen.
> >
> > This patch allocates a new, data-sized skb first in RX ISR. After
> > copying the data in, we pass it to the upper layers. However, if skb
> > allocation fails, we effectively drop the frame. In both cases, the
> > original, full size ring skb is reused.
> >
> > In addition, by fixing the kernel crash, the RX routine should now
> > generally behave better under low memory conditions.
>
> A couple of minor nits (see below).
> You may want to do a followup patch that changes the rx buffers
> (used by the hardware) to by just memory buffers.
> Nothing (probably) relies on them being skb with all the accociated
> baggage.

It is a good idea for later commit.

>         David
>
> >
> > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > Cc: <stable@vger.kernel.org>
> > ---
> > v2:
> >  - Allocate new data-sized skb and put data into it, then pass it to
> >    mac80211. Reuse the original skb in RX ring by DMA sync.
> >  - Modify the commit message.
> >  - Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
> >    of remapping in RX ISR.
> >
> > v3:
> >  - Same as v2.
> >
> >  drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
> >  1 file changed, 22 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
> > index cfe05ba7280d..e9fe3ad896c8 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
> >       u32 pkt_offset;
> >       u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
> >       u32 buf_desc_sz = chip->rx_buf_desc_sz;
> > +     u32 new_len;
> >       u8 *rx_desc;
> >       dma_addr_t dma;
> >
> > @@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
> >               pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
> >                            pkt_stat.shift;
> >
> > -             if (pkt_stat.is_c2h) {
> > -                     /* keep rx_desc, halmac needs it */
> > -                     skb_put(skb, pkt_stat.pkt_len + pkt_offset);
> > +             /* discard current skb if the new skb cannot be allocated as a
> > +              * new one in rx ring later
> > +              */
>
> That comment isn't quite right.
> maybe: "Allocate a new skb for this frame, discard if none available"

Thanks!  I will tweak it.

> > +             new_len = pkt_stat.pkt_len + pkt_offset;
> > +             new = dev_alloc_skb(new_len);
> > +             if (WARN_ONCE(!new, "rx routine starvation\n"))
>
> I think you should count these??

Larry has a different idea here. [1]
I agree with Larry that just need to know not enough memory here.

[1] https://lkml.org/lkml/2019/7/8/1049

Jian-Hong Pan

> > +                     goto next_rp;
> > +
> > +             /* put the DMA data including rx_desc from phy to new skb */
> > +             skb_put_data(new, skb->data, new_len);
> >
> > -                     /* pass offset for further operation */
> > -                     *((u32 *)skb->cb) = pkt_offset;
> > -                     skb_queue_tail(&rtwdev->c2h_queue, skb);
> > +             if (pkt_stat.is_c2h) {
> > +                      /* pass rx_desc & offset for further operation */
> > +                     *((u32 *)new->cb) = pkt_offset;
> > +                     skb_queue_tail(&rtwdev->c2h_queue, new);
> >                       ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
> >               } else {
> > -                     /* remove rx_desc, maybe use skb_pull? */
> > -                     skb_put(skb, pkt_stat.pkt_len);
> > -                     skb_reserve(skb, pkt_offset);
> > -
> > -                     /* alloc a smaller skb to mac80211 */
> > -                     new = dev_alloc_skb(pkt_stat.pkt_len);
> > -                     if (!new) {
> > -                             new = skb;
> > -                     } else {
> > -                             skb_put_data(new, skb->data, skb->len);
> > -                             dev_kfree_skb_any(skb);
> > -                     }
> > -                     /* TODO: merge into rx.c */
> > -                     rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
> > +                     /* remove rx_desc */
> > +                     skb_pull(new, pkt_offset);
> > +
> > +                     rtw_rx_stats(rtwdev, pkt_stat.vif, new);
> >                       memcpy(new->cb, &rx_status, sizeof(rx_status));
> >                       ieee80211_rx_irqsafe(rtwdev->hw, new);
> >               }
> >
> > -             /* skb delivered to mac80211, alloc a new one in rx ring */
> > -             new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
> > -             if (WARN(!new, "rx routine starvation\n"))
> > -                     return;
> > -
> > -             ring->buf[cur_rp] = new;
> > -             rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
> > +next_rp:
> > +             /* new skb delivered to mac80211, re-enable original skb DMA */
> > +             rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
> >
> >               /* host read next element in ring */
> >               if (++cur_rp >= ring->r.len)
> > --
> > 2.22.0
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply

* [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-11  3:39 UTC (permalink / raw)
  To: wensong
  Cc: horms, ja, pablo, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, yangxingwu

this patch removes the extra space and use bitmap_zalloc instead

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
---
 net/netfilter/ipvs/ip_vs_mh.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
index 94d9d34..3229867 100644
--- a/net/netfilter/ipvs/ip_vs_mh.c
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -174,8 +174,7 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
 		return 0;
 	}
 
-	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
-			 sizeof(unsigned long), GFP_KERNEL);
+	table = bitmap_zalloc(IP_VS_MH_TAB_SIZE, GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
-- 
1.8.3.1


^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131603.6b11b831@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]

Hi all,

On Thu, 11 Jul 2019 13:16:03 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:  
> > >
> > > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> > >     
> > > > So today this failed to build after I merged the rdma tree (previously
> > > > it didn;t until after the net-next tree was merged (I assume a
> > > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > > tree :-(      
> > > 
> > > ? I'm confused.. 
> > > 
> > > rdma.git builds fine stand alone (I hope!)    
> > 
> > I have "Fixup to build SIW issue" from Leon (which switches to using
> > in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
> > without that the rdma tree would not build for me.  Are you saying that
> > I don't need that at all, now?  
> 
> Actually , I get it now, "Fixup to build SIW issue" is really just a
> fixup for the net-next and rdma trees merge ... OK, I will fix that up
> tomorrow.  Sorry for my confusion.

Actually, I have rewound my tree and am starting from the merge of the
rdma tree again, so hopefully it should all be good today.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711131344.452fc064@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

Hi all,

On Thu, 11 Jul 2019 13:13:44 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> >   
> > > So today this failed to build after I merged the rdma tree (previously
> > > it didn;t until after the net-next tree was merged (I assume a
> > > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > > tree :-(    
> > 
> > ? I'm confused.. 
> > 
> > rdma.git builds fine stand alone (I hope!)  
> 
> I have "Fixup to build SIW issue" from Leon (which switches to using
> in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
> without that the rdma tree would not build for me.  Are you saying that
> I don't need that at all, now?

Actually , I get it now, "Fixup to build SIW issue" is really just a
fixup for the net-next and rdma trees merge ... OK, I will fix that up
tomorrow.  Sorry for my confusion.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  3:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711015854.GC22409@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

Hi Jason,

On Thu, 11 Jul 2019 02:26:27 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:
> 
> > So today this failed to build after I merged the rdma tree (previously
> > it didn;t until after the net-next tree was merged (I assume a
> > dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> > in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> > tree :-(  
> 
> ? I'm confused.. 
> 
> rdma.git builds fine stand alone (I hope!)

I have "Fixup to build SIW issue" from Leon (which switches to using
in_dev_for_each_ifa_rcu) included in the rmda tree merge commit because
without that the rdma tree would not build for me.  Are you saying that
I don't need that at all, now?

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [GIT PULL] Keys: Set 4 - Key ACLs for 5.3
From: Mimi Zohar @ 2019-07-11  3:07 UTC (permalink / raw)
  To: Linus Torvalds, David Howells, James Morris, keyrings, Netdev,
	linux-nfs, CIFS, linux-afs, linux-fsdevel, linux-integrity,
	LSM List, Linux List Kernel Mailing
In-Reply-To: <CAHk-=wiFti6=K2fyAYhx-PSX9ovQPJUNp0FMdV0pDaO_pSx9MQ@mail.gmail.com>

Hi Linus,

On Wed, 2019-07-10 at 18:59 -0700, Linus Torvalds wrote:
> Anyway, since it does seem like David is offline, I've just reverted
> this from my tree, and will be continuing my normal merge window pull
> requests (the other issues I have seen have fixes in their respective
> trees).

Sorry for the delay.  An exception is needed for loading builtin keys
"KEY_ALLOC_BUILT_IN" onto a keyring that is not writable by userspace.
 The following works, but probably is not how David would handle the
exception.

diff --git a/security/keys/key.c b/security/keys/key.c
index 519211a996e7..a99332c1e014 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -896,7 +896,7 @@ key_ref_t key_create_or_update(key_ref_t keyring_ref,
        /* if we're going to allocate a new key, we're going to have
         * to modify the keyring */
        ret = key_permission(keyring_ref, KEY_NEED_WRITE);
-       if (ret < 0) {
+       if (ret < 0 && !(flags & KEY_ALLOC_BUILT_IN)) {
                key_ref = ERR_PTR(ret);
                goto error_link_end;
        }

Mimi


^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Jason Gunthorpe @ 2019-07-11  2:26 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190711115054.7d7f468c@canb.auug.org.au>

On Thu, Jul 11, 2019 at 11:50:54AM +1000, Stephen Rothwell wrote:

> So today this failed to build after I merged the rdma tree (previously
> it didn;t until after the net-next tree was merged (I assume a
> dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
> in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
> tree :-(

? I'm confused.. 

rdma.git builds fine stand alone (I hope!)

If you merge it with netdev then the above patch is needed afer the
merge as netdev changed to ifa_rcu

I just did this a few hours ago to make and test the patch I sent
above..

Jason

^ permalink raw reply

* Re: [GIT PULL] Keys: Set 4 - Key ACLs for 5.3
From: Linus Torvalds @ 2019-07-11  1:59 UTC (permalink / raw)
  To: Linus Torvalds, David Howells, James Morris, keyrings, Netdev,
	linux-nfs, CIFS, linux-afs, linux-fsdevel, linux-integrity,
	LSM List, Linux List Kernel Mailing
In-Reply-To: <20190710201552.GB83443@gmail.com>

On Wed, Jul 10, 2019 at 1:15 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Also worth noting that the key ACL patches were only in linux-next for 9 days
> before the pull request was sent.

Yes. I was not entirely happy with the whole key subsystem situation.
See my concerns in

  https://lore.kernel.org/lkml/CAHk-=wjEowdfG7v_4ttu3xhf9gqopj1+q1nGG86+mGfGDTEBBg@mail.gmail.com/

for more. That was before I realized it was buggy.

So it really would be good to have more people involved, and more
structure to the keys development (and, I suspect, much else under
security/)

Anyway, since it does seem like David is offline, I've just reverted
this from my tree, and will be continuing my normal merge window pull
requests (the other issues I have seen have fixes in their respective
trees).

                 Linus

^ permalink raw reply

* RE: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Nixiaoming @ 2019-07-11  1:55 UTC (permalink / raw)
  To: Vasily Averin, adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	gregkh@linuxfoundation.org, jlayton@kernel.org, luto@kernel.org,
	mingo@kernel.org, Nadia.Derbey@bull.net,
	paulmck@linux.vnet.ibm.com, semen.protsenko@linaro.org,
	stable@kernel.org, stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org
  Cc: Huangjianhui (Alex), Dailei, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <f628ff03-eb47-62f3-465b-fe4ed046b30c@virtuozzo.com>

On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>> Registering the same notifier to a hook repeatedly can cause the hook
>> list to form a ring or lose other members of the list.
>
>I think is not enough to _prevent_ 2nd register attempt,
>it's enough to detect just attempt and generate warning to mark host in bad state.
>

Duplicate registration is prevented in my patch, not just "mark host in bad state"

Duplicate registration is checked and exited in notifier_chain_cond_register()

Duplicate registration was checked in notifier_chain_register() but only 
the alarm was triggered without exiting. added by commit 831246570d34692e 
("kernel/notifier.c: double register detection")

My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
 which triggers an alarm and exits when a duplicate registration is detected.

>Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>and it can lead to host crash in any time: 
>you can unregister notifier on first attempt it can be too early, it can be still in use.
>on the other hand you can never call 2nd unregister at all.

Since the member was not added to the linked list at the time of the second registration, 
no linked list ring was formed. 
The member is released on the first unregistration and -ENOENT on the second unregistration.
After patching, the fault has been alleviated

It may be more helpful to return an error code when someone tries to register the same
notification program a second time.
But I noticed that notifier_chain_cond_register() returns 0 when duplicate registration 
is detected. At the same time, in all the existing export function comments of notify,
"Currently always returns zero"

I am a bit confused: which is better?

>
>Unfortunately I do not see any ways to handle such cases properly,
>and it seems for me your patches does not resolve this problem.
>
>Am I missed something probably?
> 
>> case1: An infinite loop in notifier_chain_register() can cause soft lockup
>>         atomic_notifier_chain_register(&test_notifier_list, &test1);
>>         atomic_notifier_chain_register(&test_notifier_list, &test1);
>>         atomic_notifier_chain_register(&test_notifier_list, &test2);

Thanks

Xiaoming Ni

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Yonghong Song @ 2019-07-11  1:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <CAEf4Bza6Y87C2_Fobj9CwU-2YRTU32S61f8_8CQdhMPenJiJZQ@mail.gmail.com>



On 7/10/19 6:45 PM, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
>>> On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
>>>>> BTF verifier has Different logic depending on whether we are following
>>>>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
>>>>> stop early in DFS traversal while resolving BTF types. But it also
>>>>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
>>>>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
>>>>> size won't be resolved, as it is considered to be a sink for pointer,
>>>>> leading to TYPEDEF being in RESOLVED state with zero size, which is
>>>>> completely wrong.
>>>>>
>>>>> Optimization is doubtful, though, as btf_check_all_types() will iterate
>>>>> over all BTF types anyways, so the only saving is a potentially slightly
>>>>> shorter stack. But correctness is more important that tiny savings.
>>>>>
>>>>> This bug manifests itself in rejecting BTF-defined maps that use array
>>>>> typedef as a value type:
>>>>>
>>>>> typedef int array_t[16];
>>>>>
>>>>> struct {
>>>>>         __uint(type, BPF_MAP_TYPE_ARRAY);
>>>>>         __type(value, array_t); /* i.e., array_t *value; */
>>>>> } test_map SEC(".maps");
>>>>>
>>>>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
>>>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>>>
>>>> The change seems okay to me. Currently, looks like intermediate
>>>> modifier type will carry size = 0 (in the internal data structure).
>>>
>>> Yes, which is totally wrong, especially that we use that size in some
>>> cases to reject map with specified BTF.
>>>
>>>>
>>>> If we remove RESOLVE logic, we probably want to double check
>>>> whether we handle circular types correctly or not. Maybe we will
>>>> be okay if all self tests pass.
>>>
>>> I checked, it does. We'll attempt to add referenced type unless it's a
>>> "resolve sink" (where size is immediately known) or is already
>>> resolved (it's state is RESOLVED). In other cases, we'll attempt to
>>> env_stack_push(), which check that the state of that type is
>>> NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
>>> type is added into the stack, it's resolve state goes from NOT_VISITED
>>> to VISITED.
>>>
>>> So, if there is a loop, then we'll detect it as soon as we'll attempt
>>> to add the same type onto the stack second time.
>>>
>>>>
>>>> I may still be worthwhile to qualify the RESOLVE optimization benefit
>>>> before removing it.
>>>
>>> I don't think there is any, because every type will be visited exactly
>>> once, due to DFS nature of algorithm. The only difference is that if
>>> we have a long chain of modifiers, we can technically reach the max
>>> limit and fail. But at 32 I think it's pretty unrealistic to have such
>>> a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
>>>
>>>>
>>>> Another possible change is, for external usage, removing
>>>> modifiers, before checking the size, something like below.
>>>> Note that I am not strongly advocating my below patch as
>>>> it has the same shortcoming that maintained modifier type
>>>> size may not be correct.
>>>
>>> I don't think your patch helps, it can actually confuse things even
>>> more. It skips modifiers until underlying type is found, but you still
>>> don't guarantee that at that time that underlying type will have its
>>> size resolved.
>>
>> It actually does help. It does not change the internal btf type
>> traversal algorithms. It only change the implementation of
>> an external API btf_type_id_size(). Previously, this function
>> is used by externals and internal btf.c. I broke it into two,
>> one internal __btf_type_id_size(), and another external
>> btf_type_id_size(). The external one removes modifier before
>> finding type size. The external one is typically used only
>> after btf is validated.
> 
> Sure, for external callers yes, it solves the problem. But there is
> deeper problem: we mark modifier types RESOLVED before types they
> ultimately point to are resolved. Then in all those btf_xxx_resolve()
> functions we have check:
> 
> if (!env_type_is_resolve_sink && !env_type_is_resolved)
>    return env_stack_push();
> else {
> 
>    /* here we assume that we can calculate size of the type */
>    /* so even if we traverse through all the modifiers and find
> underlying type */
>    /* that type will have resolved_size = 0, because we haven't
> processed it yet */
>    /* but we will just incorrectly assume that zero is *final* size */
> }
> 
> So I think that your patch is still just hiding the problem, not solving it.

That is why I am not advocating it.

The really long modifier chain (const volatile restrict ...) is rare.
So I agree removing this RESOLVE logic is okay.

> 
> BTW, I've also identified part of btf_ptr_resolve() logic that can be
> now safely removed (it's a special case that "restarts" DFS traversal
> for modifiers, because they could have been prematurely marked
> resolved). This is another sign that there is something wrong in an
> algorithm.
> 
> I'd rather remove unnecessary complexity and fix underlying problem,
> especially given that there is no performance or correctness penalty.
> 
> I'll post v2 soon.

Sounds good.

> 
>>
>> Will go through your other comments later.
>>
>>>
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index 546ebee39e2a..6f927c3e0a89 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
>>>> btf_type *t)
>>>>            return true;
>>>>     }
>>>>
>>>> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
>>>> +                                                u32 *type_id, u32
>>>> *ret_size,
>>>> +                                                bool skip_modifier)
>>>> +{
>>>> +       const struct btf_type *size_type;
>>>> +       u32 size_type_id = *type_id;
>>>> +       u32 size = 0;
>>>> +
>>>> +       size_type = btf_type_by_id(btf, size_type_id);
>>>> +       if (size_type && skip_modifier) {
>>>> +               while (btf_type_is_modifier(size_type))
>>>> +                       size_type = btf_type_by_id(btf, size_type->type);
>>>> +       }
>>>> +
>>>> +       if (btf_type_nosize_or_null(size_type))
>>>> +               return NULL;
>>>> +
>>>> +       if (btf_type_has_size(size_type)) {
>>>> +               size = size_type->size;
>>>> +       } else if (btf_type_is_array(size_type)) {
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +       } else if (btf_type_is_ptr(size_type)) {
>>>> +               size = sizeof(void *);
>>>> +       } else {
>>>> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
>>>> +                                !btf_type_is_var(size_type)))
>>>> +                       return NULL;
>>>> +
>>>> +               size = btf->resolved_sizes[size_type_id];
>>>> +               size_type_id = btf->resolved_ids[size_type_id];
>>>> +               size_type = btf_type_by_id(btf, size_type_id);
>>>> +               if (btf_type_nosize_or_null(size_type))
>>>> +                       return NULL;
>>>> +       }
>>>> +
>>>> +       *type_id = size_type_id;
>>>> +       if (ret_size)
>>>> +               *ret_size = size;
>>>> +
>>>> +       return size_type;
>>>> +}
>>>> +
>> [...]

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2019-07-11  1:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bernard Metzler, Doug Ledford, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190710175212.GM2887@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 4311 bytes --]

Hi all,

On Wed, 10 Jul 2019 17:52:17 +0000 Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Tue, Jul 09, 2019 at 09:43:46AM +0300, Leon Romanovsky wrote:
> > On Tue, Jul 09, 2019 at 01:56:36PM +1000, Stephen Rothwell wrote:  
> > > Hi all,
> > >
> > > After merging the net-next tree, today's linux-next build (x86_64
> > > allmodconfig) failed like this:
> > >
> > > drivers/infiniband/sw/siw/siw_cm.c: In function 'siw_create_listen':
> > > drivers/infiniband/sw/siw/siw_cm.c:1978:3: error: implicit declaration of function 'for_ifa'; did you mean 'fork_idle'? [-Werror=implicit-function-declaration]
> > >    for_ifa(in_dev)
> > >    ^~~~~~~
> > >    fork_idle
> > > drivers/infiniband/sw/siw/siw_cm.c:1978:18: error: expected ';' before '{' token
> > >    for_ifa(in_dev)
> > >                   ^
> > >                   ;
> > >    {
> > >    ~
> > >
> > > Caused by commit
> > >
> > >   6c52fdc244b5 ("rdma/siw: connection management")
> > >
> > > from the rdma tree.  I don't know why this didn't fail after I mereged
> > > that tree.  
> > 
> > I had the same question, because I have this fix for a couple of days already.
> > 
> > From 56c9e15ec670af580daa8c3ffde9503af3042d67 Mon Sep 17 00:00:00 2001
> > From: Leon Romanovsky <leonro@mellanox.com>
> > Date: Sun, 7 Jul 2019 10:43:42 +0300
> > Subject: [PATCH] Fixup to build SIW issue
> > 
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/sw/siw/siw_cm.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
> > index 8e618cb7261f..c883bf514341 100644
> > +++ b/drivers/infiniband/sw/siw/siw_cm.c
> > @@ -1954,6 +1954,7 @@ static void siw_drop_listeners(struct iw_cm_id *id)
> >  int siw_create_listen(struct iw_cm_id *id, int backlog)
> >  {
> >  	struct net_device *dev = to_siw_dev(id->device)->netdev;
> > +	const struct in_ifaddr *ifa;
> >  	int rv = 0, listeners = 0;
> > 
> >  	siw_dbg(id->device, "id 0x%p: backlog %d\n", id, backlog);
> > @@ -1975,8 +1976,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
> >  			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
> >  			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));
> > 
> > -		for_ifa(in_dev)
> > -		{
> > +		in_dev_for_each_ifa_rcu(ifa, in_dev) {
> >  			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||  
> 
> Hum. There is no rcu lock held here and we can't use RCU anyhow as
> siw_listen_address will sleep.
> 
> I think this needs to use rtnl, as below. Bernard, please urgently
> confirm. Thanks
> 
> diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
> index 8e618cb7261f62..ee98e96a5bfaba 100644
> --- a/drivers/infiniband/sw/siw/siw_cm.c
> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> @@ -1965,6 +1965,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
>  	 */
>  	if (id->local_addr.ss_family == AF_INET) {
>  		struct in_device *in_dev = in_dev_get(dev);
> +		const struct in_ifaddr *ifa;
>  		struct sockaddr_in s_laddr, *s_raddr;
>  
>  		memcpy(&s_laddr, &id->local_addr, sizeof(s_laddr));
> @@ -1975,8 +1976,8 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
>  			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
>  			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));
>  
> -		for_ifa(in_dev)
> -		{
> +		rtnl_lock();
> +		in_dev_for_each_ifa_rtnl(ifa, in_dev) {
>  			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||
>  			    s_laddr.sin_addr.s_addr == ifa->ifa_address) {
>  				s_laddr.sin_addr.s_addr = ifa->ifa_address;
> @@ -1988,7 +1989,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
>  					listeners++;
>  			}
>  		}
> -		endfor_ifa(in_dev);
> +		rtnl_unlock();
>  		in_dev_put(in_dev);
>  	} else if (id->local_addr.ss_family == AF_INET6) {
>  		struct inet6_dev *in6_dev = in6_dev_get(dev);

So today this failed to build after I merged the rdma tree (previously
it didn;t until after the net-next tree was merged (I assume a
dependency changed).  It failed because in_dev_for_each_ifa_rcu (and
in_dev_for_each_ifa_rtnl) is only defined in a commit in the net-next
tree :-(

I have disabled the driver again.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix BTF verifier size resolution logic
From: Andrii Nakryiko @ 2019-07-11  1:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Alexei Starovoitov, daniel@iogearbox.net,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Kernel Team,
	Martin Lau
In-Reply-To: <304d8535-5043-836d-2933-1a5efb7aec72@fb.com>

On Wed, Jul 10, 2019 at 5:36 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/19 5:29 PM, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2019 at 5:16 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 1:08 AM, Andrii Nakryiko wrote:
> >>> BTF verifier has Different logic depending on whether we are following
> >>> a PTR or STRUCT/ARRAY (or something else). This is an optimization to
> >>> stop early in DFS traversal while resolving BTF types. But it also
> >>> results in a size resolution bug, when there is a chain, e.g., of PTR ->
> >>> TYPEDEF -> ARRAY, in which case due to being in pointer context ARRAY
> >>> size won't be resolved, as it is considered to be a sink for pointer,
> >>> leading to TYPEDEF being in RESOLVED state with zero size, which is
> >>> completely wrong.
> >>>
> >>> Optimization is doubtful, though, as btf_check_all_types() will iterate
> >>> over all BTF types anyways, so the only saving is a potentially slightly
> >>> shorter stack. But correctness is more important that tiny savings.
> >>>
> >>> This bug manifests itself in rejecting BTF-defined maps that use array
> >>> typedef as a value type:
> >>>
> >>> typedef int array_t[16];
> >>>
> >>> struct {
> >>>        __uint(type, BPF_MAP_TYPE_ARRAY);
> >>>        __type(value, array_t); /* i.e., array_t *value; */
> >>> } test_map SEC(".maps");
> >>>
> >>> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> >>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>
> >> The change seems okay to me. Currently, looks like intermediate
> >> modifier type will carry size = 0 (in the internal data structure).
> >
> > Yes, which is totally wrong, especially that we use that size in some
> > cases to reject map with specified BTF.
> >
> >>
> >> If we remove RESOLVE logic, we probably want to double check
> >> whether we handle circular types correctly or not. Maybe we will
> >> be okay if all self tests pass.
> >
> > I checked, it does. We'll attempt to add referenced type unless it's a
> > "resolve sink" (where size is immediately known) or is already
> > resolved (it's state is RESOLVED). In other cases, we'll attempt to
> > env_stack_push(), which check that the state of that type is
> > NOT_VISITED. If it's RESOLVED or VISITED, it returns -EEXISTS. When
> > type is added into the stack, it's resolve state goes from NOT_VISITED
> > to VISITED.
> >
> > So, if there is a loop, then we'll detect it as soon as we'll attempt
> > to add the same type onto the stack second time.
> >
> >>
> >> I may still be worthwhile to qualify the RESOLVE optimization benefit
> >> before removing it.
> >
> > I don't think there is any, because every type will be visited exactly
> > once, due to DFS nature of algorithm. The only difference is that if
> > we have a long chain of modifiers, we can technically reach the max
> > limit and fail. But at 32 I think it's pretty unrealistic to have such
> > a long chain of PTR/TYPEDEF/CONST/VOLATILE/RESTRICTs :)
> >
> >>
> >> Another possible change is, for external usage, removing
> >> modifiers, before checking the size, something like below.
> >> Note that I am not strongly advocating my below patch as
> >> it has the same shortcoming that maintained modifier type
> >> size may not be correct.
> >
> > I don't think your patch helps, it can actually confuse things even
> > more. It skips modifiers until underlying type is found, but you still
> > don't guarantee that at that time that underlying type will have its
> > size resolved.
>
> It actually does help. It does not change the internal btf type
> traversal algorithms. It only change the implementation of
> an external API btf_type_id_size(). Previously, this function
> is used by externals and internal btf.c. I broke it into two,
> one internal __btf_type_id_size(), and another external
> btf_type_id_size(). The external one removes modifier before
> finding type size. The external one is typically used only
> after btf is validated.

Sure, for external callers yes, it solves the problem. But there is
deeper problem: we mark modifier types RESOLVED before types they
ultimately point to are resolved. Then in all those btf_xxx_resolve()
functions we have check:

if (!env_type_is_resolve_sink && !env_type_is_resolved)
  return env_stack_push();
else {

  /* here we assume that we can calculate size of the type */
  /* so even if we traverse through all the modifiers and find
underlying type */
  /* that type will have resolved_size = 0, because we haven't
processed it yet */
  /* but we will just incorrectly assume that zero is *final* size */
}

So I think that your patch is still just hiding the problem, not solving it.

BTW, I've also identified part of btf_ptr_resolve() logic that can be
now safely removed (it's a special case that "restarts" DFS traversal
for modifiers, because they could have been prematurely marked
resolved). This is another sign that there is something wrong in an
algorithm.

I'd rather remove unnecessary complexity and fix underlying problem,
especially given that there is no performance or correctness penalty.

I'll post v2 soon.

>
> Will go through your other comments later.
>
> >
> >>
> >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >> index 546ebee39e2a..6f927c3e0a89 100644
> >> --- a/kernel/bpf/btf.c
> >> +++ b/kernel/bpf/btf.c
> >> @@ -620,6 +620,54 @@ static bool btf_type_int_is_regular(const struct
> >> btf_type *t)
> >>           return true;
> >>    }
> >>
> >> +static const struct btf_type *__btf_type_id_size(const struct btf *btf,
> >> +                                                u32 *type_id, u32
> >> *ret_size,
> >> +                                                bool skip_modifier)
> >> +{
> >> +       const struct btf_type *size_type;
> >> +       u32 size_type_id = *type_id;
> >> +       u32 size = 0;
> >> +
> >> +       size_type = btf_type_by_id(btf, size_type_id);
> >> +       if (size_type && skip_modifier) {
> >> +               while (btf_type_is_modifier(size_type))
> >> +                       size_type = btf_type_by_id(btf, size_type->type);
> >> +       }
> >> +
> >> +       if (btf_type_nosize_or_null(size_type))
> >> +               return NULL;
> >> +
> >> +       if (btf_type_has_size(size_type)) {
> >> +               size = size_type->size;
> >> +       } else if (btf_type_is_array(size_type)) {
> >> +               size = btf->resolved_sizes[size_type_id];
> >> +       } else if (btf_type_is_ptr(size_type)) {
> >> +               size = sizeof(void *);
> >> +       } else {
> >> +               if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) &&
> >> +                                !btf_type_is_var(size_type)))
> >> +                       return NULL;
> >> +
> >> +               size = btf->resolved_sizes[size_type_id];
> >> +               size_type_id = btf->resolved_ids[size_type_id];
> >> +               size_type = btf_type_by_id(btf, size_type_id);
> >> +               if (btf_type_nosize_or_null(size_type))
> >> +                       return NULL;
> >> +       }
> >> +
> >> +       *type_id = size_type_id;
> >> +       if (ret_size)
> >> +               *ret_size = size;
> >> +
> >> +       return size_type;
> >> +}
> >> +
> [...]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox