From: Dragos Tatulea <dtatulea@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>
Subject: Re: [PATCH rdma-rc v1] IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
Date: Sat, 21 Jan 2023 09:40:43 +0100
Message-ID: <Y8ulCzN2U6cKN0T1@goatcheese>
In-Reply-To: <Y8r/BUdb7XMxwVN+@nvidia.com>

On 01/20, Jason Gunthorpe wrote:
> On Fri, Jan 20, 2023 at 07:02:48PM +0200, Leon Romanovsky wrote:
> > From: Dragos Tatulea <dtatulea@nvidia.com>
> > 
> > The cited commit creates child PKEY interfaces over netlink with multiple
> > tx and rx queues, but some devices don't support more than 1 tx and 1 rx
> > queue. This causes a crash when traffic is sent over the PKEY interface
> > because the parent has a single queue while the child has multiple queues.
> > 
> > This patch inherits the real_num_tx/rx_queues from the parent netdev.
> > 
> > BUG: kernel NULL pointer dereference, address: 000000000000036b
> > PGD 0 P4D 0
> > Oops: 0000 [#1] SMP
> > CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:kmem_cache_alloc+0xcb/0x450
> > Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b
> > RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202
> > RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae
> > RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00
> > RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40
> > R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000
> > R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000
> > FS:  00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  skb_clone+0x55/0xd0
> >  ip6_finish_output2+0x3fe/0x690
> >  ip6_finish_output+0xfa/0x310
> >  ip6_send_skb+0x1e/0x60
> >  udp_v6_send_skb+0x1e5/0x420
> >  udpv6_sendmsg+0xb3c/0xe60
> >  ? ip_mc_finish_output+0x180/0x180
> >  ? __switch_to_asm+0x3a/0x60
> >  ? __switch_to_asm+0x34/0x60
> >  sock_sendmsg+0x33/0x40
> >  __sys_sendto+0x103/0x160
> >  ? _copy_to_user+0x21/0x30
> >  ? kvm_clock_get_cycles+0xd/0x10
> >  ? ktime_get_ts64+0x49/0xe0
> >  __x64_sys_sendto+0x25/0x30
> >  do_syscall_64+0x3d/0x90
> >  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > RIP: 0033:0x7f9374f1ed14
> > Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b
> > RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> > RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14
> > RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030
> > RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c
> > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> > R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc
> >  </TASK>
> > 
> > Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces")
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> > Changelog:
> > v1:
> >  * Fixed typo in warning print.
> > v0: https://lore.kernel.org/all/4a7ecec08ee30ad8004019818fadf1e58057e945.1674137153.git.leon@kernel.org
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
> > index 9ad8d9856275..0548735a15b5 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
> > @@ -126,6 +126,18 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
> >  	} else
> >  		child_pkey  = nla_get_u16(data[IFLA_IPOIB_PKEY]);
> >  
> > +	err = netif_set_real_num_tx_queues(dev, pdev->real_num_tx_queues);
> > +	if (err) {
> > +		ipoib_warn(ppriv, "failed setting the child tx queue count based on parent\n");
> > +		return err;
> > +	}
> > +
> > +	err = netif_set_real_num_rx_queues(dev, pdev->real_num_rx_queues);
> > +	if (err) {
> > +		ipoib_warn(ppriv, "failed setting the child rx queue count based on parent\n");
> > +		return err;
> > +	}
> 
> This still seems flawed.. Netlink does this:
> 
> 	unsigned int num_rx_queues = 1;
> 
> 	if (tb[IFLA_NUM_RX_QUEUES])
> 		num_rx_queues = nla_get_u32(tb[IFLA_NUM_RX_QUEUES]);
> 	else if (ops->get_num_rx_queues)
> 		num_rx_queues = ops->get_num_rx_queues();
> 
> So num_rx_queues can really be any value that userspace cares to
> provide.
> 
> If pdev->real_num_rx_queues is > the user provided value then
> netif_set_real_num_rx_queues() just fails.
> 
> So at a minimum this should min the actual number of queues requested
> against the maximum number of queues the driver can provide and use
> that to set the real queues.
>
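To make the objection above concrete, here is a minimal user-space model. All `model_*` names are hypothetical stand-ins, not kernel APIs. It reproduces the netlink default quoted above (user attribute first, then the driver callback, otherwise 1) and the bounds check that netif_set_real_num_rx_queues() applies: the new count must lie between 1 and the number of queues allocated when the device was created. Copying the parent's real count fails as soon as userspace asked for fewer queues, while clamping with a min() does not.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the rx queue fields of struct net_device. */
struct model_netdev {
	unsigned int num_rx_queues;      /* queues allocated at creation */
	unsigned int real_num_rx_queues; /* queues actually in use */
};

/* Mirrors the range check in netif_set_real_num_rx_queues():
 * the count must lie in [1, num_rx_queues], otherwise -EINVAL. */
static int model_set_real_num_rx_queues(struct model_netdev *dev,
					unsigned int rxq)
{
	if (rxq < 1 || rxq > dev->num_rx_queues)
		return -EINVAL;
	dev->real_num_rx_queues = rxq;
	return 0;
}

/* Mirrors the netlink default: the user attribute wins, then the
 * driver's get_num_rx_queues() callback, otherwise 1. */
static unsigned int model_pick_num_rx_queues(int has_attr, unsigned int attr_val,
					     unsigned int (*get_num)(void))
{
	unsigned int n = 1;

	if (has_attr)
		n = attr_val;
	else if (get_num)
		n = get_num();
	return n;
}

/* Clamped inheritance: never request more queues than the child allocated. */
static int model_inherit_rx_queues(struct model_netdev *child,
				   const struct model_netdev *parent)
{
	unsigned int want = parent->real_num_rx_queues;

	if (want > child->num_rx_queues)
		want = child->num_rx_queues;
	return model_set_real_num_rx_queues(child, want);
}
```

For example, with a parent using 8 queues and a child created with IFLA_NUM_RX_QUEUES=2, passing the parent's count straight through returns -EINVAL, whereas model_inherit_rx_queues() settles on 2.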
Hmm, this patch does indeed introduce more room for confusion in the general
case.

What we want to avoid is legacy IPoIB interfaces using more than one
queue; that is when we hit the crash above. So maybe the code should
do exactly that, explicitly: set the number of queues to 1 when legacy IPoIB
is detected in ipoib_intf_init():

	rc = rdma_init_netdev(hca, port, RDMA_NETDEV_IPOIB, name,
			      NET_NAME_UNKNOWN, ipoib_setup_common, dev);
	if (rc) {
		if (rc != -EOPNOTSUPP)
			goto out;

+		netif_set_real_num_tx_queues(dev, 1);
+		netif_set_real_num_rx_queues(dev, 1);
		
		...
	}
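The proposal above can be modelled the same way. This sketch assumes, as in the snippet, that rdma_init_netdev() returning -EOPNOTSUPP is what marks the legacy path; every `model_*` name is hypothetical. Because alloc_netdev_mqs() refuses to create a device with zero tx or rx queues, asking for exactly one queue cannot trip the range check, so ignoring the return values is probably safe. The sketch checks them anyway.

```c
#include <errno.h>

/* Hypothetical user-space model of the queue fields of a netdev. */
struct model_legacy_netdev {
	unsigned int num_tx_queues, real_num_tx_queues;
	unsigned int num_rx_queues, real_num_rx_queues;
};

/* Same bounds check as netif_set_real_num_{tx,rx}_queues(). */
static int model_set_real(unsigned int *real, unsigned int max, unsigned int n)
{
	if (n < 1 || n > max)
		return -EINVAL;
	*real = n;
	return 0;
}

/* rdma_rc stands in for the return value of rdma_init_netdev():
 * 0 keeps the accelerated path, -EOPNOTSUPP selects legacy IPoIB,
 * anything else is a real error (the "goto out" case). */
static int model_intf_init(struct model_legacy_netdev *dev, int rdma_rc)
{
	int err;

	if (!rdma_rc)
		return 0;        /* accelerated netdev keeps its queue counts */
	if (rdma_rc != -EOPNOTSUPP)
		return rdma_rc;  /* propagate genuine failures */

	/* Legacy IPoIB: force a single tx and a single rx queue. */
	err = model_set_real(&dev->real_num_tx_queues, dev->num_tx_queues, 1);
	if (err)
		return err;
	return model_set_real(&dev->real_num_rx_queues, dev->num_rx_queues, 1);
}
```

Keying the decision on the legacy path, rather than on the parent's counts, sidesteps the user-supplied-count problem for this case, though it does not address the oversized default.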

> And the return of a really big number from ops->get_num_rx_queues is
> pretty ugly too; ideally that would be fixed to pass in some function
> arguments and obtain the ppriv, so it can return the actual maximum
> number of queues and we don't waste a bunch of memory.
> 
Right. This would make things easier.
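One possible shape for the callback change described above. The existing get_num_rx_queues() callback takes no arguments, so the driver cannot tell which parent it is serving. The `model_*` signature below, which hands the callback the parent's private data, is purely illustrative and is not an existing kernel interface.

```c
#include <stddef.h>

/* Hypothetical parent private data; only the field this sketch needs. */
struct model_parent_priv {
	unsigned int max_queues; /* what the parent device can really use */
};

/* Hypothetical callback type: unlike the real one, it receives the parent. */
typedef unsigned int (*model_get_num_queues_fn)(const struct model_parent_priv *ppriv);

/* With parent context the driver can report its true limit
 * (1 for legacy IPoIB) instead of a large fixed number. */
static unsigned int model_get_num_queues(const struct model_parent_priv *ppriv)
{
	return ppriv->max_queues;
}

/* Core-side default selection: user attribute, then callback, then 1. */
static unsigned int model_pick_num_queues(int has_attr, unsigned int attr_val,
					  model_get_num_queues_fn get_num,
					  const struct model_parent_priv *ppriv)
{
	if (has_attr)
		return attr_val;
	if (get_num && ppriv)
		return get_num(ppriv);
	return 1;
}
```

A user-supplied count still passes through unchecked here, so it would still need clamping against the driver's maximum, as pointed out earlier in the thread.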

Thanks,
Dragos

Thread overview: 7+ messages
2023-01-20 17:02 [PATCH rdma-rc v1] IB/IPoIB: Fix legacy IPoIB due to wrong number of queues Leon Romanovsky
2023-01-20 20:52 ` Jason Gunthorpe
2023-01-21  8:40   ` Dragos Tatulea [this message]
2023-01-22 12:44   ` Leon Romanovsky
2023-01-23 18:32     ` Jason Gunthorpe
2023-01-24  6:27       ` Leon Romanovsky
2023-01-24 13:00         ` Jason Gunthorpe
