Linux RDMA and InfiniBand development
* pyverbs test regression
@ 2020-11-03 23:54 Bob Pearson
  2020-11-04  0:00 ` Jason Gunthorpe
  2020-11-04 13:53 ` Bernard Metzler
  0 siblings, 2 replies; 10+ messages in thread
From: Bob Pearson @ 2020-11-03 23:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky; +Cc: linux-rdma@vger.kernel.org

Since 5.10 some of the pyverbs tests are skipping with the warning
	"Device rxe_0 doesn't have net interface"

These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in

RDMATestCase _add_gids_per_port after the following

	if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
	    self.args.append([dev, port, idx, None])
	    continue

In fact there is no such path which means it never finds an ip_addr for the device.

Did something change here? Do other RDMA devices have /sys/class/infiniband/XXX/device/net?

Thanks,

Bob Pearson


* Re: pyverbs test regression
  2020-11-03 23:54 pyverbs test regression Bob Pearson
@ 2020-11-04  0:00 ` Jason Gunthorpe
  2020-11-04 10:40   ` Edward Srouji
  2020-11-04 13:53 ` Bernard Metzler
  1 sibling, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-11-04  0:00 UTC (permalink / raw)
  To: Bob Pearson, Edward Srouji; +Cc: Leon Romanovsky, linux-rdma@vger.kernel.org

On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
> Since 5.10 some of the pyverbs tests are skipping with the warning
> 	"Device rxe_0 doesn't have net interface"
> 
> These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
> 
> RDMATestCase _add_gids_per_port after the following
> 
> 	    if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>                 self.args.append([dev, port, idx, None])
>                 continue
> 
> In fact there is no such path which means it never finds an ip_addr for the device.

That isn't an acceptable way to find netdevs for a RDMA device..

This test is really buggy, that is not an acceptable way to find the
netdev for a RDMA device. Looks like it is some hacky way to read the
gid table? It should just read the gid table.. Edward?

> Did something change here? Do other RDMA devices have /sys/class/infiniband/XXX/device/net?

Yes, some will

Jason


* Re: pyverbs test regression
  2020-11-04  0:00 ` Jason Gunthorpe
@ 2020-11-04 10:40   ` Edward Srouji
  2020-11-04 11:47     ` Gal Pressman
  2020-11-04 12:36     ` Jason Gunthorpe
  0 siblings, 2 replies; 10+ messages in thread
From: Edward Srouji @ 2020-11-04 10:40 UTC (permalink / raw)
  To: Jason Gunthorpe, Bob Pearson; +Cc: Leon Romanovsky, linux-rdma@vger.kernel.org


On 11/4/2020 2:00 AM, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
>> Since 5.10 some of the pyverbs tests are skipping with the warning
>> 	"Device rxe_0 doesn't have net interface"
>>
>> These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
>>
>> RDMATestCase _add_gids_per_port after the following
>>
>> 	    if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>>                  self.args.append([dev, port, idx, None])
>>                  continue
>>
>> In fact there is no such path which means it never finds an ip_addr for the device.
> That isn't an acceptable way to find netdevs for a RDMA device..
>
> This test is really buggy, that is not an acceptable way to find the
> netdev for a RDMA device. Looks like it is some hacky way to read the
> gid table? It should just read the gid table.. Edward?

GID table is not the reason. We need the netdev in order to get the IP 
address of the interface.

Do you have a better alternative suggestion to do that?

>> Did something change here? Do other RDMA devices have /sys/class/infiniband/XXX/device/net?
> Yes, some will

Nothing really changed in this area lately (in pyverbs / rdma-core tests).

RXE can also have a netdev here if it's linked to one. E.g. by doing 
"rdma link add <rxe_devname> type rxe netdev <net_devname>"

>
> Jason


* Re: pyverbs test regression
  2020-11-04 10:40   ` Edward Srouji
@ 2020-11-04 11:47     ` Gal Pressman
  2020-11-04 17:46       ` Bob Pearson
  2020-11-04 12:36     ` Jason Gunthorpe
  1 sibling, 1 reply; 10+ messages in thread
From: Gal Pressman @ 2020-11-04 11:47 UTC (permalink / raw)
  To: Edward Srouji, Jason Gunthorpe, Bob Pearson
  Cc: Leon Romanovsky, linux-rdma@vger.kernel.org

On 04/11/2020 12:40, Edward Srouji wrote:
> On 11/4/2020 2:00 AM, Jason Gunthorpe wrote:
>> On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
>>> Since 5.10 some of the pyverbs tests are skipping with the warning
>>>     "Device rxe_0 doesn't have net interface"
>>>
>>> These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
>>>
>>> RDMATestCase _add_gids_per_port after the following
>>>
>>>         if not
>>> os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>>>                  self.args.append([dev, port, idx, None])
>>>                  continue
>>>
>>> In fact there is no such path which means it never finds an ip_addr for the
>>> device.
>> That isn't an acceptable way to find netdevs for a RDMA device..
>>
>> This test is really buggy, that is not an acceptable way to find the
>> netdev for a RDMA device. Looks like it is some hacky way to read the
>> gid table? It should just read the gid table.. Edward?
> 
> GID table is not the reason. We need the netdev in order to get the IP address
> of the interface.
> 
> Do you have a better alternative suggestion to do that?
> 
>>> Did something change here? Do other RDMA devices have
>>> /sys/class/infiniband/XXX/device/net?
>> Yes, some will
> 
> Nothing really changed in this area lately (in pyverbs / rdma-core tests).
> 
> RXE can also have a netdev here if it's linked to one. E.g. by doing "rdma link
> add <rxe_devname> type rxe netdev <net_devname>"

Maybe it was changed in b27e504929d7 ("tests: Verify net interface support on
RDMATestCase"), which made the tests skip if the path doesn't exist, instead of
returning an error and failing the test.

How did these tests work for rxe before if the path doesn't exist?
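
The difference between skipping and failing described above can be sketched with plain unittest. This is only an illustration, not the rdma-core test code: the names NetIfaceCase and has_parent_netdev are hypothetical, and only the sysfs path and the warning text come from this thread.

```python
import os
import unittest

SYSFS_IB = "/sys/class/infiniband"


def has_parent_netdev(dev, root=SYSFS_IB):
    """True if <root>/<dev>/device/net/ exists (the path the test checks)."""
    return os.path.isdir(os.path.join(root, dev, "device", "net"))


class NetIfaceCase(unittest.TestCase):
    # Hypothetical test case; dev and root are class attributes so they
    # can be overridden without touching the real sysfs tree.
    dev = "rxe_0"
    root = SYSFS_IB

    def setUp(self):
        if not has_parent_netdev(self.dev, self.root):
            # Skipping reports the test as skipped; raising any other
            # exception here would report it as an error instead.
            raise unittest.SkipTest(
                "Device {} doesn't have net interface".format(self.dev))

    def test_needs_netdev(self):
        self.assertTrue(has_parent_netdev(self.dev, self.root))
```

With a skip in setUp the run still reports success overall, which may be why the missing path only shows up as a warning.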


* Re: pyverbs test regression
  2020-11-04 10:40   ` Edward Srouji
  2020-11-04 11:47     ` Gal Pressman
@ 2020-11-04 12:36     ` Jason Gunthorpe
  2020-11-04 18:34       ` Edward Srouji
  1 sibling, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2020-11-04 12:36 UTC (permalink / raw)
  To: Edward Srouji; +Cc: Bob Pearson, Leon Romanovsky, linux-rdma@vger.kernel.org

On Wed, Nov 04, 2020 at 12:40:11PM +0200, Edward Srouji wrote:
> 
> On 11/4/2020 2:00 AM, Jason Gunthorpe wrote:
> > On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
> > > Since 5.10 some of the pyverbs tests are skipping with the warning
> > > 	"Device rxe_0 doesn't have net interface"
> > > 
> > > These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
> > > 
> > > RDMATestCase _add_gids_per_port after the following
> > > 
> > > 	    if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
> > >                  self.args.append([dev, port, idx, None])
> > >                  continue
> > > 
> > > In fact there is no such path which means it never finds an ip_addr for the device.
> > That isn't an acceptable way to find netdevs for a RDMA device..
> > 
> > This test is really buggy, that is not an acceptable way to find the
> > netdev for a RDMA device. Looks like it is some hacky way to read the
> > gid table? It should just read the gid table.. Edward?
> 
> GID table is not the reason. We need the netdev in order to get the IP
> address of the interface.

The GID table has a list of all the IP addresses of the IB device, and
all the netdevs that provide it

> > > Did something change here? Do other RDMA devices have /sys/class/infiniband/XXX/device/net?
> > Yes, some will
> 
> Nothing really changed in this area lately (in pyverbs / rdma-core tests).
> 
> RXE can also have a netdev here if it's linked to one. E.g. by doing "rdma
> link add <rxe_devname> type rxe netdev <net_devname>"

No it can't, this is the "parent" device and ib_device can never be a
parent of a netdev. rxe should have no parent.

Jason
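
A minimal sketch of looking up netdevs through the GID table rather than through the parent device, assuming the kernel exposes per-entry netdev names under ports/<port>/gid_attrs/ndevs/<index> in sysfs. The helper name gid_netdevs and its root parameter are hypothetical, and this is not the pyverbs implementation.

```python
import os


def gid_netdevs(dev, port=1, root="/sys/class/infiniband"):
    """Return {gid_index: netdev_name} for the populated GID entries.

    Assumes one file per GID index under
    <root>/<dev>/ports/<port>/gid_attrs/ndevs/.
    """
    ndev_dir = os.path.join(root, dev, "ports", str(port),
                            "gid_attrs", "ndevs")
    found = {}
    try:
        entries = os.listdir(ndev_dir)
    except FileNotFoundError:
        return found  # device or port not present
    for idx in sorted((e for e in entries if e.isdigit()), key=int):
        try:
            with open(os.path.join(ndev_dir, idx)) as f:
                name = f.read().strip()
        except OSError:
            continue  # reading an unpopulated entry may fail
        if name:
            found[int(idx)] = name
    return found
```

For example, gid_netdevs('rxe_0') would be expected to map each populated GID index to the netdev that provides it (enp6s0 in Bob's setup), without relying on a 'device' link.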


* Re: pyverbs test regression
  2020-11-03 23:54 pyverbs test regression Bob Pearson
  2020-11-04  0:00 ` Jason Gunthorpe
@ 2020-11-04 13:53 ` Bernard Metzler
  2020-11-04 17:06   ` Bob Pearson
  1 sibling, 1 reply; 10+ messages in thread
From: Bernard Metzler @ 2020-11-04 13:53 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma@vger.kernel.org

-----"Bob Pearson" <rpearsonhpe@gmail.com> wrote: -----

>To: "Jason Gunthorpe" <jgg@nvidia.com>, "Leon Romanovsky"
><leon@kernel.org>
>From: "Bob Pearson" <rpearsonhpe@gmail.com>
>Date: 11/04/2020 12:55AM
>Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>Subject: [EXTERNAL] pyverbs test regression
>
>Since 5.10 some of the pyverbs tests are skipping with the warning
>	"Device rxe_0 doesn't have net interface"
>
>These occur in tests/test_rdmacm.py. As far as I can tell the error
>occurs in
>
>RDMATestCase _add_gids_per_port after the following
>
>	    if not
>os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>                self.args.append([dev, port, idx, None])
>                continue
>
>In fact there is no such path which means it never finds an ip_addr
>for the device.
>
>Did something change here? Do other RDMA devices have
>/sys/class/infiniband/XXX/device/net?
>

Hmm, with 5.10.0-rc1, I still see it for both rdma_rxe and siw. 



* Re: pyverbs test regression
  2020-11-04 13:53 ` Bernard Metzler
@ 2020-11-04 17:06   ` Bob Pearson
  0 siblings, 0 replies; 10+ messages in thread
From: Bob Pearson @ 2020-11-04 17:06 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma@vger.kernel.org

On 11/4/20 7:53 AM, Bernard Metzler wrote:
> -----"Bob Pearson" <rpearsonhpe@gmail.com> wrote: -----
> 
>> To: "Jason Gunthorpe" <jgg@nvidia.com>, "Leon Romanovsky"
>> <leon@kernel.org>
>> From: "Bob Pearson" <rpearsonhpe@gmail.com>
>> Date: 11/04/2020 12:55AM
>> Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>> Subject: [EXTERNAL] pyverbs test regression
>>
>> Since 5.10 some of the pyverbs tests are skipping with the warning
>> 	"Device rxe_0 doesn't have net interface"
>>
>> These occur in tests/test_rdmacm.py. As far as I can tell the error
>> occurs in
>>
>> RDMATestCase _add_gids_per_port after the following
>>
>> 	    if not
>> os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>>                self.args.append([dev, port, idx, None])
>>                continue
>>
>> In fact there is no such path which means it never finds an ip_addr
>> for the device.
>>
>> Did something change here? Do other RDMA devices have
>> /sys/class/infiniband/XXX/device/net?
>>
> 
> Hmm, with 5.10.0-rc1, I still see it for both rdma_rxe and siw. 
> 

Bernard,

	The script I use to set up the rxe device is

export LD_LIBRARY_PATH=/home/rpearson/src/rdma-core/build/lib/
sudo ip link set dev enp6s0 mtu 4500
sudo ip addr add dev enp6s0 scope link fe80::b62e:99ff:fef9:fa2e/64
sudo rdma link add rxe_0 type rxe netdev enp6s0

	After running this the rxe device is functional but I get

rpearson$ ls /sys/class/infiniband/rxe_0
fw_ver     node_guid  parent  power      sys_image_guid
node_desc  node_type  ports   subsystem  uevent

	with no 'device'. How are you seeing 'device'? We should be running the same bits.

Bob


* Re: pyverbs test regression
  2020-11-04 11:47     ` Gal Pressman
@ 2020-11-04 17:46       ` Bob Pearson
  0 siblings, 0 replies; 10+ messages in thread
From: Bob Pearson @ 2020-11-04 17:46 UTC (permalink / raw)
  To: Gal Pressman, Edward Srouji, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma@vger.kernel.org

On 11/4/20 5:47 AM, Gal Pressman wrote:
> On 04/11/2020 12:40, Edward Srouji wrote:
>> On 11/4/2020 2:00 AM, Jason Gunthorpe wrote:
>>> On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
>>>> Since 5.10 some of the pyverbs tests are skipping with the warning
>>>>     "Device rxe_0 doesn't have net interface"
>>>>
>>>> These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
>>>>
>>>> RDMATestCase _add_gids_per_port after the following
>>>>
>>>>         if not
>>>> os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>>>>                  self.args.append([dev, port, idx, None])
>>>>                  continue
>>>>
>>>> In fact there is no such path which means it never finds an ip_addr for the
>>>> device.
>>> That isn't an acceptable way to find netdevs for a RDMA device..
>>>
>>> This test is really buggy, that is not an acceptable way to find the
>>> netdev for a RDMA device. Looks like it is some hacky way to read the
>>> gid table? It should just read the gid table.. Edward?
>>
>> GID table is not the reason. We need the netdev in order to get the IP address
>> of the interface.
>>
>> Do you have a better alternative suggestion to do that?
>>
>>>> Did something change here? Do other RDMA devices have
>>>> /sys/class/infiniband/XXX/device/net?
>>> Yes, some will
>>
>> Nothing really changed in this area lately (in pyverbs / rdma-core tests).
>>
>> RXE can also have a netdev here if it's linked to one. E.g. by doing "rdma link
>> add <rxe_devname> type rxe netdev <net_devname>"
> 
> Maybe it was changed in b27e504929d7 ("tests: Verify net interface support on
> RDMATestCase"), which made the tests skip if the path doesn't exist, instead of
> returning an error and failing the test.
> 
> How did these tests work for rxe before if the path doesn't exist?
> 

I wasn't really focused on this area, so I only have a vague recollection that I wasn't
getting errors and wasn't skipping these tests, but I can't swear to it. From my point of
view there was clearly a netdev (enp6s0) with several IP addresses (one IPv4 and five IPv6).


* Re: pyverbs test regression
  2020-11-04 12:36     ` Jason Gunthorpe
@ 2020-11-04 18:34       ` Edward Srouji
  2020-11-04 18:43         ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Edward Srouji @ 2020-11-04 18:34 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Bob Pearson, Leon Romanovsky, linux-rdma@vger.kernel.org


On 11/4/2020 2:36 PM, Jason Gunthorpe wrote:
> On Wed, Nov 04, 2020 at 12:40:11PM +0200, Edward Srouji wrote:
>> On 11/4/2020 2:00 AM, Jason Gunthorpe wrote:
>>> On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
>>>> Since 5.10 some of the pyverbs tests are skipping with the warning
>>>> 	"Device rxe_0 doesn't have net interface"
>>>>
>>>> These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
>>>>
>>>> RDMATestCase _add_gids_per_port after the following
>>>>
>>>> 	    if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
>>>>                   self.args.append([dev, port, idx, None])
>>>>                   continue
>>>>
>>>> In fact there is no such path which means it never finds an ip_addr for the device.
>>> That isn't an acceptable way to find netdevs for a RDMA device..
>>>
>>> This test is really buggy, that is not an acceptable way to find the
>>> netdev for a RDMA device. Looks like it is some hacky way to read the
>>> gid table? It should just read the gid table.. Edward?
>> GID table is not the reason. We need the netdev in order to get the IP
>> address of the interface.
> The GID table has a list of all the IP addresses of the IB device, and
> all the netdevs that provide it

Then how can you get the IP address via verbs API(s)? AFAIK, the 
gid_entry does not hold the IP addresses, you can only get the subnet 
prefix, don't you?

>
>>>> Did something change here? Do other RDMA devices have /sys/class/infiniband/XXX/device/net?
>>> Yes, some will
>> Nothing really changed in this area lately (in pyverbs / rdma-core tests).
>>
>> RXE can also have a netdev here if it's linked to one. E.g. by doing "rdma
>> link add <rxe_devname> type rxe netdev <net_devname>"
> No it can't, this is the "parent" device and ib_device can never be a
> parent of a netdev. rxe should have no parent.
>
> Jason
Edward


* Re: pyverbs test regression
  2020-11-04 18:34       ` Edward Srouji
@ 2020-11-04 18:43         ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2020-11-04 18:43 UTC (permalink / raw)
  To: Edward Srouji; +Cc: Bob Pearson, Leon Romanovsky, linux-rdma@vger.kernel.org

On Wed, Nov 04, 2020 at 08:34:37PM +0200, Edward Srouji wrote:
> 
> On 11/4/2020 2:36 PM, Jason Gunthorpe wrote:
> > On Wed, Nov 04, 2020 at 12:40:11PM +0200, Edward Srouji wrote:
> > > On 11/4/2020 2:00 AM, Jason Gunthorpe wrote:
> > > > On Tue, Nov 03, 2020 at 05:54:58PM -0600, Bob Pearson wrote:
> > > > > Since 5.10 some of the pyverbs tests are skipping with the warning
> > > > > 	"Device rxe_0 doesn't have net interface"
> > > > > 
> > > > > These occur in tests/test_rdmacm.py. As far as I can tell the error occurs in
> > > > > 
> > > > > RDMATestCase _add_gids_per_port after the following
> > > > > 
> > > > > 	    if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
> > > > >                   self.args.append([dev, port, idx, None])
> > > > >                   continue
> > > > > 
> > > > > In fact there is no such path which means it never finds an ip_addr for the device.
> > > > That isn't an acceptable way to find netdevs for a RDMA device..
> > > > 
> > > > This test is really buggy, that is not an acceptable way to find the
> > > > netdev for a RDMA device. Looks like it is some hacky way to read the
> > > > gid table? It should just read the gid table.. Edward?
> > > GID table is not the reason. We need the netdev in order to get the IP
> > > address of the interface.
> > The GID table has a list of all the IP addresses of the IB device, and
> > all the netdevs that provide it
> 
> Then how can you get the IP address via verbs API(s)? AFAIK, the gid_entry
> does not hold the IP addresses, you can only get the subnet prefix, don't
> you?

IIRC the GID encodes an IPv6 address as-is and an IPv4 address as an
IPv4-mapped IPv6 address

Jason
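
A small sketch of recovering an IP address from a GID under the encoding described above, assuming the GID is available as the colon-separated hex string found in the sysfs gids/<index> files. The helper name gid_to_ip is hypothetical; only the standard ipaddress module is used.

```python
import ipaddress


def gid_to_ip(gid):
    """Convert a GID string to an IPv4Address or IPv6Address.

    An IPv4 address carried as ::ffff:a.b.c.d is unwrapped to IPv4;
    any other GID is returned as the IPv6 address it encodes.
    """
    addr = ipaddress.IPv6Address(gid)
    mapped = addr.ipv4_mapped  # None unless the address is IPv4-mapped
    return mapped if mapped is not None else addr
```

Called on the GID matching the link-local address from Bob's setup script (fe80:0000:0000:0000:b62e:99ff:fef9:fa2e), this returns the same address as an IPv6Address; a GID ending in ffff:c0a8:0101 after five zero groups comes back as the IPv4 address it wraps.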

