Netdev List
 help / color / mirror / Atom feed
* Re: [net-ipv4] question about arguments position
From: David Miller @ 2017-05-04 16:46 UTC (permalink / raw)
  To: garsilva; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20170504110755.Horde.9md-X4baf8TwOEKjGrXkelZ@gator4166.hostgator.com>

From: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
Date: Thu, 04 May 2017 11:07:54 -0500

> While looking into Coverity ID 1357474 I ran into the following piece
> of code at net/ipv4/inet_diag.c:392:

Because it's been this way since at least 2005, it doesn't matter if
the order is correct or not.  What's there is the locked in behavior
exposed to userspace and changing it will break things for people.

^ permalink raw reply

* Fw: [Bug 195217] siocsifflags - irda doesn't work (MCS7780)
From: Stephen Hemminger @ 2017-05-04 16:48 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: netdev

Apparently IRDA is broken by VMAP_STACK

Begin forwarded message:

Date: Thu, 04 May 2017 12:16:15 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 195217] siocsifflags - irda doesn't work (MCS7780)


https://bugzilla.kernel.org/show_bug.cgi?id=195217

Jerome (jb_69100@yahoo.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Other                       |USB
     Kernel Version|4.9.X, 4.10.X, 4.11-rcX     |4.9.0 --> 4.11.0
            Product|Networking                  |Drivers
           Severity|blocking                    |high

--- Comment #2 from Jerome (jb_69100@yahoo.com) ---
Hi,

If I compile the kernel by setting CONFIG_VMAP_STACK = N, "ifconfig irda0 up"
is working correctly. (Tested with kernel 4.11.0)

I suppose the problem comes either from the MCS7780 driver (use the stack not
"in the rules"?) Or from VMAP_STACK.

Can you help me better understand the problem?

Regards,
Jerome.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: Lino Sanfilippo @ 2017-05-04 16:48 UTC (permalink / raw)
  To: Pavel Belous, David S . Miller; +Cc: netdev, David Arcari
In-Reply-To: <90666578f043b366313ddd90ffad86de42d890f2.1493914743.git.pavel.belous@aquantia.com>

Hi Pavel,

On 04.05.2017 18:33, Pavel Belous wrote:
> From: Pavel Belous <pavel.belous@aquantia.com>
> 
> This patch fixes the crash that happens when driver tries to collect statistics
> from already released "aq_vec" object.
> 
> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
> ---
>  drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> index cdb0299..3a32573 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> @@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
>  	count = 0U;
>  
>  	for (i = 0U, aq_vec = self->aq_vec[0];
> -		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
> +		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>  		data += count;
>  		aq_vec_get_sw_stats(aq_vec, data, &count);
>  	}
> @@ -961,6 +961,7 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
>  		if (self->aq_vec[i])
>  			aq_vec_free(self->aq_vec[i]);
> +			self->aq_vec[i] = NULL;
>  	}
>  
>  err_exit:;
> 

if the driver does not support statistics when the interface is down, would not it be clearer
to check if netif_running() in get_stats() instead?

Regards,
Lino

^ permalink raw reply

* struct ip vs struct iphdr
From: Oleg @ 2017-05-04 16:42 UTC (permalink / raw)
  To: netdev

  Hi, all.

It seems struct ip and struct iphdr are similar: struct ip, despite of
it name, doesn't contain anything but ip header.

So, my noob question, what is the difference between them?

Thanks.

-- 
Олег Неманов (Oleg Nemanov)

^ permalink raw reply

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: David Miller @ 2017-05-04 16:51 UTC (permalink / raw)
  To: LinoSanfilippo; +Cc: Pavel.Belous, netdev, darcari
In-Reply-To: <0babf800-080c-96e0-4dbb-8d3fca3fb784@gmx.de>

From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Thu, 4 May 2017 18:48:12 +0200

> Hi Pavel,
> 
> On 04.05.2017 18:33, Pavel Belous wrote:
>> From: Pavel Belous <pavel.belous@aquantia.com>
>> 
>> This patch fixes the crash that happens when driver tries to collect statistics
>> from already released "aq_vec" object.
>> 
>> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
>> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
>> ---
>>  drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> index cdb0299..3a32573 100644
>> --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> @@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
>>  	count = 0U;
>>  
>>  	for (i = 0U, aq_vec = self->aq_vec[0];
>> -		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>> +		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>>  		data += count;
>>  		aq_vec_get_sw_stats(aq_vec, data, &count);
>>  	}
>> @@ -961,6 +961,7 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
>>  		if (self->aq_vec[i])
>>  			aq_vec_free(self->aq_vec[i]);
>> +			self->aq_vec[i] = NULL;
>>  	}
>>  
>>  err_exit:;
>> 
> 
> if the driver does not support statistics when the interface is down, would not it be clearer
> to check if netif_running() in get_stats() instead?

Yes, much cleaner.

Much better would be to have a cached software copy so that statistics
can be reported regardless of whether the device is down or not.

^ permalink raw reply

* Re: [Patch net] ipv6: initialize route null entry in addrconf_init()
From: David Miller @ 2017-05-04 16:51 UTC (permalink / raw)
  To: andreyknvl; +Cc: xiyou.wangcong, netdev, dsahern
In-Reply-To: <CAAeHK+zmMUvt4Lz28k3Oyxi1QFmrTMBe34=6xJDBPxkeivnPeA@mail.gmail.com>

From: Andrey Konovalov <andreyknvl@google.com>
Date: Thu, 4 May 2017 14:28:37 +0200

> On Thu, May 4, 2017 at 7:07 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
>> since it is always NULL.
>>
>> This is clearly wrong, we have code to initialize it to loopback_dev,
>> unfortunately the order is still not correct.
>>
>> loopback_dev is registered very early during boot, we lose a chance
>> to re-initialize it in notifier. addrconf_init() is called after
>> ip6_route_init(), which means we have no chance to correct it.
>>
>> Fix it by moving this initialization explicitly after
>> ipv6_add_dev(init_net.loopback_dev) in addrconf_init().
>>
>> Reported-by: Andrey Konovalov <andreyknvl@google.com>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> Hi Cong,
> 
> This fixes the bug triggered by my reproducer.
> 
> Thanks!
> 
> Tested-by: Andrey Konovalov <andreyknvl@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: struct ip vs struct iphdr
From: Sowmini Varadhan @ 2017-05-04 16:52 UTC (permalink / raw)
  To: Oleg; +Cc: netdev
In-Reply-To: <20170504164219.GA25287@legohost>

On (05/04/17 19:42), Oleg wrote:
> 
>   Hi, all.
> 
> It seems struct ip and struct iphdr are similar: struct ip, despite of
> it name, doesn't contain anything but ip header.
> 
> So, my noob question, what is the difference between them?
> 
> Thanks.

BSD vs linux?

struct ip is a BSD-ism, intended to be used if you were porting
some BSD app.

--Sowmini

^ permalink raw reply

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: David Arcari @ 2017-05-04 17:00 UTC (permalink / raw)
  To: Pavel Belous, David S . Miller; +Cc: netdev
In-Reply-To: <90666578f043b366313ddd90ffad86de42d890f2.1493914743.git.pavel.belous@aquantia.com>

Hi Pavel,

On 05/04/2017 12:33 PM, Pavel Belous wrote:
> From: Pavel Belous <pavel.belous@aquantia.com>
> 
> This patch fixes the crash that happens when driver tries to collect statistics
> from already released "aq_vec" object.
> 
> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
> ---
>  drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> index cdb0299..3a32573 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> @@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
>  	count = 0U;
>  
>  	for (i = 0U, aq_vec = self->aq_vec[0];
> -		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
> +		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>  		data += count;
>  		aq_vec_get_sw_stats(aq_vec, data, &count);
>  	}
> @@ -961,6 +961,7 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
>  		if (self->aq_vec[i])
>  			aq_vec_free(self->aq_vec[i]);
> +			self->aq_vec[i] = NULL;

I think you intended to to add { } to the if statement.  The code compiles as
is, but the indentation is not correct.

-DA

>  	}
>  
>  err_exit:;
> 

^ permalink raw reply

* Why do we need MSG_SENDPAGE_NOTLAST?
From: Ilya Lesokhin @ 2017-05-04 17:03 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: tls-fpga-sw-dev, Dave Watson

I don't understand the need for MSG_SENDPAGE_NOTLAST and I'm hoping someone can enlighten me.

According to commit 35f9c09 ('tcp: tcp_sendpages() should call tcp_push() once'):
"We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall."

I don't understand why we need to differentiate between the user setting MSG_MORE 
and splice indicating that more data is going to be sent.
if the user passed MSG_MORE and didn't push any extra data, isn't it the users fault? 
Do we need it because poorly written applications were broken when 
MSG_MORE was added to tcp_sendpage? Or is there a deeper reason?

The reason I'm asking is that we are working on a kernel TLS implementation 
and I would like to know if we can coalesce multiple tls_sendpage calls with MSG_MORE into a single
tls record or whether we must push out the record as soon as MSG_SENDPAGE_NOTLAST is cleared?

Thanks,
Ilya

^ permalink raw reply

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: Pavel Belous @ 2017-05-04 17:08 UTC (permalink / raw)
  To: David Arcari, David S . Miller; +Cc: netdev
In-Reply-To: <50047294-4f40-e8ae-0982-5720b87019a9@redhat.com>



On 04.05.2017 20:00, David Arcari wrote:
> Hi Pavel,
>
> On 05/04/2017 12:33 PM, Pavel Belous wrote:
>> From: Pavel Belous <pavel.belous@aquantia.com>
>>
>> This patch fixes the crash that happens when driver tries to collect statistics
>> from already released "aq_vec" object.
>>
>> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
>> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
>> ---
>>  drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> index cdb0299..3a32573 100644
>> --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> @@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
>>  	count = 0U;
>>
>>  	for (i = 0U, aq_vec = self->aq_vec[0];
>> -		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>> +		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>>  		data += count;
>>  		aq_vec_get_sw_stats(aq_vec, data, &count);
>>  	}
>> @@ -961,6 +961,7 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
>>  		if (self->aq_vec[i])
>>  			aq_vec_free(self->aq_vec[i]);
>> +			self->aq_vec[i] = NULL;
>
> I think you intended to to add { } to the if statement.  The code compiles as
> is, but the indentation is not correct.
>
> -DA

Oh. Sorry about that. I did not see the loss of braces during merge.
I will prepare another patch with Lino and David M. comments.

Regards,
Pavel.


>
>>  	}
>>
>>  err_exit:;
>>
>

^ permalink raw reply

* Re: struct ip vs struct iphdr
From: Girish Moodalbail @ 2017-05-04 17:08 UTC (permalink / raw)
  To: Oleg, netdev
In-Reply-To: <20170504164219.GA25287@legohost>

On 5/4/17 9:42 AM, Oleg wrote:
>   Hi, all.
>
> It seems struct ip and struct iphdr are similar: struct ip, despite of
> it name, doesn't contain anything but ip header.
>
> So, my noob question, what is the difference between them?

Also, see this:

http://stackoverflow.com/questions/42840636/difference-between-struct-ip-and-struct-iphdr

>
> Thanks.
>

^ permalink raw reply

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: Pavel Belous @ 2017-05-04 17:09 UTC (permalink / raw)
  To: David Miller, LinoSanfilippo; +Cc: netdev, darcari
In-Reply-To: <20170504.125113.569969253752966231.davem@davemloft.net>



On 04.05.2017 19:51, David Miller wrote:
> From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Date: Thu, 4 May 2017 18:48:12 +0200
>
>> Hi Pavel,
>>
>> On 04.05.2017 18:33, Pavel Belous wrote:
>>> From: Pavel Belous <pavel.belous@aquantia.com>
>>>
>>> This patch fixes the crash that happens when driver tries to collect statistics
>>> from already released "aq_vec" object.
>>>
>>> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
>>> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
>>> ---
>>>  drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>>> index cdb0299..3a32573 100644
>>> --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>>> @@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
>>>  	count = 0U;
>>>
>>>  	for (i = 0U, aq_vec = self->aq_vec[0];
>>> -		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>>> +		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>>>  		data += count;
>>>  		aq_vec_get_sw_stats(aq_vec, data, &count);
>>>  	}
>>> @@ -961,6 +961,7 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>>>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
>>>  		if (self->aq_vec[i])
>>>  			aq_vec_free(self->aq_vec[i]);
>>> +			self->aq_vec[i] = NULL;
>>>  	}
>>>
>>>  err_exit:;
>>>
>>
>> if the driver does not support statistics when the interface is down, would not it be clearer
>> to check if netif_running() in get_stats() instead?
>
> Yes, much cleaner.
>
> Much better would be to have a cached software copy so that statistics
> can be reported regardless of whether the device is down or not.
>

Thank you.
I will think about how to do it better.

Regards,
Pavel

^ permalink raw reply

* Re: [Patch net] ipv6: initialize route null entry in addrconf_init()
From: David Ahern @ 2017-05-04 17:12 UTC (permalink / raw)
  To: David Miller, andreyknvl, xiyou.wangcong; +Cc: netdev
In-Reply-To: <20170504.125144.78973292337920988.davem@davemloft.net>

On 5/4/17 10:51 AM, David Miller wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> Date: Thu, 4 May 2017 14:28:37 +0200
> 
>> On Thu, May 4, 2017 at 7:07 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
>>> since it is always NULL.
>>>
>>> This is clearly wrong, we have code to initialize it to loopback_dev,
>>> unfortunately the order is still not correct.
>>>
>>> loopback_dev is registered very early during boot, we lose a chance
>>> to re-initialize it in notifier. addrconf_init() is called after
>>> ip6_route_init(), which means we have no chance to correct it.
>>>
>>> Fix it by moving this initialization explicitly after
>>> ipv6_add_dev(init_net.loopback_dev) in addrconf_init().
>>>
>>> Reported-by: Andrey Konovalov <andreyknvl@google.com>
>>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>
>> Hi Cong,
>>
>> This fixes the bug triggered by my reproducer.
>>
>> Thanks!
>>
>> Tested-by: Andrey Konovalov <andreyknvl@google.com>
> 
> Applied and queued up for -stable, thanks.
> 

This is not the complete solution; it only fixes init_net. It still
blows up when you do:

unshare -n
./rt6_device_match


same exact stack trace

^ permalink raw reply

* Re: [PATCH] cfg80211: make RATE_INFO_BW_20 the default
From: David Miller @ 2017-05-04 17:15 UTC (permalink / raw)
  To: johannes; +Cc: linux-wireless, torvalds, netdev, johannes.berg
In-Reply-To: <20170504064230.22058-1-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Thu,  4 May 2017 08:42:30 +0200

> From: Johannes Berg <johannes.berg@intel.com>
> 
> Due to the way I did the RX bitrate conversions in mac80211 with
> spatch, going setting flags to setting the value, many drivers now
> don't set the bandwidth value for 20 MHz, since with the flags it
> wasn't necessary to (there was no 20 MHz flag, only the others.)
> 
> Rather than go through and try to fix up all the drivers, instead
> renumber the enum so that 20 MHz, which is the typical bandwidth,
> actually has the value 0, making those drivers all work again.
> 
> If VHT was hit used with a driver not reporting it, e.g. iwlmvm,
> this manifested in hitting the bandwidth warning in
> cfg80211_calculate_bitrate_vht().
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Since Jens Axboe had the same problem and tested this patch, I'm
tossing it into my tree.

Just FYI...

^ permalink raw reply

* Re: [net-ipv4] question about arguments position
From: Gustavo A. R. Silva @ 2017-05-04 16:56 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20170504.124600.627647229351638856.davem@davemloft.net>

Hi David,

Quoting David Miller <davem@davemloft.net>:

> From: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
> Date: Thu, 04 May 2017 11:07:54 -0500
>
>> While looking into Coverity ID 1357474 I ran into the following piece
>> of code at net/ipv4/inet_diag.c:392:
>
> Because it's been this way since at least 2005, it doesn't matter if
> the order is correct or not.  What's there is the locked in behavior
> exposed to userspace and changing it will break things for people.

Oh, I see.

Thanks for clarifying

^ permalink raw reply

* Re: [Patch net] ipv6: initialize route null entry in addrconf_init()
From: Cong Wang @ 2017-05-04 17:19 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Andrey Konovalov, Linux Kernel Network Developers
In-Reply-To: <896e63a7-d0ad-a4db-21fc-9abdb02dd9df@gmail.com>

On Thu, May 4, 2017 at 10:12 AM, David Ahern <dsahern@gmail.com> wrote:
> On 5/4/17 10:51 AM, David Miller wrote:
>> From: Andrey Konovalov <andreyknvl@google.com>
>> Date: Thu, 4 May 2017 14:28:37 +0200
>>
>>> On Thu, May 4, 2017 at 7:07 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
>>>> since it is always NULL.
>>>>
>>>> This is clearly wrong, we have code to initialize it to loopback_dev,
>>>> unfortunately the order is still not correct.
>>>>
>>>> loopback_dev is registered very early during boot, we lose a chance
>>>> to re-initialize it in notifier. addrconf_init() is called after
>>>> ip6_route_init(), which means we have no chance to correct it.
>>>>
>>>> Fix it by moving this initialization explicitly after
>>>> ipv6_add_dev(init_net.loopback_dev) in addrconf_init().
>>>>
>>>> Reported-by: Andrey Konovalov <andreyknvl@google.com>
>>>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>
>>> Hi Cong,
>>>
>>> This fixes the bug triggered by my reproducer.
>>>
>>> Thanks!
>>>
>>> Tested-by: Andrey Konovalov <andreyknvl@google.com>
>>
>> Applied and queued up for -stable, thanks.
>>
>
> This is not the complete solution; it only fixes init_net. It still
> blows up when you do:
>
> unshare -n
> ./rt6_device_match
>
>
> same exact stack trace

This is why I sent
[Patch net] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf

^ permalink raw reply

* Re: [Patch net] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
From: Cong Wang @ 2017-05-04 17:21 UTC (permalink / raw)
  To: David Ahern; +Cc: Linux Kernel Network Developers, Andrey Konovalov
In-Reply-To: <a127b7fe-82a7-53e9-8df4-704f1b295ada@gmail.com>

On Thu, May 4, 2017 at 7:04 AM, David Ahern <dsahern@gmail.com> wrote:
> On 5/3/17 11:07 PM, Cong Wang wrote:
>> For each netns (except init_net), we initialize its null entry
>> in 3 places:
>>
>> 1) The template itself, as we use kmemdup()
>> 2) Code around dst_init_metrics() in ip6_route_net_init()
>> 3) ip6_route_dev_notify(), which is supposed to initialize it after
>> loopback registers
>>
>> Unfortunately the last one still happens in a wrong order because
>> we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
>> net->loopback_dev's idev, so we have to do that after we add
>> idev to it. However, this notifier has priority == 0 same as
>> ipv6_dev_notf, and ipv6_dev_notf is registered after
>> ip6_route_dev_notifier so it is called actually after
>> ip6_route_dev_notifier.
>>
>> Fix it by specifying a smaller priority for ip6_route_dev_notifier.
>>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> ---
>>  net/ipv6/route.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 2f11366..4dbf7e2 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -4024,7 +4024,7 @@ static struct pernet_operations ip6_route_net_late_ops = {
>>
>>  static struct notifier_block ip6_route_dev_notifier = {
>>       .notifier_call = ip6_route_dev_notify,
>> -     .priority = 0,
>> +     .priority = -10, /* Must be called after addrconf_notify!! */
>>  };
>>
>>  void __init ip6_route_init_special_entries(void)
>>
>
> And I see a refcnt problem with this change:
>
> root@kenny-jessie2:~# unshare -n
> root@kenny-jessie2:~# logout
> root@kenny-jessie2:~# unshare -n
>
> Message from syslogd@kenny-jessie2 at May  4 07:04:38 ...
>  kernel:[   62.581552] unregister_netdevice: waiting for lo to become
> free. Usage count = 1

Ah, looks like we need to put the refcnt for UNREGISTER too.

Will send v2 to include your ADDRCONF_NOTIFY_PRIORITY suggestion.

^ permalink raw reply

* [PATCH] mac80211: Create ieee80211_if_process_skb from ieee80211_iface_work
From: Joe Perches @ 2017-05-04 17:30 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

This function is pretty long and the skb handling is a bit long too.
Create a new function just for the skb processing.

This isolates the code and reduces indentation a bit too.

No change in object size.

$ size net/mac80211/iface.o*
   text	   data	    bss	    dec	    hex	filename
  15736	     24	      0	  15760	   3d90	net/mac80211/iface.o.new
  15736	     24	      0	  15760	   3d90	net/mac80211/iface.o.old

Miscellanea:

o Use explicit casts to proper types instead of casts to (void *)
  and have the compiler do the implicit cast
o Rewrap comments

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/mac80211/iface.c | 253 ++++++++++++++++++++++++++-------------------------
 1 file changed, 127 insertions(+), 126 deletions(-)

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 3bd5b81f5d81..b51d3956feaa 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1230,145 +1230,125 @@ static void ieee80211_if_setup_no_queue(struct net_device *dev)
 	dev->priv_flags |= IFF_NO_QUEUE;
 }
 
-static void ieee80211_iface_work(struct work_struct *work)
+static void ieee80211_if_process_skb(struct sk_buff *skb,
+				     struct ieee80211_sub_if_data *sdata,
+				     struct ieee80211_local *local)
 {
-	struct ieee80211_sub_if_data *sdata =
-		container_of(work, struct ieee80211_sub_if_data, work);
-	struct ieee80211_local *local = sdata->local;
-	struct sk_buff *skb;
-	struct sta_info *sta;
 	struct ieee80211_ra_tid *ra_tid;
 	struct ieee80211_rx_agg *rx_agg;
-
-	if (!ieee80211_sdata_running(sdata))
-		return;
-
-	if (test_bit(SCAN_SW_SCANNING, &local->scanning))
-		return;
-
-	if (!ieee80211_can_run_worker(local))
-		return;
-
-	/* first process frames */
-	while ((skb = skb_dequeue(&sdata->skb_queue))) {
-		struct ieee80211_mgmt *mgmt = (void *)skb->data;
-
-		if (skb->pkt_type == IEEE80211_SDATA_QUEUE_AGG_START) {
-			ra_tid = (void *)&skb->cb;
-			ieee80211_start_tx_ba_cb(&sdata->vif, ra_tid->ra,
-						 ra_tid->tid);
-		} else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_AGG_STOP) {
-			ra_tid = (void *)&skb->cb;
-			ieee80211_stop_tx_ba_cb(&sdata->vif, ra_tid->ra,
-						ra_tid->tid);
-		} else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_RX_AGG_START) {
-			rx_agg = (void *)&skb->cb;
-			mutex_lock(&local->sta_mtx);
-			sta = sta_info_get_bss(sdata, rx_agg->addr);
-			if (sta)
-				__ieee80211_start_rx_ba_session(sta,
-						0, 0, 0, 1, rx_agg->tid,
-						IEEE80211_MAX_AMPDU_BUF,
-						false, true);
-			mutex_unlock(&local->sta_mtx);
-		} else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_RX_AGG_STOP) {
-			rx_agg = (void *)&skb->cb;
-			mutex_lock(&local->sta_mtx);
-			sta = sta_info_get_bss(sdata, rx_agg->addr);
-			if (sta)
-				__ieee80211_stop_rx_ba_session(sta,
-							rx_agg->tid,
-							WLAN_BACK_RECIPIENT, 0,
-							false);
-			mutex_unlock(&local->sta_mtx);
-		} else if (ieee80211_is_action(mgmt->frame_control) &&
-			   mgmt->u.action.category == WLAN_CATEGORY_BACK) {
-			int len = skb->len;
-
-			mutex_lock(&local->sta_mtx);
-			sta = sta_info_get_bss(sdata, mgmt->sa);
-			if (sta) {
-				switch (mgmt->u.action.u.addba_req.action_code) {
-				case WLAN_ACTION_ADDBA_REQ:
-					ieee80211_process_addba_request(
-							local, sta, mgmt, len);
-					break;
-				case WLAN_ACTION_ADDBA_RESP:
-					ieee80211_process_addba_resp(local, sta,
-								     mgmt, len);
-					break;
-				case WLAN_ACTION_DELBA:
-					ieee80211_process_delba(sdata, sta,
+	struct sta_info *sta;
+	struct ieee80211_mgmt *mgmt = (void *)skb->data;
+
+	if (skb->pkt_type == IEEE80211_SDATA_QUEUE_AGG_START) {
+		ra_tid = (struct ieee80211_ra_tid *)&skb->cb;
+		ieee80211_start_tx_ba_cb(&sdata->vif, ra_tid->ra, ra_tid->tid);
+	} else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_AGG_STOP) {
+		ra_tid = (struct ieee80211_ra_tid *)&skb->cb;
+		ieee80211_stop_tx_ba_cb(&sdata->vif, ra_tid->ra, ra_tid->tid);
+	} else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_RX_AGG_START) {
+		rx_agg = (struct ieee80211_rx_agg *)&skb->cb;
+		mutex_lock(&local->sta_mtx);
+		sta = sta_info_get_bss(sdata, rx_agg->addr);
+		if (sta)
+			__ieee80211_start_rx_ba_session(sta,
+							0, 0, 0, 1, rx_agg->tid,
+							IEEE80211_MAX_AMPDU_BUF,
+							false, true);
+		mutex_unlock(&local->sta_mtx);
+	} else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_RX_AGG_STOP) {
+		rx_agg = (struct ieee80211_rx_agg *)&skb->cb;
+		mutex_lock(&local->sta_mtx);
+		sta = sta_info_get_bss(sdata, rx_agg->addr);
+		if (sta)
+			__ieee80211_stop_rx_ba_session(sta, rx_agg->tid,
+						       WLAN_BACK_RECIPIENT, 0,
+						       false);
+		mutex_unlock(&local->sta_mtx);
+	} else if (ieee80211_is_action(mgmt->frame_control) &&
+		   mgmt->u.action.category == WLAN_CATEGORY_BACK) {
+		int len = skb->len;
+
+		mutex_lock(&local->sta_mtx);
+		sta = sta_info_get_bss(sdata, mgmt->sa);
+		if (sta) {
+			switch (mgmt->u.action.u.addba_req.action_code) {
+			case WLAN_ACTION_ADDBA_REQ:
+				ieee80211_process_addba_request(local, sta,
 								mgmt, len);
-					break;
-				default:
-					WARN_ON(1);
-					break;
-				}
-			}
-			mutex_unlock(&local->sta_mtx);
-		} else if (ieee80211_is_action(mgmt->frame_control) &&
-			   mgmt->u.action.category == WLAN_CATEGORY_VHT) {
-			switch (mgmt->u.action.u.vht_group_notif.action_code) {
-			case WLAN_VHT_ACTION_OPMODE_NOTIF: {
-				struct ieee80211_rx_status *status;
-				enum nl80211_band band;
-				u8 opmode;
-
-				status = IEEE80211_SKB_RXCB(skb);
-				band = status->band;
-				opmode = mgmt->u.action.u.vht_opmode_notif.operating_mode;
-
-				mutex_lock(&local->sta_mtx);
-				sta = sta_info_get_bss(sdata, mgmt->sa);
-
-				if (sta)
-					ieee80211_vht_handle_opmode(sdata, sta,
-								    opmode,
-								    band);
-
-				mutex_unlock(&local->sta_mtx);
 				break;
-			}
-			case WLAN_VHT_ACTION_GROUPID_MGMT:
-				ieee80211_process_mu_groups(sdata, mgmt);
+			case WLAN_ACTION_ADDBA_RESP:
+				ieee80211_process_addba_resp(local, sta,
+							     mgmt, len);
+				break;
+			case WLAN_ACTION_DELBA:
+				ieee80211_process_delba(sdata, sta, mgmt, len);
 				break;
 			default:
 				WARN_ON(1);
 				break;
 			}
-		} else if (ieee80211_is_data_qos(mgmt->frame_control)) {
-			struct ieee80211_hdr *hdr = (void *)mgmt;
-			/*
-			 * So the frame isn't mgmt, but frame_control
-			 * is at the right place anyway, of course, so
-			 * the if statement is correct.
-			 *
-			 * Warn if we have other data frame types here,
-			 * they must not get here.
-			 */
-			WARN_ON(hdr->frame_control &
-					cpu_to_le16(IEEE80211_STYPE_NULLFUNC));
-			WARN_ON(!(hdr->seq_ctrl &
-					cpu_to_le16(IEEE80211_SCTL_FRAG)));
-			/*
-			 * This was a fragment of a frame, received while
-			 * a block-ack session was active. That cannot be
-			 * right, so terminate the session.
-			 */
+		}
+		mutex_unlock(&local->sta_mtx);
+	} else if (ieee80211_is_action(mgmt->frame_control) &&
+		   mgmt->u.action.category == WLAN_CATEGORY_VHT) {
+		switch (mgmt->u.action.u.vht_group_notif.action_code) {
+		case WLAN_VHT_ACTION_OPMODE_NOTIF: {
+			struct ieee80211_rx_status *status;
+			enum nl80211_band band;
+			u8 opmode;
+
+			status = IEEE80211_SKB_RXCB(skb);
+			band = status->band;
+			opmode = mgmt->u.action.u.vht_opmode_notif.operating_mode;
+
 			mutex_lock(&local->sta_mtx);
 			sta = sta_info_get_bss(sdata, mgmt->sa);
-			if (sta) {
-				u16 tid = *ieee80211_get_qos_ctl(hdr) &
-						IEEE80211_QOS_CTL_TID_MASK;
-
-				__ieee80211_stop_rx_ba_session(
-					sta, tid, WLAN_BACK_RECIPIENT,
-					WLAN_REASON_QSTA_REQUIRE_SETUP,
-					true);
-			}
+
+			if (sta)
+				ieee80211_vht_handle_opmode(sdata, sta,
+							    opmode, band);
+
 			mutex_unlock(&local->sta_mtx);
-		} else switch (sdata->vif.type) {
+			break;
+		}
+		case WLAN_VHT_ACTION_GROUPID_MGMT:
+			ieee80211_process_mu_groups(sdata, mgmt);
+			break;
+		default:
+			WARN_ON(1);
+			break;
+		}
+	} else if (ieee80211_is_data_qos(mgmt->frame_control)) {
+		struct ieee80211_hdr *hdr = (void *)mgmt;
+		/*
+		 * So the frame isn't mgmt, but frame_control is at the right
+		 * place anyway, of course, so the if statement is correct.
+		 *
+		 * Warn if we have other data frame types here,
+		 * they must not get here.
+		 */
+		WARN_ON(hdr->frame_control &
+			cpu_to_le16(IEEE80211_STYPE_NULLFUNC));
+		WARN_ON(!(hdr->seq_ctrl & cpu_to_le16(IEEE80211_SCTL_FRAG)));
+		/*
+		 * This was a fragment of a frame,
+		 * received while a block-ack session was active.
+		 * That cannot be right, so terminate the session.
+		 */
+		mutex_lock(&local->sta_mtx);
+		sta = sta_info_get_bss(sdata, mgmt->sa);
+		if (sta) {
+			u16 tid = *ieee80211_get_qos_ctl(hdr) &
+				IEEE80211_QOS_CTL_TID_MASK;
+
+			__ieee80211_stop_rx_ba_session(sta, tid,
+						       WLAN_BACK_RECIPIENT,
+						       WLAN_REASON_QSTA_REQUIRE_SETUP,
+						       true);
+		}
+		mutex_unlock(&local->sta_mtx);
+	} else {
+		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_STATION:
 			ieee80211_sta_rx_queued_mgmt(sdata, skb);
 			break;
@@ -1384,7 +1364,28 @@ static void ieee80211_iface_work(struct work_struct *work)
 			WARN(1, "frame for unexpected interface type");
 			break;
 		}
+	}
+}
+
+static void ieee80211_iface_work(struct work_struct *work)
+{
+	struct ieee80211_sub_if_data *sdata =
+		container_of(work, struct ieee80211_sub_if_data, work);
+	struct ieee80211_local *local = sdata->local;
+	struct sk_buff *skb;
+
+	if (!ieee80211_sdata_running(sdata))
+		return;
+
+	if (test_bit(SCAN_SW_SCANNING, &local->scanning))
+		return;
 
+	if (!ieee80211_can_run_worker(local))
+		return;
+
+	/* first process frames */
+	while ((skb = skb_dequeue(&sdata->skb_queue))) {
+		ieee80211_if_process_skb(skb, sdata, local);
 		kfree_skb(skb);
 	}
 
-- 
2.10.0.rc2.1.g053435c

^ permalink raw reply related

* Re: [PATCH] cfg80211: make RATE_INFO_BW_20 the default
From: Linus Torvalds @ 2017-05-04 17:33 UTC (permalink / raw)
  To: David Miller; +Cc: Johannes Berg, Linux Wireless List, Network Development
In-Reply-To: <20170504.112257.1601480484804301664.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Thu, May 4, 2017 at 8:22 AM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
> From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
>>
>> I figured I'd give Linus to a chance to try or even apply it, but I
>> have no objection to you applying it either. I don't have anything else
>>   yet right now, and sending a pull request for just a single patch
>> would be quite pointless.
>
> Ok, let's give Linus a chance to test the patch.

I'm having trouble recreating the warning. I have no idea why. It only
happened during ten minutes yesterday, and nothing in my wireless
setup has changed.

I wonder if *normally* my setup ends up connecting with a 40MHz band
or something, and I just happened to see the default uninitialized
case once.

I see that Jens reported that the patch works, although I'm wondering
how repeatable it was for him.  The patch obviously looks simple and
seems like an obviously GoodThing(tm) regardless.

                   Linus

^ permalink raw reply

* [Patch net v2] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
From: Cong Wang @ 2017-05-04 17:36 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, dsahern, Cong Wang

For each netns (except init_net), we initialize its null entry
in 3 places:

1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
   loopback registers

Unfortunately the last one still happens in a wrong order because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, so we have to do that after we add
idev to it. However, this notifier has priority == 0 same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier so it is called actually after
ip6_route_dev_notifier.

Fix it by picking a smaller priority for ip6_route_dev_notifier.
Also, we have to release the refcnt accordingly when unregistering
loopback_dev because device exit functions are called before subsys
exit functions.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/addrconf.h |  2 ++
 net/ipv6/addrconf.c    |  1 +
 net/ipv6/route.c       | 13 +++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 1aeb25d..6c0ee3c 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -20,6 +20,8 @@
 #define ADDRCONF_TIMER_FUZZ		(HZ / 4)
 #define ADDRCONF_TIMER_FUZZ_MAX		(HZ)
 
+#define ADDRCONF_NOTIFY_PRIORITY	0
+
 #include <linux/in.h>
 #include <linux/in6.h>
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 77a4bd5..8d297a7 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3548,6 +3548,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
  */
 static struct notifier_block ipv6_dev_notf = {
 	.notifier_call = addrconf_notify,
+	.priority = ADDRCONF_NOTIFY_PRIORITY,
 };
 
 static void addrconf_type_change(struct net_device *dev, unsigned long event)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2f11366..dc61b0b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3709,7 +3709,10 @@ static int ip6_route_dev_notify(struct notifier_block *this,
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct net *net = dev_net(dev);
 
-	if (event == NETDEV_REGISTER && (dev->flags & IFF_LOOPBACK)) {
+	if (!(dev->flags & IFF_LOOPBACK))
+		return NOTIFY_OK;
+
+	if (event == NETDEV_REGISTER) {
 		net->ipv6.ip6_null_entry->dst.dev = dev;
 		net->ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(dev);
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
@@ -3718,6 +3721,12 @@ static int ip6_route_dev_notify(struct notifier_block *this,
 		net->ipv6.ip6_blk_hole_entry->dst.dev = dev;
 		net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev);
 #endif
+	 } else if (event == NETDEV_UNREGISTER) {
+		in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev);
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+		in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev);
+		in6_dev_put(net->ipv6.ip6_blk_hole_entry->rt6i_idev);
+#endif
 	}
 
 	return NOTIFY_OK;
@@ -4024,7 +4033,7 @@ static struct pernet_operations ip6_route_net_late_ops = {
 
 static struct notifier_block ip6_route_dev_notifier = {
 	.notifier_call = ip6_route_dev_notify,
-	.priority = 0,
+	.priority = ADDRCONF_NOTIFY_PRIORITY - 10,
 };
 
 void __init ip6_route_init_special_entries(void)
-- 
2.5.5

^ permalink raw reply related

* Re: [net-ipv4] question about arguments position
From: Joe Perches @ 2017-05-04 17:47 UTC (permalink / raw)
  To: David Miller, garsilva
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20170504.124600.627647229351638856.davem@davemloft.net>

On Thu, 2017-05-04 at 12:46 -0400, David Miller wrote:
> From: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
> Date: Thu, 04 May 2017 11:07:54 -0500
> 
> > While looking into Coverity ID 1357474 I ran into the following piece
> > of code at net/ipv4/inet_diag.c:392:
> 
> Because it's been this way since at least 2005, it doesn't matter if
> the order is correct or not.  What's there is the locked in behavior
> exposed to userspace and changing it will break things for people.

Adding a few comments around the code about why
it is this way will help avoid future questions.

^ permalink raw reply

* Re: [RFC] iproute: Add support for extended ack to rtnl_talk
From: Leon Romanovsky @ 2017-05-04 17:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Daniel Borkmann, netdev
In-Reply-To: <20170504094558.78933e60@xeon-e3>

[-- Attachment #1: Type: text/plain, Size: 2617 bytes --]

On Thu, May 04, 2017 at 09:45:58AM -0700, Stephen Hemminger wrote:
> On Thu, 4 May 2017 17:37:38 +0300
> Leon Romanovsky <leon@kernel.org> wrote:
>
> > On Thu, May 04, 2017 at 11:36:36AM +0200, Daniel Borkmann wrote:
> > > On 05/04/2017 01:56 AM, Stephen Hemminger wrote:
> > > > Add support for extended ack error reporting via libmnl. This
> > > > is a better alternative to use existing library and not copy/paste
> > > > code from the kernel. Also make arguments const where possible.
> > > >
> > > > Add a new function rtnl_talk_extack that takes a callback as an input
> > > > arg. If a netlink response contains extack attributes, the callback is
> > > > is invoked with the the err string, offset in the message and a pointer
> > > > to the message returned by the kernel.
> > > >
> > > > Adding a new function allows commands to be moved over to the
> > > > extended error reporting over time.
> > > >
> > > > For feedback, compile tested only.
> > >
> > > Just out of curiosity, what is the plan regarding converting iproute2
> > > over to libmnl (ip, tc, ss, ...)? In 2015, tipc tool was the first
> > > user merged that requires libmnl, the only other user today in the
> > > tree is devlink, which even seems to define its own libmnl library
> > > helpers. What is the clear benefit/rationale of outsourcing this to
> > > libmnl? I always was the impression we should strive for as little
> > > dependencies as possible?
> >
> > And I would like to get direction for the RDMA tool [1] which I'm
> > working on it now.
> >
> > The overall decision was to use netlink and put it under iproute2
> > umbrella. Currently, I have working RFC which is based on
> > legacy sysfs interface to ensure that we are converging on
> > user-experience even before moving to actual netlink defines.
> >
> > An I would like to continue to work on netlink interface, but which lib interface
> > should I need to base rdmatool's netlink code?
> >
> > [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg148523.html
> >
> > >
> > > I don't really like that we make extended ack reporting now dependent
> > > on libmnl, which further diverts from iproute's native nl library vs
> > > requiring to install another nl library, making the current status
> > > quo even worse ... :/
> > >
> > > Thanks,
> > > Daniel
>
> I would prefer new code use libmnl, but using libnetlink would also be ok.
> Any later conversion to libmnl would be mostly automated anyway.

Thanks, I'm copy/pasting devlink variation of libmnl :)

>
> The real objection was copy/pasting in the kernel netlink parser.
> That was unnecessary bloat.



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [RFC iproute2 0/8] RDMA tool
From: Leon Romanovsky @ 2017-05-04 18:02 UTC (permalink / raw)
  To: Stephen Hemminger, Doug Ledford
  Cc: Leon Romanovsky, Jiri Pirko, Ariel Almog, Dennis Dalessandro,
	Ram Amrani, Bart Van Assche, Sagi Grimberg, Jason Gunthorpe,
	Christoph Hellwig, Or Gerlitz, Linux RDMA, Linux Netdev

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

----
This is initial phase to understand if user experience for this tool fits
RDMA and netdev communities exepectations. Also I would like to get feedback
if it is really worth to provide legacy sysfs for old kernels, or maybe I should
implement netlink from the beginning and abandon sysfs completely.
-----

Hi,

Please find below, the patch set with initial implementation of configuration
tool for RDMA subsystem, which will be supplementary tool to already existed
tools in netdev community (ip, devlink, ethtool, ..).

In opposite to netdev community, where standard tools exist to configure and
present different devices abilities, RDMA subsystem historically lacked it.

Following our discussion both in mailing list [1] and at the LPC 2016 [2],
we would like to propose this RDMA tool to be part of iproute2 package
and finally improve this situation.

The development of tool was influenced by ip and devlink tools. This implies
to the object->command interface and naming convention.

In order to close object model, ensure reuse of existing code and make this
tool usable from day one, we decided to implement wrappers over legacy sysfs
prior to implementing netlink functionality. As a nice bonus, it will allow
to use this tool with old kernels too.

It is important to mention that any future extension will be required to be
done with netlink, so for already existing objects small conversion to netlink
will be unavoidable.

  # rdma -h
  Usage: rdma [ OPTIONS ] OBJECT { COMMAND | help }
  where  OBJECT := { dev | link | ipoib | memory | stats | protocols | providers | monitor }
  OPTIONS := { -V[ersion] }

* DEV object equals to CA in IBTA specification and will provide
  a way to configure/present settings relevant to specific struct ib_device.

* LINK object represents port in IBTA specification and will give access to
  struct ib_port_immutable. From the day one, It prints netdev name of
  the corresponding IB port that makes ibdev2netdev script redundant.

* IPoIB object is supposed to be specific for IP-over-Infiniband upper
  layer protocol [3]. This ULP was mainly configured by combination
  of various sysfs knobs together with ethtool. Such situation adds
  challenges to add new and expose old configuration settings due to
  the mix between different subsystems.

* MEMORY object will be used to configure memory related settings,
  e.g. on-demand-paging (ODP), force-mr (force usage of MRs for
  RDMA READ/WRITE operation).

* STATS object is needed for everything related to statistics
  (per-PID, per-QP, per-device etc.). Despite the fact that RDMA
  devices provide extensive set of counters, the decision was to
  implement it in netlink directly, because there is a need to add
  filter mechanism to them, which doesn't exist now.

* PROTOCOLS object is going to be used for device special treatment
  of global to protocol settings (e.g. set device in RoCEv2 mode as
  a default, instead of RoCEv1, instead of configfs).

* PROVIDERS objects gives ability to get specific to the device
  information, like supported kABI objects [4].

* MONITOR object is needed to debug netlink communication and will
  follow standard functionality, which exists in ip and devlink tools.

There are number of ULPs which are not covered by this tool yet:
 * HFI-VNIC - I have no access to the HW and believe that Intel
   will add native object support for it.
 * Other storage related ULPs (iSER and SRP) were not introduced too,
   because they have special tools (scci-target-utils) to configure them.
   However it will be pretty straightforward to introduce new object,
   if there is demand for it.

At the initial stage, we implemented infrastructure to read legacy
sysfs entries (Patch #1), initial man pages (Patch #7) and provided
future object examples (Patch #2-6) to allow parallel development.

Following patches will focus on cleaning user interface, parsing other
relevant entries in similar fashion to the link capability mask (Patch #8)
and providing netlink interface.

These patches were tested with two following setups:
 * Setup A:
   - Two Mellanox ConnectX-4 devices (one port)
   - One Mellanox Connect-IB device (two ports)
 * Setup B:
   - One Mellanox ConnectX-4 device (one port)
   - One Mellanox ConnectX-3 Pro device (two ports)

Please consider the inclusion of the RDMA tool into iproute2 package,
so other participants will be able to speed up development.

[1] https://www.mail-archive.com/netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg148523.html
[2] http://www.medkio.com/talks/lpc_debug.pdf
[3] https://tools.ietf.org/html/rfc4392
[4] http://marc.info/?l=linux-rdma&m=149261526916544&w=2

TODO: Add json output

Cc: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Jiri Pirko <jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Ariel Almog <ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Ram Amrani <ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Cc: Bart Van Assche <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Linux RDMA <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Cc: Linux Netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>

Leon Romanovsky (8):
  rdma: Add basic infrastructure for RDMA tool
  rdma: Add dev object
  rdma: Add link object
  rdma: Add IPoIB object
  rdma: Add memory object
  rdma: add stubs for future objects
  man: rdma.8: Document objects and commands
  rdma: Add link capability parsing

 Makefile          |   2 +-
 man/man8/Makefile |   3 +-
 man/man8/rdma.8   | 109 +++++++++++++++++++
 rdma/.gitignore   |   1 +
 rdma/Makefile     |  15 +++
 rdma/dev.c        | 101 ++++++++++++++++++
 rdma/ipoib.c      |  54 ++++++++++
 rdma/link.c       | 160 ++++++++++++++++++++++++++++
 rdma/memory.c     |  30 ++++++
 rdma/monitor.c    |  22 ++++
 rdma/protocols.c  |  22 ++++
 rdma/providers.c  |  28 +++++
 rdma/rdma.c       | 104 ++++++++++++++++++
 rdma/rdma.h       |  93 ++++++++++++++++
 rdma/stats.c      |  22 ++++
 rdma/utils.c      | 313 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 16 files changed, 1077 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/rdma.8
 create mode 100644 rdma/.gitignore
 create mode 100644 rdma/Makefile
 create mode 100644 rdma/dev.c
 create mode 100644 rdma/ipoib.c
 create mode 100644 rdma/link.c
 create mode 100644 rdma/memory.c
 create mode 100644 rdma/monitor.c
 create mode 100644 rdma/protocols.c
 create mode 100644 rdma/providers.c
 create mode 100644 rdma/rdma.c
 create mode 100644 rdma/rdma.h
 create mode 100644 rdma/stats.c
 create mode 100644 rdma/utils.c

--
2.12.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [RFC iproute2 1/8] rdma: Add basic infrastructure for RDMA tool
From: Leon Romanovsky @ 2017-05-04 18:02 UTC (permalink / raw)
  To: Stephen Hemminger, Doug Ledford
  Cc: Leon Romanovsky, Jiri Pirko, Ariel Almog, Dennis Dalessandro,
	Ram Amrani, Bart Van Assche, Sagi Grimberg, Jason Gunthorpe,
	Christoph Hellwig, Or Gerlitz, Linux RDMA, Linux Netdev
In-Reply-To: <20170504180216.7665-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

RDMA devices are cross-functional devices from one side,
but very tailored for the specific markets from another.

Such diversity caused to spread of RDMA related configuration
across various tools, e.g. devlink, ip, ethtool, ib specific and
vendor specific solutions.

This patch adds ability to fill device and port information
by reading sysfs entries (legacy).

All future configuration settings will be implemented in netlink format,
to be aligned with iproute2 package.

Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 Makefile        |   2 +-
 rdma/.gitignore |   1 +
 rdma/Makefile   |  15 +++
 rdma/rdma.c     |  96 +++++++++++++++++
 rdma/rdma.h     |  77 ++++++++++++++
 rdma/utils.c    | 313 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 503 insertions(+), 1 deletion(-)
 create mode 100644 rdma/.gitignore
 create mode 100644 rdma/Makefile
 create mode 100644 rdma/rdma.c
 create mode 100644 rdma/rdma.h
 create mode 100644 rdma/utils.c

diff --git a/Makefile b/Makefile
index 18de7dcb..c255063b 100644
--- a/Makefile
+++ b/Makefile
@@ -52,7 +52,7 @@ WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
 CFLAGS := $(WFLAGS) $(CCOPTS) -I../include $(DEFINES) $(CFLAGS)
 YACCFLAGS = -d -t -v

-SUBDIRS=lib ip tc bridge misc netem genl tipc devlink man
+SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma man

 LIBNETLINK=../lib/libnetlink.a ../lib/libutil.a
 LDLIBS += $(LIBNETLINK)
diff --git a/rdma/.gitignore b/rdma/.gitignore
new file mode 100644
index 00000000..51fb172b
--- /dev/null
+++ b/rdma/.gitignore
@@ -0,0 +1 @@
+rdma
diff --git a/rdma/Makefile b/rdma/Makefile
new file mode 100644
index 00000000..65248b31
--- /dev/null
+++ b/rdma/Makefile
@@ -0,0 +1,15 @@
+include ../Config
+
+RDMA_OBJ = rdma.o utils.o
+TARGETS=rdma
+
+all:	$(TARGETS) $(LIBS)
+
+rdma:	$(RDMA_OBJ)
+	$(QUIET_LINK)$(CC) $^ -o $@
+
+install: all
+	install -m 0755 $(TARGETS) $(DESTDIR)$(SBINDIR)
+
+clean:
+	rm -f $(RDMA_OBJ) $(TARGETS)
diff --git a/rdma/rdma.c b/rdma/rdma.c
new file mode 100644
index 00000000..bc7d1483
--- /dev/null
+++ b/rdma/rdma.c
@@ -0,0 +1,96 @@
+/*
+ * rdma.c	RDMA tool
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
+ */
+
+#include <limits.h>
+
+#include "rdma.h"
+#include "SNAPSHOT.h"
+
+static void help(char *name)
+{
+	pr_out("Usage: %s [ OPTIONS ] OBJECT { COMMAND | help }\n"
+	       "where  OBJECT := { }\n"
+	       "       OPTIONS := { -V[ersion] }\n", name);
+}
+
+static int obj_help(struct rdma *rd)
+{
+	help(rd->filename);
+	return 0;
+}
+
+static int rd_cmd(struct rdma *rd)
+{
+	const struct rdma_obj objs[] = {
+		{ NULL,		obj_help },
+		{ "help",	obj_help },
+		{ 0 }
+	};
+
+	return rdma_exec_cmd(rd, objs, "object");
+}
+
+static int rd_init(struct rdma *rd, int argc, char **argv, char *filename)
+{
+	rd->filename = filename;
+	rd->argc = argc;
+	rd->argv = argv;
+	INIT_LIST_HEAD(&rd->dev_map_list);
+	return 0;
+}
+static void rd_free(struct rdma *rd)
+{
+	dev_map_cleanup(rd);
+}
+int main(int argc, char **argv)
+{
+	char *filename;
+	static const struct option long_options[] = {
+		{ "version",		no_argument,		NULL, 'V' },
+		{ "help",		no_argument,		NULL, 'h' },
+		{ NULL, 0, NULL, 0 }
+	};
+	struct rdma rd;
+	int opt;
+	int err;
+
+	filename = basename(argv[0]);
+
+	while ((opt = getopt_long(argc, argv, "Vh",
+				  long_options, NULL)) >= 0) {
+
+		switch (opt) {
+		case 'V':
+			printf("%s utility, iproute2-ss%s\n", filename, SNAPSHOT);
+			return EXIT_SUCCESS;
+		case 'h':
+			help(filename);
+			return EXIT_SUCCESS;
+		default:
+			pr_err("Unknown option.\n");
+			help(filename);
+			return EXIT_FAILURE;
+		}
+	}
+
+	argc -= optind;
+	argv += optind;
+
+	err = rd_init(&rd, argc, argv, filename);
+	if (err)
+		goto out;
+
+	err = rd_cmd(&rd);
+	/* Always cleanup */
+	rd_free(&rd);
+
+out:	return (err) ? EXIT_FAILURE:EXIT_SUCCESS;
+}
diff --git a/rdma/rdma.h b/rdma/rdma.h
new file mode 100644
index 00000000..156bb74c
--- /dev/null
+++ b/rdma/rdma.h
@@ -0,0 +1,77 @@
+/*
+ * rdma.c	RDMA tool
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
+ */
+#ifndef _RDMA_TOOL_H_
+#define _RDMA_TOOL_H_
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <getopt.h>
+#include <stdbool.h>
+
+#include "list.h"
+
+#define pr_err(args...) fprintf(stderr, ##args)
+#define pr_out(args...) fprintf(stdout, ##args)
+
+enum rt_protocols {
+	RT_PROTOCOL_IB,
+	RT_PROTOCOL_IWARP,
+	RT_PROTOCOL_ROCE_V1,
+	RT_PROTOCOL_ROCE_V2,
+	RT_PROTOCOL_OPA,
+	RT_NO_PROTOCOL
+};
+
+struct port_map {
+	struct list_head list;
+	char *ifname;
+	uint32_t idx;
+};
+
+struct dev_map {
+	struct list_head list;
+	char *dev_name;
+	uint32_t num_ports;
+	struct list_head port_map_list;
+	uint32_t idx;
+};
+
+struct rdma {
+	int argc;
+	char **argv;
+	char *filename;
+	struct list_head dev_map_list;
+};
+
+struct rdma_obj {
+	const char *cmd;
+	int (*func)(struct rdma *rd);
+};
+
+/*
+ * Parser interface
+ */
+bool rd_no_arg(struct rdma *rd);
+uint32_t get_port_from_argv(struct rdma *rd);
+
+int rdma_exec_cmd(struct rdma *rd, const struct rdma_obj *o, const char *str);
+int rdma_sysfs_read_ib(const char *name, int port, const char *field, char *res);
+
+/*
+ * Device manipulation
+ */
+int dev_map_init(struct rdma *rd);
+void dev_map_cleanup(struct rdma *rd);
+struct dev_map *dev_map_lookup(struct rdma *rd, bool allow_port_index);
+#endif /* _RDMA_TOOL_H_ */
diff --git a/rdma/utils.c b/rdma/utils.c
new file mode 100644
index 00000000..568d7c0a
--- /dev/null
+++ b/rdma/utils.c
@@ -0,0 +1,313 @@
+/*
+ * utils.c	RDMA tool
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
+ */
+
+#include <sys/types.h>
+#include <dirent.h>
+
+#include "rdma.h"
+
+/*
+ * This macro will be moved to generic layer,
+ * after code will be accepted.
+ * it is placed here to avoid rebases with upstream code.
+ */
+#define MAX(a, b)	((a) > (b) ? (a) : (b))
+
+static int rd_argc(struct rdma *rd)
+{
+	return rd->argc;
+}
+
+static char *rd_argv(struct rdma *rd)
+{
+	if (!rd_argc(rd))
+		return NULL;
+	return *rd->argv;
+}
+
+static int strcmpx(const char *str1, const char *str2)
+{
+	if (strlen(str1) > strlen(str2))
+			return -1;
+	return strncmp(str1, str2, strlen(str1));
+}
+
+static bool rd_argv_match(struct rdma *rd, const char *pattern)
+{
+	if (!rd_argc(rd))
+		return false;
+	return strcmpx(rd_argv(rd), pattern) == 0;
+}
+
+static void rd_arg_inc(struct rdma *rd)
+{
+	if (!rd_argc(rd))
+		return;
+	rd->argc--;
+	rd->argv++;
+}
+
+bool rd_no_arg(struct rdma *rd)
+{
+	return rd_argc(rd) == 0;
+}
+
+#define SYSFS_INFINIBAND "/sys/class/infiniband"
+#define SYSFS_NET "/sys/class/net"
+static int sysfs_read(const char *prefix, const char *name, int port, const char *field, char *res)
+{
+	char fpath[64];
+	FILE *fentry;
+	int result;
+
+	if (port == 0)
+		snprintf(fpath, 64, "%s/%s/%s", prefix, name, field);
+	else
+		snprintf(fpath, 64, "%s/%s/ports/%d/%s", prefix, name, port, field);
+
+	fentry = fopen(fpath, "r");
+	if (!fentry)
+		return -ENOENT;
+
+	result = fread(res, 1, 4096, fentry);
+	/* Remove last "\n" */
+	res[result-1] = '\0';
+	fclose(fentry);
+	return 0;
+}
+
+int rdma_sysfs_read_ib(const char *name, int port, const char *field, char *res)
+{
+	return sysfs_read(SYSFS_INFINIBAND, name, port, field, res);
+}
+
+static int sysfs_read_net(const char *name,  const char *field, char *res)
+{
+	return sysfs_read(SYSFS_NET, name, 0, field, res);
+}
+
+static struct dev_map *dev_map_alloc(char *dev_name)
+{
+	struct dev_map *dev_map;
+
+	dev_map = calloc(1, sizeof(*dev_map));
+	if (!dev_map)
+		return NULL;
+	dev_map->dev_name = strdup(dev_name);
+	INIT_LIST_HEAD(&dev_map->port_map_list);
+
+	return dev_map;
+}
+
+static void port_map_free(struct port_map *port_map)
+{
+	free(port_map->ifname);
+	free(port_map);
+}
+
+static void dev_map_free(struct dev_map *dev_map)
+{
+	struct port_map *port_map, *tmp;
+
+	list_for_each_entry_safe(port_map, tmp,
+				 &dev_map->port_map_list, list) {
+		list_del(&port_map->list);
+		port_map_free(port_map);
+	}
+
+	free(dev_map->dev_name);
+	free(dev_map);
+}
+
+void dev_map_cleanup(struct rdma *rd)
+{
+	struct dev_map *dev_map, *tmp;
+
+	list_for_each_entry_safe(dev_map, tmp,
+				 &rd->dev_map_list, list) {
+		list_del(&dev_map->list);
+		dev_map_free(dev_map);
+	}
+}
+
+uint32_t get_port_from_argv(struct rdma *rd)
+{
+	char *slash;
+
+	slash = strchr(rd_argv(rd), '/');
+	/* if no port found, return 0 */
+	return (slash) ? (atoi(slash + 1)):0;
+}
+
+struct dev_map *dev_map_lookup(struct rdma *rd, bool allow_port_index)
+{
+	struct dev_map *dev_map;
+	char *dev_name;
+	char *slash;
+
+	dev_name = strdup(rd_argv(rd));
+	if (allow_port_index) {
+		slash = strrchr(dev_name, '/');
+		if (slash)
+			*slash = '\0';
+	}
+
+	list_for_each_entry(dev_map, &rd->dev_map_list, list)
+		if (strcmp(dev_name, dev_map->dev_name) == 0) {
+			free(dev_name);
+			return dev_map;
+		}
+
+	free(dev_name);
+	return NULL;
+}
+
+static bool is_dots(char *name)
+{
+	if (!strcmp(name, ".") || !strcmp(name, ".."))
+		return true;
+	return false;
+}
+
+static char *find_ifname(struct dev_map *dev_map, uint32_t port)
+{
+	struct dirent *dentry, *drentry;
+	char data[4096];
+	char drpath[64];
+	uint32_t net_port;
+	DIR *dir, *drdir;
+	char *ifname = NULL;
+
+	dir = opendir(SYSFS_NET);
+	if (!dir)
+		return NULL;
+
+	while ((dentry = readdir(dir))) {
+		if (is_dots(dentry->d_name))
+			continue;
+
+		if (sysfs_read_net(dentry->d_name, "dev_id", data))
+			continue;
+
+		/* handle up to 9 ports, due to insufficient handling of hex  */
+		net_port = strtoul(data, NULL, 16);
+
+		snprintf(drpath, 64, "%s/%s/device/infiniband/", SYSFS_NET, dentry->d_name);
+
+		drdir = opendir(drpath);
+		if (!drdir)
+			continue;
+
+		while ((drentry = readdir(drdir))) {
+			if (is_dots(drentry->d_name))
+				continue;
+			if (!strcmp(drentry->d_name, dev_map->dev_name) &&
+			    net_port == port - 1) {
+				ifname = strdup(dentry->d_name);
+				closedir(drdir);
+				return ifname;
+			}
+		}
+
+		closedir(drdir);
+
+	}
+
+	closedir(dir);
+	return NULL;
+}
+
+static struct port_map *port_map_alloc(struct dev_map *dev_map, uint32_t port)
+{
+	struct port_map *port_map;
+
+	port_map = calloc(1, sizeof(*port_map));
+	if (!port_map)
+		return NULL;
+
+	port_map->idx = port;
+	/*
+	 * Expensive operation in sysfs world, need to rescan all net devices.
+	 * Hopefuly, it is one time operation.
+	 * In netlink, it will be simpler
+	 */
+	port_map->ifname = find_ifname(dev_map, port);
+	return port_map;
+}
+
+int dev_map_init(struct rdma *rd)
+{
+	struct dev_map *dev_map;
+	struct port_map *port_map;
+	struct dirent *dentry, *pentry;
+	uint32_t i = 1;
+	uint32_t num_ports = 0;
+	char ports_name[64];
+	DIR *dir, *pdir;
+	dir = opendir(SYSFS_INFINIBAND);
+	if (!dir)
+		return -ENOENT;
+
+	while ((dentry = readdir(dir))) {
+		num_ports = 0;
+		if (is_dots(dentry->d_name))
+			continue;
+
+		dev_map = dev_map_alloc(dentry->d_name);
+		if (!dev_map)
+			/* The main function will cleanup the allocations */
+			return -ENOMEM;
+		list_add_tail(&dev_map->list, &rd->dev_map_list);
+		dev_map->idx = i;
+		i++;
+
+		snprintf(ports_name, 64, "%s/%s/ports/", SYSFS_INFINIBAND, dentry->d_name);
+		pdir = opendir(ports_name);
+		while ((pentry = readdir(pdir))) {
+			int port;
+			if (is_dots(pentry->d_name))
+				continue;
+
+			port = atoi(pentry->d_name);
+			port_map = port_map_alloc(dev_map, port);
+			if (!port_map)
+				return -ENOENT;
+			list_add_tail(&port_map->list, &dev_map->port_map_list);
+			num_ports = MAX(num_ports, port);
+		}
+		closedir(pdir);
+		dev_map->num_ports = num_ports;
+	}
+	closedir(dir);
+
+	/* num_ports == 0 => no devices in infiniband folder */
+	return (num_ports) ? 0:(-ENOENT);
+}
+
+int rdma_exec_cmd(struct rdma *rd, const struct rdma_obj *objs, const char *str)
+{
+	const struct rdma_obj *o;
+
+	/* First argument in objs table is default variant */
+	if (rd_no_arg(rd))
+		return objs->func(rd);
+
+	for (o = objs + 1; o->cmd; ++o) {
+		if (rd_argv_match(rd, o->cmd)) {
+			/* Move to next argument */
+			rd_arg_inc(rd);
+			return o->func(rd);
+		}
+	}
+
+	pr_err("Unknown %s '%s'.\n", str, rd_argv(rd));
+	return 0;
+}
--
2.12.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [RFC iproute2 4/8] rdma: Add IPoIB object
From: Leon Romanovsky @ 2017-05-04 18:02 UTC (permalink / raw)
  To: Stephen Hemminger, Doug Ledford
  Cc: Leon Romanovsky, Jiri Pirko, Ariel Almog, Dennis Dalessandro,
	Ram Amrani, Bart Van Assche, Sagi Grimberg, Jason Gunthorpe,
	Christoph Hellwig, Or Gerlitz, Linux RDMA, Linux Netdev
In-Reply-To: <20170504180216.7665-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

IPoIB object allows configuration and presentation of information for
IP-over-Infiniband user level protocol.

Supported commands are show, set and help.

Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 rdma/Makefile |  2 +-
 rdma/ipoib.c  | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 rdma/rdma.c   |  3 ++-
 rdma/rdma.h   |  1 +
 4 files changed, 58 insertions(+), 2 deletions(-)
 create mode 100644 rdma/ipoib.c

diff --git a/rdma/Makefile b/rdma/Makefile
index cf54ed36..dd702b9f 100644
--- a/rdma/Makefile
+++ b/rdma/Makefile
@@ -1,6 +1,6 @@
 include ../Config

-RDMA_OBJ = rdma.o utils.o dev.o link.o
+RDMA_OBJ = rdma.o utils.o dev.o link.o ipoib.o
 TARGETS=rdma

 all:	$(TARGETS) $(LIBS)
diff --git a/rdma/ipoib.c b/rdma/ipoib.c
new file mode 100644
index 00000000..dd0d0285
--- /dev/null
+++ b/rdma/ipoib.c
@@ -0,0 +1,54 @@
+/*
+ * ipoib.c	RDMA tool
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
+ */
+
+#include "rdma.h"
+
+static int ipoib_help(struct rdma *rd)
+{
+	pr_out("Usage: %s ipoib show NAME | DEV | DEV/PORT\n", rd->filename);
+	pr_out("       %s ipoib set NAME [ accel { off | on } ]\n", rd->filename);
+	pr_out("       %s ipoib start NAME dev DEV\n", rd->filename);
+	pr_out("       %s ipoib stop NAME\n", rd->filename);
+	return 0;
+}
+
+static int ipoib_show(struct rdma *rd)
+{
+	if (rd_no_arg(rd))
+		ipoib_help(rd);
+
+	return 0;
+}
+
+static int ipoib_set(struct rdma *rd)
+{
+	/* Not supported yet */
+	return 0;
+}
+
+int obj_ipoib(struct rdma *rd)
+{
+	const struct rdma_obj objs[] = {
+		{ NULL,		ipoib_show },
+		{ "show",	ipoib_show },
+		{ "list",	ipoib_show },
+		{ "set",	ipoib_set },
+		{ "help",	ipoib_help },
+		{ 0 }
+	};
+
+	if (dev_map_init(rd)) {
+		pr_err("There are no RDMA devices\n");
+		return -ENOENT;
+	}
+
+	return rdma_exec_cmd(rd, objs, "Uknown ipoib command");
+}
diff --git a/rdma/rdma.c b/rdma/rdma.c
index 55cbf0e3..ffd70899 100644
--- a/rdma/rdma.c
+++ b/rdma/rdma.c
@@ -17,7 +17,7 @@
 static void help(char *name)
 {
 	pr_out("Usage: %s [ OPTIONS ] OBJECT { COMMAND | help }\n"
-	       "where  OBJECT := { dev | link }\n"
+	       "where  OBJECT := { dev | link | ipoib }\n"
 	       "       OPTIONS := { -V[ersion] }\n", name);
 }

@@ -33,6 +33,7 @@ static int rd_cmd(struct rdma *rd)
 		{ NULL,		obj_help },
 		{ "dev",	obj_dev },
 		{ "link",	obj_link },
+		{ "ipoib",	obj_ipoib },
 		{ "help",	obj_help },
 		{ 0 }
 	};
diff --git a/rdma/rdma.h b/rdma/rdma.h
index bdb77b5e..1fef4eb8 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -64,6 +64,7 @@ struct rdma_obj {
  */
 int obj_dev(struct rdma *rd);
 int obj_link(struct rdma *rd);
+int obj_ipoib(struct rdma *rd);

 /*
  * Parser interface
--
2.12.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox