Netdev List


* RE: [patch net-next 1/3] idr: Add new APIs to support unsigned long
From: Chris Mi @ 2017-08-29  8:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Hannes Frederic Sowa, netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, davem@davemloft.net,
	mawilcox@microsoft.com
In-Reply-To: <20170829075711.GE1977@nanopsycho.orion>



> -----Original Message-----
> From: Jiri Pirko [mailto:jiri@resnulli.us]
> Sent: Tuesday, August 29, 2017 3:57 PM
> To: Chris Mi <chrism@mellanox.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>;
> netdev@vger.kernel.org; jhs@mojatatu.com; xiyou.wangcong@gmail.com;
> davem@davemloft.net; mawilcox@microsoft.com
> Subject: Re: [patch net-next 1/3] idr: Add new APIs to support unsigned long
> 
> Tue, Aug 29, 2017 at 09:34:47AM CEST, chrism@mellanox.com wrote:
> >Hi,
> >
> >> -----Original Message-----
> >> From: Hannes Frederic Sowa [mailto:hannes@stressinduktion.org]
> >> Sent: Tuesday, August 29, 2017 3:14 PM
> >> To: Chris Mi <chrism@mellanox.com>
> >> Cc: netdev@vger.kernel.org; jhs@mojatatu.com;
> >> xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
> >> mawilcox@microsoft.com
> >> Subject: Re: [patch net-next 1/3] idr: Add new APIs to support
> >> unsigned long
> >>
> >> Hello,
> >>
> >> Chris Mi <chrism@mellanox.com> writes:
> >>
> >> > The following new APIs are added:
> >> >
> >> > int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
> >> >                   unsigned long start, unsigned long end, gfp_t gfp);
> >> > static inline void *idr_remove_ext(struct idr *idr, unsigned long id);
> >> > static inline void *idr_find_ext(const struct idr *idr, unsigned long id);
> >> > void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
> >> > void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
> >> >
> >> > Signed-off-by: Chris Mi <chrism@mellanox.com>
> >> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >> > ---
> >> >  include/linux/idr.h        | 16 ++++++++++
> >> >  include/linux/radix-tree.h |  3 ++
> >> >  lib/idr.c                  | 56 +++++++++++++++++++++++++++++++++++
> >> >  lib/radix-tree.c           | 73 ++++++++++++++++++++++++++++++++++++++++++++++
> >> >  4 files changed, 148 insertions(+)
> >> >
> >>
> >> [...]
> >>
> >> > +int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
> >> > +		  unsigned long start, unsigned long end, gfp_t gfp) {
> >> > +	void __rcu **slot;
> >> > +	struct radix_tree_iter iter;
> >> > +
> >> > +	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
> >> > +		return -EINVAL;
> >> > +
> >> > +	radix_tree_iter_init(&iter, start);
> >> > +	slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
> >> > +	if (IS_ERR(slot))
> >> > +		return PTR_ERR(slot);
> >> > +
> >> > +	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
> >> > +	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
> >> > +
> >> > +	if (index)
> >> > +		*index = iter.index;
> >> > +	return 0;
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(idr_alloc_ext);
> >>
> >> Can you express idr_alloc in terms of idr_alloc_ext? Same for most of
> >> the other functions (it seems that signed int was used as return
> >> value to indicate error cases, thus it should be easy to map those).
> >In idr_alloc(), we have the following check:
> >
> >        if (WARN_ON_ONCE(start < 0))
> >                return -EINVAL;
> >
> >But in idr_alloc_ext(), since we are using unsigned long, we needn't such
> >check.
> 
> You can just check and call idr_alloc_ext then to do the actual work.
OK, will fix it.


* Re: [patch net-next 1/3] idr: Add new APIs to support unsigned long
From: Jiri Pirko @ 2017-08-29  7:57 UTC (permalink / raw)
  To: Chris Mi
  Cc: Hannes Frederic Sowa, netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, davem@davemloft.net,
	mawilcox@microsoft.com
In-Reply-To: <VI1PR0501MB214343F199709BB6EF6EB9B2AB9F0@VI1PR0501MB2143.eurprd05.prod.outlook.com>

Tue, Aug 29, 2017 at 09:34:47AM CEST, chrism@mellanox.com wrote:
>Hi,
>
>> -----Original Message-----
>> From: Hannes Frederic Sowa [mailto:hannes@stressinduktion.org]
>> Sent: Tuesday, August 29, 2017 3:14 PM
>> To: Chris Mi <chrism@mellanox.com>
>> Cc: netdev@vger.kernel.org; jhs@mojatatu.com;
>> xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
>> mawilcox@microsoft.com
>> Subject: Re: [patch net-next 1/3] idr: Add new APIs to support unsigned long
>> 
>> Hello,
>> 
>> Chris Mi <chrism@mellanox.com> writes:
>> 
>> > The following new APIs are added:
>> >
>> > int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
>> >                   unsigned long start, unsigned long end, gfp_t gfp);
>> > static inline void *idr_remove_ext(struct idr *idr, unsigned long id);
>> > static inline void *idr_find_ext(const struct idr *idr, unsigned long id);
>> > void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
>> > void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
>> >
>> > Signed-off-by: Chris Mi <chrism@mellanox.com>
>> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> > ---
>> >  include/linux/idr.h        | 16 ++++++++++
>> >  include/linux/radix-tree.h |  3 ++
>> >  lib/idr.c                  | 56 +++++++++++++++++++++++++++++++++++
>> >  lib/radix-tree.c           | 73 ++++++++++++++++++++++++++++++++++++++++++++++
>> >  4 files changed, 148 insertions(+)
>> >
>> 
>> [...]
>> 
>> > +int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
>> > +		  unsigned long start, unsigned long end, gfp_t gfp) {
>> > +	void __rcu **slot;
>> > +	struct radix_tree_iter iter;
>> > +
>> > +	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
>> > +		return -EINVAL;
>> > +
>> > +	radix_tree_iter_init(&iter, start);
>> > +	slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
>> > +	if (IS_ERR(slot))
>> > +		return PTR_ERR(slot);
>> > +
>> > +	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
>> > +	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
>> > +
>> > +	if (index)
>> > +		*index = iter.index;
>> > +	return 0;
>> > +}
>> > +EXPORT_SYMBOL_GPL(idr_alloc_ext);
>> 
>> Can you express idr_alloc in terms of idr_alloc_ext? Same for most of the
>> other functions (it seems that signed int was used as return value to indicate
>> error cases, thus it should be easy to map those).
>In idr_alloc(), we have the following check:
>
>        if (WARN_ON_ONCE(start < 0))
>                return -EINVAL;
>
>But in idr_alloc_ext(), since we are using unsigned long, we needn't such check.

You can just check and call idr_alloc_ext then to do the actual work.
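
A minimal userspace sketch of the shape being suggested here: the int-based
entry point keeps its sign check and hands the real work to the unsigned long
variant. Everything below is a mock for illustration only. `struct mock_idr`,
`mock_idr_alloc_ext()` and `mock_idr_alloc()` are hypothetical names, a small
array stands in for the radix tree, and treating `end` as exclusive with
`end <= 0` meaning "no upper limit" is an assumption about the int API, not
something stated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MOCK_IDR_SLOTS 8	/* hypothetical: tiny ID space for the mock */

struct mock_idr {
	void *slot[MOCK_IDR_SLOTS];	/* NULL means "free" in this mock */
};

/*
 * Stand-in for idr_alloc_ext(): store ptr at the lowest free id in
 * [start, end). end == 0 is taken to mean "no upper bound" (assumption).
 */
int mock_idr_alloc_ext(struct mock_idr *idr, void *ptr, unsigned long *index,
		       unsigned long start, unsigned long end)
{
	unsigned long max = end ? end : ULONG_MAX;
	unsigned long id;

	assert(ptr != NULL);	/* a NULL ptr would look like a free slot */

	for (id = start; id < max && id < MOCK_IDR_SLOTS; id++) {
		if (!idr->slot[id]) {
			idr->slot[id] = ptr;
			if (index)
				*index = id;
			return 0;
		}
	}
	return -ENOSPC;
}

/* int-based wrapper: do the sign check here, then delegate. */
int mock_idr_alloc(struct mock_idr *idr, void *ptr, int start, int end)
{
	unsigned long id;
	int ret;

	if (start < 0)
		return -EINVAL;

	/* Cap the range at INT_MAX so the id always fits the int return. */
	ret = mock_idr_alloc_ext(idr, ptr, &id, (unsigned long)start,
				 end > 0 ? (unsigned long)end
					 : (unsigned long)INT_MAX + 1UL);
	if (ret)
		return ret;

	return (int)id;
}
```

The wrapper only adds what the unsigned variant cannot express: the negative
`start` check and the mapping of the allocated id back into the int return
value.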


* Re: Question about ip_defrag
From: Florian Westphal @ 2017-08-29  7:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Florian Westphal, liujian (CE), davem@davemloft.net,
	kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org,
	elena.reshetova@intel.com, edumazet@google.com,
	netdev@vger.kernel.org, Wangkefeng (Kevin), weiyongjun (A)
In-Reply-To: <20170829092021.0a46fffa@redhat.com>

Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> On Mon, 28 Aug 2017 16:00:32 +0200
> Florian Westphal <fw@strlen.de> wrote:
> 
> > liujian (CE) <liujian56@huawei.com> wrote:
> > > Hi
> > > 
> > > I checked our 3.10 kernel, we had backported all percpu_counter bug fix in lib/percpu_counter.c and include/linux/percpu_counter.h.
> > > And I check 4.13-rc6, also has the issue if NIC's rx cpu num big enough.
> > >   
> > > > > > > the issue:
> > > > > > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > > > > > At this moment,sum_frag_mem_limit is about 10K.  
> > > 
> > > So should we change ipfrag high/low thresh to a reasonable value ? 
> > > And if it is, is there a standard to change the value?  
> > 
> > Each CPU can hold up to frag_percpu_counter_batch bytes that the rest
> > don't know about, so with 64 CPUs that is ~8 MB.
> > 
> > possible solutions:
> > 1. reduce frag_percpu_counter_batch to 16k or so
> > 2. make both low and high thresh depend on NR_CPUS

I take 2) back.  It's wrong to do this; for large NR_CPUS values it
would even overflow.

> To me it looks like we/I have been using the wrong API for comparing
> against percpu_counters.  I guess we should have used __percpu_counter_compare().

Are you sure?  For liujian's use case (64 cores) it looks like we would
always fall through to percpu_counter_sum(), so we would eat the
spinlock_irqsave cost for all compares.

Before we entertain this we should consider reducing frag_percpu_counter_batch
to a smaller value.
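
The objection above can be put into numbers with a small userspace sketch.
It mirrors only the first test in __percpu_counter_compare() as quoted in
this thread. The function names are hypothetical, and the batch value of
130000 bytes used in the note below is an assumption about
frag_percpu_counter_batch, not a figure given in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Worst-case bytes that per-CPU deltas can hide from the global estimate. */
int64_t max_hidden_bytes(int32_t batch, int ncpus)
{
	assert(batch > 0 && ncpus > 0);
	return (int64_t)batch * ncpus;
}

/*
 * Same condition as the "rough count" test in __percpu_counter_compare():
 * returns 1 when the cheap global estimate is decisive, 0 when the
 * precise (locked) percpu_counter_sum() path would have to be taken.
 */
int rough_count_is_enough(int64_t count, int64_t rhs, int32_t batch, int ncpus)
{
	return llabs(count - rhs) > max_hidden_bytes(batch, ncpus);
}
```

With a batch of 130000 and 64 CPUs the window is 8320000 bytes, which is
larger than the 4M high_thresh itself, so any estimate between 0 and the
threshold lands in the precise path. With a 16k batch the window shrinks to
about 1 MB and the cheap path becomes usable again for estimates far from
the threshold.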


* Re: [patch net-next 1/3] idr: Add new APIs to support unsigned long
From: Jiri Pirko @ 2017-08-29  7:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Chris Mi, netdev, jhs, xiyou.wangcong, davem, mawilcox
In-Reply-To: <87y3q27sn7.fsf@stressinduktion.org>

Tue, Aug 29, 2017 at 09:14:04AM CEST, hannes@stressinduktion.org wrote:
>Hello,
>
>Chris Mi <chrism@mellanox.com> writes:
>
>> The following new APIs are added:
>>
>> int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
>>                   unsigned long start, unsigned long end, gfp_t gfp);
>> static inline void *idr_remove_ext(struct idr *idr, unsigned long id);
>> static inline void *idr_find_ext(const struct idr *idr, unsigned long id);
>> void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
>> void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
>>
>> Signed-off-by: Chris Mi <chrism@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/linux/idr.h        | 16 ++++++++++
>>  include/linux/radix-tree.h |  3 ++
>>  lib/idr.c                  | 56 +++++++++++++++++++++++++++++++++++
>>  lib/radix-tree.c           | 73 ++++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 148 insertions(+)
>>
>
>[...]
>
>> +int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
>> +		  unsigned long start, unsigned long end, gfp_t gfp)
>> +{
>> +	void __rcu **slot;
>> +	struct radix_tree_iter iter;
>> +
>> +	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
>> +		return -EINVAL;
>> +
>> +	radix_tree_iter_init(&iter, start);
>> +	slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
>> +	if (IS_ERR(slot))
>> +		return PTR_ERR(slot);
>> +
>> +	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
>> +	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
>> +
>> +	if (index)
>> +		*index = iter.index;
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(idr_alloc_ext);
>
>Can you express idr_alloc in terms of idr_alloc_ext? Same for most of
>the other functions (it seems that signed int was used as return value
>to indicate error cases, thus it should be easy to map those).

Agreed. Same for free function.


>
>[...]
>
>Thanks,
>Hannes


* Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
From: Jiri Pirko @ 2017-08-29  7:55 UTC (permalink / raw)
  To: David Ahern
  Cc: Arkadi Sharshevsky, netdev, davem, idosch, mlxsw, roopa,
	Shrijeet Mukherjee
In-Reply-To: <f5bd669b-6a0f-46e9-97b9-7abe73a228a5@gmail.com>

Tue, Aug 29, 2017 at 04:57:12AM CEST, dsahern@gmail.com wrote:
>On 8/27/17 2:31 AM, Arkadi Sharshevsky wrote:
>>> Also, this dpipe capability seems to be just dumping data structures
>>> maintained by the driver. ie., you can compare the mlxsw view of
>>> networking state to IPv4 and IPv6 level tables. Any plans to offer a
>>> command that reads data from the h/w and passes that back to the user?
>>> i.e, a command to compare kernel tables to h/w state?
>>>
>> 
>> So this infra should provide several things:
>> 
>> 1) Reveal the interactions between various hardware tables
>> 2) Counters for these tables
>> 3) Debuggability
>> 
>> The first two can be achieved right now. Regarding debuggability, which
>> is a bit vague, the current assumption is that the driver's internal data
>> structures are synced with hardware (which is not always true), and maybe
>> are not synced with the kernel, so this can be achieved right now by
>> dumping the internal state of the driver. Furthermore, the counters are
>> dumped from the hardware and give the user additional indication.
>> 
>> I completely agree that the hardware should be dumped in order to
>> validate the internal data structures are really synced with HW. This
>> could be usable for observing data corruptions inside the ASIC and
>> various complex bugs.
>> 
>> In order to address that I thought about maybe adding a flag called
>> "validate_hw" so that during the dump the driver<-->hw state could be
>> validated.
>> 
>> What do you think about it?
>
>It is not just a matter of dumping hardware state. The data returned by
>dump needs to be consistent across platforms and vendors.
>
>If the intent is validating hardware state matches kernel state (ie.,

Nope, that is definitely not the intent. The intent is to provide the user
with more information about what the actual tables in hw look like, so they
know exactly what is going on there and can eventually optimize things
if needed (resource allocations, for example).


>h/w forwarding matches s/w forwarding), then the hardware state should
>be dumped by the driver in a form that parallels kernel state. e.g.,
>dump h/w routes, neighbor entries, fdb's in a form and granularity
>similar to what is done for kernel tables.
>
>With the recent dpipe changes that allows kernel to driver cache and
>kernel to h/w state comparisons.


* Re: [ethtool] ethtool: Remove UDP Fragmentation Offload use from ethtool
From: Tariq Toukan @ 2017-08-29  7:50 UTC (permalink / raw)
  To: John W. Linville, Eric Dumazet, David Miller
  Cc: netdev, Eran Ben Elisha, Shaker Daibes
In-Reply-To: <20170828182251.GC3092@tuxdriver.com>

On 28/08/2017 9:22 PM, John W. Linville wrote:
> On Mon, Aug 28, 2017 at 08:00:11AM -0700, Eric Dumazet wrote:
>> On Mon, 2017-08-28 at 15:38 +0300, Tariq Toukan wrote:
>>> From: Shaker Daibes <shakerd@mellanox.com>
>>>
>>> UFO was removed in kernel, here we remove it in ethtool app.
>>>
>>> Fixes the following issue:
>>> Features for ens8:
>>> Cannot get device udp-fragmentation-offload settings: Operation not supported
>>>
>>> Tested with "make check"
>>>
>>> Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
>>> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>>> ---
>>
>>
>> Hi guys
>>
>> I would rather remove the warning, but leave the ability to switch UFO
>> on machines running old kernel but a recent ethtool.
>>
>> ethtool does not need to be downgraded every time we boot an old
>> kernel ;)

Thanks all for your quick replies.

We thought about the backward compatibility issue before writing this
patch.
But since very few devices support the feature, and it is not that useful,
we thought it would be best to remove it from ethtool entirely.

We can rework this so the feature is still available on old kernels.

But I wonder how the warning should be removed.

I have some suggestions in mind:
1) Add a special condition that suppresses the warning only in the
case of UFO?
2) Remove the warning entirely? I don't like this option.
3) Add a max_kernel_ver field in struct off_flag_def, and use it to not
print the warning, or to mark the feature 'off [fixed]'.

Please let me know what you think.

> 
> No, definitely not.
>   
>> Thanks !
> 
> Tariq, will you be reworking this as Eric suggests?

Yes, once we decide on the correct way to keep it backward compatible.

> 
> John
> 

Regards,
Tariq Toukan


* Re: [PATCH] DSA support for Micrel KSZ8895
From: Pavel Machek @ 2017-08-29  7:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Woojung.Huh, nathan.leigh.conrad, vivien.didelot, f.fainelli,
	netdev, linux-kernel, Tristram.Ha
In-Reply-To: <20170828140927.GD10418@lunn.ch>

On Mon 2017-08-28 16:09:27, Andrew Lunn wrote:
> > I may be confused here, but AFAICT:
> > 
> > 1) Yes, it has standard layout when accessed over MDIO. 
> 
> 
> Section 4.8 of the datasheet says:
> 
> 	All the registers defined in this section can be also accessed
> 	via the SPI interface.
> 
> Meaning all PHY registers can be accessed via the SPI interface. So you
> should be able to make a standard Linux MDIO bus driver which performs
> SPI reads.

As far as I can tell (and their driver confirms), yes, all those
registers can be accessed over SPI; they are just shuffled
around, hence the MDIO emulation code. I copied it from their code (see
the copyrights), so no, I don't believe there's a nicer solution.

Best regards,

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* RE: Question about ip_defrag
From: liujian (CE) @ 2017-08-29  7:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Florian Westphal
  Cc: davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, elena.reshetova@intel.com,
	edumazet@google.com, netdev@vger.kernel.org, Wangkefeng (Kevin),
	weiyongjun (A)
In-Reply-To: <20170829092021.0a46fffa@redhat.com>


> -----Original Message-----
> From: Jesper Dangaard Brouer [mailto:brouer@redhat.com]
> Sent: Tuesday, August 29, 2017 3:20 PM
> To: Florian Westphal
> Cc: liujian (CE); davem@davemloft.net; kuznet@ms2.inr.ac.ru;
> yoshfuji@linux-ipv6.org; elena.reshetova@intel.com; edumazet@google.com;
> netdev@vger.kernel.org; Wangkefeng (Kevin); weiyongjun (A);
> brouer@redhat.com
> Subject: Re: Question about ip_defrag
> 
> On Mon, 28 Aug 2017 16:00:32 +0200
> Florian Westphal <fw@strlen.de> wrote:
> 
> > liujian (CE) <liujian56@huawei.com> wrote:
> > > Hi
> > >
> > > I checked our 3.10 kernel, we had backported all percpu_counter bug fix in lib/percpu_counter.c and include/linux/percpu_counter.h.
> > > And I check 4.13-rc6, also has the issue if NIC's rx cpu num big enough.
> > >
> > > > > > > the issue:
> > > > > > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > > > > > At this moment,sum_frag_mem_limit is about 10K.
> > >
> > > So should we change ipfrag high/low thresh to a reasonable value ?
> > > And if it is, is there a standard to change the value?
> >
> > Each CPU can hold up to frag_percpu_counter_batch bytes that the rest
> > don't know about, so with 64 CPUs that is ~8 MB.
> >
> > possible solutions:
> > 1. reduce frag_percpu_counter_batch to 16k or so
> > 2. make both low and high thresh depend on NR_CPUS
> 
> To me it looks like we/I have been using the wrong API for comparing against
> percpu_counters.  I guess we should have used
> __percpu_counter_compare().

Do you mean changing
if (frag_mem_limit(nf) > nf->low_thresh)
to
if (__percpu_counter_compare(&nf->mem, nf->low_thresh, frag_percpu_counter_batch) > 0)
?

> /*
>  * Compare counter against given value.
>  * Return 1 if greater, 0 if equal and -1 if less
>  */
> int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
> {
> 	s64	count;
> 
> 	count = percpu_counter_read(fbc);
> 	/* Check to see if rough count will be sufficient for comparison */
> 	if (abs(count - rhs) > (batch * num_online_cpus())) {
> 		if (count > rhs)
> 			return 1;
> 		else
> 			return -1;
> 	}
> 	/* Need to use precise count */
> 	count = percpu_counter_sum(fbc);
> 	if (count > rhs)
> 		return 1;
> 	else if (count < rhs)
> 		return -1;
> 	else
> 		return 0;
> }
> EXPORT_SYMBOL(__percpu_counter_compare);
> 
> 
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH] DSA support for Micrel KSZ8895
From: Pavel Machek @ 2017-08-29  7:41 UTC (permalink / raw)
  To: Maxim Uvarov
  Cc: Andrew Lunn, Woojung.Huh, nathan.leigh.conrad, Vivien Didelot,
	Florian Fainelli, netdev, linux-kernel, Tristram.Ha
In-Reply-To: <CAJGZr0K46jBd9Sn4HNPgAeHTQtBuBHzX87GveAJf3_5b_WM69g@mail.gmail.com>

Hi!

> Micrel has some drivers on their web site to support some chips. For
> that chips they do virtual mdio over spi.
> And driver is available on download page:
> http://www.microchip.com/wwwproducts/en/KSZ8895
> 
> Documentation->Software library.
> 
> Both driver and DSA driver. Driver has to work with some minor fixups
> related to your kernel version. But I think they are don't care about
> up-streaming that code.
> So you can take their code as a reference.

"Minor fixups". Take a look at the driver... I wanted to do "minor
fixups". It turned out it was easier to start from scratch.

But the MDIO emulation code is from their driver, after lots of
deletions.

								Pavel
								
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* RE: Question about ip_defrag
From: liujian (CE) @ 2017-08-29  7:40 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Jesper Dangaard Brouer, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, elena.reshetova@intel.com,
	edumazet@google.com, netdev@vger.kernel.org, Wangkefeng (Kevin),
	weiyongjun (A)
In-Reply-To: <20170828140032.GB12926@breakpoint.cc>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Florian Westphal
> Sent: Monday, August 28, 2017 10:01 PM
> To: liujian (CE)
> Cc: Jesper Dangaard Brouer; davem@davemloft.net; kuznet@ms2.inr.ac.ru;
> yoshfuji@linux-ipv6.org; elena.reshetova@intel.com; edumazet@google.com;
> netdev@vger.kernel.org; Wangkefeng (Kevin); weiyongjun (A)
> Subject: Re: Question about ip_defrag
> 
> liujian (CE) <liujian56@huawei.com> wrote:
> > Hi
> >
> > I checked our 3.10 kernel, we had backported all percpu_counter bug fix in lib/percpu_counter.c and include/linux/percpu_counter.h.
> > And I check 4.13-rc6, also has the issue if NIC's rx cpu num big enough.
> >
> > > > > > the issue:
> > > > > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > > > > At this moment,sum_frag_mem_limit is about 10K.
> >
> > So should we change ipfrag high/low thresh to a reasonable value ?
> > And if it is, is there a standard to change the value?
>
> Each CPU can hold up to frag_percpu_counter_batch bytes that the rest
> don't know about, so with 64 CPUs that is ~8 MB.
>
> possible solutions:
> 1. reduce frag_percpu_counter_batch to 16k or so
> 2. make both low and high thresh depend on NR_CPUS
> 
Thank you for your reply.
 
> liujian, does this change help in any way?

I will give it a try.

> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -123,6 +123,17 @@ static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
>  	       frag_mem_limit(q->net) >= q->net->low_thresh;
>  }
> 
> +/* ->mem batch size is huge, this can cause severe discrepancies
> + * between actual value (sum of pcpu values) and the global estimate.
> + *
> + * Use a smaller batch to give an opportunity for the global estimate
> + * to more accurately reflect current state.
> + */
> +static void update_frag_mem_limit(struct netns_frags *nf, unsigned int batch)
> +{
> +	 percpu_counter_add_batch(&nf->mem, 0, batch);
> +}
> +
>  static unsigned int
>  inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
>  {
> @@ -146,8 +157,12 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
> 
>  	spin_unlock(&hb->chain_lock);
> 
> -	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
> +	hlist_for_each_entry_safe(fq, n, &expired, list_evictor) {
> +		struct netns_frags *nf = fq->net;
> +
>  		f->frag_expire((unsigned long) fq);
> +		update_frag_mem_limit(nf, 1);

> +	}
> 
>  	return evicted;
>  }
> @@ -396,8 +411,10 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
>  	struct inet_frag_queue *q;
>  	int depth = 0;
> 
> -	if (frag_mem_limit(nf) > nf->low_thresh)
> +	if (frag_mem_limit(nf) > nf->low_thresh) {
>  		inet_frag_schedule_worker(f);
> +		update_frag_mem_limit(nf, SKB_TRUESIZE(1500) * 16); 
> +	}
> 
>  	hash &= (INETFRAGS_HASHSZ - 1);
>  	hb = &f->hash[hash];
> @@ -416,6 +433,8 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
>  	if (depth <= INETFRAGS_MAXDEPTH)
>  		return inet_frag_create(nf, f, key);
> 
> +	update_frag_mem_limit(nf, 1);
> +
>  	if (inet_frag_may_rebuild(f)) {
>  		if (!f->rebuild)
>  			f->rebuild = true;
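
Why adding 0 with a small batch helps can be seen in a toy userspace model
of a batched per-CPU counter. This is only a sketch of the flush rule as I
understand it: `struct model_counter` and the `model_*` helpers are
hypothetical names, there is no locking or real per-CPU storage, and the
CPU index is passed explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NCPUS 4	/* hypothetical CPU count for the model */

struct model_counter {
	int64_t count;			/* global estimate */
	int64_t local[MODEL_NCPUS];	/* per-CPU deltas not yet folded in */
};

/* Fold the local delta into the global count once it reaches +/- batch. */
void model_add_batch(struct model_counter *c, int cpu, int64_t amount,
		     int32_t batch)
{
	int64_t sum;

	assert(cpu >= 0 && cpu < MODEL_NCPUS);
	sum = c->local[cpu] + amount;
	if (sum >= batch || sum <= -batch) {
		c->count += sum;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = sum;
	}
}

/* Cheap read: the global estimate only. */
int64_t model_read(const struct model_counter *c)
{
	return c->count;
}

/* Precise read: global estimate plus every per-CPU delta. */
int64_t model_sum(const struct model_counter *c)
{
	int64_t total = c->count;
	int cpu;

	for (cpu = 0; cpu < MODEL_NCPUS; cpu++)
		total += c->local[cpu];
	return total;
}
```

If CPU 0 adds 130000 with a batch of 130000, the amount is folded into the
global count. If CPU 1 then frees 129999, that stays in its local delta, so
the cheap read still reports 130000 while the precise sum is 1. A following
model_add_batch(&c, 1, 0, 1) folds the pending delta in and the estimate
drops to 1, which is the effect the quoted update_frag_mem_limit(nf, 1)
calls are after.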


* RE: [patch net-next 1/3] idr: Add new APIs to support unsigned long
From: Chris Mi @ 2017-08-29  7:34 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	mawilcox@microsoft.com
In-Reply-To: <87y3q27sn7.fsf@stressinduktion.org>

Hi,

> -----Original Message-----
> From: Hannes Frederic Sowa [mailto:hannes@stressinduktion.org]
> Sent: Tuesday, August 29, 2017 3:14 PM
> To: Chris Mi <chrism@mellanox.com>
> Cc: netdev@vger.kernel.org; jhs@mojatatu.com;
> xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
> mawilcox@microsoft.com
> Subject: Re: [patch net-next 1/3] idr: Add new APIs to support unsigned long
> 
> Hello,
> 
> Chris Mi <chrism@mellanox.com> writes:
> 
> > The following new APIs are added:
> >
> > int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
> >                   unsigned long start, unsigned long end, gfp_t gfp);
> > static inline void *idr_remove_ext(struct idr *idr, unsigned long id);
> > static inline void *idr_find_ext(const struct idr *idr, unsigned long id);
> > void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
> > void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
> >
> > Signed-off-by: Chris Mi <chrism@mellanox.com>
> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> > ---
> >  include/linux/idr.h        | 16 ++++++++++
> >  include/linux/radix-tree.h |  3 ++
> >  lib/idr.c                  | 56 +++++++++++++++++++++++++++++++++++
> >  lib/radix-tree.c           | 73 ++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 148 insertions(+)
> >
> 
> [...]
> 
> > +int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
> > +		  unsigned long start, unsigned long end, gfp_t gfp) {
> > +	void __rcu **slot;
> > +	struct radix_tree_iter iter;
> > +
> > +	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
> > +		return -EINVAL;
> > +
> > +	radix_tree_iter_init(&iter, start);
> > +	slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
> > +	if (IS_ERR(slot))
> > +		return PTR_ERR(slot);
> > +
> > +	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
> > +	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
> > +
> > +	if (index)
> > +		*index = iter.index;
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(idr_alloc_ext);
> 
> Can you express idr_alloc in terms of idr_alloc_ext? Same for most of the
> other functions (it seems that signed int was used as return value to indicate
> error cases, thus it should be easy to map those).
In idr_alloc(), we have the following check:

        if (WARN_ON_ONCE(start < 0))
                return -EINVAL;

But in idr_alloc_ext(), since we are using unsigned long, we don't need such a check.

Just to reuse several lines of code, I don't think it is worth expressing idr_alloc()
in terms of idr_alloc_ext().

Thanks,
Chris
> 
> [...]
> 
> Thanks,
> Hannes


* Re: Question about ip_defrag
From: Jesper Dangaard Brouer @ 2017-08-29  7:20 UTC (permalink / raw)
  To: Florian Westphal
  Cc: liujian (CE), davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, elena.reshetova@intel.com,
	edumazet@google.com, netdev@vger.kernel.org, Wangkefeng (Kevin),
	weiyongjun (A), brouer
In-Reply-To: <20170828140032.GB12926@breakpoint.cc>

On Mon, 28 Aug 2017 16:00:32 +0200
Florian Westphal <fw@strlen.de> wrote:

> liujian (CE) <liujian56@huawei.com> wrote:
> > Hi
> > 
> > I checked our 3.10 kernel, we had backported all percpu_counter bug fix in lib/percpu_counter.c and include/linux/percpu_counter.h.
> > And I check 4.13-rc6, also has the issue if NIC's rx cpu num big enough.
> >   
> > > > > > the issue:
> > > > > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > > > > At this moment,sum_frag_mem_limit is about 10K.  
> > 
> > So should we change ipfrag high/low thresh to a reasonable value ? 
> > And if it is, is there a standard to change the value?  
> 
> Each CPU can hold up to frag_percpu_counter_batch bytes that the rest
> don't know about, so with 64 CPUs that is ~8 MB.
> 
> possible solutions:
> 1. reduce frag_percpu_counter_batch to 16k or so
> 2. make both low and high thresh depend on NR_CPUS

To me it looks like we/I have been using the wrong API for comparing
against percpu_counters.  I guess we should have used __percpu_counter_compare().

/*
 * Compare counter against given value.
 * Return 1 if greater, 0 if equal and -1 if less
 */
int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
{
	s64	count;

	count = percpu_counter_read(fbc);
	/* Check to see if rough count will be sufficient for comparison */
	if (abs(count - rhs) > (batch * num_online_cpus())) {
		if (count > rhs)
			return 1;
		else
			return -1;
	}
	/* Need to use precise count */
	count = percpu_counter_sum(fbc);
	if (count > rhs)
		return 1;
	else if (count < rhs)
		return -1;
	else
		return 0;
}
EXPORT_SYMBOL(__percpu_counter_compare);


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [Intel-wired-lan] [PATCH] e1000e: changed some expensive calls of udelay to usleep_range
From: Neftin, Sasha @ 2017-08-29  7:19 UTC (permalink / raw)
  To: Matthew Tan, jeffrey.t.kirsher
  Cc: michael.kardonik, mitch.a.williams, linux-kernel, john.ronciak,
	intel-wired-lan, netdev
In-Reply-To: <1503503985-3869-1-git-send-email-matthew.tan_1@nxp.com>

On 8/23/2017 18:59, Matthew Tan wrote:
>      Calls to udelay are not preemptible by userspace, so userspace
>      applications experience a large (~200us) latency when running on core
>      0. Instead, usleep_range can be used to be more friendly to userspace
>      since it is preemptible. This is because udelay uses busy-wait loops
>      while usleep_range uses hrtimers instead. It is recommended to use
>      udelay when the delay is <10us, since at that precision the overhead of
>      usleep_range hrtimer setup causes issues. However, the replaced calls
>      are for 50us and 100us, so this should not be an issue.
>
> Signed-off-by: Matthew Tan <matthew.tan_1@nxp.com>
> ---
>   drivers/net/ethernet/intel/e1000e/phy.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> index de13aea..e318fdc 100644
> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> @@ -158,7 +158,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data)
>   	 * the lower time out
>   	 */
>   	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
> -		udelay(50);
> +		usleep_range(40, 60);
>   		mdic = er32(MDIC);
>   		if (mdic & E1000_MDIC_READY)
>   			break;
> @@ -183,7 +183,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data)
>   	 * reading duplicate data in the next MDIC transaction.
>   	 */
>   	if (hw->mac.type == e1000_pch2lan)
> -		udelay(100);
> +		usleep_range(90, 100);
>   
>   	return 0;
>   }
> @@ -222,7 +222,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
>   	 * the lower time out
>   	 */
>   	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
> -		udelay(50);
> +		usleep_range(40, 60);
>   		mdic = er32(MDIC);
>   		if (mdic & E1000_MDIC_READY)
>   			break;
> @@ -246,7 +246,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
>   	 * reading duplicate data in the next MDIC transaction.
>   	 */
>   	if (hw->mac.type == e1000_pch2lan)
> -		udelay(100);
> +		usleep_range(90, 110);
>   
>   	return 0;
>   }

Reasonable. Do you have an open bug or other reference describing this
problem?
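
For readers following the thread, the loops touched by the patch share one shape: delay, read a status register, stop once a ready bit is set or the attempt budget runs out. The sketch below is a userspace illustration of that shape only, not the e1000e code. The delay primitive is passed in as a callback so that a busy-wait (udelay-like) or a sleeping (usleep_range-like) implementation can be swapped in. All names here (`poll_ready`, `fake_read`, `counting_delay`, `TOY_READY`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative ready bit; stands in for E1000_MDIC_READY in the patch. */
#define TOY_READY 0x10000000u

typedef uint32_t (*read_reg_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int min_us, unsigned int max_us);

/*
 * Poll a register until TOY_READY is set or max_polls attempts have
 * been made. The delay comes first, then the read, in the same order
 * as the loops in the patch. Returns true if the ready bit was seen
 * and stores the last value read in *last.
 */
static bool poll_ready(read_reg_fn read_reg, void *ctx, delay_fn delay,
		       unsigned int max_polls, uint32_t *last)
{
	uint32_t val = 0;
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		delay(40, 60);
		val = read_reg(ctx);
		if (val & TOY_READY)
			break;
	}
	*last = val;
	return (val & TOY_READY) != 0;
}

/* Test double: a register that reports ready on the Nth read. */
struct fake_reg {
	unsigned int reads;
	unsigned int ready_after;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	r->reads++;
	return r->reads >= r->ready_after ? TOY_READY : 0;
}

/* Test double: counts delay calls instead of waiting. */
static unsigned int delay_calls;

static void counting_delay(unsigned int min_us, unsigned int max_us)
{
	(void)min_us;
	(void)max_us;
	delay_calls++;
}
```

Whichever delay primitive is used, the loop body is identical; the difference is whether the CPU spins or is free to run other tasks during each wait. A sleeping delay is only valid in a context that is allowed to sleep, which is something the real call sites would need to guarantee.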

^ permalink raw reply

* Re: [patch net-next 1/3] idr: Add new APIs to support unsigned long
From: Hannes Frederic Sowa @ 2017-08-29  7:14 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, mawilcox
In-Reply-To: <1503902477-39829-2-git-send-email-chrism@mellanox.com>

Hello,

Chris Mi <chrism@mellanox.com> writes:

> The following new APIs are added:
>
> int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
>                   unsigned long start, unsigned long end, gfp_t gfp);
> static inline void *idr_remove_ext(struct idr *idr, unsigned long id);
> static inline void *idr_find_ext(const struct idr *idr, unsigned long id);
> void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
> void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
>
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/linux/idr.h        | 16 ++++++++++
>  include/linux/radix-tree.h |  3 ++
>  lib/idr.c                  | 56 +++++++++++++++++++++++++++++++++++
>  lib/radix-tree.c           | 73 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 148 insertions(+)
>

[...]

> +int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
> +		  unsigned long start, unsigned long end, gfp_t gfp)
> +{
> +	void __rcu **slot;
> +	struct radix_tree_iter iter;
> +
> +	if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
> +		return -EINVAL;
> +
> +	radix_tree_iter_init(&iter, start);
> +	slot = idr_get_free_ext(&idr->idr_rt, &iter, gfp, end);
> +	if (IS_ERR(slot))
> +		return PTR_ERR(slot);
> +
> +	radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr);
> +	radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE);
> +
> +	if (index)
> +		*index = iter.index;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(idr_alloc_ext);

Can you express idr_alloc in terms of idr_alloc_ext? Same for most of
the other functions (it seems that signed int was used as return value
to indicate error cases, thus it should be easy to map those).

[...]

Thanks,
Hannes
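
As an illustration of the suggestion above, an int-based allocator can be a thin wrapper around an unsigned long one: the wrapper rejects negative starts, forwards the call, and folds the allocated id into the return value. The sketch below is a userspace toy, not the patch's code. `toy_idr`, `toy_alloc_ext` and `toy_alloc` are hypothetical names, the toy treats `end` as exclusive, and it uses a NULL slot to mean "free" (so callers must store non-NULL pointers). The convention that `end <= 0` means "no upper bound" mirrors the documented behaviour of `idr_alloc()`.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define TOY_SLOTS 8UL

/* Toy stand-in for struct idr: a fixed table of pointers. */
struct toy_idr {
	void *slot[TOY_SLOTS];
};

/*
 * Shaped like idr_alloc_ext(): searches [start, end), reports the id
 * through *index, and returns 0 or a negative errno.
 */
static int toy_alloc_ext(struct toy_idr *idr, void *ptr,
			 unsigned long *index,
			 unsigned long start, unsigned long end)
{
	unsigned long id;

	if (end > TOY_SLOTS)
		end = TOY_SLOTS;
	for (id = start; id < end; id++) {
		if (!idr->slot[id]) {
			idr->slot[id] = ptr;
			if (index)
				*index = id;
			return 0;
		}
	}
	return -ENOSPC;
}

/*
 * Shaped like idr_alloc(): returns the new id (>= 0) or a negative
 * errno, implemented purely in terms of the unsigned long variant.
 */
static int toy_alloc(struct toy_idr *idr, void *ptr, int start, int end)
{
	unsigned long id;
	int ret;

	if (start < 0)
		return -EINVAL;
	/* end <= 0 means "no upper bound"; cap it so the id fits an int. */
	ret = toy_alloc_ext(idr, ptr, &id, (unsigned long)start,
			    end > 0 ? (unsigned long)end
				    : (unsigned long)INT_MAX + 1);
	if (ret)
		return ret;
	return (int)id;
}
```

The mapping works because the int API reserves negative return values for errors, so every valid id it can hand out is also a valid unsigned long id.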

^ permalink raw reply

* [PATCH net-next] staging: irda: update MAINTAINERS
From: Greg Kroah-Hartman @ 2017-08-29  7:09 UTC (permalink / raw)
  To: davem, samuel; +Cc: devel, netdev, Joe Perches, linux-kernel

Now that the IRDA code has moved under drivers/staging/irda/, update the
MAINTAINERS file with the new location.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 MAINTAINERS | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fdfe2685eed..ff19b1c3141c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7101,9 +7101,7 @@ W:	http://irda.sourceforge.net/
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/sameo/irda-2.6.git
 F:	Documentation/networking/irda.txt
-F:	drivers/net/irda/
-F:	include/net/irda/
-F:	net/irda/
+F:	drivers/staging/irda/
 
 IRQ DOMAINS (IRQ NUMBER MAPPING LIBRARY)
 M:	Marc Zyngier <marc.zyngier@arm.com>
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next] Revert "ipv4: make net_protocol const"
From: Bhumika Goyal @ 2017-08-29  6:46 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, David Miller
In-Reply-To: <1503951789-31836-1-git-send-email-dsahern@gmail.com>

On Tue, Aug 29, 2017 at 1:53 AM, David Ahern <dsahern@gmail.com> wrote:
> This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.
>
> Early demux structs can not be made const. Doing so results in:
> [   84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10
> [   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
> [   84.970544] PGD 1a0a067
> [   84.970546] P4D 1a0a067
> [   84.971212] PUD 1a0b063
> [   84.971733] PMD 80000000016001e1
>
> [   84.972669] Oops: 0003 [#1] SMP
> [   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
> [   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
> [   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [   84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000
> [   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
> [   84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246
> [   84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000
> [   84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
> [   84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000
> [   84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001
> [   84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002
> [   84.982249] FS:  00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [   84.983231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0
> [   84.984817] Call Trace:
> [   84.985133]  proc_tcp_early_demux+0x29/0x30
>
> I think this is the second time such a patch has been reverted.
>
> Cc: Bhumika Goyal <bhumirks@gmail.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
> Bhumika: How are you testing these constify changes? In this case a simple
> sysctl -w net.ipv4.tcp_early_demux=1 would have shown the problem
>

I am compile testing them. In this case I did:  make
net/ipv4/af_inet.o and it compiled. Is this error because of
typecasting net_protocol inside inet_add_protocol function?

Thanks,
Bhumika


>  net/ipv4/af_inet.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 19aee073ba29..d678820e4306 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1596,7 +1596,7 @@ static const struct net_protocol igmp_protocol = {
>  };
>  #endif
>
> -static const struct net_protocol tcp_protocol = {
> +static struct net_protocol tcp_protocol = {
>         .early_demux    =       tcp_v4_early_demux,
>         .early_demux_handler =  tcp_v4_early_demux,
>         .handler        =       tcp_v4_rcv,
> @@ -1606,7 +1606,7 @@ static const struct net_protocol tcp_protocol = {
>         .icmp_strict_tag_validation = 1,
>  };
>
> -static const struct net_protocol udp_protocol = {
> +static struct net_protocol udp_protocol = {
>         .early_demux =  udp_v4_early_demux,
>         .early_demux_handler =  udp_v4_early_demux,
>         .handler =      udp_rcv,
> --
> 2.1.4
>
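
On the question of why a compile test does not catch this: the trace points at proc_configure_early_demux, which suggests the sysctl handler stores into the protocol struct at runtime. If that store goes through a pointer whose const qualifier was cast away, the compiler has nothing to complain about, and the fault only appears when the write hits read-only memory. The sketch below is a userspace toy of that runtime toggle, assuming the handler sets `early_demux` from `early_demux_handler`. The `toy_*` names are hypothetical, and the struct carries only the two fields relevant here.

```c
#include <stddef.h>

struct sk_buff;	/* opaque in this sketch */

typedef void (*demux_fn)(struct sk_buff *skb);

/* Toy version of a protocol descriptor. */
struct toy_protocol {
	demux_fn early_demux;		/* rewritten at runtime */
	demux_fn early_demux_handler;	/* fixed after initialization */
};

static void toy_tcp_early_demux(struct sk_buff *skb)
{
	(void)skb;
}

/*
 * Deliberately not const: toy_configure_early_demux() stores into it.
 * Declaring it const would make that store undefined behaviour, and
 * on a system that maps const data read-only it would fault.
 */
static struct toy_protocol toy_tcp_protocol = {
	.early_demux		= toy_tcp_early_demux,
	.early_demux_handler	= toy_tcp_early_demux,
};

/* Runtime toggle, as a sysctl write would trigger. */
static void toy_configure_early_demux(struct toy_protocol *prot,
				      int enabled)
{
	prot->early_demux = enabled ? prot->early_demux_handler : NULL;
}
```

A struct is only safe to constify when no code path writes to any of its fields after initialization, which building a single object file cannot show.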

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface
From: Jiri Pirko @ 2017-08-29  6:29 UTC (permalink / raw)
  To: David Miller
  Cc: vivien.didelot, netdev, linux-kernel, kernel, f.fainelli, andrew,
	privat, john, Woojung.Huh, sean.wang, nikita.yoush, cphealy
In-Reply-To: <20170828.213837.1354872205076475221.davem@davemloft.net>

Tue, Aug 29, 2017 at 06:38:37AM CEST, davem@davemloft.net wrote:
>From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>Date: Mon, 28 Aug 2017 15:17:38 -0400
>
>> This patch series adds a generic debugfs interface for the DSA
>> framework, so that all switch devices benefit from it, e.g. Marvell,
>> Broadcom, Microchip or any other DSA driver.
>
>I've been thinking this over and I agree with the feedback given that
>debugfs really isn't appropriate for this.
>
>Please create a DSA device class, and hang these values under
>appropriate sysfs device nodes that can be easily found via
>/sys/class/dsa/ just as easily as they would be /sys/kernel/debug/dsa/
>
>You really intend these values to be consistent across DSA devices,
>and you don't intend to go willy-nilly changing these exported values
>arbitrarily over time.  That's what debugfs is for, throw-away
>stuff.
>
>So please make these proper device sysfs attributes rather than
>debugfs.

As I wrote, I believe that there is a big overlap with devlink and its
dpipe subset. I think that primarily we should focus on extending whatever
is needed for dsa there. The iface should be generic for all drivers,
not only dsa. dsa-specific sysfs attributes should be a last-resort
solution; I believe we can avoid them.

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface
From: Jiri Pirko @ 2017-08-29  6:25 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, netdev, linux-kernel, kernel, David S. Miller,
	Florian Fainelli, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, mlxsw
In-Reply-To: <20170828200834.GA1870@lunn.ch>

Mon, Aug 28, 2017 at 10:08:34PM CEST, andrew@lunn.ch wrote:
>> I see this overlaps a lot with DPIPE. Why won't you use that to expose
>> your hw state?
>
>We took a look at dpipe and i talked to you about using it for this
>sort of thing at netconf/netdev. But dpipe has issues displaying the
>sort of information we have. I never figured out how to do two
>dimensional tables. The output of the dpipe command is pretty
>unreadable. A lot of the information being dumped here is not about
>the data pipe, etc.

So improve it. No problem. Also, we can extend it to support what you need.


>
>There is a lot of pushback on debugfs for individual drivers. As i
>said recently to somebody, debugfs is a bit of a wild west. When
>designing this code, we thought about that. This debugfs is not at the
>driver level. It is at the DSA level. All DSA drivers will benefit
>from this code, and all DSA drivers will get the same information
>exposed in debugfs. It is generic, well defined and structured, with
>respect to DSA.

Still, it has *a lot* of overlap with devlink and dpipe. So instead of
making devlink and dpipe work for you, you introduced a completely
separate debugfs interface specific to a list of drivers. That is just
wrong. Debugfs is never the correct answer! Please work with us on
devlink and dpipe so they are used for all drivers, mlxsw, dsa and others.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next] bnxt_en: add a dummy definition for bnxt_vf_rep_get_fid()
From: Michael Chan @ 2017-08-29  6:22 UTC (permalink / raw)
  To: Sathya Perla, David Miller; +Cc: Netdev
In-Reply-To: <1503987303-12392-1-git-send-email-sathya.perla@broadcom.com>

On Mon, Aug 28, 2017 at 11:15 PM, Sathya Perla
<sathya.perla@broadcom.com> wrote:
> When bnxt VF-reps are not compiled in (CONFIG_BNXT_SRIOV is off)
> bnxt_tc.c needs a dummy definition of the routine bnxt_vf_rep_get_fid().
>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support")
> Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>

Signed-off-by: Michael Chan <michael.chan@broadcom.com>

^ permalink raw reply

* [PATCH net-next] bnxt_en: add a dummy definition for bnxt_vf_rep_get_fid()
From: Sathya Perla @ 2017-08-29  6:15 UTC (permalink / raw)
  To: netdev

When bnxt VF-reps are not compiled in (CONFIG_BNXT_SRIOV is off)
bnxt_tc.c needs a dummy definition of the routine bnxt_vf_rep_get_fid().

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h
index d8b5f89..7787cd24 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.h
@@ -80,5 +80,10 @@ static inline struct net_device *bnxt_get_vf_rep(struct bnxt *bp, u16 cfa_code)
 {
 	return NULL;
 }
+
+static inline u16 bnxt_vf_rep_get_fid(struct net_device *dev)
+{
+	return 0;
+}
 #endif /* CONFIG_BNXT_SRIOV */
 #endif /* BNXT_VFR_H */
-- 
2.7.4

^ permalink raw reply related

* Re: mlxsw and rtnl lock
From: Arkadi Sharshevsky @ 2017-08-29  6:10 UTC (permalink / raw)
  To: David Ahern, Ido Schimmel; +Cc: Jiri Pirko, netdev@vger.kernel.org, mlxsw
In-Reply-To: <b6dc0a4e-4ed9-ed5f-ac0f-c3fe06d5ed68@gmail.com>



On 08/28/2017 09:00 PM, David Ahern wrote:
> On 8/26/17 11:04 AM, Ido Schimmel wrote:
>> Regarding the silent abort, that's intentional. You can look at the same
>> code in v4.9 - when the chain was still blocking - and you'll see that
>> we didn't propagate the error even then. This was discussed in the past
>> and the conclusion was that the user doesn't expect the operation to fail. If
>> hardware resources are exceeded, we let the kernel take care of the
>> forwarding instead.
>>
> 
> In addition to Roopa's comments... The silent abort is not a good user
> experience. Right now it's add a network address or route, cross fingers
> and hope it does not overflow some limit (nexthop, ecmp, neighbor,
> prefix, etc) that triggers the offload abort.
> 
> The mlxsw driver queries for some limits (e.g., max rifs) but I don't
> see any query related to current usage, and there is no API to pass any
> of that data to user space so user space has no programmatic way to
> handle this. I realize you are aware of this limitation. The point is to
> emphasize the need to resolve this.
> 

We actually thought about providing the user some tools to understand
the ASIC's limitations by introducing the 'resource' object to devlink.

By linking dpipe tables to resources, the user can understand which
hardware processes share a common resource; furthermore, the usage of
these resources could be observed. This gives more visibility.

It's not a remedy for the silent abort, but maybe a notification
can be sent from devlink in case of abort, saying that some resource is
full.

This proposition was sent as RFC several weeks ago.

^ permalink raw reply

* Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
From: Greg KH @ 2017-08-29  5:06 UTC (permalink / raw)
  To: Joe Perches; +Cc: devel, netdev, samuel, David Miller, linux-kernel
In-Reply-To: <1503963967.2040.14.camel@perches.com>

On Mon, Aug 28, 2017 at 04:46:07PM -0700, Joe Perches wrote:
> On Mon, 2017-08-28 at 16:42 -0700, David Miller wrote:
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Date: Sun, 27 Aug 2017 17:03:30 +0200
> > 
> > > The IRDA code has long been obsolete and broken.  So, to keep people
> > > from trying to use it, and to prevent people from having to maintain it,
> > > let's move it to drivers/staging/ so that we can delete it entirely from
> > > the kernel in a few releases.
> > 
> > No objection, I'll apply this to net-next, thanks Greg.
> 
> Still needs an update to MAINTAINERS.

Oops, forgot those directories, will send a follow-on patch for that.

greg k-h

^ permalink raw reply

* Re: [PATCH net 0/4] xfrm_user info leaks
From: Steffen Klassert @ 2017-08-29  4:43 UTC (permalink / raw)
  To: David Miller; +Cc: minipli, herbert, netdev
In-Reply-To: <20170828.155232.1540318133719787999.davem@davemloft.net>

On Mon, Aug 28, 2017 at 03:52:32PM -0700, David Miller wrote:
> From: Mathias Krause <minipli@googlemail.com>
> Date: Sat, 26 Aug 2017 17:08:56 +0200
> 
> > Hi David, Steffen,
> > 
> > the following series fixes a few info leaks due to missing padding byte
> > initialization in the xfrm_user netlink interface.
> > 
> > Please apply!
> 
> Steffen please pick this up if you haven't already.

I had it already in the ipsec/testing branch, now merged into
ipsec/master.

Thanks everyone!

^ permalink raw reply

* Re: [PATCH net-next] hinic: don't build the module by default
From: David Miller @ 2017-08-29  4:40 UTC (permalink / raw)
  To: vkuznets; +Cc: netdev, aviad.krawczyk, linux-kernel
In-Reply-To: <20170828131605.3173-1-vkuznets@redhat.com>

From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Mon, 28 Aug 2017 15:16:05 +0200

> We probably don't want to enable code supporting particular hardware by
> default e.g. when someone does 'make defconfig'. Other ethernet modules
> don't do it.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Applied, thanks.

^ permalink raw reply
