Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: PURCHASE ORDER- #0023/2019, SMA35578__INTERNATIONAL Inbox  x
From: Fong, Erica @ 2019-02-13 15:25 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: SOA.zip --]
[-- Type: application/zip, Size: 240594 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v4] ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs
From: Nicolas Dichtel @ 2019-02-13 15:31 UTC (permalink / raw)
  To: Callum Sinclair, davem, kuznet, yoshfuji, nikolay, netdev,
	linux-kernel
In-Reply-To: <20190212031255.16121-2-callum.sinclair@alliedtelesis.co.nz>

Le 12/02/2019 à 04:12, Callum Sinclair a écrit :
[snip]
>  	/* Wipe the cache */
> -	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
> -		if (!all && (c->mfc_flags & MFC_STATIC))
> -			continue;
> -		rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
> -		list_del_rcu(&c->list);
> -		cache = (struct mfc_cache *)c;
> -		call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
> -					      mrt->id);
> -		mroute_netlink_event(mrt, cache, RTM_DELROUTE);
> -		mr_cache_put(c);
> -	}
> -
> -	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
> -		spin_lock_bh(&mfc_unres_lock);
> -		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
> -			list_del(&c->list);
> +	if (flags & (MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC)) {
> +		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
> +			if (((c->mfc_flags & MFC_STATIC) && !(flags & MRT_FLUSH_MFC_STATIC)) ||
> +			    (!(c->mfc_flags & MFC_STATIC) && !(flags & MRT_FLUSH_MFC)))
> +				continue;
> +			rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
> +			list_del_rcu(&c->list);
>  			cache = (struct mfc_cache *)c;
> +			call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
> +						      mrt->id);
>  			mroute_netlink_event(mrt, cache, RTM_DELROUTE);
> -			ipmr_destroy_unres(mrt, cache);
> +			mr_cache_put(c);
> +		}
> +
> +		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
I wonder if the mfc_unres_queue must be cleaned up when only
MRT_FLUSH_MFC_STATIC is provided. My first intuition would be to do it only with
MRT_FLUSH_MFC.
Any opinion?


Regards,
Nicolas

^ permalink raw reply

* Re: [RFC 03/19] net/ice: Add support for ice peer devices and drivers
From: Jeff Kirsher @ 2019-02-13 15:40 UTC (permalink / raw)
  To: Jason Gunthorpe, Shiraz Saleem
  Cc: dledford, davem, linux-rdma, netdev, mustafa.ismail,
	Anirudh Venkataramanan
In-Reply-To: <20190213034104.GA8751@ziepe.ca>

[-- Attachment #1: Type: text/plain, Size: 1803 bytes --]

On Tue, 2019-02-12 at 20:41 -0700, Jason Gunthorpe wrote:
> On Tue, Feb 12, 2019 at 03:43:46PM -0600, Shiraz Saleem wrote:
> > From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
> > 
> > The E800 series of Ethernet devices has multiple hardware blocks,
> > of
> > which RDMA is one. The RDMA block isn't interfaced directly to PCI
> > or any other bus. The RDMA driver (irdma) thus depends on the ice
> > driver to provide access to the RDMA hardware block.
> > 
> > The ice driver first creates a pseudo bus and then creates and
> > attaches
> > a new device to the pseudo bus using device_register(). This new
> > device
> > is referred to as a "peer device" and the associated driver (i.e.
> > irdma)
> > is a "peer driver" to ice. Once the peer driver loads, it can call
> > ice driver functions exposed to it via struct ice_ops. Similarly,
> > ice can
> > call peer driver functions exposed to it via struct ice_peer_ops.
> 
> This seems quite big for this straightforward description..
>  
> I was going to say I like the idea of using the driver model to
> connect the drivers, but if it takes so much code ...

Part of the reason why the ice driver patches are much larger than the
i40e patch is because currently there is zero RDMA support for our ice
driver.  The ice developers wanted to wait for the new RDMA interface
implementation before adding the RDMA support to the ice driver.

> 
> > +     /* check for reset in progress before proceeding */
> > +     pf = pci_get_drvdata(peer_dev->pdev);
> > +     for (i = 0; i < ICE_MAX_RESET_WAIT; i++) {
> > +             if (!ice_is_reset_in_progress(pf->state))
> > +                     break;
> > +             msleep(100);
> > +     }
> 
> Use proper locking, not loops with sleeps.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* RE: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: David Laight @ 2019-02-13 16:17 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner', David Miller
  Cc: julien@arista.com, netdev@vger.kernel.org,
	linux-sctp@vger.kernel.org, linux-kernel@vger.kernel.org,
	nhorman@tuxdriver.com, vyasevich@gmail.com, lucien.xin@gmail.com
In-Reply-To: <20190210201559.GE10665@localhost.localdomain>

From: Marcelo Ricardo Leitner
> Sent: 10 February 2019 20:16
...
> We have issues on read path too. 52ccb8e90c0a ("[SCTP]: Update
> SCTP_PEER_ADDR_PARAMS socket option to the latest api draft.")
> extended struct sctp_paddrparams and its getsockopt goes with:

The API shouldn't change like this at all.
Is this from the RFC or elsewhere??

If the structure changes the socket option name and value
should also change.

IMHO large chunks of the sctp rfc are just horrid.
In particular all the places where is states that API functions are
implemented using setsockopt() - that should be an implementation detail.
Also ISTR that some of the structures are defined to have holes in them...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [RFC PATCH net-next 2/5] net: 8021q: vlan_dev: add vid tag for uc and mc address lists
From: Ivan Khoronzhuk @ 2019-02-13 16:17 UTC (permalink / raw)
  To: Florian Fainelli, davem, linux-omap, netdev, linux-kernel, jiri,
	andrew
In-Reply-To: <20190122131239.GC3125@khorivan>

On Tue, Jan 22, 2019 at 03:12:41PM +0200, Ivan Khoronzhuk wrote:
>On Mon, Jan 21, 2019 at 03:37:41PM -0800, Florian Fainelli wrote:
>>On 12/4/18 3:42 PM, Ivan Khoronzhuk wrote:
>>>On Tue, Dec 04, 2018 at 11:49:27AM -0800, Florian Fainelli wrote:
>
>[...]
>
>>
>>Ivan, based on the recent submission I copied you on [1], it sounds like
>>we want to move ahead with your proposal to extend netdev_hw_addr with a
>>vid member.
>>
>>On second thought, your approach is good and if we enclose the vid
>>member within an #if IS_ENABLED(CONFIG_VLAN)8021Q) we should be good for
>>most foreseeable use cases, if not, we can always introduce a variable
>>size/defined context in the future.
>>
>>Can you resubmit this patch series as non-RFC in the next few days so I
>>can also repost mine [1] and take advantage of these changes for
>>multicast over VLAN when VLAN filtering is globally enabled on the device.
>>
>>[1]: https://www.spinics.net/lists/netdev/msg544722.html
>>
>>Thanks!
>
>Yes, sure. I can start to do that in several days.
>Just a little busy right now.
>
>Just before doing this, maybe some comments could be added as it has more
>attention now. Meanwhile I can send alternative variant but based on
>real dev splitting addresses between vlans. In this approach it leaves address
>space w/o vid extension but requires more changes to vlan core. Drawback here
>that to change one address alg traverses all related vlan addresses, it can be
>cpu/time wasteful, if it's done regularly, but saves memory....
>
>Basically it's implemented locally in cpsw and requires more changes to move
>it as some vlan core auxiliary functions to be reused. But it can work only
>with vlans directly on top of real dev, which is fixable.
>
>Core function here:
>__hw_addr_ref_sync_dev
>it is called only for address the link of which was increased/decreased, thus
>update made only on one address, comparing it for every vlan dev.
>
>It was added with this patch:
>[1] net: core: dev_addr_lists: add auxiliary func to handle reference 
>address update e7946760de5852f32
>
>And used by this patch:
>[2] net: ethernet: ti: cpsw: fix vlan mcast 15180eca569bfe1d4d
>
>So, idea is to move [2] to be vlan core auxiliary function to be reused
>by NIC drivers.
>
>But potentially it can bring a little more changes I assume:
>
>1) add priv_flag |= IFF_IV_FLT (independent vlan filtering). It allows to reuse
>this flag for farther changes, probably for per vlan allmulti or so.
>
>2) real dev has to have complete list for vlans, not only their vids, but also
>all vlandevs in device chain above it. So changes in add_vid can be required.
>Vlan core can assign vlan dev pointer to real device only after it's completely
>initialized. And for propagation reasons it requires every device in
>infrastructure to be aware. That seems doable, but depends not only on me.
>
>3) Move code from [2] to be auxiliary vlan core API for setting mc and uc.
>From this patch only one function is cpsw specific: cpsw_set_mc(). The rest can
>be applicable on every NIC supporting IFF_IV_FLT.
>
>4) Move code from link below to do the same but for uc addresses:
>https://git.linaro.org/people/ivan.khoronzhuk/tsn_kernel.git/commit/?h=ucast_vlan_fix&id=ebc88a7d8758759322d9ff88f25f8bac51ce7219
>here only one func cpsw specific: cpsw_set_uc()
>the rest can be generic.
>
>As third alternative, we can think about how to reduce memory for addresses by
>reusing them or else, but this is as continuation of addr+vid approach, and API
>probably would be the same.
>
>Then all this can be compared for proper decision.


Hi Florian,

After several more investigations and tries probably better left this
idea as is.

Here actually several explanations for this:
1) If even assume that we can get access to vlan devices in the above ndev
tree (we can) that doesn't guarantee that receive vlan filters are set
replicating this structure. For example bond device can have one active slave
but both of them in the tree having vid set, in this case addresses are
syched only with active slave, no filters should be applied to not active slave.
this can be achieved only each address has vid context.

2) According to 1) rx filters device structure can be created while mc_sync()
in each rx_mode(), and then used as orthogonal info. I've tried and it looks
not cool and consumes anyway memory and even if it's less it's still not very
scalable. (+ no normal signal "in complex structure case" when address should
be undated to avoid redundant cpu cycles). Not sure it can have practical
results and be universal enouph.

3) Assuming that every device in the tree (bond, team or else) is legal to
modify its own address space, the real end device cannot be sure the vlan device
address spaces reflects vid addresses that device tree want's from him.
According to this each address in address space must hold its own context at
every device and this context is comparable with address size.

>-- Regards,
>Ivan Khoronzhuk

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Stephen Hemminger @ 2019-02-13 16:51 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Eric Dumazet, netdev, Sabrina Dubroca
In-Reply-To: <20190213093711.13ab560e@redhat.com>

On Wed, 13 Feb 2019 09:37:11 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Tue, 12 Feb 2019 16:42:04 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > I do not get it.
> > 
> > "ss -emoi " uses almost 1KB per socket.
> > 
> > 10,000,000 sockets -> we need about 10GB of memory  ???
> > 
> > This is a serious regression.  
> 
> I guess this is rather subjective: the worst case I considered back then
> was the output of 'ss -tei0' (less than 500 bytes) for one million
> sockets, which gives 500M of memory, which should in turn be fine on a
> machine handling one million sockets.
> 
> Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> curiosity: how are you going to process that output? Would JSON help?),
> I see two easy options to solve this:
> 
> 1. flush the output every time we reach a given buffer size (1M
>    perhaps). This might make the resulting blocks slightly unaligned,
>    with occasional loss of readability on lines occurring every 1k to
>    10k sockets approximately, even though after 1k sockets column sizes
>    won't change much (it looks anyway better than the original), and I
>    don't expect anybody to actually scroll that output
> 
> 2. add a switch for unbuffered output, but then you need to remember to
>    pass it manually, and the whole output would be as bad as the
>    original in case you need the switch.
> 
> I'd rather go with 1., it's easy to implement (we already have partial
> flushing with '--events') and it looks like a good compromise on
> usability. Thoughts?
> 

I agree with eric. The benefits of buffering are not worth it.
Let's just choose a reasonable field width, if something is too big, columns won't line up
which i snot a big deal.

Unless you come up with a better solution, I am going to revert this.

^ permalink raw reply

* [PATCH -next] net: ipvlan_l3s: fix kconfig dependency warning
From: Randy Dunlap @ 2019-02-13 16:55 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Mahesh Bandewar, Daniel Borkmann, David Miller

From: Randy Dunlap <rdunlap@infradead.org>

Fix the kconfig warning in IPVLAN_L3S when neither INET nor IPV6
is enabled:

WARNING: unmet direct dependencies detected for NET_L3_MASTER_DEV
  Depends on [n]: NET [=y] && (INET [=n] || IPV6 [=n])
  Selected by [y]:
  - IPVLAN_L3S [=y] && NETDEVICES [=y] && NET_CORE [=y] && NETFILTER [=y]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
v2: simplify the dependency to IPVLAN

Seen in mmotm but applies to linux-next.

 drivers/net/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- linux-next-20190213.orig/drivers/net/Kconfig
+++ linux-next-20190213/drivers/net/Kconfig
@@ -147,6 +147,7 @@ config MACVTAP
 
 config IPVLAN_L3S
 	depends on NETFILTER
+	depends on IPVLAN
 	def_bool y
 	select NET_L3_MASTER_DEV
 



^ permalink raw reply

* Re: [PATCH net] net: macb: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: Claudiu.Beznea @ 2019-02-13 16:55 UTC (permalink / raw)
  To: albin_yang, netdev; +Cc: Nicolas.Ferre, davem, yang.wei9
In-Reply-To: <1549987202-5393-1-git-send-email-albin_yang@163.com>



On 12.02.2019 18:00, Yang Wei wrote:
> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in at91ether_interrupt() when
> skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>

> ---
>  drivers/net/ethernet/cadence/macb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 2b28826..835cc58 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -3763,7 +3763,7 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
>  			dev->stats.tx_errors++;
>  
>  		if (lp->skb) {
> -			dev_kfree_skb_irq(lp->skb);
> +			dev_consume_skb_irq(lp->skb);
>  			lp->skb = NULL;
>  			dma_unmap_single(NULL, lp->skb_physaddr,
>  					 lp->skb_length, DMA_TO_DEVICE);
> 

^ permalink raw reply

* Re: [PATCH net-next 1/1] flow_offload: fix block stats
From: Pablo Neira Ayuso @ 2019-02-13 16:59 UTC (permalink / raw)
  To: John Hurley; +Cc: jiri, davem, netdev, oss-drivers
In-Reply-To: <1550017432-26306-1-git-send-email-john.hurley@netronome.com>

On Wed, Feb 13, 2019 at 12:23:52AM +0000, John Hurley wrote:
> With the introduction of flow_stats_update(), drivers now update the stats
> fields of the passed tc_cls_flower_offload struct, rather than call
> tcf_exts_stats_update() directly to update the stats of offloaded TC
> flower rules. However, if multiple qdiscs are registered to a TC shared
> block and a flower rule is applied, then, when getting stats for the rule,
> multiple callbacks may be made.
> 
> Take this into consideration by modifying flow_stats_update to gather the
> stats from all callbacks. Currently, the values in tc_cls_flower_offload
> only account for the last stats callback in the list.
> 
> Fixes: 3b1903ef97c0 ("flow_offload: add statistics retrieval infrastructure and use it")
> Signed-off-by: John Hurley <john.hurley@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply

* [PATCH net] net: stmmac: Fix NAPI poll in TX path when in multi-queue
From: Jose Abreu @ 2019-02-13 17:00 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Jose Abreu, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue

Commit 8fce33317023 introduced the concept of NAPI per-channel and
independent cleaning of TX path.

This is currently breaking performance in some cases. The scenario
happens when all packets are being received in Queue 0 but the TX is
performed in Queue != 0.

I didn't look very deep but it seems that NAPI for Queue 0 will clean
the RX path but as TX is in different NAPI, this last one is called at a
slower rate which kills performance in TX. I suspect this is due to TX
cleaning takes much longer than RX and because NAPI will get canceled
once we return with 0 budget consumed (e.g. when TX is still not done it
will return 0 budget).

Fix this by looking at all TX channels in NAPI poll function.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races")
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  1 -
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 11 +++++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 63e1064b27a2..8f6741a626d8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -82,7 +82,6 @@ struct stmmac_channel {
 	struct stmmac_priv *priv_data;
 	u32 index;
 	int has_rx;
-	int has_tx;
 };
 
 struct stmmac_tc_entry {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 685d20472358..5bf5f8ebb4b6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2031,13 +2031,13 @@ static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
 	struct stmmac_channel *ch = &priv->channel[chan];
 	bool needs_work = false;
 
-	if ((status & handle_rx) && ch->has_rx) {
+	if (status & handle_rx) {
 		needs_work = true;
 	} else {
 		status &= ~handle_rx;
 	}
 
-	if ((status & handle_tx) && ch->has_tx) {
+	if (status & handle_tx) {
 		needs_work = true;
 	} else {
 		status &= ~handle_tx;
@@ -3528,11 +3528,12 @@ static int stmmac_napi_poll(struct napi_struct *napi, int budget)
 	struct stmmac_priv *priv = ch->priv_data;
 	int work_done, rx_done = 0, tx_done = 0;
 	u32 chan = ch->index;
+	int i;
 
 	priv->xstats.napi_poll++;
 
-	if (ch->has_tx)
-		tx_done = stmmac_tx_clean(priv, budget, chan);
+	for (i = 0; i < priv->plat->tx_queues_to_use; i++)
+		tx_done += stmmac_tx_clean(priv, budget, i);
 	if (ch->has_rx)
 		rx_done = stmmac_rx(priv, budget, chan);
 
@@ -4325,8 +4326,6 @@ int stmmac_dvr_probe(struct device *device,
 
 		if (queue < priv->plat->rx_queues_to_use)
 			ch->has_rx = true;
-		if (queue < priv->plat->tx_queues_to_use)
-			ch->has_tx = true;
 
 		netif_napi_add(ndev, &ch->napi, stmmac_napi_poll,
 			       NAPI_POLL_WEIGHT);
-- 
2.7.4


^ permalink raw reply related

* [PATCH bpf-next] net: bpf: remove XDP_QUERY_XSK_UMEM enumerator
From: Björn Töpel @ 2019-02-13 17:07 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Jan Sokolowski, magnus.karlsson, magnus.karlsson, intel-wired-lan

From: Jan Sokolowski <jan.sokolowski@intel.com>

Commit c9b47cc1fabc ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") moved the umem query code to the AF_XDP
core, and therefore removed the need to query the netdevice for a
umem.

This patch removes XDP_QUERY_XSK_UMEM and all code that implement that
behavior, which is just dead code.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 --
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 28 -------------------
 drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  2 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 --
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  2 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 17 -----------
 include/linux/netdevice.h                     |  7 ++---
 7 files changed, 3 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 44856a84738d..5e74a5127849 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12128,9 +12128,6 @@ static int i40e_xdp(struct net_device *dev,
 	case XDP_QUERY_PROG:
 		xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0;
 		return 0;
-	case XDP_QUERY_XSK_UMEM:
-		return i40e_xsk_umem_query(vsi, &xdp->xsk.umem,
-					   xdp->xsk.queue_id);
 	case XDP_SETUP_XSK_UMEM:
 		return i40e_xsk_umem_setup(vsi, xdp->xsk.umem,
 					   xdp->xsk.queue_id);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 96d849460d9b..e190a2c2b9ff 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -154,34 +154,6 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 	return 0;
 }
 
-/**
- * i40e_xsk_umem_query - Queries a certain ring/qid for its UMEM
- * @vsi: Current VSI
- * @umem: UMEM associated to the ring, if any
- * @qid: Rx ring to associate UMEM to
- *
- * This function will store, if any, the UMEM associated to certain ring.
- *
- * Returns 0 on success, <0 on failure
- **/
-int i40e_xsk_umem_query(struct i40e_vsi *vsi, struct xdp_umem **umem,
-			u16 qid)
-{
-	struct net_device *netdev = vsi->netdev;
-	struct xdp_umem *queried_umem;
-
-	if (vsi->type != I40E_VSI_MAIN)
-		return -EINVAL;
-
-	queried_umem = xdp_get_umem_from_qid(netdev, qid);
-
-	if (!queried_umem)
-		return -EINVAL;
-
-	*umem = queried_umem;
-	return 0;
-}
-
 /**
  * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid
  * @vsi: Current VSI
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 9038c5d5cf08..8cc0a2e7d9a2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -10,8 +10,6 @@ struct zero_copy_allocator;
 
 int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
-int i40e_xsk_umem_query(struct i40e_vsi *vsi, struct xdp_umem **umem,
-			u16 qid);
 int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
 			u16 qid);
 void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b53087a980ef..38c430b94ae3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10280,9 +10280,6 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 		xdp->prog_id = adapter->xdp_prog ?
 			adapter->xdp_prog->aux->id : 0;
 		return 0;
-	case XDP_QUERY_XSK_UMEM:
-		return ixgbe_xsk_umem_query(adapter, &xdp->xsk.umem,
-					    xdp->xsk.queue_id);
 	case XDP_SETUP_XSK_UMEM:
 		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
 					    xdp->xsk.queue_id);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 53d4089f5644..d93a690aff74 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -30,8 +30,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
 
 struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
 				struct ixgbe_ring *ring);
-int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
-			 u16 qid);
 int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
 			 u16 qid);
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 65c3e2c979d4..98870707b51a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -174,23 +174,6 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
 	return 0;
 }
 
-int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
-			 u16 qid)
-{
-	if (qid >= adapter->num_rx_queues)
-		return -EINVAL;
-
-	if (adapter->xsk_umems) {
-		if (qid >= adapter->num_xsk_umems)
-			return -EINVAL;
-		*umem = adapter->xsk_umems[qid];
-		return 0;
-	}
-
-	*umem = NULL;
-	return 0;
-}
-
 int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
 			 u16 qid)
 {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1d95e634f3fe..6aedaf1e9a25 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -868,7 +868,6 @@ enum bpf_netdev_command {
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
-	XDP_QUERY_XSK_UMEM,
 	XDP_SETUP_XSK_UMEM,
 };
 
@@ -895,10 +894,10 @@ struct netdev_bpf {
 		struct {
 			struct bpf_offloaded_map *offmap;
 		};
-		/* XDP_QUERY_XSK_UMEM, XDP_SETUP_XSK_UMEM */
+		/* XDP_SETUP_XSK_UMEM */
 		struct {
-			struct xdp_umem *umem; /* out for query*/
-			u16 queue_id; /* in for query */
+			struct xdp_umem *umem;
+			u16 queue_id;
 		} xsk;
 	};
 };
-- 
2.19.1


^ permalink raw reply related

* [PATCH net] selftests: fix timestamping Makefile
From: Deepa Dinamani @ 2019-02-13 17:09 UTC (permalink / raw)
  To: shuah; +Cc: willemb, netdev, linux-kselftest

The clean target in the makefile conflicts with the generic
kselftests lib.mk, and fails to properly remove the compiled
test programs.

Remove the redundant rule, the TEST_GEN_FILES will be already
removed by the CLEAN macro in lib.mk.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---

* Changes since v1: as per review comments

 tools/testing/selftests/networking/timestamping/Makefile | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/networking/timestamping/Makefile b/tools/testing/selftests/networking/timestamping/Makefile
index 9050eeea5f5f..1de8bd8ccf5d 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -9,6 +9,3 @@ all: $(TEST_PROGS)
 top_srcdir = ../../../../..
 KSFT_KHDR_INSTALL := 1
 include ../../lib.mk
-
-clean:
-	rm -fr $(TEST_GEN_FILES)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf-next] net: bpf: remove XDP_QUERY_XSK_UMEM enumerator
From: Björn Töpel @ 2019-02-13 17:09 UTC (permalink / raw)
  To: ast, Daniel Borkmann, Netdev
  Cc: Jan Sokolowski, Karlsson, Magnus, Magnus Karlsson,
	intel-wired-lan
In-Reply-To: <20190213170729.13845-1-bjorn.topel@gmail.com>

On Wed, 13 Feb 2019 at 18:07, Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Jan Sokolowski <jan.sokolowski@intel.com>
>
> Commit c9b47cc1fabc ("xsk: fix bug when trying to use both copy and
> zero-copy on one queue id") moved the umem query code to the AF_XDP
> core, and therefore removed the need to query the netdevice for a
> umem.
>
> This patch removes XDP_QUERY_XSK_UMEM and all code that implement that
> behavior, which is just dead code.
>
> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 --
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 28 -------------------
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  2 --
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 --
>  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  2 --
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 17 -----------
>  include/linux/netdevice.h                     |  7 ++---
>  7 files changed, 3 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 44856a84738d..5e74a5127849 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -12128,9 +12128,6 @@ static int i40e_xdp(struct net_device *dev,
>         case XDP_QUERY_PROG:
>                 xdp->prog_id = vsi->xdp_prog ? vsi->xdp_prog->aux->id : 0;
>                 return 0;
> -       case XDP_QUERY_XSK_UMEM:
> -               return i40e_xsk_umem_query(vsi, &xdp->xsk.umem,
> -                                          xdp->xsk.queue_id);
>         case XDP_SETUP_XSK_UMEM:
>                 return i40e_xsk_umem_setup(vsi, xdp->xsk.umem,
>                                            xdp->xsk.queue_id);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 96d849460d9b..e190a2c2b9ff 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -154,34 +154,6 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
>         return 0;
>  }
>
> -/**
> - * i40e_xsk_umem_query - Queries a certain ring/qid for its UMEM
> - * @vsi: Current VSI
> - * @umem: UMEM associated to the ring, if any
> - * @qid: Rx ring to associate UMEM to
> - *
> - * This function will store, if any, the UMEM associated to certain ring.
> - *
> - * Returns 0 on success, <0 on failure
> - **/
> -int i40e_xsk_umem_query(struct i40e_vsi *vsi, struct xdp_umem **umem,
> -                       u16 qid)
> -{
> -       struct net_device *netdev = vsi->netdev;
> -       struct xdp_umem *queried_umem;
> -
> -       if (vsi->type != I40E_VSI_MAIN)
> -               return -EINVAL;
> -
> -       queried_umem = xdp_get_umem_from_qid(netdev, qid);
> -
> -       if (!queried_umem)
> -               return -EINVAL;
> -
> -       *umem = queried_umem;
> -       return 0;
> -}
> -
>  /**
>   * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid
>   * @vsi: Current VSI
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 9038c5d5cf08..8cc0a2e7d9a2 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -10,8 +10,6 @@ struct zero_copy_allocator;
>
>  int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
>  int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
> -int i40e_xsk_umem_query(struct i40e_vsi *vsi, struct xdp_umem **umem,
> -                       u16 qid);
>  int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
>                         u16 qid);
>  void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index b53087a980ef..38c430b94ae3 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10280,9 +10280,6 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>                 xdp->prog_id = adapter->xdp_prog ?
>                         adapter->xdp_prog->aux->id : 0;
>                 return 0;
> -       case XDP_QUERY_XSK_UMEM:
> -               return ixgbe_xsk_umem_query(adapter, &xdp->xsk.umem,
> -                                           xdp->xsk.queue_id);
>         case XDP_SETUP_XSK_UMEM:
>                 return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
>                                             xdp->xsk.queue_id);
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> index 53d4089f5644..d93a690aff74 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> @@ -30,8 +30,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
>
>  struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
>                                 struct ixgbe_ring *ring);
> -int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
> -                        u16 qid);
>  int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
>                          u16 qid);
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 65c3e2c979d4..98870707b51a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -174,23 +174,6 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
>         return 0;
>  }
>
> -int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
> -                        u16 qid)
> -{
> -       if (qid >= adapter->num_rx_queues)
> -               return -EINVAL;
> -
> -       if (adapter->xsk_umems) {
> -               if (qid >= adapter->num_xsk_umems)
> -                       return -EINVAL;
> -               *umem = adapter->xsk_umems[qid];
> -               return 0;
> -       }
> -
> -       *umem = NULL;
> -       return 0;
> -}
> -
>  int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
>                          u16 qid)
>  {
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 1d95e634f3fe..6aedaf1e9a25 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -868,7 +868,6 @@ enum bpf_netdev_command {
>         /* BPF program for offload callbacks, invoked at program load time. */
>         BPF_OFFLOAD_MAP_ALLOC,
>         BPF_OFFLOAD_MAP_FREE,
> -       XDP_QUERY_XSK_UMEM,
>         XDP_SETUP_XSK_UMEM,
>  };
>
> @@ -895,10 +894,10 @@ struct netdev_bpf {
>                 struct {
>                         struct bpf_offloaded_map *offmap;
>                 };
> -               /* XDP_QUERY_XSK_UMEM, XDP_SETUP_XSK_UMEM */
> +               /* XDP_SETUP_XSK_UMEM */
>                 struct {
> -                       struct xdp_umem *umem; /* out for query*/
> -                       u16 queue_id; /* in for query */
> +                       struct xdp_umem *umem;
> +                       u16 queue_id;
>                 } xsk;
>         };
>  };
> --
> 2.19.1
>

^ permalink raw reply

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Stefano Brivio @ 2019-02-13 17:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, netdev, Sabrina Dubroca, David Ahern
In-Reply-To: <20190213085101.7474ba5d@shemminger-XPS-13-9360>

On Wed, 13 Feb 2019 08:51:01 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Wed, 13 Feb 2019 09:37:11 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> > On Tue, 12 Feb 2019 16:42:04 -0800
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >   
> > > I do not get it.
> > > 
> > > "ss -emoi " uses almost 1KB per socket.
> > > 
> > > 10,000,000 sockets -> we need about 10GB of memory  ???
> > > 
> > > This is a serious regression.    
> > 
> > I guess this is rather subjective: the worst case I considered back then
> > was the output of 'ss -tei0' (less than 500 bytes) for one million
> > sockets, which gives 500M of memory, which should in turn be fine on a
> > machine handling one million sockets.
> > 
> > Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> > curiosity: how are you going to process that output? Would JSON help?),
> > I see two easy options to solve this:
> > 
> > 1. flush the output every time we reach a given buffer size (1M
> >    perhaps). This might make the resulting blocks slightly unaligned,
> >    with occasional loss of readability on lines occurring every 1k to
> >    10k sockets approximately, even though after 1k sockets column sizes
> >    won't change much (it looks anyway better than the original), and I
> >    don't expect anybody to actually scroll that output
> > 
> > 2. add a switch for unbuffered output, but then you need to remember to
> >    pass it manually, and the whole output would be as bad as the
> >    original in case you need the switch.
> > 
> > I'd rather go with 1., it's easy to implement (we already have partial
> > flushing with '--events') and it looks like a good compromise on
> > usability. Thoughts?
> >   
> I agree with eric. The benefits of buffering are not worth it.
> Let's just choose a reasonable field width, if something is too big, columns won't line up
> which i snot a big deal.

That's how it was before, and we couldn't even get fields aligned with
TCP and UDP sockets in a 80 columns wide terminal. See examples at:
https://patchwork.ozlabs.org/cover/847301/.

I tried, but I think it's impossible to find a "reasonable" field
width, especially when you mix a number of socket types.

> Unless you come up with a better solution, I am going to revert this.

That's why I asked for feedback about my proposals 1. and 2. above.
I'll go for 1. then.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: 'Marcelo Ricardo Leitner' @ 2019-02-13 17:23 UTC (permalink / raw)
  To: David Laight
  Cc: David Miller, julien@arista.com, netdev@vger.kernel.org,
	linux-sctp@vger.kernel.org, linux-kernel@vger.kernel.org,
	nhorman@tuxdriver.com, vyasevich@gmail.com, lucien.xin@gmail.com
In-Reply-To: <71e3d64ae3d44e499f3fb9f876398ee4@AcuMS.aculab.com>

On Wed, Feb 13, 2019 at 04:17:41PM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 10 February 2019 20:16
> ...
> > We have issues on read path too. 52ccb8e90c0a ("[SCTP]: Update
> > SCTP_PEER_ADDR_PARAMS socket option to the latest api draft.")
> > extended struct sctp_paddrparams and its getsockopt goes with:
> 
> The API shouldn't change like this at all.
> Is this from the RFC or elsewhere??

I would think so. That commit is from 2005, pretty close to initial
SCTP RFCs.

> 
> If the structure changes the socket option name and value
> should also change.

That's what is at the core of this thread.

  Marcelo

> 
> IMHO large chunks of the sctp rfc are just horrid.
> In particular all the places where is states that API functions are
> implemented using setsockopt() - that should be an implementation detail.
> Also ISTR that some of the structures are defined to have holes in them...
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Eric Dumazet @ 2019-02-13 17:31 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Stephen Hemminger, netdev, Sabrina Dubroca
In-Reply-To: <20190213093711.13ab560e@redhat.com>



On 02/13/2019 12:37 AM, Stefano Brivio wrote:
> On Tue, 12 Feb 2019 16:42:04 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> I do not get it.
>>
>> "ss -emoi " uses almost 1KB per socket.
>>
>> 10,000,000 sockets -> we need about 10GB of memory  ???
>>
>> This is a serious regression.
> 
> I guess this is rather subjective: the worst case I considered back then
> was the output of 'ss -tei0' (less than 500 bytes) for one million
> sockets, which gives 500M of memory, which should in turn be fine on a
> machine handling one million sockets.
> 
> Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> curiosity: how are you going to process that output? Would JSON help?),
> I see two easy options to solve this:


ss -temoi | parser (written in shell or awk or whatever...)

This is a use case, I just got bitten because using ss command
actually OOM my container, while trying to debug a busy GFE.

The host itself can have 10,000,000 TCP sockets, but usually sysadmin shells
run in a container with no more than 500 MB available. 

Otherwise, it would be too easy for a buggy program to OOM the whole machine
and have angry customers.

> 
> 1. flush the output every time we reach a given buffer size (1M
>    perhaps). This might make the resulting blocks slightly unaligned,
>    with occasional loss of readability on lines occurring every 1k to
>    10k sockets approximately, even though after 1k sockets column sizes
>    won't change much (it looks anyway better than the original), and I
>    don't expect anybody to actually scroll that output
> 
> 2. add a switch for unbuffered output, but then you need to remember to
>    pass it manually, and the whole output would be as bad as the
>    original in case you need the switch.
> 
> I'd rather go with 1., it's easy to implement (we already have partial
> flushing with '--events') and it looks like a good compromise on
> usability. Thoughts?
> 

1 seems fine, but a switch for 'please do not try to format' would be fine.

I wonder why we try to 'format' when stdout is a pipe or a regular file .




^ permalink raw reply

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Eric Dumazet @ 2019-02-13 17:32 UTC (permalink / raw)
  To: Stefano Brivio, Stephen Hemminger; +Cc: netdev, Sabrina Dubroca, David Ahern
In-Reply-To: <20190213182012.34a6928f@redhat.com>



On 02/13/2019 09:22 AM, Stefano Brivio wrote:

> That's why I asked for feedback about my proposals 1. and 2. above.
> I'll go for 1. then.
> 

If stdout is a regular file or a pipe, just do not try to align things ?

^ permalink raw reply

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Stefano Brivio @ 2019-02-13 17:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, netdev, Sabrina Dubroca, David Ahern
In-Reply-To: <dfdb5a99-d922-5be8-b110-e5f069600ecd@gmail.com>

On Wed, 13 Feb 2019 09:31:03 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On 02/13/2019 12:37 AM, Stefano Brivio wrote:
> > On Tue, 12 Feb 2019 16:42:04 -0800
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >   
> >> I do not get it.
> >>
> >> "ss -emoi " uses almost 1KB per socket.
> >>
> >> 10,000,000 sockets -> we need about 10GB of memory  ???
> >>
> >> This is a serious regression.  
> > 
> > I guess this is rather subjective: the worst case I considered back then
> > was the output of 'ss -tei0' (less than 500 bytes) for one million
> > sockets, which gives 500M of memory, which should in turn be fine on a
> > machine handling one million sockets.
> > 
> > Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> > curiosity: how are you going to process that output? Would JSON help?),
> > I see two easy options to solve this:  
> 
> 
> ss -temoi | parser (written in shell or awk or whatever...)
> 
> This is a use case, I just got bitten because using ss command
> actually OOM my container, while trying to debug a busy GFE.
> 
> The host itself can have 10,000,000 TCP sockets, but usually sysadmin shells
> run in a container with no more than 500 MB available. 
> 
> Otherwise, it would be too easy for a buggy program to OOM the whole machine
> and have angry customers.

Ouch, I see.

> > 
> > 1. flush the output every time we reach a given buffer size (1M
> >    perhaps). This might make the resulting blocks slightly unaligned,
> >    with occasional loss of readability on lines occurring every 1k to
> >    10k sockets approximately, even though after 1k sockets column sizes
> >    won't change much (it looks anyway better than the original), and I
> >    don't expect anybody to actually scroll that output
> > 
> > 2. add a switch for unbuffered output, but then you need to remember to
> >    pass it manually, and the whole output would be as bad as the
> >    original in case you need the switch.
> > 
> > I'd rather go with 1., it's easy to implement (we already have partial
> > flushing with '--events') and it looks like a good compromise on
> > usability. Thoughts?
> >   
> 
> 1 seems fine, but a switch for 'please do not try to format' would be fine.

Let me try with 1. first then -- we already have a huge number of
switches.

> I wonder why we try to 'format' when stdout is a pipe or a regular file .

As stupid as it might sound, I just didn't think of fixing that :)

What would you suggest in that case, single whitespaces? Tabs?

-- 
Stefano

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] tools/bpf: replace bzero with memset
From: Martin Lau @ 2019-02-13 17:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii.nakryiko@gmail.com, netdev@vger.kernel.org, Kernel Team,
	Yonghong Song, Alexei Starovoitov, daniel@iogearbox.net,
	david.laight@aculab.com, acme@kernel.org
In-Reply-To: <20190213012941.2571769-2-andriin@fb.com>

On Tue, Feb 12, 2019 at 05:29:40PM -0800, Andrii Nakryiko wrote:
> bzero() call is deprecated and superseded by memset().
Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] tools: sync uapi/linux/if_link.h header
From: Martin Lau @ 2019-02-13 17:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii.nakryiko@gmail.com, netdev@vger.kernel.org, Kernel Team,
	Yonghong Song, Alexei Starovoitov, daniel@iogearbox.net,
	david.laight@aculab.com, acme@kernel.org
In-Reply-To: <20190213012941.2571769-3-andriin@fb.com>


On Tue, Feb 12, 2019 at 05:29:41PM -0800, Andrii Nakryiko wrote:
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Please add one liner commit message. Thanks.

^ permalink raw reply

* Re: [PATCH] net: phy: at803x: disable delay only for RGMII mode
From: Niklas Cassel @ 2019-02-13 17:40 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: Andrew Lunn, Florian Fainelli, Vinod Koul, David S Miller,
	linux-arm-msm, Bjorn Andersson, netdev, Nori, Sekhar,
	Peter Ujfalusi
In-Reply-To: <1ab5edac-a36c-9dc5-52e5-dbd3b70e7728@free.fr>

[-- Attachment #1: Type: text/plain, Size: 1762 bytes --]

On Wed, Feb 13, 2019 at 02:40:18PM +0100, Marc Gonzalez wrote:
> On 13/02/2019 14:29, Andrew Lunn wrote:
> 
> >> So we have these modes:
> >>
> >> PHY_INTERFACE_MODE_RGMII: TX and RX delays disabled
> >> PHY_INTERFACE_MODE_RGMII_ID: TX and RX delays enabled
> >> PHY_INTERFACE_MODE_RGMII_RXID: RX delay enabled, TX delay disabled
> >> PHY_INTERFACE_MODE_RGMII_TXID: TX delay enabled, RX delay disabled
> >>
> >> What I don't like with this patch, is that if we specify phy-mode
> >> PHY_INTERFACE_MODE_RGMII_TXID, this patch will enable TX delay,
> >> but RX delay will not be explicitly set.
> > 
> > That is not the behaviour we want. It is best to assume the device is
> > in a random state, and correctly enable/disable all delays as
> > requested. Only leave the hardware alone if PHY_INTERFACE_MODE_NA is
> > used.
> 
> That's what my patch did:
> https://www.spinics.net/lists/netdev/msg445053.html
> 
> But see Florian's remarks:
> https://www.spinics.net/lists/netdev/msg445133.html

Hello Marc,

I saw that comment from Florian. However that was way back in 2017.
Maybe the phy-modes were not as well defined back then?

Andrew recently suggested to fix the driver so that it conforms with the
phy-modes, and fix any SoC that specified an incorrect phy-mode in DT
and thus relied upon the broken behavior of the PHY driver:
https://www.spinics.net/lists/netdev/msg445133.html


So, I've rebased your old patch, see attachment.
I suggest that Peter test it on am335x-evm.

am335x-evm appears to rely on the current broken behavior of the PHY
driver, so we will probably need to fix the am335x-evm according to this:
https://www.spinics.net/lists/netdev/msg445117.html
and merge that as well.


Andrew, Florian, do you both agree?


Kind regards,
Niklas

[-- Attachment #2: 0001-net-phy-at803x-Fix-RGMII-RX-and-TX-clock-delays-setu.patch --]
[-- Type: text/plain, Size: 2910 bytes --]

From 7f19f32d3074c9990e46349b76ae13dc9c1133d6 Mon Sep 17 00:00:00 2001
From: Marc Gonzalez <marc.w.gonzalez@free.fr>
Date: Wed, 13 Feb 2019 18:29:02 +0100
Subject: [PATCH] net: phy: at803x: Fix RGMII RX and TX clock delays setup

The current code supports enabling RGMII RX and TX clock delays.
The unstated assumption is that these settings are disabled by
default at reset, which is not the case.

RX clock delay is enabled at reset. And TX clock delay "survives"
across SW resets. Thus, if the bootloader enables TX clock delay,
it will remain enabled at reset in Linux.

Provide disable functions to configure the RGMII clock delays
exactly as specified in the fwspec.

Fixes: cd28d1d6e52e: ("net: phy: at803x: Disable phy delay for RGMII mode")
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
---
 drivers/net/phy/at803x.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 90dc62c15fc5..700c1e4d34ad 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -103,12 +103,24 @@ static int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
 	return phy_write(phydev, AT803X_DEBUG_DATA, val);
 }
 
+static inline int at803x_enable_rx_delay(struct phy_device *phydev)
+{
+	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_0, 0,
+				     AT803X_DEBUG_RX_CLK_DLY_EN);
+}
+
 static inline int at803x_disable_rx_delay(struct phy_device *phydev)
 {
 	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_0,
 				     AT803X_DEBUG_RX_CLK_DLY_EN, 0);
 }
 
+static inline int at803x_enable_tx_delay(struct phy_device *phydev)
+{
+	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_5, 0,
+				     AT803X_DEBUG_TX_CLK_DLY_EN);
+}
+
 static inline int at803x_disable_tx_delay(struct phy_device *phydev)
 {
 	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_5,
@@ -242,20 +254,22 @@ static int at803x_config_init(struct phy_device *phydev)
 		return ret;
 
 	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
-			phydev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
-			phydev->interface == PHY_INTERFACE_MODE_RGMII) {
+	    phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
+		ret = at803x_enable_rx_delay(phydev);
+	else
 		ret = at803x_disable_rx_delay(phydev);
-		if (ret < 0)
-			return ret;
-	}
+
+	if (ret < 0)
+		return ret;
 
 	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID ||
-			phydev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
-			phydev->interface == PHY_INTERFACE_MODE_RGMII) {
+	    phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
+		ret = at803x_enable_tx_delay(phydev);
+	else
 		ret = at803x_disable_tx_delay(phydev);
-		if (ret < 0)
-			return ret;
-	}
+
+	if (ret < 0)
+		return ret;
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
From: Phil Sutter @ 2019-02-13 17:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Ahern, Stephen Hemminger, netdev, Eric Dumazet, Hangbin Liu
In-Reply-To: <20190213015841.140383-1-edumazet@google.com>

Hi Eric,

On Tue, Feb 12, 2019 at 05:58:41PM -0800, Eric Dumazet wrote:
> In the past, we tried to increase the buffer size up to 32 KB in order
> to reduce number of syscalls per dump.
> 
> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> brought the size back to 4KB because the kernel can not know the application
> is ready to receive bigger requests.
> 
> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> for more details.

Wouldn't it be better if the kernel recognized MSG_TRUNC and allocated a
buffer large enough to hold the full message in that case? I have no
idea how hard that would be to implement, but calling recvmsg() with
MSG_TRUNC set and not getting the full message length in return is not
quite what one expects after reading recvmsg(2).

Cheers, Phil

^ permalink raw reply

* [PATCH 0/3] Netfilter/IPVS fixes for net
From: Pablo Neira Ayuso @ 2019-02-13 17:47 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains Netfilter/IPVS fixes for net:

1) Missing structure initialization in ebtables causes splat with
   32-bit user level on a 64-bit kernel, from Francesco Ruggeri.

2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
   Andrea Claudi.

3) Fix possible use-after-free from release path of target extensions.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit cf657d22ee1f0e887326a92169f2e28dc932fd10:

  net/x25: do not hold the cpu too long in x25_new_lci() (2019-02-11 13:20:14 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 753c111f655e38bbd52fc01321266633f022ebe2:

  netfilter: nft_compat: use-after-free when deleting targets (2019-02-13 18:14:54 +0100)

----------------------------------------------------------------
Andrea Claudi (1):
      ipvs: fix dependency on nf_defrag_ipv6

Francesco Ruggeri (1):
      netfilter: compat: initialize all fields in xt_init

Pablo Neira Ayuso (1):
      netfilter: nft_compat: use-after-free when deleting targets

 net/netfilter/ipvs/Kconfig      |  1 +
 net/netfilter/ipvs/ip_vs_core.c | 10 ++++------
 net/netfilter/ipvs/ip_vs_ctl.c  | 10 ++++++++++
 net/netfilter/nft_compat.c      |  3 ++-
 net/netfilter/x_tables.c        |  2 +-
 5 files changed, 18 insertions(+), 8 deletions(-)

^ permalink raw reply

* [PATCH 1/3] netfilter: compat: initialize all fields in xt_init
From: Pablo Neira Ayuso @ 2019-02-13 17:47 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190213174758.17275-1-pablo@netfilter.org>

From: Francesco Ruggeri <fruggeri@arista.com>

If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init
time, the following panic can be caused by running

% ebtables -t broute -F BROUTING

from a 32-bit user level on a 64-bit kernel. This patch replaces
kmalloc_array with kcalloc when allocating xt.

[  474.680846] BUG: unable to handle kernel paging request at 0000000009600920
[  474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0
[  474.693838] Oops: 0000 [#1] SMP
[  474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1
[  474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013
[  474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables]
[  474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d
[  474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207
[  474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249
[  474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124
[  474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f
[  474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001
[  474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8
[  474.780234] FS:  0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700
[  474.788612] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0
[  474.802052] Call Trace:
[  474.804789]  compat_do_replace+0x1fb/0x2a3 [ebtables]
[  474.810105]  compat_do_ebt_set_ctl+0x69/0xe6 [ebtables]
[  474.815605]  ? try_module_get+0x37/0x42
[  474.819716]  compat_nf_setsockopt+0x4f/0x6d
[  474.824172]  compat_ip_setsockopt+0x7e/0x8c
[  474.828641]  compat_raw_setsockopt+0x16/0x3a
[  474.833220]  compat_sock_common_setsockopt+0x1d/0x24
[  474.838458]  __compat_sys_setsockopt+0x17e/0x1b1
[  474.843343]  ? __check_object_size+0x76/0x19a
[  474.847960]  __ia32_compat_sys_socketcall+0x1cb/0x25b
[  474.853276]  do_fast_syscall_32+0xaf/0xf6
[  474.857548]  entry_SYSENTER_compat+0x6b/0x7a

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/x_tables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index aecadd471e1d..13e1ac333fa4 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1899,7 +1899,7 @@ static int __init xt_init(void)
 		seqcount_init(&per_cpu(xt_recseq, i));
 	}
 
-	xt = kmalloc_array(NFPROTO_NUMPROTO, sizeof(struct xt_af), GFP_KERNEL);
+	xt = kcalloc(NFPROTO_NUMPROTO, sizeof(struct xt_af), GFP_KERNEL);
 	if (!xt)
 		return -ENOMEM;
 
-- 
2.11.0


^ permalink raw reply related

* [PATCH 3/3] netfilter: nft_compat: use-after-free when deleting targets
From: Pablo Neira Ayuso @ 2019-02-13 17:47 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190213174758.17275-1-pablo@netfilter.org>

Fetch pointer to module before target object is released.

Fixes: 29e3880109e3 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index fe64df848365..0a4bad55a8aa 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -315,6 +315,7 @@ nft_target_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 {
 	struct xt_target *target = expr->ops->data;
 	void *info = nft_expr_priv(expr);
+	struct module *me = target->me;
 	struct xt_tgdtor_param par;
 
 	par.net = ctx->net;
@@ -325,7 +326,7 @@ nft_target_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
 		par.target->destroy(&par);
 
 	if (nft_xt_put(container_of(expr->ops, struct nft_xt, ops)))
-		module_put(target->me);
+		module_put(me);
 }
 
 static int nft_extension_dump_info(struct sk_buff *skb, int attr,
-- 
2.11.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox