Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] staging: rtlwifi: Improve debugging by using debugfs
From: Larry Finger @ 2017-08-25 13:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: gregkh, netdev, devel, Ping-Ke Shih, Yan-Hsuan Chuang,
	Birming Chiu, Shaofu, Steven Ting
In-Reply-To: <20170825015421.GA8852@lunn.ch>

On 08/24/2017 08:54 PM, Andrew Lunn wrote:
> netdev frowns upon debugfs. You should try to keep this altogether,
> making it easy to throw away before the driver is moved out of
> staging.
> 
> You might want to look at ethtool -d. That will be accepted.

Andrew,

What is the problem with debugfs?

Please suggest which driver has the best example of an ethtool -d implementation 
that we might study.

The first version of the debugfs changes were sent to wireless-drivers on July 
2. Why are we first hearing of this objection nearly 2 months later?

Larry

^ permalink raw reply

* [PATCH v2] cdc_ncm: flag the u-blox TOBY-L4 as wwan
From: Aleksander Morgado @ 2017-08-25 13:39 UTC (permalink / raw)
  To: oliver, davem
  Cc: stefano.godeas, marco.demarco, linux-usb, netdev, linux-kernel,
	gregkh, Aleksander Morgado
In-Reply-To: <20170825125946.12812-1-aleksander@aleksander.es>

The u-blox TOBY-L4 is a LTE Advanced (Cat 6) module with HSPA+ and 2G
fallback.

Unlike the TOBY-L2, this module has one single USB layout and exposes
several TTYs for control and a NCM interface for data. Connecting this
module may be done just by activating the desired PDP context with
'AT+CGACT=1,<cid>' and then running DHCP on the NCM interface.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
---
 drivers/net/usb/cdc_ncm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 811b18215cae..47cab1bde065 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -1758,6 +1758,13 @@ static const struct usb_device_id cdc_devs[] = {
 	  .driver_info = (unsigned long)&wwan_noarp_info,
 	},

+	/* u-blox TOBY-L4 */
+	{ USB_DEVICE_AND_INTERFACE_INFO(0x1546, 0x1010,
+		USB_CLASS_COMM,
+		USB_CDC_SUBCLASS_NCM, USB_CDC_PROTO_NONE),
+	  .driver_info = (unsigned long)&wwan_info,
+	},
+
 	/* Generic CDC-NCM devices */
 	{ USB_INTERFACE_INFO(USB_CLASS_COMM,
 		USB_CDC_SUBCLASS_NCM, USB_CDC_PROTO_NONE),
--
2.14.1

^ permalink raw reply related

* pull-request: wireless-drivers 2017-08-25
From: Kalle Valo @ 2017-08-25 13:37 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel

Hi Dave,

here's pull request to net tree for 4.13, more info in the signed tag
below. Please let me know if there are any problems.

Kalle

The following changes since commit e9bf53ab1ee34bb05c104bbfd2b77c844773f8e6:

  brcmfmac: feature check for multi-scheduled scan fails on bcm4343x devices (2017-08-14 11:09:30 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2017-08-25

for you to fetch changes up to 10a54d8196d11f6cc0db2f71249f93854cb9fe55:

  iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc() (2017-08-24 16:49:00 +0300)

----------------------------------------------------------------
wireless-drivers fixes for 4.13

Only one iwlwifi patch this time.

iwlwifi

* fix multiple times reported lockdep warning found by new locking
  annotation introduced in v4.13-rc1

----------------------------------------------------------------
Luca Coelho (1):
      iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()

 drivers/net/wireless/intel/iwlwifi/pcie/internal.h |  2 ++
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c       | 10 +---------
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |  9 +++++++++
 3 files changed, 12 insertions(+), 9 deletions(-)

^ permalink raw reply

* Re: [PATCH] cdc_ncm: flag the ublox TOBY-L4 as wwan
From: Aleksander Morgado @ 2017-08-25 13:30 UTC (permalink / raw)
  To: Greg KH
  Cc: Oliver Neukum, David S. Miller, Stefano Godeas, Marco De Marco,
	linux-usb, netdev@vger.kernel.org, linux-kernel
In-Reply-To: <20170825132236.GB16203@kroah.com>

On Fri, Aug 25, 2017 at 3:22 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Aug 25, 2017 at 02:59:46PM +0200, Aleksander Morgado wrote:
>> Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
>> ---
>>  drivers/net/usb/cdc_ncm.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>
> Personally, I require patches to have changelog texts in them...

Sure, will push a v2 with more info.

-- 
Aleksander
https://aleksander.es

^ permalink raw reply

* [PATCH net-next] tcp: fix hang in tcp_sendpage_locked()
From: Eric Dumazet @ 2017-08-25 13:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert, Dmitry Vyukov

From: Eric Dumazet <edumazet@google.com>

syszkaller got a hang in tcp stack, related to a bug in
tcp_sendpage_locked()

root@syzkaller:~# cat /proc/3059/stack
[<ffffffff83de926c>] __lock_sock+0x1dc/0x2f0
[<ffffffff83de9473>] lock_sock_nested+0xf3/0x110
[<ffffffff8408ce01>] tcp_sendmsg+0x21/0x50
[<ffffffff84163b6f>] inet_sendmsg+0x11f/0x5e0
[<ffffffff83dd8eea>] sock_sendmsg+0xca/0x110
[<ffffffff83dd9547>] kernel_sendmsg+0x47/0x60
[<ffffffff83de35dc>] sock_no_sendpage+0x1cc/0x280
[<ffffffff8408916b>] tcp_sendpage_locked+0x10b/0x160
[<ffffffff84089203>] tcp_sendpage+0x43/0x60
[<ffffffff841641da>] inet_sendpage+0x1aa/0x660
[<ffffffff83dd4fcd>] kernel_sendpage+0x8d/0xe0
[<ffffffff83dd50ac>] sock_sendpage+0x8c/0xc0
[<ffffffff81b63300>] pipe_to_sendpage+0x290/0x3b0
[<ffffffff81b67243>] __splice_from_pipe+0x343/0x750
[<ffffffff81b6a459>] splice_from_pipe+0x1e9/0x330
[<ffffffff81b6a5e0>] generic_splice_sendpage+0x40/0x50
[<ffffffff81b6b1d7>] SyS_splice+0x7b7/0x1610
[<ffffffff84d77a01>] entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: 306b13eb3cf9 ("proto_ops: Add locked held versions of sendmsg and sendpage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tom Herbert <tom@quantonium.net>
---
 net/ipv4/tcp.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0cce4472b4a1b0e3c110692571ac2a5c51467c42..566083ee2654c25410d80ff56ce5adb49bb28ae7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1052,8 +1052,7 @@ int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
 {
 	if (!(sk->sk_route_caps & NETIF_F_SG) ||
 	    !sk_check_csum_caps(sk))
-		return sock_no_sendpage(sk->sk_socket, page, offset, size,
-					flags);
+		return sock_no_sendpage_locked(sk, page, offset, size, flags);
 
 	tcp_rate_check_app_limited(sk);  /* is sending application-limited? */
 

^ permalink raw reply related

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Andrew Lunn @ 2017-08-25 13:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Chen-Yu Tsai, Corentin Labbe, Maxime Ripard, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <3d74f0e3-9baf-195d-1e59-dbd8210b3c6f@gmail.com>

> > Just tested. Yes the MDIO/MDC lines are also muxed and controlled through
> > the same mux bit.
> 
> Alright then the mdio-mux seems appropriate, thanks.

Please add a comment that it muxes the MII lines as well.  That is not
normal for an mdio-mux, so we should document it in the dtsi file.

Thanks
       Andrew

^ permalink raw reply

* [PATCH net-next] net: mvpp2: fix the packet size configuration for 10G
From: Antoine Tenart @ 2017-08-25 13:24 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev

The MVPP22_XLG_CTRL1_FRAMESIZELIMIT define is used as an offset, but is
defined as BIT(0). Updated its name to contains "OFFS" as in offset and
fix its value using the offset value, 0.

Reported-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Fixes: 76eb1b1de5b6 ("net: mvpp2: set maximum packet size for 10G ports")
---

Hi Dave,

This patch fixes a bug only present in net-next as it was introduced
recently.

Thanks!
Antoine

 drivers/net/ethernet/marvell/mvpp2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 7fa251bf91ae..fea9ae5b70ba 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -367,7 +367,7 @@
 #define     MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN	BIT(7)
 #define     MVPP22_XLG_CTRL0_MIB_CNT_DIS	BIT(14)
 #define MVPP22_XLG_CTRL1_REG			0x104
-#define     MVPP22_XLG_CTRL1_FRAMESIZELIMIT	BIT(0)
+#define     MVPP22_XLG_CTRL1_FRAMESIZELIMIT_OFFS	0
 #define     MVPP22_XLG_CTRL1_FRAMESIZELIMIT_MASK	0x1fff
 #define MVPP22_XLG_CTRL3_REG			0x11c
 #define     MVPP22_XLG_CTRL3_MACMODESELECT_MASK	(7 << 13)
@@ -4669,7 +4669,7 @@ static inline void mvpp2_xlg_max_rx_size_set(struct mvpp2_port *port)
 	val =  readl(port->base + MVPP22_XLG_CTRL1_REG);
 	val &= ~MVPP22_XLG_CTRL1_FRAMESIZELIMIT_MASK;
 	val |= ((port->pkt_size - MVPP2_MH_SIZE) / 2) <<
-	       MVPP22_XLG_CTRL1_FRAMESIZELIMIT;
+	       MVPP22_XLG_CTRL1_FRAMESIZELIMIT_OFFS;
 	writel(val, port->base + MVPP22_XLG_CTRL1_REG);
 }
 
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH] cdc_ncm: flag the ublox TOBY-L4 as wwan
From: Greg KH @ 2017-08-25 13:22 UTC (permalink / raw)
  To: Aleksander Morgado
  Cc: oliver, davem, stefano.godeas, marco.demarco, linux-usb, netdev,
	linux-kernel
In-Reply-To: <20170825125946.12812-1-aleksander@aleksander.es>

On Fri, Aug 25, 2017 at 02:59:46PM +0200, Aleksander Morgado wrote:
> Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
> ---
>  drivers/net/usb/cdc_ncm.c | 7 +++++++
>  1 file changed, 7 insertions(+)

Personally, I require patches to have changelog texts in them...

^ permalink raw reply

* [PATCH] net: missing call of trace_napi_poll in busy_poll_stop
From: Jesper Dangaard Brouer @ 2017-08-25 13:04 UTC (permalink / raw)
  To: netdev; +Cc: Jesper Dangaard Brouer

Noticed that busy_poll_stop() also invoke the drivers napi->poll()
function pointer, but didn't have an associated call to trace_napi_poll()
like all other call sites.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 270b54754821..a24d2fb691ed 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5319,6 +5319,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
 	 */
 	rc = napi->poll(napi, BUSY_POLL_BUDGET);
+	trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
 	netpoll_poll_unlock(have_poll_lock);
 	if (rc == BUSY_POLL_BUDGET)
 		__napi_schedule(napi);

^ permalink raw reply related

* [PATCH] cdc_ncm: flag the ublox TOBY-L4 as wwan
From: Aleksander Morgado @ 2017-08-25 12:59 UTC (permalink / raw)
  To: oliver, davem
  Cc: stefano.godeas, marco.demarco, linux-usb, netdev, linux-kernel,
	Aleksander Morgado

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
---
 drivers/net/usb/cdc_ncm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 811b18215cae..47cab1bde065 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -1758,6 +1758,13 @@ static const struct usb_device_id cdc_devs[] = {
 	  .driver_info = (unsigned long)&wwan_noarp_info,
 	},
 
+	/* u-blox TOBY-L4 */
+	{ USB_DEVICE_AND_INTERFACE_INFO(0x1546, 0x1010,
+		USB_CLASS_COMM,
+		USB_CDC_SUBCLASS_NCM, USB_CDC_PROTO_NONE),
+	  .driver_info = (unsigned long)&wwan_info,
+	},
+
 	/* Generic CDC-NCM devices */
 	{ USB_INTERFACE_INFO(USB_CLASS_COMM,
 		USB_CDC_SUBCLASS_NCM, USB_CDC_PROTO_NONE),
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next RESEND] sched: sfq: drop packets after root qdisc lock is released
From: Eric Dumazet @ 2017-08-25 12:49 UTC (permalink / raw)
  To: gfree.wind; +Cc: jhs, xiyou.wangcong, jiri, davem, edumazet, netdev
In-Reply-To: <1503646997-94678-1-git-send-email-gfree.wind@vip.163.com>

On Fri, 2017-08-25 at 15:43 +0800, gfree.wind@vip.163.com wrote:
> From: Gao Feng <gfree.wind@vip.163.com>
> 
> The commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock
> is released) made a big change of tc for performance. But there are
> some points which are not changed in SFQ enqueue operation.
> 1. Fail to find the SFQ hash slot;
> 2. When the queue is full;
> 
> Now use qdisc_drop instead free skb directly.


Thanks for doing this work.

I have one comment

> 
> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
> ---
>  net/sched/sch_sfq.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 82469ef..8841f4d 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -292,7 +292,7 @@ static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb)
>  	slot->skblist_prev = skb;
>  }
>  
> -static unsigned int sfq_drop(struct Qdisc *sch)
> +static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
>  {
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	sfq_index x, d = q->cur_depth;
> @@ -310,9 +310,13 @@ static unsigned int sfq_drop(struct Qdisc *sch)
>  		slot->backlog -= len;
>  		sfq_dec(q, x);
>  		sch->q.qlen--;
> -		qdisc_qstats_drop(sch);
>  		qdisc_qstats_backlog_dec(sch, skb);
> -		kfree_skb(skb);
> +		if (likely(to_free)) {
> +			qdisc_drop(skb, sch, to_free);
> +		} else {
> +			qdisc_qstats_drop(sch);
> +			kfree_skb(skb);

                         rtnl_kfree_skbs(skb, skb);  ?

Or even better provide a non NULL to_free from sfq_change()

Then call rtnl_kfree_skbs() once from sfq_change()


> +		}
>  		return len;
>  	}
>  
> @@ -360,7 +364,7 @@ static int sfq_headdrop(const struct sfq_sched_data *q)
>  	if (hash == 0) {
>  		if (ret & __NET_XMIT_BYPASS)
>  			qdisc_qstats_drop(sch);
> -		kfree_skb(skb);
> +		__qdisc_drop(skb, to_free);
>  		return ret;
>  	}
>  	hash--;
> @@ -465,7 +469,7 @@ static int sfq_headdrop(const struct sfq_sched_data *q)
>  		return NET_XMIT_SUCCESS;
>  
>  	qlen = slot->qlen;
> -	dropped = sfq_drop(sch);
> +	dropped = sfq_drop(sch, to_free);
>  	/* Return Congestion Notification only if we dropped a packet
>  	 * from this flow.
>  	 */
> @@ -675,7 +679,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
>  
>  	qlen = sch->q.qlen;
>  	while (sch->q.qlen > q->limit)
> -		dropped += sfq_drop(sch);
> +		dropped += sfq_drop(sch, NULL);
>  	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
>  
>  	del_timer(&q->perturb_timer);

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Jesper Dangaard Brouer @ 2017-08-25 12:45 UTC (permalink / raw)
  To: Michael Chan
  Cc: Alexander Duyck, Duyck, Alexander H, john.fastabend@gmail.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net, brouer
In-Reply-To: <CACKFLinGuaDLxYRd=vC99DL5n0mf0rDbPRaDg4ctev=DEAhRSQ@mail.gmail.com>

On Thu, 24 Aug 2017 20:36:28 -0700
Michael Chan <michael.chan@broadcom.com> wrote:

> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > On Tue, 22 Aug 2017 23:59:05 -0700
> > Michael Chan <michael.chan@broadcom.com> wrote:
> >  
> >> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >> <alexander.duyck@gmail.com> wrote:  
> >> > On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:  
> >> >>
> >> >> Right, but it's conceivable to add an API to "return" the buffer to
> >> >> the input device, right?  
> >
> > Yes, I would really like to see an API like this.
> >  
> >> >
> >> > You could, it is just added complexity. "just free the buffer" in
> >> > ixgbe usually just amounts to one atomic operation to decrement the
> >> > total page count since page recycling is already implemented in the
> >> > driver. You still would have to unmap the buffer regardless of if you
> >> > were recycling it or not so all you would save is 1.000015259 atomic
> >> > operations per packet. The fraction is because once every 64K uses we
> >> > have to bulk update the count on the page.
> >> >  
> >>
> >> If the buffer is returned to the input device, the input device can
> >> keep the DMA mapping.  All it needs to do is to dma_sync it back to
> >> the input device when the buffer is returned.  
> >
> > Yes, exactly, return to the input device. I really think we should
> > work on a solution where we can keep the DMA mapping around.  We have
> > an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> > page return call, to achieve this. (I imagine other arch's have a high
> > DMA overhead than Intel)
> >
> > I'm not sure how the API should look.  The ixgbe recycle mechanism and
> > splitting the page (into two packets) actually complicates things, and
> > tie us into a page-refcnt based model.  We could get around this by
> > each driver implementing a page-return-callback, that allow us to
> > return the page to the input device?  Then, drivers implementing the
> > 1-packet-per-page can simply check/read the page-refcnt, and if it is
> > "1" DMA-sync and reuse it in the RX queue.
> >  
> 
> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> redirecting to a non-intel NIC or vice versa will actually work.  It
> sounds like the output device has to make some assumptions about how
> the page was allocated by the input device. 

Yes, exactly. We are tied into a page refcnt based scheme.

Besides the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
is also tied to the RX queue size, plus how fast the pages are returned.
This makes it very hard to tune.  As I demonstrated, default ixgbe
settings does not work well with XDP_REDIRECT.  I needed to increase
TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
10Mpps) so I also needed it increase RX-ring size.  But perf is best if
RX-ring size is smaller, thus two contradicting tuning needed.


> With buffer return API,
> each driver can cleanly recycle or free its own buffers properly.

Yes, exactly. And RX-driver can implement a special memory model for
this queue.  E.g. RX-driver can know this is a dedicated XDP RX-queue
which is never used for SKBs, thus opening for new RX memory models.

Another advantage of a return API.  There is also an opportunity for
avoiding the DMA map on TX. As we need to know the from-device.  Thus,
we can add a DMA API, where we can query if the two devices uses the
same DMA engine, and can reuse the same DMA address the RX-side already
knows.


> Let me discuss this further with Andy to see if we can come up with a
> good scheme.

Sound good, looking forward to hear what you come-up with :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [Patch net-next v2 4/4] net_sched: kill u32_node pointer in Qdisc
From: Jiri Pirko @ 2017-08-25 12:37 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Cong Wang, netdev
In-Reply-To: <c54bd7cc-ab56-6aac-2e24-85e9e8e1c708@mojatatu.com>

Fri, Aug 25, 2017 at 02:29:57PM CEST, jhs@mojatatu.com wrote:
>On 17-08-24 07:51 PM, Cong Wang wrote:
>> It is ugly to hide a u32-filter-specific pointer inside Qdisc,
>> this breaks the TC layers:
>> 
>> 1. Qdisc is a generic representation, should not have any specific
>>     data of any type
>> 
>> 2. Qdisc layer is above filter layer, should only save filters in
>>     the list of struct tcf_proto.
>> 
>> This pointer is used as the head of the chain of u32 hash tables,
>> that is struct tc_u_hnode, because u32 filter is very special,
>> it allows to create multiple hash tables within one qdisc and
>> across multiple u32 filters.
>> 
>> Instead of using this ugly pointer, we can just save it in a global
>> hash table key'ed by (dev ifindex, qdisc handle), therefore we can
>> still treat it as a per qdisc basis data structure conceptually.
>> 
>> Of course, because of network namespaces, this key is not unique
>> at all, but it is fine as we already have a pointer to Qdisc in
>> struct tc_u_common, we can just compare the pointers when collision.
>> 
>> And this only affects slow paths, has no impact to fast path,
>> thanks to the pointer ->tp_c.
>> 
>> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>> Cc: Jiri Pirko <jiri@resnulli.us>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>
>Nice work. should open the doors for Jiri now.

One of the doors, couple more left :)

>
>Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>
>cheers,
>jamal

^ permalink raw reply

* [PATCH net] udp6: set rx_dst_cookie on rx_dst updates
From: Paolo Abeni @ 2017-08-25 12:31 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Subash Abhinov Kasiviswanathan,
	Hannes Frederic Sowa

Currently, in the udp6 code, the dst cookie is not initialized/updated
concurrently with the RX dst used by early demux.

As a result, the dst_check() in the early_demux path always fails,
the rx dst cache is always invalidated, and we can't really
leverage significant gain from the demux lookup.

Fix it adding udp6 specific variant of sk_rx_dst_set() and use it
to set the dst cookie when the dst entry is really changed.

The issue is there since the introduction of early demux for ipv6.

Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/udp.h |  2 +-
 net/ipv4/udp.c    |  4 +++-
 net/ipv6/udp.c    | 11 ++++++++++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 586de4b811b5..626c2d8a70c5 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -260,7 +260,7 @@ static inline struct sk_buff *skb_recv_udp(struct sock *sk, unsigned int flags,
 }
 
 void udp_v4_early_demux(struct sk_buff *skb);
-void udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst);
+bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst);
 int udp_get_port(struct sock *sk, unsigned short snum,
 		 int (*saddr_cmp)(const struct sock *,
 				  const struct sock *));
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd1d044a7fa5..a6dc48d76a29 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1929,14 +1929,16 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 /* For TCP sockets, sk_rx_dst is protected by socket lock
  * For UDP, we use xchg() to guard against concurrent changes.
  */
-void udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
+bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 {
 	struct dst_entry *old;
 
 	if (dst_hold_safe(dst)) {
 		old = xchg(&sk->sk_rx_dst, dst);
 		dst_release(old);
+		return old != dst;
 	}
+	return false;
 }
 EXPORT_SYMBOL(udp_sk_rx_dst_set);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 20039c8501eb..d6886228e1d0 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -768,6 +768,15 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	return 0;
 }
 
+static void udp6_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
+{
+	if (udp_sk_rx_dst_set(sk, dst)) {
+		const struct rt6_info *rt = (const struct rt6_info *)dst;
+
+		inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
+	}
+}
+
 int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		   int proto)
 {
@@ -817,7 +826,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		int ret;
 
 		if (unlikely(sk->sk_rx_dst != dst))
-			udp_sk_rx_dst_set(sk, dst);
+			udp6_sk_rx_dst_set(sk, dst);
 
 		ret = udpv6_queue_rcv_skb(sk, skb);
 		sock_put(sk);
-- 
2.13.5

^ permalink raw reply related

* Re: [Patch net-next v2 4/4] net_sched: kill u32_node pointer in Qdisc
From: Jamal Hadi Salim @ 2017-08-25 12:29 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Jiri Pirko
In-Reply-To: <20170824235130.28503-5-xiyou.wangcong@gmail.com>

On 17-08-24 07:51 PM, Cong Wang wrote:
> It is ugly to hide a u32-filter-specific pointer inside Qdisc,
> this breaks the TC layers:
> 
> 1. Qdisc is a generic representation, should not have any specific
>     data of any type
> 
> 2. Qdisc layer is above filter layer, should only save filters in
>     the list of struct tcf_proto.
> 
> This pointer is used as the head of the chain of u32 hash tables,
> that is struct tc_u_hnode, because u32 filter is very special,
> it allows to create multiple hash tables within one qdisc and
> across multiple u32 filters.
> 
> Instead of using this ugly pointer, we can just save it in a global
> hash table key'ed by (dev ifindex, qdisc handle), therefore we can
> still treat it as a per qdisc basis data structure conceptually.
> 
> Of course, because of network namespaces, this key is not unique
> at all, but it is fine as we already have a pointer to Qdisc in
> struct tc_u_common, we can just compare the pointers when collision.
> 
> And this only affects slow paths, has no impact to fast path,
> thanks to the pointer ->tp_c.
> 
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Nice work. should open the doors for Jiri now.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [Patch net-next v2 3/4] net_sched: remove tc class reference counting
From: Jamal Hadi Salim @ 2017-08-25 12:28 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <20170824235130.28503-4-xiyou.wangcong@gmail.com>

On 17-08-24 07:51 PM, Cong Wang wrote:
> For TC classes, their ->get() and ->put() are always paired, and the
> reference counting is completely useless, because:
> 
> 1) For class modification and dumping paths, we already hold RTNL lock,
>     so all of these ->get(),->change(),->put() are atomic.
> 
> 2) For filter bindiing/unbinding, we use other reference counter than
>     this one, and they should have RTNL lock too.
> 
> 3) For ->qlen_notify(), it is special because it is called on ->enqueue()
>     path, but we already hold qdisc tree lock there, and we hold this
>     tree lock when graft or delete the class too, so it should not be gone
>     or changed until we release the tree lock.
> 
> Therefore, this patch removes ->get() and ->put(), but:
> 
> 1) Adds a new ->find() to find the pointer to a class by classid, no
>     refcnt.
> 
> 2) Move the original class destroy upon the last refcnt into ->delete(),
>     right after releasing tree lock. This is fine because the class is
>     already removed from hash when holding the lock.
> 
> For those who also use ->put() as ->unbind(), just rename them to reflect
> this change.
> 
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

I am a tiny bit worried about this - but i went over it and it looks
good. I cant think of any use cases where something may fall out, so:

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Dear God's Select,
From: Ja Ikh @ 2017-08-25 12:05 UTC (permalink / raw)


Dear God's Select,

I am writing this mail to you with heavy tears In my eyes and great
sorrow in my heart, My Name is Mrs.Jaslin Ikhsan, and I am contacting
you from my country Tunisia I want to tell you this because I don't
have any other option than to tell you as I was touched to open up to
you, I married to Mr. Ouedrago Daisy Brown who worked with Tunisia
embassy in Burkina Faso for nine years before he died in the year
2005.We were married for eleven years without a child.

He died after a brief illness that lasted for only five days. Since
his death I decided not to remarry, When my late husband was alive he
deposited the sum of US$ 8.5m (Eight Million Five hundred Thousand
Dollars) in a bank in Ouagadougou the capital city of Burkina Faso in
west Africa Presently this money is still in bank. He made this money
available for exportation of Gold from Burkina Faso mining.

Recently, My Doctor told me that I would not last for the period of
seven months due to cancer problem. The one that disturbs me most is
my stroke sickness .Having known my condition I decided to hand you
over this money to take care of the less-privileged people, you will
utilize this money the way I am going to instruct herein.

I want you to take 30 Percent of the total money for your personal use
While 70% of the money will go to charity" people in the street and
helping the orphanage. I grew up as an Orphan and I don't have any
body as my family member, just to endeavour that the house of God is
maintained. Am doing this so that God will forgive my sins and accept
my soul because these sicknesses have suffered me so much.

As soon a s I receive your reply I shall give you the contact of the
bank in Burkina Faso and I will also instruct the Bank Manger to issue
you an authority letter that will prove you the present beneficiary of
the money in the bank that is if you assure me that you will act
accordingly as I Stated herein.

Always reply to my alternative for security purposes.
Hoping to receive your reply:
>From Mrs.Jaslin Ikhsan

^ permalink raw reply

* Re: [Patch net-next v2 2/4] net_sched: introduce tclass_del_notify()
From: Jamal Hadi Salim @ 2017-08-25 11:51 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <20170824235130.28503-3-xiyou.wangcong@gmail.com>

On 17-08-24 07:51 PM, Cong Wang wrote:
> Like for TC actions, ->delete() is a special case,
> we have to prepare and fill the notification before delete
> otherwise would get use-after-free after we remove the
> reference count.
> 
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [Patch net-next v2 1/4] net_sched: get rid of more forward declarations
From: Jamal Hadi Salim @ 2017-08-25 11:51 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <20170824235130.28503-2-xiyou.wangcong@gmail.com>

On 17-08-24 07:51 PM, Cong Wang wrote:
> This is not needed if we move them up properly.
> 
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* [PATCH net] tcp: fix refcnt leak with ebpf congestion control
From: Sabrina Dubroca @ 2017-08-25 11:10 UTC (permalink / raw)
  To: netdev; +Cc: Sabrina Dubroca, Lawrence Brakmo, Daniel Borkmann

There are a few bugs around refcnt handling in the new BPF congestion
control setsockopt:

 - The new ca is assigned to icsk->icsk_ca_ops even in the case where we
   cannot get a reference on it. This would lead to a use after free,
   since that ca is going away soon.

 - Changing the congestion control case doesn't release the refcnt on
   the previous ca.

 - In the reinit case, we first leak a reference on the old ca, then we
   call tcp_reinit_congestion_control on the ca that we have just
   assigned, leading to deinitializing the wrong ca (->release of the
   new ca on the old ca's data) and releasing the refcount on the ca
   that we actually want to use.

This is visible by building (for example) BIC as a module and setting
net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from
samples/bpf.

This patch fixes the refcount issues, and moves reinit back into tcp
core to avoid passing a ca pointer back to BPF.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 include/net/tcp.h   |  4 +---
 net/core/filter.c   |  7 ++-----
 net/ipv4/tcp.c      |  2 +-
 net/ipv4/tcp_cong.c | 19 ++++++++++++++-----
 4 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ada65e767b28..f642a39f9eee 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1004,9 +1004,7 @@ void tcp_get_default_congestion_control(char *name);
 void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
-int tcp_set_congestion_control(struct sock *sk, const char *name, bool load);
-void tcp_reinit_congestion_control(struct sock *sk,
-				   const struct tcp_congestion_ops *ca);
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 8eb81e5fae08..169974998c76 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2836,15 +2836,12 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 		   sk->sk_prot->setsockopt == tcp_setsockopt) {
 		if (optname == TCP_CONGESTION) {
 			char name[TCP_CA_NAME_MAX];
+			bool reinit = bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN;
 
 			strncpy(name, optval, min_t(long, optlen,
 						    TCP_CA_NAME_MAX-1));
 			name[TCP_CA_NAME_MAX-1] = 0;
-			ret = tcp_set_congestion_control(sk, name, false);
-			if (!ret && bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN)
-				/* replacing an existing ca */
-				tcp_reinit_congestion_control(sk,
-					inet_csk(sk)->icsk_ca_ops);
+			ret = tcp_set_congestion_control(sk, name, false, reinit);
 		} else {
 			struct tcp_sock *tp = tcp_sk(sk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 71ce33decd97..a3e91b552edc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2481,7 +2481,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		name[val] = 0;
 
 		lock_sock(sk);
-		err = tcp_set_congestion_control(sk, name, true);
+		err = tcp_set_congestion_control(sk, name, true, true);
 		release_sock(sk);
 		return err;
 	}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index fde983f6376b..421ea1b918da 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -189,8 +189,8 @@ void tcp_init_congestion_control(struct sock *sk)
 		INET_ECN_dontxmit(sk);
 }
 
-void tcp_reinit_congestion_control(struct sock *sk,
-				   const struct tcp_congestion_ops *ca)
+static void tcp_reinit_congestion_control(struct sock *sk,
+					  const struct tcp_congestion_ops *ca)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -338,7 +338,7 @@ int tcp_set_allowed_congestion_control(char *val)
  * tcp_reinit_congestion_control (if the current congestion control was
  * already initialized.
  */
-int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const struct tcp_congestion_ops *ca;
@@ -360,9 +360,18 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
 	if (!ca) {
 		err = -ENOENT;
 	} else if (!load) {
-		icsk->icsk_ca_ops = ca;
-		if (!try_module_get(ca->owner))
+		const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
+
+		if (try_module_get(ca->owner)) {
+			if (reinit) {
+				tcp_reinit_congestion_control(sk, ca);
+			} else {
+				icsk->icsk_ca_ops = ca;
+				module_put(old_ca->owner);
+			}
+		} else {
 			err = -EBUSY;
+		}
 	} else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
 		     ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))) {
 		err = -EPERM;
-- 
2.14.1

^ permalink raw reply related

* Re: [net-next:master 1324/1341] drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few arguments to function 'devlink_dpipe_table_register'
From: Arkadi Sharshevsky @ 2017-08-25  9:59 UTC (permalink / raw)
  To: David Miller, fengguang.wu; +Cc: kbuild-all, netdev, jiri
In-Reply-To: <20170824.181124.1922755161454657026.davem@davemloft.net>



On 08/25/2017 04:11 AM, David Miller wrote:
> From: kbuild test robot <fengguang.wu@intel.com>
> Date: Fri, 25 Aug 2017 08:03:28 +0800
> 
>> All errors (new ones prefixed by >>):
>>
>>    drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 'mlxsw_sp_dpipe_erif_table_init':
>>>> drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few arguments to function 'devlink_dpipe_table_register'
>>      return devlink_dpipe_table_register(devlink,
>>             ^
>>    In file included from drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:36:0:
>>    include/net/devlink.h:401:1: note: declared here
>>     devlink_dpipe_table_register(struct devlink *devlink,
>>     ^
>>    drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:327:1: warning: control reaches end of non-void function [-Wreturn-type]
>>     }
>>     ^
> 
> I'll push the following fix into net-next for this:
> 
> ====================
> From 790c6056686cc4dd5b149b330bbd5ae208d4d721 Mon Sep 17 00:00:00 2001
> From: "David S. Miller" <davem@davemloft.net>
> Date: Thu, 24 Aug 2017 18:10:46 -0700
> Subject: [PATCH] devlink: Fix devlink_dpipe_table_register() stub signature.
> 
> One too many arguments compared to the non-stub version.
> 
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Fixes: ffd3cdccf214 ("devlink: Add support for dynamic table size")
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  include/net/devlink.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 047a0c54f652..aaf7178127a2 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -402,8 +402,7 @@ static inline int
>  devlink_dpipe_table_register(struct devlink *devlink,
>  			     const char *table_name,
>  			     struct devlink_dpipe_table_ops *table_ops,
> -			     void *priv, u64 size,
> -			     bool counter_control_extern)
> +			     void *priv, bool counter_control_extern)
>  {
>  	return 0;
>  }
> 

Thank you, sorry for the mistake.

^ permalink raw reply

* Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
From: Jesper Dangaard Brouer @ 2017-08-25  9:47 UTC (permalink / raw)
  To: Robert Hoo; +Cc: davem, tariqt, kyle.leet, netdev, robert.hu, brouer
In-Reply-To: <1503653196-64418-1-git-send-email-robert.hu@linux.intel.com>

On Fri, 25 Aug 2017 17:26:36 +0800
Robert Hoo <robert.hu@linux.intel.com> wrote:

> (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> send-email settings.)

Please see my reply in:
  http://lkml.kernel.org/r/20170825111921.061713c8@redhat.com

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Florian Westphal @ 2017-08-25  9:43 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Michal Kubecek, Davide Caratti, Pablo Neira Ayuso,
	Jozsef Kadlecsik, netfilter-devel, coreteam, netdev, linux-kernel,
	Michael S. Tsirkin, Markos Chandras
In-Reply-To: <20170825094025.GJ15739@breakpoint.cc>

Florian Westphal <fw@strlen.de> wrote:
> Michal Kubecek <mkubecek@suse.cz> wrote:
> > On Thu, Aug 24, 2017 at 03:17:22PM +0200, Florian Westphal wrote:
> > > Davide Caratti <dcaratti@redhat.com> wrote:
> > > > Small nit: may I suggest you to call skb_csum_hwoffload_help() instead of
> > > > skb_checksum_help(), so that we avoid corrupting SCTP packets in case they
> > > > hit xt_CHECKSUM target?
> > > 
> > > Alternatively we could restrict the target to udp only.
> > > 
> > > AFAIU the only reason this thing exists is to fix up udp checksum
> > > for old dhcp clients that use AF_PACKET without evaluating the extra
> > > metadata that indicates when a 'bad' checksum is in fact ok because it
> > > is supposed to be filled in by hardware later.
> > > 
> > > This can happen in virtual environemnt when such skb is directly passed
> > > to vm.
> > 
> > Based on what the documentation and the commit message of the commit
> > introducing xt_CHECKSUM module say, it seems so. But I must admit I'm
> > not sure where is the target is used and how (and why). In particular,
> > our issue was most likely result of
> > 
> >   https://github.com/openstack/openstack-ansible-tests/blob/master/test-prepare-host.yml#L196-L197
> 
> Sigh.  Ok, that pretty much leaves your patch as the only viable option,
> however, I still think the warning isn't useful.

Addendum: for net-next it makes sense to restrict this to udp and tcp
to avoid spreading this to sctp and other protocols.

We will however need to be lazy and can't just restrict it
in checkentry (as it might break existing config).

We could print a warning and have the target function ignores protocols
other than tcp and udp.

^ permalink raw reply

* Re: [Patch net-next v2 3/4] net_sched: remove tc class reference counting
From: Jiri Pirko @ 2017-08-25  9:41 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jhs
In-Reply-To: <20170824235130.28503-4-xiyou.wangcong@gmail.com>

Fri, Aug 25, 2017 at 01:51:29AM CEST, xiyou.wangcong@gmail.com wrote:
>For TC classes, their ->get() and ->put() are always paired, and the
>reference counting is completely useless, because:
>
>1) For class modification and dumping paths, we already hold RTNL lock,
>   so all of these ->get(),->change(),->put() are atomic.
>
>2) For filter bindiing/unbinding, we use other reference counter than
>   this one, and they should have RTNL lock too.
>
>3) For ->qlen_notify(), it is special because it is called on ->enqueue()
>   path, but we already hold qdisc tree lock there, and we hold this
>   tree lock when graft or delete the class too, so it should not be gone
>   or changed until we release the tree lock.
>
>Therefore, this patch removes ->get() and ->put(), but:
>
>1) Adds a new ->find() to find the pointer to a class by classid, no
>   refcnt.
>
>2) Move the original class destroy upon the last refcnt into ->delete(),
>   right after releasing tree lock. This is fine because the class is
>   already removed from hash when holding the lock.
>
>For those who also use ->put() as ->unbind(), just rename them to reflect
>this change.
>
>Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Look very nice. Thanks!
Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Florian Westphal @ 2017-08-25  9:40 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Florian Westphal, Davide Caratti, Pablo Neira Ayuso,
	Jozsef Kadlecsik, netfilter-devel, coreteam, netdev, linux-kernel,
	Michael S. Tsirkin, Markos Chandras
In-Reply-To: <20170825092819.hjgazfcvvrlibkd4@unicorn.suse.cz>

Michal Kubecek <mkubecek@suse.cz> wrote:
> On Thu, Aug 24, 2017 at 03:17:22PM +0200, Florian Westphal wrote:
> > Davide Caratti <dcaratti@redhat.com> wrote:
> > > Small nit: may I suggest you to call skb_csum_hwoffload_help() instead of
> > > skb_checksum_help(), so that we avoid corrupting SCTP packets in case they
> > > hit xt_CHECKSUM target?
> > 
> > Alternatively we could restrict the target to udp only.
> > 
> > AFAIU the only reason this thing exists is to fix up udp checksum
> > for old dhcp clients that use AF_PACKET without evaluating the extra
> > metadata that indicates when a 'bad' checksum is in fact ok because it
> > is supposed to be filled in by hardware later.
> > 
> > This can happen in virtual environemnt when such skb is directly passed
> > to vm.
> 
> Based on what the documentation and the commit message of the commit
> introducing xt_CHECKSUM module say, it seems so. But I must admit I'm
> not sure where is the target is used and how (and why). In particular,
> our issue was most likely result of
> 
>   https://github.com/openstack/openstack-ansible-tests/blob/master/test-prepare-host.yml#L196-L197

Sigh.  Ok, that pretty much leaves your patch as the only viable option,
however, I still think the warning isn't useful.

Can you send a v2 with gso check but without warning?

Thanks!

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox