Netdev List

* RE: v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Keller, Jacob E @ 2017-12-15 17:30 UTC (permalink / raw)
  To: Keller, Jacob E, Gabriel C, Pavel Machek, kernel list
  Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <02874ECE860811409154E81DA85FBB5882B5FB2A@ORSMSX115.amr.corp.intel.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On Behalf Of
> Keller, Jacob E
> Sent: Friday, December 15, 2017 9:29 AM
> To: Gabriel C <nix.or.die@gmail.com>; Pavel Machek <pavel@ucw.cz>; kernel list
> <linux-kernel@vger.kernel.org>
> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped
> working
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org]
> > On Behalf Of Gabriel C
> > Sent: Sunday, December 10, 2017 4:44 AM
> > To: Pavel Machek <pavel@ucw.cz>; kernel list <linux-kernel@vger.kernel.org>
> > Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; intel-wired-
> > lan@lists.osuosl.org; netdev@vger.kernel.org
> > Subject: Re: v4.15-rc2 on thinkpad x60: ethernet stopped working
> >
> > On 10.12.2017 09:39, Pavel Machek wrote:
> > > Hi!
> >
> > Hi,
> >
> > > In v4.15-rc2+, network manager can not see my ethernet card, and
> > > manual attempts to ifconfig it up did not really help, either.
> > >
> > > Card is:
> > >
> > > 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
> > > Controller
> > >
> > > Dmesg says:
> > >
> > >    dmesg | grep eth
> > > [    0.648931] e1000e 0000:02:00.0 eth0: (PCI Express:2.5GT/s:Width
> > > x1) 00:16:d3:25:19:04
> > > [    0.648934] e1000e 0000:02:00.0 eth0: Intel(R) PRO/1000 Network
> > > Connection
> > > [    0.649012] e1000e 0000:02:00.0 eth0: MAC: 2, PHY: 2, PBA No:
> > > 005302-003
> > > [    0.706510] usbcore: registered new interface driver cdc_ether
> > > [    6.557022] e1000e 0000:02:00.0 eth1: renamed from eth0
> > > [    6.577554] systemd-udevd[2363]: renamed network interface eth0 to
> > > eth1
> > >
> > > Any ideas ?
> >
> > Yes , 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.
> >
> > See:
> > https://bugzilla.kernel.org/show_bug.cgi?id=198047
> >
> > Fix there :
> > https://marc.info/?l=linux-kernel&m=151272209903675&w=2
> >
> > Regards,
> >
> > Gabriel C
> 
> Hi,
> 
> Digging into this, the problem is complicated. The original bug assumed behavior
> of the .check_for_link call, which is universally not implemented.
> 
> I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate signaling for
> link check/link up", 2017-10-10) and find a more proper solution.
> 
> I don't think any other code which uses check_for_link expects the interface to
> return in the way this patch attempted.
> 
> Thanks,
> Jake
> 

Alternatively, we can go a step further and make sure every implementation of .check_for_link follows the modified interface.

Thanks,
Jake

^ permalink raw reply

* RE: v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Keller, Jacob E @ 2017-12-15 17:29 UTC (permalink / raw)
  To: Gabriel C, Pavel Machek, kernel list
  Cc: Kirsher, Jeffrey T, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org
In-Reply-To: <d1ae924b-4d8f-a787-4c07-1f2db91482e5@gmail.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Gabriel C
> Sent: Sunday, December 10, 2017 4:44 AM
> To: Pavel Machek <pavel@ucw.cz>; kernel list <linux-kernel@vger.kernel.org>
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; intel-wired-
> lan@lists.osuosl.org; netdev@vger.kernel.org
> Subject: Re: v4.15-rc2 on thinkpad x60: ethernet stopped working
> 
> On 10.12.2017 09:39, Pavel Machek wrote:
> > Hi!
> 
> Hi,
> 
> > In v4.15-rc2+, network manager can not see my ethernet card, and
> > manual attempts to ifconfig it up did not really help, either.
> >
> > Card is:
> >
> > 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
> > Controller
> >
> > Dmesg says:
> >
> >    dmesg | grep eth
> > [    0.648931] e1000e 0000:02:00.0 eth0: (PCI Express:2.5GT/s:Width
> > x1) 00:16:d3:25:19:04
> > [    0.648934] e1000e 0000:02:00.0 eth0: Intel(R) PRO/1000 Network
> > Connection
> > [    0.649012] e1000e 0000:02:00.0 eth0: MAC: 2, PHY: 2, PBA No:
> > 005302-003
> > [    0.706510] usbcore: registered new interface driver cdc_ether
> > [    6.557022] e1000e 0000:02:00.0 eth1: renamed from eth0
> > [    6.577554] systemd-udevd[2363]: renamed network interface eth0 to
> > eth1
> >
> > Any ideas ?
> 
> Yes , 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.
> 
> See:
> https://bugzilla.kernel.org/show_bug.cgi?id=198047
> 
> Fix there :
> https://marc.info/?l=linux-kernel&m=151272209903675&w=2
> 
> Regards,
> 
> Gabriel C

Hi,

Digging into this, the problem is complicated. The original change assumed behavior of the .check_for_link call that is not universally implemented.

I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate signaling for link check/link up", 2017-10-10) and find a more proper solution.

I don't think any other code which uses check_for_link expects the interface to return in the way this patch attempted.

Thanks,
Jake

^ permalink raw reply
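
To illustrate the interface mismatch discussed in the message above, here is a minimal user-space sketch. It assumes, as the commit title suggests, that the modified interface reports "link is up" through a positive return value, while other implementations keep the plain 0-on-success / negative-on-error convention. All names (fake_hw, check_link_legacy, check_link_patched, caller_sees_link_up) are hypothetical and are not part of the e1000e driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for hardware state; not a real e1000e structure. */
struct fake_hw {
	bool link_up;
	bool read_error;
};

/* Legacy convention: 0 on success (link or no link), negative on error. */
static int check_link_legacy(const struct fake_hw *hw)
{
	return hw->read_error ? -1 : 0;
}

/* Modified convention (assumed): > 0 means "link is up", 0 means
 * "checked, no link", negative means error.
 */
static int check_link_patched(const struct fake_hw *hw)
{
	if (hw->read_error)
		return -1;
	return hw->link_up ? 1 : 0;
}

/* A caller written against the modified convention. */
static bool caller_sees_link_up(int (*check)(const struct fake_hw *),
				const struct fake_hw *hw)
{
	return check(hw) > 0;
}
```

With link_up set, such a caller gets true from the patched variant but false from the legacy one. Under the stated assumption, that is one plausible way a mixed set of implementations could end up never reporting link, which is why either a revert or a conversion of every implementation is proposed.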

* Re: [PATCH bpf 0/5] Couple of BPF JIT fixes
From: Alexei Starovoitov @ 2017-12-15 17:28 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, holzheu, naveen.n.rao, davem, netdev
In-Reply-To: <20171214200727.22230-1-daniel@iogearbox.net>

On Thu, Dec 14, 2017 at 09:07:22PM +0100, Daniel Borkmann wrote:
> Two fixes that deal with buggy usage of bpf_helper_changes_pkt_data()
> in the sense that they also reload cached skb data when there's no
> skb context but xdp one, for example. A fix where skb meta data is
> reloaded out of the wrong register on helper call, rest is test cases
> and making sure on verifier side that there's always the guarantee
> that ctx sits in r1. Thanks!

Applied, thanks Daniel!

^ permalink raw reply

* Re: [PATCH net-next 0/2] nfp: ethtool flash updates
From: David Miller @ 2017-12-15 17:26 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20171213224502.25407-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 13 Dec 2017 14:45:00 -0800

> Dirk says:
> 
> This series adds the ability to update the control FW with ethtool.
> 
> It should be noted that the locking scheme here is to release the RTNL
> lock before the flashing operation and to take it again afterwards to
> ensure consistent state from the core code point of view. In this time,
> we take a reference to the device to prevent the device being freed
> while its being flashed.
> 
> This provides protection for the device being flashed while at the same
> time not holding up any networking related functions which would
> otherwise be locked out due to RTNL being held.

Series applied, thanks.

^ permalink raw reply
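
The locking scheme described in the quoted cover letter can be sketched in user space as follows. This is only an illustration under stated assumptions: a pthread mutex stands in for the RTNL lock, an atomic counter stands in for the device reference, and every name (fake_rtnl, fake_dev, fake_flash_device) is hypothetical rather than taken from the nfp driver. The fake_rtnl_held flag is single-threaded bookkeeping for the demonstration, not a real synchronization primitive.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel or nfp symbols. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static bool fake_rtnl_held;	/* demo-only record of the lock state */

struct fake_dev {
	atomic_int refcnt;		/* keeps the device alive */
	bool flashed;
	bool flashed_under_rtnl;	/* was the lock held while flashing? */
};

static void fake_rtnl_lock(void)
{
	pthread_mutex_lock(&fake_rtnl);
	fake_rtnl_held = true;
}

static void fake_rtnl_unlock(void)
{
	fake_rtnl_held = false;
	pthread_mutex_unlock(&fake_rtnl);
}

/* Entered and left with fake_rtnl held, like a callback invoked by the core. */
static int fake_flash_device(struct fake_dev *dev)
{
	atomic_fetch_add(&dev->refcnt, 1);	/* pin the device */
	fake_rtnl_unlock();			/* let other netdev work proceed */

	dev->flashed_under_rtnl = fake_rtnl_held;
	dev->flashed = true;			/* stands in for the slow flash write */

	fake_rtnl_lock();			/* restore the caller's locking state */
	atomic_fetch_sub(&dev->refcnt, 1);	/* drop the pin */
	return 0;
}
```

The point of the ordering is that the reference is taken before the lock is dropped and released only after it is retaken, so the caller sees the same locking state on return and the device cannot go away in between.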

* Re: [B.A.T.M.A.N.] [RFC v2 2/6] batman-adv: Rename batman-adv.h to batadv_genl.h
From: Willem de Bruijn @ 2017-12-15 17:23 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: b.a.t.m.a.n, Eric Dumazet, Network Development, LKML, Jiri Pirko,
	David S . Miller
In-Reply-To: <1591888.FGtWPsc1tq@sven-edge>

On Fri, Dec 15, 2017 at 12:18 PM, Sven Eckelmann
<sven.eckelmann@openmesh.com> wrote:
> On Freitag, 15. Dezember 2017 11:57:55 CET Willem de Bruijn wrote:
>> > No, this is also bad because batman_adv.h is MIT license and packet.h is
>> > GPL-2. So what other name would you suggest for packet.h? batman_adv_packet.h?
>>
>> Sure, that sounds great. Thanks.
>
> Really? Isn't include/uapi/linux/batman_adv_packet.h looking like an accident
> which never should have had happened?

My only point was that renaming and modifying existing uapi files
can break userspace compilation.

As long as the existing files are not changed, I don't have a strong
opinion on naming for new files.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/3] net: dsa: mediatek: add VLAN support for MT7530
From: kbuild test robot @ 2017-12-15 17:23 UTC (permalink / raw)
  To: sean.wang
  Cc: kbuild-all, davem, andrew, f.fainelli, vivien.didelot, netdev,
	linux-kernel, linux-mediatek, Sean Wang
In-Reply-To: <72a0a9f2748193bc02fed5e74c343aa5397348b7.1513136754.git.sean.wang@mediatek.com>

Hi Sean,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/sean-wang-mediatek-com/add-VLAN-support-to-DSA-MT7530/20171215-214450
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)


vim +1324 drivers/net/dsa/mt7530.c

  1305	
  1306	static const struct dsa_switch_ops mt7530_switch_ops = {
  1307		.get_tag_protocol	= mtk_get_tag_protocol,
  1308		.setup			= mt7530_setup,
  1309		.get_strings		= mt7530_get_strings,
  1310		.phy_read		= mt7530_phy_read,
  1311		.phy_write		= mt7530_phy_write,
  1312		.get_ethtool_stats	= mt7530_get_ethtool_stats,
  1313		.get_sset_count		= mt7530_get_sset_count,
  1314		.adjust_link		= mt7530_adjust_link,
  1315		.port_enable		= mt7530_port_enable,
  1316		.port_disable		= mt7530_port_disable,
  1317		.port_stp_state_set	= mt7530_stp_state_set,
  1318		.port_bridge_join	= mt7530_port_bridge_join,
  1319		.port_bridge_leave	= mt7530_port_bridge_leave,
  1320		.port_fdb_add		= mt7530_port_fdb_add,
  1321		.port_fdb_del		= mt7530_port_fdb_del,
  1322		.port_fdb_dump		= mt7530_port_fdb_dump,
  1323		.port_vlan_filtering	= mt7530_port_vlan_filtering,
> 1324		.port_vlan_prepare	= mt7530_port_vlan_prepare,
> 1325		.port_vlan_add		= mt7530_port_vlan_add,
  1326		.port_vlan_del		= mt7530_port_vlan_del,
  1327	};
  1328	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* [PATCH] net: arc_emac: fix arc_emac_rx() error paths
From: Alexander Kochetkov @ 2017-12-15 17:20 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Florian Fainelli, Eric Dumazet, Alexander Kochetkov

arc_emac_rx() has some issues found by code review.

If netdev_alloc_skb_ip_align() or dma_map_single() fails, the
RX FIFO entry is not returned to the EMAC.

If dma_map_single() fails, the previously allocated skb is lost
to the driver. At the same time, the address of the newly allocated
skb is not provided to the EMAC.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
---
 drivers/net/ethernet/arc/emac_main.c |   53 ++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index b2e0051..0ea57fe 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -212,39 +212,48 @@ static int arc_emac_rx(struct net_device *ndev, int budget)
 			continue;
 		}
 
-		pktlen = info & LEN_MASK;
-		stats->rx_packets++;
-		stats->rx_bytes += pktlen;
-		skb = rx_buff->skb;
-		skb_put(skb, pktlen);
-		skb->dev = ndev;
-		skb->protocol = eth_type_trans(skb, ndev);
-
-		dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
-				 dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
-
-		/* Prepare the BD for next cycle */
-		rx_buff->skb = netdev_alloc_skb_ip_align(ndev,
-							 EMAC_BUFFER_SIZE);
-		if (unlikely(!rx_buff->skb)) {
+		/* Prepare the BD for next cycle. netif_receive_skb()
+		 * only if new skb was allocated and mapped to avoid holes
+		 * in the RX fifo.
+		 */
+		skb = netdev_alloc_skb_ip_align(ndev, EMAC_BUFFER_SIZE);
+		if (unlikely(!skb)) {
+			if (net_ratelimit())
+				netdev_err(ndev, "cannot allocate skb\n");
+			/* Return ownership to EMAC */
+			rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
 			stats->rx_errors++;
-			/* Because receive_skb is below, increment rx_dropped */
 			stats->rx_dropped++;
 			continue;
 		}
 
-		/* receive_skb only if new skb was allocated to avoid holes */
-		netif_receive_skb(skb);
-
-		addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
+		addr = dma_map_single(&ndev->dev, (void *)skb->data,
 				      EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);
 		if (dma_mapping_error(&ndev->dev, addr)) {
 			if (net_ratelimit())
-				netdev_err(ndev, "cannot dma map\n");
-			dev_kfree_skb(rx_buff->skb);
+				netdev_err(ndev, "cannot map dma buffer\n");
+			dev_kfree_skb(skb);
+			/* Return ownership to EMAC */
+			rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
 			stats->rx_errors++;
+			stats->rx_dropped++;
 			continue;
 		}
+
+		/* unmap previously mapped skb */
+		dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
+				 dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
+
+		pktlen = info & LEN_MASK;
+		stats->rx_packets++;
+		stats->rx_bytes += pktlen;
+		skb_put(rx_buff->skb, pktlen);
+		rx_buff->skb->dev = ndev;
+		rx_buff->skb->protocol = eth_type_trans(rx_buff->skb, ndev);
+
+		netif_receive_skb(rx_buff->skb);
+
+		rx_buff->skb = skb;
 		dma_unmap_addr_set(rx_buff, addr, addr);
 		dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);
 
-- 
1.7.9.5

^ permalink raw reply related
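
The reordering in the patch above follows a "refill before deliver" pattern: obtain the replacement buffer first, and only hand the filled buffer up once the ring slot can be repopulated. Below is a minimal user-space sketch of that idea. It is not the driver code: rx_slot, rx_stats, rx_one and failing_alloc are hypothetical names, and a plain allocator callback stands in for both the skb allocation and the DMA mapping step.

```c
#include <stdbool.h>
#include <stdlib.h>

#define BUF_SIZE 1536

/* Hypothetical ring slot; stands in for a buffer descriptor plus its buffer. */
struct rx_slot {
	void *buf;		/* buffer currently attached to the slot */
	bool owned_by_hw;	/* true once the slot is handed back to hardware */
};

struct rx_stats {
	unsigned long packets;
	unsigned long dropped;
};

/* Allocator hook so that a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Returns the filled buffer to pass up the stack, or NULL if the packet
 * was dropped because no replacement buffer was available.
 */
static void *rx_one(struct rx_slot *slot, struct rx_stats *st, alloc_fn alloc)
{
	void *fresh = alloc(BUF_SIZE);
	void *filled;

	if (!fresh) {
		/* Keep the old buffer in the slot and return ownership to
		 * the hardware, so the ring never develops a hole.
		 */
		slot->owned_by_hw = true;
		st->dropped++;
		return NULL;
	}

	filled = slot->buf;	/* deliver the old buffer ... */
	slot->buf = fresh;	/* ... and attach the replacement */
	slot->owned_by_hw = true;
	st->packets++;
	return filled;
}

/* Always-failing allocator, used to exercise the error path. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}
```

On the failure path the packet is dropped but the slot keeps a valid buffer and goes back to the hardware, which mirrors the "Return ownership to EMAC" steps added by the patch.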

* Re: [RFC v2 2/6] batman-adv: Rename batman-adv.h to batadv_genl.h
From: Sven Eckelmann @ 2017-12-15 17:18 UTC (permalink / raw)
  To: b.a.t.m.a.n
  Cc: Willem de Bruijn, Network Development, Eric Dumazet, LKML,
	Jiri Pirko, David S . Miller
In-Reply-To: <CAF=yD-JTfT-iOBG6KMhXv=KggoZ4tEP1fiJMiHMA_d0-wYncLQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Freitag, 15. Dezember 2017 11:57:55 CET Willem de Bruijn wrote:
> > No, this is also bad because batman_adv.h is MIT license and packet.h is
> > GPL-2. So what other name would you suggest for packet.h? batman_adv_packet.h?
> 
> Sure, that sounds great. Thanks.

Really? Doesn't include/uapi/linux/batman_adv_packet.h look like an accident
that never should have happened?

Kind regards,
	Sven

^ permalink raw reply

* Re: [patch net] mlxsw: spectrum: Disable MAC learning for ovs port
From: Jiri Pirko @ 2017-12-15 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, yuvalm, idosch, mlxsw
In-Reply-To: <20171215132658.6553-1-jiri@resnulli.us>

Fri, Dec 15, 2017 at 02:26:58PM CET, jiri@resnulli.us wrote:
>From: Yuval Mintz <yuvalm@mellanox.com>
>
>Learning is currently enabled for ports which are OVS slaves -
>even though OVS doesn't need this indication.
>Since we're not associating a fid with the port, HW would continuously
>notify driver of learned [& aged] MACs which would be logged as errors.
>
>Fixes: 2b94e58df58c ("mlxsw: spectrum: Allow ports to work under OVS master")
>Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
>Reviewed-by: Ido Schimmel <idosch@mellanox.com>
>Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Oh, I sent this one twice. Sorry :)

^ permalink raw reply

* Re: [patch net-next v3 00/10] net: sched: allow qdiscs to share filter block instances
From: Jiri Pirko @ 2017-12-15 17:10 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <19ee8268-a93c-8c99-6005-b521a0ef346d@gmail.com>

Fri, Dec 15, 2017 at 06:08:13PM CET, dsahern@gmail.com wrote:
>On 12/13/17 5:46 PM, Jakub Kicinski wrote:
>> On Wed, 13 Dec 2017 19:42:41 +0100, Jiri Pirko wrote:
>>>>>>> I plan to do it as a follow-up patch. But this is how things are done
>>>>>>> now and have to continue to work.  
>>>>>>
>>>>>> Why is that? You are introducing the notion of a shared block with this
>>>>>> patch set. What is the legacy "how things are done now" you are
>>>>>> referring to?  
>>>>>
>>>>> Well, the filter add/del should just work no matter if the block behind is
>>>>> shared or not.  
>>>>
>>>> My argument is that modifying a shared block instance via a dev should
>>>> not be allowed. Those changes should only be allowed via the shared
>>>> block. So if a user puts adds a shared block to the device and then
>>>> attempts to add a filter via the device it should not be allowed.  
>>>
>>> I don't see why. The handle is the qdisc here.
>> 
>> If you look at it from Linux perspective that makes sense.  For people
>> coming from switching world the fact that we use qdiscs as a handle for
>> ACL blocks is an implementation detail..  is that the argument here?
>> 
>
>In a sense, yes. When configuring the filter, the primary command line
>argument is the device. The qdisc is then derived from it and is an
>implementation detail.

It is a dev-handle tuple.

^ permalink raw reply

* Re: [patch net-next v3 00/10] net: sched: allow qdiscs to share filter block instances
From: David Ahern @ 2017-12-15 17:08 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, simon.horman, pieter.jansenvanvuuren, john.hurley,
	alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <20171213164652.5e5dfa2b@cakuba.netronome.com>

On 12/13/17 5:46 PM, Jakub Kicinski wrote:
> On Wed, 13 Dec 2017 19:42:41 +0100, Jiri Pirko wrote:
>>>>>> I plan to do it as a follow-up patch. But this is how things are done
>>>>>> now and have to continue to work.  
>>>>>
>>>>> Why is that? You are introducing the notion of a shared block with this
>>>>> patch set. What is the legacy "how things are done now" you are
>>>>> referring to?  
>>>>
>>>> Well, the filter add/del should just work no matter if the block behind is
>>>> shared or not.  
>>>
>>> My argument is that modifying a shared block instance via a dev should
>>> not be allowed. Those changes should only be allowed via the shared
>>> block. So if a user puts adds a shared block to the device and then
>>> attempts to add a filter via the device it should not be allowed.  
>>
>> I don't see why. The handle is the qdisc here.
> 
> If you look at it from Linux perspective that makes sense.  For people
> coming from switching world the fact that we use qdiscs as a handle for
> ACL blocks is an implementation detail..  is that the argument here?
> 

In a sense, yes. When configuring the filter, the primary command line
argument is the device. The qdisc is then derived from it and is an
implementation detail.

^ permalink raw reply

* [PATCH v2 net-next 2/4] net: tracepoint: replace tcp_set_state tracepoint with sock_set_state tracepoint
From: Yafang Shao @ 2017-12-15 17:01 UTC (permalink / raw)
  To: songliubraving, davem, marcelo.leitner, rostedt
  Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513357314-8402-1-git-send-email-laoar.shao@gmail.com>

Since sk_state is a common field of struct sock, state transitions
should not be a TCP-specific feature.
So rename the tcp_set_state tracepoint to sock_set_state, with
some minor changes, and move it into trace/events/sock.h.

Two helpers are introduced to trace sk_state transitions:
    - void sk_state_store(struct sock *sk, int state);
    - void sk_set_state(struct sock *sk, int state);
As trace headers should not be included in other header files,
they are defined in sock.c.

Protocols such as SCTP may be built as modules, hence export
sk_set_state().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/net/sock.h              |  15 +-----
 include/trace/events/sock.h     | 106 ++++++++++++++++++++++++++++++++++++++++
 include/trace/events/tcp.h      |  91 ----------------------------------
 net/core/sock.c                 |  13 +++++
 net/ipv4/inet_connection_sock.c |   4 +-
 net/ipv4/inet_hashtables.c      |   2 +-
 net/ipv4/tcp.c                  |   4 --
 7 files changed, 124 insertions(+), 111 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 9a90472..988ce82 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2344,19 +2344,8 @@ static inline int sk_state_load(const struct sock *sk)
 	return smp_load_acquire(&sk->sk_state);
 }

-/**
- * sk_state_store - update sk->sk_state
- * @sk: socket pointer
- * @newstate: new state
- *
- * Paired with sk_state_load(). Should be used in contexts where
- * state change might impact lockless readers.
- */
-static inline void sk_state_store(struct sock *sk, int newstate)
-{
-	smp_store_release(&sk->sk_state, newstate);
-}
-
+void sk_state_store(struct sock *sk, int newstate);
+void sk_set_state(struct sock *sk, int state);
 void sock_enable_timestamp(struct sock *sk, int flag);
 int sock_get_timestamp(struct sock *, struct timeval __user *);
 int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index ec4dade..61977e5 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -6,7 +6,49 @@
 #define _TRACE_SOCK_H

 #include <net/sock.h>
+#include <net/ipv6.h>
 #include <linux/tracepoint.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+
+#define inet_protocol_names		\
+		EM(IPPROTO_TCP)			\
+		EM(IPPROTO_DCCP)		\
+		EMe(IPPROTO_SCTP)
+
+#define tcp_state_names			\
+		EM(TCP_ESTABLISHED)		\
+		EM(TCP_SYN_SENT)		\
+		EM(TCP_SYN_RECV)		\
+		EM(TCP_FIN_WAIT1)		\
+		EM(TCP_FIN_WAIT2)		\
+		EM(TCP_TIME_WAIT)		\
+		EM(TCP_CLOSE)			\
+		EM(TCP_CLOSE_WAIT)		\
+		EM(TCP_LAST_ACK)		\
+		EM(TCP_LISTEN)			\
+		EM(TCP_CLOSING)			\
+		EMe(TCP_NEW_SYN_RECV)
+
+/* enums need to be exported to user space */
+#undef EM
+#undef EMe
+#define EM(a)       TRACE_DEFINE_ENUM(a);
+#define EMe(a)      TRACE_DEFINE_ENUM(a);
+
+inet_protocol_names
+tcp_state_names
+
+#undef EM
+#undef EMe
+#define EM(a)       { a, #a },
+#define EMe(a)      { a, #a }
+
+#define show_inet_protocol_name(val)	\
+	__print_symbolic(val, inet_protocol_names)
+
+#define show_tcp_state_name(val)		\
+	__print_symbolic(val, tcp_state_names)

 TRACE_EVENT(sock_rcvqueue_full,

@@ -63,6 +105,70 @@
 		__entry->rmem_alloc)
 );

+TRACE_EVENT(sock_set_state,
+
+	TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
+
+	TP_ARGS(sk, oldstate, newstate),
+
+	TP_STRUCT__entry(
+		__field(const void *, skaddr)
+		__field(int, oldstate)
+		__field(int, newstate)
+		__field(__u16, sport)
+		__field(__u16, dport)
+		__field(__u8, protocol)
+		__array(__u8, saddr, 4)
+		__array(__u8, daddr, 4)
+		__array(__u8, saddr_v6, 16)
+		__array(__u8, daddr_v6, 16)
+	),
+
+	TP_fast_assign(
+		struct inet_sock *inet = inet_sk(sk);
+		struct in6_addr *pin6;
+		__be32 *p32;
+
+		__entry->skaddr = sk;
+		__entry->oldstate = oldstate;
+		__entry->newstate = newstate;
+
+		__entry->protocol = sk->sk_protocol;
+		__entry->sport = ntohs(inet->inet_sport);
+		__entry->dport = ntohs(inet->inet_dport);
+
+		p32 = (__be32 *) __entry->saddr;
+		*p32 = inet->inet_saddr;
+
+		p32 = (__be32 *) __entry->daddr;
+		*p32 =  inet->inet_daddr;
+
+#if IS_ENABLED(CONFIG_IPV6)
+		if (sk->sk_family == AF_INET6) {
+			pin6 = (struct in6_addr *)__entry->saddr_v6;
+			*pin6 = sk->sk_v6_rcv_saddr;
+			pin6 = (struct in6_addr *)__entry->daddr_v6;
+			*pin6 = sk->sk_v6_daddr;
+		} else
+#endif
+		{
+			pin6 = (struct in6_addr *)__entry->saddr_v6;
+			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
+			pin6 = (struct in6_addr *)__entry->daddr_v6;
+			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
+		}
+	),
+
+	TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 "
+			"saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
+			show_inet_protocol_name(__entry->protocol),
+			__entry->sport, __entry->dport,
+			__entry->saddr, __entry->daddr,
+			__entry->saddr_v6, __entry->daddr_v6,
+			show_tcp_state_name(__entry->oldstate),
+			show_tcp_state_name(__entry->newstate))
+);
+
 #endif /* _TRACE_SOCK_H */

 /* This part must be outside protection */
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 40240ac..7399399 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -9,37 +9,6 @@
 #include <linux/tracepoint.h>
 #include <net/ipv6.h>

-#define tcp_state_names			\
-		EM(TCP_ESTABLISHED)		\
-		EM(TCP_SYN_SENT)		\
-		EM(TCP_SYN_RECV)		\
-		EM(TCP_FIN_WAIT1)		\
-		EM(TCP_FIN_WAIT2)		\
-		EM(TCP_TIME_WAIT)		\
-		EM(TCP_CLOSE)			\
-		EM(TCP_CLOSE_WAIT)		\
-		EM(TCP_LAST_ACK)		\
-		EM(TCP_LISTEN)			\
-		EM(TCP_CLOSING)			\
-		EMe(TCP_NEW_SYN_RECV)	\
-
-/* enums need to be exported to user space */
-#undef EM
-#undef EMe
-#define EM(a)         TRACE_DEFINE_ENUM(a);
-#define EMe(a)        TRACE_DEFINE_ENUM(a);
-
-tcp_state_names
-
-#undef EM
-#undef EMe
-#define EM(a)         tcp_state_name(a),
-#define EMe(a)        tcp_state_name(a)
-
-#define tcp_state_name(state)	{ state, #state }
-#define show_tcp_state_name(val)			\
-	__print_symbolic(val, tcp_state_names)
-
 /*
  * tcp event with arguments sk and skb
  *
@@ -192,66 +161,6 @@
 	TP_ARGS(sk)
 );

-TRACE_EVENT(tcp_set_state,
-
-	TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
-
-	TP_ARGS(sk, oldstate, newstate),
-
-	TP_STRUCT__entry(
-		__field(const void *, skaddr)
-		__field(int, oldstate)
-		__field(int, newstate)
-		__field(__u16, sport)
-		__field(__u16, dport)
-		__array(__u8, saddr, 4)
-		__array(__u8, daddr, 4)
-		__array(__u8, saddr_v6, 16)
-		__array(__u8, daddr_v6, 16)
-	),
-
-	TP_fast_assign(
-		struct inet_sock *inet = inet_sk(sk);
-		struct in6_addr *pin6;
-		__be32 *p32;
-
-		__entry->skaddr = sk;
-		__entry->oldstate = oldstate;
-		__entry->newstate = newstate;
-
-		__entry->sport = ntohs(inet->inet_sport);
-		__entry->dport = ntohs(inet->inet_dport);
-
-		p32 = (__be32 *) __entry->saddr;
-		*p32 = inet->inet_saddr;
-
-		p32 = (__be32 *) __entry->daddr;
-		*p32 =  inet->inet_daddr;
-
-#if IS_ENABLED(CONFIG_IPV6)
-		if (sk->sk_family == AF_INET6) {
-			pin6 = (struct in6_addr *)__entry->saddr_v6;
-			*pin6 = sk->sk_v6_rcv_saddr;
-			pin6 = (struct in6_addr *)__entry->daddr_v6;
-			*pin6 = sk->sk_v6_daddr;
-		} else
-#endif
-		{
-			pin6 = (struct in6_addr *)__entry->saddr_v6;
-			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
-			pin6 = (struct in6_addr *)__entry->daddr_v6;
-			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
-		}
-	),
-
-	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
-		  __entry->sport, __entry->dport,
-		  __entry->saddr, __entry->daddr,
-		  __entry->saddr_v6, __entry->daddr_v6,
-		  show_tcp_state_name(__entry->oldstate),
-		  show_tcp_state_name(__entry->newstate))
-);
-
 TRACE_EVENT(tcp_retransmit_synack,

 	TP_PROTO(const struct sock *sk, const struct request_sock *req),
diff --git a/net/core/sock.c b/net/core/sock.c
index c0b5b2f..717f7f6 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2859,6 +2859,19 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
 }
 EXPORT_SYMBOL(sock_get_timestampns);

+void sk_state_store(struct sock *sk, int state)
+{
+	trace_sock_set_state(sk, sk->sk_state, state);
+	smp_store_release(&sk->sk_state, state);
+}
+
+void sk_set_state(struct sock *sk, int state)
+{
+	trace_sock_set_state(sk, sk->sk_state, state);
+	sk->sk_state = state;
+}
+EXPORT_SYMBOL(sk_set_state);
+
 void sock_enable_timestamp(struct sock *sk, int flag)
 {
 	if (!sock_flag(sk, flag)) {
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ca46dc..001f7b0 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -783,7 +783,7 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
 	if (newsk) {
 		struct inet_connection_sock *newicsk = inet_csk(newsk);

-		newsk->sk_state = TCP_SYN_RECV;
+		sk_set_state(newsk, TCP_SYN_RECV);
 		newicsk->icsk_bind_hash = NULL;

 		inet_sk(newsk)->inet_dport = inet_rsk(req)->ir_rmt_port;
@@ -888,7 +888,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
 			return 0;
 	}

-	sk->sk_state = TCP_CLOSE;
+	sk_set_state(sk, TCP_CLOSE);
 	return err;
 }
 EXPORT_SYMBOL_GPL(inet_csk_listen_start);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f6f5810..5973693 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -544,7 +544,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk)
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
 	} else {
 		percpu_counter_inc(sk->sk_prot->orphan_count);
-		sk->sk_state = TCP_CLOSE;
+		sk_set_state(sk, TCP_CLOSE);
 		sock_set_flag(sk, SOCK_DEAD);
 		inet_csk_destroy_sock(sk);
 	}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c470fec..df6da92 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -283,8 +283,6 @@
 #include <asm/ioctls.h>
 #include <net/busy_poll.h>

-#include <trace/events/tcp.h>
-
 struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);

@@ -2040,8 +2038,6 @@ void tcp_set_state(struct sock *sk, int state)
 {
 	int oldstate = sk->sk_state;

-	trace_tcp_set_state(sk, oldstate, state);
-
 	switch (state) {
 	case TCP_ESTABLISHED:
 		if (oldstate != TCP_ESTABLISHED)

^ permalink raw reply related
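
The two helpers introduced by the patch above share one shape: emit the trace event with the old and new state, then perform the store. A minimal user-space sketch of that shape follows. It is an illustration only: fake_sock, state_tracer and the fake_* functions are hypothetical names, a function pointer stands in for the tracepoint, and C11 atomics stand in for the kernel's plain store and smp_store_release().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical socket with only the field of interest. */
struct fake_sock {
	atomic_int state;
};

/* Stand-in for a tracepoint: NULL means tracing is disabled. */
typedef void (*state_trace_fn)(const struct fake_sock *sk,
			       int oldstate, int newstate);
static state_trace_fn state_tracer;

static void fake_trace(struct fake_sock *sk, int newstate)
{
	if (state_tracer)
		state_tracer(sk,
			     atomic_load_explicit(&sk->state, memory_order_relaxed),
			     newstate);
}

/* Plain setter: trace, then store without ordering guarantees. */
static void fake_sk_set_state(struct fake_sock *sk, int state)
{
	fake_trace(sk, state);
	atomic_store_explicit(&sk->state, state, memory_order_relaxed);
}

/* Release setter: trace, then store with release ordering, for use
 * where lockless readers pair it with an acquire load.
 */
static void fake_sk_state_store(struct fake_sock *sk, int state)
{
	fake_trace(sk, state);
	atomic_store_explicit(&sk->state, state, memory_order_release);
}

/* Example tracer that records the last transition it saw. */
static int last_old = -1, last_new = -1, trace_hits;

static void record_transition(const struct fake_sock *sk,
			      int oldstate, int newstate)
{
	(void)sk;
	last_old = oldstate;
	last_new = newstate;
	trace_hits++;
}
```

Routing every assignment through one of the two setters is what lets a single tracepoint cover TCP, DCCP and SCTP, since the hook no longer lives in a protocol-specific function.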

* [PATCH v2 net-next 0/4]  replace tcp_set_state tracepoint with
From: Yafang Shao @ 2017-12-15 17:01 UTC (permalink / raw)
  To: songliubraving, davem, marcelo.leitner, rostedt
  Cc: bgregg, netdev, linux-kernel, Yafang Shao

Hi,

According to the discussion in the mail thread
https://patchwork.kernel.org/patch/10099243/,
tcp_set_state tracepoint is renamed to sock_set_state tracepoint and is moved
to include/trace/events/sock.h.

This new tracepoint is used to trace TCP/DCCP/SCTP state transitions.

v1->v2: Steven's patch is included in this series.

Steven Rostedt:
  tcp: Export to userspace the TCP state names for the trace events

Yafang Shao (3):
  net: tracepoint: using sock_set_state tracepoint to trace SCTP state
    transition
  net: tracepoint: replace tcp_set_state tracepoint with sock_set_state
    tracepoint
  net: tracepoint: using sock_set_state tracepoint to trace DCCP state
    transition

 include/net/sock.h              |  15 +-----
 include/trace/events/sock.h     | 106 ++++++++++++++++++++++++++++++++++++++++
 include/trace/events/tcp.h      |  76 ----------------------------
 net/core/sock.c                 |  13 +++++
 net/dccp/proto.c                |   2 +-
 net/ipv4/inet_connection_sock.c |   4 +-
 net/ipv4/inet_hashtables.c      |   2 +-
 net/ipv4/tcp.c                  |   4 --
 net/sctp/endpointola.c          |   2 +-
 net/sctp/sm_sideeffect.c        |   4 +-
 net/sctp/socket.c               |  12 ++---
 11 files changed, 134 insertions(+), 106 deletions(-)

-- 
1.8.3.1

^ permalink raw reply
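
The state-name export in this series relies on a list macro built from EM()/EMe() entries that is expanded with different definitions of those two macros. The user-space sketch below shows the same technique with a single expansion into a lookup table. The enum values, demo_state_names and show_demo_state are hypothetical; in the kernel the table is consumed by __print_symbolic(), whereas this sketch uses a small lookup loop and returns "UNKNOWN" for unlisted values, which is a choice made here for illustration.

```c
#include <stddef.h>

/* Hypothetical states, standing in for the TCP_* constants. */
enum demo_state {
	DEMO_ESTABLISHED = 1,
	DEMO_SYN_SENT,
	DEMO_CLOSE,
};

/* The list is written once; EM() is used for every entry but the last,
 * EMe() for the last, so the final entry can omit the trailing comma.
 */
#define demo_state_names		\
		EM(DEMO_ESTABLISHED)	\
		EM(DEMO_SYN_SENT)	\
		EMe(DEMO_CLOSE)

struct demo_sym {
	int val;
	const char *name;
};

/* Expand the list into { value, "name" } pairs. */
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct demo_sym demo_syms[] = { demo_state_names };
#undef EM
#undef EMe

/* Map a value to its symbolic name. */
static const char *show_demo_state(int val)
{
	size_t i;

	for (i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++)
		if (demo_syms[i].val == val)
			return demo_syms[i].name;
	return "UNKNOWN";
}
```

Because the list exists in one place, adding a state means adding one EM() line, and every expansion of the list picks it up.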

* [PATCH v2 net-next 4/4] net: tracepoint: using sock_set_state tracepoint to trace SCTP state transition
From: Yafang Shao @ 2017-12-15 17:01 UTC (permalink / raw)
  To: songliubraving, davem, marcelo.leitner, rostedt
  Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513357314-8402-1-git-send-email-laoar.shao@gmail.com>

With changes in inet_ files, SCTP state transitions are traced with
the sock_set_state tracepoint.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 net/sctp/endpointola.c   |  2 +-
 net/sctp/sm_sideeffect.c |  4 ++--
 net/sctp/socket.c        | 12 ++++++------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index ee1e601..5e129df 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -232,7 +232,7 @@ void sctp_endpoint_free(struct sctp_endpoint *ep)
 {
 	ep->base.dead = true;

-	ep->base.sk->sk_state = SCTP_SS_CLOSED;
+	sk_set_state(ep->base.sk, SCTP_SS_CLOSED);

 	/* Unlink this endpoint, so we can't find it again! */
 	sctp_unhash_endpoint(ep);
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 8adde71..22ab3b4 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -878,12 +878,12 @@ static void sctp_cmd_new_state(struct sctp_cmd_seq *cmds,
 		 * successfully completed a connect() call.
 		 */
 		if (sctp_state(asoc, ESTABLISHED) && sctp_sstate(sk, CLOSED))
-			sk->sk_state = SCTP_SS_ESTABLISHED;
+			sk_set_state(sk, SCTP_SS_ESTABLISHED);

 		/* Set the RCV_SHUTDOWN flag when a SHUTDOWN is received. */
 		if (sctp_state(asoc, SHUTDOWN_RECEIVED) &&
 		    sctp_sstate(sk, ESTABLISHED)) {
-			sk->sk_state = SCTP_SS_CLOSING;
+			sk_set_state(sk, SCTP_SS_CLOSING);
 			sk->sk_shutdown |= RCV_SHUTDOWN;
 		}
 	}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7eec0a0..ecb532c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1544,7 +1544,7 @@ static void sctp_close(struct sock *sk, long timeout)

 	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
 	sk->sk_shutdown = SHUTDOWN_MASK;
-	sk->sk_state = SCTP_SS_CLOSING;
+	sk_set_state(sk, SCTP_SS_CLOSING);

 	ep = sctp_sk(sk)->ep;

@@ -4653,7 +4653,7 @@ static void sctp_shutdown(struct sock *sk, int how)
 	if (how & SEND_SHUTDOWN && !list_empty(&ep->asocs)) {
 		struct sctp_association *asoc;

-		sk->sk_state = SCTP_SS_CLOSING;
+		sk_set_state(sk, SCTP_SS_CLOSING);
 		asoc = list_entry(ep->asocs.next,
 				  struct sctp_association, asocs);
 		sctp_primitive_SHUTDOWN(net, asoc, NULL);
@@ -7509,13 +7509,13 @@ static int sctp_listen_start(struct sock *sk, int backlog)
 	 * sockets.
 	 *
 	 */
-	sk->sk_state = SCTP_SS_LISTENING;
+	sk_set_state(sk, SCTP_SS_LISTENING);
 	if (!ep->base.bind_addr.port) {
 		if (sctp_autobind(sk))
 			return -EAGAIN;
 	} else {
 		if (sctp_get_port(sk, inet_sk(sk)->inet_num)) {
-			sk->sk_state = SCTP_SS_CLOSED;
+			sk_set_state(sk, SCTP_SS_CLOSED);
 			return -EADDRINUSE;
 		}
 	}
@@ -8538,10 +8538,10 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
 	 * is called, set RCV_SHUTDOWN flag.
 	 */
 	if (sctp_state(assoc, CLOSED) && sctp_style(newsk, TCP)) {
-		newsk->sk_state = SCTP_SS_CLOSED;
+		sk_set_state(newsk, SCTP_SS_CLOSED);
 		newsk->sk_shutdown |= RCV_SHUTDOWN;
 	} else {
-		newsk->sk_state = SCTP_SS_ESTABLISHED;
+		sk_set_state(newsk, SCTP_SS_ESTABLISHED);
 	}

 	release_sock(newsk);
--
1.8.3.1


* [PATCH v2 net-next 3/4] net: tracepoint: using sock_set_state tracepoint to trace DCCP state transition
From: Yafang Shao @ 2017-12-15 17:01 UTC (permalink / raw)
  To: songliubraving, davem, marcelo.leitner, rostedt
  Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513357314-8402-1-git-send-email-laoar.shao@gmail.com>

With changes in inet_ files, DCCP state transitions are traced with
sock_set_state tracepoint.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 net/dccp/proto.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 9d43c1f..2874faf 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -110,7 +110,7 @@ void dccp_set_state(struct sock *sk, const int state)
 	/* Change state AFTER socket is unhashed to avoid closed
 	 * socket sitting in hash tables.
 	 */
-	sk->sk_state = state;
+	sk_set_state(sk, state);
 }

 EXPORT_SYMBOL_GPL(dccp_set_state);
--
1.8.3.1


* [PATCH v2 net-next 1/4] tcp: Export to userspace the TCP state names for the trace events
From: Yafang Shao @ 2017-12-15 17:01 UTC (permalink / raw)
  To: songliubraving, davem, marcelo.leitner, rostedt
  Cc: bgregg, netdev, linux-kernel, Yafang Shao
In-Reply-To: <1513357314-8402-1-git-send-email-laoar.shao@gmail.com>

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The TCP trace events (specifically tcp_set_state) map enums to symbol
names via __print_symbolic(). But this only works for reading trace events
from the tracefs trace files. If perf or trace-cmd were to record these
events, the event format file does not convert the enum names into numbers,
and you get something like:

__print_symbolic(REC->oldstate,
    { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
    { TCP_SYN_SENT, "TCP_SYN_SENT" },
    { TCP_SYN_RECV, "TCP_SYN_RECV" },
    { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
    { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
    { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
    { TCP_CLOSE, "TCP_CLOSE" },
    { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
    { TCP_LAST_ACK, "TCP_LAST_ACK" },
    { TCP_LISTEN, "TCP_LISTEN" },
    { TCP_CLOSING, "TCP_CLOSING" },
    { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })

Where trace-cmd and perf do not know the values of those enums.

Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
the enum strings into their values at system boot. This will allow perf and
trace-cmd to see actual numbers and not enums:

__print_symbolic(REC->oldstate,
    { 1, "TCP_ESTABLISHED" },
    { 2, "TCP_SYN_SENT" },
    { 3, "TCP_SYN_RECV" },
    { 4, "TCP_FIN_WAIT1" },
    { 5, "TCP_FIN_WAIT2" },
    { 6, "TCP_TIME_WAIT" },
    { 7, "TCP_CLOSE" },
    { 8, "TCP_CLOSE_WAIT" },
    { 9, "TCP_LAST_ACK" },
    { 10, "TCP_LISTEN" },
    { 11, "TCP_CLOSING" },
    { 12, "TCP_NEW_SYN_RECV" })
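
The list-macro pattern used for this can be tried outside the kernel. The
following is a minimal userspace sketch, not the kernel code: the
demo_state enum, DEMO_STATE_NAMES, struct demo_sym and demo_state_name()
are hypothetical names. The list is expanded twice with different
definitions of EM()/EMe(). In the kernel the first expansion emits
TRACE_DEFINE_ENUM(); here it declares the enum instead, since userspace
has no equivalent.

```c
#include <stddef.h>

/* Hypothetical state list; EM() marks every entry but the last, EMe() the last. */
#define DEMO_STATE_NAMES	\
	EM(DEMO_ESTABLISHED)	\
	EM(DEMO_SYN_SENT)	\
	EMe(DEMO_CLOSE)

/* First expansion: declare the enum members. */
#define EM(a)	a,
#define EMe(a)	a
enum demo_state { DEMO_STATE_NAMES };
#undef EM
#undef EMe

/* Second expansion: build { value, "name" } pairs from the same list. */
struct demo_sym {
	int value;
	const char *name;
};
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct demo_sym demo_syms[] = { DEMO_STATE_NAMES };
#undef EM
#undef EMe

/* Look up the symbolic name for a numeric value. */
static const char *demo_state_name(int value)
{
	for (size_t i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++)
		if (demo_syms[i].value == value)
			return demo_syms[i].name;
	return "UNKNOWN";
}
```

Because both expansions come from one list, adding a state in one place
keeps the enum and the name table in sync.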

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/trace/events/tcp.h | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 07cccca..40240ac 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -9,21 +9,36 @@
 #include <linux/tracepoint.h>
 #include <net/ipv6.h>

+#define tcp_state_names			\
+		EM(TCP_ESTABLISHED)		\
+		EM(TCP_SYN_SENT)		\
+		EM(TCP_SYN_RECV)		\
+		EM(TCP_FIN_WAIT1)		\
+		EM(TCP_FIN_WAIT2)		\
+		EM(TCP_TIME_WAIT)		\
+		EM(TCP_CLOSE)			\
+		EM(TCP_CLOSE_WAIT)		\
+		EM(TCP_LAST_ACK)		\
+		EM(TCP_LISTEN)			\
+		EM(TCP_CLOSING)			\
+		EMe(TCP_NEW_SYN_RECV)	\
+
+/* enums need to be exported to user space */
+#undef EM
+#undef EMe
+#define EM(a)         TRACE_DEFINE_ENUM(a);
+#define EMe(a)        TRACE_DEFINE_ENUM(a);
+
+tcp_state_names
+
+#undef EM
+#undef EMe
+#define EM(a)         tcp_state_name(a),
+#define EMe(a)        tcp_state_name(a)
+
 #define tcp_state_name(state)	{ state, #state }
 #define show_tcp_state_name(val)			\
-	__print_symbolic(val,				\
-		tcp_state_name(TCP_ESTABLISHED),	\
-		tcp_state_name(TCP_SYN_SENT),		\
-		tcp_state_name(TCP_SYN_RECV),		\
-		tcp_state_name(TCP_FIN_WAIT1),		\
-		tcp_state_name(TCP_FIN_WAIT2),		\
-		tcp_state_name(TCP_TIME_WAIT),		\
-		tcp_state_name(TCP_CLOSE),		\
-		tcp_state_name(TCP_CLOSE_WAIT),		\
-		tcp_state_name(TCP_LAST_ACK),		\
-		tcp_state_name(TCP_LISTEN),		\
-		tcp_state_name(TCP_CLOSING),		\
-		tcp_state_name(TCP_NEW_SYN_RECV))
+	__print_symbolic(val, tcp_state_names)

 /*
  * tcp event with arguments sk and skb
--
1.8.3.1


* Re: [B.A.T.M.A.N.] [RFC v2 2/6] batman-adv: Rename batman-adv.h to batadv_genl.h
From: Willem de Bruijn @ 2017-12-15 16:57 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: b.a.t.m.a.n, Network Development, Jiri Pirko, LKML, Eric Dumazet,
	David S . Miller
In-Reply-To: <1780732.fBLghQceHc@bentobox>

On Fri, Dec 15, 2017 at 6:48 AM, Sven Eckelmann
<sven.eckelmann@openmesh.com> wrote:
> On Freitag, 15. Dezember 2017 11:32:05 CET Sven Eckelmann wrote:
>> On Mittwoch, 6. Dezember 2017 11:58:14 CET Willem de Bruijn wrote:
>> [...]
>> > >> > ---
>> > >> >  MAINTAINERS                                        | 2 +-
>> > >> >  include/uapi/linux/{batman_adv.h => batadv_genl.h} | 6 +++---
>> > >>
>> > >> This and the previous patch changes uapi. That might break userspace
>> > >> applications that rely on it.
>> > >
>> > > I am not aware of any application because all (alfred, batctl and some gluon
>> > > integration) of them currently ship their own copy because distribution didn't
>> > > catch up. And this is also the reason why I want to do it now - not later.
>> >
>> > That assumes that you know all applications, including those not
>> > publicly available. It may be true in this instance, but it is not
>> > possible to be certain.
>>
>> I've just talked with Simon. Because you have a problem with these two
>> changes, he suggested that I should drop these two patches and merge packet.h
>> with the uapi batadv genl header batman_adv.h
>
> No, this is also bad because batman_adv.h is MIT license and packet.h is
> GPL-2. So what other name would you suggest for packet.h? batman_adv_packet.h?

Sure, that sounds great. Thanks.


* Re: ixgbe tuning reset when XDP is setup
From: Peter Manev @ 2017-12-15 16:56 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: John Fastabend, David Miller, eric, Netdev, xdp-newbies,
	Emil Tantilov
In-Reply-To: <CAKgT0UeFUnYdvNZZAzFibYm6JXccyv+9q3=5j1AsAe9Z9bGhjw@mail.gmail.com>


> On 15 Dec 2017, at 17:51, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> On Fri, Dec 15, 2017 at 8:03 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 12/15/2017 07:53 AM, David Miller wrote:
>>> From: Eric Leblond <eric@regit.org>
>>> Date: Fri, 15 Dec 2017 11:24:46 +0100
>>> 
>>>> Hello,
>>>> 
>>>> When using an ixgbe card with Suricata we are using the following
>>>> commands to get a symmetric hash on RSS load balancing:
>>>> 
>>>> ./set_irq_affinity 0-15 eth3
>>>> ethtool -X eth3 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 16
>>>> ethtool -x eth3
>>>> ethtool -n eth3
>>>> 
>>>> Then we start Suricata.
>>>> 
>>>> In my current experiment on XDP, I have Suricata inject the eBPF
>>>> program when starting. The consequence of that when using an ixgbe card
>>>> is that the load balancing gets reset and all interrupts reach
>>>> the first core.
>>> 
>>> This definitely should _not_ be a side effect of enabling XDP on a device.
>>> 
>> 
>> Agreed, CC Emil and Alex we should restore these settings after the
>> reconfiguration done to support a queue per core.
>> 
>> .John
> 
> So the interrupt configuration has to get reset since we have to
> assign 2 Tx queues for every Rx queue instead of the 1-1 that was
> previously there. That is a natural consequence of rearranging the
> queues as currently happens. The issue is the q_vectors themselves
> have to be reallocated. The only way to not make that happen would be
> to pre-allocate the Tx queues for XDP always.
> 
> Also just to be clear we are talking about the interrupts being reset,
> not the RSS key right? I just want to make sure that is what we are
> talking about.
> 

Yes.
From the tests we did I only observed the IRQs being all reset to the first CPU after Suricata started.



> Thanks.
> 
> - Alex


* Re: ixgbe tuning reset when XDP is setup
From: Alexander Duyck @ 2017-12-15 16:51 UTC (permalink / raw)
  To: John Fastabend
  Cc: David Miller, eric, Netdev, xdp-newbies, pmanev, Emil Tantilov
In-Reply-To: <c677440b-46c5-e32b-5038-21a760ba738b@gmail.com>

On Fri, Dec 15, 2017 at 8:03 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 12/15/2017 07:53 AM, David Miller wrote:
>> From: Eric Leblond <eric@regit.org>
>> Date: Fri, 15 Dec 2017 11:24:46 +0100
>>
>>> Hello,
>>>
>>> When using an ixgbe card with Suricata we are using the following
>>> commands to get a symmetric hash on RSS load balancing:
>>>
>>> ./set_irq_affinity 0-15 eth3
>>> ethtool -X eth3 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 16
>>> ethtool -x eth3
>>> ethtool -n eth3
>>>
>>> Then we start Suricata.
>>>
>>> In my current experiment on XDP, I have Suricata inject the eBPF
>>> program when starting. The consequence of that when using an ixgbe card
>>> is that the load balancing gets reset and all interrupts reach
>>> the first core.
>>
>> This definitely should _not_ be a side effect of enabling XDP on a device.
>>
>
> Agreed, CC Emil and Alex we should restore these settings after the
> reconfiguration done to support a queue per core.
>
> .John

So the interrupt configuration has to get reset since we have to
assign 2 Tx queues for every Rx queue instead of the 1-1 that was
previously there. That is a natural consequence of rearranging the
queues as currently happens. The issue is the q_vectors themselves
have to be reallocated. The only way to not make that happen would be
to pre-allocate the Tx queues for XDP always.

Also just to be clear we are talking about the interrupts being reset,
not the RSS key right? I just want to make sure that is what we are
talking about.

Thanks.

- Alex
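
On the RSS key side of the question: the hkey quoted above is the 16-bit
pattern 6D:5A repeated, which is what makes the Toeplitz hash symmetric,
so both directions of a flow hash to the same value. A minimal userspace
sketch of that property follows. It assumes the standard Toeplitz hash
over IPv4 source address, destination address, source port and destination
port in that order. It is not ixgbe driver code, and all demo_ names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set input bit, XOR in the 32 key bits that
 * start at the same bit offset. Assumes key_len >= data_len + 4.
 */
static uint32_t demo_toeplitz(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t data_len)
{
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	size_t next_bit = 32;	/* index of the next key bit to shift in */
	uint32_t hash = 0;

	for (size_t i = 0; i < data_len; i++) {
		for (int b = 7; b >= 0; b--) {
			uint32_t in = 0;

			if (data[i] & (1u << b))
				hash ^= window;
			if (next_bit < key_len * 8)
				in = (key[next_bit / 8] >> (7 - next_bit % 8)) & 1u;
			window = (window << 1) | in;
			next_bit++;
		}
	}
	return hash;
}

/* Fill a key with the repeating 6D:5A pattern. */
static void demo_fill_repeating_key(uint8_t *key, size_t key_len)
{
	for (size_t i = 0; i < key_len; i++)
		key[i] = (i % 2) ? 0x5A : 0x6D;
}

/* Hash an IPv4 4-tuple; addresses and ports are given in host order. */
static uint32_t demo_rss_ipv4_tcp(const uint8_t *key, size_t key_len,
				  uint32_t saddr, uint32_t daddr,
				  uint16_t sport, uint16_t dport)
{
	uint8_t in[12] = {
		(uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
		(uint8_t)(saddr >> 8), (uint8_t)saddr,
		(uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
		(uint8_t)(daddr >> 8), (uint8_t)daddr,
		(uint8_t)(sport >> 8), (uint8_t)sport,
		(uint8_t)(dport >> 8), (uint8_t)dport,
	};

	return demo_toeplitz(key, key_len, in, sizeof(in));
}
```

The symmetry follows from the key repeating every 16 bits: swapping the
addresses moves input bits by 32 positions and swapping the ports moves
them by 16, so each set bit still selects the same 32-bit key window.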


* [RFC][PATCH v2] Add primitives for manipulating bitfields both in host- and fixed-endian.
From: Al Viro @ 2017-12-15 16:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: netdev, linux-kernel, Jakub Kicinski
In-Reply-To: <20171215053401.GH21978@ZenIV.linux.org.uk>

[Folks, please review and comment; if no objections show up, into -next it goes]

The following primitives are defined in linux/bitfield.h:

* u32 le32_get_bits(__le32 val, u32 field) extracts the contents of the
  bitfield specified by @field in little-endian 32bit object @val and
  converts it to host-endian.

* void le32p_replace_bits(__le32 *p, u32 v, u32 field) replaces
  the contents of the bitfield specified by @field in little-endian
  32bit object pointed to by @p with the value of @v.  New value is
  given in host-endian and stored as little-endian.

* __le32 le32_replace_bits(__le32 old, u32 v, u32 field) is equivalent to
  ({__le32 tmp = old; le32p_replace_bits(&tmp, v, field); tmp;})
  In other words, instead of modifying an object in memory, it takes
  the initial value and returns the modified one.

* __le32 le32_encode_bits(u32 v, u32 field) is equivalent to
  le32_replace_bits(0, v, field).  In other words, it returns a little-endian
  32bit object with the bitfield specified by @field containing the
  value of @v and all bits outside that bitfield being zero.

Such a set of helpers is defined for each of little-, big- and host-endian
types; e.g. u64_get_bits(val, field) will return the contents of the bitfield
specified by @field in host-endian 64bit object @val, etc.  Of course, for
host-endian no conversion is involved.
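
For readers who want to experiment, here is a minimal userspace sketch of
the host-endian 32bit variants. It mirrors the arithmetic of the helpers
(multiply and divide by the lowest set bit of the mask) but leaves out
endian conversion and all compile-time checks. The demo_ names are
hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Lowest set bit of the mask, i.e. 1 << (position of the field). */
static inline uint32_t demo_field_multiplier(uint32_t field)
{
	return field & -field;
}

/* The mask shifted down so that it starts at bit 0. */
static inline uint32_t demo_field_mask(uint32_t field)
{
	return field / demo_field_multiplier(field);
}

/* Extract the contents of the field from val. */
static inline uint32_t demo_u32_get_bits(uint32_t val, uint32_t field)
{
	return (val & field) / demo_field_multiplier(field);
}

/* Place v into the field; all bits outside the field are zero. */
static inline uint32_t demo_u32_encode_bits(uint32_t v, uint32_t field)
{
	return (v & demo_field_mask(field)) * demo_field_multiplier(field);
}

/* Return old with the field replaced by v. */
static inline uint32_t demo_u32_replace_bits(uint32_t old, uint32_t v,
					     uint32_t field)
{
	return (old & ~field) | demo_u32_encode_bits(v, field);
}
```

With field 0x00000ff0 (bits 4 to 11), demo_u32_get_bits(0x12345678, field)
yields 0x67 and demo_u32_replace_bits(0x12345678, 0xab, field) yields
0x12345ab8.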

Fields to access are specified as GENMASK() values - an N-bit field
starting at bit #M is encoded as GENMASK(M + N - 1, M).  Note that
bit numbers refer to endianness of the object we are working with -
e.g. GENMASK(11, 0) in __be16 refers to the second byte and the lower
4 bits of the first byte.  In __le16 it would refer to the first byte
and the lower 4 bits of the second byte, etc.
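
The byte layout can be checked by storing the mask itself in each byte
order. This is a small userspace sketch. DEMO_GENMASK16() is a
hypothetical 16bit stand-in for GENMASK(), and the store helpers are
illustrative names rather than kernel functions.

```c
#include <stdint.h>

/* 16bit stand-in for GENMASK(h, l): bits l..h set, 0 <= l <= h <= 15. */
#define DEMO_GENMASK16(h, l) \
	((uint16_t)((0xffffu >> (15 - (h))) & (0xffffu << (l))))

/* Store v as two bytes, most significant byte first. */
static void demo_store_be16(uint16_t v, uint8_t out[2])
{
	out[0] = (uint8_t)(v >> 8);
	out[1] = (uint8_t)(v & 0xff);
}

/* Store v as two bytes, least significant byte first. */
static void demo_store_le16(uint16_t v, uint8_t out[2])
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)(v >> 8);
}
```

Storing DEMO_GENMASK16(11, 0) gives the bytes 0f ff in big-endian order and
ff 0f in little-endian order, matching the description above.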

Field specification must be a constant; __builtin_constant_p() doesn't
have to be true for it, but compiler must be able to evaluate it at
build time.  If it cannot or if the value does not encode any bitfield,
the build will fail.
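
The test for "encodes a bitfield" amounts to checking that the set bits of
the mask form one contiguous run. Below is a runtime sketch of that check
in userspace. demo_is_contiguous_mask() is a hypothetical name, and the
explicit rejection of a zero mask is an assumption of this sketch rather
than something taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if field is a single contiguous run of set bits. */
static bool demo_is_contiguous_mask(uint64_t field)
{
	/* Set every bit below the field as well. */
	uint64_t filled = field | (field - 1);

	/* filled must now be of the form 2^k - 1, so filled + 1 shares no bits. */
	return field != 0 && (filled & (filled + 1)) == 0;
}
```

For example 0x0ff0 passes, while 0x0f0f fails because filling in the low
bits still leaves a hole at bits 4 to 7.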

If the value being stored in a bitfield is a constant that does not fit
into that bitfield, a warning will be generated at compile time.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
index 1030651f8309..cf2588d81148 100644
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -16,6 +16,7 @@
 #define _LINUX_BITFIELD_H

 #include <linux/build_bug.h>
+#include <asm/byteorder.h>

 /*
  * Bitfield access macros
@@ -103,4 +104,49 @@
 		(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask));	\
 	})

+extern void __compiletime_warning("value doesn't fit into mask")
+__field_overflow(void);
+extern void __compiletime_error("bad bitfield mask")
+__bad_mask(void);
+static __always_inline u64 field_multiplier(u64 field)
+{
+	if ((field | (field - 1)) & ((field | (field - 1)) + 1))
+		__bad_mask();
+	return field & -field;
+}
+static __always_inline u64 field_mask(u64 field)
+{
+	return field / field_multiplier(field);
+}
+#define ____MAKE_OP(type,base,to,from)					\
+static __always_inline __##type type##_encode_bits(base v, base field)	\
+{									\
+	if (__builtin_constant_p(v) && (v & ~field_mask(field)))	\
+		__field_overflow();					\
+	return to((v & field_mask(field)) * field_multiplier(field));	\
+}									\
+static __always_inline __##type type##_replace_bits(__##type old,	\
+					base val, base field)		\
+{									\
+	return (old & ~to(field)) | type##_encode_bits(val, field);	\
+}									\
+static __always_inline void type##p_replace_bits(__##type *p,		\
+					base val, base field)		\
+{									\
+	*p = (*p & ~to(field)) | type##_encode_bits(val, field);	\
+}									\
+static __always_inline base type##_get_bits(__##type v, base field)	\
+{									\
+	return (from(v) & field)/field_multiplier(field);		\
+}
+#define __MAKE_OP(size)							\
+	____MAKE_OP(le##size,u##size,cpu_to_le##size,le##size##_to_cpu)	\
+	____MAKE_OP(be##size,u##size,cpu_to_be##size,be##size##_to_cpu)	\
+	____MAKE_OP(u##size,u##size,,)
+__MAKE_OP(16)
+__MAKE_OP(32)
+__MAKE_OP(64)
+#undef __MAKE_OP
+#undef ____MAKE_OP
+
 #endif


* Re: [PATCH next 0/2] ipvlan: packet scrub
From: David Miller @ 2017-12-15 16:37 UTC (permalink / raw)
  To: mahesh; +Cc: netdev, edumazet, maheshb
In-Reply-To: <20171213224012.202819-1-mahesh@bandewar.net>

From: Mahesh Bandewar <mahesh@bandewar.net>
Date: Wed, 13 Dec 2017 14:40:12 -0800

> From: Mahesh Bandewar <maheshb@google.com>
> 
> While crossing namespace boundary IPvlan aggressively scrubs packets.
> This is creating problems. First thing is that scrubbing changes the 
> packet type in skb meta-data to PACKET_HOST. This causes erroneous
> packet delivery when dev_forward_skb() has already marked the packet
> type as OTHER_HOST.
> 
> On the egress side scrubbing just before calling dev_queue_xmit()
> creates another set of problems. Scrubbing removes skb->sk so the
> prio update gets missed and more seriously, socket back-pressure
> fails making TSQ not function correctly.
> 
> The first patch in the series just reverts the earlier change which
> was adding a mac-check, but that is unnecessary if packet_type that
> dev_forward_skb() has set is honored. The second patch removes two of
> the scrubs which are causing problems described above.

Series applied, thanks for following up on this.


* Re: [PATCH] ip6_gre: fix a pontential issue in ip6erspan_rcv
From: William Tu @ 2017-12-15 16:34 UTC (permalink / raw)
  To: Haishuang Yan
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Linux Kernel Network Developers, linux-kernel
In-Reply-To: <1513305998-20750-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

On Thu, Dec 14, 2017 at 6:46 PM, Haishuang Yan
<yanhaishuang@cmss.chinamobile.com> wrote:
> pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
> the right place.
>
> Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> ---

Thanks for the patch!

Acked-by: William Tu <u9012063@gmail.com>
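
The hazard being fixed is the usual one with any helper that may move a
buffer: pointers taken before the call can be stale afterwards. A loose
userspace analogy follows, assuming a helper that always relocates the
storage when it grows. struct demo_buf, demo_may_pull() and demo_parse()
are hypothetical and only stand in for the skb API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
	uint8_t *data;	/* start of the linear bytes */
	size_t len;	/* number of valid bytes at data */
};

/*
 * Hypothetical stand-in for pskb_may_pull(): make sure `need` bytes are
 * addressable at b->data. When it has to grow it always moves the storage,
 * so pointers into the old block become invalid. New bytes are zero-filled.
 * Returns 1 on success, 0 on failure.
 */
static int demo_may_pull(struct demo_buf *b, size_t need)
{
	uint8_t *fresh;

	if (need <= b->len)
		return 1;
	fresh = calloc(1, need);
	if (!fresh)
		return 0;
	memcpy(fresh, b->data, b->len);
	free(b->data);
	b->data = fresh;
	b->len = need;
	return 1;
}

/* Read the first byte of two consecutive 4-byte headers. */
static int demo_parse(struct demo_buf *b, uint8_t *first, uint8_t *second)
{
	const uint8_t *hdr;

	if (!demo_may_pull(b, 8))
		return -1;
	/* Load the header pointer only after the pull: b->data may have moved. */
	hdr = b->data;
	*first = hdr[0];
	*second = hdr[4];
	return 0;
}
```

Had hdr been assigned before demo_may_pull(), it would point into freed
memory after the call.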


* Re: [PATCH 09/20] batman-adv: include build_bug.h for BUILD_BUG_ON define
From: Sven Eckelmann @ 2017-12-15 16:32 UTC (permalink / raw)
  To: Simon Wunderlich
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20171215114320.13645-10-sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX@public.gmane.org>


On Freitag, 15. Dezember 2017 12:43:09 CET Simon Wunderlich wrote:
>  compat-include/linux/build_bug.h   | 34 ++++++++++++++++++++++++++++++++++

This backport change slipped in. Sorry, this was my fault when I initially
integrated it.

Kind regards,
Sven



* Re: [PATCH net] sock: free skb in skb_complete_tx_timestamp on error
From: David Miller @ 2017-12-15 16:31 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, richardcochran, willemb
In-Reply-To: <20171213194106.128322-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed, 13 Dec 2017 14:41:06 -0500

> From: Willem de Bruijn <willemb@google.com>
> 
> skb_complete_tx_timestamp must ingest the skb it is passed. Call
> kfree_skb if the skb cannot be enqueued.
> 
> Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
> Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
> Reported-by: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied and queued up for -stable.


* Re: [PATCH net 0/4] s390/qeth: fixes 2017-12-13
From: David Miller @ 2017-12-15 16:31 UTC (permalink / raw)
  To: jwi; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20171213175632.100561-1-jwi@linux.vnet.ibm.com>

From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Date: Wed, 13 Dec 2017 18:56:28 +0100

> some more patches for 4.15, that fix multiple issues with IP Takeover
> configuration in qeth.
> Please queue them up for stable kernels as well (4.9 and newer).

Series applied and queued up for -stable.
