Netdev List

Netdev List
 help / color / mirror / Atom feed

* (unknown), 
From: marketing @ 2017-10-03 12:43 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 2303159403401.zip --]
[-- Type: application/zip, Size: 7286 bytes --]

^ permalink raw reply

* Re: [PATCH] fsl/fman: remove of_node
From: Andrew Lunn @ 2017-10-03 13:00 UTC (permalink / raw)
  To: Madalin-cristian Bucur
  Cc: David Miller, netdev@vger.kernel.org, f.fainelli@gmail.com,
	linux-kernel@vger.kernel.org
In-Reply-To: <AM5PR0402MB26914CBD37E3F28C17562258EC720@AM5PR0402MB2691.eurprd04.prod.outlook.com>

On Tue, Oct 03, 2017 at 08:49:31AM +0000, Madalin-cristian Bucur wrote:
> > -----Original Message-----
> > From: David Miller [mailto:davem@davemloft.net]
> > Sent: Tuesday, October 03, 2017 2:05 AM
> > To: Madalin-cristian Bucur <madalin.bucur@nxp.com>
> > Subject: Re: [PATCH] fsl/fman: remove of_node
> > 
> > From: Madalin Bucur <madalin.bucur@nxp.com>
> > Date: Mon, 2 Oct 2017 13:31:37 +0300
> > 
> > > The FMan MAC driver allocates a platform device for the Ethernet
> > > driver to probe on. Setting pdev->dev.of_node with the MAC node
> > > triggers the MAC driver probing of the new platform device. While
> > > this fails quickly and does not affect the functionality of the
> > > drivers, it is incorrect and must be removed. This was added to
> > > address a report that DSA code using of_find_net_device_by_node()
> > > is unable to use the DPAA interfaces. Error message seen before
> > > this fix:
> > >
> > > fsl_mac dpaa-ethernet.0: __devm_request_mem_region(mac) failed
> > > fsl_mac: probe of dpaa-ethernet.0 failed with error -16
> > >
> > > Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
> > 
> > Is the DSA issue no longer something we need to be concerned
> > about?  If not, why?  You have to explain this.
> 
> My patch removes the of_node that was set to a device that was not an
> of_device, preventing duplicated probing of both the real of_device
> and the "fake" one created through this assignment.
> 
> I understand that the DSA issue that triggered the initial change
> was related to DSA finding the network devices using 
> of_find_net_device_by_node(), something that will not work for the
> DPAA case where the netdevice does not have an of_node. I do not know
> enough about DSA to come up with a solution for this problem now.
> Andrew, Florian, can you please comment on this?
> 
> Thanks,

Hi Madalin

I guess the real fix is to throw away the platform device. But that is
a big change.

I've not looked at the code in detail. Why is the platform device
needed?

	Andrew

^ permalink raw reply

* Re: [PATCH] netfilter: nf_tables: Release memory obtained by kasprintf
From: Pablo Neira Ayuso @ 2017-10-03 13:21 UTC (permalink / raw)
  To: Arvind Yadav
  Cc: kadlec, fw, davem, netfilter-devel, coreteam, netdev,
	linux-kernel
In-Reply-To: <385554261c080cd3fc4adc093e68366a6d3dff77.1505889128.git.arvind.yadav.cs@gmail.com>

On Wed, Sep 20, 2017 at 12:31:28PM +0530, Arvind Yadav wrote:
> Free memory region, if nf_tables_set_alloc_name is not successful.

Applied, thanks.

I have added this tag to this patch:

Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")

^ permalink raw reply

* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: Leon Romanovsky @ 2017-10-03 13:26 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, Ariel Elior
In-Reply-To: <1507020902-4952-7-git-send-email-Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3933 bytes --]

On Tue, Oct 03, 2017 at 11:54:56AM +0300, Michal Kalderon wrote:
> For iWARP unaligned MPA flow, a slowpath event of flushing an
> MPA connection that entered an unaligned state is required.
> The flush ramrod is received on the ll2 queue, and a pre-registered
> callback function is called to handle the flush event.
>
> Signed-off-by: Michal Kalderon <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ariel Elior <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c | 40 +++++++++++++++++++++++++++++--
>  include/linux/qed/qed_ll2_if.h            |  5 ++++
>  2 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> index 8eb9645..047f556 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
>  }
>
>  static int
> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
> +			struct qed_ll2_info *p_ll2_conn,
> +			union core_rx_cqe_union *p_cqe,
> +			unsigned long *p_lock_flags)
> +{
> +	struct qed_ll2_rx_queue *p_rx = &p_ll2_conn->rx_queue;
> +	struct core_rx_slow_path_cqe *sp_cqe;
> +
> +	sp_cqe = &p_cqe->rx_cqe_sp;
> +	if (sp_cqe->ramrod_cmd_id != CORE_RAMROD_RX_QUEUE_FLUSH) {
> +		DP_NOTICE(p_hwfn,
> +			  "LL2 - unexpected Rx CQE slowpath ramrod_cmd_id:%d\n",
> +			  sp_cqe->ramrod_cmd_id);
> +		return -EINVAL;
> +	}
> +
> +	if (!p_ll2_conn->cbs.slowpath_cb) {
> +		DP_NOTICE(p_hwfn,
> +			  "LL2 - received RX_QUEUE_FLUSH but no callback was provided\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);

Interesting, you are unlock the lock which was taken in upper layer.
It is not actual error, but chances to have such error are pretty high
(for example, after refactoring).

> +
> +	p_ll2_conn->cbs.slowpath_cb(p_ll2_conn->cbs.cookie,
> +				    p_ll2_conn->my_id,
> +				    le32_to_cpu(sp_cqe->opaque_data.data[0]),
> +				    le32_to_cpu(sp_cqe->opaque_data.data[1]));
> +
> +	spin_lock_irqsave(&p_rx->lock, *p_lock_flags);
> +
> +	return 0;
> +}
> +
> +static int
>  qed_ll2_rxq_handle_completion(struct qed_hwfn *p_hwfn,
>  			      struct qed_ll2_info *p_ll2_conn,
>  			      union core_rx_cqe_union *p_cqe,
> @@ -495,8 +530,8 @@ static int qed_ll2_rxq_completion(struct qed_hwfn *p_hwfn, void *cookie)
>
>  		switch (cqe->rx_cqe_sp.type) {
>  		case CORE_RX_CQE_TYPE_SLOW_PATH:
> -			DP_NOTICE(p_hwfn, "LL2 - unexpected Rx CQE slowpath\n");
> -			rc = -EINVAL;
> +			rc = qed_ll2_handle_slowpath(p_hwfn, p_ll2_conn,
> +						     cqe, &flags);
>  			break;
>  		case CORE_RX_CQE_TYPE_GSI_OFFLOAD:
>  		case CORE_RX_CQE_TYPE_REGULAR:
> @@ -1214,6 +1249,7 @@ static int qed_ll2_acquire_connection_tx(struct qed_hwfn *p_hwfn,
>  	p_ll2_info->cbs.rx_release_cb = cbs->rx_release_cb;
>  	p_ll2_info->cbs.tx_comp_cb = cbs->tx_comp_cb;
>  	p_ll2_info->cbs.tx_release_cb = cbs->tx_release_cb;
> +	p_ll2_info->cbs.slowpath_cb = cbs->slowpath_cb;
>  	p_ll2_info->cbs.cookie = cbs->cookie;
>
>  	return 0;
> diff --git a/include/linux/qed/qed_ll2_if.h b/include/linux/qed/qed_ll2_if.h
> index 95fdf02..e755954 100644
> --- a/include/linux/qed/qed_ll2_if.h
> +++ b/include/linux/qed/qed_ll2_if.h
> @@ -151,11 +151,16 @@ struct qed_ll2_comp_rx_data {
>  				     dma_addr_t first_frag_addr,
>  				     bool b_last_fragment, bool b_last_packet);
>
> +typedef
> +void (*qed_ll2_slowpath_cb)(void *cxt, u8 connection_handle,
> +			    u32 opaque_data_0, u32 opaque_data_1);
> +
>  struct qed_ll2_cbs {
>  	qed_ll2_complete_rx_packet_cb rx_comp_cb;
>  	qed_ll2_release_rx_packet_cb rx_release_cb;
>  	qed_ll2_complete_tx_packet_cb tx_comp_cb;
>  	qed_ll2_release_tx_packet_cb tx_release_cb;
> +	qed_ll2_slowpath_cb slowpath_cb;
>  	void *cookie;
>  };
>
> --
> 1.8.3.1
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2] netfilter: SYNPROXY: fix process non tcp packet bug in {ipv4,ipv6}_synproxy_hook
From: Pablo Neira Ayuso @ 2017-10-03 13:28 UTC (permalink / raw)
  To: Lin Zhang
  Cc: kadlec, fw, davem, kuznet, yoshfuji, netfilter-devel, coreteam,
	netdev
In-Reply-To: <1506767115-10051-1-git-send-email-xiaolou4617@gmail.com>

On Sat, Sep 30, 2017 at 06:25:15PM +0800, Lin Zhang wrote:
> In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet,
> but the real server maybe reply an icmp error packet related to the 
> exist tcp conntrack, so we will access wrong tcp data.
> 
> For fix it, check for the protocol field and only process tcp traffic.

Applied, thanks.

I have made minor comestic changes to patch title:

netfilter: SYNPROXY: skip non-TCP packets from {ipv4,ipv6}_synproxy_hook

for the record.

^ permalink raw reply

* Re: [PATCH v2] netfilter: SYNPROXY: fix process non tcp packet bug in {ipv4,ipv6}_synproxy_hook
From: Pablo Neira Ayuso @ 2017-10-03 13:32 UTC (permalink / raw)
  To: Lin Zhang
  Cc: kadlec, fw, davem, kuznet, yoshfuji, netfilter-devel, coreteam,
	netdev
In-Reply-To: <20171003132825.GA11182@salvia>

On Tue, Oct 03, 2017 at 03:28:25PM +0200, Pablo Neira Ayuso wrote:
> On Sat, Sep 30, 2017 at 06:25:15PM +0800, Lin Zhang wrote:
> > In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet,
> > but the real server maybe reply an icmp error packet related to the 
> > exist tcp conntrack, so we will access wrong tcp data.
> > 
> > For fix it, check for the protocol field and only process tcp traffic.
> 
> Applied, thanks.
> 
> I have made minor comestic changes to patch title:
> 
> netfilter: SYNPROXY: skip non-TCP packets from {ipv4,ipv6}_synproxy_hook
> 
> for the record.

I have to keep this back, sorry.

This has been not compiled tested.

net/ipv6/netfilter/ip6t_SYNPROXY.c: In function ‘ipv6_synproxy_hook’:
net/ipv6/netfilter/ip6t_SYNPROXY.c:351:19: error: ‘struct ipv6hdr’ has
no member named ‘protocol’
      ipv6_hdr(skb)->protocol != IPPROTO_TCP)
                   ^

^ permalink raw reply

* Re: [PATCH] rndis_host: support Novatel Verizon USB730L
From: Bjørn Mork @ 2017-10-03 14:01 UTC (permalink / raw)
  To: David Miller
  Cc: aleksander-Dvg4H30XQSRVIjRurl1/8g, oliver-GvhC2dPhHPQdnm+yROfE0A,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20171002.161750.1123671129875495210.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> writes:

> From: Aleksander Morgado <aleksander-Dvg4H30XQSRVIjRurl1/8g@public.gmane.org>
> Date: Wed, 27 Sep 2017 23:31:03 +0200
>
>> I'm not sure if binding this logic to a specific vid:pid (1410:9030)
>> would be more appropriate here, or if it's ok to just bind
>> class/subclass/protocol (as in the activesync case).  Let me know
>> what you think.
>
> I don't have enough USB Networking knowledge to make a decision here.
>
> Some things seem confusing.  For example, if we should be matching
> USB_CLASS_MISC, subclass=4, protocol=1 for RNDIS over Ethernet, then
> we why aren't we also matching USB_CLASS_MISC, subclass=4, protocol=2
> for RNDIS over wireless and instead are matching "Remote RNDIS" in
> the form of USB_CLASS_WIRELSS, subclass=1, protocol=3?
>
> I really don't understand any of this magic :-)
>
> So someone more knowledgable needs to review this.

Let me just say that I am not qualified.  But since this needs a review,
I am going to offer my view anyway.  FWIW, I don't think *anyone*
understand this magic...  I certainly don't.

We can pretty much ignore the USB-IF and any specs, since that is what
the vendors appear to do.  They provide device specific drivers for
Windows, so all they care about is that their device "works" with their
driver.

But in Linux we prefer to create drivers for device classes whenever we
can, to avoid having to add every single device by ID.  So we try to
guess future patterns based on the devices we have observed, even when
there is no clear spec.  This is what Aleksander does here. He has a
device with a 'Cls=ef(misc ) Sub=04 Prot=01' function.  This device
works with the rndis_host driver. That is all we know.

We cannot prove that a class match is correct. But it does make sense to
try it.  At least we know that this works for one device.

Adding anything else, e.g. based on the table at
http://www.usb.org/developers/defined_class/#BaseClassEFh , is a bit
more risky.  We don't know if a driver will work with *any* such device
until we've actually seen one.

This is just my opinion, and probably full of bogus assumptions as
usual.  I was sort of hoping that some expert would speak up so I didn't
have to :-)

But FWIW:

Reviewed-by: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>

Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size
From: Jesper Dangaard Brouer @ 2017-10-03 14:03 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller,
	Chonggang Li, netdev, brouer, Eric Leblond, Wangnan (F)
In-Reply-To: <20171002164129.47986-2-kraigatgoog@gmail.com>



First of all, thank you Craig for working on this.  As Alexei says, we
need to improve tools/lib/bpf/libbpf and move towards converting users
of bpf_load.c to this lib instead.

Comments inlined below.

On Mon,  2 Oct 2017 12:41:28 -0400 Craig Gallek <kraigatgoog@gmail.com> wrote:

> From: Craig Gallek <kraig@google.com>
> 
> This library previously assumed a fixed-size map options structure.
> Any new options were ignored.  In order to allow the options structure
> to grow and to support parsing older programs, this patch updates
> the maps section parsing to handle varying sizes.
> 
> Object files with maps sections smaller than expected will have the new
> fields initialized to zero.  Object files which have larger than expected
> maps sections will be rejected unless all of the unrecognized data is zero.
> 
> This change still assumes that each map definition in the maps section
> is the same size.
> 
> Signed-off-by: Craig Gallek <kraig@google.com>
> ---
>  tools/lib/bpf/libbpf.c | 54 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 4f402dcdf372..28b300868ad7 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -580,7 +580,7 @@ bpf_object__init_kversion(struct bpf_object *obj,
>  }
>  
>  static int
> -bpf_object__validate_maps(struct bpf_object *obj)
> +bpf_object__validate_maps(struct bpf_object *obj, int map_def_sz)
>  {
>  	int i;
>  
> @@ -595,9 +595,11 @@ bpf_object__validate_maps(struct bpf_object *obj)
>  		const struct bpf_map *a = &obj->maps[i - 1];
>  		const struct bpf_map *b = &obj->maps[i];
>  
> -		if (b->offset - a->offset < sizeof(struct bpf_map_def)) {
> -			pr_warning("corrupted map section in %s: map \"%s\" too small\n",
> -				   obj->path, a->name);
> +		if (b->offset - a->offset < map_def_sz) {
> +			pr_warning("corrupted map section in %s: map \"%s\" too small "
> +				   "(%zd vs %d)\n",
> +				   obj->path, a->name, b->offset - a->offset,
> +				   map_def_sz);
>  			return -EINVAL;
>  		}
>  	}
> @@ -615,7 +617,7 @@ static int compare_bpf_map(const void *_a, const void *_b)
>  static int
>  bpf_object__init_maps(struct bpf_object *obj)
>  {
> -	int i, map_idx, nr_maps = 0;
> +	int i, map_idx, map_def_sz, nr_maps = 0;
>  	Elf_Scn *scn;
>  	Elf_Data *data;
>  	Elf_Data *symbols = obj->efile.symbols;
> @@ -658,6 +660,15 @@ bpf_object__init_maps(struct bpf_object *obj)
>  	if (!nr_maps)
>  		return 0;
>  
> +	/* Assume equally sized map definitions */
> +	map_def_sz = data->d_size / nr_maps;
> +	if (!data->d_size || (data->d_size % nr_maps) != 0) {
> +		pr_warning("unable to determine map definition size "
> +			   "section %s, %d maps in %zd bytes\n",
> +			   obj->path, nr_maps, data->d_size);
> +		return -EINVAL;
> +	}
> +
>  	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
>  	if (!obj->maps) {
>  		pr_warning("alloc maps for object failed\n");
> @@ -690,7 +701,7 @@ bpf_object__init_maps(struct bpf_object *obj)
>  				      obj->efile.strtabidx,
>  				      sym.st_name);
>  		obj->maps[map_idx].offset = sym.st_value;
> -		if (sym.st_value + sizeof(struct bpf_map_def) > data->d_size) {
> +		if (sym.st_value + map_def_sz > data->d_size) {
>  			pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
>  				   obj->path, map_name);
>  			return -EINVAL;
> @@ -704,12 +715,39 @@ bpf_object__init_maps(struct bpf_object *obj)
>  		pr_debug("map %d is \"%s\"\n", map_idx,
>  			 obj->maps[map_idx].name);
>  		def = (struct bpf_map_def *)(data->d_buf + sym.st_value);
> -		obj->maps[map_idx].def = *def;
> +		/*
> +		 * If the definition of the map in the object file fits in
> +		 * bpf_map_def, copy it.  Any extra fields in our version
> +		 * of bpf_map_def will default to zero as a result of the
> +		 * calloc above.
> +		 */
> +		if (map_def_sz <= sizeof(struct bpf_map_def)) {
> +			memcpy(&obj->maps[map_idx].def, def, map_def_sz);
> +		} else {
> +			/*
> +			 * Here the map structure being read is bigger than what
> +			 * we expect, truncate if the excess bits are all zero.
> +			 * If they are not zero, reject this map as
> +			 * incompatible.
> +			 */
> +			char *b;
> +			for (b = ((char *)def) + sizeof(struct bpf_map_def);
> +			     b < ((char *)def) + map_def_sz; b++) {
> +				if (*b != 0) {
> +					pr_warning("maps section in %s: \"%s\" "
> +						   "has unrecognized, non-zero "
> +						   "options\n",
> +						   obj->path, map_name);
> +					return -EINVAL;
> +				}
> +			}
> +			obj->maps[map_idx].def = *def;

I'm not too happy/comfortable with this way of copying the memory of
"def" (the type-cased struct bpf_map_def).  I guess it works, and is
part of the C-standard(?).


> +		}
>  		map_idx++;
>  	}
>  
>  	qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
> -	return bpf_object__validate_maps(obj);
> +	return bpf_object__validate_maps(obj, map_def_sz);
>  }
>  
>  static int bpf_object__elf_collect(struct bpf_object *obj)

Besides above comment, I think the patch is correct, based on what I
did in commit 156450d9d964 ("samples/bpf: make bpf_load.c code
compatible with ELF maps section changes").

  https://git.kernel.org/torvalds/c/156450d9d964

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size
From: Jesper Dangaard Brouer @ 2017-10-03 14:11 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller,
	Chonggang Li, netdev, brouer, Eric Leblond
In-Reply-To: <20171002164129.47986-2-kraigatgoog@gmail.com>

On Mon,  2 Oct 2017 12:41:28 -0400
Craig Gallek <kraigatgoog@gmail.com> wrote:

> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 4f402dcdf372..28b300868ad7 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -580,7 +580,7 @@ bpf_object__init_kversion(struct bpf_object *obj,
>  }
>  
>  static int
> -bpf_object__validate_maps(struct bpf_object *obj)
> +bpf_object__validate_maps(struct bpf_object *obj, int map_def_sz)
>  {
>  	int i;
>  
> @@ -595,9 +595,11 @@ bpf_object__validate_maps(struct bpf_object *obj)
>  		const struct bpf_map *a = &obj->maps[i - 1];
>  		const struct bpf_map *b = &obj->maps[i];
>  
> -		if (b->offset - a->offset < sizeof(struct bpf_map_def)) {
> -			pr_warning("corrupted map section in %s: map \"%s\" too small\n",
> -				   obj->path, a->name);
> +		if (b->offset - a->offset < map_def_sz) {
> +			pr_warning("corrupted map section in %s: map \"%s\" too small "
> +				   "(%zd vs %d)\n",
> +				   obj->path, a->name, b->offset - a->offset,
> +				   map_def_sz);
>  			return -EINVAL;

Hmm... one more comment.  You have just coded handling of ELF
map_def_sz which are smaller in a safe manor, but here this case will
get rejected (in bpf_object__validate_maps).  That cannot be the right
intend?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] tools: bpftool: add documentation
From: Daniel Borkmann @ 2017-10-03 14:15 UTC (permalink / raw)
  To: Jakub Kicinski, Alexei Starovoitov; +Cc: netdev, oss-drivers, David Beckett
In-Reply-To: <20171002183509.76b2cc65@cakuba>

On 10/03/2017 03:35 AM, Jakub Kicinski wrote:
> On Mon, 2 Oct 2017 17:55:02 -0700, Alexei Starovoitov wrote:
>>> +EXAMPLES
>>> +========
>>> +**# bpftool prog show**
>>> +::
>>> +
>>> +  10: xdp  name:some_prog  tag 00:5a:3d:21:23:62:0c:8b
>>
>> could you please remove ':' in the output to match what
>> show_fdinfo and kallsyms do ?

Yep, iproute2 as well, so would be better indeed.

> Ack.
>
>>> +	loaded_at:2024.771  uid:0
>>
>> may be translate that to something human readable?
>
> Oh yes, the code will print a proper date/time, I forgot to
> regenerate the doc :S
>
>>> +	xlated:528B  jited:370B  memlock:4096B  map_ids:10
>>> +
>>> +|
>>> +| **# bpftool prog dump xlated id 10 file /tmp/t**
>>> +| **# ls -l /tmp/t**
>>> +|   -rw------- 1 root root 560 Jul 22 01:42 /tmp/t
>>> +
>>> +|
>>> +| **# mount -t bpf none /sys/fs/bpf/**
>>> +| **# bpftool prog pin id 10 /sys/fs/bpf/prog**
>>> +| **# bpftool prog dum jited pinned /sys/fs/bpf/prog**
>>> +
>>> +::
>>> +
>>> +    push   %rbp
>>> +    mov    %rsp,%rbp
>>> +    sub    $0x228,%rsp
>>> +    sub    $0x28,%rbp
>>> +    mov    %rbx,0x0(%rbp)
>>
>> imo too many steps to dump disasm output.
>> Can it print it if we just say:
>> bpftool prog dump jited id 10
>> and
>> dump xlated
>
> Yes those will work.  This example kind of shows pinning and dumping at
> the some time.  Perhaps that's ill advised.
>
>> will pretty print them as verifier output as well?
>
> We tried to use LLVM as a library for this but the interface is
> painfully unstable and it's a heavy dependency.  The current thinking
> is to try to put the instruction printing code in some higher level
> library, but I would rather leave that as a follow up.
>
>> All that can be changed later. Thanks for the doc.

Fine with that as well, so:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net] net: rtnetlink: fix info leak in RTM_GETSTATS call
From: Roopa Prabhu @ 2017-10-03 14:18 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev@vger.kernel.org, keescook, Dmitry Vyukov, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko, davem@davemloft.net,
	Eric Dumazet
In-Reply-To: <1507026048-13734-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, Oct 3, 2017 at 3:20 AM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> When RTM_GETSTATS was added the fields of its header struct were not all
> initialized when returning the result thus leaking 4 bytes of information
> to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
> to Alexander Potapenko for the detailed report and bisection.
>
> Reported-by: Alexander Potapenko <glider@google.com>
> Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>

Thanks Nikolay!.

^ permalink raw reply

* Re: [net-next V3 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation
From: Jesper Dangaard Brouer @ 2017-10-03 14:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Andy Gospodarek, brouer
In-Reply-To: <20171003085843.14d3491e@redhat.com>

On Tue, 3 Oct 2017 08:58:43 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> > But that prog can do cpumap redirect again?
> > sort-of recursive redirect? Is it really useful?
> > May be call into __netif_receive_skb_core() directly?
> > not sure.  
> 
> I like the idea of calling  __netif_receive_skb_core() directly.  I'll
> send a V4 (after running my different benchmarks).

Using __netif_receive_skb_core() was straight forward/easy.

But I realized I had forgotten about Generic-XDP, which I also need to
code up.  And with Generic-XDP we cannot invoke netif_receive_skb(),
because it would recursively invoke itself (which you actually point out
above, thx).  I'll send a V4 out tomorrow.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [net-next V3 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation
From: Daniel Borkmann @ 2017-10-03 14:25 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Alexei Starovoitov
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Andy Gospodarek
In-Reply-To: <20171003085843.14d3491e@redhat.com>

On 10/03/2017 08:58 AM, Jesper Dangaard Brouer wrote:
[...]
>> Or you're calling netif_receive_skb() to be able to call
>> generic XDP on that cpu again ?
>
> That should not (currently) be possible. AFAIK we (Daniel) choose to
> not allow Native and Generic XDP to be loaded on the same net_device.
> (With the same ABI argument as here)

Correct, it's either native or generic, but not both.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size
From: Daniel Borkmann @ 2017-10-03 14:39 UTC (permalink / raw)
  To: Alexei Starovoitov, Craig Gallek, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev
In-Reply-To: <5082193f-0b59-bc40-290f-4ef3709a1d26@fb.com>

On 10/03/2017 01:07 AM, Alexei Starovoitov wrote:
> On 10/2/17 9:41 AM, Craig Gallek wrote:
>> +    /* Assume equally sized map definitions */
>> +    map_def_sz = data->d_size / nr_maps;
>> +    if (!data->d_size || (data->d_size % nr_maps) != 0) {
>> +        pr_warning("unable to determine map definition size "
>> +               "section %s, %d maps in %zd bytes\n",
>> +               obj->path, nr_maps, data->d_size);
>> +        return -EINVAL;
>> +    }
>
> this approach is not as flexible as done by samples/bpf/bpf_load.c
> where it looks at every map independently by walking symtab,
> but I guess it's ok.

Regarding different map spec structs in a single prog: unless
we have a good use case why we would need it (and I'm not aware
of anything in particular), I would just go with a fixed size.
I did kind of similar sanity checks in bpf_fetch_maps_end() in
iproute2 loader as well.

^ permalink raw reply

* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Vivien Didelot @ 2017-10-03 14:57 UTC (permalink / raw)
  To: Andrew Lunn, Toshiaki Makita; +Cc: David Miller, netdev
In-Reply-To: <20171003121636.GB13548@lunn.ch>

Andrew Lunn <andrew@lunn.ch> writes:

> Now, i just noticed the switchdev call above. I don't think the DSA
> layer implements SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING. It probably
> should. So what is it supposed to do with this VLAN when filtering is
> disabled?

The DSA layer does implement SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING.
Its interpretation is to enable 802.1Q mode on targeted switch ports.
(hoping this is the correct thing to do.)

^ permalink raw reply

* RE: [PATCH 3/7] crypto:gf128mul: The x8_ble multiplication functions
From: David Laight @ 2017-10-03 14:58 UTC (permalink / raw)
  To: 'Harsh Jain', herbert@gondor.apana.org.au,
	linux-crypto@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <3e443f7a245229a2752fcf21dfed10998847e345.1507010612.git.harsh@chelsio.com>

From: Harsh Jain
> Sent: 03 October 2017 07:46
> It multiply GF(2^128) elements in the ble format.
> It will be used by chelsio driver to fasten gf multiplication.
                                       ^ speed up ??

	David

^ permalink raw reply

* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Toshiaki Makita @ 2017-10-03 15:03 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Toshiaki Makita, David Miller, Vivien Didelot, netdev
In-Reply-To: <20171003121636.GB13548@lunn.ch>

On 17/10/03 (火) 21:16, Andrew Lunn wrote:
> On Tue, Oct 03, 2017 at 12:29:56PM +0900, Toshiaki Makita wrote:
>> On 2017/10/03 9:55, Andrew Lunn wrote:
>>> With CONFIG_BRIDGE_VLAN_FILTERING enabled, but the feature not enabled
>>> via /sys/class/net/brX/bridge/vlan_filtering, mdb offloaded to the
>>> kernel have the wrong VID.
>>>
>>> When an interface is added to the bridge, switchdev is first used to
>>> notify the hardware that a port has joined a bridge. This is
>>> immediately followed by the default_pvid, 1, being added to the
>>> interface via another switchdev call.
>>>
>>> The bridge will then perform IGMP snooping, and offload an mdb entries
>>> to the switch as needed. With vlan filtering disabled, the vid is left
>>> as 0. This causes the switch to put the static mdb into the wrong
>>> vlan, and so frames are not forwarded by the mdb entry.
>>>
>>> If vlan filtering is disable, use the default_pvid, not 0.
>>>
>>> Fixes: f1fecb1d10ec ("bridge: Reflect MDB entries to hardware")
>>> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
>>> ---
>>>  net/bridge/br_vlan.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>>> index 233a30040c91..aa3589891797 100644
>>> --- a/net/bridge/br_vlan.c
>>> +++ b/net/bridge/br_vlan.c
>>> @@ -492,6 +492,7 @@ bool br_allowed_ingress(const struct net_bridge *br,
>>>  	 */
>>>  	if (!br->vlan_enabled) {
>>>  		BR_INPUT_SKB_CB(skb)->vlan_filtered = false;
>>> +		*vid = br_get_pvid(vg);
>>>  		return true;
>>>  	}
>>>
>>
>> This does not look correct.
>> This will update fdb with vid which is not 0.
>> Pvid can be different between each port even when vlan_filtering is
>> disabled so unicast forwarding (fdb learning) will break.
>> Also, fdb is visible to userspace so this can break userspace which
>> expects fdb entries with 0 as well.
>>
>> Why does the switch driver use pvid while vlan_filtering is disabled?
>
> Hi Toshiaki
>
> We get a vlan added to the port. I think it comes from a combination
> of:
>
>
> int br_vlan_init(struct net_bridge *br)
> {
>         struct net_bridge_vlan_group *vg;
>         int ret = -ENOMEM;
>
>         vg = kzalloc(sizeof(*vg), GFP_KERNEL);
>         if (!vg)
>                 goto out;
>         ret = rhashtable_init(&vg->vlan_hash, &br_vlan_rht_params);
>         if (ret)
>                 goto err_rhtbl;
>         ret = vlan_tunnel_init(vg);
>         if (ret)
>                 goto err_tunnel_init;
>         INIT_LIST_HEAD(&vg->vlan_list);
>         br->vlan_proto = htons(ETH_P_8021Q);
>         br->default_pvid = 1;
>
> and
>
> int nbp_vlan_init(struct net_bridge_port *p)
> {
>         struct switchdev_attr attr = {
>                 .orig_dev = p->br->dev,
>                 .id = SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
>                 .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
>                 .u.vlan_filtering = p->br->vlan_enabled,
>         };
>         struct net_bridge_vlan_group *vg;
>         int ret = -ENOMEM;
>
>         vg = kzalloc(sizeof(struct net_bridge_vlan_group), GFP_KERNEL);
>         if (!vg)
>                 goto out;
>
>         ret = switchdev_port_attr_set(p->dev, &attr);
>         if (ret && ret != -EOPNOTSUPP)
>                 goto err_vlan_enabled;
>
>         ret = rhashtable_init(&vg->vlan_hash, &br_vlan_rht_params);
>         if (ret)
>                 goto err_rhtbl;
>         ret = vlan_tunnel_init(vg);
>         if (ret)
>                 goto err_tunnel_init;
>         INIT_LIST_HEAD(&vg->vlan_list);
>         rcu_assign_pointer(p->vlgrp, vg);
>         if (p->br->default_pvid) {
>                 ret = nbp_vlan_add(p, p->br->default_pvid,
>                                    BRIDGE_VLAN_INFO_PVID |
>                                    BRIDGE_VLAN_INFO_UNTAGGED);
>
> Now, i just noticed the switchdev call above. I don't think the DSA
> layer implements SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING. It probably
> should. So what is it supposed to do with this VLAN when filtering is
> disabled?

The vlan will be effective only when vlan_filtering is enabled.
When vlan_filtering is disabled, vlan information is still kept in the 
bridge and gets effective later when vlan_filtering becomes enable.

Toshiaki Makita

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Dmitry Vyukov @ 2017-10-03 15:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mark Rutland, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <CANn89i+zQG=rjHRqzsvPzjg5tqW43Lcz-BJ9spLascP9Nt5z8Q@mail.gmail.com>

On Mon, Oct 2, 2017 at 4:42 PM, 'Eric Dumazet' via syzkaller
<syzkaller@googlegroups.com> wrote:
> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> Hi Eric,
>>
>> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>>> > skb_copy_and_csum_bits().
>>
>>> > kernel BUG at net/core/skbuff.c:2626!
>>
>>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>>
>>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>>> on loopback device, below minimum size of ipv4 MTU.
>>
>>> I tried to track it in August [1], but it seems hard to find all the
>>> issues with this.
>>>
>>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>>> Author: Eric Dumazet <edumazet@google.com>
>>> Date:   Wed Aug 16 11:09:12 2017 -0700
>>>
>>>     ipv4: better IP_MAX_MTU enforcement
>>>
>>>     While working on yet another syzkaller report, I found
>>>     that our IP_MAX_MTU enforcements were not properly done.
>>>
>>>     gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>>     final result can be bigger than IP_MAX_MTU :/
>>>
>>>     This is a problem because device mtu can be changed on other cpus or
>>>     threads.
>>>
>>>     While this patch does not fix the issue I am working on, it is
>>>     probably worth addressing it.
>>
>> Just to check I've understood correctly, are you suggesting that the
>> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
>> doesn't seem to exist today)?
>
> We have plenty of places this is checked.
>
> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
>
> Problem is : these checks are not fool proof yet.
>
> ( Only the admin was supposed to play these games )
>
>>
>> Otherwise, I do spot another potential issue. The writer side (e.g. most
>> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
>> fallback) doesn't use WRITE_ONCE().
>
> It does not matter how many strange values can be observed by the reader :
> We must be fool proof anyway from reader point of view, so the
> WRITE_ONCE() is not strictly needed.


Note if writer stores some temporal garbage there (which C language
perfectly allows), it does not matter what we do on reader side --
reader won't get correct data anyway. Say mtu changes from 1000 to
2000, but writer temporary stores 1 there, reader can observe 1 while
it must not. Synchronization is always a game of two.

^ permalink raw reply

* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Andrew Lunn @ 2017-10-03 15:30 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Toshiaki Makita, David Miller, Vivien Didelot, netdev
In-Reply-To: <ad0d7686-298b-02c7-d8f8-b9363f4630f3@gmail.com>

> The vlan will be effective only when vlan_filtering is enabled.
> When vlan_filtering is disabled, vlan information is still kept in the
> bridge and gets effective later when vlan_filtering becomes enable.

O.K, so things are starting to get clearer.

So when vlan filtering is disabled, the hardware should just ignore
the requests to add the vlan to the hardware?

When vlan_filtering is enabled, are all the vlans in the software
bridge again offloaded? Or do we need to remember all the vlans which
we ignored while vlan filtering was disabled? The average switch has
nowhere to store these disabled vlans. It can only store active vlans.

      Andrew

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-03 15:38 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Mark Rutland, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <CACT4Y+Yf86hS_3u=qe0ZL208GmrfF6bp50kYcL+3D9QFBh=LZA@mail.gmail.com>

On Tue, Oct 3, 2017 at 8:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Oct 2, 2017 at 4:42 PM, 'Eric Dumazet' via syzkaller
> <syzkaller@googlegroups.com> wrote:
>> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> Hi Eric,
>>>
>>> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>>>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>>>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>>>> > skb_copy_and_csum_bits().
>>>
>>>> > kernel BUG at net/core/skbuff.c:2626!
>>>
>>>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>>>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>>>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>>>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>>>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>>>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>>>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>>>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>>>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>>>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>>>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>>>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>>>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>>>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>>>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>>>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>>>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>>>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>>>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>>>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>>>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>>>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>>>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>>>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>>>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>>>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>>>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>>>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>>>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>>>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>>>
>>>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>>>> on loopback device, below minimum size of ipv4 MTU.
>>>
>>>> I tried to track it in August [1], but it seems hard to find all the
>>>> issues with this.
>>>>
>>>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>>>> Author: Eric Dumazet <edumazet@google.com>
>>>> Date:   Wed Aug 16 11:09:12 2017 -0700
>>>>
>>>>     ipv4: better IP_MAX_MTU enforcement
>>>>
>>>>     While working on yet another syzkaller report, I found
>>>>     that our IP_MAX_MTU enforcements were not properly done.
>>>>
>>>>     gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>>>     final result can be bigger than IP_MAX_MTU :/
>>>>
>>>>     This is a problem because device mtu can be changed on other cpus or
>>>>     threads.
>>>>
>>>>     While this patch does not fix the issue I am working on, it is
>>>>     probably worth addressing it.
>>>
>>> Just to check I've understood correctly, are you suggesting that the
>>> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
>>> doesn't seem to exist today)?
>>
>> We have plenty of places this is checked.
>>
>> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
>>
>> Problem is : these checks are not fool proof yet.
>>
>> ( Only the admin was supposed to play these games )
>>
>>>
>>> Otherwise, I do spot another potential issue. The writer side (e.g. most
>>> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
>>> fallback) doesn't use WRITE_ONCE().
>>
>> It does not matter how many strange values can be observed by the reader :
>> We must be fool proof anyway from reader point of view, so the
>> WRITE_ONCE() is not strictly needed.
>
>
> Note if writer stores some temporal garbage there (which C language
> perfectly allows), it does not matter what we do on reader side --
> reader won't get correct data anyway. Say mtu changes from 1000 to
> 2000, but writer temporary stores 1 there, reader can observe 1 while
> it must not. Synchronization is always a game of two.

Since we have no sync here, a reader _must_ cope with any MTU value.

We need to care of any value, so we do not care how dummy writers can be.

Sure, a WRITE_ONCE() will help avoiding some strange values being written,
 but since we _allow_ writers to write such strange values,
there is really no point pretending to be safe here.

Adding a WRITE_ONCE() will not fix the bug.

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] tools: bpftool: add documentation
From: David Ahern @ 2017-10-03 15:39 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski
  Cc: netdev, daniel, oss-drivers, David Beckett
In-Reply-To: <20171003042906.24mnbsfbs3bkp2wy@ast-mbp>

On 10/2/17 9:29 PM, Alexei Starovoitov wrote:
> On Mon, Oct 02, 2017 at 06:35:09PM -0700, Jakub Kicinski wrote:
>>> will pretty print them as verifier output as well?
>>
>> We tried to use LLVM as a library for this but the interface is
>> painfully unstable and it's a heavy dependency.  The current thinking
>> is to try to put the instruction printing code in some higher level
>> library, but I would rather leave that as a follow up.
> 
> follow up, of course.
> Not depending on llvm is must have for this tool.
> I think we need tiny and simple tools first.
> Since you're using gpl+bsd license for this tool I think
> it would be fine to copy-paste verifier's pretty print code into it.
> 

I have done that including integrating it into bpf-tool.

^ permalink raw reply

* [PATCH 0/5] VSOCK: add sock_diag interface
From: Stefan Hajnoczi @ 2017-10-03 15:39 UTC (permalink / raw)
  To: netdev; +Cc: Jorgen Hansen, Dexuan Cui, Stefan Hajnoczi

There is currently no way for userspace to query open AF_VSOCK sockets.  This
means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets.

This patch series adds the netlink sock_diag interface for AF_VSOCK.  Userspace
programs sent a DUMP request including an sk_state bitmap to filter sockets
based on their state (connected, listening, etc).  The vsock_diag.ko module
replies with information about matching sockets.  This userspace ABI is defined
in <linux/vm_sockets_diag.h>.

The final patch adds a test suite that exercises the basic cases.

Jorgen and Dexuan: I have only tested the virtio transport but this should also
work for VMCI and Hyper-V.  Please give it a shot if you have time.

Stefan Hajnoczi (5):
  VSOCK: export socket tables for sock_diag interface
  VSOCK: export __vsock_in_bound/connected_table()
  VSOCK: use TCP state constants for sk_state
  VSOCK: add sock_diag interface
  VSOCK: add tools/vsock/vsock_diag_test

 MAINTAINERS                                  |   3 +
 net/vmw_vsock/Makefile                       |   3 +
 tools/vsock/Makefile                         |   9 +
 include/net/af_vsock.h                       |  10 +-
 include/uapi/linux/vm_sockets_diag.h         |  33 ++
 tools/vsock/control.h                        |  13 +
 tools/vsock/timeout.h                        |  14 +
 net/vmw_vsock/af_vsock.c                     |  62 ++-
 net/vmw_vsock/diag.c                         | 186 ++++++++
 net/vmw_vsock/virtio_transport.c             |   2 +-
 net/vmw_vsock/virtio_transport_common.c      |  22 +-
 net/vmw_vsock/vmci_transport.c               |  34 +-
 net/vmw_vsock/vmci_transport_notify.c        |   2 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c |   2 +-
 tools/vsock/control.c                        | 219 +++++++++
 tools/vsock/timeout.c                        |  64 +++
 tools/vsock/vsock_diag_test.c                | 681 +++++++++++++++++++++++++++
 net/vmw_vsock/Kconfig                        |  10 +
 tools/vsock/.gitignore                       |   2 +
 19 files changed, 1312 insertions(+), 59 deletions(-)
 create mode 100644 tools/vsock/Makefile
 create mode 100644 include/uapi/linux/vm_sockets_diag.h
 create mode 100644 tools/vsock/control.h
 create mode 100644 tools/vsock/timeout.h
 create mode 100644 net/vmw_vsock/diag.c
 create mode 100644 tools/vsock/control.c
 create mode 100644 tools/vsock/timeout.c
 create mode 100644 tools/vsock/vsock_diag_test.c
 create mode 100644 tools/vsock/.gitignore

-- 
2.13.6

^ permalink raw reply

* [PATCH 1/5] VSOCK: export socket tables for sock_diag interface
From: Stefan Hajnoczi @ 2017-10-03 15:39 UTC (permalink / raw)
  To: netdev; +Cc: Jorgen Hansen, Dexuan Cui, Stefan Hajnoczi
In-Reply-To: <20171003153943.23159-1-stefanha@redhat.com>

The socket table symbols need to be exported from vsock.ko so that the
vsock_diag.ko module will be able to traverse sockets.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/net/af_vsock.h   |  5 +++++
 net/vmw_vsock/af_vsock.c | 10 ++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f9fb566e75cf..30cba806e344 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -27,6 +27,11 @@
 
 #define LAST_RESERVED_PORT 1023
 
+#define VSOCK_HASH_SIZE         251
+extern struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+extern struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+extern spinlock_t vsock_table_lock;
+
 #define vsock_sk(__sk)    ((struct vsock_sock *)__sk)
 #define sk_vsock(__vsk)   (&(__vsk)->sk)
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index dfc8c51e4d74..9afe4da8c67d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -153,7 +153,6 @@ EXPORT_SYMBOL_GPL(vm_sockets_get_local_cid);
  * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.  The hash function
  * mods with VSOCK_HASH_SIZE to ensure this.
  */
-#define VSOCK_HASH_SIZE         251
 #define MAX_PORT_RETRIES        24
 
 #define VSOCK_HASH(addr)        ((addr)->svm_port % VSOCK_HASH_SIZE)
@@ -168,9 +167,12 @@ EXPORT_SYMBOL_GPL(vm_sockets_get_local_cid);
 #define vsock_connected_sockets_vsk(vsk)				\
 	vsock_connected_sockets(&(vsk)->remote_addr, &(vsk)->local_addr)
 
-static struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
-static struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
-static DEFINE_SPINLOCK(vsock_table_lock);
+struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+EXPORT_SYMBOL_GPL(vsock_bind_table);
+struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+EXPORT_SYMBOL_GPL(vsock_connected_table);
+DEFINE_SPINLOCK(vsock_table_lock);
+EXPORT_SYMBOL_GPL(vsock_table_lock);
 
 /* Autobind this socket to the local address if necessary. */
 static int vsock_auto_bind(struct vsock_sock *vsk)
-- 
2.13.6

^ permalink raw reply related

* [PATCH 2/5] VSOCK: export __vsock_in_bound/connected_table()
From: Stefan Hajnoczi @ 2017-10-03 15:39 UTC (permalink / raw)
  To: netdev; +Cc: Jorgen Hansen, Dexuan Cui, Stefan Hajnoczi
In-Reply-To: <20171003153943.23159-1-stefanha@redhat.com>

The vsock_diag.ko module will need to check socket table membership.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/net/af_vsock.h   | 2 ++
 net/vmw_vsock/af_vsock.c | 6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 30cba806e344..88149d580975 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -187,6 +187,8 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
 void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
+bool __vsock_in_bound_table(struct vsock_sock *vsk);
+bool __vsock_in_connected_table(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 					 struct sockaddr_vm *dst);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9afe4da8c67d..f2d0fb593908 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -250,15 +250,17 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
 	return NULL;
 }
 
-static bool __vsock_in_bound_table(struct vsock_sock *vsk)
+bool __vsock_in_bound_table(struct vsock_sock *vsk)
 {
 	return !list_empty(&vsk->bound_table);
 }
+EXPORT_SYMBOL_GPL(__vsock_in_bound_table);
 
-static bool __vsock_in_connected_table(struct vsock_sock *vsk)
+bool __vsock_in_connected_table(struct vsock_sock *vsk)
 {
 	return !list_empty(&vsk->connected_table);
 }
+EXPORT_SYMBOL_GPL(__vsock_in_connected_table);
 
 static void vsock_insert_unbound(struct vsock_sock *vsk)
 {
-- 
2.13.6

^ permalink raw reply related

* [PATCH 3/5] VSOCK: use TCP state constants for sk_state
From: Stefan Hajnoczi @ 2017-10-03 15:39 UTC (permalink / raw)
  To: netdev; +Cc: Jorgen Hansen, Dexuan Cui, Stefan Hajnoczi
In-Reply-To: <20171003153943.23159-1-stefanha@redhat.com>

There are two state fields: socket->state and sock->sk_state.  The
socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
sock->sk_state typically uses values that match TCP state constants
(TCP_CLOSE, TCP_ESTABLISHED).  AF_VSOCK does not follow this convention
and instead uses SS_* constants for both fields.

The sk_state field will be exposed to userspace through the vsock_diag
interface for ss(8), netstat(8), and other programs.

This patch switches sk_state to TCP state constants so that the meaning
of this field is consistent with other address families.  Not just
AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.

The following mapping was used to convert the code:

  SS_FREE -> TCP_CLOSE
  SS_UNCONNECTED -> TCP_CLOSE
  SS_CONNECTING -> TCP_SYN_SENT
  SS_CONNECTED -> TCP_ESTABLISHED
  SS_DISCONNECTING -> TCP_CLOSING
  VSOCK_SS_LISTEN -> TCP_LISTEN

In __vsock_create() the sk_state initialization was dropped because
sock_init_data() already initializes sk_state to TCP_CLOSE.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/net/af_vsock.h                       |  3 --
 net/vmw_vsock/af_vsock.c                     | 46 ++++++++++++++++------------
 net/vmw_vsock/virtio_transport.c             |  2 +-
 net/vmw_vsock/virtio_transport_common.c      | 22 ++++++-------
 net/vmw_vsock/vmci_transport.c               | 34 ++++++++++----------
 net/vmw_vsock/vmci_transport_notify.c        |  2 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c |  2 +-
 7 files changed, 58 insertions(+), 53 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 88149d580975..eaf45df90d97 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -22,9 +22,6 @@
 
 #include "vsock_addr.h"
 
-/* vsock-specific sock->sk_state constants */
-#define VSOCK_SS_LISTEN 255
-
 #define LAST_RESERVED_PORT 1023
 
 #define VSOCK_HASH_SIZE         251
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f2d0fb593908..9b76953aeac6 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -36,7 +36,7 @@
  * not support simultaneous connects (two "client" sockets connecting).
  *
  * - "Server" sockets are referred to as listener sockets throughout this
- * implementation because they are in the VSOCK_SS_LISTEN state.  When a
+ * implementation because they are in the TCP_LISTEN state.  When a
  * connection request is received (the second kind of socket mentioned above),
  * we create a new socket and refer to it as a pending socket.  These pending
  * sockets are placed on the pending connection list of the listener socket.
@@ -82,6 +82,15 @@
  * argument, we must ensure the reference count is increased to ensure the
  * socket isn't freed before the function is run; the deferred function will
  * then drop the reference.
+ *
+ * - sk->sk_state uses the TCP state constants because they are widely used by
+ * other address families and exposed to userspace tools like ss(8):
+ *
+ *   TCP_CLOSE - unconnected
+ *   TCP_SYN_SENT - connecting
+ *   TCP_ESTABLISHED - connected
+ *   TCP_CLOSING - disconnecting
+ *   TCP_LISTEN - listening
  */
 
 #include <linux/types.h>
@@ -489,7 +498,7 @@ void vsock_pending_work(struct work_struct *work)
 	if (vsock_in_connected_table(vsk))
 		vsock_remove_connected(vsk);
 
-	sk->sk_state = SS_FREE;
+	sk->sk_state = TCP_CLOSE;
 
 out:
 	release_sock(sk);
@@ -629,7 +638,6 @@ struct sock *__vsock_create(struct net *net,
 
 	sk->sk_destruct = vsock_sk_destruct;
 	sk->sk_backlog_rcv = vsock_queue_rcv_skb;
-	sk->sk_state = 0;
 	sock_reset_flag(sk, SOCK_DONE);
 
 	INIT_LIST_HEAD(&vsk->bound_table);
@@ -903,7 +911,7 @@ static unsigned int vsock_poll(struct file *file, struct socket *sock,
 		/* Listening sockets that have connections in their accept
 		 * queue can be read.
 		 */
-		if (sk->sk_state == VSOCK_SS_LISTEN
+		if (sk->sk_state == TCP_LISTEN
 		    && !vsock_is_accept_queue_empty(sk))
 			mask |= POLLIN | POLLRDNORM;
 
@@ -932,7 +940,7 @@ static unsigned int vsock_poll(struct file *file, struct socket *sock,
 		}
 
 		/* Connected sockets that can produce data can be written. */
-		if (sk->sk_state == SS_CONNECTED) {
+		if (sk->sk_state == TCP_ESTABLISHED) {
 			if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
 				bool space_avail_now = false;
 				int ret = transport->notify_poll_out(
@@ -954,7 +962,7 @@ static unsigned int vsock_poll(struct file *file, struct socket *sock,
 		 * POLLOUT|POLLWRNORM when peer is closed and nothing to read,
 		 * but local send is not shutdown.
 		 */
-		if (sk->sk_state == SS_UNCONNECTED) {
+		if (sk->sk_state == TCP_CLOSE) {
 			if (!(sk->sk_shutdown & SEND_SHUTDOWN))
 				mask |= POLLOUT | POLLWRNORM;
 
@@ -1124,9 +1132,9 @@ static void vsock_connect_timeout(struct work_struct *work)
 	sk = sk_vsock(vsk);
 
 	lock_sock(sk);
-	if (sk->sk_state == SS_CONNECTING &&
+	if (sk->sk_state == TCP_SYN_SENT &&
 	    (sk->sk_shutdown != SHUTDOWN_MASK)) {
-		sk->sk_state = SS_UNCONNECTED;
+		sk->sk_state = TCP_CLOSE;
 		sk->sk_err = ETIMEDOUT;
 		sk->sk_error_report(sk);
 		cancel = 1;
@@ -1172,7 +1180,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 		err = -EALREADY;
 		break;
 	default:
-		if ((sk->sk_state == VSOCK_SS_LISTEN) ||
+		if ((sk->sk_state == TCP_LISTEN) ||
 		    vsock_addr_cast(addr, addr_len, &remote_addr) != 0) {
 			err = -EINVAL;
 			goto out;
@@ -1195,7 +1203,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 		if (err)
 			goto out;
 
-		sk->sk_state = SS_CONNECTING;
+		sk->sk_state = TCP_SYN_SENT;
 
 		err = transport->connect(vsk);
 		if (err < 0)
@@ -1215,7 +1223,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 	timeout = vsk->connect_timeout;
 	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
-	while (sk->sk_state != SS_CONNECTED && sk->sk_err == 0) {
+	while (sk->sk_state != TCP_ESTABLISHED && sk->sk_err == 0) {
 		if (flags & O_NONBLOCK) {
 			/* If we're not going to block, we schedule a timeout
 			 * function to generate a timeout on the connection
@@ -1238,13 +1246,13 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 
 		if (signal_pending(current)) {
 			err = sock_intr_errno(timeout);
-			sk->sk_state = SS_UNCONNECTED;
+			sk->sk_state = TCP_CLOSE;
 			sock->state = SS_UNCONNECTED;
 			vsock_transport_cancel_pkt(vsk);
 			goto out_wait;
 		} else if (timeout == 0) {
 			err = -ETIMEDOUT;
-			sk->sk_state = SS_UNCONNECTED;
+			sk->sk_state = TCP_CLOSE;
 			sock->state = SS_UNCONNECTED;
 			vsock_transport_cancel_pkt(vsk);
 			goto out_wait;
@@ -1255,7 +1263,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 
 	if (sk->sk_err) {
 		err = -sk->sk_err;
-		sk->sk_state = SS_UNCONNECTED;
+		sk->sk_state = TCP_CLOSE;
 		sock->state = SS_UNCONNECTED;
 	} else {
 		err = 0;
@@ -1288,7 +1296,7 @@ static int vsock_accept(struct socket *sock, struct socket *newsock, int flags,
 		goto out;
 	}
 
-	if (listener->sk_state != VSOCK_SS_LISTEN) {
+	if (listener->sk_state != TCP_LISTEN) {
 		err = -EINVAL;
 		goto out;
 	}
@@ -1378,7 +1386,7 @@ static int vsock_listen(struct socket *sock, int backlog)
 	}
 
 	sk->sk_max_ack_backlog = backlog;
-	sk->sk_state = VSOCK_SS_LISTEN;
+	sk->sk_state = TCP_LISTEN;
 
 	err = 0;
 
@@ -1558,7 +1566,7 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
 	/* Callers should not provide a destination with stream sockets. */
 	if (msg->msg_namelen) {
-		err = sk->sk_state == SS_CONNECTED ? -EISCONN : -EOPNOTSUPP;
+		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
 		goto out;
 	}
 
@@ -1569,7 +1577,7 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out;
 	}
 
-	if (sk->sk_state != SS_CONNECTED ||
+	if (sk->sk_state != TCP_ESTABLISHED ||
 	    !vsock_addr_bound(&vsk->local_addr)) {
 		err = -ENOTCONN;
 		goto out;
@@ -1693,7 +1701,7 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	lock_sock(sk);
 
-	if (sk->sk_state != SS_CONNECTED) {
+	if (sk->sk_state != TCP_ESTABLISHED) {
 		/* Recvmsg is supposed to return 0 if a peer performs an
 		 * orderly shutdown. Differentiate between that case and when a
 		 * peer has not connected or a local shutdown occured with the
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 403d86e80162..8e03bd3f3668 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -414,7 +414,7 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
 static void virtio_vsock_reset_sock(struct sock *sk)
 {
 	lock_sock(sk);
-	sk->sk_state = SS_UNCONNECTED;
+	sk->sk_state = TCP_CLOSE;
 	sk->sk_err = ECONNRESET;
 	sk->sk_error_report(sk);
 	release_sock(sk);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index edba7ab97563..3ae3a33da70b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -708,7 +708,7 @@ static void virtio_transport_do_close(struct vsock_sock *vsk,
 	sock_set_flag(sk, SOCK_DONE);
 	vsk->peer_shutdown = SHUTDOWN_MASK;
 	if (vsock_stream_has_data(vsk) <= 0)
-		sk->sk_state = SS_DISCONNECTING;
+		sk->sk_state = TCP_CLOSING;
 	sk->sk_state_change(sk);
 
 	if (vsk->close_work_scheduled &&
@@ -748,8 +748,8 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
 {
 	struct sock *sk = &vsk->sk;
 
-	if (!(sk->sk_state == SS_CONNECTED ||
-	      sk->sk_state == SS_DISCONNECTING))
+	if (!(sk->sk_state == TCP_ESTABLISHED ||
+	      sk->sk_state == TCP_CLOSING))
 		return true;
 
 	/* Already received SHUTDOWN from peer, reply with RST */
@@ -801,7 +801,7 @@ virtio_transport_recv_connecting(struct sock *sk,
 
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RESPONSE:
-		sk->sk_state = SS_CONNECTED;
+		sk->sk_state = TCP_ESTABLISHED;
 		sk->sk_socket->state = SS_CONNECTED;
 		vsock_insert_connected(vsk);
 		sk->sk_state_change(sk);
@@ -821,7 +821,7 @@ virtio_transport_recv_connecting(struct sock *sk,
 
 destroy:
 	virtio_transport_reset(vsk, pkt);
-	sk->sk_state = SS_UNCONNECTED;
+	sk->sk_state = TCP_CLOSE;
 	sk->sk_err = skerr;
 	sk->sk_error_report(sk);
 	return err;
@@ -857,7 +857,7 @@ virtio_transport_recv_connected(struct sock *sk,
 			vsk->peer_shutdown |= SEND_SHUTDOWN;
 		if (vsk->peer_shutdown == SHUTDOWN_MASK &&
 		    vsock_stream_has_data(vsk) <= 0)
-			sk->sk_state = SS_DISCONNECTING;
+			sk->sk_state = TCP_CLOSING;
 		if (le32_to_cpu(pkt->hdr.flags))
 			sk->sk_state_change(sk);
 		break;
@@ -928,7 +928,7 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt)
 
 	lock_sock_nested(child, SINGLE_DEPTH_NESTING);
 
-	child->sk_state = SS_CONNECTED;
+	child->sk_state = TCP_ESTABLISHED;
 
 	vchild = vsock_sk(child);
 	vsock_addr_init(&vchild->local_addr, le64_to_cpu(pkt->hdr.dst_cid),
@@ -1016,18 +1016,18 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
 		sk->sk_write_space(sk);
 
 	switch (sk->sk_state) {
-	case VSOCK_SS_LISTEN:
+	case TCP_LISTEN:
 		virtio_transport_recv_listen(sk, pkt);
 		virtio_transport_free_pkt(pkt);
 		break;
-	case SS_CONNECTING:
+	case TCP_SYN_SENT:
 		virtio_transport_recv_connecting(sk, pkt);
 		virtio_transport_free_pkt(pkt);
 		break;
-	case SS_CONNECTED:
+	case TCP_ESTABLISHED:
 		virtio_transport_recv_connected(sk, pkt);
 		break;
-	case SS_DISCONNECTING:
+	case TCP_CLOSING:
 		virtio_transport_recv_disconnecting(sk, pkt);
 		virtio_transport_free_pkt(pkt);
 		break;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 10ae7823a19d..9cb3a6c780aa 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -743,7 +743,7 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 		/* The local context ID may be out of date, update it. */
 		vsk->local_addr.svm_cid = dst.svm_cid;
 
-		if (sk->sk_state == SS_CONNECTED)
+		if (sk->sk_state == TCP_ESTABLISHED)
 			vmci_trans(vsk)->notify_ops->handle_notify_pkt(
 					sk, pkt, true, &dst, &src,
 					&bh_process_pkt);
@@ -801,7 +801,9 @@ static void vmci_transport_handle_detach(struct sock *sk)
 		 * left in our consume queue.
 		 */
 		if (vsock_stream_has_data(vsk) <= 0) {
-			if (sk->sk_state == SS_CONNECTING) {
+			sk->sk_state = TCP_CLOSE;
+
+			if (sk->sk_state == TCP_SYN_SENT) {
 				/* The peer may detach from a queue pair while
 				 * we are still in the connecting state, i.e.,
 				 * if the peer VM is killed after attaching to
@@ -810,12 +812,10 @@ static void vmci_transport_handle_detach(struct sock *sk)
 				 * event like a reset.
 				 */
 
-				sk->sk_state = SS_UNCONNECTED;
 				sk->sk_err = ECONNRESET;
 				sk->sk_error_report(sk);
 				return;
 			}
-			sk->sk_state = SS_UNCONNECTED;
 		}
 		sk->sk_state_change(sk);
 	}
@@ -883,17 +883,17 @@ static void vmci_transport_recv_pkt_work(struct work_struct *work)
 	vsock_sk(sk)->local_addr.svm_cid = pkt->dg.dst.context;
 
 	switch (sk->sk_state) {
-	case VSOCK_SS_LISTEN:
+	case TCP_LISTEN:
 		vmci_transport_recv_listen(sk, pkt);
 		break;
-	case SS_CONNECTING:
+	case TCP_SYN_SENT:
 		/* Processing of pending connections for servers goes through
 		 * the listening socket, so see vmci_transport_recv_listen()
 		 * for that path.
 		 */
 		vmci_transport_recv_connecting_client(sk, pkt);
 		break;
-	case SS_CONNECTED:
+	case TCP_ESTABLISHED:
 		vmci_transport_recv_connected(sk, pkt);
 		break;
 	default:
@@ -942,7 +942,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
 		vsock_sk(pending)->local_addr.svm_cid = pkt->dg.dst.context;
 
 		switch (pending->sk_state) {
-		case SS_CONNECTING:
+		case TCP_SYN_SENT:
 			err = vmci_transport_recv_connecting_server(sk,
 								    pending,
 								    pkt);
@@ -1072,7 +1072,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
 	vsock_add_pending(sk, pending);
 	sk->sk_ack_backlog++;
 
-	pending->sk_state = SS_CONNECTING;
+	pending->sk_state = TCP_SYN_SENT;
 	vmci_trans(vpending)->produce_size =
 		vmci_trans(vpending)->consume_size = qp_size;
 	vmci_trans(vpending)->queue_pair_size = qp_size;
@@ -1197,11 +1197,11 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 	 * the socket will be valid until it is removed from the queue.
 	 *
 	 * If we fail sending the attach below, we remove the socket from the
-	 * connected list and move the socket to SS_UNCONNECTED before
+	 * connected list and move the socket to TCP_CLOSE before
 	 * releasing the lock, so a pending slow path processing of an incoming
 	 * packet will not see the socket in the connected state in that case.
 	 */
-	pending->sk_state = SS_CONNECTED;
+	pending->sk_state = TCP_ESTABLISHED;
 
 	vsock_insert_connected(vpending);
 
@@ -1232,7 +1232,7 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 
 destroy:
 	pending->sk_err = skerr;
-	pending->sk_state = SS_UNCONNECTED;
+	pending->sk_state = TCP_CLOSE;
 	/* As long as we drop our reference, all necessary cleanup will handle
 	 * when the cleanup function drops its reference and our destruct
 	 * implementation is called.  Note that since the listen handler will
@@ -1270,7 +1270,7 @@ vmci_transport_recv_connecting_client(struct sock *sk,
 		 * accounting (it can already be found since it's in the bound
 		 * table).
 		 */
-		sk->sk_state = SS_CONNECTED;
+		sk->sk_state = TCP_ESTABLISHED;
 		sk->sk_socket->state = SS_CONNECTED;
 		vsock_insert_connected(vsk);
 		sk->sk_state_change(sk);
@@ -1338,7 +1338,7 @@ vmci_transport_recv_connecting_client(struct sock *sk,
 destroy:
 	vmci_transport_send_reset(sk, pkt);
 
-	sk->sk_state = SS_UNCONNECTED;
+	sk->sk_state = TCP_CLOSE;
 	sk->sk_err = skerr;
 	sk->sk_error_report(sk);
 	return err;
@@ -1526,7 +1526,7 @@ static int vmci_transport_recv_connected(struct sock *sk,
 		sock_set_flag(sk, SOCK_DONE);
 		vsk->peer_shutdown = SHUTDOWN_MASK;
 		if (vsock_stream_has_data(vsk) <= 0)
-			sk->sk_state = SS_DISCONNECTING;
+			sk->sk_state = TCP_CLOSING;
 
 		sk->sk_state_change(sk);
 		break;
@@ -1790,7 +1790,7 @@ static int vmci_transport_connect(struct vsock_sock *vsk)
 		err = vmci_transport_send_conn_request(
 			sk, vmci_trans(vsk)->queue_pair_size);
 		if (err < 0) {
-			sk->sk_state = SS_UNCONNECTED;
+			sk->sk_state = TCP_CLOSE;
 			return err;
 		}
 	} else {
@@ -1800,7 +1800,7 @@ static int vmci_transport_connect(struct vsock_sock *vsk)
 				sk, vmci_trans(vsk)->queue_pair_size,
 				supported_proto_versions);
 		if (err < 0) {
-			sk->sk_state = SS_UNCONNECTED;
+			sk->sk_state = TCP_CLOSE;
 			return err;
 		}
 
diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index 1406db4d97d1..41fb427f150a 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -355,7 +355,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
 		 * queue. Ask for notifications when there is something to
 		 * read.
 		 */
-		if (sk->sk_state == SS_CONNECTED) {
+		if (sk->sk_state == TCP_ESTABLISHED) {
 			if (!send_waiting_read(sk, 1))
 				return -1;
 
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index f3a0afc46208..0cc84f2bb05e 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -176,7 +176,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
 		 * queue. Ask for notifications when there is something to
 		 * read.
 		 */
-		if (sk->sk_state == SS_CONNECTED)
+		if (sk->sk_state == TCP_ESTABLISHED)
 			vsock_block_update_write_window(sk);
 		*data_ready_now = false;
 	}
-- 
2.13.6

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox