Netdev List
* Re: ip route JSON format is unparseable for "unreachable" routes
From: Stephen Hemminger @ 2019-07-28 16:15 UTC (permalink / raw)
  To: Michael Ziegler; +Cc: netdev
In-Reply-To: <6e88311b-5edc-4c62-1581-0f5b160a5f4e@michaelziegler.name>

On Sun, 28 Jul 2019 13:09:55 +0200
Michael Ziegler <ich@michaelziegler.name> wrote:

> Hi,
> 
> I created a couple of "unreachable" routes on one of my systems, like so:
> 
> > ip route add unreachable 10.0.0.0/8     metric 255
> > ip route add unreachable 192.168.0.0/16 metric 255  
> 
> Unfortunately this results in unparseable JSON output from "ip":
> 
> > # ip -j route show  | jq .
> > parse error: Objects must consist of key:value pairs at line 1, column 84  
> 
> The offending JSON objects are these:
> 
> > {"unreachable","dst":"10.0.0.0/8","metric":255,"flags":[]}
> > {"unreachable","dst":"192.168.0.0/16","metric":255,"flags":[]}
> 
> "unreachable" cannot appear on its own here; it needs to be some kind of
> field.
> 
> The manpage says to report here, thus I do :) I've searched the
> archives, but I wasn't able to find any existing bug reports about this.
> I'm running version
> 
> > ip utility, iproute2-ss190107  
> 
> on Debian Buster.
> 
> Regards,
> Michael.

Already fixed upstream by:

commit 073661773872709518d35d4d093f3a715281f21d
Author: Matteo Croce <mcroce@redhat.com>
Date:   Mon Mar 18 18:19:29 2019 +0100

    ip route: print route type in JSON output
    
    ip route generates invalid JSON if the route type has to be printed,
    e.g. when detailed mode is active, or the type is different than unicast:
    
        $ ip -d -j -p route show
        [ {"unicast",
                "dst": "192.168.122.0/24",
                "dev": "virbr0",
                "protocol": "kernel",
                "scope": "link",
                "prefsrc": "192.168.122.1",
                "flags": [ "linkdown" ]
            } ]
    
        $ ip -j -p route show
        [ {"unreachable",
                "dst": "192.168.23.0/24",
                "flags": [ ]
            },{"prohibit",
                "dst": "192.168.24.0/24",
                "flags": [ ]
            },{"blackhole",
                "dst": "192.168.25.0/24",
                "flags": [ ]
            } ]
    
    Fix it by printing the route type as the "type" attribute:
    
        $ ip -d -j -p route show
        [ {
                "type": "unicast",
                "dst": "default",
                "gateway": "192.168.85.1",
                "dev": "wlp3s0",
                "protocol": "dhcp",
                "scope": "global",
                "metric": 600,
                "flags": [ ]
            },{
                "type": "unreachable",
                "dst": "192.168.23.0/24",
                "protocol": "boot",
                "scope": "global",
                "flags": [ ]
            },{
                "type": "prohibit",
                "dst": "192.168.24.0/24",
                "protocol": "boot",
                "scope": "global",
                "flags": [ ]
            },{
                "type": "blackhole",
                "dst": "192.168.25.0/24",
                "protocol": "boot",
                "scope": "global",
                "flags": [ ]
            } ]
    
    Fixes: 663c3cb23103 ("iproute: implement JSON and color output")
    Acked-by: Phil Sutter <phil@nwl.cc>
    Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
    Signed-off-by: Matteo Croce <mcroce@redhat.com>
    Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
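
As an illustration only (this is not part of the commit), here is a minimal Python sketch of how a script might consume the fixed output. The JSON is abridged from the commit message above. The helper names routes_of_type and is_valid_json are hypothetical, and treating a route without a "type" key as unicast is an assumption based on the commit text, which says the type is only printed in detailed mode or for non-unicast routes.

```python
import json

# Abridged from the fixed `ip -d -j -p route show` output quoted above.
FIXED_OUTPUT = """
[ {"type": "unicast", "dst": "default", "gateway": "192.168.85.1",
   "dev": "wlp3s0", "protocol": "dhcp", "scope": "global",
   "metric": 600, "flags": [ ]},
  {"type": "unreachable", "dst": "192.168.23.0/24", "protocol": "boot",
   "scope": "global", "flags": [ ]},
  {"type": "prohibit", "dst": "192.168.24.0/24", "protocol": "boot",
   "scope": "global", "flags": [ ]},
  {"type": "blackhole", "dst": "192.168.25.0/24", "protocol": "boot",
   "scope": "global", "flags": [ ]} ]
"""

# One object from the original report: a bare string sits where a
# key:value pair is required, so a strict JSON parser rejects it.
BROKEN_OBJECT = '{"unreachable","dst":"10.0.0.0/8","metric":255,"flags":[]}'


def routes_of_type(routes, route_type):
    """Return the "dst" of every route whose type matches.

    Assumption: a route without a "type" key is treated as unicast.
    """
    return [r["dst"] for r in routes if r.get("type", "unicast") == route_type]


def is_valid_json(text):
    """Report whether the text parses as JSON at all."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


routes = json.loads(FIXED_OUTPUT)
print(routes_of_type(routes, "unreachable"))
print(is_valid_json(BROKEN_OBJECT))
```

With the "type" attribute in place, filtering by route type becomes an ordinary key lookup, while the pre-fix object fails at the parsing stage before any filtering can happen.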


* RE: [EXT] Re: [PATCH net-next] mvpp2: document HW checksum behaviour
From: Stefan Chulski @ 2019-07-28 15:22 UTC (permalink / raw)
  To: Matteo Croce, Antoine Tenart, Marcin Wojtas, Maxime Chevallier
  Cc: netdev, LKML, David S . Miller
In-Reply-To: <CAGnkfhz+PezeLT+gyXdsnyJz2dnKpYkcb2HbqvXJoLdzNxuC6g@mail.gmail.com>

> Hi all,
> 
> It is probably safe for dev->vlan_features to keep the CSUM features, to
> avoid unnecessary calculation in some cases, but I have another question.
> Does the PP2 hardware support checksumming at any offset? I replaced
> 'NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM' with NETIF_F_HW_CSUM and
> then stacked 5 VXLANs on top of a mvpp2 device, to have the last IP header
> at offset 264:
> 
> ip link set $dev up
> ip addr add 192.168.0.$last/24 dev $dev
> 
> for i in {1..5}; do
> 	ip link add vx$i type vxlan id $i dstport 4789 remote 192.168.$((i-1)).$other
> 	ip link set vx$i up
> 	ip addr add 192.168.$i.$last/24 dev vx$i
> done
> 
> 00:51:82:11:22:00 > 3c:fd:fe:9c:60:6c, ethertype IPv4 (0x0800), length 348:
> 192.168.0.1.33625 > 192.168.0.2.4789: VXLAN, flags [I] (0x08), vni 1
> 02:25:60:da:87:03 > 92:20:05:45:3d:d3, ethertype IPv4 (0x0800), length 298:
> 192.168.1.1.33625 > 192.168.1.2.4789: VXLAN, flags [I] (0x08), vni 2
> 12:20:97:15:8f:aa > 66:08:23:c7:72:ea, ethertype IPv4 (0x0800), length 248:
> 192.168.2.1.33625 > 192.168.2.2.4789: VXLAN, flags [I] (0x08), vni 3
> c6:1c:b9:fd:9d:28 > 22:ca:cb:6a:ea:68, ethertype IPv4 (0x0800), length 198:
> 192.168.3.1.33625 > 192.168.3.2.4789: VXLAN, flags [I] (0x08), vni 4
> 02:34:5f:45:a5:9d > d2:4e:d4:d7:42:31, ethertype IPv4 (0x0800), length 148:
> 192.168.4.1.34504 > 192.168.4.2.4789: VXLAN, flags [I] (0x08), vni 5
> a2:99:fd:9c:1b:05 > 5a:81:3b:fc:6a:07, ethertype IPv4 (0x0800), length 98:
> 192.168.5.1 > 192.168.5.2: ICMP echo request, id 1654, seq 156, length 64
> 
> It seems that the HW is capable of doing it; can someone with a datasheet
> confirm this?

L3_offset in the TX descriptor has 7 bits, so the beginning of Layer 3 must be at an offset of less than 128 bytes.
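
To make the arithmetic concrete, here is a rough Python sketch (not taken from the datasheet). It assumes L3_offset is a byte offset from the start of the frame, that each VXLAN layer adds Ethernet + IPv4 + UDP + VXLAN headers with no IP options, and the function names are hypothetical. The 50-byte step matches the frame lengths in the capture above (348, 298, ..., 98).

```python
# Header sizes in bytes for one VXLAN-over-IPv4 encapsulation layer.
ETH_HLEN = 14
IPV4_HLEN = 20   # assumption: no IP options
UDP_HLEN = 8
VXLAN_HLEN = 8
ENCAP_OVERHEAD = ETH_HLEN + IPV4_HLEN + UDP_HLEN + VXLAN_HLEN

L3_OFFSET_BITS = 7
L3_OFFSET_MAX = (1 << L3_OFFSET_BITS) - 1   # largest value a 7-bit field holds


def l3_offsets(vxlan_layers):
    """Byte offset of each IPv4 header in the frame, outermost first."""
    return [ETH_HLEN + n * ENCAP_OVERHEAD for n in range(vxlan_layers + 1)]


def fits_l3_offset(offset):
    """True if the offset is representable in the 7-bit L3_offset field."""
    return 0 <= offset <= L3_OFFSET_MAX


for depth, offset in enumerate(l3_offsets(5)):
    print(depth, offset, fits_l3_offset(offset))
```

Under these assumptions the innermost IP header of the 5-layer test sits at offset 264, well past what the field can encode, and only headers up to two VXLAN layers deep stay below 128.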

Regards,
Stefan.


* Re: [PATCH net-next] mvpp2: document HW checksum behaviour
From: Matteo Croce @ 2019-07-28 14:30 UTC (permalink / raw)
  To: Antoine Tenart, Marcin Wojtas, Stefan Chulski, Maxime Chevallier
  Cc: netdev, LKML, David S . Miller
In-Reply-To: <CAGnkfhycOc8mvqeQDBcnXueUjrFQMC7hdfAOkxr5k0+xc_tnDw@mail.gmail.com>

On Sun, Jul 28, 2019 at 3:36 AM Matteo Croce <mcroce@redhat.com> wrote:
>
> On Fri, Jul 26, 2019 at 2:57 PM Antoine Tenart
> <antoine.tenart@bootlin.com> wrote:
> >
> > Hi Matteo,
> >
> > On Fri, Jul 26, 2019 at 01:15:46AM +0200, Matteo Croce wrote:
> > > The hardware can only offload checksum calculation on the first port
> > > due to the Tx FIFO size limitation. Document this in a comment.
> > >
> > > Fixes: 576193f2d579 ("net: mvpp2: jumbo frames support")
> > > Signed-off-by: Matteo Croce <mcroce@redhat.com>
> >
> > Looks good. Please note there's a similar code path in the probe.
> > You could also add a comment there (or move this check/comment in a
> > common place).
> >
> > Thanks!
> > Antoine
> >
>
> Hi Antoine,
>
> I was preparing a v2 when I looked at mvpp2_port_probe(), which does:
>
> --------------------------------%<------------------------------
> features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> NETIF_F_TSO;
>
> if (port->pool_long->id == MVPP2_BM_JUMBO && port->id != 0) {
>     dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
>     dev->hw_features &= ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
> }
>
> dev->vlan_features |= features;
> -------------------------------->%------------------------------
>
> Is it ok to remove NETIF_F_IP*_CSUM from dev->features and
> dev->hw_features but keep it in dev->vlan_features?
>
> Regards,
> --
> Matteo Croce
> per aspera ad upstream

Hi all,

It is probably safe for dev->vlan_features to keep the CSUM features, to
avoid unnecessary calculation in some cases, but I have another question.
Does the PP2 hardware support checksumming at any offset? I
replaced 'NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM' with NETIF_F_HW_CSUM and
then stacked 5 VXLANs on top of a mvpp2 device, to have the last IP
header at offset 264:

ip link set $dev up
ip addr add 192.168.0.$last/24 dev $dev

for i in {1..5}; do
	ip link add vx$i type vxlan id $i dstport 4789 remote 192.168.$((i-1)).$other
	ip link set vx$i up
	ip addr add 192.168.$i.$last/24 dev vx$i
done

00:51:82:11:22:00 > 3c:fd:fe:9c:60:6c, ethertype IPv4 (0x0800), length 348: 192.168.0.1.33625 > 192.168.0.2.4789: VXLAN, flags [I] (0x08), vni 1
02:25:60:da:87:03 > 92:20:05:45:3d:d3, ethertype IPv4 (0x0800), length 298: 192.168.1.1.33625 > 192.168.1.2.4789: VXLAN, flags [I] (0x08), vni 2
12:20:97:15:8f:aa > 66:08:23:c7:72:ea, ethertype IPv4 (0x0800), length 248: 192.168.2.1.33625 > 192.168.2.2.4789: VXLAN, flags [I] (0x08), vni 3
c6:1c:b9:fd:9d:28 > 22:ca:cb:6a:ea:68, ethertype IPv4 (0x0800), length 198: 192.168.3.1.33625 > 192.168.3.2.4789: VXLAN, flags [I] (0x08), vni 4
02:34:5f:45:a5:9d > d2:4e:d4:d7:42:31, ethertype IPv4 (0x0800), length 148: 192.168.4.1.34504 > 192.168.4.2.4789: VXLAN, flags [I] (0x08), vni 5
a2:99:fd:9c:1b:05 > 5a:81:3b:fc:6a:07, ethertype IPv4 (0x0800), length 98: 192.168.5.1 > 192.168.5.2: ICMP echo request, id 1654, seq 156, length 64

It seems that the HW is capable of doing it; can someone with a
datasheet confirm this?

Regards,
-- 
Matteo Croce
per aspera ad upstream


* Re: memory leak in fdb_create
From: syzbot @ 2019-07-28 14:20 UTC (permalink / raw)
  To: bridge, bsingharora, coreteam, davem, duwe, kaber, kadlec,
	linux-kernel, mingo, mpe, netdev, netfilter-devel, nikolay, pablo,
	roopa, rostedt, syzkaller-bugs
In-Reply-To: <0000000000005e6124058c0cbdbe@google.com>

syzbot has bisected this bug to:

commit 04cf31a759ef575f750a63777cee95500e410994
Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Thu Mar 24 11:04:01 2016 +0000

     ftrace: Make ftrace_location_range() global

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1538c778600000
start commit:   abf02e29 Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pu..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1738c778600000
console output: https://syzkaller.appspot.com/x/log.txt?x=1338c778600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=56f1da14935c3cce
dashboard link: https://syzkaller.appspot.com/bug?extid=88533dc8b582309bf3ee
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16de5c06a00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10546026a00000

Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
Fixes: 04cf31a759ef ("ftrace: Make ftrace_location_range() global")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: [PATCH] gigaset: stop maintaining seperately
From: Tilman Schmidt @ 2019-07-28 14:17 UTC (permalink / raw)
  To: Paul Bolle
  Cc: David Miller, Hansjoerg Lipp, Arnd Bergmann, Karsten Keil, netdev,
	linux-kernel
In-Reply-To: <20190726220541.28783-1-pebolle@tiscali.nl>

Thanks to you, Paul, for all your contributions, and specifically for
keeping the driver maintained for four more years after I had to abandon
it for the same reason.

I had a lot of fun working on that driver and I learned a lot in the
course. Now it's time to move on without regrets.

All the best,
Tilman

Am 27.07.2019 um 00:05 schrieb Paul Bolle:
> The Dutch consumer grade ISDN network will be shut down on September 1,
> 2019. This means I'll be converted to some sort of VOIP shortly. At that
> point it would be unwise to try to maintain the gigaset driver, even for
> odd fixes as I do. So I'll stop maintaining it as a separate driver and
> bump support to CAPI in staging. De facto this means the driver will be
> unmaintained, since no-one seems to be working on CAPI.
> 
> I've lightly tested the hardware specific modules of this driver (bas-gigaset,
> ser-gigaset, and usb-gigaset) for v5.3-rc1. The basic functionality appears to
> be working. It's unclear whether anyone still cares. I'm aware of only one
> person sort of using the driver a few years ago.
> 
> Thanks to Karsten Keil for the ISDN subsystems gigaset was using (I4L and
> CAPI). And many thanks to Hansjoerg Lipp and Tilman Schmidt for writing and
> upstreaming this driver.
> 
> Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
> ---
>  MAINTAINERS | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 783569e3c4b4..e99afbd13355 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6822,13 +6822,6 @@ F:	Documentation/filesystems/gfs2*.txt
>  F:	fs/gfs2/
>  F:	include/uapi/linux/gfs2_ondisk.h
>  
> -GIGASET ISDN DRIVERS
> -M:	Paul Bolle <pebolle@tiscali.nl>
> -L:	gigaset307x-common@lists.sourceforge.net
> -W:	http://gigaset307x.sourceforge.net/
> -S:	Odd Fixes
> -F:	drivers/staging/isdn/gigaset/
> -
>  GNSS SUBSYSTEM
>  M:	Johan Hovold <johan@kernel.org>
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss.git
> 


* [PATCH net-next] rt2800usb: Add new rt2800usb device PLANEX GW-USMicroN
From: Masanari Iida @ 2019-07-28 14:07 UTC (permalink / raw)
  To: sgruszka, helmut.schaa, kvalo, davem, linux-wireless, netdev,
	linux-kernel
  Cc: Masanari Iida

This patch adds a device ID for the PLANEX GW-USMicroN.
Without this patch, I had to echo the device IDs manually for the
device to be recognized.

# lsusb |grep PLANEX
Bus 002 Device 005: ID 2019:ed14 PLANEX GW-USMicroN

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
---
 drivers/net/wireless/ralink/rt2x00/rt2800usb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800usb.c b/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
index fdf0504b5f1d..0dfb55c69b73 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
@@ -1086,6 +1086,7 @@ static const struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x0846, 0x9013) },
 	{ USB_DEVICE(0x0846, 0x9019) },
 	/* Planex */
+	{ USB_DEVICE(0x2019, 0xed14) },
 	{ USB_DEVICE(0x2019, 0xed19) },
 	/* Ralink */
 	{ USB_DEVICE(0x148f, 0x3573) },
-- 
2.22.0.545.g9c9b961d7eb1



* Re: [PATCH] tcp: add new tcp_mtu_probe_floor sysctl
From: Eric Dumazet @ 2019-07-28 13:54 UTC (permalink / raw)
  To: Josh Hunt; +Cc: netdev, David Miller
In-Reply-To: <a9ec9cfd-c381-c02e-7d67-e24373c693d6@akamai.com>

On Sun, Jul 28, 2019 at 1:21 AM Josh Hunt <johunt@akamai.com> wrote:
>
> On 7/27/19 12:05 AM, Eric Dumazet wrote:
> > On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt <johunt@akamai.com> wrote:
> >>
> >> The current implementation of TCP MTU probing can considerably
> >> underestimate the MTU on lossy connections allowing the MSS to get down to
> >> 48. We have found that in almost all of these cases on our networks these
> >> paths can handle much larger MTUs meaning the connections are being
> >> artificially limited. Even though TCP MTU probing can raise the MSS back up
> >> we have seen this not to be the case causing connections to be "stuck" with
> >> an MSS of 48 when heavy loss is present.
> >>
> >> Prior to pushing out this change we could not keep TCP MTU probing enabled
> >> b/c of the above reasons. Now with a reasonable floor set we've had it
> >> enabled for the past 6 months.
> >
> > And what reasonable value have you used ???
>
> Reasonable for some may not be reasonable for others hence the new
> sysctl :) We're currently running with a fairly high value based off of
> the v6 min MTU minus headers and options, etc. We went conservative with
> our setting initially as it seemed a reasonable first step when
> re-enabling TCP MTU probing since with no configurable floor we saw a #
> of cases where connections were using severely reduced mss b/c of loss
> and not b/c of actual path restriction. I plan to reevaluate the setting
> at some point, but since the probing method is still the same it means
> the same clients who got stuck with mss of 48 before will land at
> whatever floor we set. Looking forward we are interested in trying to
> improve TCP MTU probing so it does not penalize clients like this.
>
> A suggestion for a more reasonable floor default would be 512, which is
> the same as the min_pmtu. Given both mechanisms are trying to achieve
> the same goal it seems like they should have a similar min/floor.
>
> >
> >>
> >> The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
> >> administrators the ability to control the floor of MSS probing.
> >>
> >> Signed-off-by: Josh Hunt <johunt@akamai.com>
> >> ---
> >>   Documentation/networking/ip-sysctl.txt | 6 ++++++
> >>   include/net/netns/ipv4.h               | 1 +
> >>   net/ipv4/sysctl_net_ipv4.c             | 9 +++++++++
> >>   net/ipv4/tcp_ipv4.c                    | 1 +
> >>   net/ipv4/tcp_timer.c                   | 2 +-
> >>   5 files changed, 18 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> >> index df33674799b5..49e95f438ed7 100644
> >> --- a/Documentation/networking/ip-sysctl.txt
> >> +++ b/Documentation/networking/ip-sysctl.txt
> >> @@ -256,6 +256,12 @@ tcp_base_mss - INTEGER
> >>          Path MTU discovery (MTU probing).  If MTU probing is enabled,
> >>          this is the initial MSS used by the connection.
> >>
> >> +tcp_mtu_probe_floor - INTEGER
> >> +       If MTU probing is enabled this caps the minimum MSS used for search_low
> >> +       for the connection.
> >> +
> >> +       Default : 48
> >> +
> >>   tcp_min_snd_mss - INTEGER
> >>          TCP SYN and SYNACK messages usually advertise an ADVMSS option,
> >>          as described in RFC 1122 and RFC 6691.
> >> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> >> index bc24a8ec1ce5..c0c0791b1912 100644
> >> --- a/include/net/netns/ipv4.h
> >> +++ b/include/net/netns/ipv4.h
> >> @@ -116,6 +116,7 @@ struct netns_ipv4 {
> >>          int sysctl_tcp_l3mdev_accept;
> >>   #endif
> >>          int sysctl_tcp_mtu_probing;
> >> +       int sysctl_tcp_mtu_probe_floor;
> >>          int sysctl_tcp_base_mss;
> >>          int sysctl_tcp_min_snd_mss;
> >>          int sysctl_tcp_probe_threshold;
> >> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> >> index 0b980e841927..59ded25acd04 100644
> >> --- a/net/ipv4/sysctl_net_ipv4.c
> >> +++ b/net/ipv4/sysctl_net_ipv4.c
> >> @@ -820,6 +820,15 @@ static struct ctl_table ipv4_net_table[] = {
> >>                  .extra2         = &tcp_min_snd_mss_max,
> >>          },
> >>          {
> >> +               .procname       = "tcp_mtu_probe_floor",
> >> +               .data           = &init_net.ipv4.sysctl_tcp_mtu_probe_floor,
> >> +               .maxlen         = sizeof(int),
> >> +               .mode           = 0644,
> >> +               .proc_handler   = proc_dointvec_minmax,
> >> +               .extra1         = &tcp_min_snd_mss_min,
> >> +               .extra2         = &tcp_min_snd_mss_max,
> >> +       },
> >> +       {
> >>                  .procname       = "tcp_probe_threshold",
> >>                  .data           = &init_net.ipv4.sysctl_tcp_probe_threshold,
> >>                  .maxlen         = sizeof(int),
> >> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> >> index d57641cb3477..e0a372676329 100644
> >> --- a/net/ipv4/tcp_ipv4.c
> >> +++ b/net/ipv4/tcp_ipv4.c
> >> @@ -2637,6 +2637,7 @@ static int __net_init tcp_sk_init(struct net *net)
> >>          net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
> >>          net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
> >>          net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
> >> +       net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
> >>
> >>          net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
> >>          net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
> >> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> >> index c801cd37cc2a..dbd9d2d0ee63 100644
> >> --- a/net/ipv4/tcp_timer.c
> >> +++ b/net/ipv4/tcp_timer.c
> >> @@ -154,7 +154,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
> >>          } else {
> >>                  mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
> >>                  mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
> >> -               mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
> >> +               mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor);
> >>                  mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
> >>                  icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
> >>          }
> >
> >
> > Existing sysctl should be enough ?
>
> I don't think so. Changing tcp_min_snd_mss could impact clients that
> really want/need a small mss. When you added the new sysctl I tried to
> analyze the mss values we're seeing to understand what we could possibly
> raise it to. While not a huge amount, we see more clients than I
> expected announcing mss values in the 180-512 range. Given that I would
> not feel comfortable setting tcp_min_snd_mss to say 512 as I suggested
> above.

If these clients need MSS values in the 180-512 range, how would MTU
probing work for them if you set the floor to 512?

Are we sure the intent of tcp_base_mss was not to act as a floor?

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c801cd37cc2a9c11f2dd4b9681137755e501a538..6d15895e9dcfb2eff51bbcf3608c7e68c1970a9e 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -153,7 +153,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
                icsk->icsk_mtup.probe_timestamp = tcp_jiffies32;
        } else {
                mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
-               mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
+               mss = max(net->ipv4.sysctl_tcp_base_mss, mss);
                mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
                mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
                icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
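
To compare the three behaviours under discussion, here is a rough Python model of the else-branch above (a sketch, not kernel code). It works on MSS values only and leaves out tcp_mtu_to_mss()/tcp_mss_to_mtu(). The defaults of 1024 for tcp_base_mss and 20 for the TCP header length are assumptions, 512 is the floor value floated earlier in the thread, and the names next_mss and settle are hypothetical.

```python
TCP_MIN_SND_MSS = 48   # default named in the patch description
TCP_BASE_MSS = 1024    # assumption: default value of tcp_base_mss
TCP_HEADER_LEN = 20    # assumption: TCP header without options


def next_mss(mss, variant, base_mss=TCP_BASE_MSS, probe_floor=512,
             min_snd_mss=TCP_MIN_SND_MSS):
    """One pass through the else-branch of tcp_mtu_probing(), on MSS only."""
    mss >>= 1
    if variant == "base_mss_as_floor":   # the min() -> max() change above
        mss = max(base_mss, mss)
    else:
        mss = min(base_mss, mss)
    if variant == "probe_floor":         # the proposed tcp_mtu_probe_floor
        mss = max(mss, probe_floor)
    else:
        mss = max(mss, 68 - TCP_HEADER_LEN)
    return max(mss, min_snd_mss)


def settle(mss, variant, rounds=10):
    """Apply next_mss() repeatedly, as if every round took this branch."""
    for _ in range(rounds):
        mss = next_mss(mss, variant)
    return mss


for variant in ("current", "probe_floor", "base_mss_as_floor"):
    print(variant, settle(1460, variant))
```

In this simplified model the current code bottoms out at 48, the new sysctl holds the value at the configured floor, and treating tcp_base_mss as a floor holds it at tcp_base_mss.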



>
> >
> > tcp_min_snd_mss  documentation could be slightly updated.
> >
> > And maybe its default value could be raised a bit.
> >
>
> Thanks
> Josh


* Re: [PATCH net] net: hns: fix LED configuration for marvell phy
From: Pavel Machek @ 2019-07-28 13:24 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: liuyonglong, David Miller, netdev, linux-kernel, linuxarm,
	salil.mehta, yisen.zhuang, shiju.jose
In-Reply-To: <20190725042829.GB14276@lunn.ch>

On Thu 2019-07-25 06:28:29, Andrew Lunn wrote:
> On Thu, Jul 25, 2019 at 11:00:08AM +0800, liuyonglong wrote:
> > > Revert "net: hns: fix LED configuration for marvell phy"
> > > This reverts commit f4e5f775db5a4631300dccd0de5eafb50a77c131.
> > >
> > > Andrew Lunn says this should be handled another way.
> > >
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > 
> > Hi Andrew:
> > 
> > I see this patch has been reverted. Can you tell me a better way to do this?
> > Thanks very much!
> 
> Please take a look at the work Matthias Kaehlcke is doing. It has not
> got too far yet, but when it is complete, it should define a generic
> way to configure PHY LEDs.

I don't remember a PHY LED discussion on the LED mailing list. Would you have a pointer?
Would it make sense to coordinate with the LED subsystem?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* [patch net] net: fix ifindex collision during namespace removal
From: Jiri Pirko @ 2019-07-28 12:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, xemul, edumazet, pabeni, idosch, petrm, sd, f.fainelli,
	stephen, mlxsw, Jiri Pirko

From: Jiri Pirko <jiri@mellanox.com>

Commit aca51397d014 ("netns: Fix arbitrary net_device-s corruptions
on net_ns stop.") introduced the possibility of hitting a BUG when a
device is returned to init_net and the following two conditions are met:
1) The dev->ifindex value is used in the name of another "dev%d"
   device in init_net.
2) dev->name is used by another device in init_net.

Under real-life circumstances this is hard to hit, so the bug has gone
unnoticed for over 10 years. To reproduce:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
$ ip netns add ns1
$ ip -n ns1 link add dummy1ns1 type dummy
$ ip -n ns1 link add dummy2ns1 type dummy
$ ip link set enp0s2 netns ns1
$ ip -n ns1 link set enp0s2 name dummy0
[  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
$ ip link add dev4 type dummy
$ ip -n ns1 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
$ ip netns del ns1
[  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
[  158.719316] ------------[ cut here ]------------
[  158.720591] kernel BUG at net/core/dev.c:9824!
[  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
[  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
[  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[  158.727508] Workqueue: netns cleanup_net
[  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
[  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
[  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
[  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
[  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
[  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
[  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
[  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
[  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
[  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  158.762758] Call Trace:
[  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
[  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
[  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
[  158.769870]  ops_exit_list.isra.0+0xa8/0x150
[  158.771544]  cleanup_net+0x446/0x8f0
[  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
[  158.775294]  process_one_work+0xa1a/0x1740
[  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
[  158.779143]  ? do_raw_spin_lock+0x11b/0x280
[  158.780848]  worker_thread+0x9e/0x1060
[  158.782500]  ? process_one_work+0x1740/0x1740
[  158.784454]  kthread+0x31b/0x420
[  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
[  158.788286]  ret_from_fork+0x3a/0x50
[  158.789871] ---[ end trace defd6c657c71f936 ]---
[  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
[  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
[  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
[  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
[  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
[  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
[  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
[  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
[  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
[  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fix this by checking whether a device with the same name exists in
init_net and, if it does, falling back to the original code, which uses
the "dev%d" pattern to allocate a name.
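
The name selection can be sketched as a toy Python model (not kernel code). The function names are hypothetical. The model assumes the fallback name is only consulted when dev->name already exists in init_net, and that a "dev%d" pattern resolves to the lowest free index.

```python
import re


def alloc_from_pattern(existing, pattern="dev%d"):
    """Pick the lowest free index for a "dev%d"-style pattern.

    Simplified stand-in for the kernel's pattern-based name allocation.
    """
    prefix = pattern.replace("%d", "")
    matcher = re.compile(re.escape(prefix) + r"(\d+)")
    used = set()
    for name in existing:
        m = matcher.fullmatch(name)
        if m:
            used.add(int(m.group(1)))
    index = 0
    while index in used:
        index += 1
    return f"{prefix}{index}"


def move_to_init_net(init_names, dev_name, ifindex, patched):
    """Return the name the device gets in init_net.

    Raises FileExistsError where the kernel would see -EEXIST (-17).
    """
    if dev_name not in init_names:
        return dev_name                  # no clash, the device keeps its name
    fb_name = f"dev{ifindex}"
    if patched and fb_name in init_names:
        fb_name = "dev%d"                # the two added lines of the patch
    if "%d" in fb_name:
        return alloc_from_pattern(init_names, fb_name)
    if fb_name in init_names:
        raise FileExistsError(fb_name)
    return fb_name


# State of init_net in the reproducer just before "ip netns del ns1".
names = {"lo", "dummy0", "dev4"}
print(move_to_init_net(names, "dummy0", 4, patched=True))
```

In the reproducer the returning device is named dummy0 and has ifindex 4, so both conditions hold: without the extra check the model raises the equivalent of -17, and with it the device receives a freshly allocated name.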

This was found using syzkaller.

Fixes: aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 2a3be2b279d3..1a24ba26b098 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9817,6 +9817,8 @@ static void __net_exit default_device_exit(struct net *net)
 
 		/* Push remaining network devices to init_net */
 		snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
+		if (__dev_get_by_name(&init_net, fb_name))
+			snprintf(fb_name, IFNAMSIZ, "dev%%d");
 		err = dev_change_net_namespace(dev, &init_net, fb_name);
 		if (err) {
 			pr_emerg("%s: failed to move %s to init_net: %d\n",
-- 
2.21.0



* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Sedat Dilek @ 2019-07-28 11:16 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Martin Lau, Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
	Clang-Built-Linux ML, Kees Cook, Nick Desaulniers,
	Nathan Chancellor
In-Reply-To: <57169960-35c2-d9d3-94e4-3b5a43d5aca7@fb.com>

On Sat, Jul 27, 2019 at 7:11 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/27/19 1:16 AM, Sedat Dilek wrote:
> > On Sat, Jul 27, 2019 at 9:36 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>
> >> On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >>>
> >>> On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>>
> >>>> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 7/26/19 2:02 PM, Sedat Dilek wrote:
> >>>>>> On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Yonghong Song,
> >>>>>>>
> >>>>>>> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> >>>>>>>>
> >>>>>>>> Glad to know clang 9 has asm goto support and can now compile the
> >>>>>>>> kernel again.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yupp.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am seeing a problem in the area bpf/seccomp causing
> >>>>>>>>> systemd/journald/udevd services to fail.
> >>>>>>>>>
> >>>>>>>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> >>>>>>>>> to connect stdout to the journal socket, ignoring: Connection refused
> >>>>>>>>>
> >>>>>>>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> >>>>>>>>> BFD linker ld.bfd on Debian/buster AMD64.
> >>>>>>>>> In both cases I use clang-9 (prerelease).
> >>>>>>>>
> >>>>>>>> Looks like it is a lld bug.
> >>>>>>>>
> >>>>>>>> I see the stack trace has __bpf_prog_run32() which is used by
> >>>>>>>> kernel bpf interpreter. Could you try to enable bpf jit
> >>>>>>>>      sysctl net.core.bpf_jit_enable = 1
> >>>>>>>> If this passed, it will prove it is interpreter related.
> >>>>>>>>
> >>>>>>>
> >>>>>>> After...
> >>>>>>>
> >>>>>>> sysctl -w net.core.bpf_jit_enable=1
> >>>>>>>
> >>>>>>> I can start all failed systemd services.
> >>>>>>>
> >>>>>>> systemd-journald.service
> >>>>>>> systemd-udevd.service
> >>>>>>> haveged.service
> >>>>>>>
> >>>>>>> This is in maintenance mode.
> >>>>>>>
> >>>>>>> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
> >>>>>>>
> >>>>>>
> >>>>>> This is what I did:
> >>>>>
> >>>>> I probably won't have cycles to debug this potential lld issue.
> >>>>> Maybe you already did, I suggest you put enough reproducible
> >>>>> details in the bug you filed against lld so they can take a look.
> >>>>>
> >>>>
> >>>> I understand and will put the journalctl-log into the CBL issue
> >>>> tracker and update informations.
> >>>>
> >>>> Thanks for your help understanding the BPF correlations.
> >>>>
> >>>> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
> >>>
> >>> jit_enable=1 is enough.
> >>> Or use CONFIG_BPF_JIT_ALWAYS_ON to workaround.
> >>>
> >>> It sounds like clang miscompiles interpreter.
> >
> > Just to clarify:
> > This does not happen with clang-9 + ld.bfd (GNU/ld linker).
> >
> >>> modprobe test_bpf
> >>> should be able to point out which part of interpreter is broken.
> >>
> >> Maybe we need something like...
> >>
> >> "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
> >>
> >> ...for clang?
> >>
> >
> > Not sure if something like GCC's...
> >
> > -fgcse
> >
> > Perform a global common subexpression elimination pass. This pass also
> > performs global constant and copy propagation.
> >
> > Note: When compiling a program using computed gotos, a GCC extension,
> > you may get better run-time performance if you disable the global
> > common subexpression elimination pass by adding -fno-gcse to the
> > command line.
> >
> > Enabled at levels -O2, -O3, -Os.
> >
> > ...is available for clang.
> >
> > I tried the following, hoping to turn off "global common subexpression elimination":
> >
> > diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
> > index 383c87300b0d..92f934a1e9ff 100644
> > --- a/arch/x86/net/Makefile
> > +++ b/arch/x86/net/Makefile
> > @@ -3,6 +3,8 @@
> >   # Arch-specific network modules
> >   #
> >
> > +KBUILD_CFLAGS += -O0
>
> This won't work. First, you added to the wrong file. The interpreter
> is at kernel/bpf/core.c.
>

Thanks for the clarification.
I mixed up the x86 BPF JIT compiler with the BPF interpreter.

I see no diff in the disassembled kernel/bpf/core.o in my clang9-bfd
and clang9-lld build-dirs.

$ objdump -M intel -d linux.clang9-bfd/kernel/bpf/core.o > bpf_core_o_clang9-bfd.txt
$ objdump -M intel -d linux.clang9-lld/kernel/bpf/core.o > bpf_core_o_clang9-lld.txt

--- bpf_core_o_clang9-bfd.txt   2019-07-28 13:11:59.363552042 +0200
+++ bpf_core_o_clang9-lld.txt   2019-07-28 13:12:09.975535278 +0200
@@ -1,5 +1,5 @@

-linux.clang9-bfd/kernel/bpf/core.o:     file format elf64-x86-64
+linux.clang9-lld/kernel/bpf/core.o:     file format elf64-x86-64


 Disassembly of section .text:

> Second, kernel may have compilation issues with -O0.
>

Confirmed.

- Sedat -

> > +
> >   ifeq ($(CONFIG_X86_32),y)
> >           obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
> >   else
> >
> > Still see...
> > BROKEN: test_bpf: #294 BPF_MAXINSNS: Jump, gap, jump, ... jited:0
> >
> > - Sedat -
> >

^ permalink raw reply

* ip route JSON format is unparseable for "unreachable" routes
From: Michael Ziegler @ 2019-07-28 11:09 UTC (permalink / raw)
  To: netdev

Hi,

I created a couple "unreachable" routes on one of my systems, like such:

> ip route add unreachable 10.0.0.0/8     metric 255
> ip route add unreachable 192.168.0.0/16 metric 255

Unfortunately this results in unparseable JSON output from "ip":

> # ip -j route show  | jq .
> parse error: Objects must consist of key:value pairs at line 1, column 84

The offending JSON objects are these:

> {"unreachable","dst":"10.0.0.0/8","metric":255,"flags":[]}
> {"unreachable","dst":"192.168.0.0/16","metric":255,"flags":[]}
"unreachable" cannot appear on its own here, it needs to be some kind of
field.

The manpage says to report here, thus I do :) I've searched the
archives, but I wasn't able to find any existing bug reports about this.
I'm running version

> ip utility, iproute2-ss190107

on Debian Buster.

Regards,
Michael.
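
The point about "unreachable" needing to be a field can be sketched as
follows. This is a minimal illustration, not iproute2 code: the helper
name format_route_json() and the key name "type" are assumptions, and
the sketch assumes the strings need no JSON escaping.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not iproute2 code: format one route as a JSON
 * object in which the route type is a key:value pair instead of a bare
 * string. The key name "type" is an assumption.
 * Assumes type and dst contain no characters that need JSON escaping.
 * Returns the snprintf() result, i.e. the length that was (or would
 * have been) written, excluding the terminating NUL.
 */
static int format_route_json(char *buf, size_t size, const char *type,
			     const char *dst, unsigned int metric)
{
	return snprintf(buf, size,
			"{\"type\":\"%s\",\"dst\":\"%s\",\"metric\":%u,\"flags\":[]}",
			type, dst, metric);
}
```

Called with "unreachable", "10.0.0.0/8" and 255, the helper produces an
object whose every member is a key:value pair, which is what a parser
such as jq expects.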

^ permalink raw reply

* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Sedat Dilek @ 2019-07-28 11:09 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Martin Lau, Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
	Clang-Built-Linux ML, Kees Cook, Nick Desaulniers,
	Nathan Chancellor
In-Reply-To: <934a2a0a-c3fb-fd75-b8a3-c1042d73ca0c@fb.com>

On Sat, Jul 27, 2019 at 7:08 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/27/19 12:36 AM, Sedat Dilek wrote:
> > On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>
> >>> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/26/19 2:02 PM, Sedat Dilek wrote:
> >>>>> On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Yonghong Song,
> >>>>>>
> >>>>>> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> >>>>>>>
> >>>>>>> Glad to know clang 9 has asm goto support and now It can compile
> >>>>>>> kernel again.
> >>>>>>>
> >>>>>>
> >>>>>> Yupp.
> >>>>>>
> >>>>>>>>
> >>>>>>>> I am seeing a problem in the area bpf/seccomp causing
> >>>>>>>> systemd/journald/udevd services to fail.
> >>>>>>>>
> >>>>>>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> >>>>>>>> to connect stdout to the journal socket, ignoring: Connection refused
> >>>>>>>>
> >>>>>>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> >>>>>>>> BFD linker ld.bfd on Debian/buster AMD64.
> >>>>>>>> In both cases I use clang-9 (prerelease).
> >>>>>>>
> >>>>>>> Looks like it is a lld bug.
> >>>>>>>
> >>>>>>> I see the stack trace has __bpf_prog_run32() which is used by
> >>>>>>> kernel bpf interpreter. Could you try to enable bpf jit
> >>>>>>>      sysctl net.core.bpf_jit_enable = 1
> >>>>>>> If this passed, it will prove it is interpreter related.
> >>>>>>>
> >>>>>>
> >>>>>> After...
> >>>>>>
> >>>>>> sysctl -w net.core.bpf_jit_enable=1
> >>>>>>
> >>>>>> I can start all failed systemd services.
> >>>>>>
> >>>>>> systemd-journald.service
> >>>>>> systemd-udevd.service
> >>>>>> haveged.service
> >>>>>>
> >>>>>> This is in maintenance mode.
> >>>>>>
> >>>>>> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
> >>>>>>
> >>>>>
> >>>>> This is what I did:
> >>>>
> >>>> I probably won't have cycles to debug this potential lld issue.
> >>>> Maybe you already did, I suggest you put enough reproducible
> >>>> details in the bug you filed against lld so they can take a look.
> >>>>
> >>>
> >>> I understand and will put the journalctl-log into the CBL issue
> >>> tracker and update informations.
> >>>
> >>> Thanks for your help understanding the BPF correlations.
> >>>
> >>> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
> >>
> >> jit_enable=1 is enough.
> >> Or use CONFIG_BPF_JIT_ALWAYS_ON to workaround.
> >>
> >> It sounds like clang miscompiles interpreter.
> >> modprobe test_bpf
> >> should be able to point out which part of interpreter is broken.
> >
> > Maybe we need something like...
> >
> > "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
> >
> > ...for clang?
>
> Not sure how you reached the conclusion that gcse is causing the problem.
> But anyway, adding such flag in the kernel is not a good idea.
> clang/llvm should be fixed instead. Esp. there is still time
> for 9.0.0 release to fix bugs.
>

To clarify: This is a snapshot release of clang-9 built with tc-build.

Building with -O0 is not possible as I see asm-goto failing.

- Sedat -

[1] https://github.com/ClangBuiltLinux/tc-build

> >
> > - Sedat -
> >
> > [1] https://git.kernel.org/linus/3193c0836f203a91bef96d88c64cccf0be090d9c
> >

^ permalink raw reply

* RE: [PATCH] net/mlx5e: Fix zero table prio set by user.
From: Paul Blakey @ 2019-07-28 10:04 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, wenxu
  Cc: Or Gerlitz, Saeed Mahameed, Roi Dayan, Mark Bloch,
	pablo@netfilter.org, netdev@vger.kernel.org
In-Reply-To: <20190726140142.GC4063@localhost.localdomain>


On 7/26/2019 5:01 PM, Marcelo Ricardo Leitner wrote:
> On Fri, Jul 26, 2019 at 08:39:43PM +0800, wenxu wrote:
>>
>> On 2019/7/26 20:19, Or Gerlitz wrote:
>>> On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>>>> On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>
>>>>> The flow_cls_common_offload prio is zero
>>>>>
>>>>> It leads the invalid table prio in hw.
>>>>>
>>>>> Error: Could not process rule: Invalid argument
>>>>>
>>>>> kernel log:
>>>>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
>>>>> (table prio: 65535, level: 0, size: 4194304)
>>>>>
>>>>> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>>>> should check (chain * FDB_MAX_PRIO) + prio is not 0
>>>>>
>>>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>>>> ---
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git
>>>>> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> index 089ae4d..64ca90f 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
>>>> this piece of code isn't in this function, weird how it got to the
>>>> diff, patch applies correctly though !
>>>>
>>>>> mlx5_eswitch *esw)
>>>>>               flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
>>>>>                         MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>>>>>
>>>>> -     table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>>>> +     table_prio = (chain * FDB_MAX_PRIO) + prio;
>>>>> +     if (table_prio)
>>>>> +             table_prio = table_prio - 1;
>>>>>
>>>> This is black magic, even before this fix.
>>>> this -1 seems to be needed in order to call
>>>> create_next_size_table(table_prio) with the previous "table prio" ?
>>>> (table_prio - 1)  ?
>>>>
>>>> The whole thing looks wrong to me since when prio is 0 and chain is 0,
>>>> there is not such thing table_prio - 1.
>>>>
>>>> mlnx eswitch guys in the cc, please advise.
>>> basically, prio 0 is not something we ever get in the driver, since if
>>> user space
>>> specifies 0, the kernel generates some random non-zero prio, and we support
>>> only prios 1-16 -- Wenxu -- what do you run to get this error?
>>>
>>>
>> I run offload with nftables (not tc), so there is no prio for each rule.
>>
>> The prio of flow_cls_common_offload is initialized to 0.
>>
>> static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
>>
>>                      __be16 proto,
>>                     struct netlink_ext_ack *extack)
>> {
>>     common->protocol = proto;
>>     common->extack = extack;
>> }
>>
>>
>> flow_cls_common_offload
>
> Note that on
> [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
> I asked Pablo on how nftables should behave on this situation.
>
> It's the same issue as in the patch above but being fixed at a
> different level.

That's better. The original code relied on prio 0 never being valid, so
the suggested fix (net/mlx5e: Fix zero table prio set by user) maps NFT
offload prio 0 and tc prio 1 to the same hardware table. This is wrong
and can cause issues.
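
The collision described above can be sketched in user-space C. This is
an illustration only: FDB_MAX_PRIO = 16 is assumed from the "prios 1-16"
remark in the thread, the 16-bit result type is assumed from the
"table prio: 65535" kernel log, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 16 priorities per chain, matching "prios 1-16" in the thread. */
#define FDB_MAX_PRIO 16u

/* Original expression; the result is assumed to be stored in 16 bits. */
static uint16_t table_prio_orig(uint32_t chain, uint32_t prio)
{
	/* With chain == 0 and prio == 0 the unsigned subtraction wraps. */
	return (uint16_t)((chain * FDB_MAX_PRIO) + prio - 1);
}

/* Expression from the proposed patch: skip the "- 1" when the sum is 0. */
static uint16_t table_prio_patched(uint32_t chain, uint32_t prio)
{
	uint32_t table_prio = (chain * FDB_MAX_PRIO) + prio;

	if (table_prio)
		table_prio = table_prio - 1;
	return (uint16_t)table_prio;
}
```

In this model the original expression wraps to 65535 for chain 0,
prio 0, while the patched expression returns 0 for both prio 0 and
prio 1 on chain 0, so two different priorities land on one table.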

^ permalink raw reply

* Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Kamal Heib @ 2019-07-28  9:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michal Kalderon, ariel.elior, dledford, galpress, linux-rdma,
	davem, netdev
In-Reply-To: <20190725175540.GA18757@ziepe.ca>

On Thu, Jul 25, 2019 at 02:55:40PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 09, 2019 at 05:17:30PM +0300, Michal Kalderon wrote:
> > Create some common API's for adding entries to a xa_mmap.
> > Searching for an entry and freeing one.
> > 
> > The code was copied from the efa driver almost as is, just renamed
> > function to be generic and not efa specific.
> > 
> > Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> >  drivers/infiniband/core/device.c      |   1 +
> >  drivers/infiniband/core/rdma_core.c   |   1 +
> >  drivers/infiniband/core/uverbs_cmd.c  |   1 +
> >  drivers/infiniband/core/uverbs_main.c | 135 ++++++++++++++++++++++++++++++++++
> >  include/rdma/ib_verbs.h               |  46 ++++++++++++
> >  5 files changed, 184 insertions(+)
> > 
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index 8a6ccb936dfe..a830c2c5d691 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> >  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> >  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> >  	SET_DEVICE_OP(dev_ops, mmap);
> > +	SET_DEVICE_OP(dev_ops, mmap_free);
> >  	SET_DEVICE_OP(dev_ops, modify_ah);
> >  	SET_DEVICE_OP(dev_ops, modify_cq);
> >  	SET_DEVICE_OP(dev_ops, modify_device);
> > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..1ed01b02401f 100644
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -816,6 +816,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
> >  
> >  	rdma_restrack_del(&ucontext->res);
> >  
> > +	rdma_user_mmap_entries_remove_free(ucontext);
> >  	ib_dev->ops.dealloc_ucontext(ucontext);
> >  	kfree(ucontext);
> >  
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> > index 7ddd0e5bc6b3..44c0600245e4 100644
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
> >  
> >  	mutex_init(&ucontext->per_mm_list_lock);
> >  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> > +	xa_init(&ucontext->mmap_xa);
> >  
> >  	ret = get_unused_fd_flags(O_CLOEXEC);
> >  	if (ret < 0)
> > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> > index 11c13c1381cf..4b909d7b97de 100644
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -965,6 +965,141 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
> >  }
> >  EXPORT_SYMBOL(rdma_user_mmap_io);
> >  
> > +static inline u64
> > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> > +{
> > +	return (u64)entry->mmap_page << PAGE_SHIFT;
> > +}
> > +
> > +/**
> > + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @key: The key received from rdma_user_mmap_entry_insert which
> > + *     is provided by user as the address to map.
> > + * @len: The length the user wants to map
> > + *
> > + * This function is called when a user tries to mmap a key it
> > + * initially received from the driver. They key was created by
> > + * the function rdma_user_mmap_entry_insert.
> > + *
> > + * Return an entry if exists or NULL if there is no match.
> > + */
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> > +{
> > +	struct rdma_user_mmap_entry *entry;
> > +	u64 mmap_page;
> > +
> > +	mmap_page = key >> PAGE_SHIFT;
> > +	if (mmap_page > U32_MAX)
> > +		return NULL;
> > +
> > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > +	if (!entry || entry->length != len)
> > +		return NULL;
> > +
> > +	ibdev_dbg(ucontext->device,
> > +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> > +		  entry->obj, key, entry->address, entry->length);
> > +
> > +	return entry;
> > +}
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> 
> It is a mistake we keep making, and maybe the war is hopelessly lost
> now, but functions called from a driver should not be part of the
> ib_uverbs module - ideally uverbs is an optional module. They should
> be in ib_core.
> 
> Maybe put this in ib_core_uverbs.c ?
> 
> Kamal, you've been tackling various cleanups, maybe making ib_uverbs
> unloadable again is something you'd be keen on?
>

Yes, could you please give some background on that?


> > +/**
> > + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @obj: opaque driver object that will be stored in the entry.
> > + * @address: The address that will be mmapped to the user
> > + * @length: Length of the address that will be mmapped
> > + * @mmap_flag: opaque driver flags related to the address (For
> > + *           example could be used for cachability)
> > + *
> > + * This function should be called by drivers that use the rdma_user_mmap
> > + * interface for handling user mmapped addresses. The database is handled in
> > + * the core and helper functions are provided to insert entries into the
> > + * database and extract entries when the user call mmap with the given key.
> > + * The function returns a unique key that should be provided to user, the user
> > + * will use the key to map the given address.
> > + *
> > + * Note this locking scheme cannot support removal of entries,
> > + * except during ucontext destruction when the core code
> > + * guarentees no concurrency.
> > + *
> > + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
> > + */
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> > +				u64 address, u64 length, u8 mmap_flag)
> > +{
> > +	struct rdma_user_mmap_entry *entry;
> > +	u32 next_mmap_page;
> > +	int err;
> > +
> > +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> > +	if (!entry)
> > +		return RDMA_USER_MMAP_INVALID;
> > +
> > +	entry->obj = obj;
> > +	entry->address = address;
> > +	entry->length = length;
> > +	entry->mmap_flag = mmap_flag;
> > +
> > +	xa_lock(&ucontext->mmap_xa);
> > +	if (check_add_overflow(ucontext->mmap_xa_page,
> > +			       (u32)(length >> PAGE_SHIFT),
> 
> Should this be divide round up ?
> 
> > +			       &next_mmap_page))
> > +		goto err_unlock;
> 
> I still don't like that this algorithm latches into a permanent
> failure when the xa_page wraps.
> 
> It seems worth spending a bit more time here to tidy this.. Keep using
> the mmap_xa_page scheme, but instead do something like
> 
> alloc_cyclic_range():
> 
> while () {
>    // Find first empty element in a cyclic way
>    xa_page_first = mmap_xa_page;
>    xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK)
> 
>    // Is there a enough room to have the range?
>    if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
>       mmap_xa_page = 0;
>       continue;
>    }
> 
>    // See if the element before intersects 
>    elm = xa_find(xa, &zero, xa_page_end, 0);
>    if (elm && intersects(xa_page_first, xa_page_last, elm->first, elm->last)) {
>       mmap_xa_page = elm->last + 1;
>       continue
>    }
>   
>    // xa_page_first -> xa_page_end should now be free
>    xa_insert(xa, xa_page_start, entry);
>    mmap_xa_page = xa_page_end + 1;
>    return xa_page_start;
> }
> 
> Approximately, please check it.
> 
> > @@ -2199,6 +2201,17 @@ struct iw_cm_conn_param;
> >  
> >  #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
> >  
> > +#define RDMA_USER_MMAP_FLAG_SHIFT 56
> > +#define RDMA_USER_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
> > +#define RDMA_USER_MMAP_INVALID U64_MAX
> > +struct rdma_user_mmap_entry {
> > +	void *obj;
> > +	u64 address;
> > +	u64 length;
> > +	u32 mmap_page;
> > +	u8 mmap_flag;
> > +};
> > +
> >  /**
> >   * struct ib_device_ops - InfiniBand device operations
> >   * This structure defines all the InfiniBand device operations, providers will
> > @@ -2311,6 +2324,19 @@ struct ib_device_ops {
> >  			      struct ib_udata *udata);
> >  	void (*dealloc_ucontext)(struct ib_ucontext *context);
> >  	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
> > +	/**
> > +	 * Memory that is mapped to the user can only be freed once the
> > +	 * ucontext of the application is destroyed. This is for
> > +	 * security reasons where we don't want an application to have a
> > +	 * mapping to phyiscal memory that is freed and allocated to
> > +	 * another application. For this reason, all the entries are
> > +	 * stored in ucontext and once ucontext is freed mmap_free is
> > +	 * called on each of the entries. They type of the memory that
> 
> They -> the
> 
> > +	 * was mapped may differ between entries and is opaque to the
> > +	 * rdma_user_mmap interface. Therefore needs to be implemented
> > +	 * by the driver in mmap_free.
> > +	 */
> > +	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
> >  	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
> >  	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> >  	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> > @@ -2709,6 +2735,11 @@ void ib_set_device_ops(struct ib_device *device,
> >  #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
> >  int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
> >  		      unsigned long pfn, unsigned long size, pgprot_t prot);
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> > +				u64 address, u64 length, u8 mmap_flag);
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len);
> > +void rdma_user_mmap_entries_remove_free(struct ib_ucontext
> > *ucontext);
> 
> Should remove_free should be in the core-priv header?
> 
> Jason
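
The two review questions on the page arithmetic (round-up division and
the overflow check) can be sketched in user-space C. This is a sketch
under stated assumptions, not the kernel code: a 4 KiB page size is
assumed, the function names are hypothetical, and
__builtin_add_overflow() (a GCC/Clang builtin) stands in for the
kernel's check_add_overflow().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT is architecture specific. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Truncating conversion, as in the quoted patch: a partial page is dropped. */
static uint64_t npages_truncate(uint64_t length)
{
	return length >> SKETCH_PAGE_SHIFT;
}

/* Round-up conversion: a partial page still occupies a whole page. */
static uint64_t npages_round_up(uint64_t length)
{
	return (length >> SKETCH_PAGE_SHIFT) +
	       ((length & (SKETCH_PAGE_SIZE - 1)) != 0);
}

/*
 * Advance a 32-bit page cursor by npages. Returns false, leaving *next
 * untouched, when npages does not fit in 32 bits or the sum would wrap.
 */
static bool advance_page_cursor(uint32_t cur, uint64_t npages, uint32_t *next)
{
	uint32_t sum;

	if (npages > UINT32_MAX)
		return false;
	if (__builtin_add_overflow(cur, (uint32_t)npages, &sum))
		return false;
	*next = sum;
	return true;
}
```

A length of 4097 bytes yields one page when truncated and two pages
when rounded up. Once the cursor reaches U32_MAX every further advance
fails, which is the permanent failure the review objects to.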

^ permalink raw reply

* [PATCH net-next] r8169: make use of xmit_more
From: Heiner Kallweit @ 2019-07-28  9:25 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Eric Dumazet

There was a previous attempt to use xmit_more, but the change had to be
reverted because a transmit timeout sometimes occurred under load [0].
This may have been caused by a missing memory barrier, so the new
attempt keeps the memory barrier before the call to netif_stop_queue,
as the driver does today. The new attempt also changes the order of
some calls as suggested by Eric.

[0] https://lkml.org/lkml/2019/2/10/39

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 864ca529d..d9261e68f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5637,6 +5637,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	struct device *d = tp_to_dev(tp);
 	dma_addr_t mapping;
 	u32 opts[2], len;
+	bool stop_queue;
+	bool door_bell;
 	int frags;
 
 	if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -5680,13 +5682,13 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	txd->opts2 = cpu_to_le32(opts[1]);
 
-	netdev_sent_queue(dev, skb->len);
-
 	skb_tx_timestamp(skb);
 
 	/* Force memory writes to complete before releasing descriptor */
 	dma_wmb();
 
+	door_bell = __netdev_sent_queue(dev, skb->len, netdev_xmit_more());
+
 	txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
 
 	/* Force all memory writes to complete before notifying device */
@@ -5694,14 +5696,19 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	RTL_W8(tp, TxPoll, NPQ);
-
-	if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+	stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS);
+	if (unlikely(stop_queue)) {
 		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
 		 * not miss a ring update when it notices a stopped queue.
 		 */
 		smp_wmb();
 		netif_stop_queue(dev);
+	}
+
+	if (door_bell)
+		RTL_W8(tp, TxPoll, NPQ);
+
+	if (unlikely(stop_queue)) {
 		/* Sync with rtl_tx:
 		 * - publish queue status and cur_tx ring index (write barrier)
 		 * - refresh dirty_tx ring index (read barrier).
-- 
2.22.0
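
The effect of xmit_more on doorbell writes can be sketched with a small
user-space model. This is not the kernel helper: it assumes that a
doorbell is requested whenever no further packet is announced or the
queue is already stopped, and both function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Simplified model of the doorbell decision (an assumption, not the
 * kernel helper itself): the doorbell may be skipped only when more
 * packets are about to follow and the queue is still running.
 */
static bool need_doorbell(bool xmit_more, bool queue_stopped)
{
	return !xmit_more || queue_stopped;
}

/*
 * Count doorbell writes for a burst of n packets where every packet
 * except the last is sent with xmit_more set and the queue never stops.
 */
static unsigned int doorbells_for_burst(unsigned int n)
{
	unsigned int i, rings = 0;

	for (i = 0; i < n; i++) {
		bool xmit_more = (i + 1 < n);

		if (need_doorbell(xmit_more, false))
			rings++;
	}
	return rings;
}
```

Under these assumptions a burst of eight packets costs one doorbell
write instead of eight, which is the saving the patch is after.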


^ permalink raw reply related

* Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Gal Pressman @ 2019-07-28  8:45 UTC (permalink / raw)
  To: Jason Gunthorpe, Michal Kalderon
  Cc: Kamal Heib, Ariel Elior, dledford@redhat.com,
	linux-rdma@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <20190726132316.GA8695@ziepe.ca>

On 26/07/2019 16:23, Jason Gunthorpe wrote:
> On Fri, Jul 26, 2019 at 08:42:07AM +0000, Michal Kalderon wrote:
> 
>>>> But we don't free entries from the xa_array (only when ucontext is
>>>> destroyed), so how will there be an empty element after we wrap?
>>>
>>> Oh!
>>>
>>> That should be fixed up too, in the general case if a user is
>>> creating/destroying driver objects in loop we don't want memory usage to
>>> be unbounded.
>>>
>>> The rdma_user_mmap stuff has VMA ops that can refcount the xa entry and
>>> now that this is core code it is easy enough to harmonize the two things and
>>> track the xa side from the struct rdma_umap_priv
>>>
>>> The question is, does EFA or qedr have a use model for this that allows a
>>> userspace verb to create/destroy in a loop? ie do we need to fix this right
>>> now?
> 
>> The mapping occurs for every qp and cq creation. So yes.
>>
>> So do you mean add a ref-cnt to the xarray entry and from umap
>> decrease the refcnt and free?
> 
> Yes, free the entry (release the HW resource) and release the xa_array
> ID.

This is a bit tricky for EFA.
The UAR BAR resources (LLQ for example) aren't cleaned up until the UAR is
deallocated, so many of the entries won't really be freed when the refcount
reaches zero (i.e. the HW considers these entries as refcounted as long as the
UAR exists). The best we can do is free the DMA buffers for appropriate entries.
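
The constraint described above can be sketched as a refcounted entry
whose release step depends on what backs it. All names here are
hypothetical, and the plain counter is a simplification (kernel code
would use an atomic type such as refcount_t or kref).

```c
#include <stdbool.h>

/* Hypothetical classification of what an mmap entry points at. */
enum entry_kind {
	ENTRY_DMA_BUFFER,	/* memory the driver can free early */
	ENTRY_BAR_RESOURCE,	/* held by the HW until the UAR is deallocated */
};

/* Hypothetical stand-in for an mmap entry; not the kernel structure. */
struct mmap_entry_sketch {
	unsigned int refcount;	/* not atomic: single-threaded sketch only */
	enum entry_kind kind;
	bool backing_released;
};

static void entry_init(struct mmap_entry_sketch *e, enum entry_kind kind)
{
	e->refcount = 1;
	e->kind = kind;
	e->backing_released = false;
}

static void entry_get(struct mmap_entry_sketch *e)
{
	e->refcount++;
}

/* Drop one reference. Returns true when the last reference is gone. */
static bool entry_put(struct mmap_entry_sketch *e)
{
	if (--e->refcount)
		return false;
	/*
	 * Only DMA buffers can be given back at this point. A BAR resource
	 * stays allocated until the UAR itself is deallocated.
	 */
	if (e->kind == ENTRY_DMA_BUFFER)
		e->backing_released = true;
	return true;
}
```

In this model the final entry_put() releases the backing memory of a
DMA entry, while a BAR entry reaches zero references without its
hardware resource being returned.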

^ permalink raw reply

* Re: INFO: rcu detected stall in vhost_worker
From: Michael S. Tsirkin @ 2019-07-28  8:36 UTC (permalink / raw)
  To: Hillf Danton
  Cc: syzbot, jasowang, kvm, linux-kbuild, linux-kernel, michal.lkml,
	netdev, syzkaller-bugs, torvalds, virtualization, yamada.masahiro
In-Reply-To: <000000000000e87d14058e9728d7@google.com>

On Sat, Jul 27, 2019 at 04:23:23PM +0800, Hillf Danton wrote:
> 
> Fri, 26 Jul 2019 08:26:01 -0700 (PDT)
> > syzbot has bisected this bug to:
> > 
> > commit 0ecfebd2b52404ae0c54a878c872bb93363ada36
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Sun Jul 7 22:41:56 2019 +0000
> > 
> >      Linux 5.2
> > 
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=118810bfa00000
> > start commit:   13bf6d6a Add linux-next specific files for 20190725
> > git tree:       linux-next
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=8ae987d803395886
> > dashboard link: https://syzkaller.appspot.com/bug?extid=36e93b425cd6eb54fcc1
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15112f3fa00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=131ab578600000
> > 
> > Reported-by: syzbot+36e93b425cd6eb54fcc1@syzkaller.appspotmail.com
> > Fixes: 0ecfebd2b524 ("Linux 5.2")
> > 
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -787,7 +787,6 @@ static void vhost_setup_uaddr(struct vho
> 			      size_t size, bool write)
> {
> 	struct vhost_uaddr *addr = &vq->uaddrs[index];
> -	spin_lock(&vq->mmu_lock);
> 
> 	addr->uaddr = uaddr;
> 	addr->size = size;
> @@ -797,7 +796,10 @@ static void vhost_setup_uaddr(struct vho
> static void vhost_setup_vq_uaddr(struct vhost_virtqueue *vq)
> {
> 	spin_lock(&vq->mmu_lock);
> -
> +	/*
> +	 * deadlock if managing to take mmu_lock again while
> +	 * setting up uaddr
> +	 */
> 	vhost_setup_uaddr(vq, VHOST_ADDR_DESC,
> 			  (unsigned long)vq->desc,
> 			  vhost_get_desc_size(vq, vq->num),
> --

Thanks!
I reverted this whole commit.

-- 
MST

^ permalink raw reply

* Re: [PATCH] rocker: fix memory leaks of fib_work on two error return paths
From: Jiri Pirko @ 2019-07-28  7:46 UTC (permalink / raw)
  To: Colin King
  Cc: David Ahern, David S . Miller, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20190727233726.3121-1-colin.king@canonical.com>

Sun, Jul 28, 2019 at 01:37:26AM CEST, colin.king@canonical.com wrote:
>From: Colin Ian King <colin.king@canonical.com>
>
>Currently there are two error return paths that leak memory allocated
>to fib_work. Fix this by kfree'ing fib_work before returning.
>
>Addresses-Coverity: ("Resource leak")
>Fixes: 19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
>Fixes: dbcc4fa718ee ("rocker: Fail attempts to use routes with nexthop objects")
>Signed-off-by: Colin Ian King <colin.king@canonical.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>
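
The leak pattern and its fix can be sketched in user-space C. This is
an illustration, not the rocker code: the structure, the function name,
the allocation counter and the -EINVAL return value are all
assumptions made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Tracks outstanding allocations so the sketch can be checked. */
static int live_allocations;

/* Hypothetical stand-in for the work item; not the rocker structure. */
struct fib_work_sketch {
	int event;
};

/*
 * Allocate a work item and hand it to the caller through *out.
 * On the error path the allocation is freed before returning, which is
 * the step the patch adds. *out is only written on success.
 */
static int prepare_fib_work(bool route_unsupported,
			    struct fib_work_sketch **out)
{
	struct fib_work_sketch *work = malloc(sizeof(*work));

	if (!work)
		return -ENOMEM;
	live_allocations++;

	if (route_unsupported) {
		/* Error path: release the allocation before returning. */
		free(work);
		live_allocations--;
		return -EINVAL;
	}

	work->event = 0;
	*out = work;
	return 0;
}
```

Without the free() on the error path, every rejected route would leave
one allocation behind, which is what the Coverity report flags.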

^ permalink raw reply

* [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu @ 2019-07-28  6:52 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
In-Reply-To: <1564296769-32294-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

Move the tc indirect block to flow_offload and rename
it to flow indirect block. nf_tables can use the
indr block architecture.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v3: subsys_initcall for init_flow_indr_rhashtable
v4: no change

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    |  10 +-
 include/net/flow_offload.h                         |  39 ++++
 include/net/pkt_cls.h                              |  35 ---
 include/net/sch_generic.h                          |   3 -
 net/core/flow_offload.c                            | 179 ++++++++++++++++
 net/sched/cls_api.c                                | 235 ++-------------------
 7 files changed, 247 insertions(+), 264 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..074573b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 {
 	int err;
 
-	err = __tc_indr_block_cb_register(netdev, rpriv,
-					  mlx5e_rep_indr_setup_tc_cb,
-					  rpriv);
+	err = __flow_indr_block_cb_register(netdev, rpriv,
+					    mlx5e_rep_indr_setup_tc_cb,
+					    rpriv);
 	if (err) {
 		struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
 
@@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
 					    struct net_device *netdev)
 {
-	__tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
-				      rpriv);
+	__flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
+					rpriv);
 }
 
 static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f15..6a0f034 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1479,16 +1479,16 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
 		return NOTIFY_OK;
 
 	if (event == NETDEV_REGISTER) {
-		err = __tc_indr_block_cb_register(netdev, app,
-						  nfp_flower_indr_setup_tc_cb,
-						  app);
+		err = __flow_indr_block_cb_register(netdev, app,
+						    nfp_flower_indr_setup_tc_cb,
+						    app);
 		if (err)
 			nfp_flower_cmsg_warn(app,
 					     "Indirect block reg failed - %s\n",
 					     netdev->name);
 	} else if (event == NETDEV_UNREGISTER) {
-		__tc_indr_block_cb_unregister(netdev,
-					      nfp_flower_indr_setup_tc_cb, app);
+		__flow_indr_block_cb_unregister(netdev,
+						nfp_flower_indr_setup_tc_cb, app);
 	}
 
 	return NOTIFY_OK;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00b9aab..66f89bc 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <net/flow_dissector.h>
+#include <linux/rhashtable.h>
 
 struct flow_match {
 	struct flow_dissector	*dissector;
@@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
 	INIT_LIST_HEAD(&flow_block->cb_list);
 }
 
+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+				      enum tc_setup_type type, void *type_data);
+
+struct flow_indr_block_cb {
+	struct list_head list;
+	void *cb_priv;
+	flow_indr_block_bind_cb_t *cb;
+	void *cb_ident;
+};
+
+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
+				       struct flow_block *flow_block,
+				       struct flow_indr_block_cb *indr_block_cb,
+				       enum flow_block_command command);
+
+struct flow_indr_block_dev {
+	struct rhash_head ht_node;
+	struct net_device *dev;
+	unsigned int refcnt;
+	struct list_head cb_list;
+	flow_indr_block_ing_cmd_t *ing_cmd_cb;
+	struct flow_block *flow_block;
+};
+
+struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				  flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+void __flow_indr_block_cb_unregister(struct net_device *dev,
+				     flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+void flow_indr_block_cb_unregister(struct net_device *dev,
+				   flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
 #endif /* _NET_FLOW_OFFLOAD_H */
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e429809..0790a4e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -70,15 +70,6 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 	return block->q;
 }
 
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-				tc_indr_block_bind_cb_t *cb, void *cb_ident);
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-			      tc_indr_block_bind_cb_t *cb, void *cb_ident);
-void __tc_indr_block_cb_unregister(struct net_device *dev,
-				   tc_indr_block_bind_cb_t *cb, void *cb_ident);
-void tc_indr_block_cb_unregister(struct net_device *dev,
-				 tc_indr_block_bind_cb_t *cb, void *cb_ident);
-
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
 
@@ -137,32 +128,6 @@ void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
 {
 }
 
-static inline
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-				tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	return 0;
-}
-
-static inline
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-			      tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	return 0;
-}
-
-static inline
-void __tc_indr_block_cb_unregister(struct net_device *dev,
-				   tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-}
-
-static inline
-void tc_indr_block_cb_unregister(struct net_device *dev,
-				 tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-}
-
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			       struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6b6b012..d9f359a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -23,9 +23,6 @@
 struct module;
 struct bpf_flow_keys;
 
-typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
-				    enum tc_setup_type type, void *type_data);
-
 struct qdisc_rate_table {
 	struct tc_ratespec rate;
 	u32		data[256];
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index d63b970..9f1ae67 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -2,6 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <net/flow_offload.h>
+#include <linux/rtnetlink.h>
 
 struct flow_rule *flow_rule_alloc(unsigned int num_actions)
 {
@@ -280,3 +281,181 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 	}
 }
 EXPORT_SYMBOL(flow_block_cb_setup_simple);
+
+static struct rhashtable indr_setup_block_ht;
+
+static const struct rhashtable_params flow_indr_setup_block_ht_params = {
+	.key_offset	= offsetof(struct flow_indr_block_dev, dev),
+	.head_offset	= offsetof(struct flow_indr_block_dev, ht_node),
+	.key_len	= sizeof(struct net_device *),
+};
+
+struct flow_indr_block_dev *
+flow_indr_block_dev_lookup(struct net_device *dev)
+{
+	return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
+				      flow_indr_setup_block_ht_params);
+}
+EXPORT_SYMBOL(flow_indr_block_dev_lookup);
+
+static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *dev)
+{
+	struct flow_indr_block_dev *indr_dev;
+
+	indr_dev = flow_indr_block_dev_lookup(dev);
+	if (indr_dev)
+		goto inc_ref;
+
+	indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
+	if (!indr_dev)
+		return NULL;
+
+	INIT_LIST_HEAD(&indr_dev->cb_list);
+	indr_dev->dev = dev;
+	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+				   flow_indr_setup_block_ht_params)) {
+		kfree(indr_dev);
+		return NULL;
+	}
+
+inc_ref:
+	indr_dev->refcnt++;
+	return indr_dev;
+}
+
+static void flow_indr_block_dev_put(struct flow_indr_block_dev *indr_dev)
+{
+	if (--indr_dev->refcnt)
+		return;
+
+	rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+			       flow_indr_setup_block_ht_params);
+	kfree(indr_dev);
+}
+
+static struct flow_indr_block_cb *
+flow_indr_block_cb_lookup(struct flow_indr_block_dev *indr_dev,
+			  flow_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		if (indr_block_cb->cb == cb &&
+		    indr_block_cb->cb_ident == cb_ident)
+			return indr_block_cb;
+	return NULL;
+}
+
+static struct flow_indr_block_cb *
+flow_indr_block_cb_add(struct flow_indr_block_dev *indr_dev, void *cb_priv,
+		       flow_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+
+	indr_block_cb = flow_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	if (indr_block_cb)
+		return ERR_PTR(-EEXIST);
+
+	indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
+	if (!indr_block_cb)
+		return ERR_PTR(-ENOMEM);
+
+	indr_block_cb->cb_priv = cb_priv;
+	indr_block_cb->cb = cb;
+	indr_block_cb->cb_ident = cb_ident;
+	list_add(&indr_block_cb->list, &indr_dev->cb_list);
+
+	return indr_block_cb;
+}
+
+static void flow_indr_block_cb_del(struct flow_indr_block_cb *indr_block_cb)
+{
+	list_del(&indr_block_cb->list);
+	kfree(indr_block_cb);
+}
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				  flow_indr_block_bind_cb_t *cb,
+				  void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
+	int err;
+
+	indr_dev = flow_indr_block_dev_get(dev);
+	if (!indr_dev)
+		return -ENOMEM;
+
+	indr_block_cb = flow_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
+	err = PTR_ERR_OR_ZERO(indr_block_cb);
+	if (err)
+		goto err_dev_put;
+
+	if (indr_dev->ing_cmd_cb)
+		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block, indr_block_cb,
+				     FLOW_BLOCK_BIND);
+
+	return 0;
+
+err_dev_put:
+	flow_indr_block_dev_put(indr_dev);
+	return err;
+}
+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_register);
+
+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				flow_indr_block_bind_cb_t *cb,
+				void *cb_ident)
+{
+	int err;
+
+	rtnl_lock();
+	err = __flow_indr_block_cb_register(dev, cb_priv, cb, cb_ident);
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_cb_register);
+
+void __flow_indr_block_cb_unregister(struct net_device *dev,
+				     flow_indr_block_bind_cb_t *cb,
+				     void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
+
+	indr_dev = flow_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return;
+
+	indr_block_cb = flow_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	if (!indr_block_cb)
+		return;
+
+	/* Send unbind message if required to free any block cbs. */
+	if (indr_dev->ing_cmd_cb)
+		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
+				     indr_block_cb,
+				     FLOW_BLOCK_UNBIND);
+
+	flow_indr_block_cb_del(indr_block_cb);
+	flow_indr_block_dev_put(indr_dev);
+}
+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_unregister);
+
+void flow_indr_block_cb_unregister(struct net_device *dev,
+				   flow_indr_block_bind_cb_t *cb,
+				   void *cb_ident)
+{
+	rtnl_lock();
+	__flow_indr_block_cb_unregister(dev, cb, cb_ident);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_cb_unregister);
+
+static int __init init_flow_indr_rhashtable(void)
+{
+	return rhashtable_init(&indr_setup_block_ht,
+			       &flow_indr_setup_block_ht_params);
+}
+subsys_initcall(init_flow_indr_rhashtable);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3565d9a..d551c56 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -37,6 +37,7 @@
 #include <net/tc_act/tc_skbedit.h>
 #include <net/tc_act/tc_ct.h>
 #include <net/tc_act/tc_mpls.h>
+#include <net/flow_offload.h>
 
 extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
 
@@ -545,235 +546,43 @@ static void tcf_chain_flush(struct tcf_chain *chain, bool rtnl_held)
 	}
 }
 
-static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
-{
-	const struct Qdisc_class_ops *cops;
-	struct Qdisc *qdisc;
-
-	if (!dev_ingress_queue(dev))
-		return NULL;
-
-	qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
-	if (!qdisc)
-		return NULL;
-
-	cops = qdisc->ops->cl_ops;
-	if (!cops)
-		return NULL;
-
-	if (!cops->tcf_block)
-		return NULL;
-
-	return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
-}
-
-static struct rhashtable indr_setup_block_ht;
-
-struct tc_indr_block_dev {
-	struct rhash_head ht_node;
-	struct net_device *dev;
-	unsigned int refcnt;
-	struct list_head cb_list;
-	struct tcf_block *block;
-};
-
-struct tc_indr_block_cb {
-	struct list_head list;
-	void *cb_priv;
-	tc_indr_block_bind_cb_t *cb;
-	void *cb_ident;
-};
-
-static const struct rhashtable_params tc_indr_setup_block_ht_params = {
-	.key_offset	= offsetof(struct tc_indr_block_dev, dev),
-	.head_offset	= offsetof(struct tc_indr_block_dev, ht_node),
-	.key_len	= sizeof(struct net_device *),
-};
-
-static struct tc_indr_block_dev *
-tc_indr_block_dev_lookup(struct net_device *dev)
-{
-	return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
-				      tc_indr_setup_block_ht_params);
-}
-
-static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
-{
-	struct tc_indr_block_dev *indr_dev;
-
-	indr_dev = tc_indr_block_dev_lookup(dev);
-	if (indr_dev)
-		goto inc_ref;
-
-	indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
-	if (!indr_dev)
-		return NULL;
-
-	INIT_LIST_HEAD(&indr_dev->cb_list);
-	indr_dev->dev = dev;
-	indr_dev->block = tc_dev_ingress_block(dev);
-	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
-				   tc_indr_setup_block_ht_params)) {
-		kfree(indr_dev);
-		return NULL;
-	}
-
-inc_ref:
-	indr_dev->refcnt++;
-	return indr_dev;
-}
-
-static void tc_indr_block_dev_put(struct tc_indr_block_dev *indr_dev)
-{
-	if (--indr_dev->refcnt)
-		return;
-
-	rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
-			       tc_indr_setup_block_ht_params);
-	kfree(indr_dev);
-}
-
-static struct tc_indr_block_cb *
-tc_indr_block_cb_lookup(struct tc_indr_block_dev *indr_dev,
-			tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-
-	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
-		if (indr_block_cb->cb == cb &&
-		    indr_block_cb->cb_ident == cb_ident)
-			return indr_block_cb;
-	return NULL;
-}
-
-static struct tc_indr_block_cb *
-tc_indr_block_cb_add(struct tc_indr_block_dev *indr_dev, void *cb_priv,
-		     tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-
-	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
-	if (indr_block_cb)
-		return ERR_PTR(-EEXIST);
-
-	indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
-	if (!indr_block_cb)
-		return ERR_PTR(-ENOMEM);
-
-	indr_block_cb->cb_priv = cb_priv;
-	indr_block_cb->cb = cb;
-	indr_block_cb->cb_ident = cb_ident;
-	list_add(&indr_block_cb->list, &indr_dev->cb_list);
-
-	return indr_block_cb;
-}
-
-static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb)
-{
-	list_del(&indr_block_cb->list);
-	kfree(indr_block_cb);
-}
-
 static int tcf_block_setup(struct tcf_block *block,
 			   struct flow_block_offload *bo);
 
-static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
-				  struct tc_indr_block_cb *indr_block_cb,
+static void tc_indr_block_ing_cmd(struct net_device *dev,
+				  struct flow_block *flow_block,
+				  struct flow_indr_block_cb *indr_block_cb,
 				  enum flow_block_command command)
 {
+	struct tcf_block *block = flow_block ?
+				  container_of(flow_block,
+					       struct tcf_block,
+					       flow_block) : NULL;
 	struct flow_block_offload bo = {
 		.command	= command,
 		.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
-		.net		= dev_net(indr_dev->dev),
-		.block_shared	= tcf_block_non_null_shared(indr_dev->block),
+		.net		= dev_net(dev),
+		.block_shared	= tcf_block_non_null_shared(block),
 	};
 	INIT_LIST_HEAD(&bo.cb_list);
 
-	if (!indr_dev->block)
-		return;
-
-	bo.block = &indr_dev->block->flow_block;
-
-	indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
-			  &bo);
-	tcf_block_setup(indr_dev->block, &bo);
-}
-
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-				tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
-	int err;
-
-	indr_dev = tc_indr_block_dev_get(dev);
-	if (!indr_dev)
-		return -ENOMEM;
-
-	indr_block_cb = tc_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
-	err = PTR_ERR_OR_ZERO(indr_block_cb);
-	if (err)
-		goto err_dev_put;
-
-	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND);
-	return 0;
-
-err_dev_put:
-	tc_indr_block_dev_put(indr_dev);
-	return err;
-}
-EXPORT_SYMBOL_GPL(__tc_indr_block_cb_register);
-
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-			      tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	int err;
-
-	rtnl_lock();
-	err = __tc_indr_block_cb_register(dev, cb_priv, cb, cb_ident);
-	rtnl_unlock();
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(tc_indr_block_cb_register);
-
-void __tc_indr_block_cb_unregister(struct net_device *dev,
-				   tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
-
-	indr_dev = tc_indr_block_dev_lookup(dev);
-	if (!indr_dev)
+	if (!block)
 		return;
 
-	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
-	if (!indr_block_cb)
-		return;
+	bo.block = flow_block;
 
-	/* Send unbind message if required to free any block cbs. */
-	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND);
-	tc_indr_block_cb_del(indr_block_cb);
-	tc_indr_block_dev_put(indr_dev);
-}
-EXPORT_SYMBOL_GPL(__tc_indr_block_cb_unregister);
+	indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
 
-void tc_indr_block_cb_unregister(struct net_device *dev,
-				 tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	rtnl_lock();
-	__tc_indr_block_cb_unregister(dev, cb, cb_ident);
-	rtnl_unlock();
+	tcf_block_setup(block, &bo);
 }
-EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister);
 
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 			       struct tcf_block_ext_info *ei,
 			       enum flow_block_command command,
 			       struct netlink_ext_ack *extack)
 {
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
 	struct flow_block_offload bo = {
 		.command	= command,
 		.binder_type	= ei->binder_type,
@@ -784,11 +593,12 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 	};
 	INIT_LIST_HEAD(&bo.cb_list);
 
-	indr_dev = tc_indr_block_dev_lookup(dev);
+	indr_dev = flow_indr_block_dev_lookup(dev);
 	if (!indr_dev)
 		return;
 
-	indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL;
+	indr_dev->flow_block = command == FLOW_BLOCK_BIND ? &block->flow_block : NULL;
+	indr_dev->ing_cmd_cb = command == FLOW_BLOCK_BIND ? tc_indr_block_ing_cmd : NULL;
 
 	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
 		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
@@ -3358,11 +3168,6 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
-	err = rhashtable_init(&indr_setup_block_ht,
-			      &tc_indr_setup_block_ht_params);
-	if (err)
-		goto err_rhash_setup_block_ht;
-
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
 		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
@@ -3376,8 +3181,6 @@ static int __init tc_filter_init(void)
 
 	return 0;
 
-err_rhash_setup_block_ht:
-	unregister_pernet_subsys(&tcf_net_ops);
 err_register_pernet_subsys:
 	destroy_workqueue(tc_filter_wq);
 	return err;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next v4 3/3] netfilter: nf_tables_offload: support indr block call
From: wenxu @ 2019-07-28  6:52 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
In-Reply-To: <1564296769-32294-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

Make nftables support the indr-block call, so that nftables can
offload rules on vlan and tunnel devices. For example:

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v3: subsys_initcall for init_flow_indr_rhashtable
v4: guarantee that only one offload base chain is used per indr dev.
    If the indr_block_cmd bind fails, return unsupported.

 net/netfilter/nf_tables_offload.c | 131 +++++++++++++++++++++++++++++++-------
 1 file changed, 107 insertions(+), 24 deletions(-)

diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5..19214ad 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -171,24 +171,123 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
 	return 0;
 }
 
+static int nft_block_setup(struct nft_base_chain *basechain,
+			   struct flow_block_offload *bo,
+			   enum flow_block_command cmd)
+{
+	int err;
+
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		err = nft_flow_offload_bind(bo, basechain);
+		break;
+	case FLOW_BLOCK_UNBIND:
+		err = nft_flow_offload_unbind(bo, basechain);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		err = -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+				 struct net_device *dev,
+				 enum flow_block_command cmd)
+{
+	struct netlink_ext_ack extack = {};
+	struct flow_block_offload bo = {};
+	int err;
+
+	bo.net = dev_net(dev);
+	bo.block = &chain->flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+	if (err < 0)
+		return err;
+
+	return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev,
+				   struct flow_block *flow_block,
+				   struct flow_indr_block_cb *indr_block_cb,
+				   enum flow_block_command cmd)
+{
+	struct netlink_ext_ack extack = {};
+	struct flow_block_offload bo = {};
+	struct nft_base_chain *chain;
+
+	if (!flow_block)
+		return;
+
+	chain = container_of(flow_block, struct nft_base_chain, flow_block);
+
+	bo.net = dev_net(dev);
+	bo.block = flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
+
+	nft_block_setup(chain, &bo, cmd);
+}
+
+static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
+				      struct net_device *dev,
+				      enum flow_block_command cmd)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
+	struct flow_block_offload bo = {};
+	struct netlink_ext_ack extack = {};
+
+	bo.net = dev_net(dev);
+	bo.block = &chain->flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	indr_dev = flow_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return -EOPNOTSUPP;
+
+	indr_dev->flow_block = cmd == FLOW_BLOCK_BIND ? &chain->flow_block : NULL;
+	indr_dev->ing_cmd_cb = cmd == FLOW_BLOCK_BIND ? nft_indr_block_ing_cmd : NULL;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+				  &bo);
+
+	if (list_empty(&bo.cb_list))
+		return -EOPNOTSUPP;
+
+	return nft_block_setup(chain, &bo, cmd);
+}
+
 #define FLOW_SETUP_BLOCK TC_SETUP_BLOCK
 
 static int nft_flow_offload_chain(struct nft_trans *trans,
 				  enum flow_block_command cmd)
 {
 	struct nft_chain *chain = trans->ctx.chain;
-	struct netlink_ext_ack extack = {};
-	struct flow_block_offload bo = {};
 	struct nft_base_chain *basechain;
 	struct net_device *dev;
-	int err;
 
 	if (!nft_is_base_chain(chain))
 		return -EOPNOTSUPP;
 
 	basechain = nft_base_chain(chain);
 	dev = basechain->ops.dev;
-	if (!dev || !dev->netdev_ops->ndo_setup_tc)
+	if (!dev)
 		return -EOPNOTSUPP;
 
 	/* Only default policy to accept is supported for now. */
@@ -197,26 +296,10 @@ static int nft_flow_offload_chain(struct nft_trans *trans,
 	    nft_trans_chain_policy(trans) != NF_ACCEPT)
 		return -EOPNOTSUPP;
 
-	bo.command = cmd;
-	bo.block = &basechain->flow_block;
-	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
-	bo.extack = &extack;
-	INIT_LIST_HEAD(&bo.cb_list);
-
-	err = dev->netdev_ops->ndo_setup_tc(dev, FLOW_SETUP_BLOCK, &bo);
-	if (err < 0)
-		return err;
-
-	switch (cmd) {
-	case FLOW_BLOCK_BIND:
-		err = nft_flow_offload_bind(&bo, basechain);
-		break;
-	case FLOW_BLOCK_UNBIND:
-		err = nft_flow_offload_unbind(&bo, basechain);
-		break;
-	}
-
-	return err;
+	if (dev->netdev_ops->ndo_setup_tc)
+		return nft_block_offload_cmd(basechain, dev, cmd);
+	else
+		return nft_indr_block_offload_cmd(basechain, dev, cmd);
 }
 
 int nft_flow_rule_offload_commit(struct net *net)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
From: wenxu @ 2019-07-28  6:52 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
In-Reply-To: <1564296769-32294-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

When an indr device registers, it can get the default block
from tc immediately if the block exists.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v3: no change
v4: get tc default block without callback

 include/net/pkt_cls.h   |  7 +++++++
 net/core/flow_offload.c |  2 ++
 net/sched/cls_api.c     | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0790a4e..77c3a42 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -54,6 +54,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei);
 
+void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev);
+
 static inline bool tcf_block_shared(struct tcf_block *block)
 {
 	return block->index;
@@ -74,6 +76,11 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
 
 #else
+static inline
+void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+}
+
 static inline bool tcf_block_shared(struct tcf_block *block)
 {
 	return false;
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 9f1ae67..0ca3d51 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -3,6 +3,7 @@
 #include <linux/slab.h>
 #include <net/flow_offload.h>
 #include <linux/rtnetlink.h>
+#include <net/pkt_cls.h>
 
 struct flow_rule *flow_rule_alloc(unsigned int num_actions)
 {
@@ -312,6 +313,7 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de
 
 	INIT_LIST_HEAD(&indr_dev->cb_list);
 	indr_dev->dev = dev;
+	tc_indr_get_default_block(indr_dev);
 	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
 				   flow_indr_setup_block_ht_params)) {
 		kfree(indr_dev);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d551c56..59e9572 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -576,6 +576,39 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
 	tcf_block_setup(block, &bo);
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+	const struct Qdisc_class_ops *cops;
+	struct Qdisc *qdisc;
+
+	if (!dev_ingress_queue(dev))
+		return NULL;
+
+	qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+	if (!qdisc)
+		return NULL;
+
+	cops = qdisc->ops->cl_ops;
+	if (!cops)
+		return NULL;
+
+	if (!cops->tcf_block)
+		return NULL;
+
+	return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+	struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev);
+
+	if (block) {
+		indr_dev->flow_block = &block->flow_block;
+		indr_dev->ing_cmd_cb = tc_indr_block_ing_cmd;
+	}
+}
+EXPORT_SYMBOL(tc_indr_get_default_block);
+
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 			       struct tcf_block_ext_info *ei,
 			       enum flow_block_command command,
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next v4 0/3] flow_offload: add indr-block in nf_table_offload
From: wenxu @ 2019-07-28  6:52 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev

From: wenxu <wenxu@ucloud.cn>

This patch series makes nftables offload support vlan and
tunnel device offload through the indr-block architecture.

The first patch moves the tc indr block to flow offload and renames
it to flow-indr-block.
Because the new flow-indr-block can't get the tcf_block
directly, the second patch provides a callback to get the tcf_block
immediately when the device registers and contains an ingress block.
The third patch makes nf_tables_offload support flow-indr-block.
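
The registration scheme that the first patch moves into flow offload can
be sketched in userspace roughly as follows. This is only an illustration
under stated assumptions, not the kernel code: every name here
(indr_dev, indr_cb, indr_cb_register and so on) is hypothetical, a plain
linked list stands in for the rhashtable keyed by the net_device pointer,
and there is no rtnl locking. It shows the refcounted per-device entry and
the (cb, cb_ident) duplicate check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for flow_indr_block_bind_cb_t. */
typedef int indr_bind_cb_t(void *dev, void *cb_priv, int cmd);

struct indr_cb {
	struct indr_cb *next;
	indr_bind_cb_t *cb;
	void *cb_priv;
	void *cb_ident;
};

struct indr_dev {
	struct indr_dev *next;
	void *dev;		/* lookup key */
	unsigned int refcnt;	/* one reference per registered callback */
	struct indr_cb *cbs;
};

static struct indr_dev *indr_devs;

static struct indr_dev *indr_dev_lookup(void *dev)
{
	struct indr_dev *d;

	for (d = indr_devs; d; d = d->next)
		if (d->dev == dev)
			return d;
	return NULL;
}

/* Find or create the per-device entry and take a reference on it. */
static struct indr_dev *indr_dev_get(void *dev)
{
	struct indr_dev *d = indr_dev_lookup(dev);

	if (!d) {
		d = calloc(1, sizeof(*d));
		if (!d)
			return NULL;
		d->dev = dev;
		d->next = indr_devs;
		indr_devs = d;
	}
	d->refcnt++;
	return d;
}

/* Drop a reference; unlink and free the entry when the last one goes. */
static void indr_dev_put(struct indr_dev *d)
{
	struct indr_dev **pp;

	assert(d->refcnt > 0);
	if (--d->refcnt)
		return;
	for (pp = &indr_devs; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			break;
		}
	}
	free(d);
}

/* Returns 0 on success, -1 on a duplicate (cb, cb_ident) pair or OOM. */
int indr_cb_register(void *dev, void *cb_priv, indr_bind_cb_t *cb,
		     void *cb_ident)
{
	struct indr_dev *d = indr_dev_get(dev);
	struct indr_cb *c;

	if (!d)
		return -1;
	for (c = d->cbs; c; c = c->next)
		if (c->cb == cb && c->cb_ident == cb_ident)
			goto err_put;
	c = calloc(1, sizeof(*c));
	if (!c)
		goto err_put;
	c->cb = cb;
	c->cb_priv = cb_priv;
	c->cb_ident = cb_ident;
	c->next = d->cbs;
	d->cbs = c;
	return 0;

err_put:
	indr_dev_put(d);
	return -1;
}

void indr_cb_unregister(void *dev, indr_bind_cb_t *cb, void *cb_ident)
{
	struct indr_dev *d = indr_dev_lookup(dev);
	struct indr_cb **pp;

	if (!d)
		return;
	for (pp = &d->cbs; *pp; pp = &(*pp)->next) {
		struct indr_cb *c = *pp;

		if (c->cb == cb && c->cb_ident == cb_ident) {
			*pp = c->next;
			free(c);
			indr_dev_put(d);
			return;
		}
	}
}

/* Helper for inspection only: current reference count, 0 if absent. */
unsigned int indr_dev_refcnt(void *dev)
{
	struct indr_dev *d = indr_dev_lookup(dev);

	return d ? d->refcnt : 0;
}
```

In this sketch the per-device entry lives exactly as long as at least one
callback is registered for the device, which is why a failed registration
has to drop the reference it just took.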

wenxu (3):
  flow_offload: move tc indirect block to flow offload
  flow_offload: Support get default block from tc immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    |  10 +-
 include/net/flow_offload.h                         |  39 ++++
 include/net/pkt_cls.h                              |  42 +---
 include/net/sch_generic.h                          |   3 -
 net/core/flow_offload.c                            | 181 +++++++++++++++
 net/netfilter/nf_tables_offload.c                  | 131 +++++++++--
 net/sched/cls_api.c                                | 246 ++++-----------------
 8 files changed, 385 insertions(+), 277 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* Re: [PATCH v3 bpf-next 0/9] Revamp test_progs as a test running framework
From: Alexei Starovoitov @ 2019-07-28  5:41 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Stanislav Fomichev, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>

On Sat, Jul 27, 2019 at 8:25 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> This patch set makes a number of changes to test_progs selftest, which is
> a collection of many other tests (and sometimes sub-tests as well), to provide
> better testing experience and allow to start converting many individual test
> programs under selftests/bpf into a single and convenient test runner.
>
> Patch #1 fixes issue with Makefile, which makes prog_tests/test.h compiled as
> a C code. This fix allows to change how test.h is generated, providing ability
> to have more control on what and how tests are run.
>
> Patch #2 changes how test.h is auto-generated, which allows to have test
> definitions, instead of just running test functions. This gives ability to do
> more complicated test run policies.
>
> Patch #3 adds `-t <test-name>` and `-n <test-num>` selectors to run only
> subset of tests.
>
> Patch #4 changes libbpf_set_print() to return previously set print callback,
> allowing to temporarily replace current print callback and then set it back.
> This is necessary for some tests that want more control over libbpf logging.
>
> Patch #5 sets up and takes over libbpf logging from individual tests to
> test_prog runner, adding -vv verbosity to capture debug output from libbpf.
> This is useful when debugging failing tests.
>
> Patch #6 furthers test output management and buffers it by default, emitting
> log output only if test fails. This gives succinct and clean default test
> output. It's possible to bypass this behavior with -v flag, which will turn
> off test output buffering.
>
> Patch #7 adds support for sub-tests. It also enhances -t and -n selectors to
> both support ability to specify sub-test selectors, as well as enhancing
> number selector to accept sets of test, instead of just individual test
> number.
>
> Patch #8 converts bpf_verif_scale.c test to use sub-test APIs.
>
> Patch #9 converts send_signal.c tests to use sub-test APIs.
>
> v2->v3:
>   - fix buffered output rare uninitialized value bug (Alexei);
>   - fix buffered output va_list reuse bug (Alexei);
>   - fix buffered output truncation due to interleaving zero terminators;

Looks great.
Applied. Thanks!
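
For readers unfamiliar with the class of bug named in the v2->v3 notes
above (va_list reuse and stray zero terminators in buffered output), a
generic sketch of the usual pattern may help. This is not the test_progs
code; log_buf and log_buf_printf are hypothetical names. It assumes the
output is measured with one vsnprintf() call and written with a second
one, which requires va_copy() because a va_list must not be traversed
twice.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable log buffer; zero-initialize before first use. */
struct log_buf {
	char *data;
	size_t len;	/* bytes stored, not counting the terminator */
};

/* Append formatted text; returns bytes appended, or -1 on error. */
int log_buf_printf(struct log_buf *b, const char *fmt, ...)
{
	va_list args, copy;
	char *p;
	int n;

	va_start(args, fmt);

	/* First pass only measures, so it must consume a copy. */
	va_copy(copy, args);
	n = vsnprintf(NULL, 0, fmt, copy);
	va_end(copy);
	if (n < 0) {
		va_end(args);
		return -1;
	}

	p = realloc(b->data, b->len + (size_t)n + 1);
	if (!p) {
		va_end(args);
		return -1;
	}
	b->data = p;

	/* Second pass writes at the old end of the buffer. */
	vsnprintf(b->data + b->len, (size_t)n + 1, fmt, args);
	va_end(args);

	/* Advance by n, not n + 1, so the next append overwrites the
	 * terminator instead of leaving a zero byte mid-buffer. */
	b->len += (size_t)n;
	return n;
}
```

Advancing the length by n + 1 instead of n would leave a zero byte after
each message, so anything printing the buffer as a C string would stop at
the first message.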

* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-28  4:06 UTC (permalink / raw)
  To: Himadri Pandya
  Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
	linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>

Hi Himadri,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   include/linux/sched.h:609:43: sparse: sparse: bad integer constant expression
   include/linux/sched.h:609:73: sparse: sparse: invalid named zero-width bitfield `value'
   include/linux/sched.h:610:43: sparse: sparse: bad integer constant expression
   include/linux/sched.h:610:67: sparse: sparse: invalid named zero-width bitfield `bucket_id'
   net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    right side has type int
   net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    right side has type int
   net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: bad constant expression type
   net/vmw_vsock/hyperv_transport.c:387:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:388:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:465:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:466:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:666:9: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
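
Nearly all of the hyperv_transport.c warnings above share one root cause:
HV_HYP_PAGE_SIZE is undefined in the tree that was built, so every expression
built on it (such as the array bound at line 65 and the subtraction at line
214) is flagged as well. The standalone sketch below illustrates that
dependency chain. The 4096-byte value, the header layout, and the shape of the
HVS_SEND_BUF_SIZE macro are assumptions made for illustration (the patch title
only implies a 4 KiB page); the real definitions live in kernel headers that
this report does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: a 4 KiB hypervisor page. Without this definition every
 * macro and declaration below fails to compile, one diagnostic per use,
 * which mirrors the cascade in the report.
 */
#ifndef HV_HYP_PAGE_SIZE
#define HV_HYP_PAGE_SIZE 4096
#endif

/* Stand-in for struct vmpipe_proto_header (layout assumed). */
struct proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* Assumed shape of the send-buffer size: one page minus the header. */
#define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct proto_header))

struct send_buf {
	struct proto_header hdr;
	uint8_t data[HVS_SEND_BUF_SIZE];	/* bound must be a constant */
};

/* Header plus payload fill exactly one page under these assumptions. */
static_assert(sizeof(struct send_buf) == HV_HYP_PAGE_SIZE,
	      "send buffer should occupy one hypervisor page");

/* Payload capacity of one send buffer, in bytes. */
static size_t send_buf_payload_size(void)
{
	return sizeof(((struct send_buf *)0)->data);
}
```

If this reading is right, making the definition of HV_HYP_PAGE_SIZE visible to
hyperv_transport.c should clear the whole group of warnings at once, without
addressing each one separately.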

vim +214 net/vmw_vsock/hyperv_transport.c
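
The header comment in the listing that follows (source lines 108-155) explains
how an AF_VSOCK port maps to a Hyper-V service GUID of the form
<port>-facb-11e6-bd58-64006a7986d3. A minimal standalone sketch of that
mapping is given here. struct guid, port_to_srv_id() and srv_id_to_port() are
hypothetical names, and the sketch assembles the port byte by byte in
little-endian order, whereas the driver reads the first four bytes as a native
unsigned int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LISTEN_PORT 0x7FFFFFFFu

/* 16-byte GUID; the first three fields are stored little-endian
 * (stand-in for the kernel's uuid_le).
 */
struct guid {
	uint8_t b[16];
};

/* 00000000-facb-11e6-bd58-64006a7986d3 */
static const struct guid srv_id_template = {{
	0x00, 0x00, 0x00, 0x00, 0xcb, 0xfa, 0xe6, 0x11,
	0xbd, 0x58, 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3,
}};

/* Build the service GUID for a port: the template with bytes 0-3 replaced. */
static struct guid port_to_srv_id(uint32_t port)
{
	struct guid id = srv_id_template;

	id.b[0] = (uint8_t)(port & 0xff);
	id.b[1] = (uint8_t)((port >> 8) & 0xff);
	id.b[2] = (uint8_t)((port >> 16) & 0xff);
	id.b[3] = (uint8_t)((port >> 24) & 0xff);
	return id;
}

/* A GUID is a valid service ID if bytes 4-15 match the template. */
static bool is_valid_srv_id(const struct guid *id)
{
	assert(id != NULL);
	return memcmp(&id->b[4], &srv_id_template.b[4],
		      sizeof(id->b) - 4) == 0;
}

/* Recover the port from the first four bytes of the GUID. */
static uint32_t srv_id_to_port(const struct guid *id)
{
	assert(id != NULL);
	return (uint32_t)id->b[0] |
	       ((uint32_t)id->b[1] << 8) |
	       ((uint32_t)id->b[2] << 16) |
	       ((uint32_t)id->b[3] << 24);
}

/* Listening ports are limited to [0, MAX_LISTEN_PORT]. */
static bool is_listen_srv_id(const struct guid *id)
{
	return is_valid_srv_id(id) && srv_id_to_port(id) <= MAX_LISTEN_PORT;
}
```

For example, port 0x1234 would correspond to the service GUID
00001234-facb-11e6-bd58-64006a7986d3, and any GUID whose last twelve bytes
differ from the template is rejected.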

ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   59  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   60  struct hvs_send_buf {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   61  	/* The header before the payload data */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   62  	struct vmpipe_proto_header hdr;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   63  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   64  	/* The payload */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  @65  	u8 data[HVS_SEND_BUF_SIZE];
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   66  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   67  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   68  #define HVS_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   69  			 sizeof(struct vmpipe_proto_header))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   70  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   71  /* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write(), and
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   72   * __hv_pkt_iter_next().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   73   */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   74  #define VMBUS_PKT_TRAILER_SIZE	(sizeof(u64))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   75  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   76  #define HVS_PKT_LEN(payload_len)	(HVS_HEADER_LEN + \
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   77  					 ALIGN((payload_len), 8) + \
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   78  					 VMBUS_PKT_TRAILER_SIZE)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   79  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   80  union hvs_service_id {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   81  	uuid_le	srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   82  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   83  	struct {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   84  		unsigned int svm_port;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   85  		unsigned char b[sizeof(uuid_le) - sizeof(unsigned int)];
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   86  	};
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   87  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   88  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   89  /* Per-socket state (accessed via vsk->trans) */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   90  struct hvsock {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   91  	struct vsock_sock *vsk;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   92  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   93  	uuid_le vm_srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   94  	uuid_le host_srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   95  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   96  	struct vmbus_channel *chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   97  	struct vmpacket_descriptor *recv_desc;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   98  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   99  	/* The length of the payload not delivered to userland yet */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  100  	u32 recv_data_len;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  101  	/* The offset of the payload */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  102  	u32 recv_data_off;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  103  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  104  	/* Have we sent the zero-length packet (FIN)? */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  105  	bool fin_sent;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  106  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  107  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  108  /* In the VM, we support Hyper-V Sockets with AF_VSOCK, and the endpoint is
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  109   * <cid, port> (see struct sockaddr_vm). Note: cid is not really used here:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  110   * when we write apps to connect to the host, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  111   * or VMADDR_CID_HOST (both are equivalent) as the remote cid, and when we
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  112   * write apps to bind() & listen() in the VM, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  113   * as the local cid.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  114   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  115   * On the host, Hyper-V Sockets are supported by Winsock AF_HYPERV:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  116   * https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  117   * guide/make-integration-service, and the endpoint is <VmID, ServiceId> with
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  118   * the below sockaddr:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  119   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  120   * struct SOCKADDR_HV
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  121   * {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  122   *    ADDRESS_FAMILY Family;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  123   *    USHORT Reserved;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  124   *    GUID VmId;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  125   *    GUID ServiceId;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  126   * };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  127   * Note: VmID is not used by Linux VM and actually it isn't transmitted via
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  128   * VMBus, because here it's obvious the host and the VM can easily identify
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  129   * each other. Though the VmID is useful on the host, especially in the case
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  130   * of Windows container, Linux VM doesn't need it at all.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  131   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  132   * To make use of the AF_VSOCK infrastructure in Linux VM, we have to limit
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  133   * the available GUID space of SOCKADDR_HV so that we can create a mapping
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  134   * between AF_VSOCK port and SOCKADDR_HV Service GUID. The rule of writing
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  135   * Hyper-V Sockets apps on the host and in Linux VM is:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  136   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  137   ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  138   * The only valid Service GUIDs, from the perspectives of both the host and *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  139   * Linux VM, that can be connected by the other end, must conform to this   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  140   * format: <port>-facb-11e6-bd58-64006a7986d3, and the "port" must be in    *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  141   * this range [0, 0x7FFFFFFF].                                              *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  142   ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  143   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  144   * When we write apps on the host to connect(), the GUID ServiceID is used.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  145   * When we write apps in Linux VM to connect(), we only need to specify the
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  146   * port and the driver will form the GUID and use that to request the host.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  147   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  148   * From the perspective of Linux VM:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  149   * 1. the local ephemeral port (i.e. the local auto-bound port when we call
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  150   * connect() without explicit bind()) is generated by __vsock_bind_stream(),
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  151   * and the range is [1024, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  152   * 2. the remote ephemeral port (i.e. the auto-generated remote port for
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  153   * a connect request initiated by the host's connect()) is generated by
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  154   * hvs_remote_addr_init() and the range is [0x80000000, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  155   */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  156  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  157  #define MAX_LISTEN_PORT			((u32)0x7FFFFFFF)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  158  #define MAX_VM_LISTEN_PORT		MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  159  #define MAX_HOST_LISTEN_PORT		MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  160  #define MIN_HOST_EPHEMERAL_PORT		(MAX_HOST_LISTEN_PORT + 1)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  161  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  162  /* 00000000-facb-11e6-bd58-64006a7986d3 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  163  static const uuid_le srv_id_template =
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  164  	UUID_LE(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  165  		0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  166  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  167  static bool is_valid_srv_id(const uuid_le *id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  168  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  169  	return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  170  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  171  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  172  static unsigned int get_port_by_srv_id(const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  173  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  174  	return *((unsigned int *)svr_id);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  175  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  176  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  177  static void hvs_addr_init(struct sockaddr_vm *addr, const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  178  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  179  	unsigned int port = get_port_by_srv_id(svr_id);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  180  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  181  	vsock_addr_init(addr, VMADDR_CID_ANY, port);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  182  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  183  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  184  static void hvs_remote_addr_init(struct sockaddr_vm *remote,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  185  				 struct sockaddr_vm *local)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  186  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  187  	static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  188  	struct sock *sk;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  189  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  190  	vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  191  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  192  	while (1) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  193  		/* Wrap around ? */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  194  		if (host_ephemeral_port < MIN_HOST_EPHEMERAL_PORT ||
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  195  		    host_ephemeral_port == VMADDR_PORT_ANY)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  196  			host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  197  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  198  		remote->svm_port = host_ephemeral_port++;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  199  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  200  		sk = vsock_find_connected_socket(remote, local);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  201  		if (!sk) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  202  			/* Found an available ephemeral port */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  203  			return;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  204  		}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  205  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  206  		/* Release refcnt got in vsock_find_connected_socket */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  207  		sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  208  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  209  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  210  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  211  static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  212  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  213  	set_channel_pending_send_size(chan,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26 @214  				      HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  215  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  216  	virt_mb();
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  217  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  218  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  219  static bool hvs_channel_readable(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  220  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  221  	u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  222  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  223  	/* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  224  	return readable >= HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  225  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  226  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  227  static int hvs_channel_readable_payload(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  228  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  229  	u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  230  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  231  	if (readable > HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  232  		/* At least we have 1 byte to read. We don't need to return
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  233  		 * the exact readable bytes: see vsock_stream_recvmsg() ->
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  234  		 * vsock_stream_has_data().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  235  		 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  236  		return 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  237  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  238  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  239  	if (readable == HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  240  		/* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  241  		return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  242  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  243  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  244  	/* No payload or FIN */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  245  	return -1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  246  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  247  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  248  static size_t hvs_channel_writable_bytes(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  249  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  250  	u32 writeable = hv_get_bytes_to_write(&chan->outbound);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  251  	size_t ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  252  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  253  	/* The ringbuffer mustn't be 100% full, and we should reserve a
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  254  	 * zero-length-payload packet for the FIN: see hv_ringbuffer_write()
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  255  	 * and hvs_shutdown().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  256  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  257  	if (writeable <= HVS_PKT_LEN(1) + HVS_PKT_LEN(0))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  258  		return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  259  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  260  	ret = writeable - HVS_PKT_LEN(1) - HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  261  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  262  	return round_down(ret, 8);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  263  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  264  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  265  static int hvs_send_data(struct vmbus_channel *chan,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  266  			 struct hvs_send_buf *send_buf, size_t to_write)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  267  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  268  	send_buf->hdr.pkt_type = 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  269  	send_buf->hdr.data_size = to_write;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  270  	return vmbus_sendpacket(chan, &send_buf->hdr,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  271  				sizeof(send_buf->hdr) + to_write,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  272  				0, VM_PKT_DATA_INBAND, 0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  273  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  274  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  275  static void hvs_channel_cb(void *ctx)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  276  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  277  	struct sock *sk = (struct sock *)ctx;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  278  	struct vsock_sock *vsk = vsock_sk(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  279  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  280  	struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  281  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  282  	if (hvs_channel_readable(chan))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  283  		sk->sk_data_ready(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  284  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  285  	if (hv_get_bytes_to_write(&chan->outbound) > 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  286  		sk->sk_write_space(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  287  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  288  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  289  static void hvs_do_close_lock_held(struct vsock_sock *vsk,
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  290  				   bool cancel_timeout)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  291  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  292  	struct sock *sk = sk_vsock(vsk);
b4562ca7925a3be Dexuan Cui       2017-10-19  293  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  294  	sock_set_flag(sk, SOCK_DONE);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  295  	vsk->peer_shutdown = SHUTDOWN_MASK;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  296  	if (vsock_stream_has_data(vsk) <= 0)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  297  		sk->sk_state = TCP_CLOSING;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  298  	sk->sk_state_change(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  299  	if (vsk->close_work_scheduled &&
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  300  	    (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  301  		vsk->close_work_scheduled = false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  302  		vsock_remove_sock(vsk);
b4562ca7925a3be Dexuan Cui       2017-10-19  303  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  304  		/* Release the reference taken while scheduling the timeout */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  305  		sock_put(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  306  	}
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  307  }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  308  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  309  static void hvs_close_connection(struct vmbus_channel *chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  310  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  311  	struct sock *sk = get_per_channel_state(chan);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  312  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  313  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  314  	hvs_do_close_lock_held(vsock_sk(sk), true);
b4562ca7925a3be Dexuan Cui       2017-10-19  315  	release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  316  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  317  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  318  static void hvs_open_connection(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  319  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  320  	uuid_le *if_instance, *if_type;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  321  	unsigned char conn_from_host;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  322  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  323  	struct sockaddr_vm addr;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  324  	struct sock *sk, *new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  325  	struct vsock_sock *vnew = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  326  	struct hvsock *hvs = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  327  	struct hvsock *hvs_new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  328  	int rcvbuf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  329  	int ret;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  330  	int sndbuf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  331  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  332  	if_type = &chan->offermsg.offer.if_type;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  333  	if_instance = &chan->offermsg.offer.if_instance;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  334  	conn_from_host = chan->offermsg.offer.u.pipe.user_def[0];
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  335  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  336  	/* The host or the VM should only listen on a port in
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  337  	 * [0, MAX_LISTEN_PORT]
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  338  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  339  	if (!is_valid_srv_id(if_type) ||
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  340  	    get_port_by_srv_id(if_type) > MAX_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  341  		return;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  342  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  343  	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  344  	sk = vsock_find_bound_socket(&addr);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  345  	if (!sk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  346  		return;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  347  
b4562ca7925a3be Dexuan Cui       2017-10-19  348  	lock_sock(sk);
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  349  	if ((conn_from_host && sk->sk_state != TCP_LISTEN) ||
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  350  	    (!conn_from_host && sk->sk_state != TCP_SYN_SENT))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  351  		goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  352  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  353  	if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  354  		if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  355  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  356  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  357  		new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  358  				     sk->sk_type, 0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  359  		if (!new)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  360  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  361  
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  362  		new->sk_state = TCP_SYN_SENT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  363  		vnew = vsock_sk(new);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  364  		hvs_new = vnew->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  365  		hvs_new->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  366  	} else {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  367  		hvs = vsock_sk(sk)->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  368  		hvs->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  369  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  370  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  371  	set_channel_read_mode(chan, HV_CALL_DIRECT);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  372  
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  373  	/* Use the socket buffer sizes as hints for the VMBUS ring size. For
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  374  	 * server side sockets, 'sk' is the parent socket and thus, this will
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  375  	 * allow the child sockets to inherit the size from the parent. Keep
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  376  	 * the mins to the default value and align to page size as per VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  377  	 * requirements.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  378  	 * For the max, the socket core library will limit the socket buffer
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  379  	 * size that can be set by the user, but, since currently, the hv_sock
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  380  	 * VMBUS ring buffer is physically contiguous allocation, restrict it
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  381  	 * further.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  382  	 * Older versions of hv_sock host side code cannot handle bigger VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  383  	 * ring buffer size. Use the version number to limit the change to newer
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  384  	 * versions.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  385  	 */
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  386  	if (vmbus_proto_version < VERSION_WIN10_V5) {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  387  		sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  388  		rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  389  	} else {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 @390  		sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  391  		sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya   2019-07-25  392  		sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  393  		rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  394  		rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya   2019-07-25  395  		rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  396  	}
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  397  
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  398  	ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  399  			 conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  400  	if (ret != 0) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  401  		if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  402  			hvs_new->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  403  			sock_put(new);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  404  		} else {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  405  			hvs->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  406  		}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  407  		goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  408  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  409  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  410  	set_per_channel_state(chan, conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  411  	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  412  
cb359b60416701c Sunil Muthuswamy 2019-06-17  413  	/* Set the pending send size to max packet size to always get
cb359b60416701c Sunil Muthuswamy 2019-06-17  414  	 * notifications from the host when there is enough writable space.
cb359b60416701c Sunil Muthuswamy 2019-06-17  415  	 * The host is optimized to send notifications only when the pending
cb359b60416701c Sunil Muthuswamy 2019-06-17  416  	 * size boundary is crossed, and not always.
cb359b60416701c Sunil Muthuswamy 2019-06-17  417  	 */
cb359b60416701c Sunil Muthuswamy 2019-06-17  418  	hvs_set_channel_pending_send_size(chan);
cb359b60416701c Sunil Muthuswamy 2019-06-17  419  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  420  	if (conn_from_host) {
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  421  		new->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  422  		sk->sk_ack_backlog++;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  423  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  424  		hvs_addr_init(&vnew->local_addr, if_type);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  425  		hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  426  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  427  		hvs_new->vm_srv_id = *if_type;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  428  		hvs_new->host_srv_id = *if_instance;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  429  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  430  		vsock_insert_connected(vnew);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  431  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  432  		vsock_enqueue_accept(sk, new);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  433  	} else {
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  434  		sk->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  435  		sk->sk_socket->state = SS_CONNECTED;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  436  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  437  		vsock_insert_connected(vsock_sk(sk));
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  438  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  439  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  440  	sk->sk_state_change(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  441  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  442  out:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  443  	/* Release refcnt obtained when we called vsock_find_bound_socket() */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  444  	sock_put(sk);
b4562ca7925a3be Dexuan Cui       2017-10-19  445  
b4562ca7925a3be Dexuan Cui       2017-10-19  446  	release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  447  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  448  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  449  static u32 hvs_get_local_cid(void)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  450  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  451  	return VMADDR_CID_ANY;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  452  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  453  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  454  static int hvs_sock_init(struct vsock_sock *vsk, struct vsock_sock *psk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  455  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  456  	struct hvsock *hvs;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  457  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  458  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  459  	hvs = kzalloc(sizeof(*hvs), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  460  	if (!hvs)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  461  		return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  462  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  463  	vsk->trans = hvs;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  464  	hvs->vsk = vsk;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  465  	sk->sk_sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  466  	sk->sk_rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  467  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  468  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  469  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  470  static int hvs_connect(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  471  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  472  	union hvs_service_id vm, host;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  473  	struct hvsock *h = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  474  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  475  	vm.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  476  	vm.svm_port = vsk->local_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  477  	h->vm_srv_id = vm.srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  478  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  479  	host.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  480  	host.svm_port = vsk->remote_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  481  	h->host_srv_id = host.srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  482  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  483  	return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  484  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  485  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  486  static void hvs_shutdown_lock_held(struct hvsock *hvs, int mode)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  487  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  488  	struct vmpipe_proto_header hdr;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  489  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  490  	if (hvs->fin_sent || !hvs->chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  491  		return;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  492  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  493  	/* It can't fail: see hvs_channel_writable_bytes(). */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  494  	(void)hvs_send_data(hvs->chan, (struct hvs_send_buf *)&hdr, 0);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  495  	hvs->fin_sent = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  496  }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  497  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  498  static int hvs_shutdown(struct vsock_sock *vsk, int mode)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  499  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  500  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  501  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  502  	if (!(mode & SEND_SHUTDOWN))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  503  		return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  504  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  505  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  506  	hvs_shutdown_lock_held(vsk->trans, mode);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  507  	release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  508  	return 0;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  509  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  510  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  511  static void hvs_close_timeout(struct work_struct *work)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  512  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  513  	struct vsock_sock *vsk =
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  514  		container_of(work, struct vsock_sock, close_work.work);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  515  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  516  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  517  	sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  518  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  519  	if (!sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  520  		hvs_do_close_lock_held(vsk, false);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  521  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  522  	vsk->close_work_scheduled = false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  523  	release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  524  	sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  525  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  526  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  527  /* Returns true, if it is safe to remove socket; false otherwise */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  528  static bool hvs_close_lock_held(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  529  {
b4562ca7925a3be Dexuan Cui       2017-10-19  530  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  531  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  532  	if (!(sk->sk_state == TCP_ESTABLISHED ||
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  533  	      sk->sk_state == TCP_CLOSING))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  534  		return true;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  535  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  536  	if ((sk->sk_shutdown & SHUTDOWN_MASK) != SHUTDOWN_MASK)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  537  		hvs_shutdown_lock_held(vsk->trans, SHUTDOWN_MASK);
b4562ca7925a3be Dexuan Cui       2017-10-19  538  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  539  	if (sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  540  		return true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  541  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  542  	/* This reference will be dropped by the delayed close routine */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  543  	sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  544  	INIT_DELAYED_WORK(&vsk->close_work, hvs_close_timeout);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  545  	vsk->close_work_scheduled = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  546  	schedule_delayed_work(&vsk->close_work, HVS_CLOSE_TIMEOUT);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  547  	return false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  548  }
b4562ca7925a3be Dexuan Cui       2017-10-19  549  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  550  static void hvs_release(struct vsock_sock *vsk)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  551  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  552  	struct sock *sk = sk_vsock(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  553  	bool remove_sock;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  554  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  555  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  556  	remove_sock = hvs_close_lock_held(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  557  	release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  558  	if (remove_sock)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  559  		vsock_remove_sock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  560  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  561  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  562  static void hvs_destruct(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  563  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  564  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  565  	struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  566  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  567  	if (chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  568  		vmbus_hvsock_device_unregister(chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  569  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  570  	kfree(hvs);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  571  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  572  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  573  static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  574  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  575  	return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  576  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  577  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  578  static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  579  			     size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  580  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  581  	return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  582  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  583  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  584  static int hvs_dgram_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  585  			     struct sockaddr_vm *remote, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  586  			     size_t dgram_len)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  587  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  588  	return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  589  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  590  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  591  static bool hvs_dgram_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  592  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  593  	return false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  594  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  595  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  596  static int hvs_update_recv_data(struct hvsock *hvs)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  597  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  598  	struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  599  	u32 payload_len;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  600  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  601  	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  602  	payload_len = recv_buf->hdr.data_size;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  603  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  604  	if (payload_len > HVS_MTU_SIZE)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  605  		return -EIO;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  606  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  607  	if (payload_len == 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  608  		hvs->vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  609  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  610  	hvs->recv_data_len = payload_len;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  611  	hvs->recv_data_off = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  612  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  613  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  614  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  615  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  616  static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  617  				  size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  618  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  619  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  620  	bool need_refill = !hvs->recv_desc;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  621  	struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  622  	u32 to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  623  	int ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  624  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  625  	if (flags & MSG_PEEK)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  626  		return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  627  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  628  	if (need_refill) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  629  		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  630  		ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  631  		if (ret)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  632  			return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  633  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  634  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  635  	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  636  	to_read = min_t(u32, len, hvs->recv_data_len);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  637  	ret = memcpy_to_msg(msg, recv_buf->data + hvs->recv_data_off, to_read);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  638  	if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  639  		return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  640  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  641  	hvs->recv_data_len -= to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  642  	if (hvs->recv_data_len == 0) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  643  		hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  644  		if (hvs->recv_desc) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  645  			ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  646  			if (ret)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  647  				return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  648  		}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  649  	} else {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  650  		hvs->recv_data_off += to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  651  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  652  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  653  	return to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  654  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  655  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  656  static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  657  				  size_t len)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  658  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  659  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  660  	struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  661  	struct hvs_send_buf *send_buf;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  662  	ssize_t to_write, max_writable;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  663  	ssize_t ret = 0;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  664  	ssize_t bytes_written = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  665  
31113cc83e30924 Himadri Pandya   2019-07-25  666  	BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  667  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  668  	send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  669  	if (!send_buf)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  670  		return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  671  
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  672  	/* Reader(s) could be draining data from the channel as we write.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  673  	 * Maximize bandwidth, by iterating until the channel is found to be
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  674  	 * full.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  675  	 */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  676  	while (len) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  677  		max_writable = hvs_channel_writable_bytes(chan);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  678  		if (!max_writable)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  679  			break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  680  		to_write = min_t(ssize_t, len, max_writable);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  681  		to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  682  		/* memcpy_from_msg is safe for loop as it advances the offsets
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  683  		 * within the message iterator.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  684  		 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  685  		ret = memcpy_from_msg(send_buf->data, msg, to_write);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  686  		if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  687  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  688  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  689  		ret = hvs_send_data(hvs->chan, send_buf, to_write);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  690  		if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  691  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  692  
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  693  		bytes_written += to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  694  		len -= to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  695  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  696  out:
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  697  	/* If any data has been sent, return that */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  698  	if (bytes_written)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  699  		ret = bytes_written;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  700  	kfree(send_buf);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  701  	return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  702  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  703  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  704  static s64 hvs_stream_has_data(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  705  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  706  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  707  	s64 ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  708  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  709  	if (hvs->recv_data_len > 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  710  		return 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  711  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  712  	switch (hvs_channel_readable_payload(hvs->chan)) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  713  	case 1:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  714  		ret = 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  715  		break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  716  	case 0:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  717  		vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  718  		ret = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  719  		break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  720  	default: /* -1 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  721  		ret = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  722  		break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  723  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  724  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  725  	return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  726  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  727  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  728  static s64 hvs_stream_has_space(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  729  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  730  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  731  
cb359b60416701c Sunil Muthuswamy 2019-06-17  732  	return hvs_channel_writable_bytes(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  733  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  734  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  735  static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  736  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  737  	return HVS_MTU_SIZE + 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  738  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  739  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  740  static bool hvs_stream_is_active(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  741  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  742  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  743  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  744  	return hvs->chan != NULL;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  745  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  746  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  747  static bool hvs_stream_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  748  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  749  	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  750  	 * reserved as ephemeral ports, which are used as the host's ports
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  751  	 * when the host initiates connections.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  752  	 *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  753  	 * Perform this check in the guest so an immediate error is produced
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  754  	 * instead of a timeout.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  755  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  756  	if (port > MAX_HOST_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  757  		return false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  758  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  759  	if (cid == VMADDR_CID_HOST)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  760  		return true;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  761  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  762  	return false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  763  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  764  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  765  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  766  int hvs_notify_poll_in(struct vsock_sock *vsk, size_t target, bool *readable)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  767  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  768  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  769  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  770  	*readable = hvs_channel_readable(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  771  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  772  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  773  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  774  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  775  int hvs_notify_poll_out(struct vsock_sock *vsk, size_t target, bool *writable)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  776  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  777  	*writable = hvs_stream_has_space(vsk) > 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  778  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  779  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  780  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  781  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  782  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  783  int hvs_notify_recv_init(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  784  			 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  785  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  786  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  787  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  788  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  789  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  790  int hvs_notify_recv_pre_block(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  791  			      struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  792  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  793  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  794  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  795  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  796  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  797  int hvs_notify_recv_pre_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  798  				struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  799  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  800  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  801  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  802  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  803  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  804  int hvs_notify_recv_post_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  805  				 ssize_t copied, bool data_read,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  806  				 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  807  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  808  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  809  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  810  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  811  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  812  int hvs_notify_send_init(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  813  			 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  814  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  815  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  816  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  817  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  818  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  819  int hvs_notify_send_pre_block(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  820  			      struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  821  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  822  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  823  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  824  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  825  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  826  int hvs_notify_send_pre_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  827  				struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  828  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  829  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  830  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  831  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  832  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  833  int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  834  				 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  835  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  836  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  837  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  838  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  839  static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  840  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  841  	/* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  842  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  843  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  844  static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  845  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  846  	/* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  847  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  848  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  849  static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  850  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  851  	/* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  852  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  853  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  854  static u64 hvs_get_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  855  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  856  	return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  857  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  858  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  859  static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  860  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  861  	return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  862  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  863  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  864  static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  865  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  866  	return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  867  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  868  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  869  static struct vsock_transport hvs_transport = {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  870  	.get_local_cid            = hvs_get_local_cid,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  871  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  872  	.init                     = hvs_sock_init,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  873  	.destruct                 = hvs_destruct,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  874  	.release                  = hvs_release,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  875  	.connect                  = hvs_connect,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  876  	.shutdown                 = hvs_shutdown,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  877  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  878  	.dgram_bind               = hvs_dgram_bind,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  879  	.dgram_dequeue            = hvs_dgram_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  880  	.dgram_enqueue            = hvs_dgram_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  881  	.dgram_allow              = hvs_dgram_allow,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  882  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  883  	.stream_dequeue           = hvs_stream_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  884  	.stream_enqueue           = hvs_stream_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  885  	.stream_has_data          = hvs_stream_has_data,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  886  	.stream_has_space         = hvs_stream_has_space,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  887  	.stream_rcvhiwat          = hvs_stream_rcvhiwat,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  888  	.stream_is_active         = hvs_stream_is_active,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  889  	.stream_allow             = hvs_stream_allow,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  890  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  891  	.notify_poll_in           = hvs_notify_poll_in,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  892  	.notify_poll_out          = hvs_notify_poll_out,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  893  	.notify_recv_init         = hvs_notify_recv_init,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  894  	.notify_recv_pre_block    = hvs_notify_recv_pre_block,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  895  	.notify_recv_pre_dequeue  = hvs_notify_recv_pre_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  896  	.notify_recv_post_dequeue = hvs_notify_recv_post_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  897  	.notify_send_init         = hvs_notify_send_init,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  898  	.notify_send_pre_block    = hvs_notify_send_pre_block,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  899  	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  900  	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  901  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  902  	.set_buffer_size          = hvs_set_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  903  	.set_min_buffer_size      = hvs_set_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  904  	.set_max_buffer_size      = hvs_set_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  905  	.get_buffer_size          = hvs_get_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  906  	.get_min_buffer_size      = hvs_get_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  907  	.get_max_buffer_size      = hvs_get_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  908  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  909  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  910  static int hvs_probe(struct hv_device *hdev,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  911  		     const struct hv_vmbus_device_id *dev_id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  912  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  913  	struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  914  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  915  	hvs_open_connection(chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  916  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  917  	/* Always return success to suppress the unnecessary error message
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  918  	 * in vmbus_probe(): on error the host will rescind the device in
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  919  	 * 30 seconds and we can do cleanup at that time in
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  920  	 * vmbus_onoffer_rescind().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  921  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  922  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  923  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  924  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  925  static int hvs_remove(struct hv_device *hdev)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  926  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  927  	struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  928  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  929  	vmbus_close(chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  930  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  931  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  932  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  933  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  934  /* This isn't really used. See vmbus_match() and vmbus_probe() */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  935  static const struct hv_vmbus_device_id id_table[] = {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  936  	{},
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  937  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  938  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  939  static struct hv_driver hvs_drv = {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  940  	.name		= "hv_sock",
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  941  	.hvsock		= true,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  942  	.id_table	= id_table,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  943  	.probe		= hvs_probe,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  944  	.remove		= hvs_remove,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  945  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  946  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  947  static int __init hvs_init(void)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  948  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  949  	int ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  950  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  951  	if (vmbus_proto_version < VERSION_WIN10)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  952  		return -ENODEV;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  953  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  954  	ret = vmbus_driver_register(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  955  	if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  956  		return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  957  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  958  	ret = vsock_core_init(&hvs_transport);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  959  	if (ret) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  960  		vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  961  		return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  962  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  963  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  964  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  965  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  966  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  967  static void __exit hvs_exit(void)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  968  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  969  	vsock_core_exit();
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  970  	vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  971  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  972  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  973  module_init(hvs_init);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  974  module_exit(hvs_exit);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  975  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  976  MODULE_DESCRIPTION("Hyper-V Sockets");
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  977  MODULE_VERSION("1.0.0");
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  978  MODULE_LICENSE("GPL");
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  979  MODULE_ALIAS_NETPROTO(PF_VSOCK);

:::::: The code at line 214 was first introduced by commit
:::::: ae0078fcf0a5eb3a8623bfb5f988262e0911fdb9 hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

:::::: TO: Dexuan Cui <decui@microsoft.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: general protection fault in tls_trim_both_msgs
From: syzbot @ 2019-07-28  3:46 UTC (permalink / raw)
  To: ast, aviadye, borisp, bpf, corbet, daniel, davejwatson, davem,
	jakub.kicinski, john.fastabend, kafai, linux-doc, linux-kernel,
	netdev, songliubraving, syzkaller-bugs, yhs
In-Reply-To: <0000000000002b4896058e7abf78@google.com>

syzbot has bisected this bug to:

commit 32857cf57f920cdc03b5095f08febec94cf9c36b
Author: John Fastabend <john.fastabend@gmail.com>
Date:   Fri Jul 19 17:29:18 2019 +0000

     net/tls: fix transition through disconnect with close

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=155064d8600000
start commit:   fde50b96 Add linux-next specific files for 20190726
git tree:       linux-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=175064d8600000
console output: https://syzkaller.appspot.com/x/log.txt?x=135064d8600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4b58274564b354c1
dashboard link: https://syzkaller.appspot.com/bug?extid=0e0fedcad708d12d3032
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14779d64600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1587c842600000

Reported-by: syzbot+0e0fedcad708d12d3032@syzkaller.appspotmail.com
Fixes: 32857cf57f92 ("net/tls: fix transition through disconnect with close")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


