Netdev List
 help / color / mirror / Atom feed
* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Pablo Neira Ayuso @ 2019-07-15 18:09 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Laura Garcia, Randy Dunlap, Stephen Rothwell,
	Linux Next Mailing List, Linux Kernel Mailing List, linux-kbuild,
	netdev@vger.kernel.org, Netfilter Development Mailing list
In-Reply-To: <CAK7LNAS0rX_SRXqb=N=Td-DFNWd=PytDFje12gYh2pYNRBVAJA@mail.gmail.com>

On Tue, Jul 16, 2019 at 02:56:09AM +0900, Masahiro Yamada wrote:
> On Tue, Jul 16, 2019 at 2:33 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > On Mon, Jul 15, 2019 at 07:28:04PM +0200, Laura Garcia wrote:
> > > CC'ing netfilter.
> > >
> > > On Mon, Jul 15, 2019 at 6:45 PM Randy Dunlap <rdunlap@infradead.org> wrote:
> > > >
> > > > On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> > > > > Hi all,
> > > > >
> > > > > Please do not add v5.4 material to your linux-next included branches
> > > > > until after v5.3-rc1 has been released.
> > > > >
> > > > > Changes since 20190712:
> > > > >
> > > >
> > > > Hi,
> > > >
> > > > I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
> > > > for include/net/netfilter/nf_tables_offload.h.s:
> > > >
> > > >   CC      include/net/netfilter/nf_tables_offload.h.s
> > [...]
> > > > Should this header file not be tested?
> 
> This means you must endlessly exclude
> headers that include nf_tables.h
> 
> 
> > Yes, it should indeed be added.
> 
> Adding 'header-test-' is the last resort.

OK, so policy now is that all internal headers should compile
standalone, right?

I can have a look and make it compile standalone.

^ permalink raw reply

* Re: [PATCH v3 19/24] vmxnet3: Remove call to memset after dma_alloc_coherent
From: David Miller @ 2019-07-15 18:06 UTC (permalink / raw)
  To: huangfq.daxian; +Cc: doshir, pv-drivers, netdev, linux-kernel
In-Reply-To: <20190715032118.7417-1-huangfq.daxian@gmail.com>

From: Fuqian Huang <huangfq.daxian@gmail.com>
Date: Mon, 15 Jul 2019 11:21:18 +0800

> In commit 518a2f1925c3
> ("dma-mapping: zero memory returned from dma_alloc_*"),
> dma_alloc_coherent has already zeroed the memory.
> So memset is not needed.
> 
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH v3 18/24] hippi: Remove call to memset after pci_alloc_consistent
From: David Miller @ 2019-07-15 18:06 UTC (permalink / raw)
  To: huangfq.daxian; +Cc: jes, linux-hippi, netdev, linux-kernel
In-Reply-To: <20190715031921.7028-1-huangfq.daxian@gmail.com>

From: Fuqian Huang <huangfq.daxian@gmail.com>
Date: Mon, 15 Jul 2019 11:19:21 +0800

> pci_alloc_consistent calls dma_alloc_coherent directly.
> In commit 518a2f1925c3
> ("dma-mapping: zero memory returned from dma_alloc_*"),
> dma_alloc_coherent has already zeroed the memory.
> So memset is not needed.
> 
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH v3 17/24] ethernet: remove redundant memset
From: David Miller @ 2019-07-15 18:06 UTC (permalink / raw)
  To: huangfq.daxian
  Cc: jcliburn, chris.snook, michael.chan, vishal, fugang.duan,
	cooldavid, mlindner, stephen, tariqt, saeedm, leon, jiri, idosch,
	jdmason, manishc, rahulv, GR-Linux-NIC-Dev, chessman, netdev,
	linux-kernel, linux-rdma
In-Reply-To: <20190715031911.6982-1-huangfq.daxian@gmail.com>

From: Fuqian Huang <huangfq.daxian@gmail.com>
Date: Mon, 15 Jul 2019 11:19:11 +0800

> kvzalloc already zeroes the memory during the allocation.
> pci_alloc_consistent calls dma_alloc_coherent directly.
> In commit 518a2f1925c3
> ("dma-mapping: zero memory returned from dma_alloc_*"),
> dma_alloc_coherent has already zeroed the memory.
> So the memset after these function is not needed.
> 
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH v3 02/24] atm: idt77252: Remove call to memset after dma_alloc_coherent
From: David Miller @ 2019-07-15 18:06 UTC (permalink / raw)
  To: huangfq.daxian; +Cc: 3chas3, linux-atm-general, netdev, linux-kernel
In-Reply-To: <20190715031709.6281-1-huangfq.daxian@gmail.com>

From: Fuqian Huang <huangfq.daxian@gmail.com>
Date: Mon, 15 Jul 2019 11:17:09 +0800

> In commit 518a2f1925c3
> ("dma-mapping: zero memory returned from dma_alloc_*"),
> dma_alloc_coherent has already zeroed the memory.
> So memset is not needed.
> 
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>

Applied.

^ permalink raw reply

* [PATCH iproute2 v2] utils: don't match empty strings as prefixes
From: Matteo Croce @ 2019-07-15 18:04 UTC (permalink / raw)
  To: Stephen Hemminger, netdev; +Cc: David Ahern

iproute has an utility function which checks if a string is a prefix for
another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.

This routine unfortunately considers an empty string as prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:

    # ip ''
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    # tc ''
    qdisc noqueue 0: dev lo root refcnt 2

    # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
    # ip addr show dev dummy0
    6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
        inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
           valid_lft forever preferred_lft forever

Rewrite matches() so it takes care of an empty input, and doesn't
scan the input strings three times: the actual implementation
does 2 strlen and a memcpy to accomplish the same task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 include/utils.h |  2 +-
 lib/utils.c     | 15 ++++++++++-----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 1d9c1127..794d3605 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -198,7 +198,7 @@ int nodev(const char *dev);
 int check_ifname(const char *);
 int get_ifname(char *, const char *);
 const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
-int matches(const char *arg, const char *pattern);
+bool matches(const char *prefix, const char *string);
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
 int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
 
diff --git a/lib/utils.c b/lib/utils.c
index 5da9a478..9ea21fa1 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -871,13 +871,18 @@ const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
 	return name;
 }
 
-int matches(const char *cmd, const char *pattern)
+/* Returns false if 'prefix' is a not empty prefix of 'string'.
+ */
+bool matches(const char *prefix, const char *string)
 {
-	int len = strlen(cmd);
+	if (!*prefix)
+		return true;
+	while (*string && *prefix == *string) {
+		prefix++;
+		string++;
+	}
 
-	if (len > strlen(pattern))
-		return -1;
-	return memcmp(pattern, cmd, len);
+	return !!*prefix;
 }
 
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits)
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net v3] net: neigh: fix multiple neigh timer scheduling
From: David Miller @ 2019-07-15 18:04 UTC (permalink / raw)
  To: lorenzo.bianconi; +Cc: netdev, dsahern, marek
In-Reply-To: <552d7c8de6a07e12f7b76791da953e81478138cd.1563134704.git.lorenzo.bianconi@redhat.com>

From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Date: Sun, 14 Jul 2019 23:36:11 +0200

> Neigh timer can be scheduled multiple times from userspace adding
> multiple neigh entries and forcing the neigh timer scheduling passing
> NTF_USE in the netlink requests.
> This will result in a refcount leak and in the following dump stack:
 ...
> Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
> receiving a netlink request with NTF_USE flag set
> 
> Reported-by: Marek Majkowski <marek@cloudflare.com>
> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> ---
> Changes since v2:
> - remove check_timer flag and run neigh_del_timer directly
> Changes since v1:
> - fix compilation errors defining neigh_event_send_check_timer routine

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: Request for backport of 96125bf9985a75db00496dd2bc9249b777d2b19b
From: Loganaden Velvindron @ 2019-07-15 18:01 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAOp4FwSB_FRhpf1H0CdkvfgeYKc53E56yMkQViW4_w_9dY0CVg@mail.gmail.com>

On Fri, Jul 5, 2019 at 6:15 PM Loganaden Velvindron <loganaden@gmail.com> wrote:
>
> Hi folks,
>
> I read the guidelines for LTS/stable.
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>
>
> Although this is not a bugfix, I am humbly submitting a request so
> that commit id
> -- 96125bf9985a75db00496dd2bc9249b777d2b19b Allow 0.0.0.0/8 as a valid
> address range --  is backported to all LTS kernels.
>
> My motivation for such a request is that we need this patch to be as
> widely deployed as possible and as early as possible for interop and
> hopefully move into better utilization of ipv4 addresses space. Hence
> my request for it be added to -stable.
>

Any feedback ?

> Kind regards,
> //Logan

^ permalink raw reply

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Masahiro Yamada @ 2019-07-15 17:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Laura Garcia, Randy Dunlap, Stephen Rothwell,
	Linux Next Mailing List, Linux Kernel Mailing List, linux-kbuild,
	netdev@vger.kernel.org, Netfilter Development Mailing list
In-Reply-To: <20190715173341.zth4na7zekjsesaa@salvia>

On Tue, Jul 16, 2019 at 2:33 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Mon, Jul 15, 2019 at 07:28:04PM +0200, Laura Garcia wrote:
> > CC'ing netfilter.
> >
> > On Mon, Jul 15, 2019 at 6:45 PM Randy Dunlap <rdunlap@infradead.org> wrote:
> > >
> > > On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> > > > Hi all,
> > > >
> > > > Please do not add v5.4 material to your linux-next included branches
> > > > until after v5.3-rc1 has been released.
> > > >
> > > > Changes since 20190712:
> > > >
> > >
> > > Hi,
> > >
> > > I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
> > > for include/net/netfilter/nf_tables_offload.h.s:
> > >
> > >   CC      include/net/netfilter/nf_tables_offload.h.s
> [...]
> > > Should this header file not be tested?

This means you must endlessly exclude
headers that include nf_tables.h


> Yes, it should indeed be added.

Adding 'header-test-' is the last resort.


I had already queued a patch:

git clone -b build-test
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git

commit 9dae5c5fc798e0e970cca4cd1b224dece3ad4766
Author: Masahiro Yamada <yamada.masahiro@socionext.com>
Date:   Mon Jul 15 00:42:56 2019 +0900

    netfilter: nf_tables: split local helpers out to nft_internals.h



If it is OK, I will send it out.



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* Re: [PATCH v2 1/2] rt2x00usb: fix rx queue hang
From: Kalle Valo @ 2019-07-15 17:53 UTC (permalink / raw)
  To: Soeren Moch
  Cc: Stanislaw Gruszka, Soeren Moch, stable, Helmut Schaa,
	David S. Miller, linux-wireless, netdev, linux-kernel
In-Reply-To: <20190701105314.9707-1-smoch@web.de>

Soeren Moch <smoch@web.de> wrote:

> Since commit ed194d136769 ("usb: core: remove local_irq_save() around
>  ->complete() handler") the handler rt2x00usb_interrupt_rxdone() is
> not running with interrupts disabled anymore. So this completion handler
> is not guaranteed to run completely before workqueue processing starts
> for the same queue entry.
> Be sure to set all other flags in the entry correctly before marking
> this entry ready for workqueue processing. This way we cannot miss error
> conditions that need to be signalled from the completion handler to the
> worker thread.
> Note that rt2x00usb_work_rxdone() processes all available entries, not
> only such for which queue_work() was called.
> 
> This patch is similar to what commit df71c9cfceea ("rt2x00: fix order
> of entry flags modification") did for TX processing.
> 
> This fixes a regression on a RT5370 based wifi stick in AP mode, which
> suddenly stopped data transmission after some period of heavy load. Also
> stopping the hanging hostapd resulted in the error message "ieee80211
> phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
> Other operation modes are probably affected as well, this just was
> the used testcase.
> 
> Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")
> Cc: stable@vger.kernel.org # 4.20+
> Signed-off-by: Soeren Moch <smoch@web.de>
> Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>

Patch applied to wireless-drivers.git, thanks.

41a531ffa4c5 rt2x00usb: fix rx queue hang

-- 
https://patchwork.kernel.org/patch/11025561/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Michael S. Tsirkin @ 2019-07-15 17:50 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Jason Wang, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190715074416.a3s2i5ausognotbn@steredhat>

On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote:
> On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> > 
> > On 2019/7/12 下午6:00, Stefano Garzarella wrote:
> > > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote:
> > > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote:
> > > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote:
> > > > > > > Hi,
> > > > > > > as Jason suggested some months ago, I looked better at the virtio-net driver to
> > > > > > > understand if we can reuse some parts also in the virtio-vsock driver, since we
> > > > > > > have similar challenges (mergeable buffers, page allocation, small
> > > > > > > packets, etc.).
> > > > > > > 
> > > > > > > Initially, I would add the skbuff in the virtio-vsock in order to re-use
> > > > > > > receive_*() functions.
> > > > > > 
> > > > > > Yes, that will be a good step.
> > > > > > 
> > > > > Okay, I'll go on this way.
> > > > > 
> > > > > > > Then I would move receive_[small, big, mergeable]() and
> > > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net driver, in order to
> > > > > > > call them also from virtio-vsock. I need to do some refactoring (e.g. leave the
> > > > > > > XDP part on the virtio-net driver), but I think it is feasible.
> > > > > > > 
> > > > > > > The idea is to create a virtio-skb.[h,c] where put these functions and a new
> > > > > > > object where stores some attributes needed (e.g. hdr_len ) and status (e.g.
> > > > > > > some fields of struct receive_queue).
> > > > > > 
> > > > > > My understanding is we could be more ambitious here. Do you see any blocker
> > > > > > for reusing virtio-net directly? It's better to reuse not only the functions
> > > > > > but also the logic like NAPI to avoid re-inventing something buggy and
> > > > > > duplicated.
> > > > > > 
> > > > > These are my concerns:
> > > > > - virtio-vsock is not a "net_device", so a lot of code related to
> > > > >    ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, offloading) will be
> > > > >    not used by virtio-vsock.
> > 
> > 
> > Linux support device other than ethernet, so it should not be a problem.
> > 
> > 
> > > > > 
> > > > > - virtio-vsock has a different header. We can consider it as part of
> > > > >    virtio_net payload, but it precludes the compatibility with old hosts. This
> > > > >    was one of the major doubts that made me think about using only the
> > > > >    send/recv skbuff functions, that it shouldn't break the compatibility.
> > 
> > 
> > We can extend the current vnet header helper for it to work for vsock.
> 
> Okay, I'll do it.
> 
> > 
> > 
> > > > > 
> > > > > > > This is an idea of virtio-skb.h that
> > > > > > > I have in mind:
> > > > > > >       struct virtskb;
> > > > > > 
> > > > > > What fields do you want to store in virtskb? It looks to be exist sk_buff is
> > > > > > flexible enough to us?
> > > > > My idea is to store queues information, like struct receive_queue or
> > > > > struct send_queue, and some device attributes (e.g. hdr_len ).
> > 
> > 
> > If you reuse skb or virtnet_info, there is not necessary.
> > 
> 
> Okay.
> 
> > 
> > > > > 
> > > > > > 
> > > > > > >       struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
> > > > > > >       struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
> > > > > > >       struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);
> > > > > > > 
> > > > > > >       int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
> > > > > > >       int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > > > > > >       int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
> > > > > > > 
> > > > > > > For the Guest->Host path it should be easier, so maybe I can add a
> > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
> > > > > > > of xmit_skb().
> > > > > > 
> > > > > > I may miss something, but I don't see any thing that prevents us from using
> > > > > > xmit_skb() directly.
> > > > > > 
> > > > > Yes, but my initial idea was to make it more parametric and not related to the
> > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the
> > > > > 'num_buffers' should be handled by the caller.
> > > > > 
> > > > > > > Let me know if you have in mind better names or if I should put these function
> > > > > > > in another place.
> > > > > > > 
> > > > > > > I would like to leave the control part completely separate, so, for example,
> > > > > > > the two drivers will negotiate the features independently and they will call
> > > > > > > the right virtskb_receive_*() function based on the negotiation.
> > > > > > 
> > > > > > If it's one the issue of negotiation, we can simply change the
> > > > > > virtnet_probe() to deal with different devices.
> > > > > > 
> > > > > > 
> > > > > > > I already started to work on it, but before to do more steps and send an RFC
> > > > > > > patch, I would like to hear your opinion.
> > > > > > > Do you think that makes sense?
> > > > > > > Do you see any issue or a better solution?
> > > > > > 
> > > > > > I still think we need to seek a way of adding some codes on virtio-net.c
> > > > > > directly if there's no huge different in the processing of TX/RX. That would
> > > > > > save us a lot time.
> > > > > After the reading of the buffers from the virtqueue I think the process
> > > > > is slightly different, because virtio-net will interface with the network
> > > > > stack, while virtio-vsock will interface with the vsock-core (socket).
> > > > > So the virtio-vsock implements the following:
> > > > > - control flow mechanism to avoid to loose packets, informing the peer
> > > > >    about the amount of memory available in the receive queue using some
> > > > >    fields in the virtio_vsock_hdr
> > > > > - de-multiplexing parsing the virtio_vsock_hdr and choosing the right
> > > > >    socket depending on the port
> > > > > - socket state handling
> > 
> > 
> > I think it's just a branch, for ethernet, go for networking stack. otherwise
> > go for vsock core?
> > 
> 
> Yes, that should work.
> 
> So, I should refactor the functions that can be called also from the vsock
> core, in order to remove "struct net_device *dev" parameter.
> Maybe creating some wrappers for the network stack.
> 
> Otherwise I should create a fake net_device for vsock_core.
> 
> What do you suggest?

Neither.

I think what Jason was saying all along is this:

virtio net doesn't actually lose packets, at least most
of the time. And it actually most of the time
passes all packets to host. So it's possible to use a virtio net
device (possibly with a feature flag that says "does not lose packets,
all packets go to host") and build vsock on top.

and all of this is nice, but don't expect anything easy,
or any quick results.

Also, in a sense it's a missed opportunity: we could cut out a lot
of fat and see just how fast can a protocol that is completely
new and separate from networking stack go.
Instead vsock implementation carries so much baggage from both
networking stack - such as softirq processing - and itself such as
workqueues, global state and crude locking - to the point where
it's actually slower than TCP.

> > 
> > > > > 
> > > > > We can use the virtio-net as transport, but we should add a lot of
> > > > > code to skip "net device" stuff when it is used by the virtio-vsock.
> > 
> > 
> > This could be another choice, but consider it was not transparent to the
> > admin and require new features, we may seek a transparent solution here.
> > 
> > 
> > > > > This could break something in virtio-net, for this reason, I thought to reuse
> > > > > only the send/recv functions starting from the idea to split the virtio-net
> > > > > driver in two parts:
> > > > > a. one with all stuff related to the network stack
> > > > > b. one with the stuff needed to communicate with the host
> > > > > 
> > > > > And use skbuff to communicate between parts. In this way, virtio-vsock
> > > > > can use only the b part.
> > > > > 
> > > > > Maybe we can do this split in a better way, but I'm not sure it is
> > > > > simple.
> > > > > 
> > > > > Thanks,
> > > > > Stefano
> > > > Frankly, skb is a huge structure which adds a lot of
> > > > overhead. I am not sure that using it is such a great idea
> > > > if building a device that does not have to interface
> > > > with the networking stack.
> > 
> > 
> > I believe vsock is mainly used for stream performance not for PPS. So the
> > impact should be minimal. We can use other metadata, just need branch in
> > recv_xxx().
> > 
> 
> Yes, I think stream performance is the case.
> 
> > 
> > > Thanks for the advice!
> > > 
> > > > So I agree with Jason in theory. To clarify, he is basically saying
> > > > current implementation is all wrong, it should be a protocol and we
> > > > should teach networking stack that there are reliable net devices that
> > > > handle just this protocol. We could add a flag in virtio net that
> > > > will say it's such a device.
> > > > 
> > > > Whether it's doable, I don't know, and it's definitely not simple - in
> > > > particular you will have to also re-implement existing devices in these
> > > > terms, and not just virtio - vmware vsock too.
> > 
> > 
> > Merging vsock protocol to exist networking stack could be a long term goal,
> > I believe for the first phase, we can seek to use virtio-net first.
> >
> 
> Yes, I agree.
> 
> > 
> > > > 
> > > > If you want to do a POC you can add a new address family,
> > > > that's easier.
> > > Very interesting!
> > > I agree with you. In this way we can completely split the protocol
> > > logic, from the device.
> > > 
> > > As you said, it will not simple to do, but can be an opportunity to learn
> > > better the Linux networking stack!
> > > I'll try to do a PoC with AF_VSOCK2 that will use the virtio-net.
> > 
> > 
> > I suggest to do this step by step:
> > 
> > 1) use virtio-net but keep some protocol logic
> > 
> > 2) separate protocol logic and merge it to exist Linux networking stack
> 
> Make sense, thanks for the suggestions, I'll try to do these steps!
> 
> Thanks,
> Stefano


An alternative is look at sources of overhead in vsock and get rid of
them, or rewrite it from scratch focusing on performance.


-- 
MST

^ permalink raw reply

* Re: [PATCH] ath10k: work around uninitialized vht_pfr variable
From: Kalle Valo @ 2019-07-15 17:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Arnd Bergmann, Miaoqing Pan, David S. Miller, Rakesh Pillai,
	Brian Norris, Balaji Pothunoori, Wen Gong, Pradeep kumar Chitrapu,
	Sriram R, ath10k, linux-wireless, netdev, linux-kernel,
	clang-built-linux
In-Reply-To: <20190708125050.3689133-1-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> wrote:

> As clang points out, the vht_pfr is assigned to a struct member
> without being initialized in one case:
> 
> drivers/net/wireless/ath/ath10k/mac.c:7528:7: error: variable 'vht_pfr' is used uninitialized whenever 'if' condition
>       is false [-Werror,-Wsometimes-uninitialized]
>                 if (!ath10k_mac_can_set_bitrate_mask(ar, band, mask,
>                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7551:20: note: uninitialized use occurs here
>                 arvif->vht_pfr = vht_pfr;
>                                  ^~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7528:3: note: remove the 'if' if its condition is always true
>                 if (!ath10k_mac_can_set_bitrate_mask(ar, band, mask,
>                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7483:12: note: initialize the variable 'vht_pfr' to silence this warning
>         u8 vht_pfr;
> 
> Add an explicit but probably incorrect initialization here.
> I suspect we want a better fix here, but chose this approach to
> illustrate the issue.
> 
> Fixes: 8b97b055dc9d ("ath10k: fix failure to set multiple fixed rate")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>

Patch applied to wireless-drivers.git, thanks.

ff414f31ce37 ath10k: work around uninitialized vht_pfr variable

-- 
https://patchwork.kernel.org/patch/11034993/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

* Re: [PATCH iproute2] utils: don't match empty strings as prefixes
From: Stephen Hemminger @ 2019-07-15 17:37 UTC (permalink / raw)
  To: Matteo Croce; +Cc: netdev, David Ahern
In-Reply-To: <CAGnkfhyyJJR0frmO7Z+bviu6xYnJVitw-G0Nzgv9UQ2PYO1goA@mail.gmail.com>

On Sun, 14 Jul 2019 16:57:54 +0200
Matteo Croce <mcroce@redhat.com> wrote:

> On Wed, Jul 10, 2019 at 1:18 AM Matteo Croce <mcroce@redhat.com> wrote:
> >
> > On Tue, Jul 9, 2019 at 11:38 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:  
> > >
> > > On Tue,  9 Jul 2019 22:40:40 +0200
> > > Matteo Croce <mcroce@redhat.com> wrote:
> > >  
> > > > iproute has an utility function which checks if a string is a prefix for
> > > > another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
> > > > instead of 'address'.
> > > >
> > > > This routine unfortunately considers an empty string as prefix
> > > > of any pattern, leading to undefined behaviour when an empty
> > > > argument is passed to ip:
> > > >
> > > >     # ip ''
> > > >     1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> > > >         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > >         inet 127.0.0.1/8 scope host lo
> > > >            valid_lft forever preferred_lft forever
> > > >         inet6 ::1/128 scope host
> > > >            valid_lft forever preferred_lft forever
> > > >
> > > >     # tc ''
> > > >     qdisc noqueue 0: dev lo root refcnt 2
> > > >
> > > >     # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
> > > >     # ip addr show dev dummy0
> > > >     6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > > >         link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
> > > >         inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
> > > >            valid_lft forever preferred_lft forever
> > > >
> > > > Rewrite matches() so it takes care of an empty input, and doesn't
> > > > scan the input strings three times: the actual implementation
> > > > does 2 strlen and a memcpy to accomplish the same task.
> > > >
> > > > Signed-off-by: Matteo Croce <mcroce@redhat.com>
> > > > ---
> > > >  include/utils.h |  2 +-
> > > >  lib/utils.c     | 14 +++++++++-----
> > > >  2 files changed, 10 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/include/utils.h b/include/utils.h
> > > > index 927fdc17..f4d12abb 100644
> > > > --- a/include/utils.h
> > > > +++ b/include/utils.h
> > > > @@ -198,7 +198,7 @@ int nodev(const char *dev);
> > > >  int check_ifname(const char *);
> > > >  int get_ifname(char *, const char *);
> > > >  const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
> > > > -int matches(const char *arg, const char *pattern);
> > > > +int matches(const char *prefix, const char *string);
> > > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
> > > >  int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
> > > >
> > > > diff --git a/lib/utils.c b/lib/utils.c
> > > > index be0f11b0..73ce19bb 100644
> > > > --- a/lib/utils.c
> > > > +++ b/lib/utils.c
> > > > @@ -887,13 +887,17 @@ const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
> > > >       return name;
> > > >  }
> > > >
> > > > -int matches(const char *cmd, const char *pattern)
> > > > +/* Check if 'prefix' is a non empty prefix of 'string' */
> > > > +int matches(const char *prefix, const char *string)
> > > >  {
> > > > -     int len = strlen(cmd);
> > > > +     if (!*prefix)
> > > > +             return 1;
> > > > +     while(*string && *prefix == *string) {
> > > > +             prefix++;
> > > > +             string++;
> > > > +     }
> > > >
> > > > -     if (len > strlen(pattern))
> > > > -             return -1;
> > > > -     return memcmp(pattern, cmd, len);
> > > > +     return *prefix;
> > > >  }
> > > >
> > > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits)  
> > >
> > > ERROR: space required before the open parenthesis '('
> > > #134: FILE: lib/utils.c:895:
> > > +       while(*string && *prefix == *string) {
> > >
> > > total: 1 errors, 1 warnings, 30 lines checked
> > >
> > > The empty prefix string is a bug and should not be allowed.
> > > Also return value should be same as old code (yours isn't).
> > >
> > >
> > >  
> >
> > The old return value was the difference between the first pair of
> > bytes, according to the memcmp manpage.
> > All calls only checks if the matches() return value is 0 or not 0:
> >
> > iproute2$ git grep 'matches(' |grep -v -e '== 0' -e '= 0' -e '!matches('
> > include/utils.h:int matches(const char *prefix, const char *string);
> > include/xtables.h:extern void xtables_register_matches(struct
> > xtables_match *, unsigned int);
> > lib/color.c:    if (matches(dup, "-color"))
> > lib/utils.c:int matches(const char *prefix, const char *string)
> > tc/tc.c:                if (matches(argv[0], iter->c))
> >
> > Is it a problem if it returns a non negative value for non matching strings?
> >
> > Regards,
> >
> >
> > --
> > Matteo Croce
> > per aspera ad upstream  
> 
> Hi Stephen,
> 
> should I send a v2 which keeps the old behaviour, even if noone checks
> for all the values?
> Just to clarify, the old behaviour of matches(cmd, pattern) was:
> 
> -1 if len(cmd) > len(pattern)
> 0 if pattern is equal to cmd
> 0 if pattern starts with cmd
> < 0 if pattern is alphabetically lower than cmd
> > 0 if pattern is alphabetically higher than cmd  
> 
> Regards,

Maybe time to make matches() into a boolean since that is how it is used.

^ permalink raw reply

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Pablo Neira Ayuso @ 2019-07-15 17:33 UTC (permalink / raw)
  To: Laura Garcia
  Cc: Randy Dunlap, Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, linux-kbuild, Masahiro Yamada,
	netdev@vger.kernel.org, Netfilter Development Mailing list
In-Reply-To: <CAF90-WirEMg7arNOTmo+tyJ20rt_zeN=nr0OO6Qk0Ss8J4QrUA@mail.gmail.com>

On Mon, Jul 15, 2019 at 07:28:04PM +0200, Laura Garcia wrote:
> CC'ing netfilter.
> 
> On Mon, Jul 15, 2019 at 6:45 PM Randy Dunlap <rdunlap@infradead.org> wrote:
> >
> > On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> > > Hi all,
> > >
> > > Please do not add v5.4 material to your linux-next included branches
> > > until after v5.3-rc1 has been released.
> > >
> > > Changes since 20190712:
> > >
> >
> > Hi,
> >
> > I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
> > for include/net/netfilter/nf_tables_offload.h.s:
> >
> >   CC      include/net/netfilter/nf_tables_offload.h.s
[...]
> > Should this header file not be tested?

Yes, it should indeed be added.

^ permalink raw reply

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Laura Garcia @ 2019-07-15 17:28 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, linux-kbuild, Masahiro Yamada,
	netdev@vger.kernel.org, Netfilter Development Mailing list
In-Reply-To: <ccb5b818-c191-2d9e-311f-b2c79b7f6823@infradead.org>

CC'ing netfilter.

On Mon, Jul 15, 2019 at 6:45 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> > Hi all,
> >
> > Please do not add v5.4 material to your linux-next included branches
> > until after v5.3-rc1 has been released.
> >
> > Changes since 20190712:
> >
>
> Hi,
>
> I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
> for include/net/netfilter/nf_tables_offload.h.s:
>
>   CC      include/net/netfilter/nf_tables_offload.h.s
> In file included from ./../include/net/netfilter/nf_tables_offload.h:5:0,
>                  from <command-line>:0:
> ../include/net/netfilter/nf_tables.h: In function ‘nft_gencursor_next’:
> ../include/net/netfilter/nf_tables.h:1223:14: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return net->nft.gencursor + 1 == 1 ? 1 : 0;
>               ^~~
>               nf
> In file included from ../include/linux/kernel.h:11:0,
>                  from ../include/net/flow_offload.h:4,
>                  from ./../include/net/netfilter/nf_tables_offload.h:4,
>                  from <command-line>:0:
> ../include/net/netfilter/nf_tables.h: In function ‘nft_genmask_cur’:
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:261:17: note: in definition of macro ‘__READ_ONCE’
>   union { typeof(x) __val; char __c[1]; } __u;   \
>                  ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:263:22: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size(&(x), __u.__c, sizeof(x));  \
>                       ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:263:42: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size(&(x), __u.__c, sizeof(x));  \
>                                           ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:265:30: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
>                               ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:265:50: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
>                                                   ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> make[2]: *** [../scripts/Makefile.build:304: include/net/netfilter/nf_tables_offload.h.s] Error 1
>
>
> Should this header file not be tested?
>
> thanks.
> --
> ~Randy

^ permalink raw reply

* Re: INFO: task hung in unregister_netdevice_notifier (3)
From: Oliver Hartkopp @ 2019-07-15 17:16 UTC (permalink / raw)
  To: syzbot, davem, linux-can, linux-kernel, mkl, netdev,
	syzkaller-bugs, Kirill Tkhai
In-Reply-To: <000000000000d018ea058d9c46e3@google.com>

Hello all,

On 14.07.19 06:07, syzbot wrote:
> syzbot has found a reproducer for the following crash on:

the internal users of the CAN networking subsystem like CAN_BCM and 
CAN_RAW hold a number of CAN identifier subscriptions ('filters') for 
CAN netdevices (only type ARPHRD_CAN) in their socket data structures.

The per-socket netdevice notifier is used to manage the ad-hoc removal 
of these filters at netdevice removal time.

What I can see in the console output at

https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000

seems to be a race between an unknown register_netdevice_notifier() call 
("A") and the unregister_netdevice_notifier() ("B") likely invoked by 
bcm_release() ("C"):

[ 1047.294207][ T1049]  schedule+0xa8/0x270
[ 1047.318401][ T1049]  rwsem_down_write_slowpath+0x70a/0xf70
[ 1047.324114][ T1049]  ? downgrade_write+0x3c0/0x3c0
[ 1047.438644][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.443483][ T1049]  ? lock_acquire+0x190/0x410
[ 1047.448191][ T1049]  ? unregister_netdevice_notifier+0x7e/0x390
[ 1047.547227][ T1049]  down_write+0x13c/0x150
[ 1047.579535][ T1049]  ? down_write+0x13c/0x150
[ 1047.584106][ T1049]  ? __down_timeout+0x2d0/0x2d0
[ 1047.635356][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.640721][ T1049]  unregister_netdevice_notifier+0x7e/0x390  <- "B"
[ 1047.646667][ T1049]  ? __sock_release+0x89/0x280
[ 1047.709126][ T1049]  ? register_netdevice_notifier+0x630/0x630 <- "A"
[ 1047.715203][ T1049]  ? __kasan_check_write+0x14/0x20
[ 1047.775138][ T1049]  bcm_release+0x93/0x5e0                    <- "C"
[ 1047.795337][ T1049]  __sock_release+0xce/0x280
[ 1047.829016][ T1049]  sock_close+0x1e/0x30

The question to me is now:

Is the problem located in an (un)register_netdevice_notifier race OR is 
it generally a bad idea to call unregister_netdevice_notifier() in a 
sock release?

I've never seen that kind of problem in the wild. But if it would be the 
latter case wouldn't it be the same problem when someone unloads the 
kernel module at the 'wrong' time?

In commit 328fbe747ad46 ("net: Close race between {un, 
}register_netdevice_notifier() and setup_net()/cleanup_net()") Kirill 
Tkhai reviewed the calling site in CAN_RAW raw_release() which points to 
the same situation. Therefore added him to the recipient list.

Should down_write() be replaced with something like 
rwsem_down_write_slowpath()??

Regards,
Oliver

> HEAD commit:    a2d79c71 Merge tag 'for-5.3/io_uring-20190711' of 
> git://gi..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3539b1747f03988e
> dashboard link: 
> https://syzkaller.appspot.com/bug?extid=0f1827363a305f74996f
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1765c52fa00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com
> 
> INFO: task syz-executor.4:9527 blocked for more than 143 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.4  D28136  9527   9356 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.2:9528 blocked for more than 145 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.2  D28136  9528   9354 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 
> 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 
> c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.0:9529 blocked for more than 147 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.0  D28136  9529   9353 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.5:9533 blocked for more than 148 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.5  D28136  9533   9358 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 
> 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 
> c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.1:9534 blocked for more than 148 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.1  D28136  9534   9359 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.3:9535 blocked for more than 150 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.3  D28136  9535   9351 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1049:
>   #0: 00000000ede263b0 (rcu_read_lock){....}, at: 
> debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5257
> 1 lock held by rsyslogd/9208:
>   #0: 00000000da20b59a (&f->f_pos_lock){+.+.}, at: 
> __fdget_pos+0xee/0x110 fs/file.c:801
> 2 locks held by getty/9298:
>   #0: 00000000e9efae0d (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000007287a12 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9299:
>   #0: 00000000ad0733b0 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000094dd5193 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9300:
>   #0: 00000000692c340f (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 00000000538c7d7d (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9301:
>   #0: 00000000116ea6c7 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 00000000a908a9f7 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9302:
>   #0: 0000000042704f01 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000041cc8671 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9303:
>   #0: 000000001ef3b293 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 000000008b703302 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9304:
>   #0: 0000000095601bb0 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
> 

^ permalink raw reply

* Re: [PATCH bpf 0/5] bpf: allow wide (u64) aligned loads for some fields of bpf_sock_addr
From: Andrii Nakryiko @ 2019-07-15 17:16 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

On Mon, Jul 15, 2019 at 9:40 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> When fixing selftests by adding support for wide stores, Yonghong
> reported that he had seen some examples where clang generates
> single u64 loads for two adjacent u32s as well:
> http://lore.kernel.org/netdev/a66c937f-94c0-eaf8-5b37-8587d66c0c62@fb.com
>
> Let's support aligned u64 reads for some bpf_sock_addr fields
> as well.
>
> (This can probably wait for bpf-next, I'll defer to Younhong and the
> maintainers.)
>
> Cc: Yonghong Song <yhs@fb.com>
>
> Stanislav Fomichev (5):
>   bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok
>   bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and
>     msg_src_ip6
>   selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.c
>   selftests/bpf: add selftests for wide loads
>   bpf: sync bpf.h to tools/
>

LGTM!

For the series:

Acked-by: Andrii Narkyiko <andriin@fb.com>

>  include/linux/filter.h                        |  2 +-
>  include/uapi/linux/bpf.h                      |  4 +-
>  net/core/filter.c                             | 24 ++++--
>  tools/include/uapi/linux/bpf.h                |  4 +-
>  .../selftests/bpf/verifier/wide_access.c      | 73 +++++++++++++++++++
>  .../selftests/bpf/verifier/wide_store.c       | 36 ---------
>  6 files changed, 95 insertions(+), 48 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/verifier/wide_access.c
>  delete mode 100644 tools/testing/selftests/bpf/verifier/wide_store.c
>
> --
> 2.22.0.510.g264f2c817a-goog

^ permalink raw reply

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Randy Dunlap @ 2019-07-15 16:43 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, linux-kbuild, Masahiro Yamada,
	netdev@vger.kernel.org
In-Reply-To: <20190715144848.4cc41e07@canb.auug.org.au>

On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Please do not add v5.4 material to your linux-next included branches
> until after v5.3-rc1 has been released.
> 
> Changes since 20190712:
> 

Hi,

I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
for include/net/netfilter/nf_tables_offload.h.s:

  CC      include/net/netfilter/nf_tables_offload.h.s
In file included from ./../include/net/netfilter/nf_tables_offload.h:5:0,
                 from <command-line>:0:
../include/net/netfilter/nf_tables.h: In function ‘nft_gencursor_next’:
../include/net/netfilter/nf_tables.h:1223:14: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return net->nft.gencursor + 1 == 1 ? 1 : 0;
              ^~~
              nf
In file included from ../include/linux/kernel.h:11:0,
                 from ../include/net/flow_offload.h:4,
                 from ./../include/net/netfilter/nf_tables_offload.h:4,
                 from <command-line>:0:
../include/net/netfilter/nf_tables.h: In function ‘nft_genmask_cur’:
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:261:17: note: in definition of macro ‘__READ_ONCE’
  union { typeof(x) __val; char __c[1]; } __u;   \
                 ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:263:22: note: in definition of macro ‘__READ_ONCE’
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                      ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:263:42: note: in definition of macro ‘__READ_ONCE’
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                                          ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:265:30: note: in definition of macro ‘__READ_ONCE’
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                              ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:265:50: note: in definition of macro ‘__READ_ONCE’
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                                                  ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
make[2]: *** [../scripts/Makefile.build:304: include/net/netfilter/nf_tables_offload.h.s] Error 1


Should this header file not be tested?

thanks.
-- 
~Randy

^ permalink raw reply

* [PATCH bpf 5/5] bpf: sync bpf.h to tools/
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Update bpf_sock_addr comments to indicate support for 8-byte reads
from user_ip6 and msg_src_ip6.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f506c68b2612..1f61374fcf81 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3245,7 +3245,7 @@ struct bpf_sock_addr {
 	__u32 user_ip4;		/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 user_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 user_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__u32 user_port;	/* Allows 4-byte read and write.
@@ -3257,7 +3257,7 @@ struct bpf_sock_addr {
 	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 4/5] selftests/bpf: add selftests for wide loads
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Mirror existing wide store tests with wide loads. The only significant
difference is expected error string.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/verifier/wide_access.c      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/selftests/bpf/verifier/wide_access.c b/tools/testing/selftests/bpf/verifier/wide_access.c
index 3ac97328432f..ccade9312d21 100644
--- a/tools/testing/selftests/bpf/verifier/wide_access.c
+++ b/tools/testing/selftests/bpf/verifier/wide_access.c
@@ -34,3 +34,40 @@ BPF_SOCK_ADDR_STORE(msg_src_ip6, 3, REJECT,
 		    "invalid bpf_context access off=56 size=8"),
 
 #undef BPF_SOCK_ADDR_STORE
+
+#define BPF_SOCK_ADDR_LOAD(field, off, res, err) \
+{ \
+	"wide load from bpf_sock_addr." #field "[" #off "]", \
+	.insns = { \
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, \
+		    offsetof(struct bpf_sock_addr, field[off])), \
+	BPF_MOV64_IMM(BPF_REG_0, 1), \
+	BPF_EXIT_INSN(), \
+	}, \
+	.result = res, \
+	.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, \
+	.expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, \
+	.errstr = err, \
+}
+
+/* user_ip6[0] is u64 aligned */
+BPF_SOCK_ADDR_LOAD(user_ip6, 0, ACCEPT,
+		   NULL),
+BPF_SOCK_ADDR_LOAD(user_ip6, 1, REJECT,
+		   "invalid bpf_context access off=12 size=8"),
+BPF_SOCK_ADDR_LOAD(user_ip6, 2, ACCEPT,
+		   NULL),
+BPF_SOCK_ADDR_LOAD(user_ip6, 3, REJECT,
+		   "invalid bpf_context access off=20 size=8"),
+
+/* msg_src_ip6[0] is _not_ u64 aligned */
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 0, REJECT,
+		   "invalid bpf_context access off=44 size=8"),
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 1, ACCEPT,
+		   NULL),
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 2, REJECT,
+		   "invalid bpf_context access off=52 size=8"),
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 3, REJECT,
+		   "invalid bpf_context access off=56 size=8"),
+
+#undef BPF_SOCK_ADDR_LOAD
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 3/5] selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.c
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Move the file and rename internal BPF_SOCK_ADDR define to
BPF_SOCK_ADDR_STORE. This selftest will be extended in the next commit
with the wide loads.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/verifier/wide_access.c      | 36 +++++++++++++++++++
 .../selftests/bpf/verifier/wide_store.c       | 36 -------------------
 2 files changed, 36 insertions(+), 36 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/wide_access.c
 delete mode 100644 tools/testing/selftests/bpf/verifier/wide_store.c

diff --git a/tools/testing/selftests/bpf/verifier/wide_access.c b/tools/testing/selftests/bpf/verifier/wide_access.c
new file mode 100644
index 000000000000..3ac97328432f
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/wide_access.c
@@ -0,0 +1,36 @@
+#define BPF_SOCK_ADDR_STORE(field, off, res, err) \
+{ \
+	"wide store to bpf_sock_addr." #field "[" #off "]", \
+	.insns = { \
+	BPF_MOV64_IMM(BPF_REG_0, 1), \
+	BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, \
+		    offsetof(struct bpf_sock_addr, field[off])), \
+	BPF_EXIT_INSN(), \
+	}, \
+	.result = res, \
+	.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, \
+	.expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, \
+	.errstr = err, \
+}
+
+/* user_ip6[0] is u64 aligned */
+BPF_SOCK_ADDR_STORE(user_ip6, 0, ACCEPT,
+		    NULL),
+BPF_SOCK_ADDR_STORE(user_ip6, 1, REJECT,
+		    "invalid bpf_context access off=12 size=8"),
+BPF_SOCK_ADDR_STORE(user_ip6, 2, ACCEPT,
+		    NULL),
+BPF_SOCK_ADDR_STORE(user_ip6, 3, REJECT,
+		    "invalid bpf_context access off=20 size=8"),
+
+/* msg_src_ip6[0] is _not_ u64 aligned */
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 0, REJECT,
+		    "invalid bpf_context access off=44 size=8"),
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 1, ACCEPT,
+		    NULL),
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 2, REJECT,
+		    "invalid bpf_context access off=52 size=8"),
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 3, REJECT,
+		    "invalid bpf_context access off=56 size=8"),
+
+#undef BPF_SOCK_ADDR_STORE
diff --git a/tools/testing/selftests/bpf/verifier/wide_store.c b/tools/testing/selftests/bpf/verifier/wide_store.c
deleted file mode 100644
index 8fe99602ded4..000000000000
--- a/tools/testing/selftests/bpf/verifier/wide_store.c
+++ /dev/null
@@ -1,36 +0,0 @@
-#define BPF_SOCK_ADDR(field, off, res, err) \
-{ \
-	"wide store to bpf_sock_addr." #field "[" #off "]", \
-	.insns = { \
-	BPF_MOV64_IMM(BPF_REG_0, 1), \
-	BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, \
-		    offsetof(struct bpf_sock_addr, field[off])), \
-	BPF_EXIT_INSN(), \
-	}, \
-	.result = res, \
-	.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, \
-	.expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, \
-	.errstr = err, \
-}
-
-/* user_ip6[0] is u64 aligned */
-BPF_SOCK_ADDR(user_ip6, 0, ACCEPT,
-	      NULL),
-BPF_SOCK_ADDR(user_ip6, 1, REJECT,
-	      "invalid bpf_context access off=12 size=8"),
-BPF_SOCK_ADDR(user_ip6, 2, ACCEPT,
-	      NULL),
-BPF_SOCK_ADDR(user_ip6, 3, REJECT,
-	      "invalid bpf_context access off=20 size=8"),
-
-/* msg_src_ip6[0] is _not_ u64 aligned */
-BPF_SOCK_ADDR(msg_src_ip6, 0, REJECT,
-	      "invalid bpf_context access off=44 size=8"),
-BPF_SOCK_ADDR(msg_src_ip6, 1, ACCEPT,
-	      NULL),
-BPF_SOCK_ADDR(msg_src_ip6, 2, REJECT,
-	      "invalid bpf_context access off=52 size=8"),
-BPF_SOCK_ADDR(msg_src_ip6, 3, REJECT,
-	      "invalid bpf_context access off=56 size=8"),
-
-#undef BPF_SOCK_ADDR
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 2/5] bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and msg_src_ip6
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Add explicit check for u64 loads of user_ip6 and msg_src_ip6 and
update the comment.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/uapi/linux/bpf.h |  4 ++--
 net/core/filter.c        | 12 +++++++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6f68438aa4ed..81be929b89fc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3248,7 +3248,7 @@ struct bpf_sock_addr {
 	__u32 user_ip4;		/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 user_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 user_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__u32 user_port;	/* Allows 4-byte read and write.
@@ -3260,7 +3260,7 @@ struct bpf_sock_addr {
 	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
diff --git a/net/core/filter.c b/net/core/filter.c
index c5983ddb1a9f..0f6854ccf894 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6884,9 +6884,19 @@ static bool sock_addr_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4):
 	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
 				msg_src_ip6[3]):
-		/* Only narrow read access allowed for now. */
 		if (type == BPF_READ) {
 			bpf_ctx_record_field_size(info, size_default);
+
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   user_ip6))
+				return true;
+
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   msg_src_ip6))
+				return true;
+
 			if (!bpf_ctx_narrow_access_ok(off, size, size_default))
 				return false;
 		} else {
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 1/5] bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok to indicate
that it can be used for both loads and stores.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/filter.h |  2 +-
 net/core/filter.c      | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6d944369ca87..ff65d22cf336 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -747,7 +747,7 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 	return size <= size_default && (size & (size - 1)) == 0;
 }
 
-#define bpf_ctx_wide_store_ok(off, size, type, field)			\
+#define bpf_ctx_wide_access_ok(off, size, type, field)			\
 	(size == sizeof(__u64) &&					\
 	off >= offsetof(type, field) &&					\
 	off + sizeof(__u64) <= offsetofend(type, field) &&		\
diff --git a/net/core/filter.c b/net/core/filter.c
index 47f6386fb17a..c5983ddb1a9f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6890,14 +6890,14 @@ static bool sock_addr_is_valid_access(int off, int size,
 			if (!bpf_ctx_narrow_access_ok(off, size, size_default))
 				return false;
 		} else {
-			if (bpf_ctx_wide_store_ok(off, size,
-						  struct bpf_sock_addr,
-						  user_ip6))
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   user_ip6))
 				return true;
 
-			if (bpf_ctx_wide_store_ok(off, size,
-						  struct bpf_sock_addr,
-						  msg_src_ip6))
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   msg_src_ip6))
 				return true;
 
 			if (size != size_default)
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 0/5] bpf: allow wide (u64) aligned loads for some fields of bpf_sock_addr
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song

When fixing selftests by adding support for wide stores, Yonghong
reported that he had seen some examples where clang generates
single u64 loads for two adjacent u32s as well:
http://lore.kernel.org/netdev/a66c937f-94c0-eaf8-5b37-8587d66c0c62@fb.com

Let's support aligned u64 reads for some bpf_sock_addr fields
as well.

(This can probably wait for bpf-next, I'll defer to Younhong and the
maintainers.)

Cc: Yonghong Song <yhs@fb.com>

Stanislav Fomichev (5):
  bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok
  bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and
    msg_src_ip6
  selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.c
  selftests/bpf: add selftests for wide loads
  bpf: sync bpf.h to tools/

 include/linux/filter.h                        |  2 +-
 include/uapi/linux/bpf.h                      |  4 +-
 net/core/filter.c                             | 24 ++++--
 tools/include/uapi/linux/bpf.h                |  4 +-
 .../selftests/bpf/verifier/wide_access.c      | 73 +++++++++++++++++++
 .../selftests/bpf/verifier/wide_store.c       | 36 ---------
 6 files changed, 95 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/wide_access.c
 delete mode 100644 tools/testing/selftests/bpf/verifier/wide_store.c

-- 
2.22.0.510.g264f2c817a-goog

^ permalink raw reply

* Re: [PATCH net v3] net: neigh: fix multiple neigh timer scheduling
From: David Ahern @ 2019-07-15 16:32 UTC (permalink / raw)
  To: Lorenzo Bianconi, davem; +Cc: netdev, marek
In-Reply-To: <552d7c8de6a07e12f7b76791da953e81478138cd.1563134704.git.lorenzo.bianconi@redhat.com>

On 7/14/19 3:36 PM, Lorenzo Bianconi wrote:
> Neigh timer can be scheduled multiple times from userspace adding
> multiple neigh entries and forcing the neigh timer scheduling passing
> NTF_USE in the netlink requests.
> This will result in a refcount leak and in the following dump stack:
> 

...

> 
> Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
> receiving a netlink request with NTF_USE flag set
> 
> Reported-by: Marek Majkowski <marek@cloudflare.com>
> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> ---
> Changes since v2:
> - remove check_timer flag and run neigh_del_timer directly
> Changes since v1:
> - fix compilation errors defining neigh_event_send_check_timer routine
> ---
>  net/core/neighbour.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@gmail.com>


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox