Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/9] rcu: Add support for consolidated-RCU reader checking (v3)
From: Paul E. McKenney @ 2019-07-17  0:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716220205.GB172157@google.com>

On Tue, Jul 16, 2019 at 06:02:05PM -0400, Joel Fernandes wrote:
> On Tue, Jul 16, 2019 at 11:53:03AM -0700, Paul E. McKenney wrote:
> [snip]
> > > > A few more things below.
> > > > > ---
> > > > >  include/linux/rculist.h  | 28 ++++++++++++++++++++-----
> > > > >  include/linux/rcupdate.h |  7 +++++++
> > > > >  kernel/rcu/Kconfig.debug | 11 ++++++++++
> > > > >  kernel/rcu/update.c      | 44 ++++++++++++++++++++++++----------------
> > > > >  4 files changed, 67 insertions(+), 23 deletions(-)
> > > > > 
> > > > > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > > > > index e91ec9ddcd30..1048160625bb 100644
> > > > > --- a/include/linux/rculist.h
> > > > > +++ b/include/linux/rculist.h
> > > > > @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
> > > > >   */
> > > > >  #define list_next_rcu(list)	(*((struct list_head __rcu **)(&(list)->next)))
> > > > >  
> > > > > +/*
> > > > > + * Check during list traversal that we are within an RCU reader
> > > > > + */
> > > > > +
> > > > > +#ifdef CONFIG_PROVE_RCU_LIST
> > > > 
> > > > This new Kconfig option is OK temporarily, but unless there is reason to
> > > > fear malfunction that a few weeks of rcutorture, 0day, and -next won't
> > > > find, it would be better to just use CONFIG_PROVE_RCU.  The overall goal
> > > > is to reduce the number of RCU knobs rather than grow them, must though
> > > > history might lead one to believe otherwise.  :-/
> > > 
> > > If you want, we can try to drop this option and just use PROVE_RCU however I
> > > must say there may be several warnings that need to be fixed in a short
> > > period of time (even a few weeks may be too short) considering the 1000+
> > > uses of RCU lists.
> > Do many people other than me build with CONFIG_PROVE_RCU?  If so, then
> > that would be a good reason for a temporary CONFIG_PROVE_RCU_LIST,
> > as in going away in a release or two once the warnings get fixed.
> 
> PROVE_RCU is enabled by default with PROVE_LOCKING, so it is used quite
> heavilty.
> 
> > > But I don't mind dropping it and it may just accelerate the fixing up of all
> > > callers.
> > 
> > I will let you decide based on the above question.  But if you have
> > CONFIG_PROVE_RCU_LIST, as noted below, it needs to depend on RCU_EXPERT.
> 
> Ok, will make it depend. But yes for temporary purpose, I will leave it as a
> config and remove it later.

Very good, thank you!  Plus you got another ack.  ;-)

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH bpf 1/2] selftests/bpf: fix test_verifier/test_maps make dependencies
From: Stanislav Fomichev @ 2019-07-17  0:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann,
	Alexei Starovoitov, Kernel Team, Ilya Leoshkevich,
	Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <CAEf4BzY7NYZSGuRHVye_CerZ1BBBLsDyOT2ar5sBXsPGy8g0xA@mail.gmail.com>

On 07/16, Andrii Nakryiko wrote:
> On Tue, Jul 16, 2019 at 3:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 07/16, Andrii Nakryiko wrote:
> > > On Tue, Jul 16, 2019 at 12:55 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 07/16, Andrii Nakryiko wrote:
> > > > > e46fc22e60a4 ("selftests/bpf: make directory prerequisites order-only")
> > > > > exposed existing problem in Makefile for test_verifier and test_maps tests:
> > > > > their dependency on auto-generated header file with a list of all tests wasn't
> > > > > recorded explicitly. This patch fixes these issues.
> > > > Why adding it explicitly fixes it? At least for test_verifier, we have
> > > > the following rule:
> > > >
> > > >         test_verifier.c: $(VERIFIER_TESTS_H)
> > > >
> > > > And there should be implicit/builtin test_verifier -> test_verifier.c
> > > > dependency rule.
> > > >
> > > > Same for maps, I guess:
> > > >
> > > >         $(OUTPUT)/test_maps: map_tests/*.c
> > > >         test_maps.c: $(MAP_TESTS_H)
> > > >
> > > > So why is it not working as is? What I'm I missing?
> > >
> > > I don't know exactly why it's not working, but it's clearly because of
> > > that. It's the only difference between how test_progs are set up,
> > > which didn't break, and test_maps/test_verifier, which did.
> > >
> > > Feel free to figure it out through a maze of Makefiles why it didn't
> > > work as expected, but this definitely fixed a breakage (at least for
> > > me).
> > Agreed on not wasting time. I took a brief look (with make -qp) and I
> > don't have any clue.
> 
> Oh, -qp is cool, noted :)
> 
> >
> > By default implicit matching doesn't work:
> > # makefile (from 'Makefile', line 261)
> > /linux/tools/testing/selftests/bpf/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
> > /linux/tools/testing/selftests/bpf/test_maps: map_tests/sk_storage_map.c /linux/tools/testing/selftests/bpf/test_stub.o /linux/tools/testing/selftests/bpf/libbpf.a
> > #  Implicit rule search has not been done.
> > #  File is an intermediate prerequisite.
> > #  Modification time never checked.
> > #  File has not been updated.
> > # variable set hash-table stats:
> > # Load=1/32=3%, Rehash=0, Collisions=0/2=0%
> >
> > If I comment out the following line:
> > $(TEST_GEN_PROGS): $(OUTPUT)/test_stub.o $(BPFOBJ)
> >
> > Then it works:
> > # makefile (from 'Makefile', line 261)
> > /linux/tools/testing/selftests/bpf/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
> > /linux/tools/testing/selftests/bpf/test_maps: test_maps.c map_tests/sk_storage_map.c
> > #  Implicit rule search has been done.
> > #  Implicit/static pattern stem: 'test_maps'
> > #  File is an intermediate prerequisite.
> > #  File does not exist.
> > #  File has not been updated.
> > # variable set hash-table stats:
> > # Load=1/32=3%, Rehash=0, Collisions=0/2=0%
> > #  recipe to execute (from '../lib.mk', line 138):
> >         $(LINK.c) $^ $(LDLIBS) -o $@
> >
> > It's because "File is an intermediate prerequisite.", but I
> > don't see how it's is a intermediate prerequisite for anything...
> 
> Well, it's "File is an intermediate prerequisite." in both cases, so I
> don't know if that's it.
> Makefiles is not my strong suit, honestly, and definitely not an area
> of passion, so no idea
> 
> >
> >
> > One other optional suggestion I have to your second patch: maybe drop all
> > those dependencies on the directories altogether? Why not do the following
> > instead, for example (same for test_progs/test_maps)?
> 
> Some of those _TEST_DIR's are dependencies of multiple targets, so
> you'd need to replicate that `mkdir -p $@` in multiple places, which
> is annoying.
Agreed, but one single "mkdir -p $@" might be better than having three
lines that define XXX_TEST_DIR and then add a target that mkdir's it.
Up to you, I can give it a try once your changes are in; the
Makefile becomes hairier by the day, thx for cleaning it up a bit :-)

> But I also don't think we need to worry about creating them, because
> there is always at least one test (otherwise those tests are useless
> anyways) in those directories, so we might as well just remove those
> dependencies, I guess.
We probably care about them for "make -C tools/selftests O=$PWD/xyz" case
where OUTPUT would point to a fresh/clean directory.

> TBH, those explicit dependencies are good to specify anyways, so I
> think this fix is good. Second patch just makes the layout of
> dependencies similar, so it's easier to spot differences like this
> one.
> 
> As usual, I'll let Alexei and Daniel decide which one to apply (or if
> we need to take some other approach to fixing this).
I agree, both of your changes look good. I was just trying to understand
what's happening because I was assuming implicit rules to always kick
in.

> > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > index 1296253b3422..c2d087ce6d4b 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -277,12 +277,9 @@ VERIFIER_TESTS_H := $(OUTPUT)/verifier/tests.h
> >  test_verifier.c: $(VERIFIER_TESTS_H)
> >  $(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
> >
> > -VERIFIER_TESTS_DIR = $(OUTPUT)/verifier
> > -$(VERIFIER_TESTS_DIR):
> > -       mkdir -p $@
> > -
> >  VERIFIER_TEST_FILES := $(wildcard verifier/*.c)
> > -$(OUTPUT)/verifier/tests.h: $(VERIFIER_TEST_FILES) | $(VERIFIER_TESTS_DIR)
> > +$(OUTPUT)/verifier/tests.h: $(VERIFIER_TEST_FILES)
> > +       mkdir -p $(dir $@)
> >         $(shell ( cd verifier/; \
> >                   echo '/* Generated header, do not edit */'; \
> >                   echo '#ifdef FILL_ARRAY'; \

^ permalink raw reply

* Re: [PATCH bpf 1/2] selftests/bpf: fix test_verifier/test_maps make dependencies
From: Andrii Nakryiko @ 2019-07-17  0:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann,
	Alexei Starovoitov, Kernel Team, Ilya Leoshkevich,
	Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <20190717001457.GD14834@mini-arch>

On Tue, Jul 16, 2019 at 5:15 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 07/16, Andrii Nakryiko wrote:
> > On Tue, Jul 16, 2019 at 3:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 07/16, Andrii Nakryiko wrote:
> > > > On Tue, Jul 16, 2019 at 12:55 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > >
> > > > > On 07/16, Andrii Nakryiko wrote:
> > > > > > e46fc22e60a4 ("selftests/bpf: make directory prerequisites order-only")
> > > > > > exposed existing problem in Makefile for test_verifier and test_maps tests:
> > > > > > their dependency on auto-generated header file with a list of all tests wasn't
> > > > > > recorded explicitly. This patch fixes these issues.
> > > > > Why adding it explicitly fixes it? At least for test_verifier, we have
> > > > > the following rule:
> > > > >
> > > > >         test_verifier.c: $(VERIFIER_TESTS_H)
> > > > >
> > > > > And there should be implicit/builtin test_verifier -> test_verifier.c
> > > > > dependency rule.
> > > > >
> > > > > Same for maps, I guess:
> > > > >
> > > > >         $(OUTPUT)/test_maps: map_tests/*.c
> > > > >         test_maps.c: $(MAP_TESTS_H)
> > > > >
> > > > > So why is it not working as is? What I'm I missing?
> > > >
> > > > I don't know exactly why it's not working, but it's clearly because of
> > > > that. It's the only difference between how test_progs are set up,
> > > > which didn't break, and test_maps/test_verifier, which did.
> > > >
> > > > Feel free to figure it out through a maze of Makefiles why it didn't
> > > > work as expected, but this definitely fixed a breakage (at least for
> > > > me).
> > > Agreed on not wasting time. I took a brief look (with make -qp) and I
> > > don't have any clue.
> >
> > Oh, -qp is cool, noted :)
> >
> > >
> > > By default implicit matching doesn't work:
> > > # makefile (from 'Makefile', line 261)
> > > /linux/tools/testing/selftests/bpf/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
> > > /linux/tools/testing/selftests/bpf/test_maps: map_tests/sk_storage_map.c /linux/tools/testing/selftests/bpf/test_stub.o /linux/tools/testing/selftests/bpf/libbpf.a
> > > #  Implicit rule search has not been done.
> > > #  File is an intermediate prerequisite.
> > > #  Modification time never checked.
> > > #  File has not been updated.
> > > # variable set hash-table stats:
> > > # Load=1/32=3%, Rehash=0, Collisions=0/2=0%
> > >
> > > If I comment out the following line:
> > > $(TEST_GEN_PROGS): $(OUTPUT)/test_stub.o $(BPFOBJ)
> > >
> > > Then it works:
> > > # makefile (from 'Makefile', line 261)
> > > /linux/tools/testing/selftests/bpf/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
> > > /linux/tools/testing/selftests/bpf/test_maps: test_maps.c map_tests/sk_storage_map.c
> > > #  Implicit rule search has been done.
> > > #  Implicit/static pattern stem: 'test_maps'
> > > #  File is an intermediate prerequisite.
> > > #  File does not exist.
> > > #  File has not been updated.
> > > # variable set hash-table stats:
> > > # Load=1/32=3%, Rehash=0, Collisions=0/2=0%
> > > #  recipe to execute (from '../lib.mk', line 138):
> > >         $(LINK.c) $^ $(LDLIBS) -o $@
> > >
> > > It's because "File is an intermediate prerequisite.", but I
> > > don't see how it's is a intermediate prerequisite for anything...
> >
> > Well, it's "File is an intermediate prerequisite." in both cases, so I
> > don't know if that's it.
> > Makefiles is not my strong suit, honestly, and definitely not an area
> > of passion, so no idea
> >
> > >
> > >
> > > One other optional suggestion I have to your second patch: maybe drop all
> > > those dependencies on the directories altogether? Why not do the following
> > > instead, for example (same for test_progs/test_maps)?
> >
> > Some of those _TEST_DIR's are dependencies of multiple targets, so
> > you'd need to replicate that `mkdir -p $@` in multiple places, which
> > is annoying.
> Agreed, but one single "mkdir -p $@" might be better than having three
> lines that define XXX_TEST_DIR and then add a target that mkdir's it.
> Up to you, I can give it a try once your changes are in; the
> Makefile becomes hairier by the day, thx for cleaning it up a bit :-)
>
> > But I also don't think we need to worry about creating them, because
> > there is always at least one test (otherwise those tests are useless
> > anyways) in those directories, so we might as well just remove those
> > dependencies, I guess.
> We probably care about them for "make -C tools/selftests O=$PWD/xyz" case
> where OUTPUT would point to a fresh/clean directory.

Oh, yes, out-of-tree builds, forgot about that, so yeah, we need that.

>
> > TBH, those explicit dependencies are good to specify anyways, so I
> > think this fix is good. Second patch just makes the layout of
> > dependencies similar, so it's easier to spot differences like this
> > one.
> >
> > As usual, I'll let Alexei and Daniel decide which one to apply (or if
> > we need to take some other approach to fixing this).
> I agree, both of your changes look good. I was just trying to understand
> what's happening because I was assuming implicit rules to always kick
> in.
>
> > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > > index 1296253b3422..c2d087ce6d4b 100644
> > > --- a/tools/testing/selftests/bpf/Makefile
> > > +++ b/tools/testing/selftests/bpf/Makefile
> > > @@ -277,12 +277,9 @@ VERIFIER_TESTS_H := $(OUTPUT)/verifier/tests.h
> > >  test_verifier.c: $(VERIFIER_TESTS_H)
> > >  $(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
> > >
> > > -VERIFIER_TESTS_DIR = $(OUTPUT)/verifier
> > > -$(VERIFIER_TESTS_DIR):
> > > -       mkdir -p $@
> > > -
> > >  VERIFIER_TEST_FILES := $(wildcard verifier/*.c)
> > > -$(OUTPUT)/verifier/tests.h: $(VERIFIER_TEST_FILES) | $(VERIFIER_TESTS_DIR)
> > > +$(OUTPUT)/verifier/tests.h: $(VERIFIER_TEST_FILES)
> > > +       mkdir -p $(dir $@)
> > >         $(shell ( cd verifier/; \
> > >                   echo '/* Generated header, do not edit */'; \
> > >                   echo '#ifdef FILL_ARRAY'; \

^ permalink raw reply

* Re: [bpf-next RFC 2/6] tcp: add skb-less helpers to retrieve SYN cookie
From: Petar Penkov @ 2019-07-17  0:23 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Networking, bpf, davem, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Stanislav Fomichev, Petar Penkov
In-Reply-To: <CACAyw99Umy6gaAu1DFTgemRXpZWmxeTSeCZDwdHWzLWeG8Ur3Q@mail.gmail.com>

On Tue, Jul 16, 2019 at 4:35 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Tue, 16 Jul 2019 at 01:27, Petar Penkov <ppenkov.kernel@gmail.com> wrote:
> >
> > From: Petar Penkov <ppenkov@google.com>
> >
> > This patch allows generation of a SYN cookie before an SKB has been
> > allocated, as is the case at XDP.
> >
> > Signed-off-by: Petar Penkov <ppenkov@google.com>
> > ---
> >  include/net/tcp.h    | 11 ++++++
> >  net/ipv4/tcp_input.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
> >  net/ipv4/tcp_ipv4.c  |  8 +++++
> >  net/ipv6/tcp_ipv6.c  |  8 +++++
> >  4 files changed, 106 insertions(+)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index cca3c59b98bf..a128e22c0d5d 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -414,6 +414,17 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
> >                        int estab, struct tcp_fastopen_cookie *foc);
> >  const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
> >
> > +/*
> > + *     BPF SKB-less helpers
> > + */
> > +u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
> > +                        struct tcphdr *tch, u32 *cookie);
> > +u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
> > +                        struct tcphdr *tch, u32 *cookie);
> > +u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
> > +                     const struct tcp_request_sock_ops *af_ops,
> > +                     struct sock *sk, void *iph, struct tcphdr *tch,
> > +                     u32 *cookie);
> >  /*
> >   *     TCP v4 functions exported for the inet6 API
> >   */
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 8892df6de1d4..1406d7e0953c 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -3782,6 +3782,52 @@ static void smc_parse_options(const struct tcphdr *th,
> >  #endif
> >  }
> >
> > +/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
> > + * value on success.
> > + *
> > + * Invoked for BPF SYN cookie generation, so th should be a SYN.
> > + */
> > +static u16 tcp_parse_mss_option(const struct net *net, const struct tcphdr *th,
> > +                               u16 user_mss)
>
> net seems unused?

Ah, good catch! Will remove this in the v2.

>
> > +{
> > +       const unsigned char *ptr = (const unsigned char *)(th + 1);
> > +       int length = (th->doff * 4) - sizeof(struct tcphdr);
> > +       u16 mss = 0;
> > +
> > +       while (length > 0) {
> > +               int opcode = *ptr++;
> > +               int opsize;
> > +
> > +               switch (opcode) {
> > +               case TCPOPT_EOL:
> > +                       return mss;
> > +               case TCPOPT_NOP:        /* Ref: RFC 793 section 3.1 */
> > +                       length--;
> > +                       continue;
> > +               default:
> > +                       if (length < 2)
> > +                               return mss;
> > +                       opsize = *ptr++;
> > +                       if (opsize < 2) /* "silly options" */
> > +                               return mss;
> > +                       if (opsize > length)
> > +                               return mss;     /* fail on partial options */
> > +                       if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
> > +                               u16 in_mss = get_unaligned_be16(ptr);
> > +
> > +                               if (in_mss) {
> > +                                       if (user_mss && user_mss < in_mss)
> > +                                               in_mss = user_mss;
> > +                                       mss = in_mss;
> > +                               }
> > +                       }
> > +                       ptr += opsize - 2;
> > +                       length -= opsize;
> > +               }
> > +       }
> > +       return mss;
> > +}
> > +
> >  /* Look for tcp options. Normally only called on SYN and SYNACK packets.
> >   * But, this can also be called on packets in the established flow when
> >   * the fast version below fails.
> > @@ -6464,6 +6510,39 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
> >         }
> >  }
> >
> > +u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
> > +                     const struct tcp_request_sock_ops *af_ops,
> > +                     struct sock *sk, void *iph, struct tcphdr *th,
> > +                     u32 *cookie)
> > +{
> > +       u16 mss = 0;
> > +#ifdef CONFIG_SYN_COOKIES
> > +       bool is_v4 = rsk_ops->family == AF_INET;
> > +       struct tcp_sock *tp = tcp_sk(sk);
> > +
> > +       if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
> > +           !inet_csk_reqsk_queue_is_full(sk))
> > +               return 0;
> > +
> > +       if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
> > +               return 0;
> > +
> > +       if (sk_acceptq_is_full(sk)) {
> > +               NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> > +               return 0;
> > +       }
> > +
> > +       mss = tcp_parse_mss_option(sock_net(sk), th, tp->rx_opt.user_mss);
> > +       if (!mss)
> > +               mss = af_ops->mss_clamp;
> > +
> > +       tcp_synq_overflow(sk);
> > +       *cookie = is_v4 ? __cookie_v4_init_sequence(iph, th, &mss)
> > +                       : __cookie_v6_init_sequence(iph, th, &mss);
> > +#endif
> > +       return mss;
> > +}
> > +
> >  int tcp_conn_request(struct request_sock_ops *rsk_ops,
> >                      const struct tcp_request_sock_ops *af_ops,
> >                      struct sock *sk, struct sk_buff *skb)
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index d57641cb3477..0e06e59784bd 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1515,6 +1515,14 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
> >         return sk;
> >  }
> >
> > +u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
> > +                        struct tcphdr *tch, u32 *cookie)
> > +{
> > +       return tcp_get_syncookie(&tcp_request_sock_ops,
> > +                                &tcp_request_sock_ipv4_ops, sk, iph, tch,
> > +                                cookie);
> > +}
> > +
> >  /* The socket must have it's spinlock held when we get
> >   * here, unless it is a TCP_LISTEN socket.
> >   *
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index d56a9019a0fe..ce46cdba54bc 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -1058,6 +1058,14 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
> >         return sk;
> >  }
> >
> > +u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
> > +                        struct tcphdr *tch, u32 *cookie)
> > +{
> > +       return tcp_get_syncookie(&tcp6_request_sock_ops,
> > +                                &tcp_request_sock_ipv6_ops, sk, iph, tch,
> > +                                cookie);
> > +}
> > +
> >  static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
> >  {
> >         if (skb->protocol == htons(ETH_P_IP))
> > --
> > 2.22.0.510.g264f2c817a-goog
> >
>
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

^ permalink raw reply

* Re: [PATCH net v3 1/7] net/rds: Give fr_state a chance to transition to FRMR_IS_FREE
From: santosh.shilimkar @ 2019-07-17  0:26 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <491db13c-3843-b57a-c9c5-9c7e7c18381a@oracle.com>

On 7/16/19 3:28 PM, Gerd Rausch wrote:
> In the context of FRMR (ib_frmr.c):
> 
> Memory regions make it onto the "clean_list" via "rds_ib_flush_mr_pool",
> after the memory region has been posted for invalidation via
> "rds_ib_post_inv".
> 
> At that point in time, "fr_state" may still be in state "FRMR_IS_INUSE",
> since the only place where "fr_state" transitions to "FRMR_IS_FREE"
> is in "rds_ib_mr_cqe_handler", which is triggered by a tasklet.
> 
> So in case we notice that "fr_state != FRMR_IS_FREE" (see below),
> we wait for "fr_inv_done" to trigger with a maximum of 10msec.
> Then we check again, and only put the memory region onto the drop_list
> (via "rds_ib_free_frmr") in case the situation remains unchanged.
> 
> This avoids the problem of memory-regions bouncing between "clean_list"
> and "drop_list" before they even have a chance to be properly invalidated.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Thanks for the update.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH net v3 2/7] net/rds: Get rid of "wait_clean_list_grace" and add locking
From: santosh.shilimkar @ 2019-07-17  0:26 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <3e608430-4b96-4c25-6593-4479131bb904@oracle.com>

On 7/16/19 3:28 PM, Gerd Rausch wrote:
> Waiting for activity on the "clean_list" to quiesce is no substitute
> for proper locking.
> 
> We can have multiple threads competing for "llist_del_first"
> via "rds_ib_reuse_mr", and a single thread competing
> for "llist_del_all" and "llist_del_first" via "rds_ib_flush_mr_pool".
> 
> Since "llist_del_first" depends on "list->first->next" not to change
> in the midst of the operation, simply waiting for all current calls
> to "rds_ib_reuse_mr" to quiesce across all CPUs is woefully inadequate:
> 
> By the time "wait_clean_list_grace" is done iterating over all CPUs to see
> that there is no concurrent caller to "rds_ib_reuse_mr", a new caller may
> have just shown up on the first CPU.
> 
> Furthermore, <linux/llist.h> explicitly calls out the need for locking:
>   * Cases where locking is needed:
>   * If we have multiple consumers with llist_del_first used in one consumer,
>   * and llist_del_first or llist_del_all used in other consumers,
>   * then a lock is needed.
> 
> Also, while at it, drop the unused "pool" parameter
> from "list_to_llist_nodes".
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-17  0:27 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Eric Dumazet, Networking, bpf, davem, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Stanislav Fomichev, Petar Penkov
In-Reply-To: <CACAyw9_5+3cznRspLJ2ZDcK22keLVtQQHbQOypSs4sx-F=ajow@mail.gmail.com>

Thank you for the reviews!

On Tue, Jul 16, 2019 at 4:56 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Tue, 16 Jul 2019 at 08:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > +             return -EINVAL;
> > > +
> > > +     if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
> > > +             return -EINVAL;
> > > +
> > > +     if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
> > > +             return -EINVAL;
> > > +
> > > +     if (!th->syn || th->ack || th->fin || th->rst)
> > > +             return -EINVAL;
> > > +
> > > +     switch (sk->sk_family) {
> >
> > This is strange, because a dual stack listener will have sk->sk_family set to AF_INET6.
> >
> > What really matters here is if the packet is IPv4 or IPv6.
> >
> > So you need to look at iph->version instead.
> >
> > Then look if the socket family allows this packet to be processed
> > (For example AF_INET6 sockets might prevent IPv4 packets, see sk->sk_ipv6only )

This makes a lot of sense, thanks Eric, will update!
>
> Does this apply for (the existing) tcp_check_syn_cookie as well?

I think we will probably have to update the check there, too.
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

^ permalink raw reply

* Re: [PATCH net v3 3/7] net/rds: Wait for the FRMR_IS_FREE (or FRMR_IS_STALE) transition after posting IB_WR_LOCAL_INV
From: santosh.shilimkar @ 2019-07-17  0:27 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <59304e3c-e15f-6a3b-a8d2-63de370ed847@oracle.com>

On 7/16/19 3:29 PM, Gerd Rausch wrote:
> In order to:
> 1) avoid a silly bouncing between "clean_list" and "drop_list"
>     triggered by function "rds_ib_reg_frmr" as it is releases frmr
>     regions whose state is not "FRMR_IS_FREE" right away.
> 
> 2) prevent an invalid access error in a race from a pending
>     "IB_WR_LOCAL_INV" operation with a teardown ("dma_unmap_sg", "put_page")
>     and de-registration ("ib_dereg_mr") of the corresponding
>     memory region.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Thanks for updated version.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH net v3 4/7] net/rds: Fix NULL/ERR_PTR inconsistency
From: santosh.shilimkar @ 2019-07-17  0:27 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <0b9b1d80-0be7-a2ba-82da-cdceda467fc8@oracle.com>



On 7/16/19 3:29 PM, Gerd Rausch wrote:
> Make function "rds_ib_try_reuse_ibmr" return NULL in case
> memory region could not be allocated, since callers
> simply check if the return value is not NULL.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH net v3 5/7] net/rds: Set fr_state only to FRMR_IS_FREE if IB_WR_LOCAL_INV had been successful
From: santosh.shilimkar @ 2019-07-17  0:28 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <78bfebd7-c445-bd41-c5c5-d49e6ed51230@oracle.com>



On 7/16/19 3:29 PM, Gerd Rausch wrote:
> Fix a bug where fr_state first goes to FRMR_IS_STALE, because of a failure
> of operation IB_WR_LOCAL_INV, but then gets set back to "FRMR_IS_FREE"
> uncoditionally, even though the operation failed.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH net v3 7/7] net/rds: Initialize ic->i_fastreg_wrs upon allocation
From: santosh.shilimkar @ 2019-07-17  0:29 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <94bc1575-1772-3619-e8d5-b74463308e0f@oracle.com>



On 7/16/19 3:29 PM, Gerd Rausch wrote:
> Otherwise, if an IB connection is torn down before "rds_ib_setup_qp"
> is called, the value of "ic->i_fastreg_wrs" is still at zero
> (as it wasn't initialized by "rds_ib_setup_qp").
> Consequently "rds_ib_conn_path_shutdown" will spin forever,
> waiting for it to go back to "RDS_IB_DEFAULT_FR_WR",
> which of course will never happen as there are no
> outstanding work requests.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH net v3 6/7] net/rds: Keep track of and wait for FRWR segments in use upon shutdown
From: santosh.shilimkar @ 2019-07-17  0:28 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <28ded44a-bce9-8632-d7e8-fe843140658e@oracle.com>

On 7/16/19 3:29 PM, Gerd Rausch wrote:
> Since "rds_ib_free_frmr" and "rds_ib_free_frmr_list" simply put
> the FRMR memory segments on the "drop_list" or "free_list",
> and it is the job of "rds_ib_flush_mr_pool" to reap those entries
> by ultimately issuing a "IB_WR_LOCAL_INV" work-request,
> we need to trigger and then wait for all those memory segments
> attached to a particular connection to be fully released before
> we can move on to release the QP, CQ, etc.
> 
> So we make "rds_ib_conn_path_shutdown" wait for one more
> atomic_t called "i_fastreg_inuse_count" that keeps track of how
> many FRWR memory segments are out there marked "FRMR_IS_INUSE"
> (and also wake_up rds_ib_ring_empty_wait, as they go away).
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Alexei Starovoitov @ 2019-07-17  1:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiong Wang, Andrii Nakryiko, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, bpf, Networking, oss-drivers,
	Yonghong Song
In-Reply-To: <20190716151223.7208c033@cakuba.netronome.com>

On Tue, Jul 16, 2019 at 3:12 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Tue, 16 Jul 2019 09:17:03 -0700, Alexei Starovoitov wrote:
> > I don't think we have a test for such 'dead prog only due to verifier walk'
> > situation. I wonder what happens :)
>
> FWIW we do have verifier and BTF self tests for dead code removal
> of entire subprogs! :)

Thanks! Indeed.

^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Alexei Starovoitov @ 2019-07-17  1:24 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Adrian Ratiu, Alexei Starovoitov, bpf,
	Brendan Gregg, connoro, Daniel Borkmann, duyuchao, Ingo Molnar,
	jeffv, Karim Yaghmour, kernel-team, linux-kselftest,
	Manali Shukla, Manjo Raja Rao, Martin KaFai Lau, Masami Hiramatsu,
	Matt Mullins, Michal Gregorczyk, Michal Gregorczyk,
	Mohammad Husain, namhyung, namhyung, netdev, paul.chaignon,
	primiano, Qais Yousef, Shuah Khan, Song Liu, Srinivas Ramana,
	Steven Rostedt, Tamir Carmeli, Yonghong Song
In-Reply-To: <20190716235500.GA199237@google.com>

On Tue, Jul 16, 2019 at 07:55:00PM -0400, Joel Fernandes wrote:
> On Tue, Jul 16, 2019 at 06:41:50PM -0400, Joel Fernandes wrote:
> > On Tue, Jul 16, 2019 at 03:26:52PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Jul 16, 2019 at 05:30:50PM -0400, Joel Fernandes wrote:
> > > > 
> > > > I also thought about the pinning idea before, but we also want to add support
> > > > for not just raw tracepoints, but also regular tracepoints (events if you
> > > > will). I am hesitant to add a new BPF API just for creating regular
> > > > tracepoints and then pinning those as well.
> > > 
> > > and they should be done through the pinning as well.
> > 
> > Hmm ok, I will give it some more thought.
> 
> I think I can make the new BPF API + pinning approach work, I will try to
> work on something like this and post it soon.
> 
> Also, I had a question below if you don't mind taking a look:
> 
> thanks Alexei!
> 
> > > > I don't see why a new bpf node for a trace event is a bad idea, really.
> > > 
> > > See the patches for kprobe/uprobe FD-based api and the reasons behind it.
> > > tldr: text is racy, doesn't scale, poor security, etc.
> > 
> > Is it possible to use perf without CAP_SYS_ADMIN and control security at the
> > per-event level? We are selective about who can access which event, using
> > selinux. That's how our ftrace-based tracers work. Its fine grained per-event
> > control. That's where I was going with the tracefs approach since we get that
> > granularity using the file system.

android's choice of selinux is not a factor in deciding kernel apis.
It's completely separate discusion wether disallowing particular tracepoints
for given user make sense at all.
Just because you can hack it in via selinux blocking particular
/sys/debug/tracing/ directory and convince yourself that it's somehow
makes android more secure. It doesn't mean that all new api should fit
into this model.
I think allowing one tracepoint and disallowing another is pointless
from security point of view. Tracing bpf program can do bpf_probe_read
of anything.


^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Alexei Starovoitov @ 2019-07-17  1:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, Adrian Ratiu, Alexei Starovoitov,
	bpf, Brendan Gregg, connoro, Daniel Borkmann, duyuchao,
	Ingo Molnar, jeffv, Karim Yaghmour, kernel-team, linux-kselftest,
	Manali Shukla, Manjo Raja Rao, Martin KaFai Lau, Masami Hiramatsu,
	Matt Mullins, Michal Gregorczyk, Michal Gregorczyk,
	Mohammad Husain, namhyung, namhyung, netdev, paul.chaignon,
	primiano, Qais Yousef, Shuah Khan, Song Liu, Srinivas Ramana,
	Tamir Carmeli, Yonghong Song
In-Reply-To: <20190716183117.77b3ed49@gandalf.local.home>

On Tue, Jul 16, 2019 at 06:31:17PM -0400, Steven Rostedt wrote:
> On Tue, 16 Jul 2019 17:30:50 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > I don't see why a new bpf node for a trace event is a bad idea, really.
> > tracefs is how we deal with trace events on Android. We do it in production
> > systems. This is a natural extension to that and fits with the security model
> > well.
> 
> What I would like to see is a way to have BPF inject data into the
> ftrace ring buffer directly. There's a bpf_trace_printk() that I find a
> bit of a hack (especially since it hooks into trace_printk() which is
> only for debugging purposes). Have a dedicated bpf ftrace ring
> buffer event that can be triggered is what I am looking for. Then comes
> the issue of what ring buffer to place it in, as ftrace can have
> multiple ring buffer instances. But these instances are defined by the
> tracefs instances directory. Having a way to associate a bpf program to
> a specific event in a specific tracefs directory could allow for ways to
> trigger writing into the correct ftrace buffer.

bpf can write anything (including full skb) into perf ring buffer.
I don't think there is a use case yet to write into ftrace ring buffer.
But I can be convinced otherwise :)

> But looking over the patches, I see what Alexei means that there's no
> overlap with ftrace and these patches except for the tracefs directory
> itself (which is part of the ftrace infrastructure). And the trace
> events are technically part of the ftrace infrastructure too. I see the
> tracefs interface being used, but I don't see how the bpf programs
> being added affect the ftrace ring buffer or other parts of ftrace. And
> I'm guessing that's what is confusing Alexei.

yep.
What I really like to see some day is proper integration of real ftrace
and bpf. So that bpf progs can be invoked from some of the ftrace machinery.
And the other way around.
Like I'd love to be able to start ftracing all functions from bpf
and stop it from bpf.
The use case is kernel debugging. I'd like to examine a bunch of condition
and start ftracing the execution. Then later decide wether this collection
of ip addresses is interesting to analyze and post process it quickly
while still inside bpf program.

^ permalink raw reply

* RE: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller
From: Andy Duan @ 2019-07-17  1:32 UTC (permalink / raw)
  To: Sven Van Asbroeck
  Cc: David S . Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAGngYiUb5==QSM1-oa4bSeqhGyoaTw_dWjygLo=0X60eX=wQhQ@mail.gmail.com>

From: Sven Van Asbroeck <thesven73@gmail.com> Sent: Tuesday, July 16, 2019 9:19 PM
> Hi Andy,
> 
> On Mon, Jul 15, 2019 at 10:02 PM Andy Duan <fugang.duan@nxp.com>
> wrote:
> >
> > the phylib already can handle mii bus reset and phy device reset
> 
> That's a great suggestion, thank you !! I completely overlooked that code.
> What will happen to the legacy phy reset code in fec? Are there many users
> left?

Yes, so the old legacy code is kept there. But it is better to clean up all if 
there have enough boards to verify them.

^ permalink raw reply

* Re: [PATCH bpf 1/2] selftests/bpf: fix test_verifier/test_maps make dependencies
From: Alexei Starovoitov @ 2019-07-17  1:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Andrii Nakryiko, bpf, Networking,
	Daniel Borkmann, Alexei Starovoitov, Kernel Team,
	Ilya Leoshkevich, Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <CAEf4BzZQ9MRTNH434=oyxBjQQXWEfjMV4nU14-=LfvERafwRKw@mail.gmail.com>

On Tue, Jul 16, 2019 at 5:22 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> > Makefile becomes hairier by the day, thx for cleaning it up a bit :-)
> >
> > > But I also don't think we need to worry about creating them, because
> > > there is always at least one test (otherwise those tests are useless
> > > anyways) in those directories, so we might as well just remove those
> > > dependencies, I guess.
> > We probably care about them for "make -C tools/selftests O=$PWD/xyz" case
> > where OUTPUT would point to a fresh/clean directory.
>
> Oh, yes, out-of-tree builds, forgot about that, so yeah, we need that.

Anyhow the patches fixed the issue I was seeing.
Hence both applied.

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: fix perf_buffer on s390
From: Alexei Starovoitov @ 2019-07-17  1:43 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Ilya Leoshkevich, bpf, Networking, gor, Heiko Carstens
In-Reply-To: <CAEf4BzbWVJcQ2pN9UwYagtBpgoL+8Q+DMwwiUt1iCMciH8YUKA@mail.gmail.com>

On Tue, Jul 16, 2019 at 10:42 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 5:59 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > perf_buffer test fails for exactly the same reason test_attach_probe
> > used to fail: different nanosleep syscall kprobe name.
> >
> > Reuse the test_attach_probe fix.
> >
> > Fixes: ee5cf82ce04a ("selftests/bpf: test perf buffer API")
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > ---
>
> Thanks for the fix!
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>

Applied. Thanks!

^ permalink raw reply

* [PATCH] gve: replace kfree with kvfree
From: Chuhong Yuan @ 2019-07-17  2:05 UTC (permalink / raw)
  Cc: Catherine Sullivan, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

Variables allocated by kvzalloc should not be freed by kfree.
Because they may be allocated by vmalloc.
So we replace kfree with kvfree here.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/google/gve/gve_main.c | 22 +++++++++++-----------
 drivers/net/ethernet/google/gve/gve_rx.c   |  4 ++--
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 24f16e3368cd..10b8e9720c32 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -232,7 +232,7 @@ static int gve_alloc_notify_blocks(struct gve_priv *priv)
 abort_with_msix_enabled:
 	pci_disable_msix(priv->pdev);
 abort_with_msix_vectors:
-	kfree(priv->msix_vectors);
+	kvfree(priv->msix_vectors);
 	priv->msix_vectors = NULL;
 	return err;
 }
@@ -256,7 +256,7 @@ static void gve_free_notify_blocks(struct gve_priv *priv)
 	priv->ntfy_blocks = NULL;
 	free_irq(priv->msix_vectors[priv->mgmt_msix_idx].vector, priv);
 	pci_disable_msix(priv->pdev);
-	kfree(priv->msix_vectors);
+	kvfree(priv->msix_vectors);
 	priv->msix_vectors = NULL;
 }
 
@@ -445,12 +445,12 @@ static int gve_alloc_rings(struct gve_priv *priv)
 	return 0;
 
 free_rx:
-	kfree(priv->rx);
+	kvfree(priv->rx);
 	priv->rx = NULL;
 free_tx_queue:
 	gve_tx_free_rings(priv);
 free_tx:
-	kfree(priv->tx);
+	kvfree(priv->tx);
 	priv->tx = NULL;
 	return err;
 }
@@ -500,7 +500,7 @@ static void gve_free_rings(struct gve_priv *priv)
 			gve_remove_napi(priv, ntfy_idx);
 		}
 		gve_tx_free_rings(priv);
-		kfree(priv->tx);
+		kvfree(priv->tx);
 		priv->tx = NULL;
 	}
 	if (priv->rx) {
@@ -509,7 +509,7 @@ static void gve_free_rings(struct gve_priv *priv)
 			gve_remove_napi(priv, ntfy_idx);
 		}
 		gve_rx_free_rings(priv);
-		kfree(priv->rx);
+		kvfree(priv->rx);
 		priv->rx = NULL;
 	}
 }
@@ -592,9 +592,9 @@ static void gve_free_queue_page_list(struct gve_priv *priv,
 		gve_free_page(&priv->pdev->dev, qpl->pages[i],
 			      qpl->page_buses[i], gve_qpl_dma_dir(priv, id));
 
-	kfree(qpl->page_buses);
+	kvfree(qpl->page_buses);
 free_pages:
-	kfree(qpl->pages);
+	kvfree(qpl->pages);
 	priv->num_registered_pages -= qpl->num_entries;
 }
 
@@ -635,7 +635,7 @@ static int gve_alloc_qpls(struct gve_priv *priv)
 free_qpls:
 	for (j = 0; j <= i; j++)
 		gve_free_queue_page_list(priv, j);
-	kfree(priv->qpls);
+	kvfree(priv->qpls);
 	return err;
 }
 
@@ -644,12 +644,12 @@ static void gve_free_qpls(struct gve_priv *priv)
 	int num_qpls = gve_num_tx_qpls(priv) + gve_num_rx_qpls(priv);
 	int i;
 
-	kfree(priv->qpl_cfg.qpl_id_map);
+	kvfree(priv->qpl_cfg.qpl_id_map);
 
 	for (i = 0; i < num_qpls; i++)
 		gve_free_queue_page_list(priv, i);
 
-	kfree(priv->qpls);
+	kvfree(priv->qpls);
 }
 
 /* Use this to schedule a reset when the device is capable of continuing
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index c1aeabd1c594..1914b8350da7 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -35,7 +35,7 @@ static void gve_rx_free_ring(struct gve_priv *priv, int idx)
 
 	gve_unassign_qpl(priv, rx->data.qpl->id);
 	rx->data.qpl = NULL;
-	kfree(rx->data.page_info);
+	kvfree(rx->data.page_info);
 
 	slots = rx->data.mask + 1;
 	bytes = sizeof(*rx->data.data_ring) * slots;
@@ -168,7 +168,7 @@ static int gve_rx_alloc_ring(struct gve_priv *priv, int idx)
 			  rx->q_resources, rx->q_resources_bus);
 	rx->q_resources = NULL;
 abort_filled:
-	kfree(rx->data.page_info);
+	kvfree(rx->data.page_info);
 abort_with_slots:
 	bytes = sizeof(*rx->data.data_ring) * slots;
 	dma_free_coherent(hdev, bytes, rx->data.data_ring, rx->data.data_bus);
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v5] net: netfilter: Fix rpfilter dropping vrf packets by mistake
From: linmiaohe @ 2019-07-17  2:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso, davem@davemloft.net
  Cc: kadlec@blackhole.kfki.hu, fw@strlen.de, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, netfilter-devel@vger.kernel.org,
	coreteam@netfilter.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Mingfangsen

> On Tue, Jul 17, 2019 at 19:17:36PM +0000, Pablo wrote:
> > On Tue, Jul 02, 2019 at 03:59:36AM +0000, Miaohe Lin wrote:
> > When firewalld is enabled with ipv4/ipv6 rpfilter, vrf
> > ipv4/ipv6 packets will be dropped. Vrf device will pass through 
> > netfilter hook twice. One with enslaved device and another one with l3 
> > master device. So in device may dismatch witch out device because out 
> > device is always enslaved device.So failed with the check of the 
> > rpfilter and drop the packets by mistake.
>
> Applied to nf.git, thanks.

Many thanks. It's really a longterm stuff. Thanks for your
patience. Have a nice day!

Best wishes.

^ permalink raw reply

* Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Alexei Starovoitov @ 2019-07-17  2:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Petar Penkov, netdev, bpf, davem, ast, daniel, edumazet, lmb, sdf,
	Petar Penkov
In-Reply-To: <b6ef24b0-0415-c67d-5a66-21f1c2530414@gmail.com>

On Tue, Jul 16, 2019 at 09:59:26AM +0200, Eric Dumazet wrote:
> 
> 
> On 7/16/19 2:26 AM, Petar Penkov wrote:
> > From: Petar Penkov <ppenkov@google.com>
> > 
> > This helper function allows BPF programs to try to generate SYN
> > cookies, given a reference to a listener socket. The function works
> > from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> > socket in both cases.
> > 
> ...
> >  
> > +BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
> > +	   struct tcphdr *, th, u32, th_len)
> > +{
> > +#ifdef CONFIG_SYN_COOKIES
> > +	u32 cookie;
> > +	u16 mss;
> > +
> > +	if (unlikely(th_len < sizeof(*th)))
> 
> 
> You probably need to check that th_len == th->doff * 4

+1
that is surely necessary for safety.

Considering the limitation of 5 args the api choice is good.
struct bpf_syncookie approach doesn't look natural to me.
And I couldn't come up with any better way to represent this helper.
So let's go with 
return htonl(cookie) | ((u64)mss << 32);
My only question is why htonl ?

Independently of that...
Since we've been hitting this 5 args limit too much,
we need to start thinking how to extend BPF ISA to pass
args 6,7,8 on stack.


^ permalink raw reply

* Re: [PATCH] skbuff: fix compilation warnings in skb_dump()
From: Joe Perches @ 2019-07-17  3:06 UTC (permalink / raw)
  To: Willem de Bruijn, Qian Cai
  Cc: David Miller, Willem de Bruijn, clang-built-linux,
	Network Development, LKML
In-Reply-To: <CAF=yD-KW-XnDvD0i8VbzrkLGNWEY6cPoaEcHy40hbghGXTo+kA@mail.gmail.com>

On Tue, 2019-07-16 at 17:04 +0200, Willem de Bruijn wrote:
> On Tue, Jul 16, 2019 at 4:56 PM Qian Cai <cai@lca.pw> wrote:
> > Fix them by using the proper types, and also fix some checkpatch
> > warnings by using pr_info().
> > 
> > WARNING: printk() should include KERN_<LEVEL> facility level
> > +               printk("%ssk family=%hu type=%u proto=%u\n",
> 
> Converting printk to pr_info lowers all levels to KERN_INFO.
> 
> skb_dump takes an explicit parameter level to be able to log at
> KERN_ERR or KERN_WARNING
> 
> I would like to avoid those checkpatch warnings, but this is not the
> right approach.

Just ignore checkpatch when it doesn't know that
the printk actually includes a KERN_<LEVEL> via
"%s...", level



^ permalink raw reply

* Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-17  3:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Networking, bpf, davem, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Lorenz Bauer, Stanislav Fomichev,
	Petar Penkov
In-Reply-To: <20190717022635.yt7kczxa73kbi7ep@ast-mbp.dhcp.thefacebook.com>

Thank you for your feedback!

On Tue, Jul 16, 2019 at 7:26 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 09:59:26AM +0200, Eric Dumazet wrote:
> >
> >
> > On 7/16/19 2:26 AM, Petar Penkov wrote:
> > > From: Petar Penkov <ppenkov@google.com>
> > >
> > > This helper function allows BPF programs to try to generate SYN
> > > cookies, given a reference to a listener socket. The function works
> > > from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> > > socket in both cases.
> > >
> > ...
> > >
> > > +BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
> > > +      struct tcphdr *, th, u32, th_len)
> > > +{
> > > +#ifdef CONFIG_SYN_COOKIES
> > > +   u32 cookie;
> > > +   u16 mss;
> > > +
> > > +   if (unlikely(th_len < sizeof(*th)))
> >
> >
> > You probably need to check that th_len == th->doff * 4
>
> +1
> that is surely necessary for safety.

I'll make sure to include this check in the next version.

>
> Considering the limitation of 5 args the api choice is good.
> struct bpf_syncookie approach doesn't look natural to me.
> And I couldn't come up with any better way to represent this helper.
> So let's go with
> return htonl(cookie) | ((u64)mss << 32);
> My only question is why htonl ?

I did it to mirror bpf_tcp_check_syncookie which sees a network order
cookie. That said, it is probably better for the caller to call
bpf_htonl() as they see fit, rather than to do it in the helper. Will
update, thanks.
>
> Independently of that...
> Since we've been hitting this 5 args limit too much,
> we need to start thinking how to extend BPF ISA to pass
> args 6,7,8 on stack.
>

Agreed, having an additional argument would have been helpful for this function.

^ permalink raw reply

* Re: [bpf PATCH v3 0/8] sockmap/tls fixes
From: Jakub Kicinski @ 2019-07-17  4:03 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <156322373173.18678.6003379631139659856.stgit@john-XPS-13-9370>

On Mon, 15 Jul 2019 13:49:01 -0700, John Fastabend wrote:
> Resolve a series of splats discovered by syzbot and an unhash
> TLS issue noted by Eric Dumazet.

I spent most of today poking at this set, and I'll continue tomorrow.
I'm not capitulating yet, but if I can't get it to work for tls_device
soon, I'll make a version which skips the unhash for offload for now..

^ permalink raw reply

* Re: [PATCH net] be2net: Signal that the device cannot transmit during reconfiguration
From: Firo Yang @ 2019-07-17  4:23 UTC (permalink / raw)
  To: Benjamin Poirier, David Miller
  Cc: Ajit Khaparde, Sathya Perla, Somnath Kotur, Sriharsha Basavapatna,
	Saeed Mahameed, netdev@vger.kernel.org
In-Reply-To: <20190716081655.7676-1-bpoirier@suse.com>

I think there is a problem if dev_watchdog() is triggered before netif_carrier_off(). dev_watchdog() might call ->ndo_tx_timeout(), i.e. be_tx_timeout(), if txq timeout  happens. Thus be_tx_timeout() could still be able to access the memory which is being freed by be_update_queues().

Thanks,
Firo

________________________________________
From: Benjamin Poirier
Sent: Tuesday, July 16, 2019 4:16 PM
To: David Miller
Cc: Ajit Khaparde; Sathya Perla; Somnath Kotur; Sriharsha Basavapatna; Saeed Mahameed; Firo Yang; netdev@vger.kernel.org
Subject: [PATCH net] be2net: Signal that the device cannot transmit during reconfiguration

While changing the number of interrupt channels, be2net stops adapter
operation (including netif_tx_disable()) but it doesn't signal that it
cannot transmit. This may lead dev_watchdog() to falsely trigger during
that time.

Add the missing call to netif_carrier_off(), following the pattern used in
many other drivers. netif_carrier_on() is already taken care of in
be_open().

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 82015c8a5ed7..b7a246b33599 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4697,8 +4697,12 @@ int be_update_queues(struct be_adapter *adapter)
        struct net_device *netdev = adapter->netdev;
        int status;

-       if (netif_running(netdev))
+       if (netif_running(netdev)) {
+               /* device cannot transmit now, avoid dev_watchdog timeouts */
+               netif_carrier_off(netdev);
+
                be_close(netdev);
+       }

        be_cancel_worker(adapter);

--
2.22.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox