Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Andrii Nakryiko @ 2019-07-15 23:00 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song
In-Reply-To: <CAEf4BzYDAVUgajz4=dRTu5xQDddp5pi2s=T1BdFmRLZjOwGypQ@mail.gmail.com>

On Mon, Jul 15, 2019 at 3:55 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >
> >
> > Andrii Nakryiko writes:
> >
> > > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> > >>
> > >>
> > >> Andrii Nakryiko writes:
> > >>
> > >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> > >> >>
> > >> >> This is an RFC based on latest bpf-next about acclerating insn patching
> > >> >> speed, it is now near the shape of final PATCH set, and we could see the
> > >> >> changes migrating to list patching would brings, so send out for
> > >> >> comments. Most of the info are in cover letter. I splitted the code in a
> > >> >> way to show API migration more easily.
> > >> >
> > >> >
> > >> > Hey Jiong,
> > >> >
> > >> >
> > >> > Sorry, took me a while to get to this and learn more about instruction
> > >> > patching. Overall this looks good and I think is a good direction.
> > >> > I'll post high-level feedback here, and some more
> > >> > implementation-specific ones in corresponding patches.
> > >>
> > >> Great, thanks very much for the feedbacks. Most of your feedbacks are
> > >> hitting those pain points I exactly had ran into. For some of them, I
> > >> thought similar solutions like yours, but failed due to various
> > >> reasons. Let's go through them again, I could have missed some important
> > >> things.
> > >>
> > >> Please see my replies below.
> > >
> > > Thanks for thoughtful reply :)
> > >
> > >>
> > >> >>
> > >> >> Test Results
> > >> >> ===
> > >> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
> > >> >>     modes (interpreter, JIT, JIT with blinding).
> > >> >>
> > >> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> > >> >>     patching time from 5100s (nearly one and a half hour) to less than
> > >> >>     0.5s for 1M insn patching.
> > >> >>
> > >> >> Known Issues
> > >> >> ===
> > >> >>   - The following warning is triggered when running scale test which
> > >> >>     contains 1M insns and patching:
> > >> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> > >> >>
> > >> >>     This is caused by existing code, it can be reproduced on bpf-next
> > >> >>     master with jit blinding enabled, then run scale unit test, it will
> > >> >>     shown up after half an hour. After this set, patching is very fast, so
> > >> >>     it shows up quickly.
> > >> >>
> > >> >>   - No line info adjustment support when doing insn delete, subprog adj
> > >> >>     is with bug when doing insn delete as well. Generally, removal of insns
> > >> >>     could possibly cause remove of entire line or subprog, therefore
> > >> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
> > >> >>     don't have good idea and clean code for integrating this into the
> > >> >>     linearization code at the moment, will do more experimenting,
> > >> >>     appreciate ideas and suggestions on this.
> > >> >
> > >> > Is there any specific problem to detect which line info to delete? Or
> > >> > what am I missing besides careful implementation?
> > >>
> > >> Mostly line info and subprog info are range info which covers a range of
> > >> insns. Deleting insns could causing you adjusting the range or removing one
> > >> range entirely. subprog info could be fully recalcuated during
> > >> linearization while line info I need some careful implementation and I
> > >> failed to have clean code for this during linearization also as said no
> > >> unit tests to help me understand whether the code is correct or not.
> > >>
> > >
> > > Ok, that's good that it's just about clean implementation. Try to
> > > implement it as clearly as possible. Then post it here, and if it can
> > > be improved someone (me?) will try to help to clean it up further.
> > >
> > > Not a big expert on line info, so can't comment on that,
> > > unfortunately. Maybe Yonghong can chime in (cc'ed)
> > >
> > >
> > >> I will described this latter, spent too much time writing the following
> > >> reply. Might worth an separate discussion thread.
> > >>
> > >> >>
> > >> >>     Insn delete doesn't happen on normal programs, for example Cilium
> > >> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
> > >> >>     not good. That's also why this RFC have a full pass on selftest with
> > >> >>     this known issue.
> > >> >
> > >> > I hope you'll add test for deletion (and w/ corresponding line info)
> > >> > in final patch set :)
> > >>
> > >> Will try. Need to spend some time on BTF format.
> > >> >
> > >> >>
> > >> >>   - Could further use mem pool to accelerate the speed, changes are trivial
> > >> >>     on top of this RFC, and could be 2x extra faster. Not included in this
> > >> >>     RFC as reducing the algo complexity from quadratic to linear of insn
> > >> >>     number is the first step.
> > >> >
> > >> > Honestly, I think that would add more complexity than necessary, and I
> > >> > think we can further speed up performance without that, see below.
> > >> >
> > >> >>
> > >> >> Background
> > >> >> ===
> > >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> > >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> > >> >> remove insns.
> > >> >>
> > >> >> At the moment, insn patching is quadratic of insn number, this is due to
> > >> >> branch targets of jump insns needs to be adjusted, and the algo used is:
> > >> >>
> > >> >>   for insn inside prog
> > >> >>     patch insn + regeneate bpf prog
> > >> >>     for insn inside new prog
> > >> >>       adjust jump target
> > >> >>
> > >> >> This is causing significant time spending when a bpf prog requires large
> > >> >> amount of patching on different insns. Benchmarking shows it could take
> > >> >> more than half minutes to finish patching when patching number is more
> > >> >> than 50K, and the time spent could be more than one hour when patching
> > >> >> number is around 1M.
> > >> >>
> > >> >>   15000   :    3s
> > >> >>   45000   :   29s
> > >> >>   95000   :  125s
> > >> >>   195000  :  712s
> > >> >>   1000000 : 5100s
> > >> >>
> > >> >> This RFC introduces new patching infrastructure. Before doing insn
> > >> >> patching, insns in bpf prog are turned into a singly linked list, insert
> > >> >> new insns just insert new list node, delete insns just set delete flag.
> > >> >> And finally, the list is linearized back into array, and branch target
> > >> >> adjustment is done for all jump insns during linearization. This algo
> > >> >> brings the time complexity from quadratic to linear of insn number.
> > >> >>
> > >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> > >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> > >> >> to less than 0.5s.
> > >> >>
> > >> >> Patching API
> > >> >> ===
> > >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> > >> >> where only BPF insns are patched. The other is "verification layer" where
> > >> >> insns have corresponding aux info as well high level subprog info, so
> > >> >> insn patching means aux info needs to be patched as well, and subprog info
> > >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> > >> >> should always be updated after insn patching.
> > >> >>
> > >> >> So, list creation, destroy, insert, delete is the same for both layer,
> > >> >> but lineration is different. "verification layer" patching require extra
> > >> >> work. Therefore the patch APIs are:
> > >> >>
> > >> >>    list creation:                bpf_create_list_insn
> > >> >>    list patch:                   bpf_patch_list_insn
> > >> >>    list pre-patch:               bpf_prepatch_list_insn
> > >> >
> > >> > I think pre-patch name is very confusing, until I read full
> > >> > description I couldn't understand what it's supposed to be used for.
> > >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> > >> > me wondering whether instruction buffer is inserted after instruction,
> > >> > or instruction is replaced with a bunch of instructions.
> > >> >
> > >> > So how about two more specific names:
> > >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> > >> > instruction with a list of patch instructions)
> > >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> > >> > one is pretty clear).
> > >>
> > >> My sense on English word is not great, will switch to above which indeed
> > >> reads more clear.
> > >>
> > >> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
> > >> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
> > >> >
> > >> > These two functions are both quite involved, as well as share a lot of
> > >> > common code. I'd rather have one linearize instruction, that takes env
> > >> > as an optional parameter. If env is specified (which is the case for
> > >> > all cases except for constant blinding pass), then adjust aux_data and
> > >> > subprogs along the way.
> > >>
> > >> Two version of lineration and how to unify them was a painpoint to me. I
> > >> thought to factor out some of the common code out, but it actually doesn't
> > >> count much, the final size counting + insnsi resize parts are the same,
> > >> then things start to diverge since the "Copy over insn" loop.
> > >>
> > >> verifier layer needs to copy and initialize aux data etc. And jump
> > >> relocation is different. At core layer, the use case is JIT blinding which
> > >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
> > >
> > > Sorry, I didn't get what "could expand an jump_imm insn into a
> > > and/or/jump_reg sequence", maybe you can clarify if I'm missing
> > > something.
> > >
> > > But from your cover letter description, core layer has no jumps at
> > > all, while verifier has jumps inside patch buffer. So, if you support
> > > jumps inside of patch buffer, it will automatically work for core
> > > layer. Or what am I missing?
> >
> > I meant in core layer (JIT blinding), there is the following patching:
> >
> > input:
> >   insn 0             insn 0
> >   insn 1             insn 1
> >   jmp_imm   >>       mov_imm  \
> >   insn 2             xor_imm    insn seq expanded from jmp_imm
> >   insn 3             jmp_reg  /
> >                      insn 2
> >                      insn 3
> >
> >
> > jmp_imm is the insn that will be patched, and the actually transformation
> > is to expand it into mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
> > at the end of the patch buffer, must jump to the same destination as the
> > original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
> > be relocated, and the jump destination is outside of patch buffer.
>
>
> Ok, great, thanks for explaining, yeah it's definitely something that
> we should be able to support. BUT. It got me thinking a bit more and I
> think I have simpler and more elegant solution now, again, supporting
> both core-layer and verifier-layer operations.
>
> struct bpf_patchable_insn {
>    struct bpf_patchable_insn *next;
>    struct bpf_insn insn;
>    int orig_idx; /* original non-patched index */
>    int new_idx;  /* new index, will be filled only during linearization */
> };
>
> struct bpf_patcher {
>     /* dummy head node of a chain of patchable instructions */
>     struct bpf_patchable_insn insn_head;
>     /* dynamic array of size(original instruction count)
>      * this is a map from original instruction index to a first
>      * patchable instruction that replaced that instruction (or
>      * just original instruction as bpf_patchable_insn).
>      */
>     int *orig_idx_to_patchable_insn;
>     int cnt;
> };
>
> Few points, but it should be pretty clear just from comments and definitions:
> 1. When you created bpf_patcher, you create patchabe_insn list, fill
> orig_idx_to_patchable_insn map to store proper pointers. This array is
> NEVER changed after that.
> 2. When replacing instruction, you re-use struct bpf_patchable_insn
> for first patched instruction, then append after that (not prepend to
> next instruction to not disrupt orig_idx -> patchable_insn mapping).
> 3. During linearizations, you first traverse the chain of instructions
> and trivially assing new_idxs.
> 4. No need for patchabe_insn->target anymore. All jumps use relative
> instruction offsets, right? So when you need to determine new
> instruction index during linearization, you just do (after you
> calculated new instruction indicies):
>
> func adjust_jmp(struct bpf_patcher* patcher, struct bpf_patchable_insn *insn) {
>    int old_jmp_idx = insn->orig_idx + jmp_offset_of(insn->insn);
>    int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;
>    adjust_jmp_offset(insn->insn, new_jmp_idx) - insn->orig_idx;
> }

Forgot to mention. To handle deleted insns, you can either traverse
till you find first non-deleted instruction after that one, or during
filling of new_idx, just make sure to that new_idx of deleted
instruction is the same as the first non-deleted instruction's
new_idx.

For subprogs (presuming there is no case where we can just eliminate
entire subprog), you can just do simple look up from original index to
a new index. No need to copy/recalculate, etc. It's just orig_idx ->
new_idx mapping.

>
> The idea is that we want to support quick look-up by original
> instruction index. That's what orig_idx_to_patchable_insn provides. On
> the other hand, no existing instruction is ever referencing newly
> patched instruction by its new offset, so with careful implementation,
> you can transparently support all the cases, regardless if it's in
> core layer or verifier layer (so, e.g., verifier layer patched
> instructions now will be able to jump out of patched buffer, if
> necessary, neat, right?).
>
> It is cleaner than everything we've discussed so far. Unless I missed
> something critical (it's all quite convoluted, so I might have
> forgotten some parts already). Let me know what you think.
>
>
> >
> > This means for core layer (jit blinding), it needs to take care of insn
> > inside patch buffer.
> >
> > > Just compared two version of linearize side by side. From what I can
> > > see, unified version could look like this, high-level:
> > >
> > > 1. Count final insn count (but see my other suggestions how to avoid
> > > that altogether). If not changed - exit.
> > > 2. Realloc insn buffer, copy just instructions (not aux_data yet).
> > > Build idx_map, if necessary.
> > > 3. (if env) then bpf_patchable_insn has aux_data, so now do another
> > > pass and copy it into resulting array.
> > > 4. (if env) Copy sub info. Though I'd see if we can just reuse old
> > > ones and just adjust offsets. I'm not sure why we need to allocate new
> > > array, subprogram count shouldn't change, right?
> >
> > If there is no dead insn elimination opt, then we could just adjust
> > offsets. When there is insn deleting, I feel the logic becomes more
> > complex. One subprog could be completely deleted or partially deleted, so
> > I feel just recalculate the whole subprog info as a side-product is
> > much simpler.
>
> What's the situation where entirety of subprog can be deleted?
>
>
> >
> > > 5. (common) Relocate jumps. Not clear why core layer doesn't care
> > > about PATCHED (or, alternatively, why verifier layer cares).
> >
> > See above, in this RFC, core layer care PATCHED during relocating jumps,
> > and verifier layer doesn't.
> >
> > > And again, with targets pointer it will look totally different (and
> > > simpler).
> >
> > Yes, will see how the code looks.
> >
> > > 6. (if env) adjust subprogs
> > > 7. (common) Adjust prog's line info.
> > >
> > > The devil is in the details, but I think this will still be better if
> > > contained in one function if a bunch of `if (env)` checks. Still
> > > pretty linear.
> > >
> > >> jump_reg is at the end of the patch buffer, it should be relocated. While
> > >> all use case in verifier layer, no jump in the prog will be patched and all
> > >> new jumps in patch buffer will jump inside the buffer locally so no need to
> > >> resolve.
> > >>
> > >> And yes we could unify them into one and control the diverge using
> > >> argument, but then where to place the function is an issue. My
> > >> understanding is verifier.c is designed to be on top of core.c and core.c
> > >> should not reference and no need to be aware of any verifier specific data
> > >> structures, for example env or bpf_aux_insn_data etc.
> > >
> > > Func prototype where it is. Maybe forward-declare verifier env struct.
> > > Implementation in verifier.c?
> > >
> > >>
> > >> So, in this RFC, I had choosed to write separate linerization function for
> > >> core and verifier layer. Does this make sense?
> > >
> > > See above. Let's still try to make it better.
> > >
> > >>
> > >> >
> > >> > This would keep logic less duplicated and shouldn't complexity beyond
> > >> > few null checks in few places.
> > >> >
> > >> >>    list destroy:                 bpf_destroy_list_insn
> > >> >>
> > >> >
> > >> > I'd also add a macro foreach_list_insn instead of explicit for loops
> > >> > in multiple places. That would also allow to skip deleted instructions
> > >> > transparently.
> > >> >
> > >> >> list patch could change the insn at patch point, it will invalid the aux
> > >> >
> > >> > typo: invalid -> invalidate
> > >>
> > >> Ack.
> > >>
> > >> >
> > >> >> info at patching point. list pre-patch insert new insns before patch point
> > >> >> where the insn and associated aux info are not touched, it is used for
> > >> >> example in convert_ctx_access when generating prologue.
> > >> >>
> > >> >> Typical API sequence for one patching pass:
> > >> >>
> > >> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> > >> >>    for (elem = list; elem; elem = elem->next)
> > >> >>       patch_buf = gen_patch_buf_logic;
> > >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> > >> >>    bpf_prog = bpf_linearize_list_insn(list)
> > >> >>    bpf_destroy_list_insn(list)
> > >> >>
> > >> >> Several patching passes could also share the same list:
> > >> >>
> > >> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> > >> >>    for (elem = list; elem; elem = elem->next)
> > >> >>       patch_buf = gen_patch_buf_logic1;
> > >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> > >> >>    for (elem = list; elem; elem = elem->next)
> > >> >>       patch_buf = gen_patch_buf_logic2;
> > >> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> > >> >>    bpf_prog = bpf_linearize_list_insn(list)
> > >> >>    bpf_destroy_list_insn(list)
> > >> >>
> > >> >> but note new inserted insns int early passes won't have aux info except
> > >> >> zext info. So, if one patch pass requires all aux info updated and
> > >> >> recalculated for all insns including those pathced, it should first
> > >> >> linearize the old list, then re-create the list. The RFC always create and
> > >> >> linearize the list for each migrated patching pass separately.
> > >> >
> > >> > I think we should do just one list creation, few passes of patching
> > >> > and then linearize once. That will save quite a lot of memory
> > >> > allocation and will speed up a lot of things. All the verifier
> > >> > patching happens one after the other without any other functionality
> > >> > in between, so there shouldn't be any problem.
> > >>
> > >> Yes, as mentioned above, it is possible and I had tried to do it in an very
> > >> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
> > >> same list, but then the 32-bit zero extension insertion pass requires
> > >> aux.zext_dst set properly for all instructions including those patched
> > >
> > > So zext_dst. Seems like it's easily calculatable, so doesn't seem like
> > > it even needs to be accessed from aux_data.
> > >
> > > But. I can see at least two ways to do this:
> > > 1. those patching passes that care about aux_data, should just do
> > > extra check for NULL. Because when we adjust insns now, we just leave
> > > zero-initialized aux_data, except for zext_dst and seen. So it's easy
> > > to default to them if aux_data is NULL for patchable_insn.
> > > 2. just allocate and fill them out them when applying patch insns
> > > buffer. It's not a duplication, we already fill them out during
> > > patching today. So just do the same, except through malloc()'ed
> > > pointer instead. At the end they will be copied into linear resulting
> > > array during linearization (uniformly with non-patched insns).
> > >
> > >> one which we need to linearize the list first (as we set zext_dst during
> > >> linerization), or the other choice is we do the zext_dst initialization
> > >> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
> > >> between core and verifier layer.
> > >
> > > List construction is much simpler, even if we have to have extra
> > > check, similar to `if (env) { do_extra(); }`, IMO, it's fine.
> > >
> > >>
> > >> > As for aux_data. We can solve that even more simply and reliably by
> > >> > storing a pointer along the struct bpf_list_insn
> > >>
> > >> This is exactly what I had implemented initially, but then the issue is how
> > >> to handle aux_data for patched insn? IIRC I was leave it as a NULL pointer,
> > >> but later found zext_dst info is required for all insns, so I end up
> > >> duplicating zext_dst in bpf_list_insn.
> > >
> > > See above. No duplication. You have a pointer. Whether aux_data is in
> > > original array or was malloc()'ed, doesn't matter. But no duplication
> > > of fields.
> > >
> > >>
> > >> This leads me worrying we need to keep duplicating fields there as soon as
> > >> there is new similar requirements in future patching pass and I thought it
> > >> might be better to just reference the aux_data inside env using orig_idx,
> > >> this avoids duplicating information, but we need to make sure used fields
> > >> inside aux_data for patched insn update-to-date during linearization or
> > >> patching list.
> > >>
> > >> > (btw, how about calling it bpf_patchable_insn?).
> > >>
> > >> No preference, will use this one.
> > >>
> > >> > Here's how I propose to represent this patchable instruction:
> > >> >
> > >> > struct bpf_list_insn {
> > >> >        struct bpf_insn insn;
> > >> >        struct bpf_list_insn *next;
> > >> >        struct bpf_list_insn *target;
> > >> >        struct bpf_insn_aux_data *aux_data;
> > >> >        s32 orig_idx; // can repurpose this to have three meanings:
> > >> >                      // -2 - deleted
> > >> >                      // -1 - patched/inserted insn
> > >> >                      // >=0 - original idx
> > >>
> > >> I actually had experimented the -2/-1/0 trick, exactly the same number
> > >> assignment :) IIRC the code was not clear compared with using flag, the
> > >> reason seems to be:
> > >>   1. we still need orig_idx of an patched insn somehow, meaning negate the
> > >>      index.
> > >
> > > Not following, original index with be >=0, no?
> > >
> > >>   2. somehow somecode need to know whether one insn is deleted or patched
> > >>      after the negation, so I end up with some ugly code.
> > >
> > > So that's why you'll have constants with descriptive name for -2 and -1.
> > >
> > >>
> > >> Anyway, I might had not thought hard enough on this, I will retry using the
> > >> special index instead of flag, hopefully I could have clean code this time.
> > >>
> > >
> > > Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
> > > (orig_idx >= 0) { ... }` are very confusing.
> > >
> > >> > };
> > >> >
> > >> > The idea would be as follows:
> > >> > 1. when creating original list, target pointer will point directly to
> > >> > a patchable instruction wrapper for jumps/calls. This will allow to
> > >> > stop tracking and re-calculating jump offsets and instruction indicies
> > >> > until linearization.
> > >>
> > >> Not sure I have followed the idea of "target" pointer. At the moment we are
> > >> using index mapping array (generated as by-product during coping insn).
> > >>
> > >> While the "target" pointer means to during list initialization, each jump
> > >> insn will have target initialized to the list node of the converted jump
> > >> destination insn, and all those non-jump insns are with NULL? Then during
> > >> linearization you assign index to each list node (could be done as
> > >> by-product of other pass) before insn coping which could then relocate the
> > >> insn during the coping as the "target" would have final index calculated?
> > >> Am I following correctly?
> > >
> > > Yes, I think you are understanding correctly what I'm saying. For
> > > implementation, you can do it in few ways, through few passes or with
> > > some additional data, is less important. See what's cleanest.
> > >
> > >>
> > >> > 2. aux_data is also filled at that point. Later at linearization time
> > >> > you'd just iterate over all the instructions in final order and copy
> > >> > original aux_data, if it's present. And then just repace env's
> > >> > aux_data array at the end, should be very simple and fast.
> > >>
> > >> As explained, I am worried making aux_data a pointer will causing
> > >> duplicating some fields into list_insn if the fields are required for
> > >> patched insns.
> > >
> > > Addressed above, I don't think there will be any duplication, because
> > > we pass aux_data by pointer.
> > >
> > >>
> > >> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
> > >> > same list of instructions and those passes will just keep inserting
> > >> > instruction buffers. Given we have restriction that all the jumps are
> > >> > only within patch buffer, it will be trivial to construct proper
> > >> > patchable instruction wrappers for newly added instructions, with NULL
> > >> > for aux_data and possibly non-NULL target (if it's a JMP insn).
> > >> > 4. After those passes, linearize, adjust subprogs (for this you'll
> > >> > probably still need to create index mapping, right?), copy or create
> > >> > new aux_data.
> > >> > 5. Done.
> > >> >
> > >> > What do you think? I think this should be overall simpler and faster.
> > >> > But let me know if I'm missing something.
> > >>
> > >> Thanks for all these thoughts, they are very good suggestions and reminds
> > >> me to revisit some points I had forgotten. I will do the following things:
> > >>
> > >>   1. retry the negative index solution to eliminate flag if the result code
> > >>      could be clean.
> > >>   2. the "target" pointer seems make sense, it makes list_insn bigger but
> > >>      normally space trade with time, so I will try to implement it to see
> > >>      how the code looks like.
> > >>   3. I still have concerns on making aux_data as pointer. Mostly due to
> > >>      patched insn will have NULL pointer and in case aux info of patched
> > >>      insn is required, we need to duplicate info inside list_insn. For
> > >>      example 32-bit zext opt requires zext_dst.
> > >>
> > >
> > >
> > > So one more thing I wanted to suggest. I'll try to keep high-level
> > > suggestions here.
> > >
> > > What about having a wrapper for patchable_insn list, where you can
> > > store some additional data, like final count and whatever else. It
> > > will eliminate some passes (counting) and will make list handling
> > > easier (because you can have a dummy head pointer, so no special
> > > handling of first element
> >
> > Will try it.
> >
> > > you had this concern in patch #1, I
> > > believe). But it will be clear if it's beneficial once implemented.
> >
> > >> Regards,
> > >> Jiong
> > >>
> > >> >>
> > >> >> Compared with old patching code, this new infrastructure has much less core
> > >> >> code, even though the final code has a couple of extra lines but that is
> > >> >> mostly due to for list based infrastructure, we need to do more error
> > >> >> checks, so the list and associated aux data structure could be freed when
> > >> >> errors happens.
> > >> >>
> > >> >> Patching Restrictions
> > >> >> ===
> > >> >>   - For core layer, the linearization assume no new jumps inside patch buf.
> > >> >>     Currently, the only user of this layer is jit blinding.
> > >> >>   - For verifier layer, there could be new jumps inside patch buf, but
> > >> >>     they should have branch target resolved themselves, meaning new jumps
> > >> >>     doesn't jump to insns out of the patch buf. This is the case for all
> > >> >>     existing verifier layer users.
> > >> >>   - bpf_insn_aux_data for all patched insns including the one at patch
> > >> >>     point are invalidated, only 32-bit zext info will be recalcuated.
> > >> >>     If the aux data of insn at patch point needs to be retained, it is
> > >> >>     purely insn insertion, so need to use the pre-patch API.
> > >> >>
> > >> >> I plan to send out a PATCH set once I finished insn deletion line info adj
> > >> >> support, please have a looks at this RFC, and appreciate feedbacks.
> > >> >>
> > >> >> Jiong Wang (8):
> > >> >>   bpf: introducing list based insn patching infra to core layer
> > >> >>   bpf: extend list based insn patching infra to verification layer
> > >> >>   bpf: migrate jit blinding to list patching infra
> > >> >>   bpf: migrate convert_ctx_accesses to list patching infra
> > >> >>   bpf: migrate fixup_bpf_calls to list patching infra
> > >> >>   bpf: migrate zero extension opt to list patching infra
> > >> >>   bpf: migrate insn remove to list patching infra
> > >> >>   bpf: delete all those code around old insn patching infrastructure
> > >> >>
> > >> >>  include/linux/bpf_verifier.h |   1 -
> > >> >>  include/linux/filter.h       |  27 +-
> > >> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
> > >> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
> > >> >>  4 files changed, 580 insertions(+), 528 deletions(-)
> > >> >>
> > >> >> --
> > >> >> 2.7.4
> > >> >>
> > >>
> >

^ permalink raw reply

* Re: [PATCH v2 bpf-next] selftests/bpf: remove logic duplication in test_verifier.c
From: Daniel Borkmann @ 2019-07-15 23:03 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast
  Cc: andrii.nakryiko, kernel-team, Krzesimir Nowak
In-Reply-To: <20190712174441.4089282-1-andriin@fb.com>

On 7/12/19 7:44 PM, Andrii Nakryiko wrote:
> test_verifier tests can specify single- and multi-runs tests. Internally
> logic of handling them is duplicated. Get rid of it by making single run
> retval/data specification to be a first run spec.
> 
> Cc: Krzesimir Nowak <krzesimir@kinvolk.io>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Stephen Hemminger @ 2019-07-15 23:37 UTC (permalink / raw)
  To: Vedang Patel
  Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern
In-Reply-To: <1563231104-19912-2-git-send-email-vedang.patel@intel.com>

On Mon, 15 Jul 2019 15:51:41 -0700
Vedang Patel <vedang.patel@intel.com> wrote:

> @@ -405,6 +420,7 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>  	struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
>  	struct tc_mqprio_qopt *qopt = 0;
>  	__s32 clockid = CLOCKID_INVALID;
> +	__u32 taprio_flags = 0;
>  	int i;
>  
>  	if (opt == NULL)
> @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>  
>  	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
>  
> +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> +	}
> +

Overall this looks fine, but three small comments:
1. It is better not to do unnecessary variable initialization
2. It is better to move variables into the basic block where they are used.
3. Use the print_0xhex() instead of print_uint() for hex values. The difference
   is that in the JSON output, print_uint would be decimal but the print_0xhex
   is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.



^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 3/5] taprio: add support for setting txtime_delay.
From: Stephen Hemminger @ 2019-07-15 23:38 UTC (permalink / raw)
  To: Vedang Patel
  Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern
In-Reply-To: <1563231104-19912-3-git-send-email-vedang.patel@intel.com>

On Mon, 15 Jul 2019 15:51:42 -0700
Vedang Patel <vedang.patel@intel.com> wrote:

> +			if (get_s32(&txtime_delay, *argv, 0)) {

Is txtime_delay of a negative value meaningful?

^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Jakub Kicinski @ 2019-07-16  0:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vedang Patel, netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, m-karicheri2, dsahern
In-Reply-To: <20190715163743.2c6cec2b@hermes.lan>

On Mon, 15 Jul 2019 16:37:43 -0700, Stephen Hemminger wrote:
> On Mon, 15 Jul 2019 15:51:41 -0700
> Vedang Patel <vedang.patel@intel.com> wrote:
> > @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> >  
> >  	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
> >  
> > +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> > +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> > +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> > +	}
>[...]
> 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
>    is that in the JSON output, print_uint would be decimal but the print_0xhex
>    is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.

In my humble personal experience scripting tests using iproute2 and
bpftool with Python I found printing the "hex string" instead of just
outputing the integer value counter productive :( Even tho it looks
better to the eye, JSON is primarily for machine processing and hex
strings have to be manually converted.

^ permalink raw reply

* Re: [pull request][net 0/3] Mellanox, mlx5 fixes 2019-07-15
From: David Miller @ 2019-07-16  0:20 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20190715200940.31799-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 15 Jul 2019 20:09:53 +0000

> This pull request provides mlx5 TC flower and tunnel fixes for
> kernel 5.2 from Eli and Vlad.
> 
> Please pull and let me know if there is any problem.

Pulled, thanks Saeed.

^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Stephen Hemminger @ 2019-07-16  0:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vedang Patel, netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, m-karicheri2, dsahern
In-Reply-To: <20190715171515.248460a6@cakuba.netronome.com>

On Mon, 15 Jul 2019 17:15:15 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Mon, 15 Jul 2019 16:37:43 -0700, Stephen Hemminger wrote:
> > On Mon, 15 Jul 2019 15:51:41 -0700
> > Vedang Patel <vedang.patel@intel.com> wrote:  
> > > @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> > >  
> > >  	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
> > >  
> > > +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> > > +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> > > +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> > > +	}  
> >[...]
> > 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
> >    is that in the JSON output, print_uint would be decimal but the print_0xhex
> >    is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.  
> 
> In my humble personal experience scripting tests using iproute2 and
> bpftool with Python I found printing the "hex string" instead of just
> outputing the integer value counter productive :( Even tho it looks
> better to the eye, JSON is primarily for machine processing and hex
> strings have to be manually converted.

If it is hex on normal output, it should be hex on JSON output.
And what ever the normal output format is has to be accepted on command line as input.

^ permalink raw reply

* [bpf-next RFC 0/6] Introduce a BPF helper to generate SYN cookies
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov

From: Petar Penkov <ppenkov@google.com>

This patch series introduces a BPF helper function that allows generating SYN
cookies from BPF.

The first two patches in the series modify several TCP helper functions to
allow for SKB-less operation, as is the case with XDP.

The third patch introduces the bpf_tcp_gen_syncookie helper function which
generates a SYN cookie for either XDP or TC programs. 

The last three patches sync tools/ and add a test. 

The primary design consideration I see in the patch series is the return value
of the helper function. Currently bpf_tcp_gen_syncookie returns a 64-bit value
that contains both the 32-bit syncookie, and the 16-bit mss value which is
encoded in the cookie. On error, it would return a negative value instead. I
chose this over writing the cookie into the provided TCP packet to avoid writing
packet data as currently if a helper changes the packet data, the first argument
has to point to the context (can this be relaxed?). 

To make the API cleaner we can instead return something like the struct below
though the return type would then not really be RET_INTEGER or any of the
currently existing return types.
struct bpf_syncookie {
	u16 error; // or u8 error, u8 unused for future use
	u16 mss;
	u32 syncookie;
}

Petar Penkov (6):
  tcp: tcp_syn_flood_action read port from socket
  tcp: add skb-less helpers to retrieve SYN cookie
  bpf: add bpf_tcp_gen_syncookie helper
  bpf: sync bpf.h to tools/
  selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
  selftests/bpf: add test for bpf_tcp_gen_syncookie

 include/net/tcp.h                             | 11 +++
 include/uapi/linux/bpf.h                      | 30 ++++++-
 net/core/filter.c                             | 62 +++++++++++++
 net/ipv4/tcp_input.c                          | 87 +++++++++++++++++--
 net/ipv4/tcp_ipv4.c                           |  8 ++
 net/ipv6/tcp_ipv6.c                           |  8 ++
 tools/include/uapi/linux/bpf.h                | 37 +++++++-
 tools/testing/selftests/bpf/bpf_helpers.h     |  3 +
 .../bpf/progs/test_tcp_check_syncookie_kern.c | 28 ++++--
 .../bpf/test_tcp_check_syncookie_user.c       | 61 +++++++++++--
 10 files changed, 313 insertions(+), 22 deletions(-)

-- 
2.22.0.510.g264f2c817a-goog

^ permalink raw reply

* [bpf-next RFC 1/6] tcp: tcp_syn_flood_action read port from socket
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This allows us to call this function before an SKB has been
allocated.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 net/ipv4/tcp_input.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a22fb3b..8892df6de1d4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6422,9 +6422,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
 /*
  * Return true if a syncookie should be sent
  */
-static bool tcp_syn_flood_action(const struct sock *sk,
-				 const struct sk_buff *skb,
-				 const char *proto)
+static bool tcp_syn_flood_action(const struct sock *sk, const char *proto)
 {
 	struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
 	const char *msg = "Dropping request";
@@ -6444,7 +6442,7 @@ static bool tcp_syn_flood_action(const struct sock *sk,
 	    net->ipv4.sysctl_tcp_syncookies != 2 &&
 	    xchg(&queue->synflood_warned, 1) == 0)
 		net_info_ratelimited("%s: Possible SYN flooding on port %d. %s.  Check SNMP counters.\n",
-				     proto, ntohs(tcp_hdr(skb)->dest), msg);
+				     proto, sk->sk_num, msg);
 
 	return want_cookie;
 }
@@ -6487,7 +6485,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	 */
 	if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
 	     inet_csk_reqsk_queue_is_full(sk)) && !isn) {
-		want_cookie = tcp_syn_flood_action(sk, skb, rsk_ops->slab_name);
+		want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
 		if (!want_cookie)
 			goto drop;
 	}
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 2/6] tcp: add skb-less helpers to retrieve SYN cookie
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 include/net/tcp.h    | 11 ++++++
 net/ipv4/tcp_input.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_ipv4.c  |  8 +++++
 net/ipv6/tcp_ipv6.c  |  8 +++++
 4 files changed, 106 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cca3c59b98bf..a128e22c0d5d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -414,6 +414,17 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       int estab, struct tcp_fastopen_cookie *foc);
 const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
 
+/*
+ *	BPF SKB-less helpers
+ */
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *tch, u32 *cookie);
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *tch, u32 *cookie);
+u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
+		      const struct tcp_request_sock_ops *af_ops,
+		      struct sock *sk, void *iph, struct tcphdr *tch,
+		      u32 *cookie);
 /*
  *	TCP v4 functions exported for the inet6 API
  */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8892df6de1d4..1406d7e0953c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3782,6 +3782,52 @@ static void smc_parse_options(const struct tcphdr *th,
 #endif
 }
 
+/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
+ * value on success.
+ *
+ * Invoked for BPF SYN cookie generation, so th should be a SYN.
+ */
+static u16 tcp_parse_mss_option(const struct net *net, const struct tcphdr *th,
+				u16 user_mss)
+{
+	const unsigned char *ptr = (const unsigned char *)(th + 1);
+	int length = (th->doff * 4) - sizeof(struct tcphdr);
+	u16 mss = 0;
+
+	while (length > 0) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return mss;
+		case TCPOPT_NOP:	/* Ref: RFC 793 section 3.1 */
+			length--;
+			continue;
+		default:
+			if (length < 2)
+				return mss;
+			opsize = *ptr++;
+			if (opsize < 2) /* "silly options" */
+				return mss;
+			if (opsize > length)
+				return mss;	/* fail on partial options */
+			if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
+				u16 in_mss = get_unaligned_be16(ptr);
+
+				if (in_mss) {
+					if (user_mss && user_mss < in_mss)
+						in_mss = user_mss;
+					mss = in_mss;
+				}
+			}
+			ptr += opsize - 2;
+			length -= opsize;
+		}
+	}
+	return mss;
+}
+
 /* Look for tcp options. Normally only called on SYN and SYNACK packets.
  * But, this can also be called on packets in the established flow when
  * the fast version below fails.
@@ -6464,6 +6510,39 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
 	}
 }
 
+u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
+		      const struct tcp_request_sock_ops *af_ops,
+		      struct sock *sk, void *iph, struct tcphdr *th,
+		      u32 *cookie)
+{
+	u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+	bool is_v4 = rsk_ops->family == AF_INET;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
+	    !inet_csk_reqsk_queue_is_full(sk))
+		return 0;
+
+	if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
+		return 0;
+
+	if (sk_acceptq_is_full(sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		return 0;
+	}
+
+	mss = tcp_parse_mss_option(sock_net(sk), th, tp->rx_opt.user_mss);
+	if (!mss)
+		mss = af_ops->mss_clamp;
+
+	tcp_synq_overflow(sk);
+	*cookie = is_v4 ? __cookie_v4_init_sequence(iph, th, &mss)
+			: __cookie_v6_init_sequence(iph, th, &mss);
+#endif
+	return mss;
+}
+
 int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		     const struct tcp_request_sock_ops *af_ops,
 		     struct sock *sk, struct sk_buff *skb)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d57641cb3477..0e06e59784bd 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1515,6 +1515,14 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }
 
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *tch, u32 *cookie)
+{
+	return tcp_get_syncookie(&tcp_request_sock_ops,
+				 &tcp_request_sock_ipv4_ops, sk, iph, tch,
+				 cookie);
+}
+
 /* The socket must have it's spinlock held when we get
  * here, unless it is a TCP_LISTEN socket.
  *
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d56a9019a0fe..ce46cdba54bc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1058,6 +1058,14 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }
 
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *tch, u32 *cookie)
+{
+	return tcp_get_syncookie(&tcp6_request_sock_ops,
+				 &tcp_request_sock_ipv6_ops, sk, iph, tch,
+				 cookie);
+}
+
 static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 {
 	if (skb->protocol == htons(ETH_P_IP))
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/bpf.h | 30 ++++++++++++++++++-
 net/core/filter.c        | 62 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6f68438aa4ed..abf4a85c76d1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2713,6 +2713,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains **sizeof**\ (**struct tcphdr**).
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		network order and the higher 32 bits hold the MSS value for that
+ *		cookie.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-ENOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** *sk* family is not AF_INET/AF_INET6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2824,7 +2851,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 47f6386fb17a..109fd1e286f4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5850,6 +5850,64 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
+	   struct tcphdr *, th, u32, th_len)
+{
+#ifdef CONFIG_SYN_COOKIES
+	u32 cookie;
+	u16 mss;
+
+	if (unlikely(th_len < sizeof(*th)))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
+		return -EINVAL;
+
+	if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
+		return -EINVAL;
+
+	if (!th->syn || th->ack || th->fin || th->rst)
+		return -EINVAL;
+
+	switch (sk->sk_family) {
+	case AF_INET:
+		if (unlikely(iph_len < sizeof(struct iphdr)))
+			return -EINVAL;
+		mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
+		break;
+
+#if IS_BUILTIN(CONFIG_IPV6)
+	case AF_INET6:
+		if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+			return -EINVAL;
+		mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
+		break;
+#endif /* CONFIG_IPV6 */
+
+	default:
+		return -EPROTONOSUPPORT;
+	}
+	if (mss <= 0)
+		return -ENOENT;
+
+	return htonl(cookie) | ((u64)mss << 32);
+#else
+	return -ENOTSUPP;
+#endif /* CONFIG_SYN_COOKIES */
+}
+
+static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
+	.func		= bpf_tcp_gen_syncookie,
+	.gpl_only	= true, /* __cookie_v*_init_sequence() is GPL */
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -6135,6 +6193,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_check_syncookie_proto;
 	case BPF_FUNC_skb_ecn_set_ce:
 		return &bpf_skb_ecn_set_ce_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
@@ -6174,6 +6234,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_skc_lookup_tcp_proto;
 	case BPF_FUNC_tcp_check_syncookie:
 		return &bpf_tcp_check_syncookie_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 5/6] selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Expose bpf_tcp_gen_syncookie to selftests.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/testing/selftests/bpf/bpf_helpers.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 5a3d92c8bec8..19f01e967402 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -228,6 +228,9 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
 static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
 	(void *)BPF_FUNC_sk_storage_delete;
 static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
+static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
+					  int ip_len, void *tcp, int tcp_len) =
+	(void *) BPF_FUNC_tcp_gen_syncookie;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 4/6] bpf: sync bpf.h to tools/
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Sync updated documentation for bpf_redirect_map.

Sync the bpf_tcp_gen_syncookie helper function definition with the one
in tools/uapi.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/include/uapi/linux/bpf.h | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f506c68b2612..abf4a85c76d1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1571,8 +1571,11 @@ union bpf_attr {
  * 		but this is only implemented for native XDP (with driver
  * 		support) as of this writing).
  *
- * 		All values for *flags* are reserved for future usage, and must
- * 		be left at zero.
+ * 		The lower two bits of *flags* are used as the return code if
+ * 		the map lookup fails. This is so that the return value can be
+ * 		one of the XDP program return codes up to XDP_TX, as chosen by
+ * 		the caller. Any higher bits in the *flags* argument must be
+ * 		unset.
  *
  * 		When used to redirect packets to net devices, this helper
  * 		provides a high performance increase over **bpf_redirect**\ ().
@@ -2710,6 +2713,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains **sizeof**\ (**struct tcphdr**).
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		network order and the higher 32 bits hold the MSS value for that
+ *		cookie.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-ENOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** *sk* family is not AF_INET/AF_INET6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2821,7 +2851,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 6/6] selftests/bpf: add test for bpf_tcp_gen_syncookie
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Modify the existing bpf_tcp_check_syncookie test to also generate a
SYN cookie, pass the packet to the kernel, and verify that the two
cookies are the same (and both valid). Since cloned SKBs are skipped
during generic XDP, this test does not issue a SYN cookie when run in
XDP mode. We therefore only check that a valid SYN cookie was issued at
the TC hook.

Additionally, verify that the MSS for that SYN cookie is within
expected range.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 .../bpf/progs/test_tcp_check_syncookie_kern.c | 28 +++++++--
 .../bpf/test_tcp_check_syncookie_user.c       | 61 ++++++++++++++++---
 2 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
index 1ab095bcacd8..229832766f42 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
@@ -19,8 +19,8 @@
 struct bpf_map_def SEC("maps") results = {
 	.type = BPF_MAP_TYPE_ARRAY,
 	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
-	.max_entries = 1,
+	.value_size = sizeof(__u32),
+	.max_entries = 3,
 };
 
 static __always_inline void check_syncookie(void *ctx, void *data,
@@ -33,8 +33,10 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 	struct ipv6hdr *ipv6h;
 	struct tcphdr *tcph;
 	int ret;
+	__u32 key_mss = 2;
+	__u32 key_gen = 1;
 	__u32 key = 0;
-	__u64 value = 1;
+	__s64 seq_mss;
 
 	ethh = data;
 	if (ethh + 1 > data_end)
@@ -66,6 +68,8 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		if (sk->state != BPF_TCP_LISTEN)
 			goto release;
 
+		seq_mss = bpf_tcp_gen_syncookie(sk, ipv4h, sizeof(*ipv4h),
+						tcph, sizeof(*tcph));
 		ret = bpf_tcp_check_syncookie(sk, ipv4h, sizeof(*ipv4h),
 					      tcph, sizeof(*tcph));
 		break;
@@ -95,6 +99,9 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		if (sk->state != BPF_TCP_LISTEN)
 			goto release;
 
+		seq_mss = bpf_tcp_gen_syncookie(sk, ipv6h, sizeof(*ipv6h),
+						tcph, sizeof(*tcph));
+
 		ret = bpf_tcp_check_syncookie(sk, ipv6h, sizeof(*ipv6h),
 					      tcph, sizeof(*tcph));
 		break;
@@ -103,8 +110,19 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		return;
 	}
 
-	if (ret == 0)
-		bpf_map_update_elem(&results, &key, &value, 0);
+	if (seq_mss > 0) {
+		__u32 cookie = bpf_ntohl((__u32)seq_mss);
+		__u32 mss = seq_mss >> 32;
+
+		bpf_map_update_elem(&results, &key_gen, &cookie, 0);
+		bpf_map_update_elem(&results, &key_mss, &mss, 0);
+	}
+
+	if (ret == 0) {
+		__u32 cookie = bpf_ntohl(tcph->ack_seq) - 1;
+
+		bpf_map_update_elem(&results, &key, &cookie, 0);
+	}
 
 release:
 	bpf_sk_release(sk);
diff --git a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
index 87829c86c746..f3ff49ceb481 100644
--- a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
+++ b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2018 Facebook
 // Copyright (c) 2019 Cloudflare
 
+#include <limits.h>
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -77,7 +78,7 @@ static int connect_to_server(int server_fd)
 	return fd;
 }
 
-static int get_map_fd_by_prog_id(int prog_id)
+static int get_map_fd_by_prog_id(int prog_id, bool *xdp)
 {
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
@@ -104,6 +105,8 @@ static int get_map_fd_by_prog_id(int prog_id)
 		goto err;
 	}
 
+	*xdp = info.type == BPF_PROG_TYPE_XDP;
+
 	map_fd = bpf_map_get_fd_by_id(map_ids[0]);
 	if (map_fd < 0)
 		log_err("Failed to get fd by map id %d", map_ids[0]);
@@ -113,18 +116,32 @@ static int get_map_fd_by_prog_id(int prog_id)
 	return map_fd;
 }
 
-static int run_test(int server_fd, int results_fd)
+static int run_test(int server_fd, int results_fd, bool xdp)
 {
 	int client = -1, srv_client = -1;
 	int ret = 0;
 	__u32 key = 0;
-	__u64 value = 0;
+	__u32 key_gen = 1;
+	__u32 key_mss = 2;
+	__u32 value = 0;
+	__u32 value_gen = 0;
+	__u32 value_mss = 0;
 
 	if (bpf_map_update_elem(results_fd, &key, &value, 0) < 0) {
 		log_err("Can't clear results");
 		goto err;
 	}
 
+	if (bpf_map_update_elem(results_fd, &key_gen, &value_gen, 0) < 0) {
+		log_err("Can't clear results");
+		goto err;
+	}
+
+	if (bpf_map_update_elem(results_fd, &key_mss, &value_mss, 0) < 0) {
+		log_err("Can't clear results");
+		goto err;
+	}
+
 	client = connect_to_server(server_fd);
 	if (client == -1)
 		goto err;
@@ -140,8 +157,35 @@ static int run_test(int server_fd, int results_fd)
 		goto err;
 	}
 
-	if (value != 1) {
-		log_err("Didn't match syncookie: %llu", value);
+	if (value == 0) {
+		log_err("Didn't match syncookie: %u", value);
+		goto err;
+	}
+
+	if (bpf_map_lookup_elem(results_fd, &key_gen, &value_gen) < 0) {
+		log_err("Can't lookup result");
+		goto err;
+	}
+
+	if (xdp && value_gen == 0) {
+		// SYN packets do not get passed through generic XDP, skip the
+		// rest of the test.
+		log_err("Did not find SYN cookie at XDP.");
+		goto out;
+	}
+
+	if (bpf_map_lookup_elem(results_fd, &key_mss, &value_mss) < 0) {
+		log_err("Can't lookup result");
+		goto err;
+	}
+
+	if (value != value_gen) {
+		log_err("BPF generated cookie does not match kernel one");
+		goto err;
+	}
+
+	if (value_mss < 536 || value_mss > USHRT_MAX) {
+		log_err("Unexpected MSS retrieved");
 		goto err;
 	}
 
@@ -163,13 +207,14 @@ int main(int argc, char **argv)
 	int server_v6 = -1;
 	int results = -1;
 	int err = 0;
+	bool xdp;
 
 	if (argc < 2) {
 		fprintf(stderr, "Usage: %s prog_id\n", argv[0]);
 		exit(1);
 	}
 
-	results = get_map_fd_by_prog_id(atoi(argv[1]));
+	results = get_map_fd_by_prog_id(atoi(argv[1]), &xdp);
 	if (results < 0) {
 		log_err("Can't get map");
 		goto err;
@@ -194,10 +239,10 @@ int main(int argc, char **argv)
 	if (server_v6 == -1)
 		goto err;
 
-	if (run_test(server, results))
+	if (run_test(server, results, xdp))
 		goto err;
 
-	if (run_test(server_v6, results))
+	if (run_test(server_v6, results, xdp))
 		goto err;
 
 	printf("ok\n");
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Jakub Kicinski @ 2019-07-16  0:34 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vedang Patel, netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, m-karicheri2, dsahern
In-Reply-To: <20190715172422.4e127da2@hermes.lan>

On Mon, 15 Jul 2019 17:24:22 -0700, Stephen Hemminger wrote:
> On Mon, 15 Jul 2019 17:15:15 -0700
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> > On Mon, 15 Jul 2019 16:37:43 -0700, Stephen Hemminger wrote:  
> > > On Mon, 15 Jul 2019 15:51:41 -0700
> > > Vedang Patel <vedang.patel@intel.com> wrote:    
> > > > @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> > > >  
> > > >  	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
> > > >  
> > > > +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> > > > +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> > > > +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> > > > +	}    
> > >[...]
> > > 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
> > >    is that in the JSON output, print_uint would be decimal but the print_0xhex
> > >    is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.    
> > 
> > In my humble personal experience scripting tests using iproute2 and
> > bpftool with Python I found printing the "hex string" instead of just
> > outputing the integer value counter productive :( Even tho it looks
> > better to the eye, JSON is primarily for machine processing and hex
> > strings have to be manually converted.  
> 
> If it is hex on normal output, it should be hex on JSON output.
> And what ever the normal output format is has to be accepted on command line as input.

Ah, I forgot the output == input principle in iproute2!
In any case if there was ever a vote whether to limit this principle to
non-JSON output, and make machines' life easier, I'd vote 'yes' :)

^ permalink raw reply

* [PATCH net-next v2 0/7] net/rds: RDMA fixes
From: Gerd Rausch @ 2019-07-16  0:41 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev; +Cc: David Miller

A number of net/rds fixes necessary to make "rds_rdma.ko"
pass some basic Oracle internal tests.

Gerd Rausch (7):
  net/rds: Give fr_state a chance to transition to FRMR_IS_FREE
  net/rds: Get rid of "wait_clean_list_grace" and add locking
  net/rds: Wait for the FRMR_IS_FREE (or FRMR_IS_STALE) transition after
    posting IB_WR_LOCAL_INV
  net/rds: Fix NULL/ERR_PTR inconsistency
  net/rds: Set fr_state only to FRMR_IS_FREE if IB_WR_LOCAL_INV had been
    successful
  net/rds: Keep track of and wait for FRWR segments in use upon shutdown
  net/rds: Initialize ic->i_fastreg_wrs upon allocation

 net/rds/ib.h      |  1 +
 net/rds/ib_cm.c   |  9 ++++-
 net/rds/ib_frmr.c | 84 ++++++++++++++++++++++++++++++++++++++++++-----
 net/rds/ib_mr.h   |  4 +++
 net/rds/ib_rdma.c | 60 +++++++++++----------------------
 5 files changed, 109 insertions(+), 49 deletions(-)

-- 

Changes in submitted patch v2:
* Use "wait_event" instead of "wait_event_timeout" in order to
  not have a deadline for the HCA firmware to respond.

^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Patel, Vedang @ 2019-07-16  1:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, Jamal Hadi Salim, Cong Wang,
	jiri@resnulli.us, Gomes, Vinicius, Dorileo, Leandro,
	jakub.kicinski@netronome.com, m-karicheri2@ti.com,
	dsahern@gmail.com
In-Reply-To: <20190715163743.2c6cec2b@hermes.lan>

Hi Stephen,

> On Jul 15, 2019, at 4:37 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 15 Jul 2019 15:51:41 -0700
> Vedang Patel <vedang.patel@intel.com> wrote:
> 
>> @@ -405,6 +420,7 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>> 	struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
>> 	struct tc_mqprio_qopt *qopt = 0;
>> 	__s32 clockid = CLOCKID_INVALID;
>> +	__u32 taprio_flags = 0;
>> 	int i;
>> 
>> 	if (opt == NULL)
>> @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>> 
>> 	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
>> 
>> +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
>> +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
>> +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
>> +	}
>> +
> 
> Overall this looks fine, but three small comments:
> 1. It is better not to do unnecessary variable initialization
> 2. It is better to move variables into the basic block where they are used.
> 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
>   is that in the JSON output, print_uint would be decimal but the print_0xhex
>   is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.
Thanks for they inputs. I will incorporate your comments and send the updated series in a couple of days.

-Vedang
> 
> 


^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 3/5] taprio: add support for setting txtime_delay.
From: Patel, Vedang @ 2019-07-16  1:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Gomes, Vinicius, Dorileo, Leandro, jakub.kicinski@netronome.com,
	m-karicheri2@ti.com, dsahern@gmail.com
In-Reply-To: <20190715163822.32596123@hermes.lan>



> On Jul 15, 2019, at 4:38 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 15 Jul 2019 15:51:42 -0700
> Vedang Patel <vedang.patel@intel.com> wrote:
> 
>> +			if (get_s32(&txtime_delay, *argv, 0)) {
> 
> Is txtime_delay of a negative value meaningful?

No, txtime-delay should always be a positive value. I will change this to u32 here.  I will also make the corresponding changes in the kernel and send the updated patch. 

Thanks,
Vedang

^ permalink raw reply

* Re: [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-16  1:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Simon Horman, wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190715082747.fdlpvekbqyhwx724@salvia>

ok

I will remove all unnecessary spaces and send the v2 patch

Thansk Pablo

Pablo Neira Ayuso <pablo@netfilter.org> 于2019年7月15日周一 下午4:27写道：
>
> On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> > On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > > ---
> > >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > > index 94d9d34..98e358e 100644
> > > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> > >             return 0;
> > >     }
> > >
> > > -   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > -                    sizeof(unsigned long), GFP_KERNEL);
> > > +   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > +                   sizeof(unsigned long), GFP_KERNEL);
>
> May I ask one thing? :-)
>
> Please, remove all unnecessary spaces in one go, search for:
>
>         git grep "=  "
>
> in the netfilter tree, and send a v2 for this one.
>
> Thanks.

^ permalink raw reply

* Re: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Xiaoming Ni @ 2019-07-16  2:00 UTC (permalink / raw)
  To: Vasily Averin, gregkh@linuxfoundation.org
  Cc: adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, Huangjianhui (Alex), Dailei,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <5521e5a4-66d9-aaf8-3a12-3999bfc6be8b@virtuozzo.com>

On 2019/7/15 13:38, Vasily Averin wrote:
> On 7/14/19 5:45 AM, Xiaoming Ni wrote:
>> On 2019/7/12 22:07, gregkh@linuxfoundation.org wrote:
>>> On Fri, Jul 12, 2019 at 09:11:57PM +0800, Xiaoming Ni wrote:
>>>> On 2019/7/11 21:57, Vasily Averin wrote:
>>>>> On 7/11/19 4:55 AM, Nixiaoming wrote:
>>>>>> On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>>>>>>> On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>>>>>>>> Registering the same notifier to a hook repeatedly can cause the hook
>>>>>>>> list to form a ring or lose other members of the list.
>>>>>>>
>>>>>>> I think is not enough to _prevent_ 2nd register attempt,
>>>>>>> it's enough to detect just attempt and generate warning to mark host in bad state.
>>>>>>>
>>>>>>
>>>>>> Duplicate registration is prevented in my patch, not just "mark host in bad state"
>>>>>>
>>>>>> Duplicate registration is checked and exited in notifier_chain_cond_register()
>>>>>>
>>>>>> Duplicate registration was checked in notifier_chain_register() but only 
>>>>>> the alarm was triggered without exiting. added by commit 831246570d34692e 
>>>>>> ("kernel/notifier.c: double register detection")
>>>>>>
>>>>>> My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
>>>>>>  which triggers an alarm and exits when a duplicate registration is detected.
>>>>>>
>>>>>>> Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>>>>>>> and it can lead to host crash in any time: 
>>>>>>> you can unregister notifier on first attempt it can be too early, it can be still in use.
>>>>>>> on the other hand you can never call 2nd unregister at all.
>>>>>>
>>>>>> Since the member was not added to the linked list at the time of the second registration, 
>>>>>> no linked list ring was formed. 
>>>>>> The member is released on the first unregistration and -ENOENT on the second unregistration.
>>>>>> After patching, the fault has been alleviated
>>>>>
>>>>> You are wrong here.
>>>>> 2nd notifier's registration is a pure bug, this should never happen.
>>>>> If you know the way to reproduce this situation -- you need to fix it. 
>>>>>
>>>>> 2nd registration can happen in 2 cases:
>>>>> 1) missed rollback, when someone forget to call unregister after successfull registration, 
>>>>> and then tried to call register again. It can lead to crash for example when according module will be unloaded.
>>>>> 2) some subsystem is registered twice, for example from  different namespaces.
>>>>> in this case unregister called during sybsystem cleanup in first namespace will incorrectly remove notifier used 
>>>>> in second namespace, it also can lead to unexpacted behaviour.
>>>>>
>>>> So in these two cases, is it more reasonable to trigger BUG() directly when checking for duplicate registration ?
>>>> But why does current notifier_chain_register() just trigger WARN() without exiting ?
>>>> notifier_chain_cond_register() direct exit without triggering WARN() ?
>>>
>>> It should recover from this, if it can be detected.  The main point is
>>> that not all apis have to be this "robust" when used within the kernel
>>> as we do allow for the callers to know what they are doing :)
>>>
>> In the notifier_chain_register(), the condition ( (*nl) == n) is the same registration of the same hook.
>>  We can intercept this situation and avoid forming a linked list ring to make the API more rob
> 
> Once again -- yes, you CAN prevent list corruption, but you CANNOT recover the host and return it back to safe state.
> If double register event was detected -- it means you have bug in kernel.
> 
> Yes, you can add BUG here and crash the host immediately, but I prefer to use warning in such situations.
> 
>>> If this does not cause any additional problems or slow downs, it's
>>> probably fine to add.
>>>
>> Notifier_chain_register() is not a system hotspot function.
>> At the same time, there is already a WARN_ONCE judgment. There is no new judgment in the new patch.
>> It only changes the processing under the condition of (*nl) == n, which will not cause performance problems.
>> At the same time, avoiding the formation of a link ring can make the system more robust.
> 
> I disagree, 
> yes, node will have correct list, but anyway node will work wrong and can crash the host in any time.

Sorry, my description is not accurate.

My patch feature does not prevent users from repeatedly registering hooks.
But avoiding the chain ring caused by the user repeatedly registering the hook

There are no modules for duplicate registration hooks in the current system.
But considering that not all modules are in the kernel source tree,
In order to improve the robustness of the kernel API, we should avoid the linked list ring caused by repeated registration.
Or in order to improve the efficiency of problem location, when the duplicate registration is checked, the system crashes directly.

On the other hand, the difference between notifier_chain_register() and notifier_chain_cond_register() for duplicate registrations is confusing:
Blocking the formation of the linked list ring in notifier_chain_cond_register()
There is no interception of the linked list ring in notifier_chain_register(), just an alarm.
Give me the illusion: Isn't notifier_chain_register() allowed to create a linked list ring?

Thanks

xiaoming Ni




^ permalink raw reply

* RE: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller
From: Andy Duan @ 2019-07-16  2:02 UTC (permalink / raw)
  To: Sven Van Asbroeck
  Cc: David S . Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190715210512.15823-1-TheSven73@gmail.com>

From: Sven Van Asbroeck <thesven73@gmail.com> Sent: Tuesday, July 16, 2019 5:05 AM
> The current fec driver allows the PHY to be reset via a gpio, specified in the
> devicetree. However, some PHYs need to be reset in a more complex way.
> 
> To accommodate such PHYs, allow an optional reset controller in the fec
> devicetree. If no reset controller is found, the fec will fall back to the legacy
> reset behaviour.
> 
> Example:
> &fec {
>         phy-mode = "rgmii";
>         resets = <&phy_reset>;
>         reset-names = "phy";
>         status = "okay";
> };
> 
> Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
> ---
> 
> Will send a Documentation patch if this receives a positive review.
> 
>  drivers/net/ethernet/freescale/fec_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 38f10f7dcbc3..5a5f3ed6f16d 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -61,6 +61,7 @@
>  #include <linux/regulator/consumer.h>
>  #include <linux/if_vlan.h>
>  #include <linux/pinctrl/consumer.h>
> +#include <linux/reset.h>
>  #include <linux/prefetch.h>
>  #include <soc/imx/cpuidle.h>
> 
> @@ -3335,6 +3336,7 @@ static int fec_enet_get_irq_cnt(struct
> platform_device *pdev)  static int  fec_probe(struct platform_device *pdev)
> {
> +       struct reset_control *phy_reset;
>         struct fec_enet_private *fep;
>         struct fec_platform_data *pdata;
>         struct net_device *ndev;
> @@ -3490,7 +3492,9 @@ fec_probe(struct platform_device *pdev)
>         pm_runtime_set_active(&pdev->dev);
>         pm_runtime_enable(&pdev->dev);
> 
> -       ret = fec_reset_phy(pdev);
> +       phy_reset = devm_reset_control_get_exclusive(&pdev->dev,
> "phy");
> +       ret = IS_ERR(phy_reset) ? fec_reset_phy(pdev) :
> +                       reset_control_reset(phy_reset);
>         if (ret)
>                 goto failed_reset;

The patch looks fine.
But the reset mechanism in the driver should be abandoned since
the phylib already can handle mii bus reset and phy device reset like
below piece code:

of_mdiobus_register()
    of_mdiobus_register_phy()
        phy_device_register()
            mdiobus_register_device()
                /* Assert the reset signal */
                mdio_device_reset(mdiodev, 1);
            /* Deassert the reset signal */
            mdio_device_reset(&phydev->mdio, 0);

dts:
        ethernet-phy@0 {
            compatible = "ethernet-phy-id0141.0e90", "ethernet-phy-ieee802.3-c45";
            interrupt-parent = <&PIC>;
            interrupts = <35 1>;
            reg = <0>;

            resets = <&rst 8>;
            reset-names = "phy";
            reset-gpios = <&gpio1 4 1>;
            reset-assert-us = <1000>;
            reset-deassert-us = <2000>;
        };
    };
            
Please refer to doc:
Documentation/devicetree/bindings/net/ethernet-phy.yaml
> 
> --
> 2.17.1


^ permalink raw reply

* Re: [PATCH net-next v2 0/7] net/rds: RDMA fixes
From: David Miller @ 2019-07-16  2:05 UTC (permalink / raw)
  To: gerd.rausch; +Cc: santosh.shilimkar, netdev
In-Reply-To: <510cd678-67d6-bd53-1d8e-7a74c4efb14a@oracle.com>

net-next is closed, and why are you submitting bug fixes for net-next
when 'net' is the appropriate tree to target for that purpose?

^ permalink raw reply

* [PATCH v2] net/netfilter: remove unnecessary spaces
From: yangxingwu @ 2019-07-16  2:13 UTC (permalink / raw)
  To: pablo
  Cc: wensong, horms, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, joe, yangxingwu
In-Reply-To: <20190715082747.fdlpvekbqyhwx724@salvia>

this patch removes extra spaces

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
---
 net/netfilter/ipset/ip_set_hash_gen.h  | 2 +-
 net/netfilter/ipset/ip_set_list_set.c  | 2 +-
 net/netfilter/ipvs/ip_vs_core.c        | 2 +-
 net/netfilter/ipvs/ip_vs_mh.c          | 4 ++--
 net/netfilter/ipvs/ip_vs_proto_tcp.c   | 2 +-
 net/netfilter/nf_conntrack_ftp.c       | 2 +-
 net/netfilter/nf_conntrack_proto_tcp.c | 2 +-
 net/netfilter/nfnetlink_log.c          | 4 ++--
 net/netfilter/nfnetlink_queue.c        | 4 ++--
 net/netfilter/xt_IDLETIMER.c           | 2 +-
 10 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 10f6196..eb907d2 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -954,7 +954,7 @@ struct htype {
 		mtype_data_netmask(d, NCIDR_GET(h->nets[j].cidr[0]));
 #endif
 		key = HKEY(d, h->initval, t->htable_bits);
-		n =  rcu_dereference_bh(hbucket(t, key));
+		n = rcu_dereference_bh(hbucket(t, key));
 		if (!n)
 			continue;
 		for (i = 0; i < n->pos; i++) {
diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index 8ada318..5c2be76 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -289,7 +289,7 @@ struct list_set {
 	if (n &&
 	    !(SET_WITH_TIMEOUT(set) &&
 	      ip_set_timeout_expired(ext_timeout(n, set))))
-		n =  NULL;
+		n = NULL;
 
 	e = kzalloc(set->dsize, GFP_ATOMIC);
 	if (!e)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 7138556..6b3ae76 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -615,7 +615,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
 				      iph->protocol == IPPROTO_UDP) ?
 				      IP_VS_CONN_F_ONE_PACKET : 0;
-		union nf_inet_addr daddr =  { .all = { 0, 0, 0, 0 } };
+		union nf_inet_addr daddr = { .all = { 0, 0, 0, 0 } };
 
 		/* create a new connection entry */
 		IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
index 94d9d34..da0280c 100644
--- a/net/netfilter/ipvs/ip_vs_mh.c
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
 		return 0;
 	}
 
-	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
-			 sizeof(unsigned long), GFP_KERNEL);
+	table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
+			sizeof(unsigned long), GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 915ac82..c7b46a9 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -710,7 +710,7 @@ static int __ip_vs_tcp_init(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd
 							sizeof(tcp_timeouts));
 	if (!pd->timeout_table)
 		return -ENOMEM;
-	pd->tcp_state_table =  tcp_states;
+	pd->tcp_state_table = tcp_states;
 	return 0;
 }
 
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 8c6c11b..26c1ff8 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -162,7 +162,7 @@ static int try_rfc959(const char *data, size_t dlen,
 	if (length == 0)
 		return 0;
 
-	cmd->u3.ip =  htonl((array[0] << 24) | (array[1] << 16) |
+	cmd->u3.ip = htonl((array[0] << 24) | (array[1] << 16) |
 				    (array[2] << 8) | array[3]);
 	cmd->u.tcp.port = htons((array[4] << 8) | array[5]);
 	return length;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 1e2cc83..48f3a67 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1225,7 +1225,7 @@ static int tcp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
 	[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL] = { .type = NLA_U8 },
 	[CTA_PROTOINFO_TCP_WSCALE_REPLY]    = { .type = NLA_U8 },
 	[CTA_PROTOINFO_TCP_FLAGS_ORIGINAL]  = { .len = sizeof(struct nf_ct_tcp_flags) },
-	[CTA_PROTOINFO_TCP_FLAGS_REPLY]	    = { .len =  sizeof(struct nf_ct_tcp_flags) },
+	[CTA_PROTOINFO_TCP_FLAGS_REPLY]	    = { .len = sizeof(struct nf_ct_tcp_flags) },
 };
 
 #define TCP_NLATTR_SIZE	( \
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 6dee4f9..d69e186 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -651,7 +651,7 @@ static void nfulnl_instance_free_rcu(struct rcu_head *head)
 	/* FIXME: do we want to make the size calculation conditional based on
 	 * what is actually present?  way more branches and checks, but more
 	 * memory efficient... */
-	size =    nlmsg_total_size(sizeof(struct nfgenmsg))
+	size = nlmsg_total_size(sizeof(struct nfgenmsg))
 		+ nla_total_size(sizeof(struct nfulnl_msg_packet_hdr))
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
@@ -668,7 +668,7 @@ static void nfulnl_instance_free_rcu(struct rcu_head *head)
 		+ nla_total_size(sizeof(struct nfgenmsg));	/* NLMSG_DONE */
 
 	if (in && skb_mac_header_was_set(skb)) {
-		size +=   nla_total_size(skb->dev->hard_header_len)
+		size += nla_total_size(skb->dev->hard_header_len)
 			+ nla_total_size(sizeof(u_int16_t))	/* hwtype */
 			+ nla_total_size(sizeof(u_int16_t));	/* hwlen */
 	}
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 89750f7..a1ef6e3 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -394,7 +394,7 @@ static int nfqnl_put_bridge(struct nf_queue_entry *entry, struct sk_buff *skb)
 	char *secdata = NULL;
 	u32 seclen = 0;
 
-	size =    nlmsg_total_size(sizeof(struct nfgenmsg))
+	size = nlmsg_total_size(sizeof(struct nfgenmsg))
 		+ nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
@@ -453,7 +453,7 @@ static int nfqnl_put_bridge(struct nf_queue_entry *entry, struct sk_buff *skb)
 	}
 
 	if (queue->flags & NFQA_CFG_F_UID_GID) {
-		size +=  (nla_total_size(sizeof(u_int32_t))	/* uid */
+		size += (nla_total_size(sizeof(u_int32_t))	/* uid */
 			+ nla_total_size(sizeof(u_int32_t)));	/* gid */
 	}
 
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
index 9cec9ea..f56d3ed 100644
--- a/net/netfilter/xt_IDLETIMER.c
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -283,7 +283,7 @@ static int __init idletimer_tg_init(void)
 
 	idletimer_tg_kobj = &idletimer_tg_device->kobj;
 
-	err =  xt_register_target(&idletimer_tg);
+	err = xt_register_target(&idletimer_tg);
 	if (err < 0) {
 		pr_debug("couldn't register xt target\n");
 		goto out_dev;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-16  2:18 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Simon Horman, wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190715082747.fdlpvekbqyhwx724@salvia>

Pablo

v2 has been sent

I made the following changes:

1. remove all unnecessary spaces in one go
2. revert bitmap_alloc ( since it's irrelevant to this subject)
3. chenge subject to "net/netfiler:remove unnecessary space"

thanks

Pablo Neira Ayuso <pablo@netfilter.org> 于2019年7月15日周一 下午4:27写道：
>
> On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> > On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > > ---
> > >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > > index 94d9d34..98e358e 100644
> > > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> > >             return 0;
> > >     }
> > >
> > > -   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > -                    sizeof(unsigned long), GFP_KERNEL);
> > > +   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > +                   sizeof(unsigned long), GFP_KERNEL);
>
> May I ask one thing? :-)
>
> Please, remove all unnecessary spaces in one go, search for:
>
>         git grep "=  "
>
> in the netfilter tree, and send a v2 for this one.
>
> Thanks.

^ permalink raw reply

* [PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"
From: Hariprasad Kelam @ 2019-07-16  2:20 UTC (permalink / raw)
  To: Vlad Yasevich, Neil Horman, Marcelo Ricardo Leitner,
	David S. Miller, linux-sctp, netdev, linux-kernel

This patch removes NULL checks before calling kfree.

fixes below issues reported by coccicheck
net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
freeing functions is not needed.

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
---
 net/sctp/sm_make_chunk.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ed39396..36bd8a6e 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2582,8 +2582,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 	case SCTP_PARAM_STATE_COOKIE:
 		asoc->peer.cookie_len =
 			ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
-		if (asoc->peer.cookie)
-			kfree(asoc->peer.cookie);
+		kfree(asoc->peer.cookie);
 		asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp);
 		if (!asoc->peer.cookie)
 			retval = 0;
@@ -2648,8 +2647,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 			goto fall_through;
 
 		/* Save peer's random parameter */
-		if (asoc->peer.peer_random)
-			kfree(asoc->peer.peer_random);
+		kfree(asoc->peer.peer_random);
 		asoc->peer.peer_random = kmemdup(param.p,
 					    ntohs(param.p->length), gfp);
 		if (!asoc->peer.peer_random) {
@@ -2663,8 +2661,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 			goto fall_through;
 
 		/* Save peer's HMAC list */
-		if (asoc->peer.peer_hmacs)
-			kfree(asoc->peer.peer_hmacs);
+		kfree(asoc->peer.peer_hmacs);
 		asoc->peer.peer_hmacs = kmemdup(param.p,
 					    ntohs(param.p->length), gfp);
 		if (!asoc->peer.peer_hmacs) {
@@ -2680,8 +2677,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 		if (!ep->auth_enable)
 			goto fall_through;
 
-		if (asoc->peer.peer_chunks)
-			kfree(asoc->peer.peer_chunks);
+		kfree(asoc->peer.peer_chunks);
 		asoc->peer.peer_chunks = kmemdup(param.p,
 					    ntohs(param.p->length), gfp);
 		if (!asoc->peer.peer_chunks)
-- 
2.7.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox