Netdev List
 help / color / mirror / Atom feed
* Re: TCP reaching to maximum throughput after a long time
From: Ben Greear @ 2016-04-12 20:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list,
	Kama, Meirav
In-Reply-To: <1460492253.6473.596.camel@edumazet-glaptop3.roam.corp.google.com>

On 04/12/2016 01:17 PM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 13:11 -0700, Ben Greear wrote:
>> On 04/12/2016 12:31 PM, Machani, Yaniv wrote:
>>> On Tue, Apr 12, 2016 at 18:04:52, Ben Greear wrote:
>>>> On 04/12/2016 07:52 AM, Eric Dumazet wrote:
>>>>> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>>>>>>
>>>>
>>>> If you are using 'Cubic' TCP congestion control, then please try
>>>> something different.
>>>> It was broken last I checked, at least when used with the ath10k driver.
>>>>
>>>
>>> Thanks Ben, this indeed seems to be the issue !
>>> Switching to reno got me to max throughput instantly.
>>>
>>> I'm still looking through the thread you have shared, but from what I understand there is no planned fix for it ?
>>
>> I think at the time it was blamed on ath10k and no one cared to try to fix it.
>>
>> Or, maybe no one really uses CUBIC anymore?
>>
>> Either way, I have no plans to try to fix CUBIC, but maybe someone who knows
>> this code better could give it a try.
>
> Well, cubic seems to work in many cases, assuming they are not too many
> drops.
>
> Assuming one flow can get nominal speed in few RTT is kind a dream, and
> so far nobody claimed a CC was able to do that, while still being fair
> and resilient.
>
> TCP CC are full of heuristics, and by definition heuristics that were
> working 6 years ago might need to be refreshed.
>
> We are still maintaining Cubic for sure.

It worked well enough for years that I didn't even know other algorithms were
available.  It was broken around 4.0 time, and I reported it to the list,
and no one seemed to really care enough to do anything about it.  I changed
to reno and ignored the problem as well.

It is trivially easy to see the regression when using ath10k NIC, and from this email
thread, I guess other NICs have similar issues.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: Issue with ping source address display
From: Julian Anastasov @ 2016-04-12 20:27 UTC (permalink / raw)
  To: Daniele Orlandi; +Cc: netdev
In-Reply-To: <570D50C8.4010409@orlandi.com>


	Hello,

On Tue, 12 Apr 2016, Daniele Orlandi wrote:

> On 12/04/2016 21:38, Julian Anastasov wrote:
> > 
> > 	What is the kernel version?
> 
> Much time passed and I don't remember, however the latest test with IPv6
> ping was made on this host:
> 
> root@monitor:~# uname -a
> Linux monitor 4.4.0-17-generic #33-Ubuntu SMP Tue Mar 29 17:17:28 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux

	I asked because I remember for such problem but
at the same time your first report was from Sep 2014, long
before the bad period: 4.0 - 4.1. Can you find and try
this fix from 4.2?:

commit 34b99df4e6256ddafb663c6de0711dceceddfe0e
Author: Julian Anastasov <ja@ssi.bg>
Date:   Tue Jun 23 08:34:39 2015 +0300

    ip: report the original address of ICMP messages
    
    ICMP messages can trigger ICMP and local errors. In this case
    serr->port is 0 and starting from Linux 4.0 we do not return
    the original target address to the error queue readers.
    Add function to define which errors provide addr_offset.
    With this fix my ping command is not silent anymore.

	Because I'm not sure if your kernel includes it.
I also remember that the ping utility can use random memory
when such 4.0/4.1 kernel returns msg_namelen=0:

http://marc.info/?l=linux-netdev&m=143509000420707&w=2

Thread:
http://marc.info/?t=143503781500001&r=1&w=2

Regards

^ permalink raw reply

* Re: TCP reaching to maximum throughput after a long time
From: Eric Dumazet @ 2016-04-12 20:29 UTC (permalink / raw)
  To: Ben Greear
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list,
	Kama, Meirav
In-Reply-To: <570D5944.5070100@candelatech.com>

On Tue, 2016-04-12 at 13:23 -0700, Ben Greear wrote:

> It worked well enough for years that I didn't even know other algorithms were
> available.  It was broken around 4.0 time, and I reported it to the list,
> and no one seemed to really care enough to do anything about it.  I changed
> to reno and ignored the problem as well.
> 
> It is trivially easy to see the regression when using ath10k NIC, and from this email
> thread, I guess other NICs have similar issues.

Since it is so trivial, why don't you start a bisection ?

I asked a capture, I did not say ' switch to Reno or whatever ', right ?

Guessing is nice, but investigating and fixing is better.

Do not assume that nothing can be done, please ?

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] ibmvnic: Defer tx completion processing using a wait queue
From: John Allen @ 2016-04-12 21:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Falcon, netdev, linuxppc-dev
In-Reply-To: <1460491940.6473.592.camel@edumazet-glaptop3.roam.corp.google.com>

On 04/12/2016 03:12 PM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 14:38 -0500, John Allen wrote:
>> Moves tx completion processing out of interrupt context, deferring work
>> using a wait queue. With this work now deferred, we must account for the
>> possibility that skbs can be sent faster than we can process completion
>> requests in which case the tx buffer will overflow. If the tx buffer is
>> full, ibmvnic_xmit will return NETDEV_TX_BUSY and stop the current tx
>> queue. Subsequently, the queue will be restarted in ibmvnic_complete_tx
>> when all pending tx completion requests have been cleared.
> 
> 1) Why is this needed ?

In the current ibmvnic implementation, tx completion processing is done in
interrupt context. Depending on the load, this can block further
interrupts for a long time. This patch just creates a bottom half so that
when a tx completion interrupt comes in, we can defer the majority of the
work and exit interrupt context quickly.

> 
> 2) If it is needed, why is this not done in a generic way, so that other
> drivers can use this ?

I'm still fairly new to network driver development so I'm not in tune with
the needs of other drivers. My assumption was that the wait queue data
structure was a reasonably generic way to handle something like this. Is
there a more appropriate/generic way of implementing a bottom half for
this purpose?

-John
> 
> Thanks.
> 
> 
> 
> 

^ permalink raw reply

* [PATCH] sctp: add support for RPS and RFS
From: Marcelo Ricardo Leitner @ 2016-04-12 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Vlad Yasevich, linux-sctp

This patch adds what's missing to properly support RPS and RFS on SCTP,
as some of it is already implemented in common calls.

Having support for RPS and RFS allows better scaling specially because
not all NICs support hashing SCTP headers.

Save the hash right when we dequeue a skb from inqueue so we do it only
once per skb instead of per chunk. New sockets will then inherit the
hash through sctp_copy_sock().

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/inqueue.c | 3 +++
 net/sctp/socket.c  | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index 7e8a16c77039e1b70ef89f3e862dbb332bcc614f..b335ffcef0b901b0e71bf0843057dbf5877da31b 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -163,6 +163,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 		chunk->singleton = 1;
 		ch = (sctp_chunkhdr_t *) chunk->skb->data;
 		chunk->data_accepted = 0;
+
+		if (chunk->asoc)
+			sock_rps_save_rxhash(chunk->asoc->base.sk, chunk->skb);
 	}
 
 	chunk->chunk_hdr = ch;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 878d28eda1a68dc639d7b9f3f663d7518f21bb32..36697f85ce48be39f41ddedd9f53369c7f9e28d8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6430,6 +6430,8 @@ unsigned int sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 	poll_wait(file, sk_sleep(sk), wait);
 
+	sock_rps_record_flow(sk);
+
 	/* A TCP-style listening socket becomes readable when the accept queue
 	 * is not empty.
 	 */
@@ -7186,6 +7188,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 	newsk->sk_lingertime = sk->sk_lingertime;
 	newsk->sk_rcvtimeo = sk->sk_rcvtimeo;
 	newsk->sk_sndtimeo = sk->sk_sndtimeo;
+	newsk->sk_rxhash = sk->sk_rxhash;
 
 	newinet = inet_sk(newsk);
 
-- 
2.5.0

^ permalink raw reply related

* Re: TCP reaching to maximum throughput after a long time
From: Ben Greear @ 2016-04-12 21:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Machani, Yaniv, netdev, David S. Miller, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Nandita Dukkipati, open list,
	Kama, Meirav
In-Reply-To: <1460492999.6473.599.camel@edumazet-glaptop3.roam.corp.google.com>

On 04/12/2016 01:29 PM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 13:23 -0700, Ben Greear wrote:
>
>> It worked well enough for years that I didn't even know other algorithms were
>> available.  It was broken around 4.0 time, and I reported it to the list,
>> and no one seemed to really care enough to do anything about it.  I changed
>> to reno and ignored the problem as well.
>>
>> It is trivially easy to see the regression when using ath10k NIC, and from this email
>> thread, I guess other NICs have similar issues.
>
> Since it is so trivial, why don't you start a bisection ?

I vaguely remember doing a bisect, but I can't find any email about
that, so maybe I didn't.  At any rate, it is somewhere between 3.17 and 4.0.
 From memory, it was between 3.19 and 4.0, but I am not certain of that.

Neil's suggestion, from the thread below, is that it was likely:  "605ad7f tcp: refine TSO autosizing"

Here is previous email thread:

https://www.mail-archive.com/netdev@vger.kernel.org/msg80803.html

This one has a link to a pcap I made at the time:

https://www.mail-archive.com/netdev@vger.kernel.org/msg80890.html

>
> I asked a capture, I did not say ' switch to Reno or whatever ', right ?
>
> Guessing is nice, but investigating and fixing is better.
>
> Do not assume that nothing can be done, please ?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH] sctp: add support for RPS and RFS
From: Tom Herbert @ 2016-04-12 21:50 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Linux Kernel Network Developers, Neil Horman, Vlad Yasevich,
	linux-sctp
In-Reply-To: <d0b676c036479fe304218a05e1b03a7ba274426c.1460495323.git.marcelo.leitner@gmail.com>

On Tue, Apr 12, 2016 at 2:11 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> This patch adds what's missing to properly support RPS and RFS on SCTP,
> as some of it is already implemented in common calls.
>
> Having support for RPS and RFS allows better scaling specially because
> not all NICs support hashing SCTP headers.
>
> Save the hash right when we dequeue a skb from inqueue so we do it only
> once per skb instead of per chunk. New sockets will then inherit the
> hash through sctp_copy_sock().
>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>  net/sctp/inqueue.c | 3 +++
>  net/sctp/socket.c  | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
> index 7e8a16c77039e1b70ef89f3e862dbb332bcc614f..b335ffcef0b901b0e71bf0843057dbf5877da31b 100644
> --- a/net/sctp/inqueue.c
> +++ b/net/sctp/inqueue.c
> @@ -163,6 +163,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
>                 chunk->singleton = 1;
>                 ch = (sctp_chunkhdr_t *) chunk->skb->data;
>                 chunk->data_accepted = 0;
> +
> +               if (chunk->asoc)
> +                       sock_rps_save_rxhash(chunk->asoc->base.sk, chunk->skb);
>         }
>
>         chunk->chunk_hdr = ch;
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 878d28eda1a68dc639d7b9f3f663d7518f21bb32..36697f85ce48be39f41ddedd9f53369c7f9e28d8 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -6430,6 +6430,8 @@ unsigned int sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
>
>         poll_wait(file, sk_sleep(sk), wait);
>
> +       sock_rps_record_flow(sk);
> +
>         /* A TCP-style listening socket becomes readable when the accept queue
>          * is not empty.
>          */
> @@ -7186,6 +7188,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
>         newsk->sk_lingertime = sk->sk_lingertime;
>         newsk->sk_rcvtimeo = sk->sk_rcvtimeo;
>         newsk->sk_sndtimeo = sk->sk_sndtimeo;
> +       newsk->sk_rxhash = sk->sk_rxhash;
>
>         newinet = inet_sk(newsk);
>
HI Marcelo,

sock_rps_record_flow should probably be in sctp_poll also (like it is
in tcp_poll).

Thanks,
Tom

> --
> 2.5.0
>

^ permalink raw reply

* Re: [PATCH] sctp: add support for RPS and RFS
From: Marcelo Ricardo Leitner @ 2016-04-12 21:57 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Neil Horman, Vlad Yasevich,
	linux-sctp
In-Reply-To: <CALx6S35by22eLr-FOFj_RQvOLZm+=cyKYn1MeeD3QQqEEcsRAg@mail.gmail.com>

On Tue, Apr 12, 2016 at 02:50:45PM -0700, Tom Herbert wrote:
> On Tue, Apr 12, 2016 at 2:11 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > This patch adds what's missing to properly support RPS and RFS on SCTP,
> > as some of it is already implemented in common calls.
> >
> > Having support for RPS and RFS allows better scaling specially because
> > not all NICs support hashing SCTP headers.
> >
> > Save the hash right when we dequeue a skb from inqueue so we do it only
> > once per skb instead of per chunk. New sockets will then inherit the
> > hash through sctp_copy_sock().
> >
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > ---
> >  net/sctp/inqueue.c | 3 +++
> >  net/sctp/socket.c  | 3 +++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
> > index 7e8a16c77039e1b70ef89f3e862dbb332bcc614f..b335ffcef0b901b0e71bf0843057dbf5877da31b 100644
> > --- a/net/sctp/inqueue.c
> > +++ b/net/sctp/inqueue.c
> > @@ -163,6 +163,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
> >                 chunk->singleton = 1;
> >                 ch = (sctp_chunkhdr_t *) chunk->skb->data;
> >                 chunk->data_accepted = 0;
> > +
> > +               if (chunk->asoc)
> > +                       sock_rps_save_rxhash(chunk->asoc->base.sk, chunk->skb);
> >         }
> >
> >         chunk->chunk_hdr = ch;
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index 878d28eda1a68dc639d7b9f3f663d7518f21bb32..36697f85ce48be39f41ddedd9f53369c7f9e28d8 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -6430,6 +6430,8 @@ unsigned int sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
> >
> >         poll_wait(file, sk_sleep(sk), wait);
> >
> > +       sock_rps_record_flow(sk);
> > +
> >         /* A TCP-style listening socket becomes readable when the accept queue
> >          * is not empty.
> >          */
> > @@ -7186,6 +7188,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
> >         newsk->sk_lingertime = sk->sk_lingertime;
> >         newsk->sk_rcvtimeo = sk->sk_rcvtimeo;
> >         newsk->sk_sndtimeo = sk->sk_sndtimeo;
> > +       newsk->sk_rxhash = sk->sk_rxhash;
> >
> >         newinet = inet_sk(newsk);
> >
> HI Marcelo,
> 
> sock_rps_record_flow should probably be in sctp_poll also (like it is
> in tcp_poll).
> 
> Thanks,
> Tom

Hi Tom,

Yes, that's the middle chunk on this patch, no?

Note that for udp it's done after the actual poll, while for tcp it's
being saved before it. I went more udp-style on this one.

Thanks,
Marcelo

^ permalink raw reply

* Re: [PATCH] sctp: add support for RPS and RFS
From: Tom Herbert @ 2016-04-12 21:58 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Linux Kernel Network Developers, Neil Horman, Vlad Yasevich,
	linux-sctp
In-Reply-To: <20160412215728.GI15005@localhost.localdomain>

On Tue, Apr 12, 2016 at 2:57 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Tue, Apr 12, 2016 at 02:50:45PM -0700, Tom Herbert wrote:
>> On Tue, Apr 12, 2016 at 2:11 PM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > This patch adds what's missing to properly support RPS and RFS on SCTP,
>> > as some of it is already implemented in common calls.
>> >
>> > Having support for RPS and RFS allows better scaling specially because
>> > not all NICs support hashing SCTP headers.
>> >
>> > Save the hash right when we dequeue a skb from inqueue so we do it only
>> > once per skb instead of per chunk. New sockets will then inherit the
>> > hash through sctp_copy_sock().
>> >
>> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> > ---
>> >  net/sctp/inqueue.c | 3 +++
>> >  net/sctp/socket.c  | 3 +++
>> >  2 files changed, 6 insertions(+)
>> >
>> > diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
>> > index 7e8a16c77039e1b70ef89f3e862dbb332bcc614f..b335ffcef0b901b0e71bf0843057dbf5877da31b 100644
>> > --- a/net/sctp/inqueue.c
>> > +++ b/net/sctp/inqueue.c
>> > @@ -163,6 +163,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
>> >                 chunk->singleton = 1;
>> >                 ch = (sctp_chunkhdr_t *) chunk->skb->data;
>> >                 chunk->data_accepted = 0;
>> > +
>> > +               if (chunk->asoc)
>> > +                       sock_rps_save_rxhash(chunk->asoc->base.sk, chunk->skb);
>> >         }
>> >
>> >         chunk->chunk_hdr = ch;
>> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> > index 878d28eda1a68dc639d7b9f3f663d7518f21bb32..36697f85ce48be39f41ddedd9f53369c7f9e28d8 100644
>> > --- a/net/sctp/socket.c
>> > +++ b/net/sctp/socket.c
>> > @@ -6430,6 +6430,8 @@ unsigned int sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
>> >
>> >         poll_wait(file, sk_sleep(sk), wait);
>> >
>> > +       sock_rps_record_flow(sk);
>> > +
>> >         /* A TCP-style listening socket becomes readable when the accept queue
>> >          * is not empty.
>> >          */
>> > @@ -7186,6 +7188,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
>> >         newsk->sk_lingertime = sk->sk_lingertime;
>> >         newsk->sk_rcvtimeo = sk->sk_rcvtimeo;
>> >         newsk->sk_sndtimeo = sk->sk_sndtimeo;
>> > +       newsk->sk_rxhash = sk->sk_rxhash;
>> >
>> >         newinet = inet_sk(newsk);
>> >
>> HI Marcelo,
>>
>> sock_rps_record_flow should probably be in sctp_poll also (like it is
>> in tcp_poll).
>>
>> Thanks,
>> Tom
>
> Hi Tom,
>
> Yes, that's the middle chunk on this patch, no?
>
> Note that for udp it's done after the actual poll, while for tcp it's
> being saved before it. I went more udp-style on this one.
>

Yes, I missed that.

> Thanks,
> Marcelo

^ permalink raw reply

* [PATCH net-next 0/5] BPF updates
From: Daniel Borkmann @ 2016-04-12 22:10 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, tgraf, bblanco, brendan.d.gregg, netdev,
	Daniel Borkmann

This series adds a new verifier argument type called
ARG_PTR_TO_RAW_STACK and converts related helpers to make
use of it. Basic idea is that we can save init of stack
memory when the helper function is guaranteed to fully
fill out the passed buffer in every path. Series also adds
test cases and converts samples. For more details, please
see individual patches.

Thanks!

Daniel Borkmann (5):
  bpf, verifier: add bpf_call_arg_meta for passing meta data
  bpf, verifier: add ARG_PTR_TO_RAW_STACK type
  bpf: convert relevant helper args to ARG_PTR_TO_RAW_STACK
  bpf, samples: don't zero data when not needed
  bpf, samples: add test cases for raw stack

 include/linux/bpf.h            |   5 +
 kernel/bpf/helpers.c           |  17 ++-
 kernel/bpf/verifier.c          |  97 +++++++++++----
 kernel/trace/bpf_trace.c       |  10 +-
 net/core/filter.c              |  57 ++++++---
 samples/bpf/offwaketime_kern.c |  10 +-
 samples/bpf/test_verifier.c    | 268 +++++++++++++++++++++++++++++++++++++++++
 samples/bpf/tracex1_kern.c     |   4 +-
 samples/bpf/tracex2_kern.c     |   4 +-
 samples/bpf/tracex5_kern.c     |   6 +-
 10 files changed, 421 insertions(+), 57 deletions(-)

-- 
1.9.3

^ permalink raw reply

* [PATCH net-next 1/5] bpf, verifier: add bpf_call_arg_meta for passing meta data
From: Daniel Borkmann @ 2016-04-12 22:10 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, tgraf, bblanco, brendan.d.gregg, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1460497071.git.daniel@iogearbox.net>

Currently, when the verifier checks calls in check_call() function, we
call check_func_arg() for all 5 arguments e.g. to make sure expected types
are correct. In some cases, we collect meta data (here: map pointer) to
perform additional checks such as checking stack boundary on key/value
sizes for subsequent arguments. As we're going to extend the meta data,
add a generic struct bpf_call_arg_meta that we can use for passing into
check_func_arg().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6c5d7cd..202f8f7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -205,6 +205,10 @@ struct verifier_env {
 #define BPF_COMPLEXITY_LIMIT_INSNS	65536
 #define BPF_COMPLEXITY_LIMIT_STACK	1024
 
+struct bpf_call_arg_meta {
+	struct bpf_map *map_ptr;
+};
+
 /* verbose verifier prints what it's seeing
  * bpf_check() is called under lock, so no race to access these global vars
  */
@@ -822,7 +826,8 @@ static int check_stack_boundary(struct verifier_env *env, int regno,
 }
 
 static int check_func_arg(struct verifier_env *env, u32 regno,
-			  enum bpf_arg_type arg_type, struct bpf_map **mapp)
+			  enum bpf_arg_type arg_type,
+			  struct bpf_call_arg_meta *meta)
 {
 	struct reg_state *reg = env->cur_state.regs + regno;
 	enum bpf_reg_type expected_type;
@@ -875,14 +880,13 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 
 	if (arg_type == ARG_CONST_MAP_PTR) {
 		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
-		*mapp = reg->map_ptr;
-
+		meta->map_ptr = reg->map_ptr;
 	} else if (arg_type == ARG_PTR_TO_MAP_KEY) {
 		/* bpf_map_xxx(..., map_ptr, ..., key) call:
 		 * check that [key, key + map->key_size) are within
 		 * stack limits and initialized
 		 */
-		if (!*mapp) {
+		if (!meta->map_ptr) {
 			/* in function declaration map_ptr must come before
 			 * map_key, so that it's verified and known before
 			 * we have to check map_key here. Otherwise it means
@@ -891,19 +895,19 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 			verbose("invalid map_ptr to access map->key\n");
 			return -EACCES;
 		}
-		err = check_stack_boundary(env, regno, (*mapp)->key_size,
+		err = check_stack_boundary(env, regno, meta->map_ptr->key_size,
 					   false);
 	} else if (arg_type == ARG_PTR_TO_MAP_VALUE) {
 		/* bpf_map_xxx(..., map_ptr, ..., value) call:
 		 * check [value, value + map->value_size) validity
 		 */
-		if (!*mapp) {
+		if (!meta->map_ptr) {
 			/* kernel subsystem misconfigured verifier */
 			verbose("invalid map_ptr to access map->value\n");
 			return -EACCES;
 		}
-		err = check_stack_boundary(env, regno, (*mapp)->value_size,
-					   false);
+		err = check_stack_boundary(env, regno,
+					   meta->map_ptr->value_size, false);
 	} else if (arg_type == ARG_CONST_STACK_SIZE ||
 		   arg_type == ARG_CONST_STACK_SIZE_OR_ZERO) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_STACK_SIZE_OR_ZERO);
@@ -954,8 +958,8 @@ static int check_call(struct verifier_env *env, int func_id)
 	struct verifier_state *state = &env->cur_state;
 	const struct bpf_func_proto *fn = NULL;
 	struct reg_state *regs = state->regs;
-	struct bpf_map *map = NULL;
 	struct reg_state *reg;
+	struct bpf_call_arg_meta meta;
 	int i, err;
 
 	/* find function prototype */
@@ -978,20 +982,22 @@ static int check_call(struct verifier_env *env, int func_id)
 		return -EINVAL;
 	}
 
+	memset(&meta, 0, sizeof(meta));
+
 	/* check args */
-	err = check_func_arg(env, BPF_REG_1, fn->arg1_type, &map);
+	err = check_func_arg(env, BPF_REG_1, fn->arg1_type, &meta);
 	if (err)
 		return err;
-	err = check_func_arg(env, BPF_REG_2, fn->arg2_type, &map);
+	err = check_func_arg(env, BPF_REG_2, fn->arg2_type, &meta);
 	if (err)
 		return err;
-	err = check_func_arg(env, BPF_REG_3, fn->arg3_type, &map);
+	err = check_func_arg(env, BPF_REG_3, fn->arg3_type, &meta);
 	if (err)
 		return err;
-	err = check_func_arg(env, BPF_REG_4, fn->arg4_type, &map);
+	err = check_func_arg(env, BPF_REG_4, fn->arg4_type, &meta);
 	if (err)
 		return err;
-	err = check_func_arg(env, BPF_REG_5, fn->arg5_type, &map);
+	err = check_func_arg(env, BPF_REG_5, fn->arg5_type, &meta);
 	if (err)
 		return err;
 
@@ -1013,18 +1019,18 @@ static int check_call(struct verifier_env *env, int func_id)
 		 * can check 'value_size' boundary of memory access
 		 * to map element returned from bpf_map_lookup_elem()
 		 */
-		if (map == NULL) {
+		if (meta.map_ptr == NULL) {
 			verbose("kernel subsystem misconfigured verifier\n");
 			return -EINVAL;
 		}
-		regs[BPF_REG_0].map_ptr = map;
+		regs[BPF_REG_0].map_ptr = meta.map_ptr;
 	} else {
 		verbose("unknown return type %d of func %d\n",
 			fn->ret_type, func_id);
 		return -EINVAL;
 	}
 
-	err = check_map_func_compatibility(map, func_id);
+	err = check_map_func_compatibility(meta.map_ptr, func_id);
 	if (err)
 		return err;
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 2/5] bpf, verifier: add ARG_PTR_TO_RAW_STACK type
From: Daniel Borkmann @ 2016-04-12 22:10 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, tgraf, bblanco, brendan.d.gregg, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1460497071.git.daniel@iogearbox.net>

When passing buffers from eBPF stack space into a helper function, we have
ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure
that such buffers are initialized, within boundaries, etc.

However, the downside with this is that we have a couple of helper functions
such as bpf_skb_load_bytes() that fill out the passed buffer in the expected
success case anyway, so zero initializing them prior to the helper call is
unneeded/wasted instructions in the eBPF program that can be avoided.

Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK.
The idea is to skip the STACK_MISC check in check_stack_boundary() and color
the related stack slots as STACK_MISC after we checked all call arguments.

Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of
the helper function will fill the provided buffer area, so that we cannot leak
any uninitialized stack memory. This f.e. means that error paths need to
memset() the buffers, but the expected fast-path doesn't have to do this
anymore.

Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK
argument, we can keep it simple and don't need to check for multiple areas.
Should in future such a use-case really appear, we have check_raw_mode() that
will make sure we implement support for it first.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h   |  5 +++++
 kernel/bpf/verifier.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b2365a6..5fb3c61 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -66,6 +66,11 @@ enum bpf_arg_type {
 	 * functions that access data on eBPF program stack
 	 */
 	ARG_PTR_TO_STACK,	/* any pointer to eBPF program stack */
+	ARG_PTR_TO_RAW_STACK,	/* any pointer to eBPF program stack, area does not
+				 * need to be initialized, helper function must fill
+				 * all bytes or clear them in error case.
+				 */
+
 	ARG_CONST_STACK_SIZE,	/* number of bytes accessed from stack */
 	ARG_CONST_STACK_SIZE_OR_ZERO, /* number of bytes accessed from stack or 0 */
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 202f8f7..9c843a5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -207,6 +207,9 @@ struct verifier_env {
 
 struct bpf_call_arg_meta {
 	struct bpf_map *map_ptr;
+	bool raw_mode;
+	int regno;
+	int access_size;
 };
 
 /* verbose verifier prints what it's seeing
@@ -789,7 +792,8 @@ static int check_xadd(struct verifier_env *env, struct bpf_insn *insn)
  * and all elements of stack are initialized
  */
 static int check_stack_boundary(struct verifier_env *env, int regno,
-				int access_size, bool zero_size_allowed)
+				int access_size, bool zero_size_allowed,
+				struct bpf_call_arg_meta *meta)
 {
 	struct verifier_state *state = &env->cur_state;
 	struct reg_state *regs = state->regs;
@@ -815,6 +819,12 @@ static int check_stack_boundary(struct verifier_env *env, int regno,
 		return -EACCES;
 	}
 
+	if (meta && meta->raw_mode) {
+		meta->access_size = access_size;
+		meta->regno = regno;
+		return 0;
+	}
+
 	for (i = 0; i < access_size; i++) {
 		if (state->stack_slot_type[MAX_BPF_STACK + off + i] != STACK_MISC) {
 			verbose("invalid indirect read from stack off %d+%d size %d\n",
@@ -859,7 +869,8 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = CONST_PTR_TO_MAP;
 	} else if (arg_type == ARG_PTR_TO_CTX) {
 		expected_type = PTR_TO_CTX;
-	} else if (arg_type == ARG_PTR_TO_STACK) {
+	} else if (arg_type == ARG_PTR_TO_STACK ||
+		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
 		/* One exception here. In case function allows for NULL to be
 		 * passed in as argument, it's a CONST_IMM type. Final test
@@ -867,6 +878,7 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		 */
 		if (reg->type == CONST_IMM && reg->imm == 0)
 			expected_type = CONST_IMM;
+		meta->raw_mode = arg_type == ARG_PTR_TO_RAW_STACK;
 	} else {
 		verbose("unsupported arg_type %d\n", arg_type);
 		return -EFAULT;
@@ -896,7 +908,7 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		err = check_stack_boundary(env, regno, meta->map_ptr->key_size,
-					   false);
+					   false, NULL);
 	} else if (arg_type == ARG_PTR_TO_MAP_VALUE) {
 		/* bpf_map_xxx(..., map_ptr, ..., value) call:
 		 * check [value, value + map->value_size) validity
@@ -907,7 +919,8 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		err = check_stack_boundary(env, regno,
-					   meta->map_ptr->value_size, false);
+					   meta->map_ptr->value_size,
+					   false, NULL);
 	} else if (arg_type == ARG_CONST_STACK_SIZE ||
 		   arg_type == ARG_CONST_STACK_SIZE_OR_ZERO) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_STACK_SIZE_OR_ZERO);
@@ -922,7 +935,7 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		err = check_stack_boundary(env, regno - 1, reg->imm,
-					   zero_size_allowed);
+					   zero_size_allowed, meta);
 	}
 
 	return err;
@@ -953,6 +966,24 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 	return 0;
 }
 
+static int check_raw_mode(const struct bpf_func_proto *fn)
+{
+	int count = 0;
+
+	if (fn->arg1_type == ARG_PTR_TO_RAW_STACK)
+		count++;
+	if (fn->arg2_type == ARG_PTR_TO_RAW_STACK)
+		count++;
+	if (fn->arg3_type == ARG_PTR_TO_RAW_STACK)
+		count++;
+	if (fn->arg4_type == ARG_PTR_TO_RAW_STACK)
+		count++;
+	if (fn->arg5_type == ARG_PTR_TO_RAW_STACK)
+		count++;
+
+	return count > 1 ? -EINVAL : 0;
+}
+
 static int check_call(struct verifier_env *env, int func_id)
 {
 	struct verifier_state *state = &env->cur_state;
@@ -984,6 +1015,15 @@ static int check_call(struct verifier_env *env, int func_id)
 
 	memset(&meta, 0, sizeof(meta));
 
+	/* We only support one arg being in raw mode at the moment, which
+	 * is sufficient for the helper functions we have right now.
+	 */
+	err = check_raw_mode(fn);
+	if (err) {
+		verbose("kernel subsystem misconfigured func %d\n", func_id);
+		return err;
+	}
+
 	/* check args */
 	err = check_func_arg(env, BPF_REG_1, fn->arg1_type, &meta);
 	if (err)
@@ -1001,6 +1041,15 @@ static int check_call(struct verifier_env *env, int func_id)
 	if (err)
 		return err;
 
+	/* Mark slots with STACK_MISC in case of raw mode, stack offset
+	 * is inferred from register state.
+	 */
+	for (i = 0; i < meta.access_size; i++) {
+		err = check_mem_access(env, meta.regno, i, BPF_B, BPF_WRITE, -1);
+		if (err)
+			return err;
+	}
+
 	/* reset caller saved regs */
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
 		reg = regs + caller_saved[i];
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 3/5] bpf: convert relevant helper args to ARG_PTR_TO_RAW_STACK
From: Daniel Borkmann @ 2016-04-12 22:10 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, tgraf, bblanco, brendan.d.gregg, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1460497071.git.daniel@iogearbox.net>

This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument
type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(),
bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm()
and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can
also be removed since the verifier already makes sure we stay within bounds
on stack buffers.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/helpers.c     | 17 +++++++++++----
 kernel/trace/bpf_trace.c | 10 ++++++---
 net/core/filter.c        | 57 +++++++++++++++++++++++++++++++++---------------
 3 files changed, 60 insertions(+), 24 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 50da680..ad7a057 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -163,17 +163,26 @@ static u64 bpf_get_current_comm(u64 r1, u64 size, u64 r3, u64 r4, u64 r5)
 	struct task_struct *task = current;
 	char *buf = (char *) (long) r1;
 
-	if (!task)
-		return -EINVAL;
+	if (unlikely(!task))
+		goto err_clear;
 
-	strlcpy(buf, task->comm, min_t(size_t, size, sizeof(task->comm)));
+	strncpy(buf, task->comm, size);
+
+	/* Verifier guarantees that size > 0. For task->comm exceeding
+	 * size, guarantee that buf is %NUL-terminated. Unconditionally
+	 * done here to save the size test.
+	 */
+	buf[size - 1] = 0;
 	return 0;
+err_clear:
+	memset(buf, 0, size);
+	return -EINVAL;
 }
 
 const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.func		= bpf_get_current_comm,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_STACK,
+	.arg1_type	= ARG_PTR_TO_RAW_STACK,
 	.arg2_type	= ARG_CONST_STACK_SIZE,
 };
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 413ec56..6855878 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -62,17 +62,21 @@ EXPORT_SYMBOL_GPL(trace_call_bpf);
 static u64 bpf_probe_read(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
 	void *dst = (void *) (long) r1;
-	int size = (int) r2;
+	int ret, size = (int) r2;
 	void *unsafe_ptr = (void *) (long) r3;
 
-	return probe_kernel_read(dst, unsafe_ptr, size);
+	ret = probe_kernel_read(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+
+	return ret;
 }
 
 static const struct bpf_func_proto bpf_probe_read_proto = {
 	.func		= bpf_probe_read,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_STACK,
+	.arg1_type	= ARG_PTR_TO_RAW_STACK,
 	.arg2_type	= ARG_CONST_STACK_SIZE,
 	.arg3_type	= ARG_ANYTHING,
 };
diff --git a/net/core/filter.c b/net/core/filter.c
index e8486ba..5d2ac2b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1409,16 +1409,19 @@ static u64 bpf_skb_load_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 	unsigned int len = (unsigned int) r4;
 	void *ptr;
 
-	if (unlikely((u32) offset > 0xffff || len > MAX_BPF_STACK))
-		return -EFAULT;
+	if (unlikely((u32) offset > 0xffff))
+		goto err_clear;
 
 	ptr = skb_header_pointer(skb, offset, len, to);
 	if (unlikely(!ptr))
-		return -EFAULT;
+		goto err_clear;
 	if (ptr != to)
 		memcpy(to, ptr, len);
 
 	return 0;
+err_clear:
+	memset(to, 0, len);
+	return -EFAULT;
 }
 
 static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
@@ -1427,7 +1430,7 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_STACK,
+	.arg3_type	= ARG_PTR_TO_RAW_STACK,
 	.arg4_type	= ARG_CONST_STACK_SIZE,
 };
 
@@ -1756,12 +1759,19 @@ static u64 bpf_skb_get_tunnel_key(u64 r1, u64 r2, u64 size, u64 flags, u64 r5)
 	struct bpf_tunnel_key *to = (struct bpf_tunnel_key *) (long) r2;
 	const struct ip_tunnel_info *info = skb_tunnel_info(skb);
 	u8 compat[sizeof(struct bpf_tunnel_key)];
+	void *to_orig = to;
+	int err;
 
-	if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6))))
-		return -EINVAL;
-	if (ip_tunnel_info_af(info) != bpf_tunnel_key_af(flags))
-		return -EPROTO;
+	if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6)))) {
+		err = -EINVAL;
+		goto err_clear;
+	}
+	if (ip_tunnel_info_af(info) != bpf_tunnel_key_af(flags)) {
+		err = -EPROTO;
+		goto err_clear;
+	}
 	if (unlikely(size != sizeof(struct bpf_tunnel_key))) {
+		err = -EINVAL;
 		switch (size) {
 		case offsetof(struct bpf_tunnel_key, tunnel_label):
 		case offsetof(struct bpf_tunnel_key, tunnel_ext):
@@ -1771,12 +1781,12 @@ static u64 bpf_skb_get_tunnel_key(u64 r1, u64 r2, u64 size, u64 flags, u64 r5)
 			 * a common path later on.
 			 */
 			if (ip_tunnel_info_af(info) != AF_INET)
-				return -EINVAL;
+				goto err_clear;
 set_compat:
 			to = (struct bpf_tunnel_key *)compat;
 			break;
 		default:
-			return -EINVAL;
+			goto err_clear;
 		}
 	}
 
@@ -1793,9 +1803,12 @@ set_compat:
 	}
 
 	if (unlikely(size != sizeof(struct bpf_tunnel_key)))
-		memcpy((void *)(long) r2, to, size);
+		memcpy(to_orig, to, size);
 
 	return 0;
+err_clear:
+	memset(to_orig, 0, size);
+	return err;
 }
 
 static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
@@ -1803,7 +1816,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_STACK,
+	.arg2_type	= ARG_PTR_TO_RAW_STACK,
 	.arg3_type	= ARG_CONST_STACK_SIZE,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -1813,16 +1826,26 @@ static u64 bpf_skb_get_tunnel_opt(u64 r1, u64 r2, u64 size, u64 r4, u64 r5)
 	struct sk_buff *skb = (struct sk_buff *) (long) r1;
 	u8 *to = (u8 *) (long) r2;
 	const struct ip_tunnel_info *info = skb_tunnel_info(skb);
+	int err;
 
 	if (unlikely(!info ||
-		     !(info->key.tun_flags & TUNNEL_OPTIONS_PRESENT)))
-		return -ENOENT;
-	if (unlikely(size < info->options_len))
-		return -ENOMEM;
+		     !(info->key.tun_flags & TUNNEL_OPTIONS_PRESENT))) {
+		err = -ENOENT;
+		goto err_clear;
+	}
+	if (unlikely(size < info->options_len)) {
+		err = -ENOMEM;
+		goto err_clear;
+	}
 
 	ip_tunnel_info_opts_get(to, info);
+	if (size > info->options_len)
+		memset(to + info->options_len, 0, size - info->options_len);
 
 	return info->options_len;
+err_clear:
+	memset(to, 0, size);
+	return err;
 }
 
 static const struct bpf_func_proto bpf_skb_get_tunnel_opt_proto = {
@@ -1830,7 +1853,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_opt_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_STACK,
+	.arg2_type	= ARG_PTR_TO_RAW_STACK,
 	.arg3_type	= ARG_CONST_STACK_SIZE,
 };
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 4/5] bpf, samples: don't zero data when not needed
From: Daniel Borkmann @ 2016-04-12 22:10 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, tgraf, bblanco, brendan.d.gregg, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1460497071.git.daniel@iogearbox.net>

Remove the zero initialization in the sample programs where appropriate.
Note that this is an optimization which is now possible, old programs
still doing the zero initialization are just fine as well. Also, make
sure we don't have padding issues when we don't memset() the entire
struct anymore.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/offwaketime_kern.c | 10 ++++++----
 samples/bpf/tracex1_kern.c     |  4 +---
 samples/bpf/tracex2_kern.c     |  4 ++--
 samples/bpf/tracex5_kern.c     |  6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/samples/bpf/offwaketime_kern.c b/samples/bpf/offwaketime_kern.c
index 983629a..e7d9a0a 100644
--- a/samples/bpf/offwaketime_kern.c
+++ b/samples/bpf/offwaketime_kern.c
@@ -11,7 +11,7 @@
 #include <linux/version.h>
 #include <linux/sched.h>
 
-#define _(P) ({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;})
+#define _(P) ({typeof(P) val; bpf_probe_read(&val, sizeof(val), &P); val;})
 
 #define MINBLOCK_US	1
 
@@ -61,7 +61,7 @@ SEC("kprobe/try_to_wake_up")
 int waker(struct pt_regs *ctx)
 {
 	struct task_struct *p = (void *) PT_REGS_PARM1(ctx);
-	struct wokeby_t woke = {};
+	struct wokeby_t woke;
 	u32 pid;
 
 	pid = _(p->pid);
@@ -75,17 +75,19 @@ int waker(struct pt_regs *ctx)
 
 static inline int update_counts(void *ctx, u32 pid, u64 delta)
 {
-	struct key_t key = {};
 	struct wokeby_t *woke;
 	u64 zero = 0, *val;
+	struct key_t key;
 
+	__builtin_memset(&key.waker, 0, sizeof(key.waker));
 	bpf_get_current_comm(&key.target, sizeof(key.target));
 	key.tret = bpf_get_stackid(ctx, &stackmap, STACKID_FLAGS);
+	key.wret = 0;
 
 	woke = bpf_map_lookup_elem(&wokeby, &pid);
 	if (woke) {
 		key.wret = woke->ret;
-		__builtin_memcpy(&key.waker, woke->name, TASK_COMM_LEN);
+		__builtin_memcpy(&key.waker, woke->name, sizeof(key.waker));
 		bpf_map_delete_elem(&wokeby, &pid);
 	}
 
diff --git a/samples/bpf/tracex1_kern.c b/samples/bpf/tracex1_kern.c
index 3f450a8..107da14 100644
--- a/samples/bpf/tracex1_kern.c
+++ b/samples/bpf/tracex1_kern.c
@@ -23,16 +23,14 @@ int bpf_prog1(struct pt_regs *ctx)
 	/* attaches to kprobe netif_receive_skb,
 	 * looks for packets on loobpack device and prints them
 	 */
-	char devname[IFNAMSIZ] = {};
+	char devname[IFNAMSIZ];
 	struct net_device *dev;
 	struct sk_buff *skb;
 	int len;
 
 	/* non-portable! works for the given kernel only */
 	skb = (struct sk_buff *) PT_REGS_PARM1(ctx);
-
 	dev = _(skb->dev);
-
 	len = _(skb->len);
 
 	bpf_probe_read(devname, sizeof(devname), dev->name);
diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
index 6d6eefd..5e11c20 100644
--- a/samples/bpf/tracex2_kern.c
+++ b/samples/bpf/tracex2_kern.c
@@ -66,7 +66,7 @@ struct hist_key {
 	char comm[16];
 	u64 pid_tgid;
 	u64 uid_gid;
-	u32 index;
+	u64 index;
 };
 
 struct bpf_map_def SEC("maps") my_hist_map = {
@@ -82,7 +82,7 @@ int bpf_prog3(struct pt_regs *ctx)
 	long write_size = PT_REGS_PARM3(ctx);
 	long init_val = 1;
 	long *value;
-	struct hist_key key = {};
+	struct hist_key key;
 
 	key.index = log2l(write_size);
 	key.pid_tgid = bpf_get_current_pid_tgid();
diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c
index b3f4295..f95f232 100644
--- a/samples/bpf/tracex5_kern.c
+++ b/samples/bpf/tracex5_kern.c
@@ -22,7 +22,7 @@ struct bpf_map_def SEC("maps") progs = {
 SEC("kprobe/seccomp_phase1")
 int bpf_prog1(struct pt_regs *ctx)
 {
-	struct seccomp_data sd = {};
+	struct seccomp_data sd;
 
 	bpf_probe_read(&sd, sizeof(sd), (void *)PT_REGS_PARM1(ctx));
 
@@ -40,7 +40,7 @@ int bpf_prog1(struct pt_regs *ctx)
 /* we jump here when syscall number == __NR_write */
 PROG(__NR_write)(struct pt_regs *ctx)
 {
-	struct seccomp_data sd = {};
+	struct seccomp_data sd;
 
 	bpf_probe_read(&sd, sizeof(sd), (void *)PT_REGS_PARM1(ctx));
 	if (sd.args[2] == 512) {
@@ -53,7 +53,7 @@ PROG(__NR_write)(struct pt_regs *ctx)
 
 PROG(__NR_read)(struct pt_regs *ctx)
 {
-	struct seccomp_data sd = {};
+	struct seccomp_data sd;
 
 	bpf_probe_read(&sd, sizeof(sd), (void *)PT_REGS_PARM1(ctx));
 	if (sd.args[2] > 128 && sd.args[2] <= 1024) {
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 5/5] bpf, samples: add test cases for raw stack
From: Daniel Borkmann @ 2016-04-12 22:10 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, tgraf, bblanco, brendan.d.gregg, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1460497071.git.daniel@iogearbox.net>

This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.

  [...]
  #84 raw_stack: no skb_load_bytes OK
  #85 raw_stack: skb_load_bytes, no init OK
  #86 raw_stack: skb_load_bytes, init OK
  #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
  #88 raw_stack: skb_load_bytes, spilled regs corruption OK
  #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
  #90 raw_stack: skb_load_bytes, spilled regs + data OK
  #91 raw_stack: skb_load_bytes, invalid access 1 OK
  #92 raw_stack: skb_load_bytes, invalid access 2 OK
  #93 raw_stack: skb_load_bytes, invalid access 3 OK
  #94 raw_stack: skb_load_bytes, invalid access 4 OK
  #95 raw_stack: skb_load_bytes, invalid access 5 OK
  #96 raw_stack: skb_load_bytes, invalid access 6 OK
  #97 raw_stack: skb_load_bytes, large access OK
  Summary: 98 PASSED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/test_verifier.c | 268 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 268 insertions(+)

diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
index 4b51a90..9eba8d1 100644
--- a/samples/bpf/test_verifier.c
+++ b/samples/bpf/test_verifier.c
@@ -309,6 +309,19 @@ static struct bpf_test tests[] = {
 		.result_unpriv = REJECT,
 	},
 	{
+		"check valid spill/fill, skb mark",
+		.insns = {
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_6, -8),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.result_unpriv = ACCEPT,
+	},
+	{
 		"check corrupted spill/fill",
 		.insns = {
 			/* spill R1(ctx) into stack */
@@ -1180,6 +1193,261 @@ static struct bpf_test tests[] = {
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 	},
+	{
+		"raw_stack: no skb_load_bytes",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			/* Call to skb_load_bytes() omitted. */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid read from stack off -8+0 size 8",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, no init",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, init",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_6, 0, 0xcafe),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, spilled regs around bounds",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, -8), /* spill ctx from R1 */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,  8), /* spill ctx from R1 */
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, -8), /* fill ctx into R0 */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,  8), /* fill ctx into R2 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, spilled regs corruption",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), /* spill ctx from R1 */
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0), /* fill ctx into R0 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "R0 invalid mem access 'inv'",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, spilled regs corruption 2",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, -8), /* spill ctx from R1 */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,  0), /* spill ctx from R1 */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,  8), /* spill ctx from R1 */
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, -8), /* fill ctx into R0 */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,  8), /* fill ctx into R2 */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_6,  0), /* fill ctx into R3 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_3,
+				    offsetof(struct __sk_buff, pkt_type)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_3),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "R3 invalid mem access 'inv'",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, spilled regs + data",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, -8), /* spill ctx from R1 */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,  0), /* spill ctx from R1 */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1,  8), /* spill ctx from R1 */
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, -8), /* fill ctx into R0 */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6,  8), /* fill ctx into R2 */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_6,  0), /* fill data into R3 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_3),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, invalid access 1",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -513),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid stack type R3 off=-513 access_size=8",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, invalid access 2",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -1),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid stack type R3 off=-1 access_size=8",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, invalid access 3",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, 0xffffffff),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 0xffffffff),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid stack type R3 off=-1 access_size=-1",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, invalid access 4",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -1),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 0x7fffffff),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid stack type R3 off=-1 access_size=2147483647",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, invalid access 5",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -512),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 0x7fffffff),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid stack type R3 off=-512 access_size=2147483647",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, invalid access 6",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -512),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid stack type R3 off=-512 access_size=0",
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"raw_stack: skb_load_bytes, large access",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -512),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_6),
+			BPF_MOV64_IMM(BPF_REG_4, 512),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_skb_load_bytes),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
 };
 
 static int probe_filter_length(struct bpf_insn *fp)
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH] mwifiex: fix possible NULL dereference
From: Andy Shevchenko @ 2016-04-12 22:15 UTC (permalink / raw)
  To: Rustad, Mark D
  Cc: Sudip Mukherjee, Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo,
	linux-kernel@vger.kernel.org, open list:TI WILINK WIRELES...,
	netdev, Sudip Mukherjee
In-Reply-To: <867653EE-136B-488D-8F85-716B1ED9BD1A@intel.com>

On Tue, Apr 12, 2016 at 8:43 PM, Rustad, Mark D <mark.d.rustad@intel.com> wrote:
> Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
>
>> On Mon, Apr 11, 2016 at 6:27 PM, Sudip Mukherjee
>> <sudipm.mukherjee@gmail.com> wrote:
>>>
>>> From: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
>>>
>>> We have a check for card just after dereferencing it. So if it is NULL
>>> we have already dereferenced it before its check. Lets dereference it
>>> after checking card for NULL.

>>> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>>> @@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct
>>> mwifiex_adapter *adapter)
>>>  {
>>>         struct pcie_service_card *card = adapter->card;>>
>>
>> Let's say it's 0.
>>
>>>        const struct mwifiex_pcie_card_reg *reg;
>>> -       struct pci_dev *pdev = card->dev;>>
>>
>> This would be equal to offset of dev member in pcie_service_card struct.
>>
>> Nothing wrong here.
>
> Actually, that is not true. The dereference of card tells the compiler that
> card can't be NULL, so it is free to eliminate the check below.
> Unbelievably, this can even happen for a reference such as &ptr->thing where
> the pointer isn't even actually dereferenced at all!

Hmm... Can we look at the result assembly? If I'm not mistaken,
compiler wouldn't even try to calculate pdev pointer before first use
of it.


>
>>> +       struct pci_dev *pdev;
>>>         int i;
>>>
>>>         if (card) {
>>> +               pdev = card->dev;
>>>                 if (card->msix_enable) {
>>>                         for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* [PATCH iproute2] ss: 64bit inode numbers
From: Eric Dumazet @ 2016-04-12 22:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Lets prepare for a possibility to have 64bit inode numbers for sockets,
even if the kernel currently enforces 32bit numbers.

Presumably, if both kernel and userland are 64bit (no 32bit emulation),
kernel could switch to 64bit inode numbers soon.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 misc/ss.c |   38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 38cf331..3355da5 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -377,7 +377,7 @@ static FILE *ephemeral_ports_open(void)
 
 struct user_ent {
 	struct user_ent	*next;
-	unsigned int	ino;
+	__u64		ino;
 	int		pid;
 	int		fd;
 	char		*process;
@@ -388,17 +388,18 @@ struct user_ent {
 #define USER_ENT_HASH_SIZE	256
 struct user_ent *user_ent_hash[USER_ENT_HASH_SIZE];
 
-static int user_ent_hashfn(unsigned int ino)
+static int user_ent_hashfn(__u64 ino)
 {
-	int val = (ino >> 24) ^ (ino >> 16) ^ (ino >> 8) ^ ino;
+	unsigned int val = (unsigned int)ino;
 
+	val ^= (unsigned int)(ino >> 32);
+	val ^= (val >> 16);
+	val ^= (val >> 8);
 	return val & (USER_ENT_HASH_SIZE - 1);
 }
 
-static void user_ent_add(unsigned int ino, char *process,
-					int pid, int fd,
-					char *proc_ctx,
-					char *sock_ctx)
+static void user_ent_add(__u64 ino, char *process, int pid, int fd,
+			 char *proc_ctx, char *sock_ctx)
 {
 	struct user_ent *p, **pp;
 
@@ -494,7 +495,7 @@ static void user_ent_hash_build(void)
 
 		while ((d1 = readdir(dir1)) != NULL) {
 			const char *pattern = "socket:[";
-			unsigned int ino;
+			__u64 ino;
 			char lnk[64];
 			int fd;
 			ssize_t link_len;
@@ -513,7 +514,7 @@ static void user_ent_hash_build(void)
 			if (strncmp(lnk, pattern, strlen(pattern)))
 				continue;
 
-			sscanf(lnk, "socket:[%u]", &ino);
+			sscanf(lnk, "socket:[%llu]", &ino);
 
 			snprintf(tmp, sizeof(tmp), "%s/%d/fd/%s",
 					root, pid, d1->d_name);
@@ -549,7 +550,7 @@ enum entry_types {
 };
 
 #define ENTRY_BUF_SIZE 512
-static int find_entry(unsigned int ino, char **buf, int type)
+static int find_entry(__u64 ino, char **buf, int type)
 {
 	struct user_ent *p;
 	int cnt = 0;
@@ -728,10 +729,10 @@ struct sockstat {
 	int		    rport;
 	int		    state;
 	int		    rq, wq;
-	unsigned int ino;
-	unsigned int uid;
+	unsigned int	    uid;
 	int		    refcnt;
 	unsigned int	    iface;
+	__u64		    ino;
 	unsigned long long  sk;
 	char *name;
 	char *peer_name;
@@ -801,7 +802,7 @@ static void sock_details_print(struct sockstat *s)
 	if (s->uid)
 		printf(" uid:%u", s->uid);
 
-	printf(" ino:%u", s->ino);
+	printf(" ino:%llu", s->ino);
 	printf(" sk:%llx", s->sk);
 }
 
@@ -1799,7 +1800,7 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 		return 0;
 
 	opt[0] = 0;
-	n = sscanf(data, "%x %x:%x %x:%x %x %d %d %u %d %llx %d %d %d %d %d %[^\n]\n",
+	n = sscanf(data, "%x %x:%x %x:%x %x %d %d %llu %d %llx %d %d %d %d %d %[^\n]\n",
 		   &s.ss.state, &s.ss.wq, &s.ss.rq,
 		   &s.timer, &s.timeout, &s.retrans, &s.ss.uid, &s.probes,
 		   &s.ss.ino, &s.ss.refcnt, &s.ss.sk, &rto, &ato, &s.qack, &s.cwnd,
@@ -2481,7 +2482,7 @@ static int dgram_show_line(char *line, const struct filter *f, int family)
 		return 0;
 
 	opt[0] = 0;
-	n = sscanf(data, "%x %x:%x %*x:%*x %*x %d %*d %u %d %llx %[^\n]\n",
+	n = sscanf(data, "%x %x:%x %*x:%*x %*x %d %*d %llu %d %llx %[^\n]\n",
 	       &s.state, &s.wq, &s.rq,
 	       &s.uid, &s.ino,
 	       &s.refcnt, &s.sk, opt);
@@ -2829,7 +2830,7 @@ static int unix_show(struct filter *f)
 		u->name = NULL;
 		u->peer_name = NULL;
 
-		if (sscanf(buf, "%x: %x %x %x %x %x %d %s",
+		if (sscanf(buf, "%x: %x %x %x %x %x %llu %s",
 			   &u->rport, &u->rq, &u->wq, &flags, &u->type,
 			   &u->state, &u->ino, name) < 8)
 			name[0] = 0;
@@ -3085,9 +3086,10 @@ static int packet_show_line(char *buf, const struct filter *f, int fam)
 {
 	unsigned long long sk;
 	struct sockstat stat = {};
-	int type, prot, iface, state, rq, uid, ino;
+	int type, prot, iface, state, rq, uid;
+	__u64 ino;
 
-	sscanf(buf, "%llx %*d %d %x %d %d %u %u %u",
+	sscanf(buf, "%llx %*d %d %x %d %d %u %u %llu",
 			&sk,
 			&type, &prot, &iface, &state,
 			&rq, &uid, &ino);

^ permalink raw reply related

* Re: Issue with ping source address display
From: 吉藤英明 @ 2016-04-12 22:59 UTC (permalink / raw)
  To: Daniele Orlandi; +Cc: network dev
In-Reply-To: <570D35CF.9070006@orlandi.com>

Hi,

2016-04-13 2:52 GMT+09:00 Daniele Orlandi <daniele@orlandi.com>:
>
> Hello,
>
> More than one year ago I posted the following message but it hasn't
> received a reply, now I've been stung by a similar issue, you may want
> to investigate:
>
>
> I noticed that when ping receives ICMP messages from different sources
> the first IP address is always used and displayed:
>
>
> vihai@seviolab:~$ ping -V
> ping utility, iputils-s20121221
>
> This is a (simulated) flapping route:
>
> vihai@seviolab:~$ ping 10.254.10.140
> PING 10.254.10.140 (10.254.10.140) 56(84) bytes of data.
> From 192.168.1.1 icmp_seq=1 Destination Host Unreachable
> From 192.168.1.1 icmp_seq=2 Destination Host Unreachable
> From 192.168.1.1 icmp_seq=3 Destination Host Unreachable
> 64 bytes from 192.168.1.1: icmp_seq=4 ttl=61 time=24.7 ms
> 64 bytes from 192.168.1.1: icmp_seq=5 ttl=61 time=25.6 ms
> 64 bytes from 192.168.1.1: icmp_seq=6 ttl=61 time=69.6 ms
> From 192.168.1.1 icmp_seq=7 Destination Host Unreachable
> From 192.168.1.1 icmp_seq=8 Destination Host Unreachable
> From 192.168.1.1 icmp_seq=9 Destination Host Unreachable
> ^C
> --- 10.254.10.140 ping statistics ---
> 9 packets transmitted, 3 received, +6 errors, 66% packet loss, time 8001ms
> rtt min/avg/max/mdev = 24.797/40.061/69.692/20.955 ms
>
>
> The sources, however are different:
>
> vihai@seviolab:~$ sudo tcpdump -n icmp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
> ^[OA^[OA^[OA17:09:55.932981 IP 192.168.1.21 > 10.254.10.140: ICMP echo
> request, id 9278, seq 1, length 64
> 17:09:55.933234 IP 192.168.1.1 > 192.168.1.21: ICMP host 10.254.10.140
> unreachable, length 92
> 17:09:56.933169 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 2, length 64
> 17:09:56.933416 IP 192.168.1.1 > 192.168.1.21: ICMP host 10.254.10.140
> unreachable, length 92
> 17:09:57.933160 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 3, length 64
> 17:09:57.933404 IP 192.168.1.1 > 192.168.1.21: ICMP host 10.254.10.140
> unreachable, length 92
> 17:09:58.933163 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 4, length 64
> 17:09:58.957939 IP 10.254.10.140 > 192.168.1.21: ICMP echo reply, id
> 9278, seq 4, length 64
> 17:09:59.935050 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 5, length 64
> 17:09:59.960724 IP 10.254.10.140 > 192.168.1.21: ICMP echo reply, id
> 9278, seq 5, length 64
> 17:10:00.936177 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 6, length 64
> 17:10:01.005849 IP 10.254.10.140 > 192.168.1.21: ICMP echo reply, id
> 9278, seq 6, length 64
> 17:10:01.936313 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 7, length 64
> 17:10:01.936626 IP 192.168.1.1 > 192.168.1.21: ICMP host 10.254.10.140
> unreachable, length 92
> 17:10:02.935321 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 8, length 64
> 17:10:02.935591 IP 192.168.1.1 > 192.168.1.21: ICMP host 10.254.10.140
> unreachable, length 92
> 17:10:03.934322 IP 192.168.1.21 > 10.254.10.140: ICMP echo request, id
> 9278, seq 9, length 64
> 17:10:03.934613 IP 192.168.1.1 > 192.168.1.21: ICMP host 10.254.10.140
> unreachable, length 92
>

Thank you for your report.  I'll try fixing it.

--yoshfuji

>
>
>
> Tried with a different ping implementation (RouterOS) and the behaviour
> seems correct:
>
> [vihai@SevioLab SW1] > ping 10.254.10.140
> HOST                                     SIZE TTL TIME  STATUS
> 192.168.1.1                             84  64 0ms   host unreachable
> 192.168.1.1                             84  64 0ms   host unreachable
> 192.168.1.1                             84  64 0ms   host unreachable
> 10.254.10.140                           56  61 20ms
> 10.254.10.140                           56  61 46ms
> 10.254.10.140                           56  61 37ms
> 192.168.1.1                             84  64 0ms   host unreachable
> 192.168.1.1                             84  64 0ms   host unreachable
> 192.168.1.1                             84  64 0ms   host unreachable
>     sent=9 received=3 packet-loss=66% min-rtt=20ms avg-rtt=34ms max-rtt=46ms
>
>
> Recently I was pinging with IPv6, a router in between filtered the
> packet, however the shown source address was not the right one:
>
> root@monitor:~# ping6 -i 0.2 www.google.com
> PING www.google.com(mil01s25-in-x04.1e100.net) 56 data bytes
> From mil01s25-in-x04.1e100.net icmp_seq=1 Destination unreachable: No route
> From mil01s25-in-x04.1e100.net icmp_seq=2 Destination unreachable: No route
> From mil01s25-in-x04.1e100.net icmp_seq=3 Destination unreachable: No route
>
> 19:33:19.589285 IP6 2a01:2d8:aca0:fce:944e:c8ff:fe4d:96de >
> mil01s25-in-x04.1e100.net: ICMP6, echo request, seq 1, length 64
> 19:33:19.611666 IP6 mix-br2.intercom.it >
> 2a01:2d8:aca0:fce:944e:c8ff:fe4d:96de: ICMP6, destination unreachable,
> unreachable route mil01s25-in-x04.1e100.net, length 112
>
>
> Bye,
>

^ permalink raw reply

* [PATCH 0/8] Netfilter updates for net-next
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains the first batch of Netfilter updates for
your net-next tree.

1) Define pr_fmt() in nf_conntrack, from Weongyo Jeong.

2) Define and register netfilter's afinfo for the bridge family,
   this comes in preparation for native nfqueue's bridge for nft,
   from Stephane Bryant.

3) Add new attributes to store layer 2 and VLAN headers to nfqueue,
   also from Stephane Bryant.

4) Parse new NFQA_VLAN and NFQA_L2HDR nfqueue netlink attributes
   coming from userspace, from Stephane Bryant.

5) Use net->ipv6.devconf_all->hop_limit instead of hardcoded hop_limit
   in IPv6 SYNPROXY, from Liping Zhang.

6) Remove unnecessary check for dst == NULL in nf_reject_ipv6,
   from Haishuang Yan.

7) Deinline ctnetlink event report functions, from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!

----------------------------------------------------------------

The following changes since commit e46b4e2b46e173889b19999b8bd033d5e8b3acf0:

  Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace (2016-03-24 10:52:25 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git HEAD

for you to fetch changes up to ecdfb48cddfd1096343148113d5b1bd789033aa8:

  netfilter: conntrack: move expectation event helper to ecache.c (2016-04-12 23:01:57 +0200)

----------------------------------------------------------------
Florian Westphal (2):
      netfilter: conntrack: de-inline nf_conntrack_eventmask_report
      netfilter: conntrack: move expectation event helper to ecache.c

Haishuang Yan (1):
      netfilter: ipv6: unnecessary to check whether ip6_route_output() returns NULL

Liping Zhang (1):
      netfilter: ip6t_SYNPROXY: remove magic number for hop_limit

Stephane Bryant (3):
      netfilter: bridge: add nf_afinfo to enable queuing to userspace
      netfilter: bridge: pass L2 header and VLAN as netlink attributes in queues to userspace
      netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR

Weongyo Jeong (1):
      netfilter: nf_conntrack: Uses pr_fmt() for logging.

 include/net/netfilter/nf_conntrack_ecache.h    | 108 ++++---------------------
 include/uapi/linux/netfilter/nfnetlink_queue.h |  10 +++
 net/bridge/netfilter/nf_tables_bridge.c        |  47 ++++++++++-
 net/ipv6/netfilter/ip6t_SYNPROXY.c             |  56 +++++++------
 net/ipv6/netfilter/nf_reject_ipv6.c            |   2 +-
 net/netfilter/nf_conntrack_core.c              |  15 ++--
 net/netfilter/nf_conntrack_ecache.c            |  84 +++++++++++++++++++
 net/netfilter/nfnetlink_queue.c                | 105 ++++++++++++++++++++++++
 8 files changed, 298 insertions(+), 129 deletions(-)

^ permalink raw reply

* [PATCH 4/8] netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460502166-20340-1-git-send-email-pablo@netfilter.org>

From: Stephane Bryant <stephane.ml.bryant@gmail.com>

This makes nf queues use NFQA_VLAN and NFQA_L2HDR in verdict to modify the
original skb

Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_queue.c | 47 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 6889c7c..5bebe78 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -964,12 +964,18 @@ static struct notifier_block nfqnl_rtnl_notifier = {
 	.notifier_call	= nfqnl_rcv_nl_event,
 };
 
+static const struct nla_policy nfqa_vlan_policy[NFQA_VLAN_MAX + 1] = {
+	[NFQA_VLAN_TCI]		= { .type = NLA_U16},
+	[NFQA_VLAN_PROTO]	= { .type = NLA_U16},
+};
+
 static const struct nla_policy nfqa_verdict_policy[NFQA_MAX+1] = {
 	[NFQA_VERDICT_HDR]	= { .len = sizeof(struct nfqnl_msg_verdict_hdr) },
 	[NFQA_MARK]		= { .type = NLA_U32 },
 	[NFQA_PAYLOAD]		= { .type = NLA_UNSPEC },
 	[NFQA_CT]		= { .type = NLA_UNSPEC },
 	[NFQA_EXP]		= { .type = NLA_UNSPEC },
+	[NFQA_VLAN]		= { .type = NLA_NESTED },
 };
 
 static const struct nla_policy nfqa_verdict_batch_policy[NFQA_MAX+1] = {
@@ -1083,6 +1089,40 @@ static struct nf_conn *nfqnl_ct_parse(struct nfnl_ct_hook *nfnl_ct,
 	return ct;
 }
 
+static int nfqa_parse_bridge(struct nf_queue_entry *entry,
+			     const struct nlattr * const nfqa[])
+{
+	if (nfqa[NFQA_VLAN]) {
+		struct nlattr *tb[NFQA_VLAN_MAX + 1];
+		int err;
+
+		err = nla_parse_nested(tb, NFQA_VLAN_MAX, nfqa[NFQA_VLAN],
+				       nfqa_vlan_policy);
+		if (err < 0)
+			return err;
+
+		if (!tb[NFQA_VLAN_TCI] || !tb[NFQA_VLAN_PROTO])
+			return -EINVAL;
+
+		entry->skb->vlan_tci = ntohs(nla_get_be16(tb[NFQA_VLAN_TCI]));
+		entry->skb->vlan_proto = nla_get_be16(tb[NFQA_VLAN_PROTO]);
+	}
+
+	if (nfqa[NFQA_L2HDR]) {
+		int mac_header_len = entry->skb->network_header -
+			entry->skb->mac_header;
+
+		if (mac_header_len != nla_len(nfqa[NFQA_L2HDR]))
+			return -EINVAL;
+		else if (mac_header_len > 0)
+			memcpy(skb_mac_header(entry->skb),
+			       nla_data(nfqa[NFQA_L2HDR]),
+			       mac_header_len);
+	}
+
+	return 0;
+}
+
 static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 			      struct sk_buff *skb,
 			      const struct nlmsghdr *nlh,
@@ -1098,6 +1138,7 @@ static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 	struct nfnl_ct_hook *nfnl_ct;
 	struct nf_conn *ct = NULL;
 	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+	int err;
 
 	queue = instance_lookup(q, queue_num);
 	if (!queue)
@@ -1124,6 +1165,12 @@ static int nfqnl_recv_verdict(struct net *net, struct sock *ctnl,
 			ct = nfqnl_ct_parse(nfnl_ct, nlh, nfqa, entry, &ctinfo);
 	}
 
+	if (entry->state.pf == PF_BRIDGE) {
+		err = nfqa_parse_bridge(entry, nfqa);
+		if (err < 0)
+			return err;
+	}
+
 	if (nfqa[NFQA_PAYLOAD]) {
 		u16 payload_len = nla_len(nfqa[NFQA_PAYLOAD]);
 		int diff = payload_len - entry->skb->len;
-- 
2.1.4


^ permalink raw reply related

* [PATCH 6/8] netfilter: ipv6: unnecessary to check whether ip6_route_output() returns NULL
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460502166-20340-1-git-send-email-pablo@netfilter.org>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

ip6_route_output() never returns NULL, so it is not appropriate to
check if the return value is NULL.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv6/netfilter/nf_reject_ipv6.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 4709f65..a540022 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -158,7 +158,7 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 	fl6.fl6_dport = otcph->source;
 	security_skb_classify_flow(oldskb, flowi6_to_flowi(&fl6));
 	dst = ip6_route_output(net, NULL, &fl6);
-	if (dst == NULL || dst->error) {
+	if (dst->error) {
 		dst_release(dst);
 		return;
 	}
-- 
2.1.4


^ permalink raw reply related

* [PATCH 1/8] netfilter: nf_conntrack: Uses pr_fmt() for logging.
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460502166-20340-1-git-send-email-pablo@netfilter.org>

From: Weongyo Jeong <weongyo.linux@gmail.com>

Uses pr_fmt() macro for debugging messages of nf_conntrack module.

Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index afde5f5..2fd6074 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -12,6 +12,8 @@
  * published by the Free Software Foundation.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/types.h>
 #include <linux/netfilter.h>
 #include <linux/module.h>
@@ -966,7 +968,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 
 	if (!l4proto->new(ct, skb, dataoff, timeouts)) {
 		nf_conntrack_free(ct);
-		pr_debug("init conntrack: can't track with proto module\n");
+		pr_debug("can't track with proto module\n");
 		return NULL;
 	}
 
@@ -988,7 +990,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 		spin_lock(&nf_conntrack_expect_lock);
 		exp = nf_ct_find_expectation(net, zone, tuple);
 		if (exp) {
-			pr_debug("conntrack: expectation arrives ct=%p exp=%p\n",
+			pr_debug("expectation arrives ct=%p exp=%p\n",
 				 ct, exp);
 			/* Welcome, Mr. Bond.  We've been expecting you... */
 			__set_bit(IPS_EXPECTED_BIT, &ct->status);
@@ -1053,7 +1055,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 	if (!nf_ct_get_tuple(skb, skb_network_offset(skb),
 			     dataoff, l3num, protonum, net, &tuple, l3proto,
 			     l4proto)) {
-		pr_debug("resolve_normal_ct: Can't get tuple\n");
+		pr_debug("Can't get tuple\n");
 		return NULL;
 	}
 
@@ -1079,14 +1081,13 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 	} else {
 		/* Once we've had two way comms, always ESTABLISHED. */
 		if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
-			pr_debug("nf_conntrack_in: normal packet for %p\n", ct);
+			pr_debug("normal packet for %p\n", ct);
 			*ctinfo = IP_CT_ESTABLISHED;
 		} else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) {
-			pr_debug("nf_conntrack_in: related packet for %p\n",
-				 ct);
+			pr_debug("related packet for %p\n", ct);
 			*ctinfo = IP_CT_RELATED;
 		} else {
-			pr_debug("nf_conntrack_in: new packet for %p\n", ct);
+			pr_debug("new packet for %p\n", ct);
 			*ctinfo = IP_CT_NEW;
 		}
 		*set_reply = 0;
-- 
2.1.4

^ permalink raw reply related

* [PATCH 2/8] netfilter: bridge: add nf_afinfo to enable queuing to userspace
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460502166-20340-1-git-send-email-pablo@netfilter.org>

From: Stephane Bryant <stephane.ml.bryant@gmail.com>

This just adds and registers a nf_afinfo for the ethernet
bridge, which enables queuing to userspace for the AF_BRIDGE
family. No checksum computation is done.

Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nf_tables_bridge.c | 47 +++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/nf_tables_bridge.c b/net/bridge/netfilter/nf_tables_bridge.c
index 7fcdd72..a78c4e2 100644
--- a/net/bridge/netfilter/nf_tables_bridge.c
+++ b/net/bridge/netfilter/nf_tables_bridge.c
@@ -162,15 +162,57 @@ static const struct nf_chain_type filter_bridge = {
 			  (1 << NF_BR_POST_ROUTING),
 };
 
+static void nf_br_saveroute(const struct sk_buff *skb,
+			    struct nf_queue_entry *entry)
+{
+}
+
+static int nf_br_reroute(struct net *net, struct sk_buff *skb,
+			 const struct nf_queue_entry *entry)
+{
+	return 0;
+}
+
+static __sum16 nf_br_checksum(struct sk_buff *skb, unsigned int hook,
+			      unsigned int dataoff, u_int8_t protocol)
+{
+	return 0;
+}
+
+static __sum16 nf_br_checksum_partial(struct sk_buff *skb, unsigned int hook,
+				      unsigned int dataoff, unsigned int len,
+				      u_int8_t protocol)
+{
+	return 0;
+}
+
+static int nf_br_route(struct net *net, struct dst_entry **dst,
+		       struct flowi *fl, bool strict __always_unused)
+{
+	return 0;
+}
+
+static const struct nf_afinfo nf_br_afinfo = {
+	.family                 = AF_BRIDGE,
+	.checksum               = nf_br_checksum,
+	.checksum_partial       = nf_br_checksum_partial,
+	.route                  = nf_br_route,
+	.saveroute              = nf_br_saveroute,
+	.reroute                = nf_br_reroute,
+	.route_key_size         = 0,
+};
+
 static int __init nf_tables_bridge_init(void)
 {
 	int ret;
 
+	nf_register_afinfo(&nf_br_afinfo);
 	nft_register_chain_type(&filter_bridge);
 	ret = register_pernet_subsys(&nf_tables_bridge_net_ops);
-	if (ret < 0)
+	if (ret < 0) {
 		nft_unregister_chain_type(&filter_bridge);
-
+		nf_unregister_afinfo(&nf_br_afinfo);
+	}
 	return ret;
 }
 
@@ -178,6 +220,7 @@ static void __exit nf_tables_bridge_exit(void)
 {
 	unregister_pernet_subsys(&nf_tables_bridge_net_ops);
 	nft_unregister_chain_type(&filter_bridge);
+	nf_unregister_afinfo(&nf_br_afinfo);
 }
 
 module_init(nf_tables_bridge_init);
-- 
2.1.4

^ permalink raw reply related

* [PATCH 3/8] netfilter: bridge: pass L2 header and VLAN as netlink attributes in queues to userspace
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460502166-20340-1-git-send-email-pablo@netfilter.org>

From: Stephane Bryant <stephane.ml.bryant@gmail.com>

- This creates 2 netlink attribute NFQA_VLAN and NFQA_L2HDR.
- These are filled up for the PF_BRIDGE family on the way to userspace.
- NFQA_VLAN is a nested attribute, with the NFQA_VLAN_PROTO and the
  NFQA_VLAN_TCI carrying the corresponding vlan_proto and vlan_tci
  fields from the skb using big endian ordering (and using the CFI
  bit as the VLAN_TAG_PRESENT flag in vlan_tci as in the skb)

Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nfnetlink_queue.h | 10 +++++
 net/netfilter/nfnetlink_queue.c                | 58 ++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index b67a853..ae30841 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -30,6 +30,14 @@ struct nfqnl_msg_packet_timestamp {
 	__aligned_be64	usec;
 };
 
+enum nfqnl_vlan_attr {
+	NFQA_VLAN_UNSPEC,
+	NFQA_VLAN_PROTO,		/* __be16 skb vlan_proto */
+	NFQA_VLAN_TCI,			/* __be16 skb htons(vlan_tci) */
+	__NFQA_VLAN_MAX,
+};
+#define NFQA_VLAN_MAX (__NFQA_VLAN_MAX + 1)
+
 enum nfqnl_attr_type {
 	NFQA_UNSPEC,
 	NFQA_PACKET_HDR,
@@ -50,6 +58,8 @@ enum nfqnl_attr_type {
 	NFQA_UID,			/* __u32 sk uid */
 	NFQA_GID,			/* __u32 sk gid */
 	NFQA_SECCTX,			/* security context string */
+	NFQA_VLAN,			/* nested attribute: packet vlan info */
+	NFQA_L2HDR,			/* full L2 header */
 
 	__NFQA_MAX
 };
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 7542999..6889c7c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -295,6 +295,59 @@ static u32 nfqnl_get_sk_secctx(struct sk_buff *skb, char **secdata)
 	return seclen;
 }
 
+static u32 nfqnl_get_bridge_size(struct nf_queue_entry *entry)
+{
+	struct sk_buff *entskb = entry->skb;
+	u32 nlalen = 0;
+
+	if (entry->state.pf != PF_BRIDGE || !skb_mac_header_was_set(entskb))
+		return 0;
+
+	if (skb_vlan_tag_present(entskb))
+		nlalen += nla_total_size(nla_total_size(sizeof(__be16)) +
+					 nla_total_size(sizeof(__be16)));
+
+	if (entskb->network_header > entskb->mac_header)
+		nlalen += nla_total_size((entskb->network_header -
+					  entskb->mac_header));
+
+	return nlalen;
+}
+
+static int nfqnl_put_bridge(struct nf_queue_entry *entry, struct sk_buff *skb)
+{
+	struct sk_buff *entskb = entry->skb;
+
+	if (entry->state.pf != PF_BRIDGE || !skb_mac_header_was_set(entskb))
+		return 0;
+
+	if (skb_vlan_tag_present(entskb)) {
+		struct nlattr *nest;
+
+		nest = nla_nest_start(skb, NFQA_VLAN | NLA_F_NESTED);
+		if (!nest)
+			goto nla_put_failure;
+
+		if (nla_put_be16(skb, NFQA_VLAN_TCI, htons(entskb->vlan_tci)) ||
+		    nla_put_be16(skb, NFQA_VLAN_PROTO, entskb->vlan_proto))
+			goto nla_put_failure;
+
+		nla_nest_end(skb, nest);
+	}
+
+	if (entskb->mac_header < entskb->network_header) {
+		int len = (int)(entskb->network_header - entskb->mac_header);
+
+		if (nla_put(skb, NFQA_L2HDR, len, skb_mac_header(entskb)))
+			goto nla_put_failure;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
 static struct sk_buff *
 nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 			   struct nf_queue_entry *entry,
@@ -334,6 +387,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 	if (entskb->tstamp.tv64)
 		size += nla_total_size(sizeof(struct nfqnl_msg_packet_timestamp));
 
+	size += nfqnl_get_bridge_size(entry);
+
 	if (entry->state.hook <= NF_INET_FORWARD ||
 	   (entry->state.hook == NF_INET_POST_ROUTING && entskb->sk == NULL))
 		csum_verify = !skb_csum_unnecessary(entskb);
@@ -497,6 +552,9 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		}
 	}
 
+	if (nfqnl_put_bridge(entry, skb) < 0)
+		goto nla_put_failure;
+
 	if (entskb->tstamp.tv64) {
 		struct nfqnl_msg_packet_timestamp ts;
 		struct timespec64 kts = ktime_to_timespec64(skb->tstamp);
-- 
2.1.4

^ permalink raw reply related

* [PATCH 5/8] netfilter: ip6t_SYNPROXY: remove magic number for hop_limit
From: Pablo Neira Ayuso @ 2016-04-12 23:02 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460502166-20340-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <liping.zhang@spreadtrum.com>

Replace '64' with the per-net ipv6_devconf_all's hop_limit when
building the ipv6 header.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 56 ++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index 3deed58..5d778dd 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -20,15 +20,16 @@
 #include <net/netfilter/nf_conntrack_synproxy.h>
 
 static struct ipv6hdr *
-synproxy_build_ip(struct sk_buff *skb, const struct in6_addr *saddr,
-				       const struct in6_addr *daddr)
+synproxy_build_ip(struct net *net, struct sk_buff *skb,
+		  const struct in6_addr *saddr,
+		  const struct in6_addr *daddr)
 {
 	struct ipv6hdr *iph;
 
 	skb_reset_network_header(skb);
 	iph = (struct ipv6hdr *)skb_put(skb, sizeof(*iph));
 	ip6_flow_hdr(iph, 0, 0);
-	iph->hop_limit	= 64;	//XXX
+	iph->hop_limit	= net->ipv6.devconf_all->hop_limit;
 	iph->nexthdr	= IPPROTO_TCP;
 	iph->saddr	= *saddr;
 	iph->daddr	= *daddr;
@@ -37,13 +38,12 @@ synproxy_build_ip(struct sk_buff *skb, const struct in6_addr *saddr,
 }
 
 static void
-synproxy_send_tcp(const struct synproxy_net *snet,
+synproxy_send_tcp(struct net *net,
 		  const struct sk_buff *skb, struct sk_buff *nskb,
 		  struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
 		  struct ipv6hdr *niph, struct tcphdr *nth,
 		  unsigned int tcp_hdr_size)
 {
-	struct net *net = nf_ct_net(snet->tmpl);
 	struct dst_entry *dst;
 	struct flowi6 fl6;
 
@@ -84,7 +84,7 @@ free_nskb:
 }
 
 static void
-synproxy_send_client_synack(const struct synproxy_net *snet,
+synproxy_send_client_synack(struct net *net,
 			    const struct sk_buff *skb, const struct tcphdr *th,
 			    const struct synproxy_options *opts)
 {
@@ -103,7 +103,7 @@ synproxy_send_client_synack(const struct synproxy_net *snet,
 		return;
 	skb_reserve(nskb, MAX_TCP_HEADER);
 
-	niph = synproxy_build_ip(nskb, &iph->daddr, &iph->saddr);
+	niph = synproxy_build_ip(net, nskb, &iph->daddr, &iph->saddr);
 
 	skb_reset_transport_header(nskb);
 	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
@@ -121,15 +121,16 @@ synproxy_send_client_synack(const struct synproxy_net *snet,
 
 	synproxy_build_options(nth, opts);
 
-	synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+	synproxy_send_tcp(net, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
 			  niph, nth, tcp_hdr_size);
 }
 
 static void
-synproxy_send_server_syn(const struct synproxy_net *snet,
+synproxy_send_server_syn(struct net *net,
 			 const struct sk_buff *skb, const struct tcphdr *th,
 			 const struct synproxy_options *opts, u32 recv_seq)
 {
+	struct synproxy_net *snet = synproxy_pernet(net);
 	struct sk_buff *nskb;
 	struct ipv6hdr *iph, *niph;
 	struct tcphdr *nth;
@@ -144,7 +145,7 @@ synproxy_send_server_syn(const struct synproxy_net *snet,
 		return;
 	skb_reserve(nskb, MAX_TCP_HEADER);
 
-	niph = synproxy_build_ip(nskb, &iph->saddr, &iph->daddr);
+	niph = synproxy_build_ip(net, nskb, &iph->saddr, &iph->daddr);
 
 	skb_reset_transport_header(nskb);
 	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
@@ -165,12 +166,12 @@ synproxy_send_server_syn(const struct synproxy_net *snet,
 
 	synproxy_build_options(nth, opts);
 
-	synproxy_send_tcp(snet, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
+	synproxy_send_tcp(net, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
 			  niph, nth, tcp_hdr_size);
 }
 
 static void
-synproxy_send_server_ack(const struct synproxy_net *snet,
+synproxy_send_server_ack(struct net *net,
 			 const struct ip_ct_tcp *state,
 			 const struct sk_buff *skb, const struct tcphdr *th,
 			 const struct synproxy_options *opts)
@@ -189,7 +190,7 @@ synproxy_send_server_ack(const struct synproxy_net *snet,
 		return;
 	skb_reserve(nskb, MAX_TCP_HEADER);
 
-	niph = synproxy_build_ip(nskb, &iph->daddr, &iph->saddr);
+	niph = synproxy_build_ip(net, nskb, &iph->daddr, &iph->saddr);
 
 	skb_reset_transport_header(nskb);
 	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
@@ -205,11 +206,11 @@ synproxy_send_server_ack(const struct synproxy_net *snet,
 
 	synproxy_build_options(nth, opts);
 
-	synproxy_send_tcp(snet, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
+	synproxy_send_tcp(net, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size);
 }
 
 static void
-synproxy_send_client_ack(const struct synproxy_net *snet,
+synproxy_send_client_ack(struct net *net,
 			 const struct sk_buff *skb, const struct tcphdr *th,
 			 const struct synproxy_options *opts)
 {
@@ -227,7 +228,7 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
 		return;
 	skb_reserve(nskb, MAX_TCP_HEADER);
 
-	niph = synproxy_build_ip(nskb, &iph->saddr, &iph->daddr);
+	niph = synproxy_build_ip(net, nskb, &iph->saddr, &iph->daddr);
 
 	skb_reset_transport_header(nskb);
 	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
@@ -243,15 +244,16 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
 
 	synproxy_build_options(nth, opts);
 
-	synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+	synproxy_send_tcp(net, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
 			  niph, nth, tcp_hdr_size);
 }
 
 static bool
-synproxy_recv_client_ack(const struct synproxy_net *snet,
+synproxy_recv_client_ack(struct net *net,
 			 const struct sk_buff *skb, const struct tcphdr *th,
 			 struct synproxy_options *opts, u32 recv_seq)
 {
+	struct synproxy_net *snet = synproxy_pernet(net);
 	int mss;
 
 	mss = __cookie_v6_check(ipv6_hdr(skb), th, ntohl(th->ack_seq) - 1);
@@ -267,7 +269,7 @@ synproxy_recv_client_ack(const struct synproxy_net *snet,
 	if (opts->options & XT_SYNPROXY_OPT_TIMESTAMP)
 		synproxy_check_timestamp_cookie(opts);
 
-	synproxy_send_server_syn(snet, skb, th, opts, recv_seq);
+	synproxy_send_server_syn(net, skb, th, opts, recv_seq);
 	return true;
 }
 
@@ -275,7 +277,8 @@ static unsigned int
 synproxy_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_synproxy_info *info = par->targinfo;
-	struct synproxy_net *snet = synproxy_pernet(par->net);
+	struct net *net = par->net;
+	struct synproxy_net *snet = synproxy_pernet(net);
 	struct synproxy_options opts = {};
 	struct tcphdr *th, _th;
 
@@ -304,12 +307,12 @@ synproxy_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 					  XT_SYNPROXY_OPT_SACK_PERM |
 					  XT_SYNPROXY_OPT_ECN);
 
-		synproxy_send_client_synack(snet, skb, th, &opts);
+		synproxy_send_client_synack(net, skb, th, &opts);
 		return NF_DROP;
 
 	} else if (th->ack && !(th->fin || th->rst || th->syn)) {
 		/* ACK from client */
-		synproxy_recv_client_ack(snet, skb, th, &opts, ntohl(th->seq));
+		synproxy_recv_client_ack(net, skb, th, &opts, ntohl(th->seq));
 		return NF_DROP;
 	}
 
@@ -320,7 +323,8 @@ static unsigned int ipv6_synproxy_hook(void *priv,
 				       struct sk_buff *skb,
 				       const struct nf_hook_state *nhs)
 {
-	struct synproxy_net *snet = synproxy_pernet(nhs->net);
+	struct net *net = nhs->net;
+	struct synproxy_net *snet = synproxy_pernet(net);
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct;
 	struct nf_conn_synproxy *synproxy;
@@ -384,7 +388,7 @@ static unsigned int ipv6_synproxy_hook(void *priv,
 			 * therefore we need to add 1 to make the SYN sequence
 			 * number match the one of first SYN.
 			 */
-			if (synproxy_recv_client_ack(snet, skb, th, &opts,
+			if (synproxy_recv_client_ack(net, skb, th, &opts,
 						     ntohl(th->seq) + 1))
 				this_cpu_inc(snet->stats->cookie_retrans);
 
@@ -410,12 +414,12 @@ static unsigned int ipv6_synproxy_hook(void *priv,
 				  XT_SYNPROXY_OPT_SACK_PERM);
 
 		swap(opts.tsval, opts.tsecr);
-		synproxy_send_server_ack(snet, state, skb, th, &opts);
+		synproxy_send_server_ack(net, state, skb, th, &opts);
 
 		nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq));
 
 		swap(opts.tsval, opts.tsecr);
-		synproxy_send_client_ack(snet, skb, th, &opts);
+		synproxy_send_client_ack(net, skb, th, &opts);
 
 		consume_skb(skb);
 		return NF_STOLEN;
-- 
2.1.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox