* Re: [PATCH] net: ehea: Mark expected switch fall-through
From: Kees Cook @ 2019-07-29 16:38 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Douglas Miller, David S. Miller, netdev, linux-kernel,
Stephen Rothwell
In-Reply-To: <20190729003009.GA25425@embeddedor>
On Sun, Jul 28, 2019 at 07:30:09PM -0500, Gustavo A. R. Silva wrote:
> Mark switch cases where we are expecting to fall through.
>
> This patch fixes the following warning:
>
> drivers/net/ethernet/ibm/ehea/ehea_main.c: In function 'ehea_mem_notifier':
> include/linux/printk.h:311:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
> printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/ethernet/ibm/ehea/ehea_main.c:3253:3: note: in expansion of macro 'pr_info'
> pr_info("memory offlining canceled");
> ^~~~~~~
> drivers/net/ethernet/ibm/ehea/ehea_main.c:3256:2: note: here
> case MEM_ONLINE:
> ^~~~
>
> Notice that, in this particular case, the code comment is
> modified in accordance with what GCC is expecting to find.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
-Kees
> ---
> drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> index 4138a8480347..cca71ba7a74a 100644
> --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> @@ -3251,7 +3251,7 @@ static int ehea_mem_notifier(struct notifier_block *nb,
> switch (action) {
> case MEM_CANCEL_OFFLINE:
> pr_info("memory offlining canceled");
> - /* Fall through: re-add canceled memory block */
> + /* Fall through - re-add canceled memory block */
>
> case MEM_ONLINE:
> pr_info("memory is going online");
> --
> 2.22.0
>
--
Kees Cook
^ permalink raw reply
* Re: [PATCH] arcnet: com90xx: Mark expected switch fall-throughs
From: David Miller @ 2019-07-29 16:38 UTC (permalink / raw)
To: gustavo; +Cc: m.grzeschik, netdev, linux-kernel, sfr, keescook
In-Reply-To: <20190729110953.GA3048@embeddedor>
From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Mon, 29 Jul 2019 06:09:53 -0500
> Mark switch cases where we are expecting to fall through.
>
> This patch fixes the following warnings (Building: powerpc allyesconfig):
...
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Applied.
^ permalink raw reply
* Re: [PATCH] arcnet: com90io: Mark expected switch fall-throughs
From: David Miller @ 2019-07-29 16:39 UTC (permalink / raw)
To: gustavo; +Cc: m.grzeschik, netdev, linux-kernel, sfr, keescook
In-Reply-To: <20190729111320.GA3193@embeddedor>
From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Mon, 29 Jul 2019 06:13:20 -0500
> Mark switch cases where we are expecting to fall through.
>
> This patch fixes the following warnings (Building: powerpc allyesconfig):
...
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: spider_net: Mark expected switch fall-through
From: Kees Cook @ 2019-07-29 16:39 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Ishizaki Kou, David S. Miller, netdev, linux-kernel,
Stephen Rothwell
In-Reply-To: <20190729003251.GA25556@embeddedor>
On Sun, Jul 28, 2019 at 07:32:51PM -0500, Gustavo A. R. Silva wrote:
> Mark switch cases where we are expecting to fall through.
>
> This patch fixes the following warning:
>
> drivers/net/ethernet/toshiba/spider_net.c: In function 'spider_net_release_tx_chain':
> drivers/net/ethernet/toshiba/spider_net.c:783:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
> if (!brutal) {
> ^
> drivers/net/ethernet/toshiba/spider_net.c:792:3: note: here
> case SPIDER_NET_DESCR_RESPONSE_ERROR:
> ^~~~
>
> Notice that, in this particular case, the code comment is
> modified in accordance with what GCC is expecting to find.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
-Kees
> ---
> drivers/net/ethernet/toshiba/spider_net.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/toshiba/spider_net.c b/drivers/net/ethernet/toshiba/spider_net.c
> index 5b196ebfed49..0f346761a2b2 100644
> --- a/drivers/net/ethernet/toshiba/spider_net.c
> +++ b/drivers/net/ethernet/toshiba/spider_net.c
> @@ -788,6 +788,7 @@ spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
> /* fallthrough, if we release the descriptors
> * brutally (then we don't care about
> * SPIDER_NET_DESCR_CARDOWNED) */
> + /* Fall through */
>
> case SPIDER_NET_DESCR_RESPONSE_ERROR:
> case SPIDER_NET_DESCR_PROTECTION_ERROR:
> --
> 2.22.0
>
--
Kees Cook
^ permalink raw reply
* Re: [PATCH] arcnet: arc-rimi: Mark expected switch fall-throughs
From: David Miller @ 2019-07-29 16:39 UTC (permalink / raw)
To: gustavo; +Cc: m.grzeschik, netdev, linux-kernel, sfr, keescook
In-Reply-To: <20190729111550.GA3327@embeddedor>
From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Mon, 29 Jul 2019 06:15:50 -0500
> Mark switch cases where we are expecting to fall through.
>
> This patch fixes the following warnings (Building: powerpc allyesconfig):
...
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] MAINTAINERS: Remove mailing-list entry for XDP (eXpress Data Path)
From: David Miller @ 2019-07-29 16:39 UTC (permalink / raw)
To: brouer
Cc: xdp-newbies, netdev, bpf, ast, daniel, jakub.kicinski,
john.fastabend
In-Reply-To: <156440259790.6123.1563221733550893420.stgit@carbon>
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Mon, 29 Jul 2019 14:16:37 +0200
> This removes the mailing list xdp-newbies@vger.kernel.org from the XDP
> kernel maintainers entry.
>
> Being in the kernel MAINTAINERS file successfully caused the list to
> receive kbuild bot warnings, syzbot reports and sometimes developer
> patches. The level of details in these messages, doesn't match the
> target audience of the XDP-newbies list. This is based on a survey on
> the mailing list, where 73% voted for removal from MAINTAINERS file.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
I'll apply this to 'net', thanks Jesper.
^ permalink raw reply
* Re: [PATCH] net/af_iucv: mark expected switch fall-throughs
From: Kees Cook @ 2019-07-29 16:39 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Julian Wiedmann, Ursula Braun, David S. Miller, linux-s390,
netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190729145947.GA9494@embeddedor>
On Mon, Jul 29, 2019 at 09:59:47AM -0500, Gustavo A. R. Silva wrote:
> Mark switch cases where we are expecting to fall through.
>
> This patch fixes the following warnings:
>
> net/iucv/af_iucv.c: warning: this statement may fall
> through [-Wimplicit-fallthrough=]: => 537:3, 519:6, 2246:6, 510:6
>
> Notice that, in this particular case, the code comment is
> modified in accordance with what GCC is expecting to find.
>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
-Kees
> ---
> net/iucv/af_iucv.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
> index 09e1694b6d34..ebb62a4ebe30 100644
> --- a/net/iucv/af_iucv.c
> +++ b/net/iucv/af_iucv.c
> @@ -512,7 +512,9 @@ static void iucv_sock_close(struct sock *sk)
> sk->sk_state = IUCV_DISCONN;
> sk->sk_state_change(sk);
> }
> - case IUCV_DISCONN: /* fall through */
> + /* fall through */
> +
> + case IUCV_DISCONN:
> sk->sk_state = IUCV_CLOSING;
> sk->sk_state_change(sk);
>
> @@ -525,8 +527,9 @@ static void iucv_sock_close(struct sock *sk)
> iucv_sock_in_state(sk, IUCV_CLOSED, 0),
> timeo);
> }
> + /* fall through */
>
> - case IUCV_CLOSING: /* fall through */
> + case IUCV_CLOSING:
> sk->sk_state = IUCV_CLOSED;
> sk->sk_state_change(sk);
>
> @@ -535,8 +538,9 @@ static void iucv_sock_close(struct sock *sk)
>
> skb_queue_purge(&iucv->send_skb_q);
> skb_queue_purge(&iucv->backlog_skb_q);
> + /* fall through */
>
> - default: /* fall through */
> + default:
> iucv_sever_path(sk, 1);
> }
>
> @@ -2247,10 +2251,10 @@ static int afiucv_hs_rcv(struct sk_buff *skb, struct net_device *dev,
> kfree_skb(skb);
> break;
> }
> - /* fall through and receive non-zero length data */
> + /* fall through - and receive non-zero length data */
> case (AF_IUCV_FLAG_SHT):
> /* shutdown request */
> - /* fall through and receive zero length data */
> + /* fall through - and receive zero length data */
> case 0:
> /* plain data frame */
> IUCV_SKB_CB(skb)->class = trans_hdr->iucv_hdr.class;
> --
> 2.22.0
>
--
Kees Cook
^ permalink raw reply
* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Stefano Garzarella @ 2019-07-29 16:41 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190729115904-mutt-send-email-mst@kernel.org>
On Mon, Jul 29, 2019 at 12:01:37PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > In order to do more precise accounting maybe we can use the buffer size,
> > instead of payload size when we update the credit available.
> > In this way, the credit available for each socket will reflect the memory
> > actually used.
> >
> > I should check better, because I'm not sure what happen if the peer sees
> > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > buffer).
> > The other option is to copy each packet in a new buffer like I did in
> > the v2 [2], but this forces us to make a copy for each packet that does
> > not fill the entire buffer, perhaps too expensive.
> >
> > [2] https://patchwork.kernel.org/patch/10938741/
> >
>
> So one thing we can easily do is to under-report the
> available credit. E.g. if we copy up to 256bytes,
> then report just 256bytes for every buffer in the queue.
>
Ehm sorry, I got lost :(
Can you explain better?
Thanks,
Stefano
^ permalink raw reply
* Re: [PATCH] isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
From: David Miller @ 2019-07-29 16:46 UTC (permalink / raw)
To: juliana.rodrigueiro; +Cc: isdn, netdev
In-Reply-To: <2635856.so0i2TFZOM@rocinante.m.i2n>
From: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Date: Mon, 29 Jul 2019 10:20:56 +0200
> @@ -1705,12 +1705,22 @@ static int
> setup_hfcsusb(struct hfcsusb *hw)
> {
> u_char b;
> + int ret;
> + void *dmabuf = kmalloc(sizeof(u_char), GFP_KERNEL);
Please order these local variable declarations from longest to shortest line.
Thank you.
^ permalink raw reply
* Re: [PATCH v3] net: sched: Fix a possible null-pointer dereference in dequeue_func()
From: David Miller @ 2019-07-29 16:47 UTC (permalink / raw)
To: baijiaju1990; +Cc: jhs, xiyou.wangcong, jiri, netdev, linux-kernel
In-Reply-To: <20190729082433.28981-1-baijiaju1990@gmail.com>
From: Jia-Ju Bai <baijiaju1990@gmail.com>
Date: Mon, 29 Jul 2019 16:24:33 +0800
> In dequeue_func(), there is an if statement on line 74 to check whether
> skb is NULL:
> if (skb)
>
> When skb is NULL, it is used on line 77:
> prefetch(&skb->end);
>
> Thus, a possible null-pointer dereference may occur.
>
> To fix this bug, skb->end is used when skb is not NULL.
>
> This bug is found by a static analysis tool STCheck written by us.
>
> Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH] net: ag71xx: use resource_size for the ioremap size
From: David Miller @ 2019-07-29 16:48 UTC (permalink / raw)
To: dingxiang; +Cc: jcliburn, chris.snook, netdev, linux-kernel
In-Reply-To: <1564390882-28002-1-git-send-email-dingxiang@cmss.chinamobile.com>
From: Ding Xiang <dingxiang@cmss.chinamobile.com>
Date: Mon, 29 Jul 2019 17:01:22 +0800
> use resource_size to calcuate ioremap size and make
> the code simpler.
>
> Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH net v2] net: bridge: delete local fdb on device init failure
From: David Miller @ 2019-07-29 16:50 UTC (permalink / raw)
To: nikolay; +Cc: netdev, roopa, bridge, syzbot+88533dc8b582309bf3ee
In-Reply-To: <20190729092841.4260-1-nikolay@cumulusnetworks.com>
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Mon, 29 Jul 2019 12:28:41 +0300
> On initialization failure we have to delete the local fdb which was
> inserted due to the default pvid creation. This problem has been present
> since the inception of default_pvid. Note that currently there are 2 cases:
> 1) in br_dev_init() when br_multicast_init() fails
> 2) if register_netdevice() fails after calling ndo_init()
>
> This patch takes care of both since br_vlan_flush() is called on both
> occasions. Also the new fdb delete would be a no-op on normal bridge
> device destruction since the local fdb would've been already flushed by
> br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
> called last when adding a port thus nothing can fail after it.
>
> Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> v2: reworded the commit message and comment so they're not plural, we're
> talking about a single bridge local fdb added on the init vlan creation
> of the default pvid
>
> Tested with the provided reproducer and can no longer trigger the leak.
> Also tested the br_multicast_init() failure manually by making it always
> return an error.
Applied and queued up for -stable, thank you.
^ permalink raw reply
* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Stefano Garzarella @ 2019-07-29 16:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190729161903.yhaj5rfcvleexkhc@steredhat>
On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > In order to do more precise accounting maybe we can use the buffer size,
> > > instead of payload size when we update the credit available.
> > > In this way, the credit available for each socket will reflect the memory
> > > actually used.
> > >
> > > I should check better, because I'm not sure what happen if the peer sees
> > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > buffer).
> > >
> > > The other option is to copy each packet in a new buffer like I did in
> > > the v2 [2], but this forces us to make a copy for each packet that does
> > > not fill the entire buffer, perhaps too expensive.
> > >
> > > [2] https://patchwork.kernel.org/patch/10938741/
> > >
> > >
> > > Thanks,
> > > Stefano
> >
> > Interesting. You are right, and at some level the protocol forces copies.
> >
> > We could try to detect that the actual memory is getting close to
> > admin limits and force copies on queued packets after the fact.
> > Is that practical?
>
> Yes, I think it is doable!
> We can decrease the credit available with the buffer size queued, and
> when the buffer size of packet to queue is bigger than the credit
> available, we can copy it.
>
> >
> > And yes we can extend the credit accounting to include buffer size.
> > That's a protocol change but maybe it makes sense.
>
> Since we send to the other peer the credit available, maybe this
> change can be backwards compatible (I'll check better this).
What I said was wrong.
We send a counter (increased when the user consumes the packets) and the
"buf_alloc" (the max memory allowed) to the other peer.
It makes a difference between a local counter (increased when the
packets are sent) and the remote counter to calculate the credit available:
u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
{
u32 ret;
spin_lock_bh(&vvs->tx_lock);
ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
if (ret > credit)
ret = credit;
vvs->tx_cnt += ret;
spin_unlock_bh(&vvs->tx_lock);
return ret;
}
Maybe I can play with "buf_alloc" to take care of bytes queued but not
used.
Thanks,
Stefano
^ permalink raw reply
* Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
From: Jakub Kicinski @ 2019-07-29 16:51 UTC (permalink / raw)
To: wenxu; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <c405ff42-dfc5-6f3f-061c-7788e1204afa@ucloud.cn>
On Mon, 29 Jul 2019 15:05:34 +0800, wenxu wrote:
> On 7/29/2019 12:42 PM, Jakub Kicinski wrote:
> > On Mon, 29 Jul 2019 10:43:56 +0800, wenxu wrote:
> >> On 7/29/2019 4:16 AM, Jakub Kicinski wrote:
> >>> I don't know the nft code, but it seems unlikely it wouldn't have the
> >>> same problem/need..
> >> nft don't have the same problem. The offload rule can only attached
> >> to offload base chain.
> >>
> >> Th offload base chain is created after the device driver loaded (the
> >> device exist).
> > For indirect blocks the block is on the tunnel device and the offload
> > target is another device. E.g. you offload rules from a VXLAN device
> > onto the ASIC. The ASICs driver does not have to be loaded when VXLAN
> > device is created.
> >
> > So I feel like either the chain somehow directly references the offload
> > target (in which case the indirect infrastructure with hash lookup etc
> > is not needed for nft), or indirect infra is needed, and we need to take
> > care of replays.
>
> So you mean the case is there are two card A and B both can offload vxlan.
>
> First vxlan device offload with A. And then the B driver loaded, So the rules
> should replay to B device?
That'd be one example, yes.
^ permalink raw reply
* Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
From: Jakub Kicinski @ 2019-07-29 16:55 UTC (permalink / raw)
To: wenxu; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <449a5603-80e9-ad7d-5c02-bf57558f9603@ucloud.cn>
On Mon, 29 Jul 2019 15:18:03 +0800, wenxu wrote:
> On 7/29/2019 12:42 PM, Jakub Kicinski wrote:
> > On Mon, 29 Jul 2019 10:43:56 +0800, wenxu wrote:
> >> On 7/29/2019 4:16 AM, Jakub Kicinski wrote:
> >>> I don't know the nft code, but it seems unlikely it wouldn't have the
> >>> same problem/need..
> >> nft don't have the same problem. The offload rule can only attached
> >> to offload base chain.
> >>
> >> Th offload base chain is created after the device driver loaded (the
> >> device exist).
> > For indirect blocks the block is on the tunnel device and the offload
> > target is another device. E.g. you offload rules from a VXLAN device
> > onto the ASIC. The ASICs driver does not have to be loaded when VXLAN
> > device is created.
> >
> > So I feel like either the chain somehow directly references the offload
> > target (in which case the indirect infrastructure with hash lookup etc
> > is not needed for nft), or indirect infra is needed, and we need to take
> > care of replays.
>
> I think the nft is different with tc.
>
> In tc case we can create vxlan device add a ingress qdisc with a block success
>
> Then the ASIC driver loaded, then register the vxlan indr-dev and get the block
> adn replay it to hardware
>
> But in the nft case, The base chain flags with offload. Create an offload netdev
> base chain on vxlan device will fail if there is no indr-device to offload.
Can you show us the offload chain spec? Does it specify offload to the
vxlan device or the ASIC device?
Indir-devs can come and go, how do you handle a situation where offload
chain was installed with indir listener present, but then the ASIC
driver got removed?
^ permalink raw reply
* [bpf-next,v2 0/6] Introduce a BPF helper to generate SYN cookies
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
From: Petar Penkov <ppenkov@google.com>
This patch series introduces a BPF helper function that allows generating SYN
cookies from BPF. Currently, this helper is enabled at both the TC hook and the
XDP hook.
The first two patches in the series add/modify several TCP helper functions to
allow for SKB-less operation, as is the case at the XDP hook.
The third patch introduces the bpf_tcp_gen_syncookie helper function which
generates a SYN cookie for either XDP or TC programs. The return value of
this function contains both the MSS value, encoded in the cookie, and the
cookie itself.
The last three patches sync tools/ and add a test.
Performance evaluation:
I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from
random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being
sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they
carry no options. If I attached a simple program that checks if a packet is
IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after
swapping src/dest, recomputing the checksum, and setting the ACK flag, I
observed 10Mpps being sent back out.
Changes since v1:
1/ Added performance numbers to the cover letter
2/ Patch 2: Refactored a bit to fix compilation issues
3/ Patch 3: Changed ENOTSUPP to EOPNOTSUPP at Toke's suggestion
Changes since RFC:
1/ Cookie is returned in host order at Alexei's suggestion
2/ If cookies are not enabled via a sysctl, the helper function returns
-ENOENT instead of -EINVAL at Lorenz's suggestion
3/ Fixed documentation to properly reflect that MSS is 16 bits at
Lorenz's suggestion
4/ BPF helper requires TCP length to match ->doff field, rather than to simply
be no more than 20 bytes at Eric and Alexei's suggestion
5/ Packet type is looked up from the packet version field, rather than from the
socket. v4 packets are rejected on v6-only sockets but should work with
dual stack listeners at Eric's suggestion
6/ Removed unnecessary `net` argument from helper function in patch 2 at
Lorenz's suggestion
7/ Changed test to only pass MSS option so we can convince the verifier that the
memory access is not out of bounds
Note that 7/ below illustrates the verifier might need to be extended to allow
passing a variable tcph->doff to the helper function like below:
__u32 thlen = tcph->doff * 4;
if (thlen < sizeof(*tcph))
return;
__s64 cookie = bpf_tcp_gen_syncookie(sk, ipv4h, 20, tcph, thlen);
Petar Penkov (6):
tcp: tcp_syn_flood_action read port from socket
tcp: add skb-less helpers to retrieve SYN cookie
bpf: add bpf_tcp_gen_syncookie helper
bpf: sync bpf.h to tools/
selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
selftests/bpf: add test for bpf_tcp_gen_syncookie
include/net/tcp.h | 10 +++
include/uapi/linux/bpf.h | 30 ++++++-
net/core/filter.c | 73 +++++++++++++++++
net/ipv4/tcp_input.c | 81 +++++++++++++++++--
net/ipv4/tcp_ipv4.c | 15 ++++
net/ipv6/tcp_ipv6.c | 15 ++++
tools/include/uapi/linux/bpf.h | 37 ++++++++-
tools/testing/selftests/bpf/bpf_helpers.h | 3 +
.../bpf/progs/test_tcp_check_syncookie_kern.c | 48 +++++++++--
.../selftests/bpf/test_tcp_check_syncookie.sh | 3 +
.../bpf/test_tcp_check_syncookie_user.c | 61 ++++++++++++--
11 files changed, 354 insertions(+), 22 deletions(-)
--
2.22.0.709.g102302147b-goog
^ permalink raw reply
* [bpf-next,v2 1/6] tcp: tcp_syn_flood_action read port from socket
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>
From: Petar Penkov <ppenkov@google.com>
This allows us to call this function before an SKB has been
allocated.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
---
net/ipv4/tcp_input.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a22fb3b..8892df6de1d4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6422,9 +6422,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
/*
* Return true if a syncookie should be sent
*/
-static bool tcp_syn_flood_action(const struct sock *sk,
- const struct sk_buff *skb,
- const char *proto)
+static bool tcp_syn_flood_action(const struct sock *sk, const char *proto)
{
struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
const char *msg = "Dropping request";
@@ -6444,7 +6442,7 @@ static bool tcp_syn_flood_action(const struct sock *sk,
net->ipv4.sysctl_tcp_syncookies != 2 &&
xchg(&queue->synflood_warned, 1) == 0)
net_info_ratelimited("%s: Possible SYN flooding on port %d. %s. Check SNMP counters.\n",
- proto, ntohs(tcp_hdr(skb)->dest), msg);
+ proto, sk->sk_num, msg);
return want_cookie;
}
@@ -6487,7 +6485,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
*/
if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
inet_csk_reqsk_queue_is_full(sk)) && !isn) {
- want_cookie = tcp_syn_flood_action(sk, skb, rsk_ops->slab_name);
+ want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
if (!want_cookie)
goto drop;
}
--
2.22.0.709.g102302147b-goog
^ permalink raw reply related
* [bpf-next,v2 2/6] tcp: add skb-less helpers to retrieve SYN cookie
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>
From: Petar Penkov <ppenkov@google.com>
This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
---
include/net/tcp.h | 10 ++++++
net/ipv4/tcp_input.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/tcp_ipv4.c | 15 +++++++++
net/ipv6/tcp_ipv6.c | 15 +++++++++
4 files changed, 113 insertions(+)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e5cf514ba118..fb7e153aecc5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -414,6 +414,16 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
int estab, struct tcp_fastopen_cookie *foc);
const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
+/*
+ * BPF SKB-less helpers
+ */
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+ struct tcphdr *th, u32 *cookie);
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+ struct tcphdr *th, u32 *cookie);
+u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops,
+ const struct tcp_request_sock_ops *af_ops,
+ struct sock *sk, struct tcphdr *th);
/*
* TCP v4 functions exported for the inet6 API
*/
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8892df6de1d4..706cbb3b2986 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3782,6 +3782,49 @@ static void smc_parse_options(const struct tcphdr *th,
#endif
}
+/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
+ * value on success.
+ */
+static u16 tcp_parse_mss_option(const struct tcphdr *th, u16 user_mss)
+{
+ const unsigned char *ptr = (const unsigned char *)(th + 1);
+ int length = (th->doff * 4) - sizeof(struct tcphdr);
+ u16 mss = 0;
+
+ while (length > 0) {
+ int opcode = *ptr++;
+ int opsize;
+
+ switch (opcode) {
+ case TCPOPT_EOL:
+ return mss;
+ case TCPOPT_NOP: /* Ref: RFC 793 section 3.1 */
+ length--;
+ continue;
+ default:
+ if (length < 2)
+ return mss;
+ opsize = *ptr++;
+ if (opsize < 2) /* "silly options" */
+ return mss;
+ if (opsize > length)
+ return mss; /* fail on partial options */
+ if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
+ u16 in_mss = get_unaligned_be16(ptr);
+
+ if (in_mss) {
+ if (user_mss && user_mss < in_mss)
+ in_mss = user_mss;
+ mss = in_mss;
+ }
+ }
+ ptr += opsize - 2;
+ length -= opsize;
+ }
+ }
+ return mss;
+}
+
/* Look for tcp options. Normally only called on SYN and SYNACK packets.
* But, this can also be called on packets in the established flow when
* the fast version below fails.
@@ -6464,6 +6507,36 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
}
}
+/* If a SYN cookie is required and supported, returns a clamped MSS value to be
+ * used for SYN cookie generation.
+ */
+u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops,
+ const struct tcp_request_sock_ops *af_ops,
+ struct sock *sk, struct tcphdr *th)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ u16 mss;
+
+ if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
+ !inet_csk_reqsk_queue_is_full(sk))
+ return 0;
+
+ if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
+ return 0;
+
+ if (sk_acceptq_is_full(sk)) {
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+ return 0;
+ }
+
+ mss = tcp_parse_mss_option(th, tp->rx_opt.user_mss);
+ if (!mss)
+ mss = af_ops->mss_clamp;
+
+ return mss;
+}
+EXPORT_SYMBOL_GPL(tcp_get_syncookie_mss);
+
int tcp_conn_request(struct request_sock_ops *rsk_ops,
const struct tcp_request_sock_ops *af_ops,
struct sock *sk, struct sk_buff *skb)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d57641cb3477..10217393cda6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1515,6 +1515,21 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
return sk;
}
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+ struct tcphdr *th, u32 *cookie)
+{
+ u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+ mss = tcp_get_syncookie_mss(&tcp_request_sock_ops,
+ &tcp_request_sock_ipv4_ops, sk, th);
+ if (mss) {
+ *cookie = __cookie_v4_init_sequence(iph, th, &mss);
+ tcp_synq_overflow(sk);
+ }
+#endif
+ return mss;
+}
+
/* The socket must have it's spinlock held when we get
* here, unless it is a TCP_LISTEN socket.
*
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5da069e91cac..87f44d3250ee 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1063,6 +1063,21 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
return sk;
}
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+ struct tcphdr *th, u32 *cookie)
+{
+ u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+ mss = tcp_get_syncookie_mss(&tcp6_request_sock_ops,
+ &tcp_request_sock_ipv6_ops, sk, th);
+ if (mss) {
+ *cookie = __cookie_v6_init_sequence(iph, th, &mss);
+ tcp_synq_overflow(sk);
+ }
+#endif
+ return mss;
+}
+
static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
{
if (skb->protocol == htons(ETH_P_IP))
--
2.22.0.709.g102302147b-goog
^ permalink raw reply related
* [bpf-next,v2 5/6] selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>
From: Petar Penkov <ppenkov@google.com>
Expose bpf_tcp_gen_syncookie to selftests.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
---
tools/testing/selftests/bpf/bpf_helpers.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index f804f210244e..120aa86c58d3 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -228,6 +228,9 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
(void *)BPF_FUNC_sk_storage_delete;
static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
+static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
+ int ip_len, void *tcp, int tcp_len) =
+ (void *) BPF_FUNC_tcp_gen_syncookie;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
--
2.22.0.709.g102302147b-goog
^ permalink raw reply related
* [bpf-next,v2 4/6] bpf: sync bpf.h to tools/
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>
From: Petar Penkov <ppenkov@google.com>
Sync updated documentation for bpf_redirect_map.
Sync the bpf_tcp_gen_syncookie helper function definition with the one
in tools/uapi.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
---
tools/include/uapi/linux/bpf.h | 37 +++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3d7fc67ec1b8..5a54f1011db8 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1571,8 +1571,11 @@ union bpf_attr {
* but this is only implemented for native XDP (with driver
* support) as of this writing).
*
- * All values for *flags* are reserved for future usage, and must
- * be left at zero.
+ * The lower two bits of *flags* are used as the return code if
+ * the map lookup fails. This is so that the return value can be
+ * one of the XDP program return codes up to XDP_TX, as chosen by
+ * the caller. Any higher bits in the *flags* argument must be
+ * unset.
*
* When used to redirect packets to net devices, this helper
* provides a high performance increase over **bpf_redirect**\ ().
@@ -2710,6 +2713,33 @@ union bpf_attr {
* **-EPERM** if no permission to send the *sig*.
*
* **-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ * Description
+ * Try to issue a SYN cookie for the packet with corresponding
+ * IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ * *iph* points to the start of the IPv4 or IPv6 header, while
+ * *iph_len* contains **sizeof**\ (**struct iphdr**) or
+ * **sizeof**\ (**struct ip6hdr**).
+ *
+ * *th* points to the start of the TCP header, while *th_len*
+ * contains the length of the TCP header.
+ *
+ * Return
+ * On success, lower 32 bits hold the generated SYN cookie in
+ * followed by 16 bits which hold the MSS value for that cookie,
+ * and the top 16 bits are unused.
+ *
+ * On failure, the returned value is one of the following:
+ *
+ * **-EINVAL** SYN cookie cannot be issued due to error
+ *
+ * **-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2821,7 +2851,8 @@ union bpf_attr {
FN(strtoul), \
FN(sk_storage_get), \
FN(sk_storage_delete), \
- FN(send_signal),
+ FN(send_signal), \
+ FN(tcp_gen_syncookie),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
--
2.22.0.709.g102302147b-goog
^ permalink raw reply related
* [bpf-next,v2 6/6] selftests/bpf: add test for bpf_tcp_gen_syncookie
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>
From: Petar Penkov <ppenkov@google.com>
Modify the existing bpf_tcp_check_syncookie test to also generate a
SYN cookie, pass the packet to the kernel, and verify that the two
cookies are the same (and both valid). Since cloned SKBs are skipped
during generic XDP, this test does not issue a SYN cookie when run in
XDP mode. We therefore only check that a valid SYN cookie was issued at
the TC hook.
Additionally, verify that the MSS for that SYN cookie is within
expected range.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
---
.../bpf/progs/test_tcp_check_syncookie_kern.c | 48 +++++++++++++--
.../selftests/bpf/test_tcp_check_syncookie.sh | 3 +
.../bpf/test_tcp_check_syncookie_user.c | 61 ++++++++++++++++---
3 files changed, 99 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
index 1ab095bcacd8..d8803dfa8d32 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
@@ -19,10 +19,29 @@
struct bpf_map_def SEC("maps") results = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
- .value_size = sizeof(__u64),
- .max_entries = 1,
+ .value_size = sizeof(__u32),
+ .max_entries = 3,
};
+static __always_inline __s64 gen_syncookie(void *data_end, struct bpf_sock *sk,
+ void *iph, __u32 ip_size,
+ struct tcphdr *tcph)
+{
+ __u32 thlen = tcph->doff * 4;
+
+ if (tcph->syn && !tcph->ack) {
+ // packet should only have an MSS option
+ if (thlen != 24)
+ return 0;
+
+ if ((void *)tcph + thlen > data_end)
+ return 0;
+
+ return bpf_tcp_gen_syncookie(sk, iph, ip_size, tcph, thlen);
+ }
+ return 0;
+}
+
static __always_inline void check_syncookie(void *ctx, void *data,
void *data_end)
{
@@ -33,8 +52,10 @@ static __always_inline void check_syncookie(void *ctx, void *data,
struct ipv6hdr *ipv6h;
struct tcphdr *tcph;
int ret;
+ __u32 key_mss = 2;
+ __u32 key_gen = 1;
__u32 key = 0;
- __u64 value = 1;
+ __s64 seq_mss;
ethh = data;
if (ethh + 1 > data_end)
@@ -66,6 +87,9 @@ static __always_inline void check_syncookie(void *ctx, void *data,
if (sk->state != BPF_TCP_LISTEN)
goto release;
+ seq_mss = gen_syncookie(data_end, sk, ipv4h, sizeof(*ipv4h),
+ tcph);
+
ret = bpf_tcp_check_syncookie(sk, ipv4h, sizeof(*ipv4h),
tcph, sizeof(*tcph));
break;
@@ -95,6 +119,9 @@ static __always_inline void check_syncookie(void *ctx, void *data,
if (sk->state != BPF_TCP_LISTEN)
goto release;
+ seq_mss = gen_syncookie(data_end, sk, ipv6h, sizeof(*ipv6h),
+ tcph);
+
ret = bpf_tcp_check_syncookie(sk, ipv6h, sizeof(*ipv6h),
tcph, sizeof(*tcph));
break;
@@ -103,8 +130,19 @@ static __always_inline void check_syncookie(void *ctx, void *data,
return;
}
- if (ret == 0)
- bpf_map_update_elem(&results, &key, &value, 0);
+ if (seq_mss > 0) {
+ __u32 cookie = (__u32)seq_mss;
+ __u32 mss = seq_mss >> 32;
+
+ bpf_map_update_elem(&results, &key_gen, &cookie, 0);
+ bpf_map_update_elem(&results, &key_mss, &mss, 0);
+ }
+
+ if (ret == 0) {
+ __u32 cookie = bpf_ntohl(tcph->ack_seq) - 1;
+
+ bpf_map_update_elem(&results, &key, &cookie, 0);
+ }
release:
bpf_sk_release(sk);
diff --git a/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh b/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh
index d48e51716d19..9b3617d770a5 100755
--- a/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh
+++ b/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh
@@ -37,6 +37,9 @@ setup()
ns1_exec ip link set lo up
ns1_exec sysctl -w net.ipv4.tcp_syncookies=2
+ ns1_exec sysctl -w net.ipv4.tcp_window_scaling=0
+ ns1_exec sysctl -w net.ipv4.tcp_timestamps=0
+ ns1_exec sysctl -w net.ipv4.tcp_sack=0
wait_for_ip 127.0.0.1
wait_for_ip ::1
diff --git a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
index 87829c86c746..b9e991d43155 100644
--- a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
+++ b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
@@ -2,6 +2,7 @@
// Copyright (c) 2018 Facebook
// Copyright (c) 2019 Cloudflare
+#include <limits.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
@@ -77,7 +78,7 @@ static int connect_to_server(int server_fd)
return fd;
}
-static int get_map_fd_by_prog_id(int prog_id)
+static int get_map_fd_by_prog_id(int prog_id, bool *xdp)
{
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
@@ -104,6 +105,8 @@ static int get_map_fd_by_prog_id(int prog_id)
goto err;
}
+ *xdp = info.type == BPF_PROG_TYPE_XDP;
+
map_fd = bpf_map_get_fd_by_id(map_ids[0]);
if (map_fd < 0)
log_err("Failed to get fd by map id %d", map_ids[0]);
@@ -113,18 +116,32 @@ static int get_map_fd_by_prog_id(int prog_id)
return map_fd;
}
-static int run_test(int server_fd, int results_fd)
+static int run_test(int server_fd, int results_fd, bool xdp)
{
int client = -1, srv_client = -1;
int ret = 0;
__u32 key = 0;
- __u64 value = 0;
+ __u32 key_gen = 1;
+ __u32 key_mss = 2;
+ __u32 value = 0;
+ __u32 value_gen = 0;
+ __u32 value_mss = 0;
if (bpf_map_update_elem(results_fd, &key, &value, 0) < 0) {
log_err("Can't clear results");
goto err;
}
+ if (bpf_map_update_elem(results_fd, &key_gen, &value_gen, 0) < 0) {
+ log_err("Can't clear results");
+ goto err;
+ }
+
+ if (bpf_map_update_elem(results_fd, &key_mss, &value_mss, 0) < 0) {
+ log_err("Can't clear results");
+ goto err;
+ }
+
client = connect_to_server(server_fd);
if (client == -1)
goto err;
@@ -140,8 +157,35 @@ static int run_test(int server_fd, int results_fd)
goto err;
}
- if (value != 1) {
- log_err("Didn't match syncookie: %llu", value);
+ if (value == 0) {
+ log_err("Didn't match syncookie: %u", value);
+ goto err;
+ }
+
+ if (bpf_map_lookup_elem(results_fd, &key_gen, &value_gen) < 0) {
+ log_err("Can't lookup result");
+ goto err;
+ }
+
+ if (xdp && value_gen == 0) {
+ // SYN packets do not get passed through generic XDP, skip the
+ // rest of the test.
+ printf("Skipping XDP cookie check\n");
+ goto out;
+ }
+
+ if (bpf_map_lookup_elem(results_fd, &key_mss, &value_mss) < 0) {
+ log_err("Can't lookup result");
+ goto err;
+ }
+
+ if (value != value_gen) {
+ log_err("BPF generated cookie does not match kernel one");
+ goto err;
+ }
+
+ if (value_mss < 536 || value_mss > USHRT_MAX) {
+ log_err("Unexpected MSS retrieved");
goto err;
}
@@ -163,13 +207,14 @@ int main(int argc, char **argv)
int server_v6 = -1;
int results = -1;
int err = 0;
+ bool xdp;
if (argc < 2) {
fprintf(stderr, "Usage: %s prog_id\n", argv[0]);
exit(1);
}
- results = get_map_fd_by_prog_id(atoi(argv[1]));
+ results = get_map_fd_by_prog_id(atoi(argv[1]), &xdp);
if (results < 0) {
log_err("Can't get map");
goto err;
@@ -194,10 +239,10 @@ int main(int argc, char **argv)
if (server_v6 == -1)
goto err;
- if (run_test(server, results))
+ if (run_test(server, results, xdp))
goto err;
- if (run_test(server_v6, results))
+ if (run_test(server_v6, results, xdp))
goto err;
printf("ok\n");
--
2.22.0.709.g102302147b-goog
^ permalink raw reply related
* [bpf-next,v2 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-29 16:59 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, toke, Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>
From: Petar Penkov <ppenkov@google.com>
This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
---
include/uapi/linux/bpf.h | 30 ++++++++++++++++-
net/core/filter.c | 73 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e985f07a98ed..5a54f1011db8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2713,6 +2713,33 @@ union bpf_attr {
* **-EPERM** if no permission to send the *sig*.
*
* **-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ * Description
+ * Try to issue a SYN cookie for the packet with corresponding
+ * IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ * *iph* points to the start of the IPv4 or IPv6 header, while
+ * *iph_len* contains **sizeof**\ (**struct iphdr**) or
+ * **sizeof**\ (**struct ip6hdr**).
+ *
+ * *th* points to the start of the TCP header, while *th_len*
+ * contains the length of the TCP header.
+ *
+ * Return
+ * On success, lower 32 bits hold the generated SYN cookie in
+ * followed by 16 bits which hold the MSS value for that cookie,
+ * and the top 16 bits are unused.
+ *
+ * On failure, the returned value is one of the following:
+ *
+ * **-EINVAL** SYN cookie cannot be issued due to error
+ *
+ * **-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2824,7 +2851,8 @@ union bpf_attr {
FN(strtoul), \
FN(sk_storage_get), \
FN(sk_storage_delete), \
- FN(send_signal),
+ FN(send_signal), \
+ FN(tcp_gen_syncookie),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 3961437ccc50..d7848d6944ff 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5850,6 +5850,75 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
.arg5_type = ARG_CONST_SIZE,
};
+BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
+ struct tcphdr *, th, u32, th_len)
+{
+#ifdef CONFIG_SYN_COOKIES
+ u32 cookie;
+ u16 mss;
+
+ if (unlikely(th_len < sizeof(*th) || th_len != th->doff * 4))
+ return -EINVAL;
+
+ if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
+ return -EINVAL;
+
+ if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
+ return -ENOENT;
+
+ if (!th->syn || th->ack || th->fin || th->rst)
+ return -EINVAL;
+
+ if (unlikely(iph_len < sizeof(struct iphdr)))
+ return -EINVAL;
+
+ /* Both struct iphdr and struct ipv6hdr have the version field at the
+ * same offset so we can cast to the shorter header (struct iphdr).
+ */
+ switch (((struct iphdr *)iph)->version) {
+ case 4:
+ if (sk->sk_family == AF_INET6 && sk->sk_ipv6only)
+ return -EINVAL;
+
+ mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
+ break;
+
+#if IS_BUILTIN(CONFIG_IPV6)
+ case 6:
+ if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+ return -EINVAL;
+
+ if (sk->sk_family != AF_INET6)
+ return -EINVAL;
+
+ mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
+ break;
+#endif /* CONFIG_IPV6 */
+
+ default:
+ return -EPROTONOSUPPORT;
+ }
+ if (mss <= 0)
+ return -ENOENT;
+
+ return cookie | ((u64)mss << 32);
+#else
+ return -EOPNOTSUPP;
+#endif /* CONFIG_SYN_COOKIES */
+}
+
+static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
+ .func = bpf_tcp_gen_syncookie,
+ .gpl_only = true, /* __cookie_v*_init_sequence() is GPL */
+ .pkt_access = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_SOCK_COMMON,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_PTR_TO_MEM,
+ .arg5_type = ARG_CONST_SIZE,
+};
+
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func)
@@ -6139,6 +6208,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_tcp_check_syncookie_proto;
case BPF_FUNC_skb_ecn_set_ce:
return &bpf_skb_ecn_set_ce_proto;
+ case BPF_FUNC_tcp_gen_syncookie:
+ return &bpf_tcp_gen_syncookie_proto;
#endif
default:
return bpf_base_func_proto(func_id);
@@ -6178,6 +6249,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_xdp_skc_lookup_tcp_proto;
case BPF_FUNC_tcp_check_syncookie:
return &bpf_tcp_check_syncookie_proto;
+ case BPF_FUNC_tcp_gen_syncookie:
+ return &bpf_tcp_gen_syncookie_proto;
#endif
default:
return bpf_base_func_proto(func_id);
--
2.22.0.709.g102302147b-goog
^ permalink raw reply related
* [PATCH 3/3 net-next] linux: Remove bvec page_offset, use bv_offset
From: Jonathan Lemon @ 2019-07-29 17:19 UTC (permalink / raw)
To: willy, davem; +Cc: kernel-team, netdev
In-Reply-To: <20190729171941.250569-1-jonathan.lemon@gmail.com>
Now that page_offset is referenced through accessors, remove
the union, and use bv_offset.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
include/linux/bvec.h | 5 +----
include/linux/skbuff.h | 10 +++++-----
2 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 7f2b2ea9399c..a032f01e928c 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -18,10 +18,7 @@
struct bio_vec {
struct page *bv_page;
unsigned int bv_len;
- union {
- __u32 page_offset;
- unsigned int bv_offset;
- };
+ unsigned int bv_offset;
};
struct bvec_iter {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7d94a78067ee..68334e56a697 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2078,7 +2078,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
* on page_is_pfmemalloc doing the right thing(tm).
*/
frag->bv_page = page;
- frag->page_offset = off;
+ frag->bv_offset = off;
skb_frag_size_set(frag, size);
page = compound_head(page);
@@ -2863,7 +2863,7 @@ static inline void skb_propagate_pfmemalloc(struct page *page,
*/
static inline unsigned int skb_frag_off(const skb_frag_t *frag)
{
- return frag->page_offset;
+ return frag->bv_offset;
}
/**
@@ -2873,7 +2873,7 @@ static inline unsigned int skb_frag_off(const skb_frag_t *frag)
*/
static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
{
- frag->page_offset += delta;
+ frag->bv_offset += delta;
}
/**
@@ -2883,7 +2883,7 @@ static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
*/
static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset)
{
- frag->page_offset = offset;
+ frag->bv_offset = offset;
}
/**
@@ -2894,7 +2894,7 @@ static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset)
static inline void skb_frag_off_set_from(skb_frag_t *fragto,
const skb_frag_t *fragfrom)
{
- fragto->page_offset = fragfrom->page_offset;
+ fragto->bv_offset = fragfrom->bv_offset;
}
/**
--
2.17.1
^ permalink raw reply related
* [PATCH 0/3 net-next] Finish conversion of skb_frag_t to bio_vec
From: Jonathan Lemon @ 2019-07-29 17:19 UTC (permalink / raw)
To: willy, davem; +Cc: kernel-team, netdev
The recent conversion of skb_frag_t to bio_vec did not include
skb_frag's page_offset. Add accessor functions for this field,
utilize them, and remove the union, restoring the original structure.
Jonathan Lemon (3):
linux: Add page_offset accessors
net: Use skb_frag_off accessors
linux: Remove bvec page_offset, use bv_offset
drivers/atm/eni.c | 2 +-
drivers/hsi/clients/ssi_protocol.c | 2 +-
drivers/infiniband/hw/hfi1/vnic_sdma.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 3 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
.../ethernet/cavium/thunder/nicvf_queues.c | 2 +-
drivers/net/ethernet/chelsio/cxgb3/sge.c | 2 +-
drivers/net/ethernet/emulex/benet/be_main.c | 12 ++--
.../ethernet/freescale/fs_enet/fs_enet-main.c | 2 +-
drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +-
drivers/net/ethernet/jme.c | 4 +-
drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
.../net/ethernet/myricom/myri10ge/myri10ge.c | 6 +-
drivers/net/ethernet/sfc/tx.c | 2 +-
drivers/net/ethernet/sun/cassini.c | 8 +--
drivers/net/ethernet/sun/niu.c | 2 +-
drivers/net/ethernet/sun/sunvnet_common.c | 4 +-
drivers/net/ethernet/ti/netcp_core.c | 2 +-
drivers/net/hyperv/netvsc_drv.c | 4 +-
drivers/net/thunderbolt.c | 2 +-
drivers/net/usb/usbnet.c | 2 +-
drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
drivers/net/xen-netback/netback.c | 6 +-
drivers/net/xen-netfront.c | 8 +--
drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 2 +-
drivers/scsi/fcoe/fcoe.c | 3 +-
drivers/scsi/fcoe/fcoe_transport.c | 2 +-
drivers/scsi/qedf/qedf_main.c | 2 +-
.../staging/unisys/visornic/visornic_main.c | 2 +-
drivers/target/iscsi/cxgbit/cxgbit_target.c | 4 +-
include/linux/bvec.h | 5 +-
include/linux/skbuff.h | 63 +++++++++++++++++--
net/appletalk/ddp.c | 4 +-
net/core/datagram.c | 6 +-
net/core/dev.c | 2 +-
net/core/pktgen.c | 2 +-
net/core/skbuff.c | 54 ++++++++--------
net/ipv4/tcp.c | 6 +-
net/ipv4/tcp_output.c | 2 +-
net/kcm/kcmsock.c | 2 +-
net/tls/tls_device.c | 8 +--
net/tls/tls_device_fallback.c | 2 +-
net/xfrm/xfrm_ipcomp.c | 2 +-
46 files changed, 158 insertions(+), 108 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jonathan Lemon @ 2019-07-29 17:19 UTC (permalink / raw)
To: willy, davem; +Cc: kernel-team, netdev
In-Reply-To: <20190729171941.250569-1-jonathan.lemon@gmail.com>
Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
and skb_frag_off_set_from() accessors for page_offset.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
include/linux/skbuff.h | 61 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 56 insertions(+), 5 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 718742b1c505..7d94a78067ee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -331,7 +331,7 @@ static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size)
}
/**
- * skb_frag_size_add - Incrementes the size of a skb fragment by %delta
+ * skb_frag_size_add - Increments the size of a skb fragment by %delta
* @frag: skb fragment
* @delta: value to add
*/
@@ -2857,6 +2857,46 @@ static inline void skb_propagate_pfmemalloc(struct page *page,
skb->pfmemalloc = true;
}
+/**
+ * skb_frag_off - Returns the offset of a skb fragment
+ * @frag: the paged fragment
+ */
+static inline unsigned int skb_frag_off(const skb_frag_t *frag)
+{
+ return frag->page_offset;
+}
+
+/**
+ * skb_frag_off_add - Increments the offset of a skb fragment by %delta
+ * @frag: skb fragment
+ * @delta: value to add
+ */
+static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
+{
+ frag->page_offset += delta;
+}
+
+/**
+ * skb_frag_off_set - Sets the offset of a skb fragment
+ * @frag: skb fragment
+ * @offset: offset of fragment
+ */
+static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset)
+{
+ frag->page_offset = offset;
+}
+
+/**
+ * skb_frag_off_set_from - Sets the offset of a skb fragment from another fragment
+ * @fragto: skb fragment where offset is set
+ * @fragfrom: skb fragment offset is copied from
+ */
+static inline void skb_frag_off_set_from(skb_frag_t *fragto,
+ const skb_frag_t *fragfrom)
+{
+ fragto->page_offset = fragfrom->page_offset;
+}
+
/**
* skb_frag_page - retrieve the page referred to by a paged fragment
* @frag: the paged fragment
@@ -2923,7 +2963,7 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
*/
static inline void *skb_frag_address(const skb_frag_t *frag)
{
- return page_address(skb_frag_page(frag)) + frag->page_offset;
+ return page_address(skb_frag_page(frag)) + skb_frag_off(frag);
}
/**
@@ -2939,7 +2979,18 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
if (unlikely(!ptr))
return NULL;
- return ptr + frag->page_offset;
+ return ptr + skb_frag_off(frag);
+}
+
+/**
+ * skb_frag_page_set_from - sets the page in a fragment from another fragment
+ * @fragto: skb fragment where page is set
+ * @fragfrom: skb fragment page is copied from
+ */
+static inline void skb_frag_page_set_from(skb_frag_t *fragto,
+ const skb_frag_t *fragfrom)
+{
+ fragto->bv_page = fragfrom->bv_page;
}
/**
@@ -2987,7 +3038,7 @@ static inline dma_addr_t skb_frag_dma_map(struct device *dev,
enum dma_data_direction dir)
{
return dma_map_page(dev, skb_frag_page(frag),
- frag->page_offset + offset, size, dir);
+ skb_frag_off(frag) + offset, size, dir);
}
static inline struct sk_buff *pskb_copy(struct sk_buff *skb,
@@ -3157,7 +3208,7 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];
return page == skb_frag_page(frag) &&
- off == frag->page_offset + skb_frag_size(frag);
+ off == skb_frag_off(frag) + skb_frag_size(frag);
}
return false;
}
--
2.17.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox