* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
@ 2010-11-10 18:18 Dan Rosenberg
2010-11-10 18:21 ` David Miller
0 siblings, 1 reply; 27+ messages in thread
From: Dan Rosenberg @ 2010-11-10 18:18 UTC (permalink / raw)
To: David Miller; +Cc: netdev, stable, security
The code sample I linked to clearly demonstrates exactly how to accomplish this, if you had bothered to read it. I install a socket filter that loads the desired uninitialized int into the accumulator, performs some bit shifting to get the desired byte, and returns the accumulator. Since the packet size is greater than 256, the packet will be truncated based on the value of that byte, allowing me to determine its value.
There are no restrictions - I can access any byte in the array regardless of its value.
-Dan
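The technique Dan describes can be sketched in user space. The snippet below is illustrative only: it mirrors the classic BPF structures and opcode values from <linux/filter.h> with local definitions and runs them through a toy interpreter instead of attaching a real filter with setsockopt(SO_ATTACH_FILTER); the filter layout itself is the standard one.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the classic BPF instruction; matches the UAPI layout. */
struct sock_filter { uint16_t code; uint8_t jt, jf; uint32_t k; };

#define BPF_LD   0x00
#define BPF_ALU  0x04
#define BPF_RET  0x06
#define BPF_MEM  0x60	/* load/store scratch memory slot */
#define BPF_RSH  0x70
#define BPF_AND  0x50
#define BPF_K    0x00
#define BPF_A    0x10

/*
 * A filter in the spirit of the PoC: load scratch slot 0 (never written,
 * hence uninitialized kernel stack memory before commit 57fe93b3), shift
 * the wanted byte into the low 8 bits, and return it.  The return value
 * becomes the number of bytes of the packet delivered to the socket, so
 * the observed truncation length reveals the byte.
 */
static const struct sock_filter leak_byte[] = {
	{ BPF_LD  | BPF_MEM,         0, 0, 0    },	/* A = mem[0] (uninitialized) */
	{ BPF_ALU | BPF_RSH | BPF_K, 0, 0, 8    },	/* A >>= 8: pick byte 1 */
	{ BPF_ALU | BPF_AND | BPF_K, 0, 0, 0xff },	/* A &= 0xff */
	{ BPF_RET | BPF_A,           0, 0, 0    },	/* accept A bytes of the packet */
};

/* Toy interpreter for just these opcodes, enough to show the data flow;
 * the real sk_run_filter() is far more general. */
static uint32_t run(const struct sock_filter *f, int n, const uint32_t *mem)
{
	uint32_t A = 0;

	for (int pc = 0; pc < n; pc++) {
		const struct sock_filter *i = &f[pc];

		switch (i->code) {
		case BPF_LD  | BPF_MEM:          A = mem[i->k]; break;
		case BPF_ALU | BPF_RSH | BPF_K:  A >>= i->k;    break;
		case BPF_ALU | BPF_AND | BPF_K:  A &= i->k;     break;
		case BPF_RET | BPF_A:            return A;
		}
	}
	return 0;
}
```

Because any packet longer than 255 bytes gets truncated to exactly the returned length, every byte value 0-255 in the uninitialized word is directly observable, which is the point being argued here.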
------Original Message------
From: David Miller
To: drosenberg@vsecurity.com
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org
Cc: security@kernel.org
Subject: Re: [PATCH] Prevent reading uninitialized memory with socketfilters
Sent: Nov 10, 2010 1:07 PM
From: Dan Rosenberg <drosenberg@vsecurity.com>
Date: Wed, 10 Nov 2010 06:12:47 -0500
>
>>
>> Prove it.
>
> I hope this was a joke.
It absolutely is not.
You are very much not the first person ever to try and add an
expensive memset() here.
So the onus is really on you to prove this assertion and show the
exact code path by which the user can actually see any uninitialized
kernel stack memory (he can't, he can peek at certain values in a
certain extremely contrived range, making the leak useless), rather
than point us at some external web site archive of a list posting
which we cannot easily quote and reply to here.
I think you cannot do it, really. Except in the AF_PACKET case, the
sockets can only see "0" or a negative error code, not the actual
sk_run_filter() return value.
In the one exception, AF_PACKET, the range of values the user can
see are in the range of MTU of the device being accessed, which
realistically is 1500 bytes. This means the user cannot see any
kernel stack value outside of the range 0 to 1500, which isn't
worth using this expensive memset to guard against at all.
I don't even think it's worth adding all of the extra cpu cycles
incurred by Eric Dumazet's scheme of using a bitmap test on every
single memory buffer access.
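The per-access bitmap scheme referred to here can be sketched outside the kernel as follows; the names are illustrative, not the kernel's, but the idea matches commit 57fe93b3: a load from a scratch slot that was never stored reads as 0 instead of leaking stack contents.

```c
#include <assert.h>
#include <stdint.h>

enum { BPF_MEMWORDS = 16 };	/* number of scratch slots, as in classic BPF */

struct scratch {
	uint32_t mem[BPF_MEMWORDS];
	uint32_t valid;		/* bit k is set once mem[k] has been stored */
};

static void bpf_store(struct scratch *s, unsigned int k, uint32_t v)
{
	s->mem[k] = v;
	s->valid |= 1u << k;
}

static uint32_t bpf_load(const struct scratch *s, unsigned int k)
{
	/* the bitmap test on every load: unwritten slots read as 0
	 * instead of whatever happened to be on the stack */
	return (s->valid & (1u << k)) ? s->mem[k] : 0;
}
```

The cost David objects to is the extra test-and-branch on every load; the alternative discussed below is proving at filter-check time that no load can precede its store.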
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 18:18 [PATCH] Prevent reading uninitialized memory with socketfilters Dan Rosenberg
@ 2010-11-10 18:21 ` David Miller
  2010-11-10 18:33   ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2010-11-10 18:21 UTC (permalink / raw)
  To: drosenberg; +Cc: netdev, stable, security

From: "Dan Rosenberg" <drosenberg@vsecurity.com>
Date: Wed, 10 Nov 2010 18:18:08 +0000

> The code sample I linked to clearly demonstrates exactly how to
> accomplish this, if you had bothered to read it.

I told you why I didn't read it, if you had bothered to read my
reply properly :-)

Anyways, I realize we have to do something, but memset() is going
to completely kill performance. I consider Eric's suggestion the
closest to acceptable cost at this point, but even that is hard
to digest for me.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 18:21 ` David Miller
@ 2010-11-10 18:33   ` Eric Dumazet
  2010-11-10 18:38     ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2010-11-10 18:33 UTC (permalink / raw)
  To: David Miller; +Cc: drosenberg, netdev, stable, security

On Wednesday 10 November 2010 at 10:21 -0800, David Miller wrote:
> From: "Dan Rosenberg" <drosenberg@vsecurity.com>
> Date: Wed, 10 Nov 2010 18:18:08 +0000
>
> > The code sample I linked to clearly demonstrates exactly how to
> > accomplish this, if you had bothered to read it.
>
> I told you why I didn't read it, if you had bothered to read my
> reply properly :-)
>
> Anyways, I realize we have to do something, but memset() is going
> to completely kill performance. I consider Eric's suggestion the
> closest to acceptable cost at this point, but even that is hard
> to digest for me.

Most filters don't use mem[] at all, so the added cost seems OK to me,
but we can work on a compile-time check, making memset(mem, 0, length)
a filter parameter, if you prefer removing the test on each
load(mem[K]).

This memset() could be avoided if the checker can be sure every
load(mem[K]) follows a prior store(mem[K]).

It's not a five-minute patch; I tried to work on it but it was a bit
hard, for a very remote security risk.

(On the x86 platform, incoming packets are handled on the SOFTIRQ
stack, not the kernel stack of the current thread, anyway.)

^ permalink raw reply	[flat|nested] 27+ messages in thread
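For straight-line filters, the compile-time check sketched in this message would look roughly like the following. The encoding is hypothetical and, crucially, jumps are ignored; handling branches (tracking the stored-slot set per control-flow path) is exactly what makes this "not a five minutes patch".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified instruction stream for the sketch. */
enum op { OP_ST, OP_LD_MEM, OP_OTHER };
struct insn { enum op op; unsigned int k; };

/*
 * Returns true if, in straight-line order, every load of mem[k] is
 * preceded by a store to mem[k].  If this holds for a whole filter,
 * the memset() of the scratch area can be skipped safely.
 */
static bool loads_follow_stores(const struct insn *p, int n)
{
	uint32_t stored = 0;	/* bit k set once mem[k] has been written */

	for (int i = 0; i < n; i++) {
		if (p[i].op == OP_ST)
			stored |= 1u << p[i].k;
		else if (p[i].op == OP_LD_MEM && !(stored & (1u << p[i].k)))
			return false;	/* reads a slot never written */
	}
	return true;
}
```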
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 18:33   ` Eric Dumazet
@ 2010-11-10 18:38     ` David Miller
  2010-11-16 13:08       ` [PATCH] filter: Optimize instruction revalidation code Tetsuo Handa
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2010-11-10 18:38 UTC (permalink / raw)
  To: eric.dumazet; +Cc: drosenberg, netdev, stable, security

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 10 Nov 2010 19:33:44 +0100

> Most filters dont use mem[] at all, so the added cost seems OK to me,
> but we can work to use a compile time check, to make memset(mem, 0,
> length) a filter parameter if you prefer removing the test on each
> load(mem[K]).
>
> This memset() could be avoided if the checker can be sure all
> load(mem[K]) follow a prior store(mem[K])
>
> Its not a five minutes patch, I tried to work on it but it was a bit
> hard, for a very remote security risk.
>
> (On x86 platform, incoming packets are handled in SOFTIRQ stack, not the
> kernel stack of current thread anyway)

Understood, I'm going to apply your patch for now with the
"const filter pointer" and "u32 f_k = fentry->k;" changes.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [PATCH] filter: Optimize instruction revalidation code.
  2010-11-10 18:38     ` David Miller
@ 2010-11-16 13:08       ` Tetsuo Handa
  2010-11-16 13:11         ` Michael Tokarev
                           ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Tetsuo Handa @ 2010-11-16 13:08 UTC (permalink / raw)
  To: davem, eric.dumazet; +Cc: drosenberg, netdev

In the wake of commit 57fe93b3 "filter: make sure filters dont read
uninitialized memory", I came up with this patch. I don't know how to
test it. Please review and do tests.

This patch optimizes sk_chk_filter(). Using gcc 4.1.2 on x86_32, the
size of net/core/filter.o changed from 7416 bytes to 5260 bytes
("make allnoconfig" + CONFIG_NET=y).

By the way, if the "struct sock_filter *filter" argument of
sk_run_filter() does not change after it was revalidated at
sk_chk_filter(), can't we detect and reject "BPF_S_LD_MEM/BPF_S_LDX_MEM
before BPF_S_ST/BPF_S_STX" at sk_chk_filter()?
----------------------------------------
From 2bf36130bd4912590b409b47a6a9a82e2884e035 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 16 Nov 2010 21:39:50 +0900
Subject: [PATCH] filter: Optimize instruction revalidation code.

Since repeating a small-value to small-value conversion using switch()
clause's case statements is wasteful, this patch introduces a u16 to u16
mapping table and removes most of the case statements. As a result, the
size of net/core/filter.o is reduced by about 30% on x86.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/core/filter.c |  214 ++++++++++++++++-------------------------------------
 1 files changed, 65 insertions(+), 149 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 23e9b2a..85be3d8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
  */
 int sk_chk_filter(struct sock_filter *filter, int flen)
 {
-	struct sock_filter *ftest;
+	/*
+	 * Valid instructions are initialized to non-0.
+	 * Invalid instructions are initialized to 0.
+	 */
+	static u16 codes[] = {
+		[BPF_ALU|BPF_ADD|BPF_K] = BPF_S_ALU_ADD_K + 1,
+		[BPF_ALU|BPF_ADD|BPF_X] = BPF_S_ALU_ADD_X + 1,
+		[BPF_ALU|BPF_SUB|BPF_K] = BPF_S_ALU_SUB_K + 1,
+		[BPF_ALU|BPF_SUB|BPF_X] = BPF_S_ALU_SUB_X + 1,
+		[BPF_ALU|BPF_MUL|BPF_K] = BPF_S_ALU_MUL_K + 1,
+		[BPF_ALU|BPF_MUL|BPF_X] = BPF_S_ALU_MUL_X + 1,
+		[BPF_ALU|BPF_DIV|BPF_X] = BPF_S_ALU_DIV_X + 1,
+		[BPF_ALU|BPF_AND|BPF_K] = BPF_S_ALU_AND_K + 1,
+		[BPF_ALU|BPF_AND|BPF_X] = BPF_S_ALU_AND_X + 1,
+		[BPF_ALU|BPF_OR|BPF_K] = BPF_S_ALU_OR_K + 1,
+		[BPF_ALU|BPF_OR|BPF_X] = BPF_S_ALU_OR_X + 1,
+		[BPF_ALU|BPF_LSH|BPF_K] = BPF_S_ALU_LSH_K + 1,
+		[BPF_ALU|BPF_LSH|BPF_X] = BPF_S_ALU_LSH_X + 1,
+		[BPF_ALU|BPF_RSH|BPF_K] = BPF_S_ALU_RSH_K + 1,
+		[BPF_ALU|BPF_RSH|BPF_X] = BPF_S_ALU_RSH_X + 1,
+		[BPF_ALU|BPF_NEG] = BPF_S_ALU_NEG + 1,
+		[BPF_LD|BPF_W|BPF_ABS] = BPF_S_LD_W_ABS + 1,
+		[BPF_LD|BPF_H|BPF_ABS] = BPF_S_LD_H_ABS + 1,
+		[BPF_LD|BPF_B|BPF_ABS] = BPF_S_LD_B_ABS + 1,
+		[BPF_LD|BPF_W|BPF_LEN] = BPF_S_LD_W_LEN + 1,
+		[BPF_LD|BPF_W|BPF_IND] = BPF_S_LD_W_IND + 1,
+		[BPF_LD|BPF_H|BPF_IND] = BPF_S_LD_H_IND + 1,
+		[BPF_LD|BPF_B|BPF_IND] = BPF_S_LD_B_IND + 1,
+		[BPF_LD|BPF_IMM] = BPF_S_LD_IMM + 1,
+		[BPF_LDX|BPF_W|BPF_LEN] = BPF_S_LDX_W_LEN + 1,
+		[BPF_LDX|BPF_B|BPF_MSH] = BPF_S_LDX_B_MSH + 1,
+		[BPF_LDX|BPF_IMM] = BPF_S_LDX_IMM + 1,
+		[BPF_MISC|BPF_TAX] = BPF_S_MISC_TAX + 1,
+		[BPF_MISC|BPF_TXA] = BPF_S_MISC_TXA + 1,
+		[BPF_RET|BPF_K] = BPF_S_RET_K + 1,
+		[BPF_RET|BPF_A] = BPF_S_RET_A + 1,
+		[BPF_ALU|BPF_DIV|BPF_K] = BPF_S_ALU_DIV_K + 1,
+		[BPF_LD|BPF_MEM] = BPF_S_LD_MEM + 1,
+		[BPF_LDX|BPF_MEM] = BPF_S_LDX_MEM + 1,
+		[BPF_ST] = BPF_S_ST + 1,
+		[BPF_STX] = BPF_S_STX + 1,
+		[BPF_JMP|BPF_JA] = BPF_S_JMP_JA + 1,
+		[BPF_JMP|BPF_JEQ|BPF_K] = BPF_S_JMP_JEQ_K + 1,
+		[BPF_JMP|BPF_JEQ|BPF_X] = BPF_S_JMP_JEQ_X + 1,
+		[BPF_JMP|BPF_JGE|BPF_K] = BPF_S_JMP_JGE_K + 1,
+		[BPF_JMP|BPF_JGE|BPF_X] = BPF_S_JMP_JGE_X + 1,
+		[BPF_JMP|BPF_JGT|BPF_K] = BPF_S_JMP_JGT_K + 1,
+		[BPF_JMP|BPF_JGT|BPF_X] = BPF_S_JMP_JGT_X + 1,
+		[BPF_JMP|BPF_JSET|BPF_K] = BPF_S_JMP_JSET_K + 1,
+		[BPF_JMP|BPF_JSET|BPF_X] = BPF_S_JMP_JSET_X + 1,
+	};
 	int pc;
 
 	if (flen == 0 || flen > BPF_MAXINSNS)
@@ -391,135 +441,26 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 
 	/* check the filter code now */
 	for (pc = 0; pc < flen; pc++) {
-		ftest = &filter[pc];
-
-		/* Only allow valid instructions */
-		switch (ftest->code) {
-		case BPF_ALU|BPF_ADD|BPF_K:
-			ftest->code = BPF_S_ALU_ADD_K;
-			break;
-		case BPF_ALU|BPF_ADD|BPF_X:
-			ftest->code = BPF_S_ALU_ADD_X;
-			break;
-		case BPF_ALU|BPF_SUB|BPF_K:
-			ftest->code = BPF_S_ALU_SUB_K;
-			break;
-		case BPF_ALU|BPF_SUB|BPF_X:
-			ftest->code = BPF_S_ALU_SUB_X;
-			break;
-		case BPF_ALU|BPF_MUL|BPF_K:
-			ftest->code = BPF_S_ALU_MUL_K;
-			break;
-		case BPF_ALU|BPF_MUL|BPF_X:
-			ftest->code = BPF_S_ALU_MUL_X;
-			break;
-		case BPF_ALU|BPF_DIV|BPF_X:
-			ftest->code = BPF_S_ALU_DIV_X;
-			break;
-		case BPF_ALU|BPF_AND|BPF_K:
-			ftest->code = BPF_S_ALU_AND_K;
-			break;
-		case BPF_ALU|BPF_AND|BPF_X:
-			ftest->code = BPF_S_ALU_AND_X;
-			break;
-		case BPF_ALU|BPF_OR|BPF_K:
-			ftest->code = BPF_S_ALU_OR_K;
-			break;
-		case BPF_ALU|BPF_OR|BPF_X:
-			ftest->code = BPF_S_ALU_OR_X;
-			break;
-		case BPF_ALU|BPF_LSH|BPF_K:
-			ftest->code = BPF_S_ALU_LSH_K;
-			break;
-		case BPF_ALU|BPF_LSH|BPF_X:
-			ftest->code = BPF_S_ALU_LSH_X;
-			break;
-		case BPF_ALU|BPF_RSH|BPF_K:
-			ftest->code = BPF_S_ALU_RSH_K;
-			break;
-		case BPF_ALU|BPF_RSH|BPF_X:
-			ftest->code = BPF_S_ALU_RSH_X;
-			break;
-		case BPF_ALU|BPF_NEG:
-			ftest->code = BPF_S_ALU_NEG;
-			break;
-		case BPF_LD|BPF_W|BPF_ABS:
-			ftest->code = BPF_S_LD_W_ABS;
-			break;
-		case BPF_LD|BPF_H|BPF_ABS:
-			ftest->code = BPF_S_LD_H_ABS;
-			break;
-		case BPF_LD|BPF_B|BPF_ABS:
-			ftest->code = BPF_S_LD_B_ABS;
-			break;
-		case BPF_LD|BPF_W|BPF_LEN:
-			ftest->code = BPF_S_LD_W_LEN;
-			break;
-		case BPF_LD|BPF_W|BPF_IND:
-			ftest->code = BPF_S_LD_W_IND;
-			break;
-		case BPF_LD|BPF_H|BPF_IND:
-			ftest->code = BPF_S_LD_H_IND;
-			break;
-		case BPF_LD|BPF_B|BPF_IND:
-			ftest->code = BPF_S_LD_B_IND;
-			break;
-		case BPF_LD|BPF_IMM:
-			ftest->code = BPF_S_LD_IMM;
-			break;
-		case BPF_LDX|BPF_W|BPF_LEN:
-			ftest->code = BPF_S_LDX_W_LEN;
-			break;
-		case BPF_LDX|BPF_B|BPF_MSH:
-			ftest->code = BPF_S_LDX_B_MSH;
-			break;
-		case BPF_LDX|BPF_IMM:
-			ftest->code = BPF_S_LDX_IMM;
-			break;
-		case BPF_MISC|BPF_TAX:
-			ftest->code = BPF_S_MISC_TAX;
-			break;
-		case BPF_MISC|BPF_TXA:
-			ftest->code = BPF_S_MISC_TXA;
-			break;
-		case BPF_RET|BPF_K:
-			ftest->code = BPF_S_RET_K;
-			break;
-		case BPF_RET|BPF_A:
-			ftest->code = BPF_S_RET_A;
-			break;
+		struct sock_filter *ftest = &filter[pc];
+		u16 code = ftest->code;
+		if (code >= ARRAY_SIZE(codes))
+			return 0;
 
 		/* Some instructions need special checks */
-
-		/* check for division by zero */
+		switch (code) {
 		case BPF_ALU|BPF_DIV|BPF_K:
+			/* check for division by zero */
 			if (ftest->k == 0)
 				return -EINVAL;
-			ftest->code = BPF_S_ALU_DIV_K;
 			break;
-
-		/* check for invalid memory addresses */
 		case BPF_LD|BPF_MEM:
-			if (ftest->k >= BPF_MEMWORDS)
-				return -EINVAL;
-			ftest->code = BPF_S_LD_MEM;
-			break;
 		case BPF_LDX|BPF_MEM:
-			if (ftest->k >= BPF_MEMWORDS)
-				return -EINVAL;
-			ftest->code = BPF_S_LDX_MEM;
-			break;
 		case BPF_ST:
-			if (ftest->k >= BPF_MEMWORDS)
-				return -EINVAL;
-			ftest->code = BPF_S_ST;
-			break;
 		case BPF_STX:
+			/* check for invalid memory addresses */
 			if (ftest->k >= BPF_MEMWORDS)
 				return -EINVAL;
-			ftest->code = BPF_S_STX;
 			break;
-
 		case BPF_JMP|BPF_JA:
 			/*
 			 * Note, the large ftest->k might cause loops.
@@ -528,40 +469,14 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 			 */
 			if (ftest->k >= (unsigned)(flen-pc-1))
 				return -EINVAL;
-			ftest->code = BPF_S_JMP_JA;
-			break;
-
-		case BPF_JMP|BPF_JEQ|BPF_K:
-			ftest->code = BPF_S_JMP_JEQ_K;
-			break;
-		case BPF_JMP|BPF_JEQ|BPF_X:
-			ftest->code = BPF_S_JMP_JEQ_X;
-			break;
-		case BPF_JMP|BPF_JGE|BPF_K:
-			ftest->code = BPF_S_JMP_JGE_K;
-			break;
-		case BPF_JMP|BPF_JGE|BPF_X:
-			ftest->code = BPF_S_JMP_JGE_X;
-			break;
-		case BPF_JMP|BPF_JGT|BPF_K:
-			ftest->code = BPF_S_JMP_JGT_K;
-			break;
-		case BPF_JMP|BPF_JGT|BPF_X:
-			ftest->code = BPF_S_JMP_JGT_X;
 			break;
-		case BPF_JMP|BPF_JSET|BPF_K:
-			ftest->code = BPF_S_JMP_JSET_K;
-			break;
-		case BPF_JMP|BPF_JSET|BPF_X:
-			ftest->code = BPF_S_JMP_JSET_X;
-			break;
-
-		default:
-			return -EINVAL;
 		}
-
-		/* for conditionals both must be safe */
-		switch (ftest->code) {
+		code = codes[code];
+		if (!code--)
+			return -EINVAL;
+
+		/* for conditionals both must be safe */
+		switch (code) {
 		case BPF_S_JMP_JEQ_K:
 		case BPF_S_JMP_JEQ_X:
 		case BPF_S_JMP_JGE_K:
@@ -574,6 +489,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 			    pc + ftest->jf + 1 >= flen)
 				return -EINVAL;
 		}
+		ftest->code = code;
 	}
 
 	/* last instruction must be a RET code */
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread
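The core trick of the patch above is a sparse designated-initializer table that stores "translated opcode + 1", so that unlisted slots (implicitly zero) mark invalid instructions. It can be modeled in isolation like this; the opcode values below are made up for the demo, not the real BPF encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical raw and translated opcode values, for illustration only. */
enum { RAW_ADD_K = 0x04, RAW_RET_K = 0x06, RAW_MAX = 0x100 };
enum { S_INVALID = -1, S_ALU_ADD_K = 0, S_RET_K = 1 };

/* Entries hold "translated value + 1"; the implicit 0 of every slot
 * that is not listed therefore means "not a valid opcode". */
static const uint8_t codes[RAW_MAX] = {
	[RAW_ADD_K] = S_ALU_ADD_K + 1,
	[RAW_RET_K] = S_RET_K + 1,
};

static int translate(uint16_t raw)
{
	uint8_t code;

	if (raw >= RAW_MAX)
		return S_INVALID;	/* out of table range */
	code = codes[raw];
	if (!code--)			/* 0 means "not a valid opcode" */
		return S_INVALID;
	return code;			/* undo the +1 bias */
}
```

The `!code--` idiom tests the biased value against zero and removes the bias in one step, which is why the table entries are built with `+ 1` even though sk_run_filter() never sees the bias.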
* Re: [PATCH] filter: Optimize instruction revalidation code.
  2010-11-16 13:08 ` [PATCH] filter: Optimize instruction revalidation code Tetsuo Handa
@ 2010-11-16 13:11   ` Michael Tokarev
  2010-11-16 13:44   ` Eric Dumazet
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Michael Tokarev @ 2010-11-16 13:11 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: davem, eric.dumazet, drosenberg, netdev

16.11.2010 16:08, Tetsuo Handa wrote:
[]
>  net/core/filter.c |  214 ++++++++++++++++-------------------------------------
>  1 files changed, 65 insertions(+), 149 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 23e9b2a..85be3d8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
>   */
>  int sk_chk_filter(struct sock_filter *filter, int flen)
>  {
> -	struct sock_filter *ftest;
> +	/*
> +	 * Valid instructions are initialized to non-0.
> +	 * Invalid instructions are initialized to 0.
> +	 */
> +	static u16 codes[] = {
> +		[BPF_ALU|BPF_ADD|BPF_K] = BPF_S_ALU_ADD_K + 1,

How about using some "const" there? :)

/mjt

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] filter: Optimize instruction revalidation code.
  2010-11-16 13:08 ` [PATCH] filter: Optimize instruction revalidation code Tetsuo Handa
  2010-11-16 13:11   ` Michael Tokarev
@ 2010-11-16 13:44   ` Eric Dumazet
  2010-11-16 14:31     ` [PATCH v2] " Tetsuo Handa
  2010-11-16 22:13   ` [PATCH] " Hagen Paul Pfeifer
  2010-11-16 23:24   ` Changli Gao
  3 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2010-11-16 13:44 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: davem, drosenberg, netdev

On Tuesday 16 November 2010 at 22:08 +0900, Tetsuo Handa wrote:
> In the wake of commit 57fe93b3 "filter: make sure filters dont read
> uninitialized memory", I came up with this patch. I don't know how to
> test it. Please review and do tests.
>
> This patch optimizes sk_chk_filter(). Using gcc 4.1.2 on x86_32, the
> size of net/core/filter.o changed from 7416 bytes to 5260 bytes
> ("make allnoconfig" + CONFIG_NET=y).
>
> By the way, if the "struct sock_filter *filter" argument of
> sk_run_filter() does not change after it was revalidated at
> sk_chk_filter(), can't we detect and reject "BPF_S_LD_MEM/BPF_S_LDX_MEM
> before BPF_S_ST/BPF_S_STX" at sk_chk_filter()?

Sure, this was the idea, but it seems complex because of branches.

Patches are welcomed ;)

> ----------------------------------------
> From 2bf36130bd4912590b409b47a6a9a82e2884e035 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Tue, 16 Nov 2010 21:39:50 +0900
> Subject: [PATCH] filter: Optimize instruction revalidation code.
>
> Since repeating a small-value to small-value conversion using switch()
> clause's case statements is wasteful, this patch introduces a u16 to u16
> mapping table and removes most of the case statements. As a result, the
> size of net/core/filter.o is reduced by about 30% on x86.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---

Patch seems fine to me, with the 'const' codes[] Michael Tokarev
already spotted.

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [PATCH v2] filter: Optimize instruction revalidation code.
  2010-11-16 13:44   ` Eric Dumazet
@ 2010-11-16 14:31     ` Tetsuo Handa
  2010-11-16 16:30       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2010-11-16 14:31 UTC (permalink / raw)
  To: eric.dumazet, mjt; +Cc: davem, drosenberg, netdev

Eric Dumazet wrote:
> Patch seems fine to me, with the 'const' codes[] Michael Tokarev already
> spotted.

Sorry, I forgot to evaluate the

	default:
		return -EINVAL;
	}

case. If a caller passed ftest->code == BPF_S_ALU_DIV_K (translated value)
rather than ftest->code == BPF_ALU|BPF_DIV|BPF_K (original value), the check

	case BPF_ALU|BPF_DIV|BPF_K:
		if (ftest->k == 0)
			return -EINVAL;
		ftest->code = BPF_S_ALU_DIV_K;
		break;

is bypassed. The problem is that the original values and the translated
values overlap. Can we change the translated values in order to guarantee
that these values never overlap?
----------------------------------------
From 7b6a7b784fa096383aadc86d32bff6b8329a2e66 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 16 Nov 2010 22:37:45 +0900
Subject: [PATCH v2] filter: optimize instruction revalidation code.

Since repeating a small-value to small-value conversion using switch()
clause's case statements is wasteful, this patch introduces a u16 to u16
mapping table and removes most of the case statements. As a result, the
size of net/core/filter.o is reduced by about 27% on x86.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/core/filter.c |  223 ++++++++++++++++------------------------------------
 1 files changed, 68 insertions(+), 155 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 23e9b2a..ef1d226 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
  */
 int sk_chk_filter(struct sock_filter *filter, int flen)
 {
-	struct sock_filter *ftest;
+	/*
+	 * Valid instructions are initialized to non-0.
+	 * Invalid instructions are initialized to 0.
+	 */
+	static const u16 codes[] = {
+		[BPF_ALU|BPF_ADD|BPF_K] = BPF_S_ALU_ADD_K + 1,
+		[BPF_ALU|BPF_ADD|BPF_X] = BPF_S_ALU_ADD_X + 1,
+		[BPF_ALU|BPF_SUB|BPF_K] = BPF_S_ALU_SUB_K + 1,
+		[BPF_ALU|BPF_SUB|BPF_X] = BPF_S_ALU_SUB_X + 1,
+		[BPF_ALU|BPF_MUL|BPF_K] = BPF_S_ALU_MUL_K + 1,
+		[BPF_ALU|BPF_MUL|BPF_X] = BPF_S_ALU_MUL_X + 1,
+		[BPF_ALU|BPF_DIV|BPF_X] = BPF_S_ALU_DIV_X + 1,
+		[BPF_ALU|BPF_AND|BPF_K] = BPF_S_ALU_AND_K + 1,
+		[BPF_ALU|BPF_AND|BPF_X] = BPF_S_ALU_AND_X + 1,
+		[BPF_ALU|BPF_OR|BPF_K] = BPF_S_ALU_OR_K + 1,
+		[BPF_ALU|BPF_OR|BPF_X] = BPF_S_ALU_OR_X + 1,
+		[BPF_ALU|BPF_LSH|BPF_K] = BPF_S_ALU_LSH_K + 1,
+		[BPF_ALU|BPF_LSH|BPF_X] = BPF_S_ALU_LSH_X + 1,
+		[BPF_ALU|BPF_RSH|BPF_K] = BPF_S_ALU_RSH_K + 1,
+		[BPF_ALU|BPF_RSH|BPF_X] = BPF_S_ALU_RSH_X + 1,
+		[BPF_ALU|BPF_NEG] = BPF_S_ALU_NEG + 1,
+		[BPF_LD|BPF_W|BPF_ABS] = BPF_S_LD_W_ABS + 1,
+		[BPF_LD|BPF_H|BPF_ABS] = BPF_S_LD_H_ABS + 1,
+		[BPF_LD|BPF_B|BPF_ABS] = BPF_S_LD_B_ABS + 1,
+		[BPF_LD|BPF_W|BPF_LEN] = BPF_S_LD_W_LEN + 1,
+		[BPF_LD|BPF_W|BPF_IND] = BPF_S_LD_W_IND + 1,
+		[BPF_LD|BPF_H|BPF_IND] = BPF_S_LD_H_IND + 1,
+		[BPF_LD|BPF_B|BPF_IND] = BPF_S_LD_B_IND + 1,
+		[BPF_LD|BPF_IMM] = BPF_S_LD_IMM + 1,
+		[BPF_LDX|BPF_W|BPF_LEN] = BPF_S_LDX_W_LEN + 1,
+		[BPF_LDX|BPF_B|BPF_MSH] = BPF_S_LDX_B_MSH + 1,
+		[BPF_LDX|BPF_IMM] = BPF_S_LDX_IMM + 1,
+		[BPF_MISC|BPF_TAX] = BPF_S_MISC_TAX + 1,
+		[BPF_MISC|BPF_TXA] = BPF_S_MISC_TXA + 1,
+		[BPF_RET|BPF_K] = BPF_S_RET_K + 1,
+		[BPF_RET|BPF_A] = BPF_S_RET_A + 1,
+		[BPF_ALU|BPF_DIV|BPF_K] = BPF_S_ALU_DIV_K + 1,
+		[BPF_LD|BPF_MEM] = BPF_S_LD_MEM + 1,
+		[BPF_LDX|BPF_MEM] = BPF_S_LDX_MEM + 1,
+		[BPF_ST] = BPF_S_ST + 1,
+		[BPF_STX] = BPF_S_STX + 1,
+		[BPF_JMP|BPF_JA] = BPF_S_JMP_JA + 1,
+		[BPF_JMP|BPF_JEQ|BPF_K] = BPF_S_JMP_JEQ_K + 1,
+		[BPF_JMP|BPF_JEQ|BPF_X] = BPF_S_JMP_JEQ_X + 1,
+		[BPF_JMP|BPF_JGE|BPF_K] = BPF_S_JMP_JGE_K + 1,
+		[BPF_JMP|BPF_JGE|BPF_X] = BPF_S_JMP_JGE_X + 1,
+		[BPF_JMP|BPF_JGT|BPF_K] = BPF_S_JMP_JGT_K + 1,
+		[BPF_JMP|BPF_JGT|BPF_X] = BPF_S_JMP_JGT_X + 1,
+		[BPF_JMP|BPF_JSET|BPF_K] = BPF_S_JMP_JSET_K + 1,
+		[BPF_JMP|BPF_JSET|BPF_X] = BPF_S_JMP_JSET_X + 1,
+	};
 	int pc;
 
 	if (flen == 0 || flen > BPF_MAXINSNS)
@@ -391,136 +441,30 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 
 	/* check the filter code now */
 	for (pc = 0; pc < flen; pc++) {
-		ftest = &filter[pc];
-
-		/* Only allow valid instructions */
-		switch (ftest->code) {
-		case BPF_ALU|BPF_ADD|BPF_K:
-			ftest->code = BPF_S_ALU_ADD_K;
-			break;
-		case BPF_ALU|BPF_ADD|BPF_X:
-			ftest->code = BPF_S_ALU_ADD_X;
-			break;
-		case BPF_ALU|BPF_SUB|BPF_K:
-			ftest->code = BPF_S_ALU_SUB_K;
-			break;
-		case BPF_ALU|BPF_SUB|BPF_X:
-			ftest->code = BPF_S_ALU_SUB_X;
-			break;
-		case BPF_ALU|BPF_MUL|BPF_K:
-			ftest->code = BPF_S_ALU_MUL_K;
-			break;
-		case BPF_ALU|BPF_MUL|BPF_X:
-			ftest->code = BPF_S_ALU_MUL_X;
-			break;
-		case BPF_ALU|BPF_DIV|BPF_X:
-			ftest->code = BPF_S_ALU_DIV_X;
-			break;
-		case BPF_ALU|BPF_AND|BPF_K:
-			ftest->code = BPF_S_ALU_AND_K;
-			break;
-		case BPF_ALU|BPF_AND|BPF_X:
-			ftest->code = BPF_S_ALU_AND_X;
-			break;
-		case BPF_ALU|BPF_OR|BPF_K:
-			ftest->code = BPF_S_ALU_OR_K;
-			break;
-		case BPF_ALU|BPF_OR|BPF_X:
-			ftest->code = BPF_S_ALU_OR_X;
-			break;
-		case BPF_ALU|BPF_LSH|BPF_K:
-			ftest->code = BPF_S_ALU_LSH_K;
-			break;
-		case BPF_ALU|BPF_LSH|BPF_X:
-			ftest->code = BPF_S_ALU_LSH_X;
-			break;
-		case BPF_ALU|BPF_RSH|BPF_K:
-			ftest->code = BPF_S_ALU_RSH_K;
-			break;
-		case BPF_ALU|BPF_RSH|BPF_X:
-			ftest->code = BPF_S_ALU_RSH_X;
-			break;
-		case BPF_ALU|BPF_NEG:
-			ftest->code = BPF_S_ALU_NEG;
-			break;
-		case BPF_LD|BPF_W|BPF_ABS:
-			ftest->code = BPF_S_LD_W_ABS;
-			break;
-		case BPF_LD|BPF_H|BPF_ABS:
-			ftest->code = BPF_S_LD_H_ABS;
-			break;
-		case BPF_LD|BPF_B|BPF_ABS:
-			ftest->code = BPF_S_LD_B_ABS;
-			break;
-		case BPF_LD|BPF_W|BPF_LEN:
-			ftest->code = BPF_S_LD_W_LEN;
-			break;
-		case BPF_LD|BPF_W|BPF_IND:
-			ftest->code = BPF_S_LD_W_IND;
-			break;
-		case BPF_LD|BPF_H|BPF_IND:
-			ftest->code = BPF_S_LD_H_IND;
-			break;
-		case BPF_LD|BPF_B|BPF_IND:
-			ftest->code = BPF_S_LD_B_IND;
-			break;
-		case BPF_LD|BPF_IMM:
-			ftest->code = BPF_S_LD_IMM;
-			break;
-		case BPF_LDX|BPF_W|BPF_LEN:
-			ftest->code = BPF_S_LDX_W_LEN;
-			break;
-		case BPF_LDX|BPF_B|BPF_MSH:
-			ftest->code = BPF_S_LDX_B_MSH;
-			break;
-		case BPF_LDX|BPF_IMM:
-			ftest->code = BPF_S_LDX_IMM;
-			break;
-		case BPF_MISC|BPF_TAX:
-			ftest->code = BPF_S_MISC_TAX;
-			break;
-		case BPF_MISC|BPF_TXA:
-			ftest->code = BPF_S_MISC_TXA;
-			break;
-		case BPF_RET|BPF_K:
-			ftest->code = BPF_S_RET_K;
-			break;
-		case BPF_RET|BPF_A:
-			ftest->code = BPF_S_RET_A;
-			break;
+		struct sock_filter *ftest = &filter[pc];
+		u16 code = ftest->code;
+		if (code >= ARRAY_SIZE(codes))
+			return -EINVAL;
+		code = codes[code];
+		if (!code--)
+			return -EINVAL;
 
 		/* Some instructions need special checks */
-
+		switch (code) {
+		case BPF_S_ALU_DIV_K:
 			/* check for division by zero */
-		case BPF_ALU|BPF_DIV|BPF_K:
 			if (ftest->k == 0)
 				return -EINVAL;
-			ftest->code = BPF_S_ALU_DIV_K;
-			break;
-
-		/* check for invalid memory addresses */
-		case BPF_LD|BPF_MEM:
-			if (ftest->k >= BPF_MEMWORDS)
-				return -EINVAL;
-			ftest->code = BPF_S_LD_MEM;
-			break;
-		case BPF_LDX|BPF_MEM:
-			if (ftest->k >= BPF_MEMWORDS)
-				return -EINVAL;
-			ftest->code = BPF_S_LDX_MEM;
 			break;
-		case BPF_ST:
-			if (ftest->k >= BPF_MEMWORDS)
-				return -EINVAL;
-			ftest->code = BPF_S_ST;
-			break;
-		case BPF_STX:
+		case BPF_S_LD_MEM:
+		case BPF_S_LDX_MEM:
+		case BPF_S_ST:
+		case BPF_S_STX:
+			/* check for invalid memory addresses */
 			if (ftest->k >= BPF_MEMWORDS)
 				return -EINVAL;
-			ftest->code = BPF_S_STX;
 			break;
-
-		case BPF_JMP|BPF_JA:
+		case BPF_S_JMP_JA:
 			/*
 			 * Note, the large ftest->k might cause loops.
 			 * Compare this with conditional jumps below,
@@ -528,40 +472,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 			 */
 			if (ftest->k >= (unsigned)(flen-pc-1))
 				return -EINVAL;
-			ftest->code = BPF_S_JMP_JA;
-			break;
-
-		case BPF_JMP|BPF_JEQ|BPF_K:
-			ftest->code = BPF_S_JMP_JEQ_K;
-			break;
-		case BPF_JMP|BPF_JEQ|BPF_X:
-			ftest->code = BPF_S_JMP_JEQ_X;
-			break;
-		case BPF_JMP|BPF_JGE|BPF_K:
-			ftest->code = BPF_S_JMP_JGE_K;
-			break;
-		case BPF_JMP|BPF_JGE|BPF_X:
-			ftest->code = BPF_S_JMP_JGE_X;
-			break;
-		case BPF_JMP|BPF_JGT|BPF_K:
-			ftest->code = BPF_S_JMP_JGT_K;
-			break;
-		case BPF_JMP|BPF_JGT|BPF_X:
-			ftest->code = BPF_S_JMP_JGT_X;
-			break;
-		case BPF_JMP|BPF_JSET|BPF_K:
-			ftest->code = BPF_S_JMP_JSET_K;
 			break;
-		case BPF_JMP|BPF_JSET|BPF_X:
-			ftest->code = BPF_S_JMP_JSET_X;
-			break;
-
-		default:
-			return -EINVAL;
-		}
-
-		/* for conditionals both must be safe */
-		switch (ftest->code) {
 		case BPF_S_JMP_JEQ_K:
 		case BPF_S_JMP_JEQ_X:
 		case BPF_S_JMP_JGE_K:
@@ -570,10 +481,12 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 		case BPF_S_JMP_JGT_X:
 		case BPF_S_JMP_JSET_X:
 		case BPF_S_JMP_JSET_K:
+			/* for conditionals both must be safe */
 			if (pc + ftest->jt + 1 >= flen ||
 			    pc + ftest->jf + 1 >= flen)
 				return -EINVAL;
 		}
+		ftest->code = code;
 	}
 
 	/* last instruction must be a RET code */
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH v2] filter: Optimize instruction revalidation code.
  2010-11-16 14:31     ` [PATCH v2] " Tetsuo Handa
@ 2010-11-16 16:30       ` Eric Dumazet
  2010-11-17  1:19         ` [PATCH v3] " Tetsuo Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2010-11-16 16:30 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mjt, davem, drosenberg, netdev

On Tuesday 16 November 2010 at 23:31 +0900, Tetsuo Handa wrote:
> Eric Dumazet wrote:
> > Patch seems fine to me, with the 'const' codes[] Michael Tokarev already
> > spotted.
>
> Sorry, I forgot to evaluate the
>
> 	default:
> 		return -EINVAL;
> 	}
>
> case. If a caller passed ftest->code == BPF_S_ALU_DIV_K (translated value)
> rather than ftest->code == BPF_ALU|BPF_DIV|BPF_K (original value), the check
>
> 	case BPF_ALU|BPF_DIV|BPF_K:
> 		if (ftest->k == 0)
> 			return -EINVAL;
> 		ftest->code = BPF_S_ALU_DIV_K;
> 		break;
>
> is bypassed. The problem is that the original values and the translated
> values overlap. Can we change the translated values in order to guarantee
> that these values never overlap?

I don't understand the problem...

Once translated, you have to test the translated code, not the original
one ;)

> ----------------------------------------
> From 7b6a7b784fa096383aadc86d32bff6b8329a2e66 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Tue, 16 Nov 2010 22:37:45 +0900
> Subject: [PATCH v2] filter: optimize instruction revalidation code.
>
> Since repeating a small-value to small-value conversion using switch()
> clause's case statements is wasteful, this patch introduces a u16 to u16
> mapping table and removes most of the case statements. As a result, the
> size of net/core/filter.o is reduced by about 27% on x86.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/core/filter.c |  223 ++++++++++++++++------------------------------------
>  1 files changed, 68 insertions(+), 155 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 23e9b2a..ef1d226 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
>   */
>  int sk_chk_filter(struct sock_filter *filter, int flen)
>  {
> -	struct sock_filter *ftest;
> +	/*
> +	 * Valid instructions are initialized to non-0.
> +	 * Invalid instructions are initialized to 0.
> +	 */
> +	static const u16 codes[] = {

why u16 ?

You store translated instructions, so u8 is OK

> +		[BPF_ALU|BPF_ADD|BPF_K] = BPF_S_ALU_ADD_K + 1,
> +		[BPF_ALU|BPF_ADD|BPF_X] = BPF_S_ALU_ADD_X + 1,
> +		[BPF_ALU|BPF_SUB|BPF_K] = BPF_S_ALU_SUB_K + 1,
> +		[BPF_ALU|BPF_SUB|BPF_X] = BPF_S_ALU_SUB_X + 1,
> +		[BPF_ALU|BPF_MUL|BPF_K] = BPF_S_ALU_MUL_K + 1,

Also fix the indentation at the end of sk_chk_filter()

You have 3 extra tabulations :

	/* last instruction must be a RET code */
	switch (filter[flen - 1].code) {
	case BPF_S_RET_K:
	case BPF_S_RET_A:
		return 0;
		break;
!here!	default:
!here!		return -EINVAL;
!here!	}
}
EXPORT_SYMBOL(sk_chk_filter);

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-16 16:30       ` Eric Dumazet
@ 2010-11-17  1:19         ` Tetsuo Handa
  2010-11-17  7:48           ` Eric Dumazet
  2010-11-18 18:58           ` David Miller
  0 siblings, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2010-11-17 1:19 UTC (permalink / raw)
  To: mjt, davem, drosenberg, hagen, xiaosuo; +Cc: netdev

Eric Dumazet wrote:
> I don't understand the problem...
> Once translated, you have to test the translated code, not the original
> one ;)

I moved the test to after translation.

> why u16 ?
>
> You store translated instructions, so u8 is OK

I chose u16 because the type of filter->code is __u16. But I changed it
to u8 as you suggested, since the translated codes fit in a u8.

> Also fix the indentation at the end of sk_chk_filter()
>
> You have 3 extra tabulations :

Fixed.

Hagen Paul Pfeifer wrote:
> Maybe I don't get it, but you increment the opcode by one, but you never
> increment the opcode in sk_run_filter() - do I miss something? Did you
> test your patch (a trivial tcpdump rule should be sufficient)?

I added a comment line.

Changli Gao wrote:
> > +		struct sock_filter *ftest = &filter[pc];
>
> Why move the define here?

To suppress the compiler's warning about mixed declarations and code.

> > +		u16 code = ftest->code;
> >
> > +		if (code >= ARRAY_SIZE(codes))
> > +			return 0;
>
> return -EINVAL;

Fixed in v2. Thanks.

> But how about this:
>
> enum {
> 	BPF_S_RET_K = 1,

If BPF_S_* were only for kernel internal use, we would not need to
translate from the beginning, because only net/core/filter.c uses
BPF_S_*. But BPF_S_* are exposed to userspace via
/usr/include/linux/filter.h since 2.6.36. Is it really safe to change
them?

Filesize change (x86_32) by this patch:
  gcc 3.3.5: 7184 -> 5060
  gcc 4.4.3: 7972 -> 5588
----------------------------------------
From b8777ab64bc31dbbe499eb62c2ffd29add7e79c8 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 17 Nov 2010 09:46:33 +0900
Subject: [PATCH v3] filter: Optimize instruction revalidation code.

Since repeating a u16 value to u8 value conversion using switch()
clause's case statements is wasteful, this patch introduces a u16 to u8
mapping table and removes most of the case statements. As a result, the
size of net/core/filter.o is reduced by about 29% on x86.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/core/filter.c |  231 +++++++++++++++++------------------------------------
 1 files changed, 72 insertions(+), 159 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 23e9b2a..03dc071 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
  */
 int sk_chk_filter(struct sock_filter *filter, int flen)
 {
-	struct sock_filter *ftest;
+	/*
+	 * Valid instructions are initialized to non-0.
+	 * Invalid instructions are initialized to 0.
+	 */
+	static const u8 codes[] = {
+		[BPF_ALU|BPF_ADD|BPF_K] = BPF_S_ALU_ADD_K + 1,
+		[BPF_ALU|BPF_ADD|BPF_X] = BPF_S_ALU_ADD_X + 1,
+		[BPF_ALU|BPF_SUB|BPF_K] = BPF_S_ALU_SUB_K + 1,
+		[BPF_ALU|BPF_SUB|BPF_X] = BPF_S_ALU_SUB_X + 1,
+		[BPF_ALU|BPF_MUL|BPF_K] = BPF_S_ALU_MUL_K + 1,
+		[BPF_ALU|BPF_MUL|BPF_X] = BPF_S_ALU_MUL_X + 1,
+		[BPF_ALU|BPF_DIV|BPF_X] = BPF_S_ALU_DIV_X + 1,
+		[BPF_ALU|BPF_AND|BPF_K] = BPF_S_ALU_AND_K + 1,
+		[BPF_ALU|BPF_AND|BPF_X] = BPF_S_ALU_AND_X + 1,
+		[BPF_ALU|BPF_OR|BPF_K] = BPF_S_ALU_OR_K + 1,
+		[BPF_ALU|BPF_OR|BPF_X] = BPF_S_ALU_OR_X + 1,
+		[BPF_ALU|BPF_LSH|BPF_K] = BPF_S_ALU_LSH_K + 1,
+		[BPF_ALU|BPF_LSH|BPF_X] = BPF_S_ALU_LSH_X + 1,
+		[BPF_ALU|BPF_RSH|BPF_K] = BPF_S_ALU_RSH_K + 1,
+		[BPF_ALU|BPF_RSH|BPF_X] = BPF_S_ALU_RSH_X + 1,
+		[BPF_ALU|BPF_NEG] = BPF_S_ALU_NEG + 1,
+		[BPF_LD|BPF_W|BPF_ABS] = BPF_S_LD_W_ABS + 1,
+		[BPF_LD|BPF_H|BPF_ABS] = BPF_S_LD_H_ABS + 1,
+		[BPF_LD|BPF_B|BPF_ABS] = BPF_S_LD_B_ABS + 1,
+		[BPF_LD|BPF_W|BPF_LEN] = BPF_S_LD_W_LEN + 1,
+		[BPF_LD|BPF_W|BPF_IND] = BPF_S_LD_W_IND + 1,
+		[BPF_LD|BPF_H|BPF_IND] = BPF_S_LD_H_IND + 1,
+		[BPF_LD|BPF_B|BPF_IND] = BPF_S_LD_B_IND + 1,
+		[BPF_LD|BPF_IMM] = BPF_S_LD_IMM + 1,
+		[BPF_LDX|BPF_W|BPF_LEN] = BPF_S_LDX_W_LEN + 1,
+		[BPF_LDX|BPF_B|BPF_MSH] = BPF_S_LDX_B_MSH + 1,
+		[BPF_LDX|BPF_IMM] = BPF_S_LDX_IMM + 1,
+		[BPF_MISC|BPF_TAX] = BPF_S_MISC_TAX + 1,
+		[BPF_MISC|BPF_TXA] = BPF_S_MISC_TXA + 1,
+		[BPF_RET|BPF_K] = BPF_S_RET_K + 1,
+		[BPF_RET|BPF_A] = BPF_S_RET_A + 1,
+		[BPF_ALU|BPF_DIV|BPF_K] = BPF_S_ALU_DIV_K + 1,
+		[BPF_LD|BPF_MEM] = BPF_S_LD_MEM + 1,
+		[BPF_LDX|BPF_MEM] = BPF_S_LDX_MEM + 1,
+		[BPF_ST] = BPF_S_ST + 1,
+		[BPF_STX] = BPF_S_STX + 1,
+		[BPF_JMP|BPF_JA] = BPF_S_JMP_JA + 1,
+		[BPF_JMP|BPF_JEQ|BPF_K] = BPF_S_JMP_JEQ_K + 1,
+		[BPF_JMP|BPF_JEQ|BPF_X] = BPF_S_JMP_JEQ_X + 1,
+		[BPF_JMP|BPF_JGE|BPF_K] = BPF_S_JMP_JGE_K + 1,
+		[BPF_JMP|BPF_JGE|BPF_X] = BPF_S_JMP_JGE_X + 1,
+		[BPF_JMP|BPF_JGT|BPF_K] = BPF_S_JMP_JGT_K + 1,
+		[BPF_JMP|BPF_JGT|BPF_X] = BPF_S_JMP_JGT_X + 1,
+		[BPF_JMP|BPF_JSET|BPF_K] = BPF_S_JMP_JSET_K + 1,
+		[BPF_JMP|BPF_JSET|BPF_X] = BPF_S_JMP_JSET_X + 1,
+	};
 	int pc;
 
 	if (flen == 0 || flen > BPF_MAXINSNS)
@@ -391,136 +441,31 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 
 	/* check the filter code now */
 	for (pc = 0; pc < flen; pc++) {
-		ftest = &filter[pc];
-
-		/* Only allow valid instructions */
-		switch (ftest->code) {
-		case BPF_ALU|BPF_ADD|BPF_K:
-			ftest->code = BPF_S_ALU_ADD_K;
-			break;
-		case BPF_ALU|BPF_ADD|BPF_X:
-			ftest->code = BPF_S_ALU_ADD_X;
-			break;
-		case BPF_ALU|BPF_SUB|BPF_K:
-			ftest->code = BPF_S_ALU_SUB_K;
-			break;
-		case BPF_ALU|BPF_SUB|BPF_X:
-			ftest->code = BPF_S_ALU_SUB_X;
-			break;
-		case BPF_ALU|BPF_MUL|BPF_K:
-			ftest->code = BPF_S_ALU_MUL_K;
-			break;
-		case BPF_ALU|BPF_MUL|BPF_X:
-			ftest->code = BPF_S_ALU_MUL_X;
-			break;
-		case BPF_ALU|BPF_DIV|BPF_X:
-			ftest->code = BPF_S_ALU_DIV_X;
-			break;
-		case BPF_ALU|BPF_AND|BPF_K:
-			ftest->code = BPF_S_ALU_AND_K;
-			break;
-		case BPF_ALU|BPF_AND|BPF_X:
-			ftest->code = BPF_S_ALU_AND_X;
-			break;
-		case BPF_ALU|BPF_OR|BPF_K:
-			ftest->code = BPF_S_ALU_OR_K;
-			break;
-		case BPF_ALU|BPF_OR|BPF_X:
-			ftest->code = BPF_S_ALU_OR_X;
-			break;
-		case BPF_ALU|BPF_LSH|BPF_K:
-			ftest->code = BPF_S_ALU_LSH_K;
-			break;
-		case BPF_ALU|BPF_LSH|BPF_X:
-			ftest->code = BPF_S_ALU_LSH_X;
-			break;
-		case BPF_ALU|BPF_RSH|BPF_K:
-			ftest->code = BPF_S_ALU_RSH_K;
-			break;
-		case BPF_ALU|BPF_RSH|BPF_X:
-			ftest->code = BPF_S_ALU_RSH_X;
-			break;
-		case BPF_ALU|BPF_NEG:
-			ftest->code = BPF_S_ALU_NEG;
-			break;
-		case BPF_LD|BPF_W|BPF_ABS:
-			ftest->code = BPF_S_LD_W_ABS;
-			break;
-		case BPF_LD|BPF_H|BPF_ABS:
-			ftest->code = BPF_S_LD_H_ABS;
-			break;
-		case BPF_LD|BPF_B|BPF_ABS:
-			ftest->code = BPF_S_LD_B_ABS;
-			break;
-		case BPF_LD|BPF_W|BPF_LEN:
-			ftest->code = BPF_S_LD_W_LEN;
-			break;
-		case BPF_LD|BPF_W|BPF_IND:
-			ftest->code = BPF_S_LD_W_IND;
-			break;
-		case BPF_LD|BPF_H|BPF_IND:
-			ftest->code = BPF_S_LD_H_IND;
-			break;
-		case BPF_LD|BPF_B|BPF_IND:
-			ftest->code =
BPF_S_LD_B_IND; - break; - case BPF_LD|BPF_IMM: - ftest->code = BPF_S_LD_IMM; - break; - case BPF_LDX|BPF_W|BPF_LEN: - ftest->code = BPF_S_LDX_W_LEN; - break; - case BPF_LDX|BPF_B|BPF_MSH: - ftest->code = BPF_S_LDX_B_MSH; - break; - case BPF_LDX|BPF_IMM: - ftest->code = BPF_S_LDX_IMM; - break; - case BPF_MISC|BPF_TAX: - ftest->code = BPF_S_MISC_TAX; - break; - case BPF_MISC|BPF_TXA: - ftest->code = BPF_S_MISC_TXA; - break; - case BPF_RET|BPF_K: - ftest->code = BPF_S_RET_K; - break; - case BPF_RET|BPF_A: - ftest->code = BPF_S_RET_A; - break; + struct sock_filter *ftest = &filter[pc]; + u16 code = ftest->code; + if (code >= ARRAY_SIZE(codes)) + return -EINVAL; + code = codes[code]; + /* Undo the '+ 1' in codes[] after validation. */ + if (!code--) + return -EINVAL; /* Some instructions need special checks */ - + switch (code) { + case BPF_S_ALU_DIV_K: /* check for division by zero */ - case BPF_ALU|BPF_DIV|BPF_K: if (ftest->k == 0) return -EINVAL; - ftest->code = BPF_S_ALU_DIV_K; break; - - /* check for invalid memory addresses */ - case BPF_LD|BPF_MEM: - if (ftest->k >= BPF_MEMWORDS) - return -EINVAL; - ftest->code = BPF_S_LD_MEM; - break; - case BPF_LDX|BPF_MEM: - if (ftest->k >= BPF_MEMWORDS) - return -EINVAL; - ftest->code = BPF_S_LDX_MEM; - break; - case BPF_ST: - if (ftest->k >= BPF_MEMWORDS) - return -EINVAL; - ftest->code = BPF_S_ST; - break; - case BPF_STX: + case BPF_S_LD_MEM: + case BPF_S_LDX_MEM: + case BPF_S_ST: + case BPF_S_STX: + /* check for invalid memory addresses */ if (ftest->k >= BPF_MEMWORDS) return -EINVAL; - ftest->code = BPF_S_STX; break; - - case BPF_JMP|BPF_JA: + case BPF_S_JMP_JA: /* * Note, the large ftest->k might cause loops. 
* Compare this with conditional jumps below, @@ -528,40 +473,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen) */ if (ftest->k >= (unsigned)(flen-pc-1)) return -EINVAL; - ftest->code = BPF_S_JMP_JA; - break; - - case BPF_JMP|BPF_JEQ|BPF_K: - ftest->code = BPF_S_JMP_JEQ_K; - break; - case BPF_JMP|BPF_JEQ|BPF_X: - ftest->code = BPF_S_JMP_JEQ_X; break; - case BPF_JMP|BPF_JGE|BPF_K: - ftest->code = BPF_S_JMP_JGE_K; - break; - case BPF_JMP|BPF_JGE|BPF_X: - ftest->code = BPF_S_JMP_JGE_X; - break; - case BPF_JMP|BPF_JGT|BPF_K: - ftest->code = BPF_S_JMP_JGT_K; - break; - case BPF_JMP|BPF_JGT|BPF_X: - ftest->code = BPF_S_JMP_JGT_X; - break; - case BPF_JMP|BPF_JSET|BPF_K: - ftest->code = BPF_S_JMP_JSET_K; - break; - case BPF_JMP|BPF_JSET|BPF_X: - ftest->code = BPF_S_JMP_JSET_X; - break; - - default: - return -EINVAL; - } - - /* for conditionals both must be safe */ - switch (ftest->code) { case BPF_S_JMP_JEQ_K: case BPF_S_JMP_JEQ_X: case BPF_S_JMP_JGE_K: @@ -570,10 +482,13 @@ int sk_chk_filter(struct sock_filter *filter, int flen) case BPF_S_JMP_JGT_X: case BPF_S_JMP_JSET_X: case BPF_S_JMP_JSET_K: + /* for conditionals both must be safe */ if (pc + ftest->jt + 1 >= flen || pc + ftest->jf + 1 >= flen) return -EINVAL; + break; } + ftest->code = code; } /* last instruction must be a RET code */ @@ -581,10 +496,8 @@ int sk_chk_filter(struct sock_filter *filter, int flen) case BPF_S_RET_K: case BPF_S_RET_A: return 0; - break; - default: - return -EINVAL; - } + } + return -EINVAL; } EXPORT_SYMBOL(sk_chk_filter); -- 1.7.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-17  1:19 ` [PATCH v3] " Tetsuo Handa
@ 2010-11-17  7:48   ` Eric Dumazet
  2010-11-17  7:54     ` Changli Gao
  2010-11-17  8:06     ` Tetsuo Handa
  2010-11-18 18:58   ` David Miller
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2010-11-17 7:48 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: mjt, davem, drosenberg, hagen, xiaosuo, netdev

On Wednesday 17 November 2010 at 10:19 +0900, Tetsuo Handa wrote:
> Eric Dumazet wrote:
> > I dont understand the problem...
> > Once translated, you have to test the translated code, not the original
> > one ;)
>
> I moved the test to after translation.
>
> > why u16 ?
> >
> > You store translated instructions, so u8 is OK
>
> I chose u16 because the type of filter->code is __u16.
> But I changed it to u8 as you suggested, since the translated code fits
> in a u8.
>
> > Also fix the indentation at the end of sk_chk_filter()
> >
> > You have 3 extra tabulations :
>
> Fixed.
>
> Hagen Paul Pfeifer wrote:
> > Maybe I don't get it, but you increment the opcode by one, but you never
> > increment the opcode in sk_run_filter() - do I miss something? Did you
> > test your patch (a trivial tcpdump rule should be sufficient)?
>
> I added a comment line.
>
> Changli Gao wrote:
> > > +		struct sock_filter *ftest = &filter[pc];
> >
> > Why move the define here?
>
> To suppress the compiler's warning about mixed declarations.
>
> > > +		u16 code = ftest->code;
> > >
> > > +		if (code >= ARRAY_SIZE(codes))
> > > +			return 0;
> >
> > return -EINVAL;
>
> Fixed in v2. Thanks.
>
> > But how about this:
> >
> > enum {
> >         BPF_S_RET_K = 1,
>
> If BPF_S_* are only for kernel internal use, I think we don't need to
> translate in the first place, because only net/core/filter.c uses BPF_S_*.
>
> BPF_S_* have been exposed to userspace via /usr/include/linux/filter.h
> since 2.6.36. Is it really OK to change them?

No problem, and Changli posted a patch to move them to net/core/filter.c
anyway.

> Filesize change (x86_32) by this patch:
>   gcc 3.3.5: 7184 -> 5060
>   gcc 4.4.3: 7972 -> 5588
> ----------------------------------------
> From b8777ab64bc31dbbe499eb62c2ffd29add7e79c8 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 17 Nov 2010 09:46:33 +0900
> Subject: [PATCH v3] filter: Optimize instruction revalidation code.
>
> Since repeating u16 value to u8 value conversion using switch() clause's
> case statement is wasteful, this patch introduces a u16 to u8 mapping
> table and removes most of the case statements. As a result, the size of
> net/core/filter.o is reduced by about 29% on x86.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/core/filter.c | 231 +++++++++++++++++------------------------------
>  1 files changed, 72 insertions(+), 159 deletions(-)

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Please repost it when Changli's patch is accepted by David
(if accepted :)), to get rid of the "+ 1".

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-17  7:48 ` Eric Dumazet
@ 2010-11-17  7:54   ` Changli Gao
  2010-11-17  8:18     ` Eric Dumazet
  1 sibling, 1 reply; 27+ messages in thread
From: Changli Gao @ 2010-11-17 7:54 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Tetsuo Handa, mjt, davem, drosenberg, hagen, netdev

On Wed, Nov 17, 2010 at 3:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Please repost it when Changli's patch is accepted by David
> (if accepted :)), to get rid of the "+ 1"
>

Oh, no need if David accepts this patch first; otherwise there is just
some offset, and patch can handle it. :)

-- 
Regards,
Changli Gao (xiaosuo@gmail.com)

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-17  7:54 ` Changli Gao
@ 2010-11-17  8:18   ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2010-11-17 8:18 UTC (permalink / raw)
To: Changli Gao; +Cc: Tetsuo Handa, mjt, davem, drosenberg, hagen, netdev

On Wednesday 17 November 2010 at 15:54 +0800, Changli Gao wrote:
> On Wed, Nov 17, 2010 at 3:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> >
> > Please repost it when Changli's patch is accepted by David
> > (if accepted :)), to get rid of the "+ 1"
>
> Oh, no need if David accepts this patch first; otherwise there is just
> some offset, and patch can handle it. :)

I was speaking of all the "+ 1" in the codes[] initializer:

+	static const u8 codes[] = {
+		[BPF_ALU|BPF_ADD|BPF_K]  = BPF_S_ALU_ADD_K + 1,
+		[BPF_ALU|BPF_ADD|BPF_X]  = BPF_S_ALU_ADD_X + 1,
+		[BPF_ALU|BPF_SUB|BPF_K]  = BPF_S_ALU_SUB_K + 1,
+		[BPF_ALU|BPF_SUB|BPF_X]  = BPF_S_ALU_SUB_X + 1,
+		[BPF_ALU|BPF_MUL|BPF_K]  = BPF_S_ALU_MUL_K + 1,
+		[BPF_ALU|BPF_MUL|BPF_X]  = BPF_S_ALU_MUL_X + 1,
+		[BPF_ALU|BPF_DIV|BPF_X]  = BPF_S_ALU_DIV_X + 1,
+		[BPF_ALU|BPF_AND|BPF_K]  = BPF_S_ALU_AND_K + 1,
+		[BPF_ALU|BPF_AND|BPF_X]  = BPF_S_ALU_AND_X + 1,
+		[BPF_ALU|BPF_OR|BPF_K]   = BPF_S_ALU_OR_K + 1,
+		[BPF_ALU|BPF_OR|BPF_X]   = BPF_S_ALU_OR_X + 1,
+		[BPF_ALU|BPF_LSH|BPF_K]  = BPF_S_ALU_LSH_K + 1,
+		[BPF_ALU|BPF_LSH|BPF_X]  = BPF_S_ALU_LSH_X + 1,
+		[BPF_ALU|BPF_RSH|BPF_K]  = BPF_S_ALU_RSH_K + 1,
+		[BPF_ALU|BPF_RSH|BPF_X]  = BPF_S_ALU_RSH_X + 1,
+		[BPF_ALU|BPF_NEG]        = BPF_S_ALU_NEG + 1,
+		[BPF_LD|BPF_W|BPF_ABS]   = BPF_S_LD_W_ABS + 1,
+		[BPF_LD|BPF_H|BPF_ABS]   = BPF_S_LD_H_ABS + 1,
+		[BPF_LD|BPF_B|BPF_ABS]   = BPF_S_LD_B_ABS + 1,
+		[BPF_LD|BPF_W|BPF_LEN]   = BPF_S_LD_W_LEN + 1,
+		[BPF_LD|BPF_W|BPF_IND]   = BPF_S_LD_W_IND + 1,
+		[BPF_LD|BPF_H|BPF_IND]   = BPF_S_LD_H_IND + 1,
+		[BPF_LD|BPF_B|BPF_IND]   = BPF_S_LD_B_IND + 1,
+		[BPF_LD|BPF_IMM]         = BPF_S_LD_IMM + 1,
+		[BPF_LDX|BPF_W|BPF_LEN]  = BPF_S_LDX_W_LEN + 1,
+		[BPF_LDX|BPF_B|BPF_MSH]  = BPF_S_LDX_B_MSH + 1,
+		[BPF_LDX|BPF_IMM]        = BPF_S_LDX_IMM + 1,
+		[BPF_MISC|BPF_TAX]       = BPF_S_MISC_TAX + 1,
+		[BPF_MISC|BPF_TXA]       = BPF_S_MISC_TXA + 1,
+		[BPF_RET|BPF_K]          = BPF_S_RET_K + 1,
+		[BPF_RET|BPF_A]          = BPF_S_RET_A + 1,
+		[BPF_ALU|BPF_DIV|BPF_K]  = BPF_S_ALU_DIV_K + 1,
+		[BPF_LD|BPF_MEM]         = BPF_S_LD_MEM + 1,
+		[BPF_LDX|BPF_MEM]        = BPF_S_LDX_MEM + 1,
+		[BPF_ST]                 = BPF_S_ST + 1,
+		[BPF_STX]                = BPF_S_STX + 1,
+		[BPF_JMP|BPF_JA]         = BPF_S_JMP_JA + 1,
+		[BPF_JMP|BPF_JEQ|BPF_K]  = BPF_S_JMP_JEQ_K + 1,
+		[BPF_JMP|BPF_JEQ|BPF_X]  = BPF_S_JMP_JEQ_X + 1,
+		[BPF_JMP|BPF_JGE|BPF_K]  = BPF_S_JMP_JGE_K + 1,
+		[BPF_JMP|BPF_JGE|BPF_X]  = BPF_S_JMP_JGE_X + 1,
+		[BPF_JMP|BPF_JGT|BPF_K]  = BPF_S_JMP_JGT_K + 1,
+		[BPF_JMP|BPF_JGT|BPF_X]  = BPF_S_JMP_JGT_X + 1,
+		[BPF_JMP|BPF_JSET|BPF_K] = BPF_S_JMP_JSET_K + 1,
+		[BPF_JMP|BPF_JSET|BPF_X] = BPF_S_JMP_JSET_X + 1,
+	};

This was needed because the translated instructions begin at 0.
If we change them to start at offset 1, we can avoid all the "+ 1",
and also avoid the slightly ugly:

	if (!code--)
		...

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-17  7:48 ` Eric Dumazet
  2010-11-17  7:54   ` Changli Gao
@ 2010-11-17  8:06   ` Tetsuo Handa
  2010-11-17  9:01     ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2010-11-17 8:06 UTC (permalink / raw)
To: eric.dumazet; +Cc: mjt, davem, drosenberg, hagen, xiaosuo, netdev

Eric Dumazet wrote:
> > > But how about this:
> > >
> > > enum {
> > >         BPF_S_RET_K = 1,
> >
> > If BPF_S_* are only for kernel internal use, I think we don't need to
> > translate in the first place, because only net/core/filter.c uses
> > BPF_S_*.
> >
> > BPF_S_* have been exposed to userspace via /usr/include/linux/filter.h
> > since 2.6.36. Is it really OK to change them?
>
> No problem, and Changli posted a patch to move them to net/core/filter.c
> anyway.

> Please repost it when Changli's patch is accepted by David
> (if accepted :)), to get rid of the "+ 1"

But if Changli's patch is accepted, we can remove BPF_S_* entirely, like
below:

 	switch (fentry->code) {
-	case BPF_S_ALU_ADD_X:
+	case BPF_ALU|BPF_ADD|BPF_X: /* BPF_S_ALU_ADD_X */
 		A += X;
 		continue;
-	case BPF_S_ALU_ADD_K:
+	case BPF_ALU|BPF_ADD|BPF_K: /* BPF_S_ALU_ADD_K */
 		A += f_k;
 		continue;
-	case BPF_S_ALU_SUB_X:
+	case BPF_ALU|BPF_SUB|BPF_X: /* BPF_S_ALU_SUB_X */
 		A -= X;
 		continue;
-	case BPF_S_ALU_SUB_K:
+	case BPF_ALU|BPF_SUB|BPF_K: /* BPF_S_ALU_SUB_K */
 		A -= f_k;
 		continue;

If compilers can generate better code for contiguous case values than for
discontinuous case values, then keeping BPF_S_* makes sense.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-17  8:06 ` Tetsuo Handa
@ 2010-11-17  9:01   ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 27+ messages in thread
From: Hagen Paul Pfeifer @ 2010-11-17 9:01 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: eric.dumazet, mjt, davem, drosenberg, xiaosuo, netdev

On Wed, 17 Nov 2010 17:06:10 +0900, Tetsuo Handa wrote:

> If compilers can generate better code for contiguous case values
> than discontinuous case values, then keeping BPF_S_* makes sense.

Yes, this is crucial. A compiler can only generate an O(1) jump table for
dense labels. The BPF_S_* translation was done precisely to convert to a
dense label construct, so removing the translation is a no-go. Look at the
log and read my comment for a detailed explanation.

Anyway: "code--" decrements the _copied_ value, not the value in the
evaluated opcode itself. I will validate the patch!

Hagen

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] filter: Optimize instruction revalidation code.
  2010-11-17  1:19 ` [PATCH v3] " Tetsuo Handa
  2010-11-17  7:48   ` Eric Dumazet
@ 2010-11-18 18:58   ` David Miller
  1 sibling, 0 replies; 27+ messages in thread
From: David Miller @ 2010-11-18 18:58 UTC (permalink / raw)
To: penguin-kernel; +Cc: mjt, drosenberg, hagen, xiaosuo, netdev

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 17 Nov 2010 10:19:51 +0900

> Subject: [PATCH v3] filter: Optimize instruction revalidation code.
>
> Since repeating u16 value to u8 value conversion using switch() clause's
> case statement is wasteful, this patch introduces a u16 to u8 mapping
> table and removes most of the case statements. As a result, the size of
> net/core/filter.o is reduced by about 29% on x86.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Applied, thank you.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] filter: Optimize instruction revalidation code.
  2010-11-16 13:08 ` [PATCH] filter: Optimize instruction revalidation code Tetsuo Handa
  2010-11-16 13:11   ` Michael Tokarev
  2010-11-16 13:44   ` Eric Dumazet
@ 2010-11-16 22:13   ` Hagen Paul Pfeifer
  2010-11-16 23:31     ` Changli Gao
  2010-11-16 23:24   ` Changli Gao
  3 siblings, 1 reply; 27+ messages in thread
From: Hagen Paul Pfeifer @ 2010-11-16 22:13 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: davem, eric.dumazet, drosenberg, netdev

* Tetsuo Handa | 2010-11-16 22:08:50 [+0900]:

> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
>   */
>  int sk_chk_filter(struct sock_filter *filter, int flen)
>  {
> -	struct sock_filter *ftest;
> +	/*
> +	 * Valid instructions are initialized to non-0.
> +	 * Invalid instructions are initialized to 0.
> +	 */
> +	static u16 codes[] = {
> +		[BPF_ALU|BPF_ADD|BPF_K]  = BPF_S_ALU_ADD_K + 1,
> +		[BPF_ALU|BPF_ADD|BPF_X]  = BPF_S_ALU_ADD_X + 1,
> +		[BPF_ALU|BPF_SUB|BPF_K]  = BPF_S_ALU_SUB_K + 1,
[...]

Maybe I don't get it, but you increment the opcode by one, but you never
increment the opcode in sk_run_filter() - do I miss something? Did you
test your patch (a trivial tcpdump rule should be sufficient)?

If this question is answered (or fixed):

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>

HGN

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] filter: Optimize instruction revalidation code.
  2010-11-16 22:13 ` [PATCH] " Hagen Paul Pfeifer
@ 2010-11-16 23:31   ` Changli Gao
  2010-11-16 23:45     ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 27+ messages in thread
From: Changli Gao @ 2010-11-16 23:31 UTC (permalink / raw)
To: Hagen Paul Pfeifer; +Cc: Tetsuo Handa, davem, eric.dumazet, drosenberg, netdev

On Wed, Nov 17, 2010 at 6:13 AM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Tetsuo Handa | 2010-11-16 22:08:50 [+0900]:
>
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter);
> >   */
> >  int sk_chk_filter(struct sock_filter *filter, int flen)
> >  {
> > -	struct sock_filter *ftest;
> > +	/*
> > +	 * Valid instructions are initialized to non-0.
> > +	 * Invalid instructions are initialized to 0.
> > +	 */
> > +	static u16 codes[] = {
> > +		[BPF_ALU|BPF_ADD|BPF_K]  = BPF_S_ALU_ADD_K + 1,
> > +		[BPF_ALU|BPF_ADD|BPF_X]  = BPF_S_ALU_ADD_X + 1,
> > +		[BPF_ALU|BPF_SUB|BPF_K]  = BPF_S_ALU_SUB_K + 1,
> [...]
>
> Maybe I don't get it, but you increment the opcode by one, but you never
> increment the opcode in sk_run_filter() - do I miss something? Did you
> test your patch (a trivial tcpdump rule should be sufficient)?
>

+	code = codes[code];
+	if (!code--)
+		return -EINVAL;

But how about this:

enum {
	BPF_S_RET_K = 1,

-- 
Regards,
Changli Gao (xiaosuo@gmail.com)

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] filter: Optimize instruction revalidation code.
  2010-11-16 23:31 ` Changli Gao
@ 2010-11-16 23:45   ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 27+ messages in thread
From: Hagen Paul Pfeifer @ 2010-11-16 23:45 UTC (permalink / raw)
To: Changli Gao; +Cc: Tetsuo Handa, davem, eric.dumazet, drosenberg, netdev

* Changli Gao | 2010-11-17 07:31:51 [+0800]:

> > Maybe I don't get it, but you increment the opcode by one, but you never
> > increment the opcode in sk_run_filter() - do I miss something? Did you
> > test your patch (a trivial tcpdump rule should be sufficient)?
> >
>
> +	code = codes[code];
> +	if (!code--)
> +		return -EINVAL;

Right: the copy is decremented temporarily in sk_chk_filter(), but as I
wrote earlier, not in sk_run_filter().

> But how about this:
>
> enum {
>         BPF_S_RET_K = 1,

Better.

Best regards, Hagen

BTW: you can verify your code by constructing some artificial filter rules
via

	struct sock_filter {
		uint16_t code;	/* Actual filter code */
		uint8_t  jt;	/* Jump true */
		uint8_t  jf;	/* Jump false */
		uint32_t k;	/* Generic multiuse field */
	};

and attaching them to a socket with setsockopt().

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] filter: Optimize instruction revalidation code. 2010-11-16 13:08 ` [PATCH] filter: Optimize instruction revalidation code Tetsuo Handa ` (2 preceding siblings ...) 2010-11-16 22:13 ` [PATCH] " Hagen Paul Pfeifer @ 2010-11-16 23:24 ` Changli Gao 3 siblings, 0 replies; 27+ messages in thread From: Changli Gao @ 2010-11-16 23:24 UTC (permalink / raw) To: Tetsuo Handa; +Cc: davem, eric.dumazet, drosenberg, netdev On Tue, Nov 16, 2010 at 9:08 PM, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > In the wake of commit 57fe93b3 "filter: make sure filters dont read > uninitialized memory", I came up with this patch. I don't know how to test. > Please review and do tests. > > This patch optimizes sk_chk_filter(). Using gcc 4.1.2 on x86_32, the size of > net/core/filter.o changed from 7416 bytes to 5260 bytes > ("make allnoconfig" + CONFIG_NET=y). > > By the way, if "struct sock_filter *filter" argument of sk_run_filter() does > not change after it was revalidated at sk_chk_filter(), can't we detect and > reject "BPF_S_LD_MEM/BPF_S_LDX_MEM before BPF_S_ST/BPF_S_STX" at > sk_chk_filter()? > ---------------------------------------- > From 2bf36130bd4912590b409b47a6a9a82e2884e035 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Tue, 16 Nov 2010 21:39:50 +0900 > Subject: [PATCH] filter: Optimize instruction revalidation code. > > Since repeating small value to small value conversion using switch() clause's > case statement is wasteful, this patch instroduces u16 to u16 mapping table > and removes most of case statements. As a result, the size of net/core/filter.o > is reduced by about 30% on x86. 
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > --- > net/core/filter.c | 214 ++++++++++++++++------------------------------------- > 1 files changed, 65 insertions(+), 149 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 23e9b2a..85be3d8 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -383,7 +383,57 @@ EXPORT_SYMBOL(sk_run_filter); > */ > int sk_chk_filter(struct sock_filter *filter, int flen) > { > - struct sock_filter *ftest; > + /* > + * Valid instructions are initialized to non-0. > + * Invalid instructions are initialized to 0. > + */ > + static u16 codes[] = { > + [BPF_ALU|BPF_ADD|BPF_K] = BPF_S_ALU_ADD_K + 1, > + [BPF_ALU|BPF_ADD|BPF_X] = BPF_S_ALU_ADD_X + 1, > + [BPF_ALU|BPF_SUB|BPF_K] = BPF_S_ALU_SUB_K + 1, > + [BPF_ALU|BPF_SUB|BPF_X] = BPF_S_ALU_SUB_X + 1, > + [BPF_ALU|BPF_MUL|BPF_K] = BPF_S_ALU_MUL_K + 1, > + [BPF_ALU|BPF_MUL|BPF_X] = BPF_S_ALU_MUL_X + 1, > + [BPF_ALU|BPF_DIV|BPF_X] = BPF_S_ALU_DIV_X + 1, > + [BPF_ALU|BPF_AND|BPF_K] = BPF_S_ALU_AND_K + 1, > + [BPF_ALU|BPF_AND|BPF_X] = BPF_S_ALU_AND_X + 1, > + [BPF_ALU|BPF_OR|BPF_K] = BPF_S_ALU_OR_K + 1, > + [BPF_ALU|BPF_OR|BPF_X] = BPF_S_ALU_OR_X + 1, > + [BPF_ALU|BPF_LSH|BPF_K] = BPF_S_ALU_LSH_K + 1, > + [BPF_ALU|BPF_LSH|BPF_X] = BPF_S_ALU_LSH_X + 1, > + [BPF_ALU|BPF_RSH|BPF_K] = BPF_S_ALU_RSH_K + 1, > + [BPF_ALU|BPF_RSH|BPF_X] = BPF_S_ALU_RSH_X + 1, > + [BPF_ALU|BPF_NEG] = BPF_S_ALU_NEG + 1, > + [BPF_LD|BPF_W|BPF_ABS] = BPF_S_LD_W_ABS + 1, > + [BPF_LD|BPF_H|BPF_ABS] = BPF_S_LD_H_ABS + 1, > + [BPF_LD|BPF_B|BPF_ABS] = BPF_S_LD_B_ABS + 1, > + [BPF_LD|BPF_W|BPF_LEN] = BPF_S_LD_W_LEN + 1, > + [BPF_LD|BPF_W|BPF_IND] = BPF_S_LD_W_IND + 1, > + [BPF_LD|BPF_H|BPF_IND] = BPF_S_LD_H_IND + 1, > + [BPF_LD|BPF_B|BPF_IND] = BPF_S_LD_B_IND + 1, > + [BPF_LD|BPF_IMM] = BPF_S_LD_IMM + 1, > + [BPF_LDX|BPF_W|BPF_LEN] = BPF_S_LDX_W_LEN + 1, > + [BPF_LDX|BPF_B|BPF_MSH] = BPF_S_LDX_B_MSH + 1, > + [BPF_LDX|BPF_IMM] = BPF_S_LDX_IMM + 1, > + [BPF_MISC|BPF_TAX] = 
BPF_S_MISC_TAX + 1, > + [BPF_MISC|BPF_TXA] = BPF_S_MISC_TXA + 1, > + [BPF_RET|BPF_K] = BPF_S_RET_K + 1, > + [BPF_RET|BPF_A] = BPF_S_RET_A + 1, > + [BPF_ALU|BPF_DIV|BPF_K] = BPF_S_ALU_DIV_K + 1, > + [BPF_LD|BPF_MEM] = BPF_S_LD_MEM + 1, > + [BPF_LDX|BPF_MEM] = BPF_S_LDX_MEM + 1, > + [BPF_ST] = BPF_S_ST + 1, > + [BPF_STX] = BPF_S_STX + 1, > + [BPF_JMP|BPF_JA] = BPF_S_JMP_JA + 1, > + [BPF_JMP|BPF_JEQ|BPF_K] = BPF_S_JMP_JEQ_K + 1, > + [BPF_JMP|BPF_JEQ|BPF_X] = BPF_S_JMP_JEQ_X + 1, > + [BPF_JMP|BPF_JGE|BPF_K] = BPF_S_JMP_JGE_K + 1, > + [BPF_JMP|BPF_JGE|BPF_X] = BPF_S_JMP_JGE_X + 1, > + [BPF_JMP|BPF_JGT|BPF_K] = BPF_S_JMP_JGT_K + 1, > + [BPF_JMP|BPF_JGT|BPF_X] = BPF_S_JMP_JGT_X + 1, > + [BPF_JMP|BPF_JSET|BPF_K] = BPF_S_JMP_JSET_K + 1, > + [BPF_JMP|BPF_JSET|BPF_X] = BPF_S_JMP_JSET_X + 1, > + }; > int pc; > > if (flen == 0 || flen > BPF_MAXINSNS) > @@ -391,135 +441,26 @@ int sk_chk_filter(struct sock_filter *filter, int flen) > > /* check the filter code now */ > for (pc = 0; pc < flen; pc++) { > - ftest = &filter[pc]; > - > - /* Only allow valid instructions */ > - switch (ftest->code) { > - case BPF_ALU|BPF_ADD|BPF_K: > - ftest->code = BPF_S_ALU_ADD_K; > - break; > - case BPF_ALU|BPF_ADD|BPF_X: > - ftest->code = BPF_S_ALU_ADD_X; > - break; > - case BPF_ALU|BPF_SUB|BPF_K: > - ftest->code = BPF_S_ALU_SUB_K; > - break; > - case BPF_ALU|BPF_SUB|BPF_X: > - ftest->code = BPF_S_ALU_SUB_X; > - break; > - case BPF_ALU|BPF_MUL|BPF_K: > - ftest->code = BPF_S_ALU_MUL_K; > - break; > - case BPF_ALU|BPF_MUL|BPF_X: > - ftest->code = BPF_S_ALU_MUL_X; > - break; > - case BPF_ALU|BPF_DIV|BPF_X: > - ftest->code = BPF_S_ALU_DIV_X; > - break; > - case BPF_ALU|BPF_AND|BPF_K: > - ftest->code = BPF_S_ALU_AND_K; > - break; > - case BPF_ALU|BPF_AND|BPF_X: > - ftest->code = BPF_S_ALU_AND_X; > - break; > - case BPF_ALU|BPF_OR|BPF_K: > - ftest->code = BPF_S_ALU_OR_K; > - break; > - case BPF_ALU|BPF_OR|BPF_X: > - ftest->code = BPF_S_ALU_OR_X; > - break; > - case BPF_ALU|BPF_LSH|BPF_K: > - ftest->code 
= BPF_S_ALU_LSH_K; > - break; > - case BPF_ALU|BPF_LSH|BPF_X: > - ftest->code = BPF_S_ALU_LSH_X; > - break; > - case BPF_ALU|BPF_RSH|BPF_K: > - ftest->code = BPF_S_ALU_RSH_K; > - break; > - case BPF_ALU|BPF_RSH|BPF_X: > - ftest->code = BPF_S_ALU_RSH_X; > - break; > - case BPF_ALU|BPF_NEG: > - ftest->code = BPF_S_ALU_NEG; > - break; > - case BPF_LD|BPF_W|BPF_ABS: > - ftest->code = BPF_S_LD_W_ABS; > - break; > - case BPF_LD|BPF_H|BPF_ABS: > - ftest->code = BPF_S_LD_H_ABS; > - break; > - case BPF_LD|BPF_B|BPF_ABS: > - ftest->code = BPF_S_LD_B_ABS; > - break; > - case BPF_LD|BPF_W|BPF_LEN: > - ftest->code = BPF_S_LD_W_LEN; > - break; > - case BPF_LD|BPF_W|BPF_IND: > - ftest->code = BPF_S_LD_W_IND; > - break; > - case BPF_LD|BPF_H|BPF_IND: > - ftest->code = BPF_S_LD_H_IND; > - break; > - case BPF_LD|BPF_B|BPF_IND: > - ftest->code = BPF_S_LD_B_IND; > - break; > - case BPF_LD|BPF_IMM: > - ftest->code = BPF_S_LD_IMM; > - break; > - case BPF_LDX|BPF_W|BPF_LEN: > - ftest->code = BPF_S_LDX_W_LEN; > - break; > - case BPF_LDX|BPF_B|BPF_MSH: > - ftest->code = BPF_S_LDX_B_MSH; > - break; > - case BPF_LDX|BPF_IMM: > - ftest->code = BPF_S_LDX_IMM; > - break; > - case BPF_MISC|BPF_TAX: > - ftest->code = BPF_S_MISC_TAX; > - break; > - case BPF_MISC|BPF_TXA: > - ftest->code = BPF_S_MISC_TXA; > - break; > - case BPF_RET|BPF_K: > - ftest->code = BPF_S_RET_K; > - break; > - case BPF_RET|BPF_A: > - ftest->code = BPF_S_RET_A; > - break; > + struct sock_filter *ftest = &filter[pc]; Why move the define here? 
> + u16 code = ftest->code; > > + if (code >= ARRAY_SIZE(codes)) > + return 0; return -EINVAL; > /* Some instructions need special checks */ > - > - /* check for division by zero */ > + switch (code) { > case BPF_ALU|BPF_DIV|BPF_K: > + /* check for division by zero */ > if (ftest->k == 0) > return -EINVAL; > - ftest->code = BPF_S_ALU_DIV_K; > break; > - > - /* check for invalid memory addresses */ > case BPF_LD|BPF_MEM: > - if (ftest->k >= BPF_MEMWORDS) > - return -EINVAL; > - ftest->code = BPF_S_LD_MEM; > - break; > case BPF_LDX|BPF_MEM: > - if (ftest->k >= BPF_MEMWORDS) > - return -EINVAL; > - ftest->code = BPF_S_LDX_MEM; > - break; > case BPF_ST: > - if (ftest->k >= BPF_MEMWORDS) > - return -EINVAL; > - ftest->code = BPF_S_ST; > - break; > case BPF_STX: > + /* check for invalid memory addresses */ > if (ftest->k >= BPF_MEMWORDS) > return -EINVAL; > - ftest->code = BPF_S_STX; > break; > - > case BPF_JMP|BPF_JA: > /* > * Note, the large ftest->k might cause loops. > @@ -528,40 +469,14 @@ int sk_chk_filter(struct sock_filter *filter, int flen) > */ > if (ftest->k >= (unsigned)(flen-pc-1)) > return -EINVAL; > - ftest->code = BPF_S_JMP_JA; > - break; > - > - case BPF_JMP|BPF_JEQ|BPF_K: > - ftest->code = BPF_S_JMP_JEQ_K; > - break; > - case BPF_JMP|BPF_JEQ|BPF_X: > - ftest->code = BPF_S_JMP_JEQ_X; > - break; > - case BPF_JMP|BPF_JGE|BPF_K: > - ftest->code = BPF_S_JMP_JGE_K; > - break; > - case BPF_JMP|BPF_JGE|BPF_X: > - ftest->code = BPF_S_JMP_JGE_X; > - break; > - case BPF_JMP|BPF_JGT|BPF_K: > - ftest->code = BPF_S_JMP_JGT_K; > - break; > - case BPF_JMP|BPF_JGT|BPF_X: > - ftest->code = BPF_S_JMP_JGT_X; > break; > - case BPF_JMP|BPF_JSET|BPF_K: > - ftest->code = BPF_S_JMP_JSET_K; > - break; > - case BPF_JMP|BPF_JSET|BPF_X: > - ftest->code = BPF_S_JMP_JSET_X; > - break; > - > - default: > - return -EINVAL; > } > - > - /* for conditionals both must be safe */ > - switch (ftest->code) { > + code = codes[code]; > + if (!code--) > + return -EINVAL; > + > + /* for 
conditionals both must be safe */ > + switch (code) { > case BPF_S_JMP_JEQ_K: > case BPF_S_JMP_JEQ_X: > case BPF_S_JMP_JGE_K: > @@ -574,6 +489,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen) > pc + ftest->jf + 1 >= flen) > return -EINVAL; > } > + ftest->code = code; > } > > /* last instruction must be a RET code */ > -- > 1.6.1 > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Regards, Changli Gao(xiaosuo@gmail.com) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
@ 2010-11-10 18:25 Dan Rosenberg
0 siblings, 0 replies; 27+ messages in thread
From: Dan Rosenberg @ 2010-11-10 18:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev, stable, security
Apologies for the terseness, I'm not looking for a fight.
I agree the memset is too expensive, and that Eric's fix is far better.
-Dan
------Original Message------
From: David Miller
To: drosenberg@vsecurity.com
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org
Cc: security@kernel.org
Subject: Re: [PATCH] Prevent reading uninitialized memory with socketfilters
Sent: Nov 10, 2010 1:21 PM
From: "Dan Rosenberg" <drosenberg@vsecurity.com>
Date: Wed, 10 Nov 2010 18:18:08 +0000
> The code sample I linked to clearly demonstrates exactly how to
> accomplish this, if you had bothered to read it.
I told you why I didn't read it, if you had bothered to read my
reply properly :-)
Anyways, I realize we have to do something, but memset() is going
to completely kill performance. I consider Eric's suggestion the
closest to acceptable cost at this point but even that is hard
to digest for me.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] Prevent reading uninitialized memory with socket filters
@ 2010-11-09 22:28 Dan Rosenberg
  2010-11-10  5:28 ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Rosenberg @ 2010-11-09 22:28 UTC (permalink / raw)
To: netdev; +Cc: stable, security
The "mem" array used as scratch space for socket filters is not
initialized, allowing unprivileged users to leak kernel stack bytes.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
---
 net/core/filter.c |    2 ++
 1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 7beaec3..2749ba0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -121,6 +121,8 @@ unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int
     int k;
     int pc;
+    memset(mem, 0, sizeof(mem));
+
     /*
      * Process array of filter instructions.
      */
^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socket filters
  2010-11-09 22:28 [PATCH] Prevent reading uninitialized memory with socket filters Dan Rosenberg
@ 2010-11-10  5:28 ` David Miller
  2010-11-10  5:53   ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2010-11-10  5:28 UTC (permalink / raw)
To: drosenberg; +Cc: netdev, stable, security
From: Dan Rosenberg <drosenberg@vsecurity.com>
Date: Tue, 09 Nov 2010 17:28:44 -0500
> The "mem" array used as scratch space for socket filters is not
> initialized, allowing unprivileged users to leak kernel stack bytes.
>
> Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Prove it.
^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socket filters
  2010-11-10  5:28 ` David Miller
@ 2010-11-10  5:53   ` Eric Dumazet
  2010-11-10  7:22     ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2010-11-10  5:53 UTC (permalink / raw)
To: David Miller; +Cc: drosenberg, netdev, stable, security
Le mardi 09 novembre 2010 à 21:28 -0800, David Miller a écrit :
> From: Dan Rosenberg <drosenberg@vsecurity.com>
> Date: Tue, 09 Nov 2010 17:28:44 -0500
>
> > The "mem" array used as scratch space for socket filters is not
> > initialized, allowing unprivileged users to leak kernel stack bytes.
> >
> > Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
>
> Prove it.
And once done, add the checks in sk_chk_filter() ?
Allow a load of mem[X] only if a prior store of mem[X] is proven.
^ permalink raw reply	[flat|nested] 27+ messages in thread
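Eric's suggestion here — allow a load of mem[X] only if a prior store of mem[X] is proven — is straightforward for straight-line programs. The sketch below is illustrative only (toy opcode and struct names, not the kernel's): it walks the filter once and tracks stored slots in a bitmap. Real BPF has jumps, so a store may sit on one path and be skipped on another, which is why the thread later calls the full check hard.

```c
#include <assert.h>
#include <stdbool.h>

#define MEMWORDS 16                         /* stands in for BPF_MEMWORDS */

enum toy_op { TOY_ST, TOY_LD, TOY_OTHER };  /* toy opcodes, not BPF's */

struct toy_insn {
    enum toy_op op;
    unsigned int k;                         /* scratch slot index */
};

/* Reject a load of mem[k] unless a store to mem[k] appeared earlier in
 * program order.  A single bitmap records which slots have been stored. */
static bool loads_follow_stores(const struct toy_insn *prog, int len)
{
    unsigned long stored = 0;
    int pc;

    for (pc = 0; pc < len; pc++) {
        unsigned int k = prog[pc].k;

        switch (prog[pc].op) {
        case TOY_ST:
            if (k >= MEMWORDS)
                return false;
            stored |= 1UL << k;
            break;
        case TOY_LD:
            if (k >= MEMWORDS || !(stored & (1UL << k)))
                return false;
            break;
        default:
            break;
        }
    }
    return true;
}
```

Note this linear scan is conservative in the other direction once jumps exist: a filter that stores under one branch and loads under the same branch is valid, but proving it requires per-path analysis — the "might fail on some valid filters" concern Eric raises next.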
* Re: [PATCH] Prevent reading uninitialized memory with socket filters
  2010-11-10  5:53 ` Eric Dumazet
@ 2010-11-10  7:22   ` Eric Dumazet
  2010-11-10 14:25     ` [PATCH] Prevent reading uninitialized memory with socketfilters Tetsuo Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2010-11-10  7:22 UTC (permalink / raw)
To: David Miller; +Cc: drosenberg, netdev, stable, security
Le mercredi 10 novembre 2010 à 06:53 +0100, Eric Dumazet a écrit :
> Le mardi 09 novembre 2010 à 21:28 -0800, David Miller a écrit :
> > From: Dan Rosenberg <drosenberg@vsecurity.com>
> > Date: Tue, 09 Nov 2010 17:28:44 -0500
> >
> > > The "mem" array used as scratch space for socket filters is not
> > > initialized, allowing unprivileged users to leak kernel stack bytes.
> > >
> > > Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
> >
> > Prove it.
>
> And once done, add the checks in sk_chk_filter() ?
>
> Allow a load of mem[X] only if a prior store of mem[X] is proven.
This seems complex, and might fail on some valid filters.
What about the following patch then ?
[PATCH] filter: make sure filters dont read uninitialized memory
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.
Initializing mem[] array, like Dan Rosenberg suggested in his patch is
expensive since most filters dont even use this array.
Its hard to make the filter validation in sk_chk_filter(), because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.
For other filters, additional cost is a single instruction.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/filter.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 7beaec3..4d84dc2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -117,10 +117,12 @@ unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int
     u32 A = 0;                 /* Accumulator */
     u32 X = 0;                 /* Index Register */
     u32 mem[BPF_MEMWORDS];     /* Scratch Memory Store */
+    unsigned long memvalid = 0;
     u32 tmp;
     int k;
     int pc;
+    BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
     /*
      * Process array of filter instructions.
      */
@@ -264,10 +266,12 @@ load_b:
         X = fentry->k;
         continue;
     case BPF_S_LD_MEM:
-        A = mem[fentry->k];
+        A = (memvalid & (1UL << fentry->k)) ?
+            mem[fentry->k] : 0;
         continue;
     case BPF_S_LDX_MEM:
-        X = mem[fentry->k];
+        X = (memvalid & (1UL << fentry->k)) ?
+            mem[fentry->k] : 0;
         continue;
     case BPF_S_MISC_TAX:
         X = A;
@@ -280,9 +284,11 @@ load_b:
     case BPF_S_RET_A:
         return A;
     case BPF_S_ST:
+        memvalid |= 1UL << fentry->k;
         mem[fentry->k] = A;
         continue;
     case BPF_S_STX:
+        memvalid |= 1UL << fentry->k;
         mem[fentry->k] = X;
         continue;
     default:
^ permalink raw reply related	[flat|nested] 27+ messages in thread
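The memvalid mechanism in Eric's patch can be modeled in isolation (the names and struct below are simplified stand-ins, not the kernel code): a read of a scratch slot that was never written during this run yields 0 instead of whatever stale data the slot happens to hold.

```c
#include <assert.h>
#include <stdint.h>

#define MEMWORDS 16     /* stands in for BPF_MEMWORDS; must fit one long */

struct scratch {
    uint32_t mem[MEMWORDS];
    unsigned long memvalid;     /* bit k set => mem[k] was stored */
};

/* Store: mark the slot valid, then write it. */
static void st(struct scratch *s, unsigned int k, uint32_t v)
{
    s->memvalid |= 1UL << k;
    s->mem[k] = v;
}

/* Load: only slots marked valid are readable; others read as 0. */
static uint32_t ld(const struct scratch *s, unsigned int k)
{
    return (s->memvalid & (1UL << k)) ? s->mem[k] : 0;
}
```

This is why the patch is cheap for typical filters: a filter that never touches mem[] pays only the single `memvalid = 0` initialization, while the masking cost lands only on the (rare) load/store opcodes.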
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10  7:22 ` Eric Dumazet
@ 2010-11-10 14:25   ` Tetsuo Handa
  2010-11-10 18:32     ` David Miller
  2010-11-10 18:39     ` David Miller
  0 siblings, 2 replies; 27+ messages in thread
From: Tetsuo Handa @ 2010-11-10 14:25 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
Just I thought...
> unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
> {
>     struct sock_filter *fentry; /* We walk down these */
Can't this be "const struct sock_filter *"?
> (...snipped...)
>     for (pc = 0; pc < flen; pc++) {
>         fentry = &filter[pc];
Can't we do
    u32 f_k = fentry->k;
and replace 27 repetition of fentry->k with f_k?
^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 14:25 ` [PATCH] Prevent reading uninitialized memory with socketfilters Tetsuo Handa
@ 2010-11-10 18:32   ` David Miller
  2010-11-10 18:39   ` David Miller
  1 sibling, 0 replies; 27+ messages in thread
From: David Miller @ 2010-11-10 18:32 UTC (permalink / raw)
To: penguin-kernel; +Cc: eric.dumazet, netdev
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 10 Nov 2010 23:25:08 +0900
> Just I thought...
>
>> unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
>> {
>>     struct sock_filter *fentry; /* We walk down these */
> Can't this be "const struct sock_filter *"?
>> (...snipped...)
>>     for (pc = 0; pc < flen; pc++) {
>>         fentry = &filter[pc];
> Can't we do
>     u32 f_k = fentry->k;
> and replace 27 repetition of fentry->k with f_k?
Yes, this feedback seems reasonable, I'll make these changes when I
apply Eric's patch, thanks!
^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 14:25 ` [PATCH] Prevent reading uninitialized memory with socketfilters Tetsuo Handa
  2010-11-10 18:32 ` David Miller
@ 2010-11-10 18:39 ` David Miller
  2010-11-10 20:57   ` Ben Hutchings
  1 sibling, 1 reply; 27+ messages in thread
From: David Miller @ 2010-11-10 18:39 UTC (permalink / raw)
To: penguin-kernel; +Cc: eric.dumazet, netdev
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 10 Nov 2010 23:25:08 +0900
> Just I thought...
>
>> unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
>> {
>>     struct sock_filter *fentry; /* We walk down these */
> Can't this be "const struct sock_filter *"?
>> (...snipped...)
>>     for (pc = 0; pc < flen; pc++) {
>>         fentry = &filter[pc];
> Can't we do
>     u32 f_k = fentry->k;
> and replace 27 repetition of fentry->k with f_k?
Ok, here is what I'm adding to net-2.6 (and will submit to -stable),
if anyone can spot any silly errors let me know now.
Thanks!
--------------------
filter: make sure filters dont read uninitialized memory
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.
Initializing mem[] array, like Dan Rosenberg suggested in his patch is
expensive since most filters dont even use this array.
Its hard to make the filter validation in sk_chk_filter(), because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.
For other filters, additional cost is a single instruction.
[ Since we access fentry->k a lot now, cache it in a local variable
  and mark filter entry pointer as const.
  -DaveM ]
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/filter.c |   64 +++++++++++++++++++++++++++++------------------
 1 files changed, 35 insertions(+), 29 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 7beaec3..23e9b2a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -112,39 +112,41 @@ EXPORT_SYMBOL(sk_filter);
  */
 unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int flen)
 {
-    struct sock_filter *fentry; /* We walk down these */
     void *ptr;
     u32 A = 0;                 /* Accumulator */
     u32 X = 0;                 /* Index Register */
     u32 mem[BPF_MEMWORDS];     /* Scratch Memory Store */
+    unsigned long memvalid = 0;
     u32 tmp;
     int k;
     int pc;
+    BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
     /*
      * Process array of filter instructions.
      */
     for (pc = 0; pc < flen; pc++) {
-        fentry = &filter[pc];
+        const struct sock_filter *fentry = &filter[pc];
+        u32 f_k = fentry->k;
         switch (fentry->code) {
         case BPF_S_ALU_ADD_X:
             A += X;
             continue;
         case BPF_S_ALU_ADD_K:
-            A += fentry->k;
+            A += f_k;
             continue;
         case BPF_S_ALU_SUB_X:
             A -= X;
             continue;
         case BPF_S_ALU_SUB_K:
-            A -= fentry->k;
+            A -= f_k;
             continue;
         case BPF_S_ALU_MUL_X:
             A *= X;
             continue;
         case BPF_S_ALU_MUL_K:
-            A *= fentry->k;
+            A *= f_k;
             continue;
         case BPF_S_ALU_DIV_X:
             if (X == 0)
@@ -152,49 +154,49 @@ unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int
             A /= X;
             continue;
         case BPF_S_ALU_DIV_K:
-            A /= fentry->k;
+            A /= f_k;
             continue;
         case BPF_S_ALU_AND_X:
             A &= X;
             continue;
         case BPF_S_ALU_AND_K:
-            A &= fentry->k;
+            A &= f_k;
             continue;
         case BPF_S_ALU_OR_X:
             A |= X;
             continue;
         case BPF_S_ALU_OR_K:
-            A |= fentry->k;
+            A |= f_k;
             continue;
         case BPF_S_ALU_LSH_X:
             A <<= X;
             continue;
         case BPF_S_ALU_LSH_K:
-            A <<= fentry->k;
+            A <<= f_k;
             continue;
         case BPF_S_ALU_RSH_X:
             A >>= X;
             continue;
         case BPF_S_ALU_RSH_K:
-            A >>= fentry->k;
+            A >>= f_k;
             continue;
         case BPF_S_ALU_NEG:
             A = -A;
             continue;
         case BPF_S_JMP_JA:
-            pc += fentry->k;
+            pc += f_k;
             continue;
         case BPF_S_JMP_JGT_K:
-            pc += (A > fentry->k) ? fentry->jt : fentry->jf;
+            pc += (A > f_k) ? fentry->jt : fentry->jf;
             continue;
         case BPF_S_JMP_JGE_K:
-            pc += (A >= fentry->k) ? fentry->jt : fentry->jf;
+            pc += (A >= f_k) ? fentry->jt : fentry->jf;
             continue;
         case BPF_S_JMP_JEQ_K:
-            pc += (A == fentry->k) ? fentry->jt : fentry->jf;
+            pc += (A == f_k) ? fentry->jt : fentry->jf;
             continue;
         case BPF_S_JMP_JSET_K:
-            pc += (A & fentry->k) ? fentry->jt : fentry->jf;
+            pc += (A & f_k) ? fentry->jt : fentry->jf;
             continue;
         case BPF_S_JMP_JGT_X:
             pc += (A > X) ? fentry->jt : fentry->jf;
@@ -209,7 +211,7 @@ unsigned int sk_run_filter(struct sk_buff *skb, struct sock_filter *filter, int
             pc += (A & X) ? fentry->jt : fentry->jf;
             continue;
         case BPF_S_LD_W_ABS:
-            k = fentry->k;
+            k = f_k;
 load_w:
             ptr = load_pointer(skb, k, 4, &tmp);
             if (ptr != NULL) {
@@ -218,7 +220,7 @@ load_w:
             }
             break;
         case BPF_S_LD_H_ABS:
-            k = fentry->k;
+            k = f_k;
 load_h:
             ptr = load_pointer(skb, k, 2, &tmp);
             if (ptr != NULL) {
@@ -227,7 +229,7 @@ load_h:
             }
             break;
         case BPF_S_LD_B_ABS:
-            k = fentry->k;
+            k = f_k;
 load_b:
             ptr = load_pointer(skb, k, 1, &tmp);
             if (ptr != NULL) {
@@ -242,32 +244,34 @@ load_b:
             X = skb->len;
             continue;
         case BPF_S_LD_W_IND:
-            k = X + fentry->k;
+            k = X + f_k;
             goto load_w;
         case BPF_S_LD_H_IND:
-            k = X + fentry->k;
+            k = X + f_k;
             goto load_h;
         case BPF_S_LD_B_IND:
-            k = X + fentry->k;
+            k = X + f_k;
             goto load_b;
         case BPF_S_LDX_B_MSH:
-            ptr = load_pointer(skb, fentry->k, 1, &tmp);
+            ptr = load_pointer(skb, f_k, 1, &tmp);
             if (ptr != NULL) {
                 X = (*(u8 *)ptr & 0xf) << 2;
                 continue;
             }
             return 0;
         case BPF_S_LD_IMM:
-            A = fentry->k;
+            A = f_k;
             continue;
         case BPF_S_LDX_IMM:
-            X = fentry->k;
+            X = f_k;
             continue;
         case BPF_S_LD_MEM:
-            A = mem[fentry->k];
+            A = (memvalid & (1UL << f_k)) ?
+                mem[f_k] : 0;
             continue;
         case BPF_S_LDX_MEM:
-            X = mem[fentry->k];
+            X = (memvalid & (1UL << f_k)) ?
+                mem[f_k] : 0;
             continue;
         case BPF_S_MISC_TAX:
             X = A;
@@ -276,14 +280,16 @@ load_b:
             A = X;
             continue;
         case BPF_S_RET_K:
-            return fentry->k;
+            return f_k;
         case BPF_S_RET_A:
             return A;
         case BPF_S_ST:
-            mem[fentry->k] = A;
+            memvalid |= 1UL << f_k;
+            mem[f_k] = A;
             continue;
         case BPF_S_STX:
-            mem[fentry->k] = X;
+            memvalid |= 1UL << f_k;
+            mem[f_k] = X;
             continue;
         default:
             WARN_ON(1);
-- 
1.7.3.2
^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 18:39 ` David Miller
@ 2010-11-10 20:57   ` Ben Hutchings
  2010-11-10 20:59     ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Hutchings @ 2010-11-10 20:57 UTC (permalink / raw)
To: David Miller; +Cc: penguin-kernel, eric.dumazet, netdev
On Wed, 2010-11-10 at 10:39 -0800, David Miller wrote:
[...]
> In this patch, I use a bitmap (a single long var) so that only filters
> using mem[] loads/stores pay the price of added security checks.
>
> For other filters, additional cost is a single instruction.
>
> [ Since we access fentry->k a lot now, cache it in a local variable
>   and mark filter entry pointer as const. -DaveM ]
[...]
I don't see the justification for combining these changes. One patch,
one fix, right?
Ben.
-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 20:57 ` Ben Hutchings
@ 2010-11-10 20:59   ` David Miller
  2010-11-10 21:25     ` Ben Hutchings
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2010-11-10 20:59 UTC (permalink / raw)
To: bhutchings; +Cc: penguin-kernel, eric.dumazet, netdev
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 10 Nov 2010 20:57:44 +0000
> On Wed, 2010-11-10 at 10:39 -0800, David Miller wrote:
> [...]
>> In this patch, I use a bitmap (a single long var) so that only filters
>> using mem[] loads/stores pay the price of added security checks.
>>
>> For other filters, additional cost is a single instruction.
>>
>> [ Since we access fentry->k a lot now, cache it in a local variable
>>   and mark filter entry pointer as const. -DaveM ]
> [...]
>
> I don't see the justification for combining these changes. One patch,
> one fix, right?
I'm minimizing the performance impact of the new bitmap checks.
^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] Prevent reading uninitialized memory with socketfilters
  2010-11-10 20:59 ` David Miller
@ 2010-11-10 21:25   ` Ben Hutchings
  0 siblings, 0 replies; 27+ messages in thread
From: Ben Hutchings @ 2010-11-10 21:25 UTC (permalink / raw)
To: David Miller; +Cc: penguin-kernel, eric.dumazet, netdev
On Wed, 2010-11-10 at 12:59 -0800, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Wed, 10 Nov 2010 20:57:44 +0000
>
> > On Wed, 2010-11-10 at 10:39 -0800, David Miller wrote:
> > [...]
> >> In this patch, I use a bitmap (a single long var) so that only filters
> >> using mem[] loads/stores pay the price of added security checks.
> >>
> >> For other filters, additional cost is a single instruction.
> >>
> >> [ Since we access fentry->k a lot now, cache it in a local variable
> >>   and mark filter entry pointer as const. -DaveM ]
> > [...]
> >
> > I don't see the justification for combining these changes. One patch,
> > one fix, right?
>
> I'm minimizing the performance impact of the new bitmap checks.
This seems like an entirely separate optimisation, since fentry->k was
*already* being used all over the place. (And a smart compiler should
optimise that anyway... though I realise gcc is often not that smart.)
Ben.
-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply	[flat|nested] 27+ messages in thread