* [RFC PATCH 1/7] build: Add '-lm' build flag
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 2/7] smdb.py: remove undocumented test command Andrew Murray
` (6 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
Adding this flag avoids the following link error:
libsparse.a(hashtable.o): In function `create_hashtable':
/home/andmur01/smatch/cwchash/hashtable.c:51: undefined reference to `ceilf'
libsparse.a(hashtable.o): In function `hashtable_expand':
/home/andmur01/smatch/cwchash/hashtable.c:124: undefined reference to `ceilf'
collect2: error: ld returned 1 exit status
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index dc37ae2fe566..4da1ecbbdbb5 100644
--- a/Makefile
+++ b/Makefile
@@ -252,7 +252,7 @@ SMATCH_SCRIPTS=smatch_scripts/add_gfp_to_allocations.sh \
smatch_scripts/trace_params.pl smatch_scripts/unlocked_paths.pl \
smatch_scripts/whitespace_only.sh smatch_scripts/wine_checker.sh \
-SMATCH_LDFLAGS := -lsqlite3 -lssl -lcrypto
+SMATCH_LDFLAGS := -lsqlite3 -lssl -lcrypto -lm
smatch: smatch.o $(SMATCH_FILES) $(SMATCH_CHECKS) $(LIBS)
$(QUIET_LINK)$(LD) -o $@ $< $(SMATCH_FILES) $(SMATCH_CHECKS) $(LIBS) $(SMATCH_LDFLAGS)
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 2/7] smdb.py: remove undocumented test command
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 1/7] build: Add '-lm' build flag Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 3/7] arm64: add check for comparison against tagged address Andrew Murray
` (5 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
The 'test' command, which has the same functionality as the call_info
command, is undocumented and isn't used, let's remove this.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
smatch_data/db/smdb.py | 4 ----
1 file changed, 4 deletions(-)
diff --git a/smatch_data/db/smdb.py b/smatch_data/db/smdb.py
index 9025a95070b5..80c2df59af01 100755
--- a/smatch_data/db/smdb.py
+++ b/smatch_data/db/smdb.py
@@ -677,9 +677,5 @@ elif sys.argv[1] == "constraint":
struct_type = sys.argv[2]
member = sys.argv[3]
constraint(struct_type, member)
-elif sys.argv[1] == "test":
- filename = sys.argv[2]
- func = sys.argv[3]
- caller_info_values(filename, func)
else:
usage()
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 1/7] build: Add '-lm' build flag Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 2/7] smdb.py: remove undocumented test command Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-07 15:49 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 4/7] smdb.py: add find_tagged and parse_warns_tagged commands Andrew Murray
` (4 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
The ARM64 ABI allows tagged memory addresses to be passed through the
user-kernel syscall ABI boundary. Tagged memory addresses are those which
contain a non-zero top byte - the hardware will always ignore this top
byte, however software does not.
This check will provide a warning when it detects that a comparison is
being made between user originated 64 bit data (where the top byte may
be non-zero) and any variable which may contain an untagged address.
Untagged variables are detected by looking for hard-coded known struct
members (such as vm_start, vm_end and addr_limit) and hard-coded known
macros (such as PAGE_SIZE, PAGE_MASK and TASK_SIZE). This check is
also able to detect when comparisons are made against variables that
have been assigned from these known untagged variables, though this
tracking is limited to the scope of the function.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
check_arm64_tagged.c | 255 +++++++++++++++++++++++++++++++++++++++++++
check_list.h | 2 +
2 files changed, 257 insertions(+)
create mode 100644 check_arm64_tagged.c
diff --git a/check_arm64_tagged.c b/check_arm64_tagged.c
new file mode 100644
index 000000000000..e126552e5964
--- /dev/null
+++ b/check_arm64_tagged.c
@@ -0,0 +1,255 @@
+/*
+ * Copyright (C) 2019 ARM.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see http://www.gnu.org/copyleft/gpl.txt
+ */
+
+#include "smatch.h"
+#include "smatch_extra.h"
+#include "smatch_function_hashtable.h"
+
+static bool expr_has_memory_addr(struct expression *expr);
+
+static DEFINE_HASHTABLE_SEARCH(search_symbol, char, char);
+static DEFINE_HASHTABLE_INSERT(insert_symbol, char, char);
+static struct hashtable *symbols;
+
+static void match_assign(struct expression *expr)
+{
+ char *left_name;
+ struct symbol *left_sym;
+
+ left_name = expr_to_var_sym(expr->left, &left_sym);
+ if (!left_name || !left_sym)
+ return;
+
+ /*
+ * Once we have spotted a symbol of interest (one that may hold
+ * an untagged memory address), we keep track of any assignments
+ * made, such that we can also treat the assigned symbol as something
+ * of interest. This tracking is limited in scope to the function.
+ */
+ if (expr_has_memory_addr(expr->right))
+ insert_symbol(symbols, left_name, left_name);
+}
+
+static void match_endfunc(struct symbol *sym)
+{
+ destroy_function_hashtable(symbols);
+ symbols = create_function_hashtable(4000);
+}
+
+static bool expr_has_untagged_symbol(struct expression *expr)
+{
+ char *name;
+ struct symbol *sym;
+
+ if (expr->type != EXPR_SYMBOL)
+ return false;
+
+ name = expr_to_var_sym(expr, &sym);
+ if (!name || !sym)
+ return false;
+
+ /* See if this is something we already know is of interest */
+ if (search_symbol(symbols, name))
+ return true;
+
+ return false;
+}
+
+static bool expr_has_untagged_member(struct expression *expr)
+{
+ if (expr->type != EXPR_DEREF)
+ return false;
+
+ if (!strcmp(expr->member->name, "vm_start") ||
+ !strcmp(expr->member->name, "vm_end") ||
+ !strcmp(expr->member->name, "addr_limit"))
+ return true;
+
+ return false;
+}
+
+static bool expr_has_macro_with_name(struct expression *expr, const char *macro_name)
+{
+ char *name;
+
+ name = get_macro_name(expr->pos);
+ return (name && !strcmp(name, macro_name));
+}
+
+static bool expr_has_untagged_macro(struct expression *expr)
+{
+ if (expr_has_macro_with_name(expr, "PAGE_SIZE") ||
+ expr_has_macro_with_name(expr, "PAGE_MASK") ||
+ expr_has_macro_with_name(expr, "TASK_SIZE"))
+ return true;
+
+ /**
+ * We can't detect a marco (such as PAGE_MASK) inside another macro
+ * such as offset_in_page, therefore we have to detect the outer macro
+ * instead.
+ */
+ if (expr_has_macro_with_name(expr, "offset_in_page"))
+ return true;
+
+ return false;
+}
+
+/*
+ * Identify expressions that contain memory addresses, in the future
+ * we may use annotations on symbols or function parameters.
+ */
+static bool expr_has_memory_addr(struct expression *expr)
+{
+ if (expr->type == EXPR_PREOP || expr->type == EXPR_POSTOP)
+ expr = strip_expr(expr->unop);
+
+ if (expr_has_untagged_member(expr))
+ return true;
+
+ if (expr_has_untagged_macro(expr))
+ return true;
+
+ if (expr_has_untagged_symbol(expr))
+ return true;
+
+ return false;
+}
+
+int rl_is_larger_or_equal(struct range_list *rl, sval_t sval)
+{
+ struct data_range *tmp;
+
+ FOR_EACH_PTR(rl, tmp) {
+ if (sval_cmp(tmp->max, sval) >= 0)
+ return 1;
+ } END_FOR_EACH_PTR(tmp);
+ return 0;
+}
+
+int rl_range_has_min_value(struct range_list *rl, sval_t sval)
+{
+ struct data_range *tmp;
+
+ FOR_EACH_PTR(rl, tmp) {
+ if (!sval_cmp(tmp->min, sval)) {
+ return 1;
+ }
+ } END_FOR_EACH_PTR(tmp);
+ return 0;
+}
+
+static bool rl_is_tagged(struct range_list *rl)
+{
+ sval_t invalid = { .type = &ullong_ctype, .value = (1ULL << 56) };
+ sval_t invalid_kernel = { .type = &ullong_ctype, .value = (0xff8ULL << 52) };
+
+ /*
+ * We only care for tagged addresses, thus ignore anything where the
+ * ranges of potential values cannot possibly have any of the top byte
+ * bits set.
+ */
+ if (!rl_is_larger_or_equal(rl, invalid))
+ return false;
+
+ /*
+ * Tagged addresses are untagged in the kernel by using sign_extend64 in
+ * the untagged_addr macro. For userspace addresses bit 55 will always
+ * be 0 and thus this has the effect of clearing the top byte. However
+ * for kernel addresses this is not true and the top bits end up set to
+ * all 1s. The untagged_addr macro results in leaving a gap in the range
+ * of possible values which can exist, thus let's look for a tell-tale
+ * range which starts from (0xff8ULL << 52).
+ */
+ if (rl_range_has_min_value(rl, invalid_kernel))
+ return false;
+
+ return true;
+}
+
+static void match_condition(struct expression *expr)
+{
+ struct range_list *rl = NULL;
+ struct expression *val = NULL;
+ struct symbol *type;
+ char *var_name;
+
+ /*
+ * Match instances where something is compared against something
+ * else - we include binary operators as these are commonly used
+ * to make a comparison, e.g. if (start & ~PAGE_MASK).
+ */
+ if (expr->type != EXPR_COMPARE &&
+ expr->type != EXPR_BINOP)
+ return;
+
+ /*
+ * Look on both sides of the comparison for something that shouldn't
+ * be compared with a tagged address, e.g. macros such as PAGE_MASK
+ * or struct members named .vm_start.
+ */
+ if (expr_has_memory_addr(expr->left))
+ val = expr->right;
+
+ /*
+ * The macro 'offset_in_page' has the PAGE_MASK macro inside it, this
+ * results in 'expr_has_memory_addr' returning true for both sides. To
+ * work around this we assume PAGE_MASK (or similar) is on the right
+ * side, thus we do the following test last.
+ */
+ if (expr_has_memory_addr(expr->right))
+ val = expr->left;
+
+ if (!val)
+ return;
+
+ /* We only care about memory addresses which are 64 bits */
+ type = get_type(val);
+ if (!type || type_bits(type) != 64)
+ return;
+
+ /* We only care for comparison against user originated data */
+ if (!get_user_rl(val, &rl))
+ return;
+
+ /* We only care for tagged addresses */
+ if (!rl_is_tagged(rl))
+ return;
+
+ /* Finally, we believe we may have spotted a risky comparison */
+ var_name = expr_to_var(val);
+ if (var_name)
+ sm_warning("comparison of a potentially tagged address (%s, %d, %s)", get_function(), get_param_num(val), var_name);
+}
+
+void check_arm64_tagged(int id)
+{
+ char *arch;
+
+ if (option_project != PROJ_KERNEL)
+ return;
+
+ /* Limit to aarch64 */
+ arch = getenv("ARCH");
+ if (!arch || strcmp(arch, "arm64"))
+ return;
+
+ symbols = create_function_hashtable(4000);
+
+ add_hook(&match_assign, ASSIGNMENT_HOOK);
+ add_hook(&match_condition, CONDITION_HOOK);
+ add_hook(&match_endfunc, END_FUNC_HOOK);
+}
diff --git a/check_list.h b/check_list.h
index e085cea4317d..4c298286ea9a 100644
--- a/check_list.h
+++ b/check_list.h
@@ -195,6 +195,8 @@ CK(check_implicit_dependencies)
CK(check_wine_filehandles)
CK(check_wine_WtoA)
+CK(check_arm64_tagged)
+
#include "check_list_local.h"
CK(register_scope)
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-10-07 15:35 ` [RFC PATCH 3/7] arm64: add check for comparison against tagged address Andrew Murray
@ 2019-10-07 15:49 ` Andrew Murray
2019-12-04 15:41 ` Andrew Murray
2019-12-05 13:27 ` Dan Carpenter
0 siblings, 2 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:49 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
On Mon, Oct 07, 2019 at 04:35:41PM +0100, Andrew Murray wrote:
> The ARM64 ABI allows tagged memory addresses to be passed through the
> user-kernel syscall ABI boundary. Tagged memory addresses are those which
> contain a non-zero top byte - the hardware will always ignore this top
> byte, however software does not.
>
> This check will provide a warning when it detects that a comparison is
> being made between user originated 64 bit data (where the top byte may
> be non-zero) and any variable which may contain an untagged address.
>
> Untagged variables are detected by looking for hard-coded known struct
> members (such as vm_start, vm_end and addr_limit) and hard-coded known
> macros (such as PAGE_SIZE, PAGE_MASK and TASK_SIZE). This check is
> also able to detect when comparisons are made against variables that
> have been assigned from these known untagged variables, though this
> tracking is limited to the scope of the function.
>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
> check_arm64_tagged.c | 255 +++++++++++++++++++++++++++++++++++++++++++
> check_list.h | 2 +
> 2 files changed, 257 insertions(+)
> create mode 100644 check_arm64_tagged.c
>
> diff --git a/check_arm64_tagged.c b/check_arm64_tagged.c
> new file mode 100644
> index 000000000000..e126552e5964
> --- /dev/null
> +++ b/check_arm64_tagged.c
> @@ -0,0 +1,255 @@
> +/*
> + * Copyright (C) 2019 ARM.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see http://www.gnu.org/copyleft/gpl.txt
> + */
> +
> +#include "smatch.h"
> +#include "smatch_extra.h"
> +#include "smatch_function_hashtable.h"
> +
> +static bool expr_has_memory_addr(struct expression *expr);
> +
> +static DEFINE_HASHTABLE_SEARCH(search_symbol, char, char);
> +static DEFINE_HASHTABLE_INSERT(insert_symbol, char, char);
> +static struct hashtable *symbols;
> +
> +static void match_assign(struct expression *expr)
> +{
> + char *left_name;
> + struct symbol *left_sym;
> +
> + left_name = expr_to_var_sym(expr->left, &left_sym);
> + if (!left_name || !left_sym)
> + return;
> +
> + /*
> + * Once we have spotted a symbol of interest (one that may hold
> + * an untagged memory address), we keep track of any assignments
> + * made, such that we can also treat the assigned symbol as something
> + * of interest. This tracking is limited in scope to the function.
> + */
> + if (expr_has_memory_addr(expr->right))
> + insert_symbol(symbols, left_name, left_name);
> +}
The purpose of this is to work around issues presented in this type of code:
unsigned long start = vma->vm_start;
if (start & ~PAGE_MASK)
;
Smatch will detect that vm_start is an untagged address, however unless it can
learn that start is also an untagged address then it won't pick up the potential
bug with line 2 where we compare a tagged address with an untagged address. The
current mplementation will track these simple assignments but the scope is
limited only to the current function.
A potential solution is to give Smatch a flag for each symbol that indicates if a
variable is untagged, tagged, or unknown. This flag should be propagated across
assignments (and similar) and stored in the database (to allow for knowledge of
the tagged flag across function calls). This may be similar to the existing
support in Smatch for USER_DATA. Does this seem a good fit?
> +
> +static void match_endfunc(struct symbol *sym)
> +{
> + destroy_function_hashtable(symbols);
> + symbols = create_function_hashtable(4000);
> +}
> +
> +static bool expr_has_untagged_symbol(struct expression *expr)
> +{
> + char *name;
> + struct symbol *sym;
> +
> + if (expr->type != EXPR_SYMBOL)
> + return false;
> +
> + name = expr_to_var_sym(expr, &sym);
> + if (!name || !sym)
> + return false;
> +
> + /* See if this is something we already know is of interest */
> + if (search_symbol(symbols, name))
> + return true;
> +
> + return false;
> +}
> +
> +static bool expr_has_untagged_member(struct expression *expr)
> +{
> + if (expr->type != EXPR_DEREF)
> + return false;
> +
> + if (!strcmp(expr->member->name, "vm_start") ||
> + !strcmp(expr->member->name, "vm_end") ||
> + !strcmp(expr->member->name, "addr_limit"))
> + return true;
> +
> + return false;
> +}
> +
> +static bool expr_has_macro_with_name(struct expression *expr, const char *macro_name)
> +{
> + char *name;
> +
> + name = get_macro_name(expr->pos);
> + return (name && !strcmp(name, macro_name));
> +}
> +
> +static bool expr_has_untagged_macro(struct expression *expr)
> +{
> + if (expr_has_macro_with_name(expr, "PAGE_SIZE") ||
> + expr_has_macro_with_name(expr, "PAGE_MASK") ||
> + expr_has_macro_with_name(expr, "TASK_SIZE"))
> + return true;
> +
> + /**
> + * We can't detect a marco (such as PAGE_MASK) inside another macro
> + * such as offset_in_page, therefore we have to detect the outer macro
> + * instead.
> + */
> + if (expr_has_macro_with_name(expr, "offset_in_page"))
> + return true;
> +
> + return false;
> +}
> +
> +/*
> + * Identify expressions that contain memory addresses, in the future
> + * we may use annotations on symbols or function parameters.
> + */
> +static bool expr_has_memory_addr(struct expression *expr)
> +{
> + if (expr->type == EXPR_PREOP || expr->type == EXPR_POSTOP)
> + expr = strip_expr(expr->unop);
> +
> + if (expr_has_untagged_member(expr))
> + return true;
> +
> + if (expr_has_untagged_macro(expr))
> + return true;
> +
> + if (expr_has_untagged_symbol(expr))
> + return true;
We hardcode symbols of interest that we consider to be untagged addresses. This
provides good coverage but isn't very flexible. A better approach would be to
annotate the kernel with address space tags, such as is the case for __user,
__percpu, etc. Thus variables, struct members and function parameters could be
annotated to indicate that they contain untagged addresses. Unfortunately:
- At present it's not possible to determine a struct member's address space
from Smatch
- It's not clear how you would annotate a macro such as PAGE_MASK (Smatch
already has difficulty spotting a macro inside a macro, e.g. PAGE_MASK
inside offset_in_page).
Is there a solution for this?
> +
> + return false;
> +}
> +
> +int rl_is_larger_or_equal(struct range_list *rl, sval_t sval)
> +{
> + struct data_range *tmp;
> +
> + FOR_EACH_PTR(rl, tmp) {
> + if (sval_cmp(tmp->max, sval) >= 0)
> + return 1;
> + } END_FOR_EACH_PTR(tmp);
> + return 0;
> +}
> +
> +int rl_range_has_min_value(struct range_list *rl, sval_t sval)
> +{
> + struct data_range *tmp;
> +
> + FOR_EACH_PTR(rl, tmp) {
> + if (!sval_cmp(tmp->min, sval)) {
> + return 1;
> + }
> + } END_FOR_EACH_PTR(tmp);
> + return 0;
> +}
> +
> +static bool rl_is_tagged(struct range_list *rl)
> +{
> + sval_t invalid = { .type = &ullong_ctype, .value = (1ULL << 56) };
> + sval_t invalid_kernel = { .type = &ullong_ctype, .value = (0xff8ULL << 52) };
> +
> + /*
> + * We only care for tagged addresses, thus ignore anything where the
> + * ranges of potential values cannot possibly have any of the top byte
> + * bits set.
> + */
> + if (!rl_is_larger_or_equal(rl, invalid))
> + return false;
> +
> + /*
> + * Tagged addresses are untagged in the kernel by using sign_extend64 in
> + * the untagged_addr macro. For userspace addresses bit 55 will always
> + * be 0 and thus this has the effect of clearing the top byte. However
> + * for kernel addresses this is not true and the top bits end up set to
> + * all 1s. The untagged_addr macro results in leaving a gap in the range
> + * of possible values which can exist, thus let's look for a tell-tale
> + * range which starts from (0xff8ULL << 52).
> + */
Addresses have their tags removed with the untagged_addr macro - this uses
sign_extend64. For userspace addresses, which have bit 55 always set to 0, this
macro has the effect of clearing the top byte. This isn't true for kernel
addresses. Smatch is able to understand the untagged_addr macro and as a result
adjusts the range of values a given variable that passes through it may have.
This makes it difficult to detect untagged addresses. We would typically detect
an untagged address if all the following conditions are true:
- The value comes from userspace (known by Smatch's USER_DATA tracking)
- The value's size is 64 bits
- The value isn't above (1ULL << 55)
However as Smatch doesn't understand that user addresses likely won't have bit
55 unset, Smatch knows that the range of values can include values greater than
(1ULL << 55). This is currently worked around by spotting a gap in the list of
ranges which is produced due to the untagged_addr macro. This isn't ideal but it
works.
An alternative approach is to detect functions that call untagged_addr and then
mark the symbol in the database as untagged - but this relies on parameter
tracking.
Thanks,
Andrew Murray
> + if (rl_range_has_min_value(rl, invalid_kernel))
> + return false;
> +
> + return true;
> +}
> +
> +static void match_condition(struct expression *expr)
> +{
> + struct range_list *rl = NULL;
> + struct expression *val = NULL;
> + struct symbol *type;
> + char *var_name;
> +
> + /*
> + * Match instances where something is compared against something
> + * else - we include binary operators as these are commonly used
> + * to make a comparison, e.g. if (start & ~PAGE_MASK).
> + */
> + if (expr->type != EXPR_COMPARE &&
> + expr->type != EXPR_BINOP)
> + return;
> +
> + /*
> + * Look on both sides of the comparison for something that shouldn't
> + * be compared with a tagged address, e.g. macros such as PAGE_MASK
> + * or struct members named .vm_start.
> + */
> + if (expr_has_memory_addr(expr->left))
> + val = expr->right;
> +
> + /*
> + * The macro 'offset_in_page' has the PAGE_MASK macro inside it, this
> + * results in 'expr_has_memory_addr' returning true for both sides. To
> + * work around this we assume PAGE_MASK (or similar) is on the right
> + * side, thus we do the following test last.
> + */
> + if (expr_has_memory_addr(expr->right))
> + val = expr->left;
> +
> + if (!val)
> + return;
> +
> + /* We only care about memory addresses which are 64 bits */
> + type = get_type(val);
> + if (!type || type_bits(type) != 64)
> + return;
> +
> + /* We only care for comparison against user originated data */
> + if (!get_user_rl(val, &rl))
> + return;
> +
> + /* We only care for tagged addresses */
> + if (!rl_is_tagged(rl))
> + return;
> +
> + /* Finally, we believe we may have spotted a risky comparison */
> + var_name = expr_to_var(val);
> + if (var_name)
> + sm_warning("comparison of a potentially tagged address (%s, %d, %s)", get_function(), get_param_num(val), var_name);
> +}
> +
> +void check_arm64_tagged(int id)
> +{
> + char *arch;
> +
> + if (option_project != PROJ_KERNEL)
> + return;
> +
> + /* Limit to aarch64 */
> + arch = getenv("ARCH");
> + if (!arch || strcmp(arch, "arm64"))
> + return;
> +
> + symbols = create_function_hashtable(4000);
> +
> + add_hook(&match_assign, ASSIGNMENT_HOOK);
> + add_hook(&match_condition, CONDITION_HOOK);
> + add_hook(&match_endfunc, END_FUNC_HOOK);
> +}
> diff --git a/check_list.h b/check_list.h
> index e085cea4317d..4c298286ea9a 100644
> --- a/check_list.h
> +++ b/check_list.h
> @@ -195,6 +195,8 @@ CK(check_implicit_dependencies)
> CK(check_wine_filehandles)
> CK(check_wine_WtoA)
>
> +CK(check_arm64_tagged)
> +
> #include "check_list_local.h"
>
> CK(register_scope)
> --
> 2.21.0
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-10-07 15:49 ` Andrew Murray
@ 2019-12-04 15:41 ` Andrew Murray
2019-12-05 13:27 ` Dan Carpenter
1 sibling, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-12-04 15:41 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch, Lorenzo.Pieralisi
On Mon, Oct 07, 2019 at 04:49:24PM +0100, Andrew Murray wrote:
> On Mon, Oct 07, 2019 at 04:35:41PM +0100, Andrew Murray wrote:
> > The ARM64 ABI allows tagged memory addresses to be passed through the
> > user-kernel syscall ABI boundary. Tagged memory addresses are those which
> > contain a non-zero top byte - the hardware will always ignore this top
> > byte, however software does not.
> >
> > This check will provide a warning when it detects that a comparison is
> > being made between user originated 64 bit data (where the top byte may
> > be non-zero) and any variable which may contain an untagged address.
> >
> > Untagged variables are detected by looking for hard-coded known struct
> > members (such as vm_start, vm_end and addr_limit) and hard-coded known
> > macros (such as PAGE_SIZE, PAGE_MASK and TASK_SIZE). This check is
> > also able to detect when comparisons are made against variables that
> > have been assigned from these known untagged variables, though this
> > tracking is limited to the scope of the function.
> >
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> > check_arm64_tagged.c | 255 +++++++++++++++++++++++++++++++++++++++++++
> > check_list.h | 2 +
> > 2 files changed, 257 insertions(+)
> > create mode 100644 check_arm64_tagged.c
> >
> > diff --git a/check_arm64_tagged.c b/check_arm64_tagged.c
> > new file mode 100644
> > index 000000000000..e126552e5964
> > --- /dev/null
> > +++ b/check_arm64_tagged.c
> > @@ -0,0 +1,255 @@
> > +/*
> > + * Copyright (C) 2019 ARM.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see http://www.gnu.org/copyleft/gpl.txt
> > + */
> > +
> > +#include "smatch.h"
> > +#include "smatch_extra.h"
> > +#include "smatch_function_hashtable.h"
> > +
> > +static bool expr_has_memory_addr(struct expression *expr);
> > +
> > +static DEFINE_HASHTABLE_SEARCH(search_symbol, char, char);
> > +static DEFINE_HASHTABLE_INSERT(insert_symbol, char, char);
> > +static struct hashtable *symbols;
> > +
> > +static void match_assign(struct expression *expr)
> > +{
> > + char *left_name;
> > + struct symbol *left_sym;
> > +
> > + left_name = expr_to_var_sym(expr->left, &left_sym);
> > + if (!left_name || !left_sym)
> > + return;
> > +
> > + /*
> > + * Once we have spotted a symbol of interest (one that may hold
> > + * an untagged memory address), we keep track of any assignments
> > + * made, such that we can also treat the assigned symbol as something
> > + * of interest. This tracking is limited in scope to the function.
> > + */
> > + if (expr_has_memory_addr(expr->right))
> > + insert_symbol(symbols, left_name, left_name);
> > +}
>
> The purpose of this is to work around issues presented in this type of code:
>
> unsigned long start = vma->vm_start;
> if (start & ~PAGE_MASK)
> ;
>
> Smatch will detect that vm_start is an untagged address, however unless it can
> learn that start is also an untagged address then it won't pick up the potential
> bug with line 2 where we compare a tagged address with an untagged address. The
> current mplementation will track these simple assignments but the scope is
> limited only to the current function.
>
> A potential solution is to give Smatch a flag for each symbol that indicates if a
> variable is untagged, tagged, or unknown. This flag should be propagated across
> assignments (and similar) and stored in the database (to allow for knowledge of
> the tagged flag across function calls). This may be similar to the existing
> support in Smatch for USER_DATA. Does this seem a good fit?
Hi Dan,
Just wondering if you have any feedback on this and the other patches?
I'm particularly keen to see if the overall approach I've taken is the most
appropriate - or if there is a better way of doing this.
Thanks,
Andrew Murray
>
> > +
> > +static void match_endfunc(struct symbol *sym)
> > +{
> > + destroy_function_hashtable(symbols);
> > + symbols = create_function_hashtable(4000);
> > +}
> > +
> > +static bool expr_has_untagged_symbol(struct expression *expr)
> > +{
> > + char *name;
> > + struct symbol *sym;
> > +
> > + if (expr->type != EXPR_SYMBOL)
> > + return false;
> > +
> > + name = expr_to_var_sym(expr, &sym);
> > + if (!name || !sym)
> > + return false;
> > +
> > + /* See if this is something we already know is of interest */
> > + if (search_symbol(symbols, name))
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static bool expr_has_untagged_member(struct expression *expr)
> > +{
> > + if (expr->type != EXPR_DEREF)
> > + return false;
> > +
> > + if (!strcmp(expr->member->name, "vm_start") ||
> > + !strcmp(expr->member->name, "vm_end") ||
> > + !strcmp(expr->member->name, "addr_limit"))
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static bool expr_has_macro_with_name(struct expression *expr, const char *macro_name)
> > +{
> > + char *name;
> > +
> > + name = get_macro_name(expr->pos);
> > + return (name && !strcmp(name, macro_name));
> > +}
> > +
> > +static bool expr_has_untagged_macro(struct expression *expr)
> > +{
> > + if (expr_has_macro_with_name(expr, "PAGE_SIZE") ||
> > + expr_has_macro_with_name(expr, "PAGE_MASK") ||
> > + expr_has_macro_with_name(expr, "TASK_SIZE"))
> > + return true;
> > +
> > + /**
> > + * We can't detect a marco (such as PAGE_MASK) inside another macro
> > + * such as offset_in_page, therefore we have to detect the outer macro
> > + * instead.
> > + */
> > + if (expr_has_macro_with_name(expr, "offset_in_page"))
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +/*
> > + * Identify expressions that contain memory addresses, in the future
> > + * we may use annotations on symbols or function parameters.
> > + */
> > +static bool expr_has_memory_addr(struct expression *expr)
> > +{
> > + if (expr->type == EXPR_PREOP || expr->type == EXPR_POSTOP)
> > + expr = strip_expr(expr->unop);
> > +
> > + if (expr_has_untagged_member(expr))
> > + return true;
> > +
> > + if (expr_has_untagged_macro(expr))
> > + return true;
> > +
> > + if (expr_has_untagged_symbol(expr))
> > + return true;
>
> We hardcode symbols of interest that we consider to be untagged addresses. This
> provides good coverage but isn't very flexible. A better approach would be to
> annotate the kernel with address space tags, such as is the case for __user,
> __percpu, etc. Thus variables, struct members and function parameters could be
> annotated to indicate that they contain untagged addresses. Unfortunately:
>
> - At present it's not possible to determine a struct member's address space
> from Smatch
> - It's not clear how you would annotate a macro such as PAGE_MASK (Smatch
> already has difficulty spotting a macro inside a macro, e.g. PAGE_MASK
> inside offset_in_page).
>
> Is there a solution for this?
>
> > +
> > + return false;
> > +}
> > +
> > +int rl_is_larger_or_equal(struct range_list *rl, sval_t sval)
> > +{
> > + struct data_range *tmp;
> > +
> > + FOR_EACH_PTR(rl, tmp) {
> > + if (sval_cmp(tmp->max, sval) >= 0)
> > + return 1;
> > + } END_FOR_EACH_PTR(tmp);
> > + return 0;
> > +}
> > +
> > +int rl_range_has_min_value(struct range_list *rl, sval_t sval)
> > +{
> > + struct data_range *tmp;
> > +
> > + FOR_EACH_PTR(rl, tmp) {
> > + if (!sval_cmp(tmp->min, sval)) {
> > + return 1;
> > + }
> > + } END_FOR_EACH_PTR(tmp);
> > + return 0;
> > +}
> > +
> > +static bool rl_is_tagged(struct range_list *rl)
> > +{
> > + sval_t invalid = { .type = &ullong_ctype, .value = (1ULL << 56) };
> > + sval_t invalid_kernel = { .type = &ullong_ctype, .value = (0xff8ULL << 52) };
> > +
> > + /*
> > + * We only care for tagged addresses, thus ignore anything where the
> > + * ranges of potential values cannot possibly have any of the top byte
> > + * bits set.
> > + */
> > + if (!rl_is_larger_or_equal(rl, invalid))
> > + return false;
> > +
> > + /*
> > + * Tagged addresses are untagged in the kernel by using sign_extend64 in
> > + * the untagged_addr macro. For userspace addresses bit 55 will always
> > + * be 0 and thus this has the effect of clearing the top byte. However
> > + * for kernel addresses this is not true and the top bits end up set to
> > + * all 1s. The untagged_addr macro results in leaving a gap in the range
> > + * of possible values which can exist, thus let's look for a tell-tale
> > + * range which starts from (0xff8ULL << 52).
> > + */
>
> Addresses have their tags removed with the untagged_addr macro - this uses
> sign_extend64. For userspace addresses, which have bit 55 always set to 0, this
> macro has the effect of clearing the top byte. This isn't true for kernel
> addresses. Smatch is able to understand the untagged_addr macro and as a result
> adjusts the range of values a given variable that passes through it may have.
> This makes it difficult to detect untagged addresses. We would typically detect
> an untagged address if all the following conditions are true:
>
> - The value comes from userspace (known by Smatch's USER_DATA tracking)
> - The value's size is 64 bits
> - The value isn't above (1ULL << 55)
>
> However as Smatch doesn't understand that user addresses likely won't have bit
> 55 unset, Smatch knows that the range of values can include values greater than
> (1ULL << 55). This is currently worked around by spotting a gap in the list of
> ranges which is produced due to the untagged_addr macro. This isn't ideal but it
> works.
>
> An alternative approach is to detect functions that call untagged_addr and then
> mark the symbol in the database as untagged - but this relies on parameter
> tracking.
>
> Thanks,
>
> Andrew Murray
>
> > + if (rl_range_has_min_value(rl, invalid_kernel))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +static void match_condition(struct expression *expr)
> > +{
> > + struct range_list *rl = NULL;
> > + struct expression *val = NULL;
> > + struct symbol *type;
> > + char *var_name;
> > +
> > + /*
> > + * Match instances where something is compared against something
> > + * else - we include binary operators as these are commonly used
> > + * to make a comparison, e.g. if (start & ~PAGE_MASK).
> > + */
> > + if (expr->type != EXPR_COMPARE &&
> > + expr->type != EXPR_BINOP)
> > + return;
> > +
> > + /*
> > + * Look on both sides of the comparison for something that shouldn't
> > + * be compared with a tagged address, e.g. macros such as PAGE_MASK
> > + * or struct members named .vm_start.
> > + */
> > + if (expr_has_memory_addr(expr->left))
> > + val = expr->right;
> > +
> > + /*
> > + * The macro 'offset_in_page' has the PAGE_MASK macro inside it, this
> > + * results in 'expr_has_memory_addr' returning true for both sides. To
> > + * work around this we assume PAGE_MASK (or similar) is on the right
> > + * side, thus we do the following test last.
> > + */
> > + if (expr_has_memory_addr(expr->right))
> > + val = expr->left;
> > +
> > + if (!val)
> > + return;
> > +
> > + /* We only care about memory addresses which are 64 bits */
> > + type = get_type(val);
> > + if (!type || type_bits(type) != 64)
> > + return;
> > +
> > + /* We only care for comparison against user originated data */
> > + if (!get_user_rl(val, &rl))
> > + return;
> > +
> > + /* We only care for tagged addresses */
> > + if (!rl_is_tagged(rl))
> > + return;
> > +
> > + /* Finally, we believe we may have spotted a risky comparison */
> > + var_name = expr_to_var(val);
> > + if (var_name)
> > + sm_warning("comparison of a potentially tagged address (%s, %d, %s)", get_function(), get_param_num(val), var_name);
> > +}
> > +
> > +void check_arm64_tagged(int id)
> > +{
> > + char *arch;
> > +
> > + if (option_project != PROJ_KERNEL)
> > + return;
> > +
> > + /* Limit to aarch64 */
> > + arch = getenv("ARCH");
> > + if (!arch || strcmp(arch, "arm64"))
> > + return;
> > +
> > + symbols = create_function_hashtable(4000);
> > +
> > + add_hook(&match_assign, ASSIGNMENT_HOOK);
> > + add_hook(&match_condition, CONDITION_HOOK);
> > + add_hook(&match_endfunc, END_FUNC_HOOK);
> > +}
> > diff --git a/check_list.h b/check_list.h
> > index e085cea4317d..4c298286ea9a 100644
> > --- a/check_list.h
> > +++ b/check_list.h
> > @@ -195,6 +195,8 @@ CK(check_implicit_dependencies)
> > CK(check_wine_filehandles)
> > CK(check_wine_WtoA)
> >
> > +CK(check_arm64_tagged)
> > +
> > #include "check_list_local.h"
> >
> > CK(register_scope)
> > --
> > 2.21.0
> >
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-10-07 15:49 ` Andrew Murray
2019-12-04 15:41 ` Andrew Murray
@ 2019-12-05 13:27 ` Dan Carpenter
2019-12-05 14:28 ` Dan Carpenter
1 sibling, 1 reply; 21+ messages in thread
From: Dan Carpenter @ 2019-12-05 13:27 UTC (permalink / raw)
To: Andrew Murray; +Cc: Catalin.Marinas, smatch
I'm sorry for losing track of this. I actually applied all the patches
when you sent them. They still cause that (harmless) compile warning
which I mentioned so I think about you when ever I do a build. :P But
otherwise I had completely erased this from my mind.
On Mon, Oct 07, 2019 at 04:49:24PM +0100, Andrew Murray wrote:
> > +static void match_assign(struct expression *expr)
> > +{
> > + char *left_name;
> > + struct symbol *left_sym;
> > +
> > + left_name = expr_to_var_sym(expr->left, &left_sym);
> > + if (!left_name || !left_sym)
> > + return;
> > +
> > + /*
> > + * Once we have spotted a symbol of interest (one that may hold
> > + * an untagged memory address), we keep track of any assignments
> > + * made, such that we can also treat the assigned symbol as something
> > + * of interest. This tracking is limited in scope to the function.
> > + */
> > + if (expr_has_memory_addr(expr->right))
> > + insert_symbol(symbols, left_name, left_name);
> > +}
>
> The purpose of this is to work around issues presented in this type of code:
>
> unsigned long start = vma->vm_start;
> if (start & ~PAGE_MASK)
> ;
>
> Smatch will detect that vm_start is an untagged address, however unless it can
> learn that start is also an untagged address then it won't pick up the potential
> bug with line 2 where we compare a tagged address with an untagged address. The
> current mplementation will track these simple assignments but the scope is
> limited only to the current function.
>
> A potential solution is to give Smatch a flag for each symbol that indicates if a
> variable is untagged, tagged, or unknown. This flag should be propagated across
> assignments (and similar) and stored in the database (to allow for knowledge of
> the tagged flag across function calls). This may be similar to the existing
> support in Smatch for USER_DATA. Does this seem a good fit?
>
First of all, if it works for you and you send me a patch I'm likely
going to apply it regardless of my personal feelings about the issue.
You're absolutely welcome to add any information you want to the
database. Right now my DB is around 20GB but if it's 30GB that's also
fine.
I have the memory of a gnat. Why did we not use smatch_bits.c for this?
One problem is that smatch_bits.c has no users and has never been
tested. In theory what that gives you is:
103 struct bit_info {
104 unsigned long long set;
105 unsigned long long possible;
106 };
The ->set bits are definitely set and the ->possible bits are possibly
set. It's quite a lot of work to do it properly across functions because
you need to mirror smatch_param_set.c, smatch_param_limit.c and
smatch_param_filter.c but for bits. Then I'll hook it into
smatch_implied.c as well. Right now it just passes the information to
the called function.
> > +/*
> > + * Identify expressions that contain memory addresses, in the future
> > + * we may use annotations on symbols or function parameters.
> > + */
> > +static bool expr_has_memory_addr(struct expression *expr)
> > +{
> > + if (expr->type == EXPR_PREOP || expr->type == EXPR_POSTOP)
> > + expr = strip_expr(expr->unop);
> > +
> > + if (expr_has_untagged_member(expr))
> > + return true;
> > +
> > + if (expr_has_untagged_macro(expr))
> > + return true;
> > +
> > + if (expr_has_untagged_symbol(expr))
> > + return true;
>
> We hardcode symbols of interest that we consider to be untagged addresses. This
> provides good coverage but isn't very flexible. A better approach would be to
> annotate the kernel with address space tags, such as is the case for __user,
> __percpu, etc. Thus variables, struct members and function parameters could be
> annotated to indicate that they contain untagged addresses. Unfortunately:
>
> - At present it's not possible to determine a struct member's address space
> from Smatch
I'm not sure how to get the address space for anything not just struct
members. :( I will investigate.
> - It's not clear how you would annotate a macro such as PAGE_MASK (Smatch
> already has difficulty spotting a macro inside a macro, e.g. PAGE_MASK
> inside offset_in_page).
>
> Is there a solution for this?
>
I did some work exposing nested macros for this, right? I was so sure
I sent an email about that... :/ get_all_macros() and
get_inner_macro() for use in expr_has_untagged_macro(). Apparently I
didn't send that email.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-12-05 13:27 ` Dan Carpenter
@ 2019-12-05 14:28 ` Dan Carpenter
2019-12-05 14:35 ` Dan Carpenter
2019-12-05 17:24 ` Luc Van Oostenryck
0 siblings, 2 replies; 21+ messages in thread
From: Dan Carpenter @ 2019-12-05 14:28 UTC (permalink / raw)
To: Andrew Murray; +Cc: Catalin.Marinas, smatch
On Thu, Dec 05, 2019 at 04:27:03PM +0300, Dan Carpenter wrote:
> > We hardcode symbols of interest that we consider to be untagged addresses. This
> > provides good coverage but isn't very flexible. A better approach would be to
> > annotate the kernel with address space tags, such as is the case for __user,
> > __percpu, etc. Thus variables, struct members and function parameters could be
> > annotated to indicate that they contain untagged addresses. Unfortunately:
> >
> > - At present it's not possible to determine a struct member's address space
> > from Smatch
>
> I'm not sure how to get the address space for anything not just struct
> members. :( I will investigate.
I was using the wrong attribute in my testing. In the kernel __user is
#define __user __attribute__((noderef, address_space(1)))
Just get_type() should work:
arg = get_argument_from_call_expr(expr->args, 0);
if (!arg)
return;
type = get_type(arg);
if (!type || !type->ctype.as)
return;
sm_msg("%s: expr = '%s' address space = %s", __func__, expr_to_str(expr), type->ctype.as->name);
The output looks like:
test.c:23 main() check_namespace: expr = 'frob(f->p)' address space = <asn:1>
regards,
dan carpenter
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-12-05 14:28 ` Dan Carpenter
@ 2019-12-05 14:35 ` Dan Carpenter
2019-12-05 17:21 ` Luc Van Oostenryck
2019-12-05 17:24 ` Luc Van Oostenryck
1 sibling, 1 reply; 21+ messages in thread
From: Dan Carpenter @ 2019-12-05 14:35 UTC (permalink / raw)
To: Andrew Murray; +Cc: Catalin.Marinas, smatch
On Thu, Dec 05, 2019 at 05:28:42PM +0300, Dan Carpenter wrote:
> On Thu, Dec 05, 2019 at 04:27:03PM +0300, Dan Carpenter wrote:
> > > We hardcode symbols of interest that we consider to be untagged addresses. This
> > > provides good coverage but isn't very flexible. A better approach would be to
> > > annotate the kernel with address space tags, such as is the case for __user,
> > > __percpu, etc. Thus variables, struct members and function parameters could be
> > > annotated to indicate that they contain untagged addresses. Unfortunately:
> > >
> > > - At present it's not possible to determine a struct member's address space
> > > from Smatch
> >
> > I'm not sure how to get the address space for anything not just struct
> > members. :( I will investigate.
>
> I was using the wrong attribute in my testing. In the kernel __user is
>
> #define __user __attribute__((noderef, address_space(1)))
>
My guess is that __untagged isn't something that Sparse knows about?
Sparse has "noderef" as a keyword in the parse.c and has a handler for
it.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-12-05 14:35 ` Dan Carpenter
@ 2019-12-05 17:21 ` Luc Van Oostenryck
0 siblings, 0 replies; 21+ messages in thread
From: Luc Van Oostenryck @ 2019-12-05 17:21 UTC (permalink / raw)
To: Dan Carpenter; +Cc: Andrew Murray, Catalin.Marinas, smatch
On Thu, Dec 05, 2019 at 05:35:54PM +0300, Dan Carpenter wrote:
> On Thu, Dec 05, 2019 at 05:28:42PM +0300, Dan Carpenter wrote:
> > On Thu, Dec 05, 2019 at 04:27:03PM +0300, Dan Carpenter wrote:
> > > > We hardcode symbols of interest that we consider to be untagged addresses. This
> > > > provides good coverage but isn't very flexible. A better approach would be to
> > > > annotate the kernel with address space tags, such as is the case for __user,
> > > > __percpu, etc. Thus variables, struct members and function parameters could be
> > > > annotated to indicate that they contain untagged addresses. Unfortunately:
> > > >
> > > > - At present it's not possible to determine a struct member's address space
> > > > from Smatch
> > >
> > > I'm not sure how to get the address space for anything not just struct
> > > members. :( I will investigate.
> >
> > I was using the wrong attribute in my testing. In the kernel __user is
> >
> > #define __user __attribute__((noderef, address_space(1)))
> >
>
> My guess is that __untagged isn't something that Sparse knows about?
Indeed.
-- Luc
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH 3/7] arm64: add check for comparison against tagged address
2019-12-05 14:28 ` Dan Carpenter
2019-12-05 14:35 ` Dan Carpenter
@ 2019-12-05 17:24 ` Luc Van Oostenryck
1 sibling, 0 replies; 21+ messages in thread
From: Luc Van Oostenryck @ 2019-12-05 17:24 UTC (permalink / raw)
To: Dan Carpenter; +Cc: Andrew Murray, Catalin.Marinas, smatch
On Thu, Dec 05, 2019 at 05:28:42PM +0300, Dan Carpenter wrote:
> On Thu, Dec 05, 2019 at 04:27:03PM +0300, Dan Carpenter wrote:
> > > We hardcode symbols of interest that we consider to be untagged addresses. This
> > > provides good coverage but isn't very flexible. A better approach would be to
> > > annotate the kernel with address space tags, such as is the case for __user,
> > > __percpu, etc. Thus variables, struct members and function parameters could be
> > > annotated to indicate that they contain untagged addresses. Unfortunately:
> > >
> > > - At present it's not possible to determine a struct member's address space
> > > from Smatch
> >
> > I'm not sure how to get the address space for anything not just struct
> > members. :( I will investigate.
>
> I was using the wrong attribute in my testing. In the kernel __user is
>
> #define __user __attribute__((noderef, address_space(1)))
>
> Just get_type() should work:
>
> arg = get_argument_from_call_expr(expr->args, 0);
> if (!arg)
> return;
> type = get_type(arg);
> if (!type || !type->ctype.as)
> return;
> sm_msg("%s: expr = '%s' address space = %s", __func__, expr_to_str(expr), type->ctype.as->name);
Ideally, this should use show_as() to display the address space.
-- Luc
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 4/7] smdb.py: add find_tagged and parse_warns_tagged commands
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
` (2 preceding siblings ...)
2019-10-07 15:35 ` [RFC PATCH 3/7] arm64: add check for comparison against tagged address Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-07 15:52 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations Andrew Murray
` (3 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
The find_tagged command follows a given parameter up the call
stack as far as it can, so long as the parameter contains user
data and the top byte is non-zero. This is helpful to identify
functions that should untag a tagged address.
The parse_warns_tagged command parses the smatch_warns.txt file
and calls find_tagged for each tagged related warning, thus
providing a summary of all issues found and their potential
causes.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
smatch_data/db/smdb.py | 103 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/smatch_data/db/smdb.py b/smatch_data/db/smdb.py
index 80c2df59af01..ba46b01a4d08 100755
--- a/smatch_data/db/smdb.py
+++ b/smatch_data/db/smdb.py
@@ -7,6 +7,7 @@
import sqlite3
import sys
import re
+import subprocess
try:
con = sqlite3.connect('smatch_db.sqlite')
@@ -25,6 +26,8 @@ def usage():
print "data_info <struct_type> <member> - information about a given data type"
print "function_ptr <function> - which function pointers point to this"
print "trace_param <function> <param> - trace where a parameter came from"
+ print "find_tagged <function> <param> - find the source of a tagged value (arm64)"
+ print "parse_warns_tagged <smatch_warns.txt> - parse warns file for summary of tagged issues (arm64)"
print "locals <file> - print the local values in a file."
sys.exit(1)
@@ -542,6 +545,99 @@ def function_type_value(struct_type, member):
for txt in cur:
print "%-30s | %-30s | %s | %s" %(txt[0], txt[1], txt[2], txt[3])
+def rl_too_big(txt):
+ rl = txt_to_rl(txt)
+ ret = ""
+ for idx in range(len(rl)):
+ cur_max = rl[idx][1]
+ if (cur_max > 0xFFFFFFFFFFFFFF):
+ return 1
+
+ return 0
+
+def rl_has_min_untagged(txt):
+ rl = txt_to_rl(txt)
+ ret = ""
+ for idx in range(len(rl)):
+ cur_min = rl[idx][0]
+ if (cur_min == 0xff80000000000000):
+ return 1
+
+ return 0
+
+def rl_is_tagged(txt):
+ if not rl_too_big(txt):
+ return 0
+
+ if rl_has_min_untagged(txt):
+ return 0
+
+ return 1
+
+def parse_warns_tagged(filename):
+ proc = subprocess.Popen(['cat %s | grep "potentially tagged" | sort | uniq' %(filename)], shell=True, stdout=subprocess.PIPE)
+ while True:
+ line = proc.stdout.readline()
+ if not line:
+ break
+
+ linepos = re.search("([^\s]+)", line).group(1)
+ groupre = re.search("potentially tagged address \(([^,]+), ([^,]+), ([^\)]+)\)", line)
+ groupre.group(1)
+
+ func = groupre.group(1)
+ param = int(groupre.group(2))
+ var = groupre.group(3)
+
+ if ("end" in var or "size" in var or "len" in var):
+ continue
+
+ print "\n%s (func: %s, param: %d:%s) may be caused by:" %(linepos, func, param, var)
+
+ if (param != -1):
+ if not find_tagged(func, param, 0, []):
+ print " %s (param %d) (can't walk call tree)" % (func, param)
+ else:
+ print " %s (variable %s (can't walk call tree)" % (func, var)
+
+def find_tagged(func, param, caller_call_id, printed):
+
+ callers = {}
+ cur = con.cursor()
+ ptrs = get_function_pointers(func)
+ found = 0
+
+ for ptr in ptrs:
+ cur.execute("select call_id, value from caller_info where function = '%s' and parameter=%d and type=%d" %(ptr, param, type_to_int("DATA_SOURCE")))
+
+ for row in cur:
+ if (row[1][0] == '$'):
+ if row[0] not in callers:
+ callers[row[0]] = {}
+ callers[row[0]]["param"] = int(row[1][1])
+
+ for ptr in ptrs:
+ cur.execute("select caller, call_id, value from caller_info where function = '%s' and parameter=%d and type=%d" %(ptr, param, type_to_int("USER_DATA")))
+
+ for row in cur:
+ if not rl_is_tagged(row[2]):
+ continue
+ found = 1
+ if row[1] not in callers:
+ callers[row[1]] = {}
+ if "param" not in callers[row[1]]:
+ line = " %s (param ?) -> %s (param %d)" % (row[0], func, param)
+ if line not in printed:
+ printed.append(line)
+ print line
+ continue
+ if row[0] not in printed:
+ printed.append(row[0])
+ if not find_tagged(row[0], callers[row[1]]["param"], row[1], printed):
+ print " %s (param %d)" % (row[0], param)
+
+ return found
+
def trace_callers(func, param):
sources = []
prev_type = 0
@@ -641,6 +737,13 @@ elif sys.argv[1] == "data_info":
elif sys.argv[1] == "call_tree":
func = sys.argv[2]
print_call_tree(func)
+elif sys.argv[1] == "find_tagged":
+ func = sys.argv[2]
+ param = int(sys.argv[3])
+ find_tagged(func, param, 0, [])
+elif sys.argv[1] == "parse_warns_tagged":
+ filename = sys.argv[2]
+ parse_warns_tagged(filename)
elif sys.argv[1] == "where":
if len(sys.argv) == 3:
struct_type = "%"
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 4/7] smdb.py: add find_tagged and parse_warns_tagged commands
2019-10-07 15:35 ` [RFC PATCH 4/7] smdb.py: add find_tagged and parse_warns_tagged commands Andrew Murray
@ 2019-10-07 15:52 ` Andrew Murray
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:52 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
On Mon, Oct 07, 2019 at 04:35:42PM +0100, Andrew Murray wrote:
> The find_tagged command follows a given parameter up the call
> stack as far as it can, so long as the parameter contains user
> data and the top byte is non-zero. This is helpful to identify
> functions that should untag a tagged address.
>
> The parse_warns_tagged command parses the smatch_warns.txt file
> and calls find_tagged for each tagged related warning, thus
> providing a summary of all issues found and their potential
> causes.
>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
> smatch_data/db/smdb.py | 103 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 103 insertions(+)
>
> diff --git a/smatch_data/db/smdb.py b/smatch_data/db/smdb.py
> index 80c2df59af01..ba46b01a4d08 100755
> --- a/smatch_data/db/smdb.py
> +++ b/smatch_data/db/smdb.py
> @@ -7,6 +7,7 @@
> import sqlite3
> import sys
> import re
> +import subprocess
>
> try:
> con = sqlite3.connect('smatch_db.sqlite')
> @@ -25,6 +26,8 @@ def usage():
> print "data_info <struct_type> <member> - information about a given data type"
> print "function_ptr <function> - which function pointers point to this"
> print "trace_param <function> <param> - trace where a parameter came from"
> + print "find_tagged <function> <param> - find the source of a tagged value (arm64)"
> + print "parse_warns_tagged <smatch_warns.txt> - parse warns file for summary of tagged issues (arm64)"
> print "locals <file> - print the local values in a file."
> sys.exit(1)
>
> @@ -542,6 +545,99 @@ def function_type_value(struct_type, member):
> for txt in cur:
> print "%-30s | %-30s | %s | %s" %(txt[0], txt[1], txt[2], txt[3])
>
> +def rl_too_big(txt):
> + rl = txt_to_rl(txt)
> + ret = ""
> + for idx in range(len(rl)):
> + cur_max = rl[idx][1]
> + if (cur_max > 0xFFFFFFFFFFFFFF):
> + return 1
> +
> + return 0
> +
> +def rl_has_min_untagged(txt):
> + rl = txt_to_rl(txt)
> + ret = ""
> + for idx in range(len(rl)):
> + cur_min = rl[idx][0]
> + if (cur_min == 0xff80000000000000):
> + return 1
> +
> + return 0
> +
> +def rl_is_tagged(txt):
> + if not rl_too_big(txt):
> + return 0
> +
> + if rl_has_min_untagged(txt):
> + return 0
> +
> + return 1
> +
> +def parse_warns_tagged(filename):
> + proc = subprocess.Popen(['cat %s | grep "potentially tagged" | sort | uniq' %(filename)], shell=True, stdout=subprocess.PIPE)
> + while True:
> + line = proc.stdout.readline()
> + if not line:
> + break
> +
> + linepos = re.search("([^\s]+)", line).group(1)
> + groupre = re.search("potentially tagged address \(([^,]+), ([^,]+), ([^\)]+)\)", line)
> + groupre.group(1)
> +
> + func = groupre.group(1)
> + param = int(groupre.group(2))
> + var = groupre.group(3)
> +
> + if ("end" in var or "size" in var or "len" in var):
> + continue
Here we look for hard coded parameters that hopefully are not going
to contain a tagged user address.
Thanks,
Andrew Murray
> +
> + print "\n%s (func: %s, param: %d:%s) may be caused by:" %(linepos, func, param, var)
> +
> + if (param != -1):
> + if not find_tagged(func, param, 0, []):
> + print " %s (param %d) (can't walk call tree)" % (func, param)
> + else:
> + print " %s (variable %s (can't walk call tree)" % (func, var)
> +
> +def find_tagged(func, param, caller_call_id, printed):
> +
> + callers = {}
> + cur = con.cursor()
> + ptrs = get_function_pointers(func)
> + found = 0
> +
> + for ptr in ptrs:
> + cur.execute("select call_id, value from caller_info where function = '%s' and parameter=%d and type=%d" %(ptr, param, type_to_int("DATA_SOURCE")))
> +
> + for row in cur:
> + if (row[1][0] == '$'):
> + if row[0] not in callers:
> + callers[row[0]] = {}
> + callers[row[0]]["param"] = int(row[1][1])
> +
> + for ptr in ptrs:
> + cur.execute("select caller, call_id, value from caller_info where function = '%s' and parameter=%d and type=%d" %(ptr, param, type_to_int("USER_DATA")))
> +
> + for row in cur:
> + if not rl_is_tagged(row[2]):
> + continue
> + found = 1
> + if row[1] not in callers:
> + callers[row[1]] = {}
> + if "param" not in callers[row[1]]:
> + line = " %s (param ?) -> %s (param %d)" % (row[0], func, param)
> + if line not in printed:
> + printed.append(line)
> + print line
> + continue
> + if row[0] not in printed:
> + printed.append(row[0])
> + if not find_tagged(row[0], callers[row[1]]["param"], row[1], printed):
> + print " %s (param %d)" % (row[0], param)
> +
> + return found
> +
> def trace_callers(func, param):
> sources = []
> prev_type = 0
> @@ -641,6 +737,13 @@ elif sys.argv[1] == "data_info":
> elif sys.argv[1] == "call_tree":
> func = sys.argv[2]
> print_call_tree(func)
> +elif sys.argv[1] == "find_tagged":
> + func = sys.argv[2]
> + param = int(sys.argv[3])
> + find_tagged(func, param, 0, [])
> +elif sys.argv[1] == "parse_warns_tagged":
> + filename = sys.argv[2]
> + parse_warns_tagged(filename)
> elif sys.argv[1] == "where":
> if len(sys.argv) == 3:
> struct_type = "%"
> --
> 2.21.0
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
` (3 preceding siblings ...)
2019-10-07 15:35 ` [RFC PATCH 4/7] smdb.py: add find_tagged and parse_warns_tagged commands Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-07 15:55 ` Andrew Murray
2019-10-08 8:24 ` Dan Carpenter
2019-10-07 15:35 ` [RFC PATCH 6/7] smdb.py: filter out __untagged from find_tagged results Andrew Murray
` (2 subsequent siblings)
7 siblings, 2 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
The kernel provides a number of annotations that can be used by smatch
for static analysis, for example __user and __kernel - the kernel relies
on the GCC attribute 'address_space(x)' to implement these annotations.
By adding a new annotation named __untagged with address space 5 we can
use this annotation to indicate function parameters which should not
be treated as tagged addresses.
Let's give Smatch an understanding of this annotation, when used by
function parameters, by adding a '[u]' tag to the end of their user value
ranges. We ensure that upon merges of range lists states, we only preserve
the tag if both states contain the tag.
In other words, if two functions A and B both call function C (let's assume
these functions all have a single parameter) - then the range list for
function C's parameter will only contain the tag if the parameters for both
A and B are also tagged. Therefore we know that if if we make a comparison
between a tagged and untagged address in C then we know it doesn't need
to flag a detection as the values concerned have all come from functions
that are marked as __untagged.
If only function A was tagged, then we could raise a detection in C - and
rely on parameter tracing to walk up both callers of C to build call trees
and exclude anything and its parents that are tagged.
In practice we will always raise a detection and let the smdb.py script
figure out which call trees to display.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
smatch_estate.c | 22 ++++++++++++
smatch_extra.h | 3 ++
smatch_kernel_user_data.c | 71 +++++++++++++++++++++++++++++++++------
3 files changed, 86 insertions(+), 10 deletions(-)
diff --git a/smatch_estate.c b/smatch_estate.c
index 5d0a1397acef..5650af88de21 100644
--- a/smatch_estate.c
+++ b/smatch_estate.c
@@ -53,6 +53,9 @@ struct smatch_state *merge_estates(struct smatch_state *s1, struct smatch_state
if (estate_capped(s1) && estate_capped(s2))
estate_set_capped(tmp);
+ if (estate_treat_untagged(s1) && estate_treat_untagged(s2))
+ estate_set_treat_untagged(tmp);
+
return tmp;
}
@@ -154,6 +157,23 @@ void estate_set_capped(struct smatch_state *state)
get_dinfo(state)->capped = true;
}
+bool estate_treat_untagged(struct smatch_state *state)
+{
+ if (!state)
+ return false;
+
+ /* impossible states are capped */
+ if (!estate_rl(state))
+ return true;
+
+ return get_dinfo(state)->treat_untagged;
+}
+
+void estate_set_treat_untagged(struct smatch_state *state)
+{
+ get_dinfo(state)->treat_untagged = true;
+}
+
sval_t estate_min(struct smatch_state *state)
{
return rl_min(estate_rl(state));
@@ -204,6 +224,8 @@ int estates_equiv(struct smatch_state *one, struct smatch_state *two)
return 0;
if (estate_capped(one) != estate_capped(two))
return 0;
+ if (estate_treat_untagged(one) != estate_treat_untagged(two))
+ return 0;
if (strcmp(one->name, two->name) == 0)
return 1;
return 0;
diff --git a/smatch_extra.h b/smatch_extra.h
index 49cc7c72ad19..40d59baf253d 100644
--- a/smatch_extra.h
+++ b/smatch_extra.h
@@ -31,6 +31,7 @@ struct data_info {
sval_t fuzzy_max;
unsigned int hard_max:1;
unsigned int capped:1;
+ unsigned int treat_untagged:1;
};
DECLARE_ALLOCATOR(data_info);
@@ -148,6 +149,8 @@ void estate_clear_hard_max(struct smatch_state *state);
int estate_get_hard_max(struct smatch_state *state, sval_t *sval);
bool estate_capped(struct smatch_state *state);
void estate_set_capped(struct smatch_state *state);
+bool estate_treat_untagged(struct smatch_state *state);
+void estate_set_treat_untagged(struct smatch_state *state);
int estate_get_single_value(struct smatch_state *state, sval_t *sval);
struct smatch_state *get_implied_estate(struct expression *expr);
diff --git a/smatch_kernel_user_data.c b/smatch_kernel_user_data.c
index 9e8b6a7f3506..52ed875f00bf 100644
--- a/smatch_kernel_user_data.c
+++ b/smatch_kernel_user_data.c
@@ -100,6 +100,8 @@ static void pre_merge_hook(struct sm_state *sm)
state = alloc_estate_rl(clone_rl(rl));
if (estate_capped(user) || is_capped_var_sym(sm->name, sm->sym))
estate_set_capped(state);
+ if (estate_treat_untagged(user))
+ estate_set_treat_untagged(state);
set_state(my_id, sm->name, sm->sym, state);
}
@@ -117,6 +119,8 @@ static void extra_nomod_hook(const char *name, struct symbol *sym, struct expres
new = alloc_estate_rl(rl);
if (estate_capped(user))
estate_set_capped(new);
+ if (estate_treat_untagged(user))
+ estate_set_treat_untagged(new);
set_state(my_id, name, sym, new);
}
@@ -174,6 +178,30 @@ bool user_rl_capped(struct expression *expr)
return true; /* not actually user data */
}
+bool user_rl_treat_untagged(struct expression *expr)
+{
+ struct smatch_state *state;
+ struct range_list *rl;
+ sval_t sval;
+
+ char *var_name = expr_to_var(expr);
+
+ expr = strip_expr(expr);
+ if (!expr)
+ return false;
+ if (get_value(expr, &sval))
+ return true;
+
+ state = get_state_expr(my_id, expr);
+ if (state)
+ return estate_treat_untagged(state);
+
+ if (get_user_rl(expr, &rl))
+ return false; /* uncapped user data */
+
+ return true; /* not actually user data */
+}
+
static void tag_inner_struct_members(struct expression *expr, struct symbol *member)
{
struct expression *edge_member;
@@ -575,6 +603,8 @@ static bool handle_op_assign(struct expression *expr)
state = alloc_estate_rl(rl);
if (user_rl_capped(binop_expr))
estate_set_capped(state);
+ if (user_rl_treat_untagged(expr->left))
+ estate_set_treat_untagged(state);
set_state_expr(my_id, expr->left, state);
return true;
}
@@ -616,8 +646,11 @@ static void match_assign(struct expression *expr)
rl = cast_rl(get_type(expr->left), rl);
state = alloc_estate_rl(rl);
+
if (user_rl_capped(expr->right))
estate_set_capped(state);
+ if (user_rl_treat_untagged(expr->right))
+ estate_set_treat_untagged(state);
set_state_expr(my_id, expr->left, state);
return;
@@ -1008,8 +1041,10 @@ static char *get_user_rl_str(struct expression *expr, struct symbol *type)
if (!get_user_rl(expr, &rl))
return NULL;
rl = cast_rl(type, rl);
- snprintf(buf, sizeof(buf), "%s%s",
- show_rl(rl), user_rl_capped(expr) ? "[c]" : "");
+ snprintf(buf, sizeof(buf), "%s%s%s",
+ show_rl(rl),
+ user_rl_capped(expr) ? "[c]" : "",
+ user_rl_treat_untagged(expr) ? "[u]" : "");
return buf;
}
@@ -1081,8 +1116,9 @@ static void struct_member_callback(struct expression *call, int param, char *pri
if (!rl)
return;
- snprintf(buf, sizeof(buf), "%s%s", show_rl(rl),
- estate_capped(sm->state) ? "[c]" : "");
+ snprintf(buf, sizeof(buf), "%s%s%s", show_rl(rl),
+ estate_capped(sm->state) ? "[c]" : "",
+ estate_treat_untagged(sm->state) ? "[u]" : "");
sql_insert_caller_info(call, USER_DATA, param, printed_name, buf);
}
@@ -1121,6 +1157,13 @@ static bool param_data_capped(const char *value)
return false;
}
+static bool param_data_treat_untagged(const char *value)
+{
+ if (strstr(value, ",u") || strstr(value, "[u"))
+ return true;
+ return false;
+}
+
static void set_param_user_data(const char *name, struct symbol *sym, char *key, char *value)
{
struct range_list *rl = NULL;
@@ -1168,6 +1211,8 @@ static void set_param_user_data(const char *name, struct symbol *sym, char *key,
state = alloc_estate_rl(rl);
if (param_data_capped(value) || is_capped(expr))
estate_set_capped(state);
+ if (param_data_treat_untagged(value) || sym->ctype.as == 5)
+ estate_set_treat_untagged(state);
set_state(my_id, fullname, sym, state);
}
@@ -1240,6 +1285,8 @@ static void set_to_user_data(struct expression *expr, char *key, char *value)
state = alloc_estate_rl(rl);
if (param_data_capped(value))
estate_set_capped(state);
+ if (param_data_treat_untagged(value))
+ estate_set_treat_untagged(state);
set_state(my_id, name, sym, state);
free:
free_string(name);
@@ -1347,9 +1394,10 @@ static void param_set_to_user_data(int return_id, char *return_ranges, struct ex
if (strcmp(param_name, "$") == 0) /* The -1 param is handled after the loop */
continue;
- snprintf(buf, sizeof(buf), "%s%s",
+ snprintf(buf, sizeof(buf), "%s%s%s",
show_rl(estate_rl(sm->state)),
- estate_capped(sm->state) ? "[c]" : "");
+ estate_capped(sm->state) ? "[c]" : "",
+ estate_treat_untagged(sm->state) ? "[u]" : "");
sql_insert_return_states(return_id, return_ranges,
func_gets_user_data ? USER_DATA_SET : USER_DATA,
param, param_name, buf);
@@ -1369,9 +1417,10 @@ static void param_set_to_user_data(int return_id, char *return_ranges, struct ex
return_found = true;
if (strcmp(param_name, "*$") == 0)
pointed_at_found = true;
- snprintf(buf, sizeof(buf), "%s%s",
+ snprintf(buf, sizeof(buf), "%s%s%s",
show_rl(estate_rl(sm->state)),
- estate_capped(sm->state) ? "[c]" : "");
+ estate_capped(sm->state) ? "[c]" : "",
+ estate_treat_untagged(sm->state) ? "[u]" : "");
sql_insert_return_states(return_id, return_ranges,
func_gets_user_data ? USER_DATA_SET : USER_DATA,
-1, param_name, buf);
@@ -1379,8 +1428,10 @@ static void param_set_to_user_data(int return_id, char *return_ranges, struct ex
/* This if for "return ntohl(foo);" */
if (!return_found && get_user_rl(expr, &rl)) {
- snprintf(buf, sizeof(buf), "%s%s",
- show_rl(rl), user_rl_capped(expr) ? "[c]" : "");
+ snprintf(buf, sizeof(buf), "%s%s%s",
+ show_rl(rl),
+ user_rl_capped(expr) ? "[c]" : "",
+ user_rl_treat_untagged(expr) ? "[u]" : "");
sql_insert_return_states(return_id, return_ranges,
func_gets_user_data ? USER_DATA_SET : USER_DATA,
-1, "$", buf);
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations
2019-10-07 15:35 ` [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations Andrew Murray
@ 2019-10-07 15:55 ` Andrew Murray
2019-12-05 15:04 ` Dan Carpenter
2019-10-08 8:24 ` Dan Carpenter
1 sibling, 1 reply; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:55 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
On Mon, Oct 07, 2019 at 04:35:43PM +0100, Andrew Murray wrote:
> The kernel provides a number of annotations that can be used by smatch
> for static analysis, for example __user and __kernel - the kernel relies
> on the GCC attribute 'address_space(x)' to implement these annotations.
> By adding a new annotation named __untagged with address space 5 we can
> use this annotation to indicate function parameters which should not
> be treated as tagged addresses.
>
> Let's give Smatch an understanding of this annotation, when used by
> function parameters, by adding a '[u]' tag to the end of their user value
> ranges. We ensure that upon merges of range lists states, we only preserve
> the tag if both states contain the tag.
>
> In other words, if two functions A and B both call function C (let's assume
> these functions all have a single parameter) - then the range list for
> function C's parameter will only contain the tag if the parameters for both
> A and B are also tagged. Therefore we know that if if we make a comparison
> between a tagged and untagged address in C then we know it doesn't need
> to flag a detection as the values concerned have all come from functions
> that are marked as __untagged.
>
> If only function A was tagged, then we could raise a detection in C - and
> rely on parameter tracing to walk up both callers of C to build call trees
> and exclude anything and its parents that are tagged.
>
> In practice we will always raise a detection and let the smdb.py script
> figure out which call trees to display.
This approach works well, however it sometimes falls over due to gaps in
the Smatch flow-tracking/parameter-tracking. E.g.
void func1(unsigned long __untagged addr, unsigned long len)
{
unsigned long tmp = 1;
tmp += addr;
tmp = func2(tmp, len, 4);
find_vma(NULL, tmp);
}
The parameter addr is annotated as untagged, therefore its range list may
look something like '0-u64max[u]'.
The first line of code results in a range list of tmp to be '1';
The second line of code assigns addr to tmp, resulting in tmp's range list
changing to '1-u64max[u]'
The third line of code passes tmp to a function and assigns the result back
to tmp. Just like any other function Smatch keeps track of all the callers
to func2, and when it parses the code in func2 it will consider 'tmp' to
have a range list of all those callers combined - if there is a caller of
func2 anywhere in the kernel that doesn't provide tmp with the '[u]' tag
then the tag will be dropped. This results in the return value of func2 not
containing a tag and resulting in 'tmp' inside func1 loosing it's '[u]' tag.
Sometimes Smatch is clever and can describe the range of values returned by a
function in terms relative to the input arguments, in that case the tag would
be preserved.
This means that find_vma, if it detects a tagged pointer will flag it seeing as
'tmp' does not contain the '[u]' tag. Fortunately, in this case we rely on
smdb.py to walk the tree from the detection, therefore it will see that func1
calls find_vma, and func1 'addr' is tagged and thus it can suppress the
detection. Unfortunately this relies on the ability of Smatch to detect that
'tmp' as passed to find_vma is the 'addr' in the function arguments - this
parameter tracking doesn't always work (for the same reasons as described). It is
for these reasons that using __untagged annotations is hit or miss.
Thanks,
Andrew Murray
>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
> smatch_estate.c | 22 ++++++++++++
> smatch_extra.h | 3 ++
> smatch_kernel_user_data.c | 71 +++++++++++++++++++++++++++++++++------
> 3 files changed, 86 insertions(+), 10 deletions(-)
>
> diff --git a/smatch_estate.c b/smatch_estate.c
> index 5d0a1397acef..5650af88de21 100644
> --- a/smatch_estate.c
> +++ b/smatch_estate.c
> @@ -53,6 +53,9 @@ struct smatch_state *merge_estates(struct smatch_state *s1, struct smatch_state
> if (estate_capped(s1) && estate_capped(s2))
> estate_set_capped(tmp);
>
> + if (estate_treat_untagged(s1) && estate_treat_untagged(s2))
> + estate_set_treat_untagged(tmp);
> +
> return tmp;
> }
>
> @@ -154,6 +157,23 @@ void estate_set_capped(struct smatch_state *state)
> get_dinfo(state)->capped = true;
> }
>
> +bool estate_treat_untagged(struct smatch_state *state)
> +{
> + if (!state)
> + return false;
> +
> + /* impossible states are capped */
> + if (!estate_rl(state))
> + return true;
> +
> + return get_dinfo(state)->treat_untagged;
> +}
> +
> +void estate_set_treat_untagged(struct smatch_state *state)
> +{
> + get_dinfo(state)->treat_untagged = true;
> +}
> +
> sval_t estate_min(struct smatch_state *state)
> {
> return rl_min(estate_rl(state));
> @@ -204,6 +224,8 @@ int estates_equiv(struct smatch_state *one, struct smatch_state *two)
> return 0;
> if (estate_capped(one) != estate_capped(two))
> return 0;
> + if (estate_treat_untagged(one) != estate_treat_untagged(two))
> + return 0;
> if (strcmp(one->name, two->name) == 0)
> return 1;
> return 0;
> diff --git a/smatch_extra.h b/smatch_extra.h
> index 49cc7c72ad19..40d59baf253d 100644
> --- a/smatch_extra.h
> +++ b/smatch_extra.h
> @@ -31,6 +31,7 @@ struct data_info {
> sval_t fuzzy_max;
> unsigned int hard_max:1;
> unsigned int capped:1;
> + unsigned int treat_untagged:1;
> };
> DECLARE_ALLOCATOR(data_info);
>
> @@ -148,6 +149,8 @@ void estate_clear_hard_max(struct smatch_state *state);
> int estate_get_hard_max(struct smatch_state *state, sval_t *sval);
> bool estate_capped(struct smatch_state *state);
> void estate_set_capped(struct smatch_state *state);
> +bool estate_treat_untagged(struct smatch_state *state);
> +void estate_set_treat_untagged(struct smatch_state *state);
>
> int estate_get_single_value(struct smatch_state *state, sval_t *sval);
> struct smatch_state *get_implied_estate(struct expression *expr);
> diff --git a/smatch_kernel_user_data.c b/smatch_kernel_user_data.c
> index 9e8b6a7f3506..52ed875f00bf 100644
> --- a/smatch_kernel_user_data.c
> +++ b/smatch_kernel_user_data.c
> @@ -100,6 +100,8 @@ static void pre_merge_hook(struct sm_state *sm)
> state = alloc_estate_rl(clone_rl(rl));
> if (estate_capped(user) || is_capped_var_sym(sm->name, sm->sym))
> estate_set_capped(state);
> + if (estate_treat_untagged(user))
> + estate_set_treat_untagged(state);
> set_state(my_id, sm->name, sm->sym, state);
> }
>
> @@ -117,6 +119,8 @@ static void extra_nomod_hook(const char *name, struct symbol *sym, struct expres
> new = alloc_estate_rl(rl);
> if (estate_capped(user))
> estate_set_capped(new);
> + if (estate_treat_untagged(user))
> + estate_set_treat_untagged(new);
> set_state(my_id, name, sym, new);
> }
>
> @@ -174,6 +178,30 @@ bool user_rl_capped(struct expression *expr)
> return true; /* not actually user data */
> }
>
> +bool user_rl_treat_untagged(struct expression *expr)
> +{
> + struct smatch_state *state;
> + struct range_list *rl;
> + sval_t sval;
> +
> + char *var_name = expr_to_var(expr);
> +
> + expr = strip_expr(expr);
> + if (!expr)
> + return false;
> + if (get_value(expr, &sval))
> + return true;
> +
> + state = get_state_expr(my_id, expr);
> + if (state)
> + return estate_treat_untagged(state);
> +
> + if (get_user_rl(expr, &rl))
> + return false; /* uncapped user data */
> +
> + return true; /* not actually user data */
> +}
> +
> static void tag_inner_struct_members(struct expression *expr, struct symbol *member)
> {
> struct expression *edge_member;
> @@ -575,6 +603,8 @@ static bool handle_op_assign(struct expression *expr)
> state = alloc_estate_rl(rl);
> if (user_rl_capped(binop_expr))
> estate_set_capped(state);
> + if (user_rl_treat_untagged(expr->left))
> + estate_set_treat_untagged(state);
> set_state_expr(my_id, expr->left, state);
> return true;
> }
> @@ -616,8 +646,11 @@ static void match_assign(struct expression *expr)
>
> rl = cast_rl(get_type(expr->left), rl);
> state = alloc_estate_rl(rl);
> +
> if (user_rl_capped(expr->right))
> estate_set_capped(state);
> + if (user_rl_treat_untagged(expr->right))
> + estate_set_treat_untagged(state);
> set_state_expr(my_id, expr->left, state);
>
> return;
> @@ -1008,8 +1041,10 @@ static char *get_user_rl_str(struct expression *expr, struct symbol *type)
> if (!get_user_rl(expr, &rl))
> return NULL;
> rl = cast_rl(type, rl);
> - snprintf(buf, sizeof(buf), "%s%s",
> - show_rl(rl), user_rl_capped(expr) ? "[c]" : "");
> + snprintf(buf, sizeof(buf), "%s%s%s",
> + show_rl(rl),
> + user_rl_capped(expr) ? "[c]" : "",
> + user_rl_treat_untagged(expr) ? "[u]" : "");
> return buf;
> }
>
> @@ -1081,8 +1116,9 @@ static void struct_member_callback(struct expression *call, int param, char *pri
> if (!rl)
> return;
>
> - snprintf(buf, sizeof(buf), "%s%s", show_rl(rl),
> - estate_capped(sm->state) ? "[c]" : "");
> + snprintf(buf, sizeof(buf), "%s%s%s", show_rl(rl),
> + estate_capped(sm->state) ? "[c]" : "",
> + estate_treat_untagged(sm->state) ? "[u]" : "");
> sql_insert_caller_info(call, USER_DATA, param, printed_name, buf);
> }
>
> @@ -1121,6 +1157,13 @@ static bool param_data_capped(const char *value)
> return false;
> }
>
> +static bool param_data_treat_untagged(const char *value)
> +{
> + if (strstr(value, ",u") || strstr(value, "[u"))
> + return true;
> + return false;
> +}
> +
> static void set_param_user_data(const char *name, struct symbol *sym, char *key, char *value)
> {
> struct range_list *rl = NULL;
> @@ -1168,6 +1211,8 @@ static void set_param_user_data(const char *name, struct symbol *sym, char *key,
> state = alloc_estate_rl(rl);
> if (param_data_capped(value) || is_capped(expr))
> estate_set_capped(state);
> + if (param_data_treat_untagged(value) || sym->ctype.as == 5)
> + estate_set_treat_untagged(state);
> set_state(my_id, fullname, sym, state);
> }
>
> @@ -1240,6 +1285,8 @@ static void set_to_user_data(struct expression *expr, char *key, char *value)
> state = alloc_estate_rl(rl);
> if (param_data_capped(value))
> estate_set_capped(state);
> + if (param_data_treat_untagged(value))
> + estate_set_treat_untagged(state);
> set_state(my_id, name, sym, state);
> free:
> free_string(name);
> @@ -1347,9 +1394,10 @@ static void param_set_to_user_data(int return_id, char *return_ranges, struct ex
> if (strcmp(param_name, "$") == 0) /* The -1 param is handled after the loop */
> continue;
>
> - snprintf(buf, sizeof(buf), "%s%s",
> + snprintf(buf, sizeof(buf), "%s%s%s",
> show_rl(estate_rl(sm->state)),
> - estate_capped(sm->state) ? "[c]" : "");
> + estate_capped(sm->state) ? "[c]" : "",
> + estate_treat_untagged(sm->state) ? "[u]" : "");
> sql_insert_return_states(return_id, return_ranges,
> func_gets_user_data ? USER_DATA_SET : USER_DATA,
> param, param_name, buf);
> @@ -1369,9 +1417,10 @@ static void param_set_to_user_data(int return_id, char *return_ranges, struct ex
> return_found = true;
> if (strcmp(param_name, "*$") == 0)
> pointed_at_found = true;
> - snprintf(buf, sizeof(buf), "%s%s",
> + snprintf(buf, sizeof(buf), "%s%s%s",
> show_rl(estate_rl(sm->state)),
> - estate_capped(sm->state) ? "[c]" : "");
> + estate_capped(sm->state) ? "[c]" : "",
> + estate_treat_untagged(sm->state) ? "[u]" : "");
> sql_insert_return_states(return_id, return_ranges,
> func_gets_user_data ? USER_DATA_SET : USER_DATA,
> -1, param_name, buf);
> @@ -1379,8 +1428,10 @@ static void param_set_to_user_data(int return_id, char *return_ranges, struct ex
>
> /* This if for "return ntohl(foo);" */
> if (!return_found && get_user_rl(expr, &rl)) {
> - snprintf(buf, sizeof(buf), "%s%s",
> - show_rl(rl), user_rl_capped(expr) ? "[c]" : "");
> + snprintf(buf, sizeof(buf), "%s%s%s",
> + show_rl(rl),
> + user_rl_capped(expr) ? "[c]" : "",
> + user_rl_treat_untagged(expr) ? "[u]" : "");
> sql_insert_return_states(return_id, return_ranges,
> func_gets_user_data ? USER_DATA_SET : USER_DATA,
> -1, "$", buf);
> --
> 2.21.0
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations
2019-10-07 15:55 ` Andrew Murray
@ 2019-12-05 15:04 ` Dan Carpenter
0 siblings, 0 replies; 21+ messages in thread
From: Dan Carpenter @ 2019-12-05 15:04 UTC (permalink / raw)
To: Andrew Murray; +Cc: Catalin.Marinas, smatch
On Mon, Oct 07, 2019 at 04:55:44PM +0100, Andrew Murray wrote:
> On Mon, Oct 07, 2019 at 04:35:43PM +0100, Andrew Murray wrote:
> > The kernel provides a number of annotations that can be used by smatch
> > for static analysis, for example __user and __kernel - the kernel relies
> > on the GCC attribute 'address_space(x)' to implement these annotations.
> > By adding a new annotation named __untagged with address space 5 we can
> > use this annotation to indicate function parameters which should not
> > be treated as tagged addresses.
> >
> > Let's give Smatch an understanding of this annotation, when used by
> > function parameters, by adding a '[u]' tag to the end of their user value
> > ranges. We ensure that upon merges of range lists states, we only preserve
> > the tag if both states contain the tag.
> >
> > In other words, if two functions A and B both call function C (let's assume
> > these functions all have a single parameter) - then the range list for
> > function C's parameter will only contain the tag if the parameters for both
> > A and B are also tagged. Therefore we know that if if we make a comparison
> > between a tagged and untagged address in C then we know it doesn't need
> > to flag a detection as the values concerned have all come from functions
> > that are marked as __untagged.
> >
> > If only function A was tagged, then we could raise a detection in C - and
> > rely on parameter tracing to walk up both callers of C to build call trees
> > and exclude anything and its parents that are tagged.
> >
> > In practice we will always raise a detection and let the smdb.py script
> > figure out which call trees to display.
>
> This approach works well, however it sometimes falls over due to gaps in
> the Smatch flow-tracking/parameter-tracking. E.g.
>
> void func1(unsigned long __untagged addr, unsigned long len)
> {
> unsigned long tmp = 1;
> tmp += addr;
> tmp = func2(tmp, len, 4);
> find_vma(NULL, tmp);
> }
>
> The parameter addr is annotated as untagged, therefore its range list may
> look something like '0-u64max[u]'.
Ideally, it would never be 0-u64max and [u] at the same time, right?
The [u] is already a work around because Smatch is parsing it badly.
>
> The first line of code results in a range list of tmp to be '1';
>
> The second line of code assigns addr to tmp, resulting in tmp's range list
> changing to '1-u64max[u]'
>
> The third line of code passes tmp to a function and assigns the result back
> to tmp. Just like any other function Smatch keeps track of all the callers
> to func2, and when it parses the code in func2 it will consider 'tmp' to
> have a range list of all those callers combined - if there is a caller of
> func2 anywhere in the kernel that doesn't provide tmp with the '[u]' tag
> then the tag will be dropped. This results in the return value of func2 not
> containing a tag and resulting in 'tmp' inside func1 loosing it's '[u]' tag.
Yeah. Gotcha. We don't know if the return value is tagged untagged,
but maybe we could store something to say that it inherits the tag from
param $0.
> Sometimes Smatch is clever and can describe the range of values returned by a
> function in terms relative to the input arguments, in that case the tag would
> be preserved.
Yeah.
>
> This means that find_vma, if it detects a tagged pointer will flag it seeing as
> 'tmp' does not contain the '[u]' tag. Fortunately, in this case we rely on
> smdb.py to walk the tree from the detection, therefore it will see that func1
> calls find_vma, and func1 'addr' is tagged and thus it can suppress the
> detection. Unfortunately this relies on the ability of Smatch to detect that
> 'tmp' as passed to find_vma is the 'addr' in the function arguments - this
> parameter tracking doesn't always work (for the same reasons as described). It is
> for these reasons that using __untagged annotations is hit or miss.
I feel like my temptation is always to do a hack around. Like we could
say func2() never calls get_user() or copy_from_user(), it just takes
user data from the parameters and returns it. Smatch already
differentiates these situations with USER_DATA and USER_DATA_SET so we
could say if a function returns USER_DATA set and only passed untagged
to the function then the return is also untagged.
Would that work?
regards,
dan carpenter
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations
2019-10-07 15:35 ` [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations Andrew Murray
2019-10-07 15:55 ` Andrew Murray
@ 2019-10-08 8:24 ` Dan Carpenter
2019-10-08 8:41 ` Andrew Murray
1 sibling, 1 reply; 21+ messages in thread
From: Dan Carpenter @ 2019-10-08 8:24 UTC (permalink / raw)
To: Andrew Murray; +Cc: Catalin.Marinas, smatch
On Mon, Oct 07, 2019 at 04:35:43PM +0100, Andrew Murray wrote:
> @@ -1168,6 +1211,8 @@ static void set_param_user_data(const char *name, struct symbol *sym, char *key,
> state = alloc_estate_rl(rl);
> if (param_data_capped(value) || is_capped(expr))
> estate_set_capped(state);
> + if (param_data_treat_untagged(value) || sym->ctype.as == 5)
> + estate_set_treat_untagged(state);
Could you update to the latest version of Smatch? These days cytype.as
address space is a pointer instead of an int.
smatch_kernel_user_data.c: In function ‘set_param_user_data’:
smatch_kernel_user_data.c:1225:56: warning: comparison between pointer and integer
if (param_data_treat_untagged(value) || sym->ctype.as == 5)
^~
regards,
dan carpenter
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations
2019-10-08 8:24 ` Dan Carpenter
@ 2019-10-08 8:41 ` Andrew Murray
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-08 8:41 UTC (permalink / raw)
To: Dan Carpenter; +Cc: Catalin.Marinas, smatch
On Tue, Oct 08, 2019 at 11:24:42AM +0300, Dan Carpenter wrote:
> On Mon, Oct 07, 2019 at 04:35:43PM +0100, Andrew Murray wrote:
> > @@ -1168,6 +1211,8 @@ static void set_param_user_data(const char *name, struct symbol *sym, char *key,
> > state = alloc_estate_rl(rl);
> > if (param_data_capped(value) || is_capped(expr))
> > estate_set_capped(state);
> > + if (param_data_treat_untagged(value) || sym->ctype.as == 5)
> > + estate_set_treat_untagged(state);
>
> Could you update to the latest version of Smatch? These days cytype.as
> address space is a pointer instead of an int.
>
> smatch_kernel_user_data.c: In function ‘set_param_user_data’:
> smatch_kernel_user_data.c:1225:56: warning: comparison between pointer and integer
> if (param_data_treat_untagged(value) || sym->ctype.as == 5)
> ^~
Doh! Thanks, I did rebase to the latest version before sending and checked it
built - though missed this warning.
I'll fix up the code on the next respin.
Andrew Murray
>
> regards,
> dan carpenter
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 6/7] smdb.py: filter out __untagged from find_tagged results
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
` (4 preceding siblings ...)
2019-10-07 15:35 ` [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 7/7] Documentation: add guide for tagged addresses Andrew Murray
2019-10-08 8:58 ` [RFC PATCH 0/7] Tagged Pointer Detection Dan Carpenter
7 siblings, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
The find_tagged command lists call sites that may lead to a tagged
pointer detection deeper in the call tree - let's filter out any
that we know to be annotated with __untagged.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
smatch_data/db/smdb.py | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/smatch_data/db/smdb.py b/smatch_data/db/smdb.py
index ba46b01a4d08..04bff00be45a 100755
--- a/smatch_data/db/smdb.py
+++ b/smatch_data/db/smdb.py
@@ -574,6 +574,12 @@ def rl_is_tagged(txt):
return 1
+def rl_is_treat_untagged(txt):
+ if "[u]" in txt:
+ return 1;
+
+ return 0
+
def parse_warns_tagged(filename):
proc = subprocess.Popen(['cat %s | grep "potentially tagged" | sort | uniq' %(filename)], shell=True, stdout=subprocess.PIPE)
while True:
@@ -622,6 +628,8 @@ def find_tagged(func, param, caller_call_id, printed):
for row in cur:
if not rl_is_tagged(row[2]):
continue
+ if rl_is_treat_untagged(row[2]):
+ continue
found = 1
if row[1] not in callers:
callers[row[1]] = {}
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 7/7] Documentation: add guide for tagged addresses
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
` (5 preceding siblings ...)
2019-10-07 15:35 ` [RFC PATCH 6/7] smdb.py: filter out __untagged from find_tagged results Andrew Murray
@ 2019-10-07 15:35 ` Andrew Murray
2019-10-08 8:58 ` [RFC PATCH 0/7] Tagged Pointer Detection Dan Carpenter
7 siblings, 0 replies; 21+ messages in thread
From: Andrew Murray @ 2019-10-07 15:35 UTC (permalink / raw)
To: Dan Carpenter, Catalin.Marinas; +Cc: smatch
Add documentation to explain how Smatch can be used to detect
when a comparison is being made between vm_start/vm_end members
of the vma_area_struct and user originated 64 bit data where the
top byte may be non-zero (tagged address).
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
.../arm64-detecting-tagged-addresses.txt | 207 ++++++++++++++++++
1 file changed, 207 insertions(+)
create mode 100644 Documentation/arm64-detecting-tagged-addresses.txt
diff --git a/Documentation/arm64-detecting-tagged-addresses.txt b/Documentation/arm64-detecting-tagged-addresses.txt
new file mode 100644
index 000000000000..0f421c825a9a
--- /dev/null
+++ b/Documentation/arm64-detecting-tagged-addresses.txt
@@ -0,0 +1,207 @@
+Detecting ARM64 tagged pointers
+===============================
+
+The ARM64 ABI allows tagged memory addresses to be passed through the
+user-kernel syscall ABI boundary. Tagged memory addresses are those which
+contain a non-zero top byte - the hardware will always ignore this top
+byte, however software does not. Therefore it is helpful to be able to
+detect code that erroneously compares tagged memory addresses with
+untagged memory addresses. This document describes how smatch can be used
+for this.
+
+Smatch will provide a warning when it detects that a comparison is being
+made between a user originated 64 bit data where the top byte may be
+non-zero and any variable which may contain an untagged address.
+
+Untagged variables are detected by looking for hard-coded known struct
+members (such as vm_start, vm_end and addr_limit) and hard-coded known
+macros (such as PAGE_SIZE, PAGE_MASK and TASK_SIZE). This check is
+also able to detect when comparisons are made against variables that
+have been assigned from these known untagged variables, though this
+tracking is limited to the scope of the function.
+
+This check is only performed when the ARCH environment variable is set to
+arm64. To provide a worked example, consider the following command which is
+used to perform Smatch static analysis on the Linux kernel:
+
+$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- ~/smatch/smatch_scripts/build_kernel_data.sh
+
+It is recommended that this command is run multiple times (6 or more) to
+provide Smatch with a deeper knowledge of the call stack. Before running
+multiple iterations of Smatch, it may be beneficial to delete any smatch*
+files in the root of the linux tree.
+
+Once Smatch has run, you can observe warnings as follows:
+
+$ cat smatch_warns.txt | grep "tagged address"
+mm/gup.c:818 __get_user_pages() warn: comparison of a potentially tagged
+address (__get_user_pages, 2, start)
+...
+
+This warning tells us that on line 818 of mm/gup.c an erroneous comparison
+may have been made between a tagged address (variable 'start' which originated
+from parameter 2 of the function) and existing kernel addresses (untagged).
+
+The code that this relates to follows:
+
+790: static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+791: unsigned long start, unsigned long nr_pages,
+792: unsigned int gup_flags, struct page **pages,
+793: struct vm_area_struct **vmas, int *nonblocking)
+794:{
+...
+818: if (!vma || start >= vma->vm_end) {
+
+Through manual inspection of this code, we can verify that the variable 'start'
+originated from parameter 2 of its function '__get_user_pages'.
+
+A suggested fix at this point may be to call the untagged_addr macro prior
+to the comparison on line 818. However it's often helpful to follow the
+parameter up the call stack, we can do this with the following Smatch command:
+
+$ ~/smatch/smatch_data/db/smdb.py find_tagged __get_user_pages 2
+ copy_strings (param ?) -> get_arg_page (param 1)
+ vfio_pin_map_dma (param ?) -> vfio_pin_pages_remote (param 1)
+ __se_sys_ptrace (param 2)
+ environ_read (param ?) -> access_remote_vm (param 1)
+ io_sqe_buffer_register (param ?) -> get_user_pages (param 0)
+ gntdev_grant_copy_seg (param ?) -> gntdev_get_page (param 1)
+ get_futex_key (param ?) -> get_user_pages_fast (param 0)
+ __se_sys_madvise (param 0)
+ __mm_populate (param ?) -> populate_vma_page_range (param 1)
+ __se_sys_mprotect (param 0)
+
+
+This script will examine all of the possible callers of __get_user_pages where
+parameter 2 contains user data and where the top byte of the parameter may be
+non-zero. It will recurse up the possible call stacks as far as it can go. This
+will leave a list of functions that provide tagged addresses to __get_user_pages
+and the parameter of interest (or variable if Smatch cannot determine the
+function parameter).
+
+Sometimes Smatch is able to determine a caller of a function but is unable
+to determine which parameter of that function relates to the parameter of the
+called function, when this happens the following output it shown:
+
+get_futex_key (param ?) -> get_user_pages_fast (param 0)
+
+This shows that when following up the call tree from __get_user_pages, we stop
+at get_user_pages_fast with parameter 0 of that function containing user data.
+Smatch knows that get_futex_key calls get_user_pages_fast but cannot determine
+which parameter of get_futex_key provided the data of interest. In these cases
+manual inspection of the source tree can help and if necessary re-run the
+smdb.py script with new parameters (e.g. smdb.py find_tagged get_futex_key 0).
+
+To provide a summary of all of the tagged issues found, the following command
+can be run directly on the smatch_warns.txt file:
+
+$ ~/smatch/smatch_data/db/smdb.py parse_warns_tagged smatch_warns.txt
+
+This will run find_tagged for each issue found, e.g.
+
+mm/mmap.c:2918 (func: __do_sys_remap_file_pages, param: 0:start) may be caused by:
+ __se_sys_remap_file_pages (param 0)
+
+mm/mmap.c:2963 (func: __do_sys_remap_file_pages, param: -1:__UNIQUE_ID___y73) may be caused by:
+ __do_sys_remap_file_pages (variable __UNIQUE_ID___y73 (can't walk call tree)
+
+mm/mmap.c:3000 (func: do_brk_flags, param: -1:error) may be caused by:
+ do_brk_flags (variable error (can't walk call tree)
+
+mm/mmap.c:540 (func: find_vma_links, param: 1:addr) may be caused by:
+ find_vma_links (param 1) (can't walk call tree)
+
+mm/mmap.c:570 (func: count_vma_pages_range, param: -1:__UNIQUE_ID___x64) may be caused by:
+ count_vma_pages_range (variable __UNIQUE_ID___x64 (can't walk call tree)
+
+mm/mmap.c:580 (func: count_vma_pages_range, param: -1:__UNIQUE_ID___x68) may be caused by:
+ count_vma_pages_range (variable __UNIQUE_ID___x68 (can't walk call tree)
+
+mm/mmap.c:856 (func: __vma_adjust, param: 1:start) may be caused by:
+ __se_sys_mprotect (param 0)
+ __se_sys_mlock (param 0)
+ __se_sys_mlock2 (param 0)
+ __se_sys_munlock (param 0)
+ mbind_range (param ?) -> vma_merge (param 2)
+ __se_sys_madvise (param 0)
+ __se_sys_mbind (param 0)
+
+
+The above commands do not output a call stack, instead they provide the 'highest'
+caller found, to provide a call stack perform the following:
+
+$ ~/smatch/smatch_data/db/smdb.py call_tree __get_user_pages
+__get_user_pages()
+ __get_user_pages_locked()
+ get_user_pages_remote()
+ get_arg_page()
+ copy_strings()
+ remove_arg_zero()
+ vaddr_get_pfn()
+ vfio_pin_pages_remote()
+ vfio_pin_page_external()
+ process_vm_rw_single_vec()
+ process_vm_rw_core()
+ __access_remote_vm()
+ ptrace_access_vm()
+ access_remote_vm()
+ access_process_vm()
+ check_and_migrate_cma_pages()
+ __gup_longterm_locked()
+ get_user_pages()
+ __gup_longterm_unlocked()
+ get_user_pages_locked()
+ get_vaddr_frames()
+ vb2_create_framevec()
+ lookup_node()
+ do_get_mempolicy()
+ get_user_pages_unlocked()
+ hva_to_pfn_slow()
+ hva_to_pfn()
+
+Please note that this will show all the callers and is not filtered for those
+carrying tagged addresses in their parameters.
+
+It is possible to filter out false positives by annotating function parameters
+with __untagged. For example:
+
+unsigned long do_mmap(struct file *file, unsigned long addr,
+ unsigned long __untagged len, unsigned long prot,
+ unsigned long flags, vm_flags_t vm_flags,
+ unsigned long pgoff, unsigned long *populate,
+ struct list_head *uf)
+{
+
+This annotation tells smatch that regardless to the value stored in 'len' it
+should be treated as an untagged address. As Smatch is able to track the
+potential ranges of values a variable may hold, it will also track the
+annotation - therefore it is not necessary to use the annotation in every
+function that do_mmap calls. When using this annotation smdb.py will filter
+out functions that carry a value which has been annotated as untagged. Please
+note that due to limitations in parameter tracking some annotations will be
+ignored and not propogated all the way down the call tree.
+
+Finally, the following patch is required to add annotations to the Linux
+kernel:
+
+diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
+index 19e58b9138a0..755e8df375a5 100644
+--- a/include/linux/compiler_types.h
++++ b/include/linux/compiler_types.h
+@@ -19,6 +19,7 @@
+ # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
+ # define __percpu __attribute__((noderef, address_space(3)))
+ # define __rcu __attribute__((noderef, address_space(4)))
++# define __untagged __attribute__((address_space(5)))
+ # define __private __attribute__((noderef))
+ extern void __chk_user_ptr(const volatile void __user *);
+ extern void __chk_io_ptr(const volatile void __iomem *);
+@@ -45,6 +46,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
+ # define __cond_lock(x,c) (c)
+ # define __percpu
+ # define __rcu
++# define __untagged
+ # define __private
+ # define ACCESS_PRIVATE(p, member) ((p)->member)
+ #endif /* __CHECKER__ */
+
--
2.21.0
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 0/7] Tagged Pointer Detection
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
` (6 preceding siblings ...)
2019-10-07 15:35 ` [RFC PATCH 7/7] Documentation: add guide for tagged addresses Andrew Murray
@ 2019-10-08 8:58 ` Dan Carpenter
7 siblings, 0 replies; 21+ messages in thread
From: Dan Carpenter @ 2019-10-08 8:58 UTC (permalink / raw)
To: Andrew Murray; +Cc: Catalin.Marinas, smatch
I just applied all the patches. Let me push that so that we're working
from the same page.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 21+ messages in thread