From: Andrew Murray <andrew.murray@arm.com>
To: Dan Carpenter <dan.carpenter@oracle.com>, Catalin.Marinas@arm.com
Cc: smatch@vger.kernel.org
Subject: [RFC PATCH 7/7] Documentation: add guide for tagged addresses
Date: Mon, 7 Oct 2019 16:35:45 +0100 [thread overview]
Message-ID: <20191007153545.23231-8-andrew.murray@arm.com> (raw)
In-Reply-To: <20191007153545.23231-1-andrew.murray@arm.com>
Add documentation to explain how Smatch can be used to detect
when a comparison is being made between vm_start/vm_end members
of the vma_area_struct and user originated 64 bit data where the
top byte may be non-zero (tagged address).
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
.../arm64-detecting-tagged-addresses.txt | 207 ++++++++++++++++++
1 file changed, 207 insertions(+)
create mode 100644 Documentation/arm64-detecting-tagged-addresses.txt
diff --git a/Documentation/arm64-detecting-tagged-addresses.txt b/Documentation/arm64-detecting-tagged-addresses.txt
new file mode 100644
index 000000000000..0f421c825a9a
--- /dev/null
+++ b/Documentation/arm64-detecting-tagged-addresses.txt
@@ -0,0 +1,207 @@
+Detecting ARM64 tagged pointers
+===============================
+
+The ARM64 ABI allows tagged memory addresses to be passed through the
+user-kernel syscall ABI boundary. Tagged memory addresses are those which
+contain a non-zero top byte - the hardware will always ignore this top
+byte, however software does not. Therefore it is helpful to be able to
+detect code that erroneously compares tagged memory addresses with
+untagged memory addresses. This document describes how smatch can be used
+for this.
+
+Smatch will provide a warning when it detects that a comparison is being
+made between a user originated 64 bit data where the top byte may be
+non-zero and any variable which may contain an untagged address.
+
+Untagged variables are detected by looking for hard-coded known struct
+members (such as vm_start, vm_end and addr_limit) and hard-coded known
+macros (such as PAGE_SIZE, PAGE_MASK and TASK_SIZE). This check is
+also able to detect when comparisons are made against variables that
+have been assigned from these known untagged variables, though this
+tracking is limited to the scope of the function.
+
+This check is only performed when the ARCH environment variable is set to
+arm64. To provide a worked example, consider the following command which is
+used to perform Smatch static analysis on the Linux kernel:
+
+$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- ~/smatch/smatch_scripts/build_kernel_data.sh
+
+It is recommended that this command is run multiple times (6 or more) to
+provide Smatch with a deeper knowledge of the call stack. Before running
+multiple iterations of Smatch, it may be beneficial to delete any smatch*
+files in the root of the linux tree.
+
+Once Smatch has run, you can observe warnings as follows:
+
+$ cat smatch_warns.txt | grep "tagged address"
+mm/gup.c:818 __get_user_pages() warn: comparison of a potentially tagged
+address (__get_user_pages, 2, start)
+...
+
+This warning tells us that on line 818 of mm/gup.c an erroneous comparison
+may have been made between a tagged address (variable 'start' which originated
+from parameter 2 of the function) and existing kernel addresses (untagged).
+
+The code that this relates to follows:
+
+790: static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+791: unsigned long start, unsigned long nr_pages,
+792: unsigned int gup_flags, struct page **pages,
+793: struct vm_area_struct **vmas, int *nonblocking)
+794:{
+...
+818: if (!vma || start >= vma->vm_end) {
+
+Through manual inspection of this code, we can verify that the variable 'start'
+originated from parameter 2 of its function '__get_user_pages'.
+
+A suggested fix at this point may be to call the untagged_addr macro prior
+to the comparison on line 818. However it's often helpful to follow the
+parameter up the call stack, we can do this with the following Smatch command:
+
+$ ~/smatch/smatch_data/db/smdb.py find_tagged __get_user_pages 2
+ copy_strings (param ?) -> get_arg_page (param 1)
+ vfio_pin_map_dma (param ?) -> vfio_pin_pages_remote (param 1)
+ __se_sys_ptrace (param 2)
+ environ_read (param ?) -> access_remote_vm (param 1)
+ io_sqe_buffer_register (param ?) -> get_user_pages (param 0)
+ gntdev_grant_copy_seg (param ?) -> gntdev_get_page (param 1)
+ get_futex_key (param ?) -> get_user_pages_fast (param 0)
+ __se_sys_madvise (param 0)
+ __mm_populate (param ?) -> populate_vma_page_range (param 1)
+ __se_sys_mprotect (param 0)
+
+
+This script will examine all of the possible callers of __get_user_pages where
+parameter 2 contains user data and where the top byte of the parameter may be
+non-zero. It will recurse up the possible call stacks as far as it can go. This
+will leave a list of functions that provide tagged addresses to __get_user_pages
+and the parameter of interest (or variable if Smatch cannot determine the
+function parameter).
+
+Sometimes Smatch is able to determine a caller of a function but is unable
+to determine which parameter of that function relates to the parameter of the
+called function, when this happens the following output it shown:
+
+get_futex_key (param ?) -> get_user_pages_fast (param 0)
+
+This shows that when following up the call tree from __get_user_pages, we stop
+at get_user_pages_fast with parameter 0 of that function containing user data.
+Smatch knows that get_futex_key calls get_user_pages_fast but cannot determine
+which parameter of get_futex_key provided the data of interest. In these cases
+manual inspection of the source tree can help and if necessary re-run the
+smdb.py script with new parameters (e.g. smdb.py find_tagged get_futex_key 0).
+
+To provide a summary of all of the tagged issues found, the following command
+can be run directly on the smatch_warns.txt file:
+
+$ ~/smatch/smatch_data/db/smdb.py parse_warns_tagged smatch_warns.txt
+
+This will run find_tagged for each issue found, e.g.
+
+mm/mmap.c:2918 (func: __do_sys_remap_file_pages, param: 0:start) may be caused by:
+ __se_sys_remap_file_pages (param 0)
+
+mm/mmap.c:2963 (func: __do_sys_remap_file_pages, param: -1:__UNIQUE_ID___y73) may be caused by:
+ __do_sys_remap_file_pages (variable __UNIQUE_ID___y73 (can't walk call tree)
+
+mm/mmap.c:3000 (func: do_brk_flags, param: -1:error) may be caused by:
+ do_brk_flags (variable error (can't walk call tree)
+
+mm/mmap.c:540 (func: find_vma_links, param: 1:addr) may be caused by:
+ find_vma_links (param 1) (can't walk call tree)
+
+mm/mmap.c:570 (func: count_vma_pages_range, param: -1:__UNIQUE_ID___x64) may be caused by:
+ count_vma_pages_range (variable __UNIQUE_ID___x64 (can't walk call tree)
+
+mm/mmap.c:580 (func: count_vma_pages_range, param: -1:__UNIQUE_ID___x68) may be caused by:
+ count_vma_pages_range (variable __UNIQUE_ID___x68 (can't walk call tree)
+
+mm/mmap.c:856 (func: __vma_adjust, param: 1:start) may be caused by:
+ __se_sys_mprotect (param 0)
+ __se_sys_mlock (param 0)
+ __se_sys_mlock2 (param 0)
+ __se_sys_munlock (param 0)
+ mbind_range (param ?) -> vma_merge (param 2)
+ __se_sys_madvise (param 0)
+ __se_sys_mbind (param 0)
+
+
+The above commands do not output a call stack, instead they provide the 'highest'
+caller found, to provide a call stack perform the following:
+
+$ ~/smatch/smatch_data/db/smdb.py call_tree __get_user_pages
+__get_user_pages()
+ __get_user_pages_locked()
+ get_user_pages_remote()
+ get_arg_page()
+ copy_strings()
+ remove_arg_zero()
+ vaddr_get_pfn()
+ vfio_pin_pages_remote()
+ vfio_pin_page_external()
+ process_vm_rw_single_vec()
+ process_vm_rw_core()
+ __access_remote_vm()
+ ptrace_access_vm()
+ access_remote_vm()
+ access_process_vm()
+ check_and_migrate_cma_pages()
+ __gup_longterm_locked()
+ get_user_pages()
+ __gup_longterm_unlocked()
+ get_user_pages_locked()
+ get_vaddr_frames()
+ vb2_create_framevec()
+ lookup_node()
+ do_get_mempolicy()
+ get_user_pages_unlocked()
+ hva_to_pfn_slow()
+ hva_to_pfn()
+
+Please note that this will show all the callers and is not filtered for those
+carrying tagged addresses in their parameters.
+
+It is possible to filter out false positives by annotating function parameters
+with __untagged. For example:
+
+unsigned long do_mmap(struct file *file, unsigned long addr,
+ unsigned long __untagged len, unsigned long prot,
+ unsigned long flags, vm_flags_t vm_flags,
+ unsigned long pgoff, unsigned long *populate,
+ struct list_head *uf)
+{
+
+This annotation tells smatch that regardless to the value stored in 'len' it
+should be treated as an untagged address. As Smatch is able to track the
+potential ranges of values a variable may hold, it will also track the
+annotation - therefore it is not necessary to use the annotation in every
+function that do_mmap calls. When using this annotation smdb.py will filter
+out functions that carry a value which has been annotated as untagged. Please
+note that due to limitations in parameter tracking some annotations will be
+ignored and not propogated all the way down the call tree.
+
+Finally, the following patch is required to add annotations to the Linux
+kernel:
+
+diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
+index 19e58b9138a0..755e8df375a5 100644
+--- a/include/linux/compiler_types.h
++++ b/include/linux/compiler_types.h
+@@ -19,6 +19,7 @@
+ # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
+ # define __percpu __attribute__((noderef, address_space(3)))
+ # define __rcu __attribute__((noderef, address_space(4)))
++# define __untagged __attribute__((address_space(5)))
+ # define __private __attribute__((noderef))
+ extern void __chk_user_ptr(const volatile void __user *);
+ extern void __chk_io_ptr(const volatile void __iomem *);
+@@ -45,6 +46,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
+ # define __cond_lock(x,c) (c)
+ # define __percpu
+ # define __rcu
++# define __untagged
+ # define __private
+ # define ACCESS_PRIVATE(p, member) ((p)->member)
+ #endif /* __CHECKER__ */
+
--
2.21.0
next prev parent reply other threads:[~2019-10-07 15:36 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-07 15:35 [RFC PATCH 0/7] Tagged Pointer Detection Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 1/7] build: Add '-lm' build flag Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 2/7] smdb.py: remove undocumented test command Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 3/7] arm64: add check for comparison against tagged address Andrew Murray
2019-10-07 15:49 ` Andrew Murray
2019-12-04 15:41 ` Andrew Murray
2019-12-05 13:27 ` Dan Carpenter
2019-12-05 14:28 ` Dan Carpenter
2019-12-05 14:35 ` Dan Carpenter
2019-12-05 17:21 ` Luc Van Oostenryck
2019-12-05 17:24 ` Luc Van Oostenryck
2019-10-07 15:35 ` [RFC PATCH 4/7] smdb.py: add find_tagged and parse_warns_tagged commands Andrew Murray
2019-10-07 15:52 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 5/7] kernel_user_data: track parameter __untagged annotations Andrew Murray
2019-10-07 15:55 ` Andrew Murray
2019-12-05 15:04 ` Dan Carpenter
2019-10-08 8:24 ` Dan Carpenter
2019-10-08 8:41 ` Andrew Murray
2019-10-07 15:35 ` [RFC PATCH 6/7] smdb.py: filter out __untagged from find_tagged results Andrew Murray
2019-10-07 15:35 ` Andrew Murray [this message]
2019-10-08 8:58 ` [RFC PATCH 0/7] Tagged Pointer Detection Dan Carpenter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191007153545.23231-8-andrew.murray@arm.com \
--to=andrew.murray@arm.com \
--cc=Catalin.Marinas@arm.com \
--cc=dan.carpenter@oracle.com \
--cc=smatch@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox