From: Catalin Marinas <catalin.marinas@arm.com>
To: "Qun-wei Lin (林群崴)" <Qun-wei.Lin@mediatek.com>
Cc: "linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"surenb@google.com" <surenb@google.com>,
"david@redhat.com" <david@redhat.com>,
"Chinwen Chang (張錦文)" <chinwen.chang@mediatek.com>,
"kasan-dev@googlegroups.com" <kasan-dev@googlegroups.com>,
"Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com>,
"Casper Li (李中榮)" <casper.li@mediatek.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"Steven Price" <steven.price@arm.com>
Subject: Re: [BUG] Usersapce MTE error with allocation tag 0 when low on memory
Date: Wed, 29 Mar 2023 17:54:45 +0100
Message-ID: <ZCRtVW9Q0WOKEQVX@arm.com>
In-Reply-To: <5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com>
+ Steven Price who added the MTE swap support.
On Wed, Mar 29, 2023 at 02:55:49AM +0000, Qun-wei Lin (林群崴) wrote:
> Hi,
>
> We are seeing a large number of MTE errors on Android T with kernel 6.1.
>
> When the system is under memory pressure, MTE frequently reports tag
> check faults in userspace.
>
> As in the tombstone below, many of the reports show an allocation tag
> of 0:
>
> Build fingerprint: 'alps/vext_k6897v1_64/k6897v1_64:13/TP1A.220624.014/mp2ofp23:userdebug/dev-keys'
> Revision: '0'
> ABI: 'arm64'
> Timestamp: 2023-03-14 06:39:40.344251744+0800
> Process uptime: 0s
> Cmdline: /vendor/bin/hw/camerahalserver
> pid: 988, tid: 1395, name: binder:988_3  >>> /vendor/bin/hw/camerahalserver <<<
> uid: 1047
> tagged_addr_ctrl: 000000000007fff3 (PR_TAGGED_ADDR_ENABLE, PR_MTE_TCF_SYNC, mask 0xfffe)
> signal 11 (SIGSEGV), code 9 (SEGV_MTESERR), fault addr 0x0d000075f1d8d7f0
>     x0  00000075018d3fb0  x1  00000000c0306201  x2  00000075018d3ae8  x3  000000000000720c
>     x4  0000000000000000  x5  0000000000000000  x6  00000642000004fe  x7  0000054600000630
>     x8  00000000fffffff2  x9  b34a1094e7e33c3f  x10 00000075018d3a80  x11 00000075018d3a50
>     x12 ffffff80ffffffd0  x13 0000061e0000072c  x14 0000000000000004  x15 0000000000000000
>     x16 00000077f2dfcd78  x17 00000077da3a8ff0  x18 00000075011bc000  x19 0d000075f1d8d898
>     x20 0d000075f1d8d7f0  x21 0d000075f1d8d910  x22 0000000000000000  x23 00000000fffffff7
>     x24 00000075018d4000  x25 0000000000000000  x26 00000075018d3ff8  x27 00000000000fc000
>     x28 00000000000fe000  x29 00000075018d3b20
>     lr  00000077f2d9f164  sp  00000075018d3ad0  pc  00000077f2d9f134  pst 0000000080001000
>
> backtrace:
>       #00 pc 000000000005d134  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+244) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #01 pc 000000000005d448  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+24) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #02 pc 000000000005dd64  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+68) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #03 pc 000000000008dba8  /system/lib64/libbinder.so (android::PoolThread::threadLoop()+24) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #04 pc 0000000000013440  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+416) (BuildId: 10aac5d4a671e4110bc00c9b69d83d8a)
>       #05 pc 00000000000c14cc  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204) (BuildId: 718ecc04753b519b0f6289a7a2fcf117)
>       #06 pc 0000000000054930  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 718ecc04753b519b0f6289a7a2fcf117)
>
> Memory tags around the fault address (0xd000075f1d8d7f0), one tag per
> 16 bytes:
> 0x75f1d8cf00: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d000: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d100: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d200: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d300: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d400: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d500: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d600: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> =>0x75f1d8d700: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [0]
> 0x75f1d8d800: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8d900: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8da00: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8db00: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8dc00: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8dd00: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0x75f1d8de00: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
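
For readers less used to tombstones, here is a minimal sketch of how the
values above fit together. It assumes the arm64 MTE userspace ABI bit layout
for the tagged-address control word; the helper names are hypothetical and
are not part of any Android or kernel tool. The fault address carries
pointer tag 0xd, while the granule it points at (last column of the row
marked "=>") holds allocation tag 0, hence the synchronous tag check fault.

```python
# Bit layout of the tagged-address control word (Linux prctl ABI, arm64 MTE).
PR_TAGGED_ADDR_ENABLE = 1 << 0
PR_MTE_TCF_SYNC = 1 << 1
PR_MTE_TCF_ASYNC = 1 << 2
PR_MTE_TAG_SHIFT = 3
PR_MTE_TAG_MASK = 0xFFFF << PR_MTE_TAG_SHIFT

MTE_GRANULE_SIZE = 16   # bytes covered by one allocation tag
GRANULES_PER_ROW = 16   # the tombstone prints 16 tags per row


def pointer_tag(addr):
    """Logical tag carried in bits 59:56 of a tagged pointer."""
    return (addr >> 56) & 0xF


def untagged(addr):
    """Address with the top byte cleared, as printed in the tag dump."""
    return addr & ((1 << 56) - 1)


def tag_dump_position(addr):
    """Row base address and column index of addr in the tag dump."""
    row_bytes = MTE_GRANULE_SIZE * GRANULES_PER_ROW
    plain = untagged(addr)
    offset = plain % row_bytes
    return plain - offset, offset // MTE_GRANULE_SIZE


def decode_tagged_addr_ctrl(value):
    """Split the control word into the fields the tombstone names."""
    return {
        "tagged_addr_enable": bool(value & PR_TAGGED_ADDR_ENABLE),
        "tcf_sync": bool(value & PR_MTE_TCF_SYNC),
        "tcf_async": bool(value & PR_MTE_TCF_ASYNC),
        "tag_mask": (value & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT,
    }


if __name__ == "__main__":
    fault_addr = 0x0D000075F1D8D7F0
    row, col = tag_dump_position(fault_addr)
    print(hex(pointer_tag(fault_addr)), hex(row), col)
    print(decode_tagged_addr_ctrl(0x7FFF3))
```

The include mask 0xfffe leaves out tag 0, so a tagged heap allocation
should not normally end up with allocation tag 0. A whole region reading
back as 0 is consistent with the tags having been lost rather than with an
ordinary use-after-free.
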
> Also happens in coredump.
>
> This problem only occurs when ZRAM is enabled, so we think there are
> some issues regarding swap in/out.
>
> Comparing kernel 5.15 with kernel 6.1, we found that the order of
> swap_free() and set_pte_at() in do_swap_page() has changed.
>
> On a swap-in fault, do_swap_page() now calls swap_free() first:
> do_swap_page() -> swap_free() -> __swap_entry_free() ->
> free_swap_slot() -> swapcache_free_entries() -> swap_entry_free() ->
> swap_range_free() -> arch_swap_invalidate_page() ->
> mte_invalidate_tags_area() -> mte_invalidate_tags() -> xa_erase()
>
> and then calls set_pte_at():
> do_swap_page() -> set_pte_at() -> __set_pte_at() -> mte_sync_tags() ->
> mte_sync_page_tags() -> mte_restore_tags() -> xa_load()
>
> This means the swap slot is invalidated before the PTE is mapped, so
> the MTE tags stored in the XArray are released before they can be
> restored.
>
> After I moved swap_free() to the line after set_pte_at(), the problem
> disappeared.
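
The ordering described above can be illustrated with a toy userspace model.
This is only a sketch under stated assumptions: the class and function names
are hypothetical, a dict stands in for the XArray, a 4 KiB page (256
granules) is assumed, and the free step is modelled as always dropping the
saved tags, which matches the call chain in the report but ignores any swap
count handling in the real kernel.

```python
GRANULES_PER_PAGE = 4096 // 16  # assumed 4 KiB page, one tag per 16 bytes


class ToySwapTagStore:
    """Dict-based stand-in for the XArray holding tags per swap entry."""

    def __init__(self):
        self._tags = {}

    def save(self, entry, tags):
        # Toy equivalent of saving the page tags at swap-out time.
        self._tags[entry] = list(tags)

    def invalidate(self, entry):
        # Models the end of the swap_free() chain: ... -> xa_erase().
        self._tags.pop(entry, None)

    def restore(self, entry, page_tags):
        # Models the end of the set_pte_at() chain: ... -> xa_load().
        saved = self._tags.get(entry)
        if saved is None:
            return False          # nothing stored, page keeps its zero tags
        page_tags[:] = saved
        return True


def swap_in(store, entry, free_before_map):
    """Return the tags of the page after a modelled swap-in fault."""
    page_tags = [0] * GRANULES_PER_PAGE   # freshly allocated page, tags clear
    if free_before_map:
        store.invalidate(entry)           # swap_free() first (reported 6.1 order)
        store.restore(entry, page_tags)   # set_pte_at() finds nothing to restore
    else:
        store.restore(entry, page_tags)   # set_pte_at() first
        store.invalidate(entry)           # swap_free() afterwards
    return page_tags


if __name__ == "__main__":
    for free_first in (True, False):
        store = ToySwapTagStore()
        store.save(7, [0xD] * GRANULES_PER_PAGE)
        tags = swap_in(store, 7, free_before_map=free_first)
        print("free first:", free_first, "first tag:", hex(tags[0]))
```

In this model, freeing first leaves every granule at tag 0, as in the
tombstone, while mapping first brings the saved tags back.
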
>
> We suspect that the following change, which altered the order, did not
> take MTE tag restoring in the page fault path into account:
> https://lore.kernel.org/all/20220131162940.210846-5-david@redhat.com/
>
> Any suggestions are appreciated.
>
> Thank you.