From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Song Liu <song@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Linux MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
William Kucharski <william.kucharski@oracle.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache
Date: Wed, 29 Sep 2021 15:50:39 +0800
Message-ID: <dde441c4-febe-cfa1-7729-b405fa331a4e@linux.alibaba.com>
In-Reply-To: <CAPhsuW4x2UzMLwZyioWH4dXqrYwNT-XKgzvrm+6YeWk9EgQmCQ@mail.gmail.com>
On 9/29/21 3:14 PM, Song Liu wrote:
> On Tue, Sep 28, 2021 at 9:20 AM Rongwei Wang
> <rongwei.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 9/28/21 6:24 AM, Song Liu wrote:
>>> On Fri, Sep 24, 2021 at 12:12 AM Rongwei Wang
>>> <rongwei.wang@linux.alibaba.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/24/21 10:43 AM, Andrew Morton wrote:
>>>>> On Thu, 23 Sep 2021 01:04:54 +0800 Rongwei Wang <rongwei.wang@linux.alibaba.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>> On Sep 22, 2021, at 7:37 PM, Matthew Wilcox <willy@infradead.org> wrote:
>>>>>>>
>>>>>>> On Wed, Sep 22, 2021 at 03:06:44PM +0800, Rongwei Wang wrote:
>>>>>>>> Transparent huge page has supported read-only non-shmem files. The file-
>>>>>>>> backed THP is collapsed by khugepaged and truncated when written (for
>>>>>>>> shared libraries).
>>>>>>>>
>>>>>>>> However, there are races in two possible places.
>>>>>>>>
>>>>>>>> 1) multiple writers truncate the same page cache concurrently;
>>>>>>>> 2) collapse_file rolls back when writer truncates the page cache;
>>>>>>>
>>>>>>> As I've said before, the bug here is that somehow there is a writable fd
>>>>>>> to a file with THPs. That's what we need to track down and fix.
>>>>>> Hi, Matthew
>>>>>> I am not sure I understand what you mean. We know that "mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs"
>>>>>> introduced file-backed THPs for DSOs. It is possible (very rarely) for a DSO to be opened for writing.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> https://lore.kernel.org/linux-mm/YUdL3lFLFHzC80Wt@casper.infradead.org/
>>>>>> All in all, what you mean is that we should solve this race at the source?
>>>>>
>>>>> Matthew is being pretty clear here: we shouldn't be permitting
>>>>> userspace to get a writeable fd for a thp-backed file.
>>>>>
>>>>> Why are we permitting the DSO to be opened writeably? If there's a
>>>>> legitimate case for doing this then presumably "mm, thp: relax the
>>>> There is a test case in the attachment that stresses file-backed THP.
>>>> I ran it on a system with CONFIG_READ_ONLY_THP_FOR_FS enabled:
>>>>
>>>> $ gcc -Wall -g -o stress_madvise_dso stress_madvise_dso.c
>>>> $ ulimit -s unlimited
>>>> $ ./stress_madvise_dso 10000 <libtest.so>
>>>>
>>>> The parameters above mean:
>>>> 10000: the maximum test time;
>>>> <libtest.so>: the DSO that will be mapped into file-backed THP by
>>>> madvise. It is recommended that the text segment of the DSO under test
>>>> be larger than 2M.
>>>>
>>>> The crash is triggered at once on the latest kernel. This test case
>>>> can also be used to trigger the bug mentioned in our other patch.
>>>
>>> Hmm.. I am not able to use the repro program to crash the system. Not
>>> sure what I did wrong.
>>>
>> Hi
>> I have checked my test case again. Can you make sure the DSO that
>> you tested has a THP mapping?
>>
>> If you are willing to try again, I can send the libtest.c that I use
>> for my own testing (actually, the target DSO should not be the problem).
>>
>> Thanks very much!
>>> OTOH, does it make sense to block writes within khugepaged, like:
>>>
>>> diff --git i/mm/khugepaged.c w/mm/khugepaged.c
>>> index 045cc579f724e..ad7c41ec15027 100644
>>> --- i/mm/khugepaged.c
>>> +++ w/mm/khugepaged.c
>>> @@ -51,6 +51,7 @@ enum scan_result {
>>>         SCAN_CGROUP_CHARGE_FAIL,
>>>         SCAN_TRUNCATED,
>>>         SCAN_PAGE_HAS_PRIVATE,
>>> +       SCAN_BUSY_WRITE,
>>> };
>>>
>>> #define CREATE_TRACE_POINTS
>>> @@ -1652,6 +1653,11 @@ static void collapse_file(struct mm_struct *mm,
>>>         /* Only allocate from the target node */
>>>         gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
>>>
>>> +       if (deny_write_access(file)) {
>>> +               result = SCAN_BUSY_WRITE;
>>> +               return;
>>> +       }
>>> +
>> This can indeed avoid some possible races at the source.
>>
>> But I wonder whether this could open the door to a DoS attack.
>> I remember that the reason the kernel ignores MAP_DENYWRITE for DSOs
>> is exactly such DoS attacks. In addition, 'deny_write_access' will change
>> behavior: for example, a user will get 'Text file busy' during
>> collapse_file. I am not sure whether this behavior change is acceptable
>> to user space.
>>
>> If it is acceptable, I am very willing to fix the races your way.
>
> I guess we should not let the write get ETXTBSY for khugepaged work.
>
> I am getting some segfault on stress_madvise_dso. And it doesn't really
> generate the bug stack in my vm (qemu-system-x86_64). Is there a newer
Hi, I am sure I have not updated stress_madvise_dso.c.

My test environment is a VM (qemu-system-aarch64, 32 cores). I can
think of the following possibilities:

(1) in thread_read()

    printf("read %s\n", dso_path);
    fd = open(dso_path, O_RDONLY);
    /* The start address must be 2M-aligned */
    void *p = mmap((void *)0x40000dc00000UL, 0x800000,
                   PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
            perror("mmap");
            goto out;
    }

0x40000dc00000 is an address I picked arbitrarily. I am not sure this
address is available in your VM.
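
To rule out possibility (1), the hint address can be checked before and after the call. The following is only a minimal sketch, not part of stress_madvise_dso.c: `is_2m_aligned()` and `map_text_at_hint()` are hypothetical helper names, and the 2M huge page size is an assumption. Because the mmap() call above does not pass MAP_FIXED, the kernel treats the address as a hint and may return a different one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define HPAGE_SIZE_2M 0x200000ULL	/* assumed PMD-sized huge page */

/* Return true if addr is a multiple of 2M. */
bool is_2m_aligned(uint64_t addr)
{
	return (addr & (HPAGE_SIZE_2M - 1)) == 0;
}

/*
 * Hypothetical helper: map fd read-only/executable at `hint` and report
 * through `honored` whether the kernel placed the mapping at the hint.
 * Without MAP_FIXED the hint is advisory, so the result may differ.
 */
void *map_text_at_hint(int fd, uint64_t hint, size_t len, bool *honored)
{
	void *p = mmap((void *)(uintptr_t)hint, len, PROT_READ | PROT_EXEC,
		       MAP_PRIVATE, fd, 0);

	if (p == MAP_FAILED)
		return MAP_FAILED;

	*honored = ((uint64_t)(uintptr_t)p == hint);
	if (!is_2m_aligned((uint64_t)(uintptr_t)p))
		fprintf(stderr, "mapping at %p is not 2M-aligned\n", p);
	return p;
}
```

If `honored` comes back false on the x86 VM, the mapping did not land at the requested address, which would be worth noting alongside the segfault log.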

(2) in thread_write()

    int fd = open(dso_path, O_RDWR);
    p = mmap(NULL, 0x800000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
            perror("mmap");
            goto out; /* fail */
    }

Because I am sure my DSO is larger than 0x800000, the test maps 0x800000
bytes of it directly. I may also have used '-z max-page-size=0x200000'
when linking the DSO, like this:

    $ gcc -z max-page-size=0x200000 -o libtest.so -shared libtest.o
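
For possibility (2), a size check before the mmap() call would show whether the DSO really covers the 0x800000 bytes that the test maps. This is a minimal sketch under that assumption; `file_covers_len()` is a hypothetical helper and is not in stress_madvise_dso.c. Touching mapped pages that lie wholly beyond the end of the file normally raises SIGBUS, so a smaller DSO could make the test die for a reason unrelated to the race.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

#define TEST_MAP_LEN 0x800000	/* mapping length used by the test */

/*
 * Hypothetical helper: return true if fd refers to a regular file
 * that is at least `len` bytes long, false otherwise (including when
 * fstat() fails).
 */
bool file_covers_len(int fd, off_t len)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return false;
	return S_ISREG(st.st_mode) && st.st_size >= len;
}
```

A caller could run `file_covers_len(fd, TEST_MAP_LEN)` right after open() and skip the mapping, with a message, when it returns false.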

If you don't mind, please send me the segfault log, and I will find an
x86 environment to test on.

Thanks!
> version of it?
>
> Thanks,
> Song
>