From: Xiaofei Tan <tanxiaofei@huawei.com>
To: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Linuxarm <linuxarm@huawei.com>, Will Deacon <will@kernel.org>,
Dave Martin <Dave.Martin@arm.com>,
linux-arm-kernel@lists.infradead.org,
Shiju Jose <shiju.jose@huawei.com>
Subject: Re: Question about SEA handling process happened in user space
Date: Fri, 10 Apr 2020 17:43:57 +0800
Message-ID: <5E903FDD.4080106@huawei.com>
In-Reply-To: <66db5a6a-e68b-00b7-6a78-2c8cd9e63aab@arm.com>
Hi James,
On 2020/4/9 22:28, James Morse wrote:
> On 09/04/2020 10:17, Xiaofei Tan wrote:
>> On 2020/4/8 0:37, James Morse wrote:
>>> On 02/04/2020 07:35, Xiaofei Tan wrote:
>>>> On 2020/3/31 0:49, James Morse wrote:
>>>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>>>> alternative is to upgrade to SIGKILL in that case.
>>>>>
>>>>>
>>>>> If you see this instead of the address provided via firmware-first, there is a
>>>>> series to improve that here:
>>>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>>>
>>>>> (We skip this signal code if APEI promises it did all the work. This lets you
>>>>> take the signal from memory_failure() instead, which may have better information.)
>>>
>>>> There may be a race here.
>>>> APEI runs memory_failure() in a bottom half for memory errors, so it may not have finished
>>>> before the SEA handling here ends, and the application process may already be running again.
>
>>> With that series, it runs in process-context as task-work. memory_failure() needs to
>>> sleep, so it has to run in process-context.
>>
>>
>>> Doing it as task-work means it runs before the thread returns to user-space.
>>
>> Sorry, I don't understand this. I thought the task-work needed a reschedule, and that the current
>> thread would have returned to user-space before it ran.
>
> ret_to_user has a loop around do_notify_resume(); if the _TIF_NOTIFY_RESUME flag is set,
> we call tracehook_notify_resume(), which ends up in task_work_run()...
>
> That TIF flag effectively prevents this thread returning to user-space until that task
> work has run.
>
Got it. This mechanism is great.
BTW, I have not found where the _TIF_NOTIFY_RESUME flag is set. Is it set by default for each thread?
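
For illustration, here is a minimal user-space sketch of the behaviour James describes above. It is not kernel code: every sim_ name is hypothetical, and the model assumes the flag is raised at the moment work is queued (which, as far as I understand, is what task_work_add() does when asked to notify) rather than being set by default. The point it shows is that the return path loops until the flag is clear, so the queued recovery work always runs before the thread is back in user-space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's _TIF_NOTIFY_RESUME bit. */
#define SIM_TIF_NOTIFY_RESUME (1u << 0)

struct sim_work {
    void (*func)(struct sim_work *);
    struct sim_work *next;
};

struct sim_task {
    unsigned int flags;      /* models the thread flags */
    struct sim_work *works;  /* pending task work, newest first */
    int recovered;           /* set by the recovery callback */
    bool in_user;            /* true once the task is "back in user-space" */
};

static struct sim_task *sim_current;

/* Assumption: queueing work is what raises the flag. */
static void sim_task_work_add(struct sim_task *t, struct sim_work *w)
{
    w->next = t->works;
    t->works = w;
    t->flags |= SIM_TIF_NOTIFY_RESUME;
}

/* Run and unlink every queued callback. */
static void sim_task_work_run(struct sim_task *t)
{
    while (t->works) {
        struct sim_work *w = t->works;
        t->works = w->next;
        w->func(w);
    }
}

/* Models the return path: loop until no notification is pending. */
static void sim_ret_to_user(struct sim_task *t)
{
    while (t->flags & SIM_TIF_NOTIFY_RESUME) {
        t->flags &= ~SIM_TIF_NOTIFY_RESUME;
        sim_task_work_run(t);
    }
    t->in_user = true;
}

/* Stand-in for the deferred memory_failure() call. */
static void sim_memory_failure_work(struct sim_work *w)
{
    (void)w;
    assert(!sim_current->in_user); /* must run before the return to user */
    sim_current->recovered = 1;
}

/* Returns 1 if recovery ran before the task reached user-space. */
static int sim_demo(void)
{
    struct sim_task t = {0};
    struct sim_work w = { .func = sim_memory_failure_work, .next = NULL };

    sim_current = &t;
    sim_task_work_add(&t, &w);
    sim_ret_to_user(&t);
    return t.recovered && t.in_user && t.flags == 0;
}
```

Calling sim_demo() exercises the whole path. A task with nothing queued never has the flag set, so sim_ret_to_user() falls straight through for it.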
>
>> BTW, what context does a synchronous external abort run in? I thought it was process-context.
>
> It depends what you interrupted.
> 32bit had different CPU modes for different contexts, we don't have that in 64bit. Instead
> we mask asynchronous interrupts, and tinker with the preempt count to track the context.
> Synchronous exceptions can't be masked, so they happen in whatever context you were
> already in.
> This means the exception handlers have to be prepared for each eventuality.
> (which is why that code is starting to look complex)
>
OK.
>
>> Because in_interrupt() returns false when called in do_sea().
>
> If you took the exception from EL0, or EL1 process context, yes. If you took the exception
> from an IRQ handler, in_interrupt() would return true.
>
Got it.
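
To make the point about context concrete, here is a small user-space sketch, not kernel code. The bit widths are modelled on my reading of include/linux/preempt.h and should be treated as an assumption, and all sim_ names are hypothetical. It shows that a handler for a synchronous exception does not establish a context of its own: it sees whatever the count already said when the exception was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: preempt, softirq, hardirq and NMI fields packed in one word. */
#define SIM_PREEMPT_BITS 8
#define SIM_SOFTIRQ_BITS 8
#define SIM_HARDIRQ_BITS 4
#define SIM_NMI_BITS     1

#define SIM_SOFTIRQ_SHIFT SIM_PREEMPT_BITS
#define SIM_HARDIRQ_SHIFT (SIM_SOFTIRQ_SHIFT + SIM_SOFTIRQ_BITS)
#define SIM_NMI_SHIFT     (SIM_HARDIRQ_SHIFT + SIM_HARDIRQ_BITS)

#define SIM_MASK(bits, shift) (((1u << (bits)) - 1u) << (shift))
#define SIM_SOFTIRQ_MASK SIM_MASK(SIM_SOFTIRQ_BITS, SIM_SOFTIRQ_SHIFT)
#define SIM_HARDIRQ_MASK SIM_MASK(SIM_HARDIRQ_BITS, SIM_HARDIRQ_SHIFT)
#define SIM_NMI_MASK     SIM_MASK(SIM_NMI_BITS, SIM_NMI_SHIFT)

#define SIM_HARDIRQ_OFFSET (1u << SIM_HARDIRQ_SHIFT)

/* Models the per-task preempt count. */
static unsigned int sim_preempt_count;

/* True if any softirq, hardirq or NMI field is non-zero. */
static bool sim_in_interrupt(void)
{
    return (sim_preempt_count &
            (SIM_SOFTIRQ_MASK | SIM_HARDIRQ_MASK | SIM_NMI_MASK)) != 0;
}

/* Entering and leaving an IRQ handler adjusts the hardirq field. */
static void sim_irq_enter(void) { sim_preempt_count += SIM_HARDIRQ_OFFSET; }
static void sim_irq_exit(void)  { sim_preempt_count -= SIM_HARDIRQ_OFFSET; }

/* A synchronous abort handler leaves the count alone and inherits it. */
static bool sim_do_sea(void)
{
    return sim_in_interrupt();
}

/* Returns 1 if the handler sees process context first, IRQ context second. */
static int sim_context_demo(void)
{
    bool from_process, from_irq;

    sim_preempt_count = 0;
    from_process = sim_do_sea();   /* abort taken from process context */

    sim_irq_enter();
    from_irq = sim_do_sea();       /* abort taken inside an IRQ handler */
    sim_irq_exit();

    return !from_process && from_irq && sim_preempt_count == 0;
}
```

The same sim_do_sea() call reports two different answers depending only on what was interrupted, which is why a real handler has to cope with both.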
>
>>> If another thread in the same process accesses the affected memory, I'd expect to take a
>>> second external abort. If another process had the page mapped, it could access the
>>> affected memory, again taking an external abort.
>
>> Yes, it is hard to prevent another thread from accessing the affected memory.
>> I am just worried that the same thread will access it again.
>
> This is the race that that series fixes.
> It can't happen with mainline as the arch code unconditionally signals the affected
> process, which was the pre-RAS behaviour.
>
OK
>>> These two could happen while the first CPU was in firmware generating the CPER records, so
>>> it's not a race we can fix. It should be harmless: the recovery action is the same, it's
>>> just the error counters that count more events than errors. If you actually see it happen,
>>> we can try and make it smaller...
>
>> Hmm, maybe this double SEA handling is a solution.
>
> It assumes you get a second external-abort. We know this thread is affected, and will try
> and consume the error again if we restart it. We shouldn't restart it until we've given
> the recovery our best shot.
> Letting it loose is a poor choice if you have any kind of threshold for error-counts. They
> may jump NR_CPUs at a time until every CPU is waiting in memory_failure()...
>
Got it. Thanks.
>
> Thanks,
>
> James
>
> .
>
--
thanks
tanxiaofei