linux-arm-kernel.lists.infradead.org archive mirror
From: Xiaofei Tan <tanxiaofei@huawei.com>
To: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Linuxarm <linuxarm@huawei.com>, Will Deacon <will@kernel.org>,
	Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Shiju Jose <shiju.jose@huawei.com>
Subject: Re: Question about SEA handling process happened in user space
Date: Fri, 10 Apr 2020 17:43:57 +0800
Message-ID: <5E903FDD.4080106@huawei.com>
In-Reply-To: <66db5a6a-e68b-00b7-6a78-2c8cd9e63aab@arm.com>

Hi James,

On 2020/4/9 22:28, James Morse wrote:
> On 09/04/2020 10:17, Xiaofei Tan wrote:
>> On 2020/4/8 0:37, James Morse wrote:
>>> On 02/04/2020 07:35, Xiaofei Tan wrote:
>>>> On 2020/3/31 0:49, James Morse wrote:
>>>>> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
>>>>> alternative is to upgrade to SIGKILL in that case.
>>>>>
>>>>>
>>>>> If you see this instead of the address provided via firmware-first, there is a
>>>>> series to improve that here:
>>>>> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
>>>>>
>>>>> (We skip this signal code if APEI promises it did all the work. This lets you
>>>>> take the signal from memory_failure() instead, which may have better information.)
>>>
>>>> There may be a race here.
>>>> APEI runs memory_failure() in a bottom half for memory errors, so it may not have finished
>>>> before SEA handling ends here, and the application process may already be running again.
> 
>>> With that series, it runs in process-context as task-work. memory_failure() needs to
>>> sleep, so it has to run in process-context. 
>>
>>
>>> Doing it as task-work means it runs before the thread returns to user-space.
>>
>> Sorry, I don't understand this. I thought the task-work needs a reschedule to run, and that the
>> current thread would have returned to user-space before then.
> 
> ret_to_user has a loop around do_notify_resume(), if the _TIF_NOTIFY_RESUME flag is set
> and we call tracehook_notify_resume() which ends up in task_work_run()...
> 
> That TIF flag effectively prevents this thread returning to user-space until that task
> work has run.
> 

Got it. This mechanism is great.
BTW, I have not found where the _TIF_NOTIFY_RESUME flag gets set. Is it set by default for each thread?
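
To check my understanding, here is a small user-space model of what you describe. It is
not kernel code: every model_* name is made up for illustration, and the point where the
flag gets set (when the work is queued) is only my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for _TIF_NOTIFY_RESUME; not the kernel's definition. */
#define MODEL_TIF_NOTIFY_RESUME (1u << 0)

struct model_work {
	void (*func)(struct model_work *w);
	struct model_work *next;
};

struct model_task {
	unsigned int flags;		/* models the thread flags */
	struct model_work *works;	/* pending task work */
};

static struct model_task current_task;
static bool recovery_done;

/* Assumption of this sketch: queueing the work is what sets the flag. */
static void model_task_work_add(struct model_task *t, struct model_work *w)
{
	w->next = t->works;
	t->works = w;
	t->flags |= MODEL_TIF_NOTIFY_RESUME;
}

/* Run and drop every queued work item. */
static void model_task_work_run(struct model_task *t)
{
	while (t->works) {
		struct model_work *w = t->works;

		t->works = w->next;
		w->func(w);
	}
}

/* Models the loop on the return path: no exit while the flag is set. */
static void model_ret_to_user(struct model_task *t)
{
	while (t->flags & MODEL_TIF_NOTIFY_RESUME) {
		t->flags &= ~MODEL_TIF_NOTIFY_RESUME;
		model_task_work_run(t);
	}
	/* Only here would the thread resume in user space. */
	assert(t->works == NULL);
}

/* Stands in for the recovery work, which may sleep in the real kernel. */
static void model_memory_failure(struct model_work *w)
{
	(void)w;
	recovery_done = true;
}

/* Models the SEA path: queue recovery, then head back towards user space. */
static bool model_handle_sea(void)
{
	static struct model_work w = { .func = model_memory_failure };

	recovery_done = false;
	model_task_work_add(&current_task, &w);
	model_ret_to_user(&current_task);
	return recovery_done;
}
```

In this model, model_handle_sea() can only return after the queued recovery has run, which
is how I read "prevents this thread returning to user-space until that task work has run".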

> 
>> BTW, what context does a synchronous external abort run in? I thought it was process-context.
> 
> It depends what you interrupted.
> 32bit had different CPU modes for different contexts, we don't have that in 64bit. Instead
> we mask asynchronous interrupts, and tinker with the preempt count to track the context.
> Synchronous exceptions can't be masked, so they happen in whatever context you were
> already in.
> This means the exception handlers have to be prepared for each eventuality.
> (which is why that code is starting to look complex)
> 

OK.

> 
>> Because in_interrupt() returns false when called in do_sea().
> 
> If you took the exception from EL0, or EL1 process context, yes. If you took the exception
> from an IRQ handler, in_interrupt() would return true.
> 

Got it.
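
As I understand it, this can be modelled in user space as below. The bit layout and the
model_* names are invented for this sketch and are not the kernel's definitions; the only
point is that the synchronous exception handler does not change the count, so it reports
whatever context it interrupted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout for this model; the kernel's real layout may differ. */
#define MODEL_SOFTIRQ_MASK	0x0000ff00u
#define MODEL_HARDIRQ_MASK	0x000f0000u
#define MODEL_HARDIRQ_OFFSET	0x00010000u

/* Models the per-task preempt count used to track the current context. */
static unsigned int model_preempt_count;

/* Entering an IRQ handler bumps the hardirq field of the count. */
static void model_irq_enter(void)
{
	model_preempt_count += MODEL_HARDIRQ_OFFSET;
}

/* Leaving the IRQ handler restores the previous count. */
static void model_irq_exit(void)
{
	assert(model_preempt_count & MODEL_HARDIRQ_MASK);
	model_preempt_count -= MODEL_HARDIRQ_OFFSET;
}

/* True when the count says we are in hardirq or softirq context. */
static bool model_in_interrupt(void)
{
	return (model_preempt_count &
		(MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK)) != 0;
}

/*
 * Models a synchronous exception handler: it cannot be masked and does not
 * touch the count itself, so it sees the context of the interrupted code.
 */
static bool model_do_sea_in_interrupt(void)
{
	return model_in_interrupt();
}
```

Called from plain process context, model_do_sea_in_interrupt() is false; called between
model_irq_enter() and model_irq_exit(), it is true.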

> 
>>> If another thread in the same process accesses the affected memory, I'd expect to take a
>>> second external abort. If another process had the page mapped, it could access the
>>> affected memory, again taking an external abort.
> 
>> Yes, it is hard to prevent another thread from accessing the affected memory.
>> I am just worried about the same thread accessing it again.
> 
> This is the race that that series fixes.
> It can't happen with mainline as the arch code unconditionally signals the affected
> process, which was the pre-RAS behaviour.
> 

OK

>>> These two could happen while the first CPU was in firmware generating the CPER records, so
>>> it's not a race we can fix. It should be harmless, the recovery action is the same, it's
>>> just the error counters that count more events than errors. If you actually see it happen,
>>> we can try and make it smaller...
> 
>> Hmm, maybe this double SEA handling is a solution.
> 
> It assumes you get a second external-abort. We know this thread is affected, and will try
> and consume the error again if we restart it. We shouldn't restart it until we've given
> the recovery our best shot.
> Letting it loose is a poor choice if you have any kind of threshold for error-counts. They
> may jump NR_CPUs at a time until every CPU is waiting in memory_failure()...
> 

Got it. Thanks.

> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
