From: Reinette Chatre <reinette.chatre@intel.com>
To: Peter Newman <peternewman@google.com>
Cc: <fenghua.yu@intel.com>, <bp@alien8.de>, <derkling@google.com>,
<eranian@google.com>, <hpa@zytor.com>, <james.morse@arm.com>,
<jannh@google.com>, <kpsingh@google.com>,
<linux-kernel@vger.kernel.org>, <mingo@redhat.com>,
<tglx@linutronix.de>, <x86@kernel.org>
Subject: Re: [PATCH v4 1/2] x86/resctrl: Update task closid/rmid with task_call_func()
Date: Wed, 7 Dec 2022 10:38:38 -0800
Message-ID: <e28c1f27-f320-511b-e5ea-c278a570d709@intel.com>
In-Reply-To: <CALPaoCg6YROmpFa_RCYOCDzHBtR5tSCh2JwsOwPPpzBraOHK4Q@mail.gmail.com>

Hi Peter,

On 12/7/2022 2:58 AM, Peter Newman wrote:
> Hi Reinette,
>
> On Tue, Dec 6, 2022 at 7:57 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 11/29/2022 3:10 AM, Peter Newman wrote:
>>> When the user moves a running task to a new rdtgroup using the tasks
>>> file interface, the resulting change in CLOSID/RMID must be immediately
>>> propagated to the PQR_ASSOC MSR on the task's CPU.
>>>
>>> It is possible for a task to wake up or migrate while it is being moved
>>> to a new group. If __rdtgroup_move_task() fails to observe that a task
>>> has begun running or misses that it migrated to a new CPU, the task will
>>> continue to use the old CLOSID or RMID until it switches in again.
>>>
>>> __rdtgroup_move_task() assumes that if the task migrates off of its CPU
>>> before it can IPI the task, then the task has already observed the
>>> updated CLOSID/RMID. Because this is done locklessly and an x86 CPU can
>>> delay stores until after loads, the following incorrect scenarios are
>>> possible:
>>>
>>> 1. __rdtgroup_move_task() stores the new closid and rmid in
>>> the task structure after it loads task_curr() and task_cpu().
>>
>> Stating how this scenario encounters the problem would help
>> so perhaps something like (please feel free to change):
>> "If the task starts running between a reordered task_curr() check and
>> the CLOSID/RMID update then it will start running with the old CLOSID/RMID
>> until it is switched again because __rdtgroup_move_task() failed to determine
>> that it needs to be interrupted to obtain the new CLOSID/RMID."
>
> That is largely what I was trying to state in paragraph 2 above, though
> at a higher level. I hoped the paragraph following it would do enough to
> connect the high-level description with the low-level problem scenarios.

The reader should not have to piece together various snippets to construct a
problematic scenario. The changelog should make the problem obvious. I
understand that this is what you wanted to say, which is why I rearranged the
existing snippets into a coherent problem scenario. It is ok if you do not like
the way I wrote it; it was only an example of how it can be done.

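For illustration only, scenario 1 can be replayed as a small deterministic
userspace model. This is a sketch, not kernel code: struct model_task, its
hw_closid field, and both model_*() functions are hypothetical names invented
here, and the CPU's store/load reordering is imitated simply by performing the
steps in the reordered sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct state discussed above. */
struct model_task {
	unsigned int closid;    /* value the mover wants the task to use */
	unsigned int hw_closid; /* value last "written to PQR_ASSOC" */
	bool curr;              /* models the result of task_curr() */
};

/* Models a switch-in: the task picks up whatever closid is visible now. */
void model_sched_in(struct model_task *t)
{
	assert(t);
	t->curr = true;
	t->hw_closid = t->closid;
}

/*
 * Models the mover with its closid store delayed past its task_curr() load.
 * The task switches in inside that window. Returns true if the mover decided
 * that the task needs to be interrupted.
 */
bool model_move_reordered(struct model_task *t, unsigned int new_closid)
{
	bool was_running;

	assert(t);
	was_running = t->curr;   /* load performed early */
	model_sched_in(t);       /* task starts running in the window */
	t->closid = new_closid;  /* store becomes visible late */

	if (was_running)
		t->hw_closid = t->closid; /* models the IPI refreshing the MSR */
	return was_running;
}
```

Starting from a task that is not running with closid 1 and moving it to
closid 2, the model returns false (no interrupt is sent) and leaves hw_closid
at 1 while closid is 2. That is the stale state the suggested changelog text
describes: the task runs with the old value until it is switched in again.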
>>> 2. resctrl_sched_in() loads t->{closid,rmid} before the calling context
>>> switch stores new task_curr() and task_cpu() values.
>>
>> This scenario is not clear to me. Could you please provide more detail about it?
>> I was trying to follow the context_switch() flow and resctrl_sched_in() is
>> one of the last things done (context_switch()->switch_to()->resctrl_sched_in()).
>> From what I can tell rq->curr, as used by task_curr() is set before
>> even context_switch() is called ... and since the next task is picked from
>> the CPU's runqueue (and set_task_cpu() sets the task's cpu when moved to
>> a runqueue) it seems to me that the value used by task_cpu() would also
>> be set early (before context_switch() is called). It is thus not clear to
>> me how the above reordering could occur so an example would help a lot.
>
> Perhaps in both scenarios I didn't make it clear that reordering in the
> CPU can cause the incorrect behavior rather than the program order. In
> this explanation, items 1. and 2. are supposed to be completing the
> sentence ending with a ':' at the end of paragraph 3, so I thought that
> would keep focus on the CPU.

You did make it clear that the cause is reordering in the CPU. I am just
not able to see where the reordering occurs in your scenario (2).

> I had assumed that the ordering requirements were well-understood, since
> they're stated in existing code comments a few times, and that making a
> case for how the expected ordering could be violated would be enough,
> but I'm happy to draw up a side-by-side example.

Please do. Could you start by highlighting which resctrl_sched_in()
you are referring to? I am trying to dissect (2) with the given information.
The phrase "the calling context switch" suggests that the scenario refers to
this call path:

context_switch()->switch_to()->resctrl_sched_in()

The calling context switch would then be the first function in that path.
Where does context_switch() store the new task_curr() and task_cpu() values,
and how does that store reorder with resctrl_sched_in() further down the
call path?

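As a rough illustration of the program order being asked about, the sketch
below records the steps in the sequence described earlier in this thread:
task_cpu() and rq->curr are updated before context_switch() is called, and
resctrl_sched_in() runs last. It is a userspace model under that assumption,
with hypothetical names throughout (model_schedule(), struct model_trace and
the STEP_* values are not kernel symbols), and it says nothing about how a CPU
may reorder the underlying loads and stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the steps named in the discussion. */
enum model_step {
	STEP_SET_TASK_CPU,     /* task placed on the CPU's runqueue */
	STEP_SET_RQ_CURR,      /* value later read by task_curr() */
	STEP_CONTEXT_SWITCH,
	STEP_SWITCH_TO,
	STEP_RESCTRL_SCHED_IN, /* reads t->{closid,rmid} */
};

struct model_trace {
	enum model_step steps[8];
	size_t len;
};

static void record(struct model_trace *tr, enum model_step s)
{
	assert(tr && tr->len < 8);
	tr->steps[tr->len++] = s;
}

void model_resctrl_sched_in(struct model_trace *tr)
{
	record(tr, STEP_RESCTRL_SCHED_IN);
}

void model_switch_to(struct model_trace *tr)
{
	record(tr, STEP_SWITCH_TO);
	model_resctrl_sched_in(tr);
}

void model_context_switch(struct model_trace *tr)
{
	record(tr, STEP_CONTEXT_SWITCH);
	model_switch_to(tr);
}

/* Models the caller of context_switch() in program order. */
void model_schedule(struct model_trace *tr)
{
	record(tr, STEP_SET_TASK_CPU);
	record(tr, STEP_SET_RQ_CURR);
	model_context_switch(tr);
}
```

In this model both stores precede the call into context_switch(), so in
program order resctrl_sched_in() always comes after them. The open question
is therefore where a reordering that matches scenario (2) would come from.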
>>> Use task_call_func() in __rdtgroup_move_task() to serialize updates to
>>> the closid and rmid fields in the task_struct with context switch.
>>
>> Is there a reason why there is a switch between the all caps CLOSID/RMID
>> at the beginning to the no caps here?
>
> It's because I referred to the task_struct fields explicitly here.

You can use task_struct::closid and task_struct::rmid to make this clear.

Reinette