Re: [PATCH v4 1/2] x86/resctrl: Update task closid/rmid with task_call_func()

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Reinette Chatre <reinette.chatre@intel.com>
To: Peter Newman <peternewman@google.com>
Cc: <fenghua.yu@intel.com>, <bp@alien8.de>, <derkling@google.com>,
	<eranian@google.com>, <hpa@zytor.com>, <james.morse@arm.com>,
	<jannh@google.com>, <kpsingh@google.com>,
	<linux-kernel@vger.kernel.org>, <mingo@redhat.com>,
	<tglx@linutronix.de>, <x86@kernel.org>
Subject: Re: [PATCH v4 1/2] x86/resctrl: Update task closid/rmid with task_call_func()
Date: Wed, 7 Dec 2022 10:38:38 -0800	[thread overview]
Message-ID: <e28c1f27-f320-511b-e5ea-c278a570d709@intel.com> (raw)
In-Reply-To: <CALPaoCg6YROmpFa_RCYOCDzHBtR5tSCh2JwsOwPPpzBraOHK4Q@mail.gmail.com>

Hi Peter,

On 12/7/2022 2:58 AM, Peter Newman wrote:
> Hi Reinette,
> 
> On Tue, Dec 6, 2022 at 7:57 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 11/29/2022 3:10 AM, Peter Newman wrote:
>>> When the user moves a running task to a new rdtgroup using the tasks
>>> file interface, the resulting change in CLOSID/RMID must be immediately
>>> propagated to the PQR_ASSOC MSR on the task's CPU.
>>>
>>> It is possible for a task to wake up or migrate while it is being moved
>>> to a new group. If __rdtgroup_move_task() fails to observe that a task
>>> has begun running or misses that it migrated to a new CPU, the task will
>>> continue to use the old CLOSID or RMID until it switches in again.
>>>
>>> __rdtgroup_move_task() assumes that if the task migrates off of its CPU
>>> before it can IPI the task, then the task has already observed the
>>> updated CLOSID/RMID. Because this is done locklessly and an x86 CPU can
>>> delay stores until after loads, the following incorrect scenarios are
>>> possible:
>>>
>>>  1. __rdtgroup_move_task() stores the new closid and rmid in
>>>     the task structure after it loads task_curr() and task_cpu().
>>
>> Stating how this scenario encounters the problem would help
>> so perhaps something like (please feel free to change):
>> "If the task starts running between a reordered task_curr() check and
>> the CLOSID/RMID update then it will start running with the old CLOSID/RMID
>> until it is switched again because __rdtgroup_move_task() failed to determine
>> that it needs to be interrupted to obtain the new CLOSID/RMID."
> 
> That is largely what I was trying to state in paragraph 2 above, though
> at a higher level. I hoped the paragraph following it would do enough to
> connect the high-level description with the low-level problem scenarios.

There is no need to require the reader to connect various snippets to create
a problematic scenario themselves. The changelog should make the problem
obvious. I understand that it is what you wanted to say, that is why I moved
existing snippets to form a coherent problem scenario. It is ok if you do not
like the way I wrote it, it was only an example on how it can be done.

>>>  2. resctrl_sched_in() loads t->{closid,rmid} before the calling context
>>>     switch stores new task_curr() and task_cpu() values.
>>
>> This scenario is not clear to me. Could you please provide more detail about it?
>> I was trying to follow the context_switch() flow and resctrl_sched_in() is
>> one of the last things done (context_switch()->switch_to()->resctrl_sched_in()).
>> From what I can tell rq->curr, as used by task_curr() is set before
>> even context_switch() is called ... and since the next task is picked from
>> the CPU's runqueue (and set_task_cpu() sets the task's cpu when moved to
>> a runqueue) it seems to me that the value used by task_cpu() would also
>> be set early (before context_switch() is called). It is thus not clear to
>> me how the above reordering could occur so an example would help a lot.
> 
> Perhaps in both scenarios I didn't make it clear that reordering in the
> CPU can cause the incorrect behavior rather than the program order. In
> this explanation, items 1. and 2. are supposed to be completing the
> sentence ending with a ':' at the end of paragraph 3, so I thought that
> would keep focus on the CPU.

You did make it clear that the cause is reordering in the CPU. I am just
not able to see where the reordering is occurring in your scenario (2).

> I had assumed that the ordering requirements were well-understood, since
> they're stated in existing code comments a few times, and that making a
> case for how the expected ordering could be violated would be enough,
> but I'm happy to draw up a side-by-side example.

Please do. Could you start by highlighting which resctrl_sched_in()
you are referring to? I am trying to dissect (2) with the given information:
Through "the calling context switch" the scenario is written to create
understanding that it refers to:
context_switch()->switch_to()->resctrl_sched_in() - so the calling context
switch is the first in the above call path ... where does it (context_switch())
store the new task_curr() and task_cpu() values and how does that reorder with
resctrl_sched_in() further down in call path?

>>> Use task_call_func() in __rdtgroup_move_task() to serialize updates to
>>> the closid and rmid fields in the task_struct with context switch.
>>
>> Is there a reason why there is a switch between the all caps CLOSID/RMID
>> at the beginning to the no caps here?
> 
> It's because I referred to the task_struct fields explicitly here.

You can use task_struct::closid and task_struct::rmid to make this clear.

Reinette

next prev parent reply	other threads:[~2022-12-07 18:41 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-29 11:10 [PATCH v4 0/2] x86/resctrl: Fix task CLOSID update race Peter Newman
2022-11-29 11:10 ` [PATCH v4 1/2] x86/resctrl: Update task closid/rmid with task_call_func() Peter Newman
2022-12-06 18:56   ` Reinette Chatre
2022-12-07 10:58     ` Peter Newman
2022-12-07 18:38       ` Reinette Chatre [this message]
2022-12-08 22:30         ` Peter Newman
2022-12-09 23:54           ` Reinette Chatre
2022-12-12 17:36             ` Peter Newman
2022-12-13 18:33               ` Reinette Chatre
2022-12-14 10:05                 ` Peter Newman
2022-11-29 11:10 ` [PATCH v4 2/2] x86/resctrl: IPI all online CPUs for group updates Peter Newman
2022-12-06 18:57   ` Reinette Chatre
2022-12-07 11:04     ` Peter Newman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e28c1f27-f320-511b-e5ea-c278a570d709@intel.com \
    --to=reinette.chatre@intel.com \
    --cc=bp@alien8.de \
    --cc=derkling@google.com \
    --cc=eranian@google.com \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jannh@google.com \
    --cc=kpsingh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peternewman@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox