Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH v9 2/7] cpuset: Add new v2 cpuset.sched.domain_root flag
From: Juri Lelli @ 2018-05-30 14:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi
In-Reply-To: <1527601294-3444-3-git-send-email-longman@redhat.com>

Hi,

On 29/05/18 09:41, Waiman Long wrote:

[...]

> +  cpuset.sched.domain_root
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.
> +
> +	If set, it indicates that the current cgroup is the root of a
> +	new scheduling domain or partition that comprises itself and
> +	all its descendants except those that are scheduling domain
> +	roots themselves and their descendants.  The root cgroup is
> +	always a scheduling domain root.
> +
> +	There are constraints on where this flag can be set.  It can
> +	only be set in a cgroup if all the following conditions are true.
> +
> +	1) The "cpuset.cpus" is not empty and the list of CPUs are
> +	   exclusive, i.e. they are not shared by any of its siblings.
> +	2) The parent cgroup is also a scheduling domain root.
> +	3) There is no child cgroups with cpuset enabled.  This is
> +	   for eliminating corner cases that have to be handled if such
> +	   a condition is allowed.
> +
> +	Setting this flag will take the CPUs away from the effective
> +	CPUs of the parent cgroup.  Once it is set, this flag cannot
> +	be cleared if there are any child cgroups with cpuset enabled.
> +	Further changes made to "cpuset.cpus" is allowed as long as
> +	the first condition above is still true.

IIUC, with the configuration below

 cpuset.cpus.effective:6-11
 cgroup.controllers:cpuset
 cpuset.mems.effective:0-1
 cgroup.subtree_control:cpuset
 g1/cpuset.cpus.effective:0-5
 g1/cgroup.controllers:cpuset
 g1/cpuset.sched.load_balance:1
 g1/cpuset.mems.effective:0-1
 g1/cpuset.cpus:0-5
 g1/cpuset.sched.domain_root:1
 user.slice/cpuset.cpus.effective:6-11
 user.slice/cgroup.controllers:cpuset
 user.slice/cpuset.sched.load_balance:1
 user.slice/cpuset.mems.effective:0-1
 user.slice/cpuset.cpus:6-11
 user.slice/cpuset.sched.domain_root:0
 init.scope/cpuset.cpus.effective:6-11
 init.scope/cgroup.controllers:cpuset
 init.scope/cpuset.sched.load_balance:1
 init.scope/cpuset.mems.effective:0-1
 init.scope/cpuset.cpus:6-11
 init.scope/cpuset.sched.domain_root:0
 system.slice/cpuset.cpus.effective:6-11
 system.slice/cgroup.controllers:cpuset
 system.slice/cpuset.sched.load_balance:1
 system.slice/cpuset.mems.effective:0-1
 system.slice/cpuset.cpus:6-11
 system.slice/cpuset.sched.domain_root:0
 machine.slice/cpuset.cpus.effective:6-11
 machine.slice/cgroup.controllers:cpuset
 machine.slice/cpuset.sched.load_balance:1
 machine.slice/cpuset.mems.effective:0-1
 machine.slice/cpuset.cpus:6-11
 machine.slice/cpuset.sched.domain_root:0

I should be able to

 # echo 0-4 >g1/cpuset.cpus

?

It doesn't let me.

I'm not sure we actually want to allow that, but that's what would I
expect as per your text above.

Thanks,

- Juri

BTW: thanks a lot for your prompt feedback and hope it's OK if I keep
playing and asking questions. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Mathieu Poirier @ 2018-05-30 14:45 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jonathan Corbet, Robert Walker,
	Mike Leach, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, open list:DOCUMENTATION,
	Linux Kernel Mailing List, coresight
In-Reply-To: <20180530002837.GB11923@leoy-ThinkPad-X240s>

On 29 May 2018 at 18:28, Leo Yan <leo.yan@linaro.org> wrote:
> Hi Mathieu,
>
> On Tue, May 29, 2018 at 10:04:49AM -0600, Mathieu Poirier wrote:
>
> [...]
>
>> > As now this patch is big with more complex logic, so I consider to
>> > split it into small patches:
>> >
>> > - Define CS_ETM_INVAL_ADDR;
>> > - Fix for CS_ETM_TRACE_ON packet;
>> > - Fix for exception packet;
>> >
>> > Does this make sense for you?  I have concern that this patch is a
>> > fixing patch, so not sure after spliting patches will introduce
>> > trouble for applying them for other stable kernels ...
>>
>> Reverse the order:
>>
>> - Fix for CS_ETM_TRACE_ON packet;
>> - Fix for exception packet;
>> - Define CS_ETM_INVAL_ADDR;
>>
>> But you may not need to - see next comment.
>
> From the discussion context, I think here 'you may not need to' is
> referring to my concern for applying patches on stable kernel, so I
> should take this patch series as an enhancement and don't need to
> consider much for stable kernel.

Yes, that is what I meant.

>
> On the other hand, your suggestion is possible to mean 'not need
> to' split into small patches (though I guess this is misunderstanding
> for your meaning).
>
> Could you clarify which is your meaning?
>
>> >> > +
>> >> > +   /*
>> >> >      * The packet records the execution range with an exclusive end address
>> >> >      *
>> >> >      * A64 instructions are constant size, so the last executed
>> >> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> >> >     return packet->end_addr - A64_INSTR_SIZE;
>> >> >  }
>> >> >
>> >> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
>> >> > +{
>> >> > +   /*
>> >> > +    * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>> >> > +    * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> >> > +    */
>> >> > +   if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>> >> > +           return 0;
>> >>
>> >> Same comment as above.
>> >
>> > Will do this.
>> >
>> >> > +
>> >> > +   return packet->start_addr;
>> >> > +}
>> >> > +
>> >> >  static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
>> >> >  {
>> >> >     /*
>> >> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>> >> >
>> >> >     be       = &bs->entries[etmq->last_branch_pos];
>> >> >     be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>> >> > -   be->to   = etmq->packet->start_addr;
>> >> > +   be->to   = cs_etm__first_executed_instr(etmq->packet);
>> >> >     /* No support for mispredict */
>> >> >     be->flags.mispred = 0;
>> >> >     be->flags.predicted = 1;
>> >> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
>> >> >     sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>> >> >     sample.pid = etmq->pid;
>> >> >     sample.tid = etmq->tid;
>> >> > -   sample.addr = etmq->packet->start_addr;
>> >> > +   sample.addr = cs_etm__first_executed_instr(etmq->packet);
>> >> >     sample.id = etmq->etm->branches_id;
>> >> >     sample.stream_id = etmq->etm->branches_id;
>> >> >     sample.period = 1;
>> >> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> >> >             etmq->period_instructions = instrs_over;
>> >> >     }
>> >> >
>> >> > -   if (etm->sample_branches &&
>> >> > -       etmq->prev_packet &&
>> >> > -       etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> >> > -       etmq->prev_packet->last_instr_taken_branch) {
>> >> > -           ret = cs_etm__synth_branch_sample(etmq);
>> >> > -           if (ret)
>> >> > -                   return ret;
>> >> > +   if (etm->sample_branches && etmq->prev_packet) {
>> >> > +           bool generate_sample = false;
>> >> > +
>> >> > +           /* Generate sample for start tracing packet */
>> >> > +           if (etmq->prev_packet->sample_type == 0 ||
>> >>
>> >> What kind of packet is sample_type == 0 ?
>> >
>> > Just as explained above, sample_type == 0 is the packet which
>> > initialized in the function cs_etm__alloc_queue().
>> >
>> >> > +               etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>> >> > +                   generate_sample = true;
>> >> > +
>> >> > +           /* Generate sample for exception packet */
>> >> > +           if (etmq->prev_packet->exc == true)
>> >> > +                   generate_sample = true;
>> >>
>> >> Please don't do that.  Exception packets have a type of their own and can be
>> >> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
>> >> are.  Moreover exception packet containt an address that, if I'm reading the
>> >> documenation properly, can be used to keep track of instructions that were
>> >> executed between the last address of the previous range packet and the address
>> >> executed just before the exception occurred.  Mike and Rob will have to confirm
>> >> this as the decoder may be doing all that hard work for us.
>> >
>> > Sure, will wait for Rob and Mike to confirm for this.
>> >
>> > At my side, I dump the packet, the exception packet isn't passed to
>> > cs-etm.c layer, the decoder layer only sets the flag
>> > 'packet->exc = true' when exception packet is coming [1].
>>
>> That's because we didn't need the information.  Now that we do a
>> function that will insert a packet in the decoder packet queue and
>> deal with the new packet type in the main decoder loop [2].  At that
>> point your work may not be eligible for stable anymore and I think it
>> is fine.  Robert's work was an enhancement over mine and yours is an
>> enhancement over his.
>>
>> [2]. https://elixir.bootlin.com/linux/v4.17-rc7/source/tools/perf/util/cs-etm.c#L999
>
> Agree, will look into for exception packet and try to add new packet
> type for this.
>
> [...]
>
> Thanks,
> Leo Yan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Leo Yan @ 2018-05-30 14:49 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Jonathan Corbet, Robert Walker,
	Mike Leach, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, open list:DOCUMENTATION,
	Linux Kernel Mailing List, coresight
In-Reply-To: <CANLsYkzPShMj1cbXZPb7Lzio+gKcmvhD2+KTYr7vP9ec1d6bmg@mail.gmail.com>

On Wed, May 30, 2018 at 08:45:46AM -0600, Mathieu Poirier wrote:
> On 29 May 2018 at 18:28, Leo Yan <leo.yan@linaro.org> wrote:
> > Hi Mathieu,
> >
> > On Tue, May 29, 2018 at 10:04:49AM -0600, Mathieu Poirier wrote:
> >
> > [...]
> >
> >> > As now this patch is big with more complex logic, so I consider to
> >> > split it into small patches:
> >> >
> >> > - Define CS_ETM_INVAL_ADDR;
> >> > - Fix for CS_ETM_TRACE_ON packet;
> >> > - Fix for exception packet;
> >> >
> >> > Does this make sense for you?  I have concern that this patch is a
> >> > fixing patch, so not sure after spliting patches will introduce
> >> > trouble for applying them for other stable kernels ...
> >>
> >> Reverse the order:
> >>
> >> - Fix for CS_ETM_TRACE_ON packet;
> >> - Fix for exception packet;
> >> - Define CS_ETM_INVAL_ADDR;
> >>
> >> But you may not need to - see next comment.
> >
> > From the discussion context, I think here 'you may not need to' is
> > referring to my concern for applying patches on stable kernel, so I
> > should take this patch series as an enhancement and don't need to
> > consider much for stable kernel.
> 
> Yes, that is what I meant.

Thanks for confirmation, will send new patch series according to the
discussion.

[...]

Thanks,
Leo Yan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] PCI: Add pci=safemode option
From: Bjorn Helgaas @ 2018-05-30 14:56 UTC (permalink / raw)
  To: okaya
  Cc: Christoph Hellwig, Gabriele Paoloni, open list:DOCUMENTATION,
	linux-pci, Keith Busch, Kai-Heng Feng, Ingo Molnar,
	Christoffer Dall, Jonathan Corbet, timur, Rafael J. Wysocki,
	Dongdong Liu, David Rientjes, Thymo van Beers, Paul E. McKenney,
	Frederick Lawler, Konrad Rzeszutek Wilk, Marc Zyngier,
	linux-arm-msm, Frederic Weisbecker, Bjorn Helgaas,
	Thomas Gleixner, linux-arm-kernel, Oza Pawandeep,
	Greg Kroah-Hartman, open list
In-Reply-To: <6dfe2db8f974d94c9867f30ec83d9333@codeaurora.org>

On Wed, May 30, 2018 at 12:44:29AM -0700, okaya@codeaurora.org wrote:
> On 2018-05-30 00:37, Christoph Hellwig wrote:
> > On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
> > > Bjorn and I discussed the need for such a "safe" mode feature when you
> > > want to bring up PCI for a platform. You want to turn off everything
> > > as
> > > a starter and just stick to bare minimum.
> > 
> > Can we please make it a config option the instead of adding code
> > to every kernel?  Also maybe the bringup should be in the name
> > to make this more clear?
> 
> One other requirement was to have a runtime option rather than compile time
> option.
> 
> When someone reported a problem, we wanted to be able to say "use this
> option and see if system boots" without doing any bisects or recompilation.
> 
> This would be the first step in troubleshooting a system to see if
> fundamental features are working.
> 
> I don't mind changing the name
> Bjorn mentioned safe option. I made it safemode. I am looking at Bjorn for
> suggestions at this moment.

This *was* my idea, but I'm starting to think it was a bad idea.

I don't want people to use pci= parameters as the normal way to get a
system to boot.  That would be a huge support hassle (putting things
in release notes, diagnosing problems when people forget it, etc).

But the parameters *are* useful for debugging.  If we had a
"pci=safemode" and it avoided some problem, the next step would be to
narrow it down by using the more specific flags (pci=nomsi, pci=noari,
pci=no_ext_tags, etc).  So I think 95% of the value is in the specific
flags, and a "pci=safemode" might add a little bit of value but at the
cost of a small but nagging maintenance concern and code clutter.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v9 2/7] cpuset: Add new v2 cpuset.sched.domain_root flag
From: Waiman Long @ 2018-05-30 14:57 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi
In-Reply-To: <20180530141804.GG3320@localhost.localdomain>

On 05/30/2018 10:18 AM, Juri Lelli wrote:
> Hi,
>
> On 29/05/18 09:41, Waiman Long wrote:
>
> [...]
>
>> +  cpuset.sched.domain_root
>> +	A read-write single value file which exists on non-root
>> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +	either "0" (off) or "1" (on).  This flag is set by the parent
>> +	and is not delegatable.
>> +
>> +	If set, it indicates that the current cgroup is the root of a
>> +	new scheduling domain or partition that comprises itself and
>> +	all its descendants except those that are scheduling domain
>> +	roots themselves and their descendants.  The root cgroup is
>> +	always a scheduling domain root.
>> +
>> +	There are constraints on where this flag can be set.  It can
>> +	only be set in a cgroup if all the following conditions are true.
>> +
>> +	1) The "cpuset.cpus" is not empty and the list of CPUs are
>> +	   exclusive, i.e. they are not shared by any of its siblings.
>> +	2) The parent cgroup is also a scheduling domain root.
>> +	3) There is no child cgroups with cpuset enabled.  This is
>> +	   for eliminating corner cases that have to be handled if such
>> +	   a condition is allowed.
>> +
>> +	Setting this flag will take the CPUs away from the effective
>> +	CPUs of the parent cgroup.  Once it is set, this flag cannot
>> +	be cleared if there are any child cgroups with cpuset enabled.
>> +	Further changes made to "cpuset.cpus" is allowed as long as
>> +	the first condition above is still true.
> IIUC, with the configuration below
>
>  cpuset.cpus.effective:6-11
>  cgroup.controllers:cpuset
>  cpuset.mems.effective:0-1
>  cgroup.subtree_control:cpuset
>  g1/cpuset.cpus.effective:0-5
>  g1/cgroup.controllers:cpuset
>  g1/cpuset.sched.load_balance:1
>  g1/cpuset.mems.effective:0-1
>  g1/cpuset.cpus:0-5
>  g1/cpuset.sched.domain_root:1
>  user.slice/cpuset.cpus.effective:6-11
>  user.slice/cgroup.controllers:cpuset
>  user.slice/cpuset.sched.load_balance:1
>  user.slice/cpuset.mems.effective:0-1
>  user.slice/cpuset.cpus:6-11
>  user.slice/cpuset.sched.domain_root:0
>  init.scope/cpuset.cpus.effective:6-11
>  init.scope/cgroup.controllers:cpuset
>  init.scope/cpuset.sched.load_balance:1
>  init.scope/cpuset.mems.effective:0-1
>  init.scope/cpuset.cpus:6-11
>  init.scope/cpuset.sched.domain_root:0
>  system.slice/cpuset.cpus.effective:6-11
>  system.slice/cgroup.controllers:cpuset
>  system.slice/cpuset.sched.load_balance:1
>  system.slice/cpuset.mems.effective:0-1
>  system.slice/cpuset.cpus:6-11
>  system.slice/cpuset.sched.domain_root:0
>  machine.slice/cpuset.cpus.effective:6-11
>  machine.slice/cgroup.controllers:cpuset
>  machine.slice/cpuset.sched.load_balance:1
>  machine.slice/cpuset.mems.effective:0-1
>  machine.slice/cpuset.cpus:6-11
>  machine.slice/cpuset.sched.domain_root:0
>
> I should be able to
>
>  # echo 0-4 >g1/cpuset.cpus
>
> ?
>
> It doesn't let me.

It should allow that. I will fix this issue.

>
> I'm not sure we actually want to allow that, but that's what would I
> expect as per your text above.
>
> Thanks,
>
> - Juri
>
> BTW: thanks a lot for your prompt feedback and hope it's OK if I keep
> playing and asking questions. :)

Of course. I appreciate your help in looking for issue in the patch that
I might have overlooked.

Thanks,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Mike Leach @ 2018-05-30 15:04 UTC (permalink / raw)
  To: Robert Walker
  Cc: Mathieu Poirier, Leo Yan, Arnaldo Carvalho de Melo,
	Jonathan Corbet, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-doc, linux-kernel, coresight
In-Reply-To: <5efe804f-364b-6ae1-3fca-ec7f6d8a383d@arm.com>

Hi Leo,



On 30 May 2018 at 10:45, Robert Walker <robert.walker@arm.com> wrote:
>
>
> On 28/05/18 23:13, Mathieu Poirier wrote:
>>
>> Leo and/or Robert,
>>
>> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>>>
>>> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces") reworks the samples generation flow from CoreSight trace to
>>> match the correct format so Perf report tool can display the samples
>>> properly.
>>>
>>> But the change has side effect for branch packet handling, it only
>>> generate branch samples by checking previous packet flag
>>> 'last_instr_taken_branch' is true, this results in below three kinds
>>> packets are missed to generate branch samples:
>>>
>>> - The start tracing packet at the beginning of tracing data;
>>> - The exception handling packet;
>>> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>>>    for branch samples.  CS_ETM_TRACE_ON packet itself can give the info
>>>    that there have a discontinuity in the trace, on the other hand we
>>>    also miss to generate proper branch sample for packets before and
>>>    after CS_ETM_TRACE_ON packet.
>>>
>>> This patch is to add branch sample handling for up three kinds packets:
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>>>    zero and in this case it generates branch sample for the start tracing
>>>    packet; furthermore, we also need to handle the condition for
>>>    prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>>>    generate branch sample for exception handling packet;
>>>
>>> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>>>    branch sample in the function cs_etm__flush(), this can save complete
>>>    info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>>>    packet.  We also generate branch sample for the new CS_ETM_RANGE
>>>    packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>>>    first one purpose is to save the info for the new CS_ETM_RANGE packet,
>>>    the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>>>    have hint for a discontinuity in the trace.
>>>
>>>    For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>>>    'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>>>    the decoder layer as dummy value.  This patch is to convert these
>>>    values to zeros for more readable; this is accomplished by functions
>>>    cs_etm__last_executed_instr() and cs_etm__first_executed_instr().  The
>>>    later one is a new function introduced by this patch.
>>>
>>> Reviewed-by: Robert Walker <robert.walker@arm.com>
>>> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces")
>>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>>> ---
>>>   tools/perf/util/cs-etm.c | 93
>>> +++++++++++++++++++++++++++++++++++++-----------
>>>   1 file changed, 73 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 822ba91..8418173 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -495,6 +495,20 @@ static inline void
>>> cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>>>   static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet
>>> *packet)
>>>   {
>>>         /*
>>> +        * The packet is the start tracing packet if the end_addr is
>>> zero,
>>> +        * returns 0 for this case.
>>> +        */
>>> +       if (!packet->end_addr)
>>> +               return 0;
>>
>>
>> What is considered to be the "start tracing packet"?  Right now the only
>> two
>> kind of packets inserted in the decoder packet buffer queue are INST_RANGE
>> and
>> TRACE_ON.  How can we hit a condition where packet->end-addr == 0?
>>
>>
>>> +
>>> +       /*
>>> +        * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>>> +        * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> +        */
>>> +       if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>>> +               return 0;
>>
>>
>> As it is with the above, I find triggering on addresses to be brittle and
>> hard
>> to maintain on the long run.  Packets all have a sample_type field that
>> should
>> be used in cases like this one.  That way we know exactly the condition
>> that is
>> targeted.
>>
>> While working on this set, please spin-off another patch that defines
>> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
>> numeral is used.  That way we stop using the hard coded value.
>>
>>> +
>>> +       /*
>>>          * The packet records the execution range with an exclusive end
>>> address
>>>          *
>>>          * A64 instructions are constant size, so the last executed
>>> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct
>>> cs_etm_packet *packet)
>>>         return packet->end_addr - A64_INSTR_SIZE;
>>>   }
>>>   +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet
>>> *packet)
>>> +{
>>> +       /*
>>> +        * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>>> +        * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> +        */
>>> +       if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>>> +               return 0;
>>
>>
>> Same comment as above.
>>
>>> +
>>> +       return packet->start_addr;
>>> +}
>>> +
>>>   static inline u64 cs_etm__instr_count(const struct cs_etm_packet
>>> *packet)
>>>   {
>>>         /*
>>> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct
>>> cs_etm_queue *etmq)
>>>         be       = &bs->entries[etmq->last_branch_pos];
>>>         be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>>> -       be->to   = etmq->packet->start_addr;
>>> +       be->to   = cs_etm__first_executed_instr(etmq->packet);
>>>         /* No support for mispredict */
>>>         be->flags.mispred = 0;
>>>         be->flags.predicted = 1;
>>> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct
>>> cs_etm_queue *etmq)
>>>         sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>>>         sample.pid = etmq->pid;
>>>         sample.tid = etmq->tid;
>>> -       sample.addr = etmq->packet->start_addr;
>>> +       sample.addr = cs_etm__first_executed_instr(etmq->packet);
>>>         sample.id = etmq->etm->branches_id;
>>>         sample.stream_id = etmq->etm->branches_id;
>>>         sample.period = 1;
>>> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>>                 etmq->period_instructions = instrs_over;
>>>         }
>>>   -     if (etm->sample_branches &&
>>> -           etmq->prev_packet &&
>>> -           etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> -           etmq->prev_packet->last_instr_taken_branch) {
>>> -               ret = cs_etm__synth_branch_sample(etmq);
>>> -               if (ret)
>>> -                       return ret;
>>> +       if (etm->sample_branches && etmq->prev_packet) {
>>> +               bool generate_sample = false;
>>> +
>>> +               /* Generate sample for start tracing packet */
>>> +               if (etmq->prev_packet->sample_type == 0 ||
>>
>>
>> What kind of packet is sample_type == 0 ?
>>
>>> +                   etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>>> +                       generate_sample = true;
>>> +
>>> +               /* Generate sample for exception packet */
>>> +               if (etmq->prev_packet->exc == true)
>>> +                       generate_sample = true;
>>
>>
>> Please don't do that.  Exception packets have a type of their own and can
>> be
>> added to the decoder packet queue the same way INST_RANGE and TRACE_ON
>> packets
>> are.  Moreover exception packet containt an address that, if I'm reading
>> the
>> documenation properly, can be used to keep track of instructions that were
>> executed between the last address of the previous range packet and the
>> address
>> executed just before the exception occurred.  Mike and Rob will have to
>> confirm
>> this as the decoder may be doing all that hard work for us.
>>

clarification on the exception packets....

The Opencsd output exception packet gives you the exception number,
and optionally the preferred return address. If this address is
present does depend a lot on the underlying protocol - will normally
be there with ETMv4.
Exceptions are marked differently in the underlying protocol - the
OCSD packets abstract away these differences.

consider the code:

0x1000: <some instructions>
0x1100: BR 0x2000
....
0x2000: <some instructions>
0x2020  BZ r4

Without an exception this would result in the packets

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider an exception occurring before the BR 0x2000

this will result in:-
OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
OCSD_EXECEPTION(IRQ, ret-addr 0x1100)
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_EXCEPTION_RETURN()  // present if exception returns are
explicitly marked in underlying trace - may not always be depending on
circumstances.
OCSD_RANGE(0x1100,0x1104, Last=BR, taken) // continue on with short
range - just the branch
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider the exception occurring after the BR, but before any
other instructions are executed.

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_EXECEPTION(IRQ, ret-addr 0x2000)     // here the preferred return
address is actually the target of the branch.
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

So in general it is possible to arrive in the IRQ_START range with the
previous packet having been either a taken branch, a not taken branch,
or not a branch.
Care must be taken - whether AutoFDO or normal trace disassembly not
to assume that having the last range packet as a taken branch means
that the next range packet is the target, if there is an intervening
exception packet.

Regards

Mike

>>> +
>>> +               /* Generate sample for normal branch packet */
>>> +               if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> +                   etmq->prev_packet->last_instr_taken_branch)
>>> +                       generate_sample = true;
>>> +
>>> +               if (generate_sample) {
>>> +                       ret = cs_etm__synth_branch_sample(etmq);
>>> +                       if (ret)
>>> +                               return ret;
>>> +               }
>>>         }
>>>         if (etm->sample_branches || etm->synth_opts.last_branch) {
>>> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>>   static int cs_etm__flush(struct cs_etm_queue *etmq)
>>>   {
>>>         int err = 0;
>>> +       struct cs_etm_auxtrace *etm = etmq->etm;
>>>         struct cs_etm_packet *tmp;
>>>   -     if (etmq->etm->synth_opts.last_branch &&
>>> -           etmq->prev_packet &&
>>> -           etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>>> +       if (!etmq->prev_packet)
>>> +               return 0;
>>> +
>>> +       if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>>> +               return 0;
>>> +
>>> +       if (etmq->etm->synth_opts.last_branch) {
>>
>>
>> If you add:
>>
>>          if (!etmq->etm->synth_opts.last_branch)
>>                  return 0;
>>
>> You can avoid indenting the whole block.
>>
>>>                 /*
>>>                  * Generate a last branch event for the branches left in
>>> the
>>>                  * circular buffer at the end of the trace.
>>> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>>>                 err = cs_etm__synth_instruction_sample(
>>>                         etmq, addr,
>>>                         etmq->period_instructions);
>>> +               if (err)
>>> +                       return err;
>>>                 etmq->period_instructions = 0;
>>> +       }
>>>   -             /*
>>> -                * Swap PACKET with PREV_PACKET: PACKET becomes
>>> PREV_PACKET for
>>> -                * the next incoming packet.
>>> -                */
>>> -               tmp = etmq->packet;
>>> -               etmq->packet = etmq->prev_packet;
>>> -               etmq->prev_packet = tmp;
>>> +       if (etm->sample_branches) {
>>> +               err = cs_etm__synth_branch_sample(etmq);
>>> +               if (err)
>>> +                       return err;
>>>         }
>>>   -     return err;
>>> +       /*
>>> +        * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>>> +        * the next incoming packet.
>>> +        */
>>> +       tmp = etmq->packet;
>>> +       etmq->packet = etmq->prev_packet;
>>> +       etmq->prev_packet = tmp;
>>
>>
>> Robert, I remember noticing that when you first submitted the code but
>> forgot to
>> go back to it.  What is the point of swapping the packets?  I understand
>>
>> etmq->prev_packet = etmq->packet;
>>
>> But not
>>
>> etmq->packet = tmp;
>>
>> After all etmq->packet will be clobbered as soon as
>> cs_etm_decoder__get_packet()
>> is called, which is alwasy right after either cs_etm__sample() or
>> cs_etm__flush().
>>
>
> This is code I inherited from the original versions of these patches, but it
> works because:
> - etmq->packet and etmq->prev_packet are pointers to struct cs_etm_packet
> allocated by zalloc() in cs_etm__alloc_queue()
> - cs_etm_decoder__get_packet() takes a pointer to struct cs_etm_packet and
> copies the contents of the first packet from the queue into the passed
> location with:
>    *packet = decoder->packet_buffer[decoder->head]
>
> So the swap code is only swapping the pointers over, not the contents of the
> packets.
>
> Regards
>
> Rob
>
>
>
>> Thanks,
>> Mathieu
>>
>>
>>
>>> +       return 0;
>>>   }
>>>     static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>>> --
>>> 2.7.4
>>>
>



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] x86/xen: Combine PV features to be disabled in xen_nopv
From: kbuild test robot @ 2018-05-30 15:26 UTC (permalink / raw)
  To: Joao Martins
  Cc: kbuild-all, linux-kernel, Konrad Rzeszutek Wilk, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Juergen Gross, Boris Ostrovsky,
	xen-devel, Jonathan Corbet, linux-doc, Joao Martins
In-Reply-To: <20180528173211.16876-1-joao.m.martins@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 3596 bytes --]

Hi Konrad,

I love your patch! Yet something to improve:

[auto build test ERROR on xen-tip/linux-next]
[also build test ERROR on v4.17-rc7 next-20180529]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Joao-Martins/x86-xen-Combine-PV-features-to-be-disabled-in-xen_nopv/20180530-031040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git linux-next
config: x86_64-randconfig-s4-05301434 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/xen/smp.o: In function `xen_smp_intr_init':
>> arch/x86/xen/smp.c:76: undefined reference to `xen_nopv_ipi'
   arch/x86/xen/spinlock.o: In function `xen_init_lock_cpu':
>> arch/x86/xen/spinlock.c:82: undefined reference to `xen_nopv_spin'
   arch/x86/xen/spinlock.o: In function `xen_uninit_lock_cpu':
   arch/x86/xen/spinlock.c:110: undefined reference to `xen_nopv_spin'
   arch/x86/xen/spinlock.o: In function `xen_parse_nopvspin':
>> arch/x86/xen/spinlock.c:148: undefined reference to `xen_set_nopv'
   arch/x86/xen/spinlock.o: In function `xen_init_spinlocks':
   arch/x86/xen/spinlock.c:132: undefined reference to `xen_nopv_spin'

vim +76 arch/x86/xen/smp.c

    61	
    62	int xen_smp_intr_init(unsigned int cpu)
    63	{
    64		int rc;
    65		char *resched_name, *callfunc_name, *debug_name;
    66	
    67		debug_name = kasprintf(GFP_KERNEL, "debug%d", cpu);
    68		rc = bind_virq_to_irqhandler(VIRQ_DEBUG, cpu, xen_debug_interrupt,
    69					     IRQF_PERCPU | IRQF_NOBALANCING,
    70					     debug_name, NULL);
    71		if (rc < 0)
    72			goto fail;
    73		per_cpu(xen_debug_irq, cpu).irq = rc;
    74		per_cpu(xen_debug_irq, cpu).name = debug_name;
    75	
  > 76		if (xen_hvm_domain() && xen_nopv_ipi())
    77			return 0;
    78	
    79		resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
    80		rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
    81					    cpu,
    82					    xen_reschedule_interrupt,
    83					    IRQF_PERCPU|IRQF_NOBALANCING,
    84					    resched_name,
    85					    NULL);
    86		if (rc < 0)
    87			goto fail;
    88		per_cpu(xen_resched_irq, cpu).irq = rc;
    89		per_cpu(xen_resched_irq, cpu).name = resched_name;
    90	
    91		callfunc_name = kasprintf(GFP_KERNEL, "callfunc%d", cpu);
    92		rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_VECTOR,
    93					    cpu,
    94					    xen_call_function_interrupt,
    95					    IRQF_PERCPU|IRQF_NOBALANCING,
    96					    callfunc_name,
    97					    NULL);
    98		if (rc < 0)
    99			goto fail;
   100		per_cpu(xen_callfunc_irq, cpu).irq = rc;
   101		per_cpu(xen_callfunc_irq, cpu).name = callfunc_name;
   102	
   103		callfunc_name = kasprintf(GFP_KERNEL, "callfuncsingle%d", cpu);
   104		rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_SINGLE_VECTOR,
   105					    cpu,
   106					    xen_call_function_single_interrupt,
   107					    IRQF_PERCPU|IRQF_NOBALANCING,
   108					    callfunc_name,
   109					    NULL);
   110		if (rc < 0)
   111			goto fail;
   112		per_cpu(xen_callfuncsingle_irq, cpu).irq = rc;
   113		per_cpu(xen_callfuncsingle_irq, cpu).name = callfunc_name;
   114	
   115		return 0;
   116	
   117	 fail:
   118		xen_smp_intr_free(cpu);
   119		return rc;
   120	}
   121	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32361 bytes --]

^ permalink raw reply

* Re: [PATCH] PCI: Add pci=safemode option
From: Sinan Kaya @ 2018-05-30 15:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, Gabriele Paoloni, open list:DOCUMENTATION,
	linux-pci, Keith Busch, Kai-Heng Feng, Ingo Molnar,
	Christoffer Dall, Jonathan Corbet, timur, Rafael J. Wysocki,
	Dongdong Liu, David Rientjes, Thymo van Beers, Paul E. McKenney,
	Frederick Lawler, Konrad Rzeszutek Wilk, Marc Zyngier,
	linux-arm-msm, Frederic Weisbecker, Bjorn Helgaas,
	Thomas Gleixner, linux-arm-kernel, Oza Pawandeep,
	Greg Kroah-Hartman, open list
In-Reply-To: <20180530145650.GA39853@bhelgaas-glaptop.roam.corp.google.com>

On 5/30/2018 10:56 AM, Bjorn Helgaas wrote:
> This *was* my idea, but I'm starting to think it was a bad idea.
> 
> I don't want people to use pci= parameters as the normal way to get a
> system to boot.  That would be a huge support hassle (putting things
> in release notes, diagnosing problems when people forget it, etc).
> 
> But the parameters *are* useful for debugging.  If we had a
> "pci=safemode" and it avoided some problem, the next step would be to
> narrow it down by using the more specific flags (pci=nomsi, pci=noari,
> pci=no_ext_tags, etc).  So I think 95% of the value is in the specific
> flags, and a "pci=safemode" might add a little bit of value but at the
> cost of a small but nagging maintenance concern and code clutter.

OK. Let's try noXYZ feature. Can you enumerate the XYZ features that you
want to see?

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Leo Yan @ 2018-05-30 15:39 UTC (permalink / raw)
  To: Mike Leach
  Cc: Robert Walker, Mathieu Poirier, Arnaldo Carvalho de Melo,
	Jonathan Corbet, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-doc, linux-kernel, coresight
In-Reply-To: <CAJ9a7VhC7EHJ1DtTEH1uERGiBpCJgseboaH0cqgpsU3ZMcgjgQ@mail.gmail.com>

Hi Mike,

On Wed, May 30, 2018 at 04:04:34PM +0100, Mike Leach wrote:

[...]

> >>> +               /* Generate sample for exception packet */
> >>> +               if (etmq->prev_packet->exc == true)
> >>> +                       generate_sample = true;
> >>
> >>
> >> Please don't do that.  Exception packets have a type of their own and can
> >> be
> >> added to the decoder packet queue the same way INST_RANGE and TRACE_ON
> >> packets
> >> are.  Moreover exception packet containt an address that, if I'm reading
> >> the
> >> documenation properly, can be used to keep track of instructions that were
> >> executed between the last address of the previous range packet and the
> >> address
> >> executed just before the exception occurred.  Mike and Rob will have to
> >> confirm
> >> this as the decoder may be doing all that hard work for us.
> >>
> 
> clarification on the exception packets....
> 
> The Opencsd output exception packet gives you the exception number,
> and optionally the preferred return address. If this address is
> present does depend a lot on the underlying protocol - will normally
> be there with ETMv4.
> Exceptions are marked differently in the underlying protocol - the
> OCSD packets abstract away these differences.
> 
> consider the code:
> 
> 0x1000: <some instructions>
> 0x1100: BR 0x2000
> ....
> 0x2000: <some instructions>
> 0x2020  BZ r4
> 
> Without an exception this would result in the packets
> 
> OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
> range packets have start addr inclusive, end addr exclusive.
> OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> depends on condition>
> 
> Now consider an exception occurring before the BR 0x2000
> 
> this will result in:-
> OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
> OCSD_EXECEPTION(IRQ, ret-addr 0x1100)
> OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
> this is more likely to have multiple ranges / branches before any
> return, but simplified here.
> OCSD_EXCEPTION_RETURN()  // present if exception returns are
> explicitly marked in underlying trace - may not always be depending on
> circumstances.
> OCSD_RANGE(0x1100,0x1104, Last=BR, taken) // continue on with short
> range - just the branch
> OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> depends on condition>
> 
> Now consider the exception occurring after the BR, but before any
> other instructions are executed.
> 
> OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
> range packets have start addr inclusive, end addr exclusive.
> OCSD_EXECEPTION(IRQ, ret-addr 0x2000)     // here the preferred return
> address is actually the target of the branch.
> OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
> this is more likely to have multiple ranges / branches before any
> return, but simplified here.
> OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> depends on condition>
> 
> So in general it is possible to arrive in the IRQ_START range with the
> previous packet having been either a taken branch, a not taken branch,
> or not a branch.
> Care must be taken - whether AutoFDO or normal trace disassembly not
> to assume that having the last range packet as a taken branch means
> that the next range packet is the target, if there is an intervening
> exception packet.

Thanks a lot for detailed explaination.

IIUC, AutoFDO will not have such issue due every range packet will be
handled for it.  On the other hand, as you remind, the branch samples
(and its consumer trace disassembler) is very dependent on the flag
'last_instr_taken_branch'.

According to your explaination, I think we consider the branch is
taken for below situations:

- The new coming packet is exception packet (both for exception entry
  and exit packets);
- The previous packet is expcetion packet;
- The previous packet is normal range packet with
  'last_instr_taken_branch' = true;

So I'd like to use below function to demonstrate my understanding for
exception packets handling.  I also will send out one new patch for
support exception packet for reviewing.

If you have concern or I miss anything, please let me know.

static bool cs_etm__is_taken_branch(struct cs_etm_packet *prev_packet,
                                    struct cs_etm_packet *packet,)
{
        /* The branch is taken for normal range packet with taken branch flag */
        if (prev_packet->sample_type == CS_ETM_RANGE &&
            prev_packet->last_instr_taken_branch)
                return true;

        /* The branch is taken if previous packet is exception packet */
        if (prev_packet->sample_type == CS_ETM_EXCEPTION ||
            prev_packet->sample_type == CS_ETM_EXCEPTION_RET)
                return true;

        /* The branch is taken for an intervening exception packet */
        if (packet->sample_type == CS_ETM_EXCEPTION ||
            packet->sample_type == CS_ETM_EXCEPTION_RET)
                return true;

        return false;
}

[...]

Thanks,
Leo Yan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Leo Yan @ 2018-05-30 15:49 UTC (permalink / raw)
  To: Mike Leach
  Cc: Robert Walker, Mathieu Poirier, Arnaldo Carvalho de Melo,
	Jonathan Corbet, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-doc, linux-kernel, coresight
In-Reply-To: <20180530153900.GA10925@leoy-ThinkPad-X240s>

On Wed, May 30, 2018 at 11:39:00PM +0800, Leo Yan wrote:
> Hi Mike,
> 
> On Wed, May 30, 2018 at 04:04:34PM +0100, Mike Leach wrote:
> 
> [...]
> 
> > >>> +               /* Generate sample for exception packet */
> > >>> +               if (etmq->prev_packet->exc == true)
> > >>> +                       generate_sample = true;
> > >>
> > >>
> > >> Please don't do that.  Exception packets have a type of their own and can
> > >> be
> > >> added to the decoder packet queue the same way INST_RANGE and TRACE_ON
> > >> packets
> > >> are.  Moreover exception packet containt an address that, if I'm reading
> > >> the
> > >> documenation properly, can be used to keep track of instructions that were
> > >> executed between the last address of the previous range packet and the
> > >> address
> > >> executed just before the exception occurred.  Mike and Rob will have to
> > >> confirm
> > >> this as the decoder may be doing all that hard work for us.
> > >>
> > 
> > clarification on the exception packets....
> > 
> > The Opencsd output exception packet gives you the exception number,
> > and optionally the preferred return address. If this address is
> > present does depend a lot on the underlying protocol - will normally
> > be there with ETMv4.
> > Exceptions are marked differently in the underlying protocol - the
> > OCSD packets abstract away these differences.
> > 
> > consider the code:
> > 
> > 0x1000: <some instructions>
> > 0x1100: BR 0x2000
> > ....
> > 0x2000: <some instructions>
> > 0x2020  BZ r4
> > 
> > Without an exception this would result in the packets
> > 
> > OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
> > range packets have start addr inclusive, end addr exclusive.
> > OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> > depends on condition>
> > 
> > Now consider an exception occurring before the BR 0x2000
> > 
> > this will result in:-
> > OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
> > OCSD_EXECEPTION(IRQ, ret-addr 0x1100)
> > OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
> > this is more likely to have multiple ranges / branches before any
> > return, but simplified here.
> > OCSD_EXCEPTION_RETURN()  // present if exception returns are
> > explicitly marked in underlying trace - may not always be depending on
> > circumstances.
> > OCSD_RANGE(0x1100,0x1104, Last=BR, taken) // continue on with short
> > range - just the branch
> > OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> > depends on condition>
> > 
> > Now consider the exception occurring after the BR, but before any
> > other instructions are executed.
> > 
> > OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken)   // recall that
> > range packets have start addr inclusive, end addr exclusive.
> > OCSD_EXECEPTION(IRQ, ret-addr 0x2000)     // here the preferred return
> > address is actually the target of the branch.
> > OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken)    //
> > this is more likely to have multiple ranges / branches before any
> > return, but simplified here.
> > OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
> > depends on condition>
> > 
> > So in general it is possible to arrive in the IRQ_START range with the
> > previous packet having been either a taken branch, a not taken branch,
> > or not a branch.
> > Care must be taken - whether AutoFDO or normal trace disassembly not
> > to assume that having the last range packet as a taken branch means
> > that the next range packet is the target, if there is an intervening
> > exception packet.
> 
> Thanks a lot for detailed explaination.
> 
> IIUC, AutoFDO will not have such issue due every range packet will be
> handled for it.  On the other hand, as you remind, the branch samples
> (and its consumer trace disassembler) is very dependent on the flag
> 'last_instr_taken_branch'.
> 
> According to your explaination, I think we consider the branch is
> taken for below situations:
> 
> - The new coming packet is exception packet (both for exception entry
>   and exit packets);
> - The previous packet is expcetion packet;
> - The previous packet is normal range packet with
>   'last_instr_taken_branch' = true;
> 
> So I'd like to use below function to demonstrate my understanding for
> exception packets handling.  I also will send out one new patch for
> support exception packet for reviewing.
> 
> If you have concern or I miss anything, please let me know.
> 
> static bool cs_etm__is_taken_branch(struct cs_etm_packet *prev_packet,
>                                     struct cs_etm_packet *packet,)
> {
>         /* The branch is taken for normal range packet with taken branch flag */
>         if (prev_packet->sample_type == CS_ETM_RANGE &&
>             prev_packet->last_instr_taken_branch)
>                 return true;
> 
>         /* The branch is taken if previous packet is exception packet */
>         if (prev_packet->sample_type == CS_ETM_EXCEPTION ||
>             prev_packet->sample_type == CS_ETM_EXCEPTION_RET)
>                 return true;
> 
>         /* The branch is taken for an intervening exception packet */
>         if (packet->sample_type == CS_ETM_EXCEPTION ||
>             packet->sample_type == CS_ETM_EXCEPTION_RET)
>                 return true;
> 
>         return false;
> }

Just clarify, I missed to mention I introduce two extra sample types:
CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET, one is for exception
entry packet and another is for exception exit packet.   If this is
hard for understanding, you could hold on for reveiwing new patch.

Thanks,
Leo Yan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/3] PCI: Allow specifying devices using a base bus and path of devfns
From: Alex Williamson @ 2018-05-30 16:23 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Christoffer Dall, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt,
	Christian König
In-Reply-To: <20180524214816.14485-3-logang@deltatee.com>

On Thu, 24 May 2018 15:48:15 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> When specifying PCI devices on the kernel command line using a
> BDF, the bus numbers can change when adding or replacing a device,
> changing motherboard firmware, or applying kernel parameters like
> pci=assign-buses. When this happens, it is usually undesirable to
> apply whatever command line tweak to the wrong device.
> 
> Therefore, it is useful to be able to specify devices with a base
> bus number and the path of devfns needed to get to it. (Similar to
> the "device scope" structure in the Intel VT-d spec, Section 8.3.1.)
> 
> Thus, we add an option to specify devices in the following format:
> 
> path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
> 
> The path can be any segment within the PCI hierarchy of any length and
> determined through the use of 'lspci -t'. When specified this way, it is
> less likely that a renumbered bus will result in a valid device specification
> and the tweak won't be applied to the wrong device.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Stephen Bates <sbates@raithlin.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  12 ++-
>  drivers/pci/pci.c                               | 106 +++++++++++++++++++++++-
>  2 files changed, 112 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 894aa516ceab..519ab95bb418 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2986,9 +2986,10 @@
>  
>  				Some options herein operate on a specific device
>  				or a set of devices (<pci_dev>). These are
> -				specified in one of two formats:
> +				specified in one of three formats:
>  
>  				[<domain>:]<bus>:<slot>.<func>
> +				path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
>  				pci:<vendor>:<device>[:<subvendor>:<subdevice>]
>  
>  				Note: the first format specifies a PCI
> @@ -2996,9 +2997,12 @@
>  				if new hardware is inserted, if motherboard
>  				firmware changes, or due to changes caused
>  				by other kernel parameters. The second format
> -				selects devices using IDs from the
> -				configuration space which may match multiple
> -				devices in the system.
> +				specifies a path from a device through
> +				a path of multiple slot/function addresses
> +				(this is more robust against renumbering
> +				issues). The third format selects devices using
> +				IDs from the configuration space which may match
> +				multiple devices in the system.
>  
>  		earlydump	[X86] dump PCI config space before the kernel
>  			        changes anything
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 85fec5e2640b..53ea0d7b02ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -184,22 +184,116 @@ EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
>  #endif
>  
>  /**
> + * pci_dev_str_match_path - test if a path string matches a device
> + * @dev:    the PCI device to test
> + * @p:      string to match the device against
> + * @endptr: pointer to the string after the match
> + *
> + * Test if a string (typically from a kernel parameter) formated as a
> + * path of slot/function addresses matches a PCI device. The string must
> + * be of the form:
> + *
> + *   [<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
> + *
> + * A path for a device can be obtained using 'lspci -t'. Using a path
> + * is more robust against renumbering of devices than using only
> + * a single bus, slot and function address.
> + *
> + * Returns 1 if the string matches the device, 0 if it does not and
> + * a negative error code if it fails to parse the string.
> + */
> +static int pci_dev_str_match_path(struct pci_dev *dev, const char *p,
> +				  const char **endptr)
> +{
> +	int ret;
> +	int seg, bus, slot, func, count;
> +	u8 *devfn_path;
> +	int num_devfn = 0;
> +	struct pci_dev *tmp;
> +
> +	ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,
> +		     &func, &count);
> +	if (ret != 4) {
> +		seg = 0;
> +		ret = sscanf(p, "%x:%x.%x%n", &bus, &slot,
> +			     &func, &count);
> +		if (ret != 3)
> +			return -EINVAL;
> +	}
> +
> +	p += count;
> +
> +	devfn_path = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	devfn_path[num_devfn++] = PCI_DEVFN(slot, func);
> +
> +	while (*p && *p != ',' && *p != ';') {
> +		ret = sscanf(p, "/%x.%x%n", &slot, &func, &count);
> +		if (ret != 2) {
> +			ret = -EINVAL;
> +			goto free_and_exit;
> +		}
> +
> +		p += count;
> +		devfn_path[num_devfn++] = PCI_DEVFN(slot, func);
> +		if (num_devfn >= PAGE_SIZE) {
> +			ret = -EINVAL;
> +			goto free_and_exit;
> +		}
> +	}
> +
> +	*endptr = p;
> +	ret = 0;
> +
> +	if (seg != pci_domain_nr(dev->bus))
> +		goto free_and_exit;
> +
> +	pci_dev_get(dev);
> +	while (num_devfn > 0 && dev) {
> +		num_devfn--;
> +
> +		if (devfn_path[num_devfn] != dev->devfn)
> +			goto put_and_exit;
> +
> +		if (num_devfn == 0 && bus == dev->bus->number) {
> +			ret = 1;
> +			goto put_and_exit;
> +		}
> +
> +		tmp = pci_dev_get(pci_upstream_bridge(dev));
> +		pci_dev_put(dev);
> +		dev = tmp;
> +	}
> +
> +put_and_exit:
> +	pci_dev_put(dev);
> +free_and_exit:
> +	kfree(devfn_path);
> +	return ret;
> +}

I like the idea, but can't we improve the implementation?  It seems
that we shouldn't need to allocate more than a working copy of the
original path string.  We can use strrchr() to find the last path
divider ('/'), match the slot.fn after that to the current devfn, set
that path divider to null, step to the next upstream device and
repeat.  Also, since we're working from a downstream device up, I
suspect we don't need to get and put references at each step, the
downstream device probably already holds a reference to the upstream
device for each step along the way.  It would be interesting to adapt
this to allow specifying a driver_override for a specific device as
well, I can think of a bunch of vfio users that would prefer that to
initrd scripts, but I know how much Bjorn loves more commandline
bloat ;)  Thanks,

Alex

> +
> +/**
>   * pci_dev_str_match - test if a string matches a device
>   * @dev:    the PCI device to test
>   * @p:      string to match the device against
>   * @endptr: pointer to the string after the match
>   *
>   * Test if a string (typically from a kernel parameter) matches a
> - * specified. The string may be of one of two forms formats:
> + * specified. The string may be of one of three formats:
>   *
>   *   [<domain>:]<bus>:<slot>.<func>
> + *   path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
>   *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
>   *
>   * The first format specifies a PCI bus/slot/function address which
>   * may change if new hardware is inserted, if motherboard firmware changes,
>   * or due to changes caused in kernel parameters.
>   *
> - * The second format matches devices using IDs in the configuration
> + * The second format specifies a PCI bus/slot/function root address and
> + * a path of slot/function addresses to the specific device from the root.
> + * The path for a device can be determined through the use of 'lspci -t'.
> + * This format is more robust against renumbering issues than the first format.
> +
> + * The third format matches devices using IDs in the configuration
>   * space which may match multiple devices in the system. A value of 0
>   * for any field will match all devices.
>   *
> @@ -236,7 +330,15 @@ static int pci_dev_str_match(struct pci_dev *dev, const char *p,
>  		    (!subsystem_device ||
>  			    subsystem_device == dev->subsystem_device))
>  			goto found;
> +	} else if (strncmp(p, "path:", 5) == 0) {
> +		/* PCI Root Bus and a path of Slot,Function IDs */
> +		p += 5;
>  
> +		ret = pci_dev_str_match_path(dev, p, &p);
> +		if (ret < 0)
> +			return ret;
> +		else if (ret)
> +			goto found;
>  	} else {
>  		/* PCI Bus,Slot,Function ids are specified */
>  		ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/3] PCI: Allow specifying devices using a base bus and path of devfns
From: Logan Gunthorpe @ 2018-05-30 17:36 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Christoffer Dall, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt,
	Christian König
In-Reply-To: <20180530102329.7423dc57@w520.home>



On 30/05/18 10:23 AM, Alex Williamson wrote:
> I like the idea, but can't we improve the implementation?  It seems
> that we shouldn't need to allocate more than a working copy of the
> original path string.  We can use strrchr() to find the last path
> divider ('/'), match the slot.fn after that to the current devfn, set
> that path divider to null, step to the next upstream device and
> repeat.  

Ok, I'll give it a shot. I thought this would be a bit more tricky, but
perhaps not.

> Also, since we're working from a downstream device up, I
> suspect we don't need to get and put references at each step, the
> downstream device probably already holds a reference to the upstream
> device for each step along the way.  

That makes sense to me. I think it's something I added in similar p2pdma
code from somebody's review. But sounds like I can probably strip it out
there too and just put a comment noting this.

Thanks,

Logan
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V3 2/6] hwmon: Add support for RPi voltage sensor
From: Eric Anholt @ 2018-05-30 19:21 UTC (permalink / raw)
  To: Guenter Roeck, Stefan Wahren
  Cc: Jean Delvare, Jonathan Corbet, Florian Fainelli, Ray Jui,
	Scott Branden, Phil Elwell, bcm-kernel-feedback-list,
	linux-arm-kernel, linux-rpi-kernel, linux-hwmon, linux-doc,
	Noralf Trønnes
In-Reply-To: <20180525193717.GA24967@roeck-us.net>

[-- Attachment #1: Type: text/plain, Size: 757 bytes --]

Guenter Roeck <linux@roeck-us.net> writes:

> On Fri, May 25, 2018 at 09:24:35PM +0200, Stefan Wahren wrote:
>> Currently there is no easy way to detect undervoltage conditions on a
>> remote Raspberry Pi. This hwmon driver retrieves the state of the
>> undervoltage sensor via mailbox interface. The handling based on
>> Noralf's modifications to the downstream firmware driver. In case of
>> an undervoltage condition only an entry is written to the kernel log.
>> 
>> CC: "Noralf Trønnes" <noralf@tronnes.org>
>> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
>
> Acked-by: Guenter Roeck <linux@roeck-us.net>
>
> ... assuming this will go through some arm tree.

That's fine with me, I can pull this whole set for 4.19.  Thanks!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Zefan Li @ 2018-05-31  1:25 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi
In-Reply-To: <1527687991-1431-1-git-send-email-longman@redhat.com>

Hi Waiman,

On 2018/5/30 21:46, Waiman Long wrote:
> It was found that the cpuset.cpus could contain CPUs that are not listed
> in their parent's cpu list as shown by the command sequence below:
> 
>   # echo "+cpuset" >cgroup.subtree_control
>   # mkdir g1
>   # echo 0-5 >g1/cpuset.cpus
>   # mkdir g1/g11
>   # echo "+cpuset" > g1/cgroup.subtree_control
>   # echo 6-11 >g1/g11/cpuset.cpus
>   # grep -R . g1 | grep "\.cpus"
>   g1/cpuset.cpus:0-5
>   g1/cpuset.cpus.effective:0-5
>   g1/g11/cpuset.cpus:6-11
>   g1/g11/cpuset.cpus.effective:0-5
> 
> As the intersection of g11's cpus and that of g1 is empty, the effective
> cpus of g11 is just that of g1. The check in update_cpumask() is now
> corrected to make sure that cpus in a child cpus must be a subset of
> its parent's cpus. The error "write error: Invalid argument" will now
> be reported in the above case.
> 

We made the distinction between user-configured CPUs and effective CPUs
in commit 7e88291beefbb758, so actually it's not a bug.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 1/7] powerpc: Add TIDR CPU feature for POWER9
From: Andrew Donnellan @ 2018-05-31  4:19 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-2-alastair@au1.ibm.com>

On 11/05/18 16:12, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> This patch adds a CPU feature bit to show whether the CPU has
> the TIDR register available, enabling as_notify/wait in userspace.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation
From: Andrew Donnellan @ 2018-05-31  4:20 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-3-alastair@au1.ibm.com>

On 11/05/18 16:12, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> Switch the use of TIDR on it's CPU feature, rather than assuming it
> is available based on architecture.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 3/7] powerpc: use task_pid_nr() for TID allocation
From: Andrew Donnellan @ 2018-05-31  4:29 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-4-alastair@au1.ibm.com>

On 11/05/18 16:12, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The current implementation of TID allocation, using a global IDR, may
> result in an errant process starving the system of available TIDs.
> Instead, use task_pid_nr(), as mentioned by the original author. The
> scenario described which prevented it's use is not applicable, as
> set_thread_tidr can only be called after the task struct has been
> populated.
> 
> In the unlikely event that 2 threads share the TID and are waiting,
> all potential outcomes have been determined safe.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Thanks for the clarifying comment. The diff is painful to read but I 
think it makes sense :)

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
From: Andrew Donnellan @ 2018-05-31  4:32 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-5-alastair@au1.ibm.com>

On 11/05/18 16:13, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The function removes the process element from NPU cache.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>


-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 5/7] ocxl: Expose the thread_id needed for wait on POWER9
From: Andrew Donnellan @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-6-alastair@au1.ibm.com>

On 11/05/18 16:13, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> In order to successfully issue as_notify, an AFU needs to know the TID
> to notify, which in turn means that this information should be
> available in userspace so it can be communicated to the AFU.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Comments below.

> +#ifdef CONFIG_PPC64
> +static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
> +		struct ocxl_ioctl_p9_wait __user *uarg)
> +{
> +	struct ocxl_ioctl_p9_wait arg;
> +
> +	memset(&arg, 0, sizeof(arg));
> +
> +	if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
> +		enum ocxl_context_status status;
> +
> +		// Locks both status & tidr
> +		mutex_lock(&ctx->status_mutex);
> +		if (!ctx->tidr) {
> +			if (set_thread_tidr(current))
> +				return -ENOENT;
> +
> +			ctx->tidr = current->thread.tidr;
> +		}
> +
> +		status = ctx->status;
> +		mutex_unlock(&ctx->status_mutex);
> +
> +		if (status == ATTACHED) {
> +			int rc;
> +			struct link *link = ctx->afu->fn->link;
> +
> +			rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
> +			if (rc)
> +				return rc;
> +		}
> +
> +		arg.thread_id = ctx->tidr;
> +	} else
> +		return -ENOENT;

I didn't pick this up before - please please please use braces on both 
sides of the if here.

> +
> +	if (copy_to_user(uarg, &arg, sizeof(arg)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +#endif

> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 51ccf76db293..9ff6ddc28e22 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data);
>   
> +/**
> + * Update values within a Process Element
> + *
> + * link_handle: the link handle associated with the process element
> + * pasid: the PASID for the AFU context
> + * tid: the new thread id for the process element
> + */
> +extern int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);

My earlier comment about __u16 vs u16 applies for this declaration (and 
the body declaration) as well

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 6/7] ocxl: Add an IOCTL so userspace knows what OCXL features are available
From: Andrew Donnellan @ 2018-05-31  4:46 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-7-alastair@au1.ibm.com>

On 11/05/18 16:13, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> In order for a userspace AFU driver to call the POWER9 specific
> OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
> make that call.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 7/7] ocxl: Document new OCXL IOCTLs
From: Andrew Donnellan @ 2018-05-31  4:48 UTC (permalink / raw)
  To: Alastair D'Silva, linuxppc-dev
  Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
	felix, pombredanne, sukadev, npiggin, gregkh, arnd, fbarrat,
	corbet, Alastair D'Silva
In-Reply-To: <20180511061303.10728-8-alastair@au1.ibm.com>

On 11/05/18 16:13, Alastair D'Silva wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>


-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v1 0/2] support to set VSESR_EL2 by user space
From: Dongjiu Geng @ 2018-05-31 13:08 UTC (permalink / raw)
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi

This series patch is separated from https://www.spinics.net/lists/kvm/msg168917.html

1. Detect whether KVM can set set guest SError syndrome
2. Support to Set VSESR_EL2 and inject SError by user space.
3. Support live migration to keep SError pending state and VSESR_EL2 value

The user space patch is here: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html

Dongjiu Geng (2):
  arm64: KVM: export the capability to set guest SError syndrome
  arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

 Documentation/virtual/kvm/api.txt    | 42 +++++++++++++++++++++++++++++++++---
 arch/arm/include/asm/kvm_host.h      |  6 ++++++
 arch/arm/kvm/guest.c                 | 12 +++++++++++
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/include/asm/kvm_host.h    |  7 ++++++
 arch/arm64/include/uapi/asm/kvm.h    | 13 +++++++++++
 arch/arm64/kvm/guest.c               | 36 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        |  7 +++++-
 arch/arm64/kvm/reset.c               |  4 ++++
 include/uapi/linux/kvm.h             |  1 +
 virt/kvm/arm/arm.c                   | 21 ++++++++++++++++++
 11 files changed, 150 insertions(+), 4 deletions(-)

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v1 1/2] arm64: KVM: export the capability to set guest SError syndrome
From: Dongjiu Geng @ 2018-05-31 13:08 UTC (permalink / raw)
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1527772139-19665-1-git-send-email-gengdongjiu@huawei.com>

For the arm64 RAS Extension, user space can inject a virtual-SError
with specified ESR. So user space needs to know whether KVM support
to inject such SError, this interface adds this query for this capability.

KVM will check whether system support RAS Extension, if supported, KVM
returns true to user space, otherwise returns false.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
 Documentation/virtual/kvm/api.txt | 11 +++++++++++
 arch/arm64/kvm/reset.c            |  3 +++
 include/uapi/linux/kvm.h          |  1 +
 3 files changed, 15 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 758bf40..fdac969 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4603,3 +4603,14 @@ Architectures: s390
 This capability indicates that kvm will implement the interfaces to handle
 reset, migration and nested KVM for branch prediction blocking. The stfle
 facility 82 should not be provided to the guest without this capability.
+
+8.14 KVM_CAP_ARM_SET_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify the syndrome value reported
+to the guest OS when guest takes a virtual SError interrupt exception.
+If KVM has this capability, userspace can only specify the ISS field for the ESR
+syndrome, it can not specify the EC field which is not under control by KVM.
+If this virtual SError is taken to EL1 using AArch64, this value will be reported
+in ISS filed of ESR_EL1.
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..38c8a64 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PMU_V3:
 		r = kvm_arm_support_pmu_v3();
 		break;
+	case KVM_CAP_ARM_INJECT_SERROR_ESR:
+		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b02c41e..e88f976 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -948,6 +948,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_BPB 152
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
+#define KVM_CAP_ARM_INJECT_SERROR_ESR 155
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v1 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS
From: Dongjiu Geng @ 2018-05-31 13:08 UTC (permalink / raw)
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1527772139-19665-1-git-send-email-gengdongjiu@huawei.com>

For the migrating VMs, user space may need to know the exception
state. For example, in the machine A, KVM make an SError pending,
when migrate to B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError.
Together with appropriate user space changes, user space can get/set
the SError exception state to do migrate/snapshot/suspend.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
--
this series patch is separated from https://www.spinics.net/lists/kvm/msg168917.html
change since V12:
1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) in kvm_arm_vcpu_get_events()

Change since V11:
Address James's comments, thanks James
1. Align the struct of kvm_vcpu_events to 64 bytes
2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
3. Change variables 'injected' name to 'serror_pending' in the kvm_arm_vcpu_set_events()
4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in kvm_arch_vcpu_ioctl()

Change since V10:
Address James's comments, thanks James
1. Merge the helper function with the user.
2. Move the ISS_MASK into pend_guest_serror() to clear top bits
3. Make kvm_vcpu_events struct align to 4 bytes
4. Add something check in the kvm_arm_vcpu_set_events()
5. Check kvm_arm_vcpu_get/set_events()'s return value.
6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space doesn't
   contain kernel stack.
---
 Documentation/virtual/kvm/api.txt    | 31 ++++++++++++++++++++++++++++---
 arch/arm/include/asm/kvm_host.h      |  6 ++++++
 arch/arm/kvm/guest.c                 | 12 ++++++++++++
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/include/asm/kvm_host.h    |  7 +++++++
 arch/arm64/include/uapi/asm/kvm.h    | 13 +++++++++++++
 arch/arm64/kvm/guest.c               | 36 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        |  7 ++++++-
 arch/arm64/kvm/reset.c               |  1 +
 virt/kvm/arm/arm.c                   | 21 +++++++++++++++++++++
 10 files changed, 135 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index fdac969..8896737 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -835,11 +835,13 @@ struct kvm_clock_data {
 
 Capability: KVM_CAP_VCPU_EVENTS
 Extended by: KVM_CAP_INTR_SHADOW
-Architectures: x86
+Architectures: x86, arm, arm64
 Type: vm ioctl
 Parameters: struct kvm_vcpu_event (out)
 Returns: 0 on success, -1 on error
 
+X86:
+
 Gets currently pending exceptions, interrupts, and NMIs as well as related
 states of the vcpu.
 
@@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
 - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
   smi contains a valid state.
 
+ARM, ARM64:
+
+Gets currently pending SError exceptions as well as related states of the vcpu.
+
+struct kvm_vcpu_events {
+	struct {
+		__u8 serror_pending;
+		__u8 serror_has_esr;
+		/* Align it to 8 bytes */
+		__u8 pad[6];
+		__u64 serror_esr;
+	} exception;
+	__u32 reserved[12];
+};
+
 4.32 KVM_SET_VCPU_EVENTS
 
-Capability: KVM_CAP_VCPU_EVENTS
+Capebility: KVM_CAP_VCPU_EVENTS
 Extended by: KVM_CAP_INTR_SHADOW
-Architectures: x86
+Architectures: x86, arm, arm64
 Type: vm ioctl
 Parameters: struct kvm_vcpu_event (in)
 Returns: 0 on success, -1 on error
 
+X86:
+
 Set pending exceptions, interrupts, and NMIs as well as related states of the
 vcpu.
 
@@ -910,6 +929,12 @@ shall be written into the VCPU.
 
 KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
 
+ARM, ARM64:
+
+Set pending SError exceptions as well as related states of the vcpu.
+
+See KVM_GET_VCPU_EVENTS for the data structure.
+
 
 4.33 KVM_GET_DEBUGREGS
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c7c28c8..39f9901 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -213,6 +213,12 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
+
 unsigned long kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index a18f33e..c685f0e 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -261,6 +261,18 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	return -EINVAL;
+}
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 1dab3a9..18f61ff 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -81,6 +81,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
 	return (unsigned long *)&vcpu->arch.hcr_el2;
 }
 
+static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.vsesr_el2;
+}
+
 static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
 {
 	vcpu->arch.vsesr_el2 = vsesr;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 469de8a..357304a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -335,6 +335,11 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
@@ -363,6 +368,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
+
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
 void __kvm_set_tpidr_el2(u64 tpidr_el2);
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 04b3256..df4faee 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -39,6 +39,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_READONLY_MEM
+#define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
@@ -153,6 +154,18 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/* for KVM_GET/SET_VCPU_EVENTS */
+struct kvm_vcpu_events {
+	struct {
+		__u8 serror_pending;
+		__u8 serror_has_esr;
+		/* Align it to 8 bytes */
+		__u8 pad[6];
+		__u64 serror_esr;
+	} exception;
+	__u32 reserved[12];
+};
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 56a0260..71d3841 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -289,6 +289,42 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	events->exception.serror_pending = !!(vcpu->arch.hcr_el2 & HCR_VSE);
+	events->exception.serror_has_esr =
+			cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+					(!!vcpu_get_vsesr(vcpu));
+
+	if (events->exception.serror_pending &&
+		events->exception.serror_has_esr)
+		events->exception.serror_esr = vcpu_get_vsesr(vcpu);
+	else
+		events->exception.serror_esr = 0;
+
+	return 0;
+}
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	bool serror_pending = events->exception.serror_pending;
+	bool has_esr = events->exception.serror_has_esr;
+
+	if (serror_pending && has_esr) {
+		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
+			return -EINVAL;
+
+		kvm_set_sei_esr(vcpu, events->exception.serror_esr);
+
+	} else if (serror_pending) {
+		kvm_inject_vabt(vcpu);
+	}
+
+	return 0;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index d8e7165..9e0ca56 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -166,7 +166,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 
 static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
 {
-	vcpu_set_vsesr(vcpu, esr);
+	vcpu_set_vsesr(vcpu, esr & ESR_ELx_ISS_MASK);
 	*vcpu_hcr(vcpu) |= HCR_VSE;
 }
 
@@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
 	pend_guest_serror(vcpu, ESR_ELx_ISV);
 }
+
+void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome)
+{
+	pend_guest_serror(vcpu, syndrome);
+}
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 38c8a64..20e919a 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -82,6 +82,7 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
+	case KVM_CAP_VCPU_EVENTS:
 		r = 1;
 		break;
 	default:
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a4c1b76..8b43968 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1107,6 +1107,27 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = kvm_arm_vcpu_has_attr(vcpu, &attr);
 		break;
 	}
+	case KVM_GET_VCPU_EVENTS: {
+		struct kvm_vcpu_events events;
+
+		memset(&events, 0, sizeof(events));
+		if (kvm_arm_vcpu_get_events(vcpu, &events))
+			return -EINVAL;
+
+		if (copy_to_user(argp, &events, sizeof(events)))
+			return -EFAULT;
+
+		return 0;
+	}
+	case KVM_SET_VCPU_EVENTS: {
+		struct kvm_vcpu_events events;
+
+		if (copy_from_user(&events, argp,
+				sizeof(struct kvm_vcpu_events)))
+			return -EFAULT;
+
+		return kvm_arm_vcpu_set_events(vcpu, &events);
+	}
 	default:
 		r = -EINVAL;
 	}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Peter Zijlstra @ 2018-05-31  7:43 UTC (permalink / raw)
  To: Zefan Li
  Cc: Waiman Long, Tejun Heo, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <5B0F4F09.9050100@huawei.com>

On Thu, May 31, 2018 at 09:25:29AM +0800, Zefan Li wrote:
> Hi Waiman,
> 
> On 2018/5/30 21:46, Waiman Long wrote:
> > It was found that the cpuset.cpus could contain CPUs that are not listed
> > in their parent's cpu list as shown by the command sequence below:
> > 
> >   # echo "+cpuset" >cgroup.subtree_control
> >   # mkdir g1
> >   # echo 0-5 >g1/cpuset.cpus
> >   # mkdir g1/g11
> >   # echo "+cpuset" > g1/cgroup.subtree_control
> >   # echo 6-11 >g1/g11/cpuset.cpus
> >   # grep -R . g1 | grep "\.cpus"
> >   g1/cpuset.cpus:0-5
> >   g1/cpuset.cpus.effective:0-5
> >   g1/g11/cpuset.cpus:6-11
> >   g1/g11/cpuset.cpus.effective:0-5
> > 
> > As the intersection of g11's cpus and that of g1 is empty, the effective
> > cpus of g11 is just that of g1. The check in update_cpumask() is now
> > corrected to make sure that cpus in a child cpus must be a subset of
> > its parent's cpus. The error "write error: Invalid argument" will now
> > be reported in the above case.
> > 
> 
> We made the distinction between user-configured CPUs and effective CPUs
> in commit 7e88291beefbb758, so actually it's not a bug.

Why though; that makes no sense what so ever.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox