* Re: [PATCH v8 06/16] sched/core: uclamp: Extend sched_setattr() to support utilization clamping
From: Patrick Bellasi @ 2019-05-09 9:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Suren Baghdasaryan, LKML, linux-pm, linux-api, Ingo Molnar,
Tejun Heo, Rafael J . Wysocki, Vincent Guittot, Viresh Kumar,
Paul Turner, Quentin Perret, Dietmar Eggemann, Morten Rasmussen,
Juri Lelli, Todd Kjos, Joel Fernandes, Steve Muckle
In-Reply-To: <20190508194439.GF32547@worktop.programming.kicks-ass.net>
On 08-May 21:44, Peter Zijlstra wrote:
> On Tue, May 07, 2019 at 12:13:47PM +0100, Patrick Bellasi wrote:
> > On 17-Apr 15:26, Suren Baghdasaryan wrote:
> > > On Tue, Apr 2, 2019 at 3:42 AM Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
> > > > @@ -1056,6 +1100,13 @@ static void __init init_uclamp(void)
> > > > #else /* CONFIG_UCLAMP_TASK */
> > > > static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
> > > > static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> > > > +static inline int uclamp_validate(struct task_struct *p,
> > > > + const struct sched_attr *attr)
> > > > +{
> > > > + return -ENODEV;
> > >
> > > ENOSYS might be more appropriate?
> >
> > Yep, agree, thanks!
>
> No, -ENOSYS (see the comment) is special in that it indicates the whole
> system call is unavailable; that is most certainly not the case!
Yep, noted. Thanks.
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply
* Re: [PATCH v8 06/16] sched/core: uclamp: Extend sched_setattr() to support utilization clamping
From: Patrick Bellasi @ 2019-05-09 9:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190508194107.GE32547@worktop.programming.kicks-ass.net>
On 08-May 21:41, Peter Zijlstra wrote:
> On Tue, Apr 02, 2019 at 11:41:42AM +0100, Patrick Bellasi wrote:
> > @@ -1056,6 +1100,13 @@ static void __init init_uclamp(void)
> > #else /* CONFIG_UCLAMP_TASK */
> > static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
> > static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> > +static inline int uclamp_validate(struct task_struct *p,
> > + const struct sched_attr *attr)
> > +{
> > + return -ENODEV;
>
> Does that maybe want to be -EOPNOTSUPP ?
Suren propose ENOSYS for another similar case, i.e.
!CONFIG_UCLAMP_TASK definitions.
But EOPNOTSUPP seems more appropriate to me too.
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply
* Re: [PATCH v8 05/16] sched/core: Allow sched_setattr() to use the current policy
From: Patrick Bellasi @ 2019-05-09 9:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190508192131.GD32547@worktop.programming.kicks-ass.net>
On 08-May 21:21, Peter Zijlstra wrote:
> On Tue, Apr 02, 2019 at 11:41:41AM +0100, Patrick Bellasi wrote:
> > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> > index 22627f80063e..075c610adf45 100644
> > --- a/include/uapi/linux/sched.h
> > +++ b/include/uapi/linux/sched.h
> > @@ -40,6 +40,8 @@
> > /* SCHED_ISO: reserved but not implemented yet */
> > #define SCHED_IDLE 5
> > #define SCHED_DEADLINE 6
> > +/* Must be the last entry: used to sanity check attr.policy values */
> > +#define SCHED_POLICY_MAX SCHED_DEADLINE
>
> This is a wee bit sad to put in a uapi header; but yeah, where else :/
>
> Another option would be something like:
>
> enum {
> SCHED_NORMAL = 0,
> SCHED_FIFO = 1,
> SCHED_RR = 2,
> SCHED_BATCH = 3,
> /* SCHED_ISO = 4, reserved */
> SCHED_IDLE = 5,
> SCHED_DEADLINE = 6,
> SCHED_POLICY_NR
> };
I just wanted to minimize the changes by keeping the same structure...
If you prefer the above I can add a refactoring patch just to update
existing definitions before adding this patch...
>
> > /* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
> > #define SCHED_RESET_ON_FORK 0x40000000
> > @@ -50,9 +52,11 @@
> > #define SCHED_FLAG_RESET_ON_FORK 0x01
> > #define SCHED_FLAG_RECLAIM 0x02
> > #define SCHED_FLAG_DL_OVERRUN 0x04
> > +#define SCHED_FLAG_KEEP_POLICY 0x08
> >
> > #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
> > SCHED_FLAG_RECLAIM | \
> > - SCHED_FLAG_DL_OVERRUN)
> > + SCHED_FLAG_DL_OVERRUN | \
> > + SCHED_FLAG_KEEP_POLICY)
> >
> > #endif /* _UAPI_LINUX_SCHED_H */
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index d368ac26b8aa..20efb32e1a7e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4907,8 +4907,17 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> > if (retval)
> > return retval;
> >
> > - if ((int)attr.sched_policy < 0)
> > + /*
> > + * A valid policy is always required from userspace, unless
> > + * SCHED_FLAG_KEEP_POLICY is set and the current policy
> > + * is enforced for this call.
> > + */
> > + if (attr.sched_policy > SCHED_POLICY_MAX &&
> > + !(attr.sched_flags & SCHED_FLAG_KEEP_POLICY)) {
> > return -EINVAL;
> > + }
>
> And given I just looked at those darn SCHED_* things, I now note the
> above does 'funny' things when passed: attr.policy=4.
... and maybe factor in the same refactoring patch a check on
SCHED_ISO being not yet supported.
>
> > + if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY)
> > + attr.sched_policy = SETPARAM_POLICY;
> >
> > rcu_read_lock();
> > retval = -ESRCH;
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Patrick Bellasi @ 2019-05-09 9:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190508191529.GA26813@worktop.programming.kicks-ass.net>
On 08-May 21:15, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 09:07:33PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 02, 2019 at 11:41:40AM +0100, Patrick Bellasi wrote:
> > > +static inline struct uclamp_se
> > > +uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
> > > +{
> > > + struct uclamp_se uc_req = p->uclamp_req[clamp_id];
> > > + struct uclamp_se uc_max = uclamp_default[clamp_id];
> > > +
> > > + /* System default restrictions always apply */
> > > + if (unlikely(uc_req.value > uc_max.value))
> > > + return uc_max;
> > > +
> > > + return uc_req;
> > > +}
> > > +
> > > +static inline unsigned int
> > > +uclamp_eff_bucket_id(struct task_struct *p, unsigned int clamp_id)
> > > +{
> > > + struct uclamp_se uc_eff;
> > > +
> > > + /* Task currently refcounted: use back-annotated (effective) bucket */
> > > + if (p->uclamp[clamp_id].active)
> > > + return p->uclamp[clamp_id].bucket_id;
> > > +
> > > + uc_eff = uclamp_eff_get(p, clamp_id);
> > > +
> > > + return uc_eff.bucket_id;
> > > +}
> > > +
> > > +unsigned int uclamp_eff_value(struct task_struct *p, unsigned int clamp_id)
> > > +{
> > > + struct uclamp_se uc_eff;
> > > +
> > > + /* Task currently refcounted: use back-annotated (effective) value */
> > > + if (p->uclamp[clamp_id].active)
> > > + return p->uclamp[clamp_id].value;
> > > +
> > > + uc_eff = uclamp_eff_get(p, clamp_id);
> > > +
> > > + return uc_eff.value;
> > > +}
> >
> > This is 'wrong' because:
> >
> > uclamp_eff_value(p,id) := uclamp_eff(p,id).value
>
> Clearly I means to say the above does not hold with the given
> implementation, while the naming would suggest it does.
Not sure to completely get your point...
AFAIU, what you call uclamp_eff(p, id).value is the "value" member of
the struct returned by uclamp_eff_get(p,id), which is back annotate
by uclamp_rq_inc_id(p, rq, id) in:
p->uclamp[clamp_id].value
when a task becomes RUNNABLE.
> > Which seems to suggest the uclamp_eff_*() functions want another name.
That function returns the effective value of a task, which is either:
1. the back annotated value for a RUNNABLE task
or
2. the aggregation of task-specific, system-default and cgroup values
for a non RUNNABLE task.
> > Also, suppose the above would be true; does GCC really generate better
> > code for the LHS compared to the RHS?
It generate "sane" code which implements the above logic and allows
to know that whenever we call uclamp_eff_value(p,id) we get the most
updated effective value for a task, independently from its {!}RUNNABLE
state.
I would keep the function but, since Suren also complained also about
the name... perhaps I should come up with a better name? Proposals?
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Patrick Bellasi @ 2019-05-09 8:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190508190011.GB32547@worktop.programming.kicks-ass.net>
On 08-May 21:00, Peter Zijlstra wrote:
>
> There was a bunch of repetition that seemed fragile; does something like
> the below make sense?
Absolutely yes... will add to v9, thanks.
> Index: linux-2.6/kernel/sched/core.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/core.c
> +++ linux-2.6/kernel/sched/core.c
> @@ -770,6 +770,9 @@ unsigned int sysctl_sched_uclamp_util_ma
> /* All clamps are required to be less or equal than these values */
> static struct uclamp_se uclamp_default[UCLAMP_CNT];
>
> +#define for_each_clamp_id(clamp_id) \
> + for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
> +
> /* Integer rounded range for each bucket */
> #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
>
> @@ -790,6 +793,12 @@ static inline unsigned int uclamp_none(i
> return SCHED_CAPACITY_SCALE;
> }
>
> +static inline void uclamp_se_set(struct uclamp_se *uc_se, unsigned int value)
> +{
> + uc_se->value = value;
> + uc_se->bucket_id = uclamp_bucket_id(value);
> +}
> +
> static inline unsigned int
> uclamp_idle_value(struct rq *rq, unsigned int clamp_id, unsigned int clamp_value)
> {
> @@ -977,7 +986,7 @@ static inline void uclamp_rq_inc(struct
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> + for_each_clamp_id(clamp_id)
> uclamp_rq_inc_id(p, rq, clamp_id);
>
> /* Reset clamp idle holding when there is one RUNNABLE task */
> @@ -992,7 +1001,7 @@ static inline void uclamp_rq_dec(struct
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> + for_each_clamp_id(clamp_id)
> uclamp_rq_dec_id(p, rq, clamp_id);
> }
>
> @@ -1021,16 +1030,13 @@ int sysctl_sched_uclamp_handler(struct c
> }
>
> if (old_min != sysctl_sched_uclamp_util_min) {
> - uclamp_default[UCLAMP_MIN].value =
> - sysctl_sched_uclamp_util_min;
> - uclamp_default[UCLAMP_MIN].bucket_id =
> - uclamp_bucket_id(sysctl_sched_uclamp_util_min);
> + uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> + sysctl_sched_uclamp_util_min);
> }
> +
> if (old_max != sysctl_sched_uclamp_util_max) {
> - uclamp_default[UCLAMP_MAX].value =
> - sysctl_sched_uclamp_util_max;
> - uclamp_default[UCLAMP_MAX].bucket_id =
> - uclamp_bucket_id(sysctl_sched_uclamp_util_max);
> + uclamp_se_set(&uclamp_default[UCLAMP_MAX],
> + sysctl_sched_uclamp_util_max);
> }
>
> /*
> @@ -1052,7 +1058,7 @@ static void uclamp_fork(struct task_stru
> {
> unsigned int clamp_id;
>
> - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> + for_each_clamp_id(clamp_id)
> p->uclamp[clamp_id].active = false;
> }
>
> @@ -1067,17 +1073,12 @@ static void __init init_uclamp(void)
> cpu_rq(cpu)->uclamp_flags = 0;
> }
>
> - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> - struct uclamp_se *uc_se = &init_task.uclamp_req[clamp_id];
> -
> - uc_se->value = uclamp_none(clamp_id);
> - uc_se->bucket_id = uclamp_bucket_id(uc_se->value);
> - }
> + for_each_clamp_id(clamp_id)
> + uclamp_se_set(&init_task.uclamp_req[clamp_id], uclamp_none(clamp_id));
>
> /* System defaults allow max clamp values for both indexes */
> - uc_max.value = uclamp_none(UCLAMP_MAX);
> - uc_max.bucket_id = uclamp_bucket_id(uc_max.value);
> - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> + uclamp_se_set(&uc_max, uclamp_none(UCLAMP_MAX));
> + for_each_clamp_id(clamp_id)
> uclamp_default[clamp_id] = uc_max;
> }
>
>
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Patrick Bellasi @ 2019-05-09 8:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190508184202.GA32547@worktop.programming.kicks-ass.net>
On 08-May 20:42, Peter Zijlstra wrote:
> On Tue, Apr 02, 2019 at 11:41:40AM +0100, Patrick Bellasi wrote:
> > Add a privileged interface to define a system default configuration via:
> >
> > /proc/sys/kernel/sched_uclamp_util_{min,max}
>
> Isn't the 'u' in "uclamp" already for util?
Yes, right... I've just wanted to keep the same "uclamp" prefix used
by all related kernel symbols. But, since that's user-space API we can
certainly drop it and go for either:
/proc/sys/kernel/sched_clamp_util_{min,max}
/proc/sys/kernel/sched_util_clamp_{min,max}
Preference?
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply
* Re: [PATCH v9 00/24] ILP32 for ARM64
From: Yury Norov @ 2019-05-08 23:10 UTC (permalink / raw)
To: Yury Norov
Cc: Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
linux-doc, linux-arch, linux-api, Adam Borowski, Alexander Graf,
Alexey Klimov, Andreas Schwab, Andrew Pinski, Bamvor Zhangjian,
Chris Metcalf, Christoph Muellner, Dave Martin, David S . Miller,
Florian Weimer, Geert Uytterhoeven, Heiko Carstens,
James Hogan <jame>
In-Reply-To: <20190508225900.GA14091@yury-thinkpad>
On Wed, May 08, 2019 at 03:59:00PM -0700, Yury Norov wrote:
> Hi all,
>
> On Wed, May 16, 2018 at 11:18:45AM +0300, Yury Norov wrote:
> > This series enables AARCH64 with ILP32 mode.
> >
> > As supporting work, it introduces ARCH_32BIT_OFF_T configuration
> > option that is enabled for existing 32-bit architectures but disabled
> > for new arches (so 64-bit off_t userspace type is used by new userspace).
> > Also it deprecates getrlimit and setrlimit syscalls prior to prlimit64.
> >
> > Based on kernel v4.16. Tested with LTP, glibc testsuite, trinity, lmbench,
> > CPUSpec.
> >
> > This series on github:
> > https://github.com/norov/linux/tree/ilp32-4.16
> > Linaro toolchain:
> > http://snapshots.linaro.org/components/toolchain/binaries/7.3-2018.04-rc1/aarch64-linux-gnu_ilp32/
> > Debian repo:
> > http://people.linaro.org/~wookey/ilp32/
> > OpenSUSE repo:
> > https://build.opensuse.org/project/show/devel:ARM:Factory:Contrib:ILP32
>
> This is the 5.1-based version.
> Changes comparing to 5.0:
> - drop arch patches that has been taken upstream:
> 80d7da1cac62 asm-generic: Drop getrlimit and setrlimit syscalls from default list
> 942fa985e9f1 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
> 0d0216c03a7a compat ABI: use non-compat openat and open_by_handle_at variants
> - in include/linux/thread_bits.h define current_thread_info() prior to
> inclusion of asm/thread_info.h, to avoid circullar dependencies (thread: move
> thread bits accessors to separated file);
> - enable old IPC interfaces for ilp32, according to mainline changes
> (arm64: ilp32: introduce syscall table for ILP32).
Missed link:
https://github.com/norov/linux/tree/ilp32-5.1
>
> Thanks,
> Yury
^ permalink raw reply
* Re: [PATCH v9 00/24] ILP32 for ARM64
From: Yury Norov @ 2019-05-08 22:59 UTC (permalink / raw)
To: Yury Norov
Cc: Catalin Marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
linux-doc, linux-arch, linux-api, Adam Borowski, Alexander Graf,
Alexey Klimov, Andreas Schwab, Andrew Pinski, Bamvor Zhangjian,
Chris Metcalf, Christoph Muellner, Dave Martin, David S . Miller,
Florian Weimer, Geert Uytterhoeven, Heiko Carstens,
James Hogan <jame>
In-Reply-To: <20180516081910.10067-1-ynorov@caviumnetworks.com>
Hi all,
On Wed, May 16, 2018 at 11:18:45AM +0300, Yury Norov wrote:
> This series enables AARCH64 with ILP32 mode.
>
> As supporting work, it introduces ARCH_32BIT_OFF_T configuration
> option that is enabled for existing 32-bit architectures but disabled
> for new arches (so 64-bit off_t userspace type is used by new userspace).
> Also it deprecates getrlimit and setrlimit syscalls prior to prlimit64.
>
> Based on kernel v4.16. Tested with LTP, glibc testsuite, trinity, lmbench,
> CPUSpec.
>
> This series on github:
> https://github.com/norov/linux/tree/ilp32-4.16
> Linaro toolchain:
> http://snapshots.linaro.org/components/toolchain/binaries/7.3-2018.04-rc1/aarch64-linux-gnu_ilp32/
> Debian repo:
> http://people.linaro.org/~wookey/ilp32/
> OpenSUSE repo:
> https://build.opensuse.org/project/show/devel:ARM:Factory:Contrib:ILP32
This is the 5.1-based version.
Changes comparing to 5.0:
- drop arch patches that has been taken upstream:
80d7da1cac62 asm-generic: Drop getrlimit and setrlimit syscalls from default list
942fa985e9f1 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
0d0216c03a7a compat ABI: use non-compat openat and open_by_handle_at variants
- in include/linux/thread_bits.h define current_thread_info() prior to
inclusion of asm/thread_info.h, to avoid circullar dependencies (thread: move
thread bits accessors to separated file);
- enable old IPC interfaces for ilp32, according to mainline changes
(arm64: ilp32: introduce syscall table for ILP32).
Thanks,
Yury
^ permalink raw reply
* Re: [PATCH v8 06/16] sched/core: uclamp: Extend sched_setattr() to support utilization clamping
From: Peter Zijlstra @ 2019-05-08 19:44 UTC (permalink / raw)
To: Patrick Bellasi
Cc: Suren Baghdasaryan, LKML, linux-pm, linux-api, Ingo Molnar,
Tejun Heo, Rafael J . Wysocki, Vincent Guittot, Viresh Kumar,
Paul Turner, Quentin Perret, Dietmar Eggemann, Morten Rasmussen,
Juri Lelli, Todd Kjos, Joel Fernandes, Steve Muckle
In-Reply-To: <20190507111347.4ivnjwbymsf7i3e6@e110439-lin>
On Tue, May 07, 2019 at 12:13:47PM +0100, Patrick Bellasi wrote:
> On 17-Apr 15:26, Suren Baghdasaryan wrote:
> > On Tue, Apr 2, 2019 at 3:42 AM Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > @@ -1056,6 +1100,13 @@ static void __init init_uclamp(void)
> > > #else /* CONFIG_UCLAMP_TASK */
> > > static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
> > > static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> > > +static inline int uclamp_validate(struct task_struct *p,
> > > + const struct sched_attr *attr)
> > > +{
> > > + return -ENODEV;
> >
> > ENOSYS might be more appropriate?
>
> Yep, agree, thanks!
No, -ENOSYS (see the comment) is special in that it indicates the whole
system call is unavailable; that is most certainly not the case!
^ permalink raw reply
* Re: [PATCH v8 06/16] sched/core: uclamp: Extend sched_setattr() to support utilization clamping
From: Peter Zijlstra @ 2019-05-08 19:41 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190402104153.25404-7-patrick.bellasi@arm.com>
On Tue, Apr 02, 2019 at 11:41:42AM +0100, Patrick Bellasi wrote:
> @@ -1056,6 +1100,13 @@ static void __init init_uclamp(void)
> #else /* CONFIG_UCLAMP_TASK */
> static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
> static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> +static inline int uclamp_validate(struct task_struct *p,
> + const struct sched_attr *attr)
> +{
> + return -ENODEV;
Does that maybe want to be -EOPNOTSUPP ?
> +}
^ permalink raw reply
* Re: [PATCH v8 05/16] sched/core: Allow sched_setattr() to use the current policy
From: Peter Zijlstra @ 2019-05-08 19:21 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190402104153.25404-6-patrick.bellasi@arm.com>
On Tue, Apr 02, 2019 at 11:41:41AM +0100, Patrick Bellasi wrote:
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index 22627f80063e..075c610adf45 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -40,6 +40,8 @@
> /* SCHED_ISO: reserved but not implemented yet */
> #define SCHED_IDLE 5
> #define SCHED_DEADLINE 6
> +/* Must be the last entry: used to sanity check attr.policy values */
> +#define SCHED_POLICY_MAX SCHED_DEADLINE
This is a wee bit sad to put in a uapi header; but yeah, where else :/
Another option would be something like:
enum {
SCHED_NORMAL = 0,
SCHED_FIFO = 1,
SCHED_RR = 2,
SCHED_BATCH = 3,
/* SCHED_ISO = 4, reserved */
SCHED_IDLE = 5,
SCHED_DEADLINE = 6,
SCHED_POLICY_NR
};
> /* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
> #define SCHED_RESET_ON_FORK 0x40000000
> @@ -50,9 +52,11 @@
> #define SCHED_FLAG_RESET_ON_FORK 0x01
> #define SCHED_FLAG_RECLAIM 0x02
> #define SCHED_FLAG_DL_OVERRUN 0x04
> +#define SCHED_FLAG_KEEP_POLICY 0x08
>
> #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
> SCHED_FLAG_RECLAIM | \
> - SCHED_FLAG_DL_OVERRUN)
> + SCHED_FLAG_DL_OVERRUN | \
> + SCHED_FLAG_KEEP_POLICY)
>
> #endif /* _UAPI_LINUX_SCHED_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d368ac26b8aa..20efb32e1a7e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4907,8 +4907,17 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> if (retval)
> return retval;
>
> - if ((int)attr.sched_policy < 0)
> + /*
> + * A valid policy is always required from userspace, unless
> + * SCHED_FLAG_KEEP_POLICY is set and the current policy
> + * is enforced for this call.
> + */
> + if (attr.sched_policy > SCHED_POLICY_MAX &&
> + !(attr.sched_flags & SCHED_FLAG_KEEP_POLICY)) {
> return -EINVAL;
> + }
And given I just looked at those darn SCHED_* things, I now note the
above does 'funny' things when passed: attr.policy=4.
> + if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY)
> + attr.sched_policy = SETPARAM_POLICY;
>
> rcu_read_lock();
> retval = -ESRCH;
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Peter Zijlstra @ 2019-05-08 19:15 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190508190733.GC32547@worktop.programming.kicks-ass.net>
On Wed, May 08, 2019 at 09:07:33PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 02, 2019 at 11:41:40AM +0100, Patrick Bellasi wrote:
> > +static inline struct uclamp_se
> > +uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
> > +{
> > + struct uclamp_se uc_req = p->uclamp_req[clamp_id];
> > + struct uclamp_se uc_max = uclamp_default[clamp_id];
> > +
> > + /* System default restrictions always apply */
> > + if (unlikely(uc_req.value > uc_max.value))
> > + return uc_max;
> > +
> > + return uc_req;
> > +}
> > +
> > +static inline unsigned int
> > +uclamp_eff_bucket_id(struct task_struct *p, unsigned int clamp_id)
> > +{
> > + struct uclamp_se uc_eff;
> > +
> > + /* Task currently refcounted: use back-annotated (effective) bucket */
> > + if (p->uclamp[clamp_id].active)
> > + return p->uclamp[clamp_id].bucket_id;
> > +
> > + uc_eff = uclamp_eff_get(p, clamp_id);
> > +
> > + return uc_eff.bucket_id;
> > +}
> > +
> > +unsigned int uclamp_eff_value(struct task_struct *p, unsigned int clamp_id)
> > +{
> > + struct uclamp_se uc_eff;
> > +
> > + /* Task currently refcounted: use back-annotated (effective) value */
> > + if (p->uclamp[clamp_id].active)
> > + return p->uclamp[clamp_id].value;
> > +
> > + uc_eff = uclamp_eff_get(p, clamp_id);
> > +
> > + return uc_eff.value;
> > +}
>
> This is 'wrong' because:
>
> uclamp_eff_value(p,id) := uclamp_eff(p,id).value
Clearly I means to say the above does not hold with the given
implementation, while the naming would suggest it does.
> Which seems to suggest the uclamp_eff_*() functions want another name.
>
> Also, suppose the above would be true; does GCC really generate better
> code for the LHS compared to the RHS?
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Peter Zijlstra @ 2019-05-08 19:07 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190402104153.25404-5-patrick.bellasi@arm.com>
On Tue, Apr 02, 2019 at 11:41:40AM +0100, Patrick Bellasi wrote:
> +static inline struct uclamp_se
> +uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
> +{
> + struct uclamp_se uc_req = p->uclamp_req[clamp_id];
> + struct uclamp_se uc_max = uclamp_default[clamp_id];
> +
> + /* System default restrictions always apply */
> + if (unlikely(uc_req.value > uc_max.value))
> + return uc_max;
> +
> + return uc_req;
> +}
> +
> +static inline unsigned int
> +uclamp_eff_bucket_id(struct task_struct *p, unsigned int clamp_id)
> +{
> + struct uclamp_se uc_eff;
> +
> + /* Task currently refcounted: use back-annotated (effective) bucket */
> + if (p->uclamp[clamp_id].active)
> + return p->uclamp[clamp_id].bucket_id;
> +
> + uc_eff = uclamp_eff_get(p, clamp_id);
> +
> + return uc_eff.bucket_id;
> +}
> +
> +unsigned int uclamp_eff_value(struct task_struct *p, unsigned int clamp_id)
> +{
> + struct uclamp_se uc_eff;
> +
> + /* Task currently refcounted: use back-annotated (effective) value */
> + if (p->uclamp[clamp_id].active)
> + return p->uclamp[clamp_id].value;
> +
> + uc_eff = uclamp_eff_get(p, clamp_id);
> +
> + return uc_eff.value;
> +}
This is 'wrong' because:
uclamp_eff_value(p,id) := uclamp_eff(p,id).value
Which seems to suggest the uclamp_eff_*() functions want another name.
Also, suppose the above would be true; does GCC really generate better
code for the LHS compared to the RHS?
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Peter Zijlstra @ 2019-05-08 19:00 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190402104153.25404-5-patrick.bellasi@arm.com>
There was a bunch of repetition that seemed fragile; does something like
the below make sense?
Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -770,6 +770,9 @@ unsigned int sysctl_sched_uclamp_util_ma
/* All clamps are required to be less or equal than these values */
static struct uclamp_se uclamp_default[UCLAMP_CNT];
+#define for_each_clamp_id(clamp_id) \
+ for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
+
/* Integer rounded range for each bucket */
#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
@@ -790,6 +793,12 @@ static inline unsigned int uclamp_none(i
return SCHED_CAPACITY_SCALE;
}
+static inline void uclamp_se_set(struct uclamp_se *uc_se, unsigned int value)
+{
+ uc_se->value = value;
+ uc_se->bucket_id = uclamp_bucket_id(value);
+}
+
static inline unsigned int
uclamp_idle_value(struct rq *rq, unsigned int clamp_id, unsigned int clamp_value)
{
@@ -977,7 +986,7 @@ static inline void uclamp_rq_inc(struct
if (unlikely(!p->sched_class->uclamp_enabled))
return;
- for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
+ for_each_clamp_id(clamp_id)
uclamp_rq_inc_id(p, rq, clamp_id);
/* Reset clamp idle holding when there is one RUNNABLE task */
@@ -992,7 +1001,7 @@ static inline void uclamp_rq_dec(struct
if (unlikely(!p->sched_class->uclamp_enabled))
return;
- for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
+ for_each_clamp_id(clamp_id)
uclamp_rq_dec_id(p, rq, clamp_id);
}
@@ -1021,16 +1030,13 @@ int sysctl_sched_uclamp_handler(struct c
}
if (old_min != sysctl_sched_uclamp_util_min) {
- uclamp_default[UCLAMP_MIN].value =
- sysctl_sched_uclamp_util_min;
- uclamp_default[UCLAMP_MIN].bucket_id =
- uclamp_bucket_id(sysctl_sched_uclamp_util_min);
+ uclamp_se_set(&uclamp_default[UCLAMP_MIN],
+ sysctl_sched_uclamp_util_min);
}
+
if (old_max != sysctl_sched_uclamp_util_max) {
- uclamp_default[UCLAMP_MAX].value =
- sysctl_sched_uclamp_util_max;
- uclamp_default[UCLAMP_MAX].bucket_id =
- uclamp_bucket_id(sysctl_sched_uclamp_util_max);
+ uclamp_se_set(&uclamp_default[UCLAMP_MAX],
+ sysctl_sched_uclamp_util_max);
}
/*
@@ -1052,7 +1058,7 @@ static void uclamp_fork(struct task_stru
{
unsigned int clamp_id;
- for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
+ for_each_clamp_id(clamp_id)
p->uclamp[clamp_id].active = false;
}
@@ -1067,17 +1073,12 @@ static void __init init_uclamp(void)
cpu_rq(cpu)->uclamp_flags = 0;
}
- for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
- struct uclamp_se *uc_se = &init_task.uclamp_req[clamp_id];
-
- uc_se->value = uclamp_none(clamp_id);
- uc_se->bucket_id = uclamp_bucket_id(uc_se->value);
- }
+ for_each_clamp_id(clamp_id)
+ uclamp_se_set(&init_task.uclamp_req[clamp_id], uclamp_none(clamp_id));
/* System defaults allow max clamp values for both indexes */
- uc_max.value = uclamp_none(UCLAMP_MAX);
- uc_max.bucket_id = uclamp_bucket_id(uc_max.value);
- for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
+ uclamp_se_set(&uc_max, uclamp_none(UCLAMP_MAX));
+ for_each_clamp_id(clamp_id)
uclamp_default[clamp_id] = uc_max;
}
^ permalink raw reply
* Re: [PATCH v8 04/16] sched/core: uclamp: Add system default clamps
From: Peter Zijlstra @ 2019-05-08 18:42 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, linux-api, Ingo Molnar, Tejun Heo,
Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
In-Reply-To: <20190402104153.25404-5-patrick.bellasi@arm.com>
On Tue, Apr 02, 2019 at 11:41:40AM +0100, Patrick Bellasi wrote:
> Add a privileged interface to define a system default configuration via:
>
> /proc/sys/kernel/sched_uclamp_util_{min,max}
Isn't the 'u' in "uclamp" already for util?
^ permalink raw reply
* Re: [PATCH v2 02/18] fpga: dfl: fme: remove copy_to_user() in ioctl for PR
From: Alan Tull @ 2019-05-08 17:58 UTC (permalink / raw)
To: Moritz Fischer; +Cc: Wu Hao, linux-fpga, linux-kernel, linux-api, Xu Yilun
In-Reply-To: <20190507172545.GA26690@archbox>
On Tue, May 7, 2019 at 12:26 PM Moritz Fischer <mdf@kernel.org> wrote:
>
> On Mon, Apr 29, 2019 at 04:55:35PM +0800, Wu Hao wrote:
> > This patch removes copy_to_user() code in partial reconfiguration
> > ioctl, as it's useless as user never needs to read the data
> > structure after ioctl.
> >
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> Acked-by: Moritz Fischer <mdf@kernel.org>
Acked-by: Alan Tull <atull@kernel.org>
Alan
> > ---
> > v2: clean up code split from patch 2 in v1 patchset.
> > ---
> > drivers/fpga/dfl-fme-pr.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
> > index d9ca955..6ec0f09 100644
> > --- a/drivers/fpga/dfl-fme-pr.c
> > +++ b/drivers/fpga/dfl-fme-pr.c
> > @@ -159,9 +159,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > mutex_unlock(&pdata->lock);
> > free_exit:
> > vfree(buf);
> > - if (copy_to_user((void __user *)arg, &port_pr, minsz))
> > - return -EFAULT;
> > -
> > return ret;
> > }
> >
> > --
> > 1.8.3.1
> >
^ permalink raw reply
* Re: [PATCH v3 09/11] platform/x86: asus-wmi: Control RGB keyboard backlight
From: Pavel Machek @ 2019-05-08 17:12 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Yurii Pavlovskyi, Jacek Anaszewski, Linux LED Subsystem,
Corentin Chary, Darren Hart, Andy Shevchenko, Daniel Drake,
acpi4asus-user, Platform Driver, Linux Kernel Mailing List,
linux-api
In-Reply-To: <CAHp75Vf9uPG7_K0P26nHYCH0WB6LFX3wk8aJBpLWQ-r46kDw9w@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]
Hi!
> > The WMI exposes two methods for controlling RGB keyboard backlight, which
> > allows controlling:
> > * RGB components in range 00 - ff,
> > * Switch between 4 effects,
> > * Switch between 3 effect speed modes,
> > * Separately enable the backlight on boot, in the awake state (after driver
> > load), in sleep mode, and probably in something called shutdown mode (no
> > observable effects of enabling it are known so far).
> >
> > The configuration should be written to several sysfs parameter buffers
> > which are then written via WMI by writing either 1 or 2 to the "kbbl_set"
> > parameter. When reading the buffers the last written value is returned.
> >
> > If the 2 is written to "kbbl_set", the parameters will be reset on reboot
> > (temporary mode), 1 is permanent mode, parameters are retained.
> >
> > The calls use new 3-dword input buffer method call.
> >
> > The functionality is only enabled if corresponding DSTS methods return
> > exact valid values.
> >
> > The following script demonstrates usage:
> >
> > echo Red [00 - ff]
> > echo 33 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_red
> > echo Green [00 - ff]
> > echo ff > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_green
> > echo Blue [00 - ff]
> > echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_blue
> > echo Mode: 0 - static color, 1 - breathing, 2 - color cycle, 3 - strobing
> > echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_mode
> > echo Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast
> > echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_speed
> > echo Enable: 02 - on boot, before module load, 08 - awake, 20 - sleep,
> > echo 2a or ff to set all
> > echo 2a > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_flags
> > echo Save: 1 - permanently, 2 - temporarily, reset after reboot
> > echo 1 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_set
> >
>
> Shouldn't be the LED subsystem driver for this?
Yes, please. We have common interface for LED drivers; this needs to
use it.
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: [PATCH v3 09/11] platform/x86: asus-wmi: Control RGB keyboard backlight
From: Andy Shevchenko @ 2019-05-08 14:02 UTC (permalink / raw)
To: Yurii Pavlovskyi, Jacek Anaszewski, Linux LED Subsystem
Cc: Corentin Chary, Darren Hart, Andy Shevchenko, Daniel Drake,
acpi4asus-user, Platform Driver, Linux Kernel Mailing List,
linux-api
In-Reply-To: <c953b43b-6186-77e9-54b1-b1cd1d7d1eb6@gmail.com>
On Fri, Apr 19, 2019 at 1:14 PM Yurii Pavlovskyi
<yurii.pavlovskyi@gmail.com> wrote:
>
> The WMI exposes two methods for controlling RGB keyboard backlight, which
> allows controlling:
> * RGB components in range 00 - ff,
> * Switch between 4 effects,
> * Switch between 3 effect speed modes,
> * Separately enable the backlight on boot, in the awake state (after driver
> load), in sleep mode, and probably in something called shutdown mode (no
> observable effects of enabling it are known so far).
>
> The configuration should be written to several sysfs parameter buffers
> which are then written via WMI by writing either 1 or 2 to the "kbbl_set"
> parameter. When reading the buffers the last written value is returned.
>
> If the 2 is written to "kbbl_set", the parameters will be reset on reboot
> (temporary mode), 1 is permanent mode, parameters are retained.
>
> The calls use new 3-dword input buffer method call.
>
> The functionality is only enabled if corresponding DSTS methods return
> exact valid values.
>
> The following script demonstrates usage:
>
> echo Red [00 - ff]
> echo 33 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_red
> echo Green [00 - ff]
> echo ff > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_green
> echo Blue [00 - ff]
> echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_blue
> echo Mode: 0 - static color, 1 - breathing, 2 - color cycle, 3 - strobing
> echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_mode
> echo Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast
> echo 0 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_speed
> echo Enable: 02 - on boot, before module load, 08 - awake, 20 - sleep,
> echo 2a or ff to set all
> echo 2a > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_flags
> echo Save: 1 - permanently, 2 - temporarily, reset after reboot
> echo 1 > /sys/devices/platform/asus-nb-wmi/kbbl/kbbl_set
>
Shouldn't be the LED subsystem driver for this?
> Signed-off-by: Yurii Pavlovskyi <yurii.pavlovskyi@gmail.com>
> ---
> .../ABI/testing/sysfs-platform-asus-wmi | 61 ++++
> drivers/platform/x86/asus-wmi.c | 331 ++++++++++++++++++
> include/linux/platform_data/x86/asus-wmi.h | 2 +
> 3 files changed, 394 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-asus-wmi b/Documentation/ABI/testing/sysfs-platform-asus-wmi
> index 019e1e29370e..1cc54d5e3e10 100644
> --- a/Documentation/ABI/testing/sysfs-platform-asus-wmi
> +++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi
> @@ -36,3 +36,64 @@ KernelVersion: 3.5
> Contact: "AceLan Kao" <acelan.kao@canonical.com>
> Description:
> Resume on lid open. 1 means on, 0 means off.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_red
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + RGB keyboard backlight red component: 00 .. ff.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_green
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + RGB keyboard backlight green component: 00 .. ff.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_blue
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + RGB keyboard backlight blue component: 00 .. ff.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_mode
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + RGB keyboard backlight mode:
> + * 0 - static color,
> + * 1 - breathing,
> + * 2 - color cycle,
> + * 3 - strobing.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_speed
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + RGB keyboard backlight speed for modes 1 and 2:
> + * 0 - slow,
> + * 1 - medium,
> + * 2 - fast.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_flags
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + RGB keyboard backlight enable flags (2a to enable everything), OR of:
> + * 02 - on boot (until module load),
> + * 08 - awake,
> + * 20 - sleep.
> +
> +What: /sys/devices/platform/<platform>/kbbl/kbbl_set
> +Date: Apr 2019
> +KernelVersion: 5.1
> +Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
> +Description:
> + Write changed RGB keyboard backlight parameters:
> + * 1 - permanently,
> + * 2 - temporarily.
> diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
> index 1b8272374660..0a32079336d8 100644
> --- a/drivers/platform/x86/asus-wmi.c
> +++ b/drivers/platform/x86/asus-wmi.c
> @@ -148,6 +148,21 @@ struct asus_rfkill {
> u32 dev_id;
> };
>
> +struct asus_kbbl_rgb {
> + u8 kbbl_red;
> + u8 kbbl_green;
> + u8 kbbl_blue;
> + u8 kbbl_mode;
> + u8 kbbl_speed;
> +
> + u8 kbbl_set_red;
> + u8 kbbl_set_green;
> + u8 kbbl_set_blue;
> + u8 kbbl_set_mode;
> + u8 kbbl_set_speed;
> + u8 kbbl_set_flags;
> +};
> +
> struct asus_wmi {
> int dsts_id;
> int spec;
> @@ -183,6 +198,9 @@ struct asus_wmi {
> int asus_hwmon_num_fans;
> int asus_hwmon_pwm;
>
> + bool kbbl_rgb_available;
> + struct asus_kbbl_rgb kbbl_rgb;
> +
> struct hotplug_slot hotplug_slot;
> struct mutex hotplug_lock;
> struct mutex wmi_lock;
> @@ -658,6 +676,312 @@ static int asus_wmi_led_init(struct asus_wmi *asus)
> return rv;
> }
>
> +/* RGB keyboard backlight *****************************************************/
> +
> +static ssize_t show_u8(u8 value, char *buf)
> +{
> + return scnprintf(buf, PAGE_SIZE, "%02x\n", value);
> +}
> +
> +static ssize_t store_u8(u8 *value, const char *buf, int count)
> +{
> + int err;
> + u8 result;
> +
> + err = kstrtou8(buf, 16, &result);
> + if (err < 0) {
> + pr_warn("Trying to store invalid value\n");
> + return err;
> + }
> +
> + *value = result;
> +
> + return count;
> +}
> +
> +static ssize_t kbbl_red_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return show_u8(asus->kbbl_rgb.kbbl_red, buf);
> +}
> +
> +static ssize_t kbbl_red_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return store_u8(&asus->kbbl_rgb.kbbl_set_red, buf, count);
> +}
> +
> +static ssize_t kbbl_green_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return show_u8(asus->kbbl_rgb.kbbl_green, buf);
> +}
> +
> +static ssize_t kbbl_green_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return store_u8(&asus->kbbl_rgb.kbbl_set_green, buf, count);
> +}
> +
> +static ssize_t kbbl_blue_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return show_u8(asus->kbbl_rgb.kbbl_blue, buf);
> +}
> +
> +static ssize_t kbbl_blue_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return store_u8(&asus->kbbl_rgb.kbbl_set_blue, buf, count);
> +}
> +
> +static ssize_t kbbl_mode_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return show_u8(asus->kbbl_rgb.kbbl_mode, buf);
> +}
> +
> +static ssize_t kbbl_mode_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return store_u8(&asus->kbbl_rgb.kbbl_set_mode, buf, count);
> +}
> +
> +static ssize_t kbbl_speed_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return show_u8(asus->kbbl_rgb.kbbl_speed, buf);
> +}
> +
> +static ssize_t kbbl_speed_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return store_u8(&asus->kbbl_rgb.kbbl_set_speed, buf, count);
> +}
> +
> +static ssize_t kbbl_flags_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return show_u8(asus->kbbl_rgb.kbbl_set_flags, buf);
> +}
> +
> +static ssize_t kbbl_flags_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + struct asus_wmi *asus = dev_get_drvdata(dev);
> +
> + return store_u8(&asus->kbbl_rgb.kbbl_set_flags, buf, count);
> +}
> +
> +static ssize_t kbbl_set_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return scnprintf(buf, PAGE_SIZE,
> + "Write to configure RGB keyboard backlight\n");
> +}
> +
> +static int kbbl_rgb_write(struct asus_wmi *asus, int persistent)
> +{
> + int err;
> + u32 retval;
> + u8 speed_byte;
> + u8 mode_byte;
> + u8 speed;
> + u8 mode;
> +
> + speed = asus->kbbl_rgb.kbbl_set_speed;
> + switch (speed) {
> + case 0:
> + default:
> + speed_byte = 0xe1; // slow
> + speed = 0;
> + break;
> + case 1:
> + speed_byte = 0xeb; // medium
> + break;
> + case 2:
> + speed_byte = 0xf5; // fast
> + break;
> + }
> +
> + mode = asus->kbbl_rgb.kbbl_set_mode;
> + switch (mode) {
> + case 0:
> + default:
> + mode_byte = 0x00; // static color
> + mode = 0;
> + break;
> + case 1:
> + mode_byte = 0x01; // breathing
> + break;
> + case 2:
> + mode_byte = 0x02; // color cycle
> + break;
> + case 3:
> + mode_byte = 0x0a; // strobing
> + break;
> + }
> +
> + err = asus_wmi_evaluate_method_3dw(ASUS_WMI_METHODID_DEVS,
> + ASUS_WMI_DEVID_KBD_RGB,
> + (persistent ? 0xb4 : 0xb3) |
> + (mode_byte << 8) |
> + (asus->kbbl_rgb.kbbl_set_red << 16) |
> + (asus->kbbl_rgb.kbbl_set_green << 24),
> + (asus->kbbl_rgb.kbbl_set_blue) |
> + (speed_byte << 8), &retval);
> + if (err) {
> + pr_warn("RGB keyboard device 1, write error: %d\n", err);
> + return err;
> + }
> +
> + if (retval != 1) {
> + pr_warn("RGB keyboard device 1, write error (retval): %x\n",
> + retval);
> + return -EIO;
> + }
> +
> + err = asus_wmi_evaluate_method_3dw(ASUS_WMI_METHODID_DEVS,
> + ASUS_WMI_DEVID_KBD_RGB2,
> + (0xbd) |
> + (asus->kbbl_rgb.kbbl_set_flags << 16) |
> + (persistent ? 0x0100 : 0x0000), 0, &retval);
> + if (err) {
> + pr_warn("RGB keyboard device 2, write error: %d\n", err);
> + return err;
> + }
> +
> + if (retval != 1) {
> + pr_warn("RGB keyboard device 2, write error (retval): %x\n",
> + retval);
> + return -EIO;
> + }
> +
> + asus->kbbl_rgb.kbbl_red = asus->kbbl_rgb.kbbl_set_red;
> + asus->kbbl_rgb.kbbl_green = asus->kbbl_rgb.kbbl_set_green;
> + asus->kbbl_rgb.kbbl_blue = asus->kbbl_rgb.kbbl_set_blue;
> + asus->kbbl_rgb.kbbl_mode = mode;
> + asus->kbbl_rgb.kbbl_speed = speed;
> +
> + return 0;
> +}
> +
> +static ssize_t kbbl_set_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t count)
> +{
> + u8 value;
> + struct asus_wmi *asus;
> + int result;
> +
> + asus = dev_get_drvdata(dev);
> + result = store_u8(&value, buf, count);
> + if (result < 0)
> + return result;
> +
> + if (value == 1)
> + kbbl_rgb_write(asus, 1);
> + else if (value == 2)
> + kbbl_rgb_write(asus, 0);
> +
> + return count;
> +}
> +
> +/* RGB values: 00 .. ff */
> +static DEVICE_ATTR_RW(kbbl_red);
> +static DEVICE_ATTR_RW(kbbl_green);
> +static DEVICE_ATTR_RW(kbbl_blue);
> +
> +/*
> + * Color modes: 0 - static color, 1 - breathing, 2 - color cycle, 3 - strobing
> + */
> +static DEVICE_ATTR_RW(kbbl_mode);
> +
> +/* Speed for modes 1 and 2: 0 - slow, 1 - medium, 2 - fast */
> +static DEVICE_ATTR_RW(kbbl_speed);
> +
> +/*
> + * Enable: 02 - on boot (until module load) | 08 - awake | 20 - sleep
> + * (2a or ff to enable everything)
> + *
> + * Logically 80 would be shutdown, but no visible effects of this option
> + * were observed so far
> + */
> +static DEVICE_ATTR_RW(kbbl_flags);
> +
> +/* Write data: 1 - permanently, 2 - temporarily (reset after reboot) */
> +static DEVICE_ATTR_RW(kbbl_set);
> +
> +static struct attribute *rgbkb_sysfs_attributes[] = {
> + &dev_attr_kbbl_red.attr,
> + &dev_attr_kbbl_green.attr,
> + &dev_attr_kbbl_blue.attr,
> + &dev_attr_kbbl_mode.attr,
> + &dev_attr_kbbl_speed.attr,
> + &dev_attr_kbbl_flags.attr,
> + &dev_attr_kbbl_set.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group kbbl_attribute_group = {
> + .name = "kbbl",
> + .attrs = rgbkb_sysfs_attributes
> +};
> +
> +static int kbbl_rgb_init(struct asus_wmi *asus)
> +{
> + int err;
> +
> + err = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_RGB);
> + if (err) {
> + if (err == -ENODEV)
> + return 0;
> + else
> + return err;
> + }
> +
> + err = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_RGB2);
> + if (err) {
> + if (err == -ENODEV)
> + return 0;
> + else
> + return err;
> + }
> +
> + asus->kbbl_rgb_available = true;
> + return sysfs_create_group(&asus->platform_device->dev.kobj,
> + &kbbl_attribute_group);
> +}
> +
> +static void kbbl_rgb_exit(struct asus_wmi *asus)
> +{
> + if (asus->kbbl_rgb_available) {
> + sysfs_remove_group(&asus->platform_device->dev.kobj,
> + &kbbl_attribute_group);
> + }
> +}
> +
> /* RF *************************************************************************/
>
> /*
> @@ -2230,6 +2554,10 @@ static int asus_wmi_add(struct platform_device *pdev)
> if (err)
> goto fail_leds;
>
> + err = kbbl_rgb_init(asus);
> + if (err)
> + goto fail_rgbkb;
> +
> asus_wmi_get_devstate(asus, ASUS_WMI_DEVID_WLAN, &result);
> if (result & (ASUS_WMI_DSTS_PRESENCE_BIT | ASUS_WMI_DSTS_USER_BIT))
> asus->driver->wlan_ctrl_by_user = 1;
> @@ -2287,6 +2615,8 @@ static int asus_wmi_add(struct platform_device *pdev)
> fail_backlight:
> asus_wmi_rfkill_exit(asus);
> fail_rfkill:
> + kbbl_rgb_exit(asus);
> +fail_rgbkb:
> asus_wmi_led_exit(asus);
> fail_leds:
> fail_hwmon:
> @@ -2307,6 +2637,7 @@ static int asus_wmi_remove(struct platform_device *device)
> asus_wmi_backlight_exit(asus);
> asus_wmi_input_exit(asus);
> asus_wmi_led_exit(asus);
> + kbbl_rgb_exit(asus);
> asus_wmi_rfkill_exit(asus);
> asus_wmi_debugfs_exit(asus);
> asus_wmi_platform_exit(asus);
> diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h
> index a5fe7e68944b..c8c6e939e196 100644
> --- a/include/linux/platform_data/x86/asus-wmi.h
> +++ b/include/linux/platform_data/x86/asus-wmi.h
> @@ -57,6 +57,8 @@
> #define ASUS_WMI_DEVID_KBD_BACKLIGHT 0x00050021
> #define ASUS_WMI_DEVID_LIGHT_SENSOR 0x00050022 /* ?? */
> #define ASUS_WMI_DEVID_LIGHTBAR 0x00050025
> +#define ASUS_WMI_DEVID_KBD_RGB 0x00100056
> +#define ASUS_WMI_DEVID_KBD_RGB2 0x00100057
>
> /* Misc */
> #define ASUS_WMI_DEVID_CAMERA 0x00060013
> --
> 2.17.1
>
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v2 15/18] fpga: dfl: fme: add thermal management support
From: Wu Hao @ 2019-05-08 6:11 UTC (permalink / raw)
To: Moritz Fischer
Cc: atull, linux-fpga, linux-kernel, linux-api, Luwei Kang,
Russ Weight, Xu Yilun, linux-hwmon, linux
In-Reply-To: <20190507183057.GA30078@archbox>
On Tue, May 07, 2019 at 11:30:57AM -0700, Moritz Fischer wrote:
> Please for next round:
>
> +CC linux-hwmon, Guenter etc ...
Thanks a lot for the kindly reminder.
I will make sure linux-hwmon, Guenter cced for the next version patchset.
Thanks
Hao
>
> On Mon, Apr 29, 2019 at 04:55:48PM +0800, Wu Hao wrote:
> > This patch adds support to thermal management private feature for DFL
> > FPGA Management Engine (FME). This private feature driver registers
> > a hwmon for thermal/temperature monitoring (hwmon temp1_input).
> > If hardware automatic throttling is supported by this hardware, then
> > driver also exposes sysfs interfaces under hwmon for thresholds
> > (temp1_alarm/ crit/ emergency), threshold status (temp1_alarm_status/
> > temp1_crit_status) and throttling policy (temp1_alarm_policy).
> >
> > Signed-off-by: Luwei Kang <luwei.kang@intel.com>
> > Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > ---
> > v2: create a dfl_fme_thermal hwmon to expose thermal information.
> > move all sysfs interfaces under hwmon
> > tempareture --> hwmon temp1_input
> > threshold1 --> hwmon temp1_alarm
> > threshold2 --> hwmon temp1_crit
> > trip_threshold --> hwmon temp1_emergency
> > threshold1_status --> hwmon temp1_alarm_status
> > threshold2_status --> hwmon temp1_crit_status
> > threshold1_policy --> hwmon temp1_alarm_policy
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 64 +++++++
> > drivers/fpga/Kconfig | 2 +-
> > drivers/fpga/dfl-fme-main.c | 212 +++++++++++++++++++++++
> > 3 files changed, 277 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index d1aa375..dfbd315 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -44,3 +44,67 @@ Description: Read-only. It returns socket_id to indicate which socket
> > this FPGA belongs to, only valid for integrated solution.
> > User only needs this information, in case standard numa node
> > can't provide correct information.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/name
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. Read this file to get the name of hwmon device, it
> > + supports values:
> > + 'dfl_fme_thermal' - thermal hwmon device name
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns FPGA device temperature in millidegrees
> > + Celsius.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns hardware threshold1 temperature in
> > + millidegrees Celsius. If temperature rises at or above this
> > + threshold, hardware starts 50% or 90% throttling (see
> > + 'temp1_alarm_policy').
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns hardware threshold2 temperature in
> > + millidegrees Celsius. If temperature rises at or above this
> > + threshold, hardware starts 100% throttling.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_emergency
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns hardware trip threshold temperature in
> > + millidegrees Celsius. If temperature rises at or above this
> > + threshold, a fatal event will be triggered to board management
> > + controller (BMC) to shutdown FPGA.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm_status
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. It returns 1 if temperature is currently at or above
> > + hardware threshold1 (see 'temp1_alarm'), otherwise 0.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit_status
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. It returns 1 if temperature is currently at or above
> > + hardware threshold2 (see 'temp1_crit'), otherwise 0.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm_policy
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. Read this file to get the policy of hardware threshold1
> > + (see 'temp1_alarm'). It only supports two values (policies):
> > + 0 - AP2 state (90% throttling)
> > + 1 - AP1 state (50% throttling)
> > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> > index c20445b..a6d7588 100644
> > --- a/drivers/fpga/Kconfig
> > +++ b/drivers/fpga/Kconfig
> > @@ -154,7 +154,7 @@ config FPGA_DFL
> >
> > config FPGA_DFL_FME
> > tristate "FPGA DFL FME Driver"
> > - depends on FPGA_DFL
> > + depends on FPGA_DFL && HWMON
> > help
> > The FPGA Management Engine (FME) is a feature device implemented
> > under Device Feature List (DFL) framework. Select this option to
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 8339ee8..b9a68b8 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -14,6 +14,8 @@
> > * Henry Mitchel <henry.mitchel@intel.com>
> > */
> >
> > +#include <linux/hwmon.h>
> > +#include <linux/hwmon-sysfs.h>
> > #include <linux/kernel.h>
> > #include <linux/module.h>
> > #include <linux/uaccess.h>
> > @@ -217,6 +219,212 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> > .ioctl = fme_hdr_ioctl,
> > };
> >
> > +#define FME_THERM_THRESHOLD 0x8
> > +#define TEMP_THRESHOLD1 GENMASK_ULL(6, 0)
> > +#define TEMP_THRESHOLD1_EN BIT_ULL(7)
> > +#define TEMP_THRESHOLD2 GENMASK_ULL(14, 8)
> > +#define TEMP_THRESHOLD2_EN BIT_ULL(15)
> > +#define TRIP_THRESHOLD GENMASK_ULL(30, 24)
> > +#define TEMP_THRESHOLD1_STATUS BIT_ULL(32) /* threshold1 reached */
> > +#define TEMP_THRESHOLD2_STATUS BIT_ULL(33) /* threshold2 reached */
> > +/* threshold1 policy: 0 - AP2 (90% throttle) / 1 - AP1 (50% throttle) */
> > +#define TEMP_THRESHOLD1_POLICY BIT_ULL(44)
> > +
> > +#define FME_THERM_RDSENSOR_FMT1 0x10
> > +#define FPGA_TEMPERATURE GENMASK_ULL(6, 0)
> > +
> > +#define FME_THERM_CAP 0x20
> > +#define THERM_NO_THROTTLE BIT_ULL(0)
> > +
> > +#define MD_PRE_DEG
> > +
> > +static bool fme_thermal_throttle_support(void __iomem *base)
> > +{
> > + u64 v = readq(base + FME_THERM_CAP);
> > +
> > + return FIELD_GET(THERM_NO_THROTTLE, v) ? false : true;
> > +}
> > +
> > +static umode_t thermal_hwmon_attrs_visible(const void *drvdata,
> > + enum hwmon_sensor_types type,
> > + u32 attr, int channel)
> > +{
> > + const struct dfl_feature *feature = drvdata;
> > +
> > + /* temperature is always supported, and check hardware cap for others */
> > + if (attr == hwmon_temp_input)
> > + return 0444;
> > +
> > + return fme_thermal_throttle_support(feature->ioaddr) ? 0444 : 0;
> > +}
> > +
> > +static int thermal_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + switch (attr) {
> > + case hwmon_temp_input:
> > + v = readq(feature->ioaddr + FME_THERM_RDSENSOR_FMT1);
> > + *val = (long)(FIELD_GET(FPGA_TEMPERATURE, v) * 1000);
> > + break;
> > + case hwmon_temp_alarm:
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > + *val = (long)(FIELD_GET(TEMP_THRESHOLD1, v) * 1000);
> > + break;
> > + case hwmon_temp_crit:
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > + *val = (long)(FIELD_GET(TEMP_THRESHOLD2, v) * 1000);
> > + break;
> > + case hwmon_temp_emergency:
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > + *val = (long)(FIELD_GET(TRIP_THRESHOLD, v) * 1000);
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct hwmon_ops thermal_hwmon_ops = {
> > + .is_visible = thermal_hwmon_attrs_visible,
> > + .read = thermal_hwmon_read,
> > +};
> > +
> > +static const u32 thermal_hwmon_temp_config[] = {
> > + HWMON_T_INPUT | HWMON_T_ALARM | HWMON_T_CRIT | HWMON_T_EMERGENCY,
> > + 0
> > +};
> > +
> > +static const struct hwmon_channel_info hwmon_temp_info = {
> > + .type = hwmon_temp,
> > + .config = thermal_hwmon_temp_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *thermal_hwmon_info[] = {
> > + &hwmon_temp_info,
> > + NULL
> > +};
> > +
> > +static const struct hwmon_chip_info thermal_hwmon_chip_info = {
> > + .ops = &thermal_hwmon_ops,
> > + .info = thermal_hwmon_info,
> > +};
> > +
> > +static ssize_t temp1_alarm_status_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(TEMP_THRESHOLD1_STATUS, v));
> > +}
> > +
> > +static ssize_t temp1_crit_status_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(TEMP_THRESHOLD2_STATUS, v));
> > +}
> > +
> > +static ssize_t temp1_alarm_policy_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v));
> > +}
> > +
> > +static DEVICE_ATTR_RO(temp1_alarm_status);
> > +static DEVICE_ATTR_RO(temp1_crit_status);
> > +static DEVICE_ATTR_RO(temp1_alarm_policy);
> > +
> > +static struct attribute *thermal_extra_attrs[] = {
> > + &dev_attr_temp1_alarm_status.attr,
> > + &dev_attr_temp1_crit_status.attr,
> > + &dev_attr_temp1_alarm_policy.attr,
> > + NULL,
> > +};
> > +
> > +static umode_t thermal_extra_attrs_visible(struct kobject *kobj,
> > + struct attribute *attr, int index)
> > +{
> > + struct device *dev = kobj_to_dev(kobj);
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > +
> > + return fme_thermal_throttle_support(feature->ioaddr) ? attr->mode : 0;
> > +}
> > +
> > +static const struct attribute_group thermal_extra_group = {
> > + .attrs = thermal_extra_attrs,
> > + .is_visible = thermal_extra_attrs_visible,
> > +};
> > +__ATTRIBUTE_GROUPS(thermal_extra);
> > +
> > +static int fme_thermal_mgmt_init(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + struct device *hwmon;
> > +
> > + dev_dbg(&pdev->dev, "FME Thermal Management Init.\n");
> > +
> > + /*
> > + * create hwmon to allow userspace monitoring temperature and other
> > + * threshold information.
> > + *
> > + * temp1_alarm -> hardware threshold 1 -> 50% or 90% throttling
> > + * temp1_crit -> hardware threshold 2 -> 100% throttling
> > + * temp1_emergency -> hardware trip_threshold to shutdown FPGA
> > + *
> > + * create device specific sysfs interfaces, e.g. read temp1_alarm_policy
> > + * to understand the actual hardware throttling action (50% vs 90%).
> > + *
> > + * If hardware doesn't support automatic throttling per thresholds,
> > + * then all above sysfs interfaces are not visible except temp1_input
> > + * for temperature.
> > + */
> > + hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
> > + "dfl_fme_thermal", feature,
> > + &thermal_hwmon_chip_info,
> > + thermal_extra_groups);
> > + if (IS_ERR(hwmon)) {
> > + dev_err(&pdev->dev, "Fail to register thermal hwmon\n");
> > + return PTR_ERR(hwmon);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void fme_thermal_mgmt_uinit(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + dev_dbg(&pdev->dev, "FME Thermal Management UInit.\n");
> > +}
> > +
> > +static const struct dfl_feature_id fme_thermal_mgmt_id_table[] = {
> > + {.id = FME_FEATURE_ID_THERMAL_MGMT,},
> > + {0,}
> > +};
> > +
> > +static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > + .init = fme_thermal_mgmt_init,
> > + .uinit = fme_thermal_mgmt_uinit,
> > +};
> > +
> > static struct dfl_feature_driver fme_feature_drvs[] = {
> > {
> > .id_table = fme_hdr_id_table,
> > @@ -227,6 +435,10 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> > .ops = &fme_pr_mgmt_ops,
> > },
> > {
> > + .id_table = fme_thermal_mgmt_id_table,
> > + .ops = &fme_thermal_mgmt_ops,
> > + },
> > + {
> > .ops = NULL,
> > },
> > };
> > --
> > 1.8.3.1
> >
^ permalink raw reply
* Re: [PATCH v2 15/18] fpga: dfl: fme: add thermal management support
From: Wu Hao @ 2019-05-08 6:07 UTC (permalink / raw)
To: Guenter Roeck
Cc: Alan Tull, Moritz Fischer, linux-fpga, linux-kernel, linux-api,
Luwei Kang, Russ Weight, Xu Yilun, Jean Delvare, Linux HWMON List
In-Reply-To: <20190507183536.GB29510@roeck-us.net>
On Tue, May 07, 2019 at 11:35:36AM -0700, Guenter Roeck wrote:
> On Tue, May 07, 2019 at 01:20:52PM -0500, Alan Tull wrote:
> > On Mon, Apr 29, 2019 at 4:13 AM Wu Hao <hao.wu@intel.com> wrote:
> >
> > + The hwmon people
> >
> > >
> > > This patch adds support to thermal management private feature for DFL
> > > FPGA Management Engine (FME). This private feature driver registers
> > > a hwmon for thermal/temperature monitoring (hwmon temp1_input).
> > > If hardware automatic throttling is supported by this hardware, then
> > > driver also exposes sysfs interfaces under hwmon for thresholds
> > > (temp1_alarm/ crit/ emergency), threshold status (temp1_alarm_status/
> > > temp1_crit_status) and throttling policy (temp1_alarm_policy).
> > >
> > > Signed-off-by: Luwei Kang <luwei.kang@intel.com>
> > > Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> > > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > > ---
> > > v2: create a dfl_fme_thermal hwmon to expose thermal information.
> > > move all sysfs interfaces under hwmon
> > > tempareture --> hwmon temp1_input
> > > threshold1 --> hwmon temp1_alarm
> > > threshold2 --> hwmon temp1_crit
> > > trip_threshold --> hwmon temp1_emergency
> > > threshold1_status --> hwmon temp1_alarm_status
> > > threshold2_status --> hwmon temp1_crit_status
> > > threshold1_policy --> hwmon temp1_alarm_policy
>
> You should not write a hwmon driver if you don't want to follow the ABI.
> The implementation will only confuse the sensors command, so what exactly
> is the point ?
>
> More on that below.
Hi Guenter,
Thanks a lot for the review comments. Yes, I should use the standard ABI of
the hwmon. I will fix them in the next version patch.
For thermal hwmon
tempareture --> hwmon temp1_input
threshold1 --> hwmon temp1_alarm ---> temp1_max
threshold2 --> hwmon temp1_crit
trip_threshold --> hwmon temp1_emergency
threshold1_status --> hwmon temp1_alarm_status ---> temp1_max_alarm
threshold2_status --> hwmon temp1_crit_status ---> temp1_crit_alarm
threshold1_policy --> hwmon temp1_alarm_policy ---> temp1_max_policy
and power hwmon
consumed --> hwmon power1_input
threshold1 --> hwmon power1_cap ---> power1_max
threshold2 --> hwmon power1_crit
threshold1_status --> hwmon power1_cap_status ---> power1_max_alarm
threshold2_status --> hwmon power1_crit_status ---> power1_crit_alarm
xeon_limit --> hwmon power1_xeon_limit
fpga_limit --> hwmon power1_fpga_limit
ltr --> hwmon power1_ltr
switch to power1_max in power hwmon to make it aligned with thermal hwmon on
threshold1.
Thanks
Hao
>
> Guenter
>
> > > ---
> > > Documentation/ABI/testing/sysfs-platform-dfl-fme | 64 +++++++
> > > drivers/fpga/Kconfig | 2 +-
> > > drivers/fpga/dfl-fme-main.c | 212 +++++++++++++++++++++++
> > > 3 files changed, 277 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > index d1aa375..dfbd315 100644
> > > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > @@ -44,3 +44,67 @@ Description: Read-only. It returns socket_id to indicate which socket
> > > this FPGA belongs to, only valid for integrated solution.
> > > User only needs this information, in case standard numa node
> > > can't provide correct information.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/name
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-Only. Read this file to get the name of hwmon device, it
> > > + supports values:
> > > + 'dfl_fme_thermal' - thermal hwmon device name
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-Only. It returns FPGA device temperature in millidegrees
> > > + Celsius.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-Only. It returns hardware threshold1 temperature in
> > > + millidegrees Celsius. If temperature rises at or above this
> > > + threshold, hardware starts 50% or 90% throttling (see
> > > + 'temp1_alarm_policy').
> > > +
>
> This does not follow the ABI. temp1_alarm is the alarm status, not the alarm
> temperature. The ABI attribute name would be temp1_max.
>
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-Only. It returns hardware threshold2 temperature in
> > > + millidegrees Celsius. If temperature rises at or above this
> > > + threshold, hardware starts 100% throttling.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_emergency
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-Only. It returns hardware trip threshold temperature in
> > > + millidegrees Celsius. If temperature rises at or above this
> > > + threshold, a fatal event will be triggered to board management
> > > + controller (BMC) to shutdown FPGA.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm_status
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-only. It returns 1 if temperature is currently at or above
> > > + hardware threshold1 (see 'temp1_alarm'), otherwise 0.
> > > +
>
> Why not follow the ABI and use temp1_alarm ?
>
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit_status
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-only. It returns 1 if temperature is currently at or above
> > > + hardware threshold2 (see 'temp1_crit'), otherwise 0.
> > > +
>
> Why not follow the ABI and use temp1_crit_alarm ?
>
> > > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm_policy
> > > +Date: April 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <hao.wu@intel.com>
> > > +Description: Read-Only. Read this file to get the policy of hardware threshold1
> > > + (see 'temp1_alarm'). It only supports two values (policies):
> > > + 0 - AP2 state (90% throttling)
> > > + 1 - AP1 state (50% throttling)
> > > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> > > index c20445b..a6d7588 100644
> > > --- a/drivers/fpga/Kconfig
> > > +++ b/drivers/fpga/Kconfig
> > > @@ -154,7 +154,7 @@ config FPGA_DFL
> > >
> > > config FPGA_DFL_FME
> > > tristate "FPGA DFL FME Driver"
> > > - depends on FPGA_DFL
> > > + depends on FPGA_DFL && HWMON
> > > help
> > > The FPGA Management Engine (FME) is a feature device implemented
> > > under Device Feature List (DFL) framework. Select this option to
> > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > > index 8339ee8..b9a68b8 100644
> > > --- a/drivers/fpga/dfl-fme-main.c
> > > +++ b/drivers/fpga/dfl-fme-main.c
> > > @@ -14,6 +14,8 @@
> > > * Henry Mitchel <henry.mitchel@intel.com>
> > > */
> > >
> > > +#include <linux/hwmon.h>
> > > +#include <linux/hwmon-sysfs.h>
> > > #include <linux/kernel.h>
> > > #include <linux/module.h>
> > > #include <linux/uaccess.h>
> > > @@ -217,6 +219,212 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> > > .ioctl = fme_hdr_ioctl,
> > > };
> > >
> > > +#define FME_THERM_THRESHOLD 0x8
> > > +#define TEMP_THRESHOLD1 GENMASK_ULL(6, 0)
> > > +#define TEMP_THRESHOLD1_EN BIT_ULL(7)
> > > +#define TEMP_THRESHOLD2 GENMASK_ULL(14, 8)
> > > +#define TEMP_THRESHOLD2_EN BIT_ULL(15)
> > > +#define TRIP_THRESHOLD GENMASK_ULL(30, 24)
> > > +#define TEMP_THRESHOLD1_STATUS BIT_ULL(32) /* threshold1 reached */
> > > +#define TEMP_THRESHOLD2_STATUS BIT_ULL(33) /* threshold2 reached */
> > > +/* threshold1 policy: 0 - AP2 (90% throttle) / 1 - AP1 (50% throttle) */
> > > +#define TEMP_THRESHOLD1_POLICY BIT_ULL(44)
> > > +
> > > +#define FME_THERM_RDSENSOR_FMT1 0x10
> > > +#define FPGA_TEMPERATURE GENMASK_ULL(6, 0)
> > > +
> > > +#define FME_THERM_CAP 0x20
> > > +#define THERM_NO_THROTTLE BIT_ULL(0)
> > > +
> > > +#define MD_PRE_DEG
> > > +
> > > +static bool fme_thermal_throttle_support(void __iomem *base)
> > > +{
> > > + u64 v = readq(base + FME_THERM_CAP);
> > > +
> > > + return FIELD_GET(THERM_NO_THROTTLE, v) ? false : true;
> > > +}
> > > +
> > > +static umode_t thermal_hwmon_attrs_visible(const void *drvdata,
> > > + enum hwmon_sensor_types type,
> > > + u32 attr, int channel)
> > > +{
> > > + const struct dfl_feature *feature = drvdata;
> > > +
> > > + /* temperature is always supported, and check hardware cap for others */
> > > + if (attr == hwmon_temp_input)
> > > + return 0444;
> > > +
> > > + return fme_thermal_throttle_support(feature->ioaddr) ? 0444 : 0;
> > > +}
> > > +
> > > +static int thermal_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > > + u32 attr, int channel, long *val)
> > > +{
> > > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > > + u64 v;
> > > +
> > > + switch (attr) {
> > > + case hwmon_temp_input:
> > > + v = readq(feature->ioaddr + FME_THERM_RDSENSOR_FMT1);
> > > + *val = (long)(FIELD_GET(FPGA_TEMPERATURE, v) * 1000);
> > > + break;
> > > + case hwmon_temp_alarm:
> > > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > > + *val = (long)(FIELD_GET(TEMP_THRESHOLD1, v) * 1000);
>
> This is supposed to return 0 or 1.
>
> > > + break;
> > > + case hwmon_temp_crit:
> > > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > > + *val = (long)(FIELD_GET(TEMP_THRESHOLD2, v) * 1000);
> > > + break;
> > > + case hwmon_temp_emergency:
> > > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > > + *val = (long)(FIELD_GET(TRIP_THRESHOLD, v) * 1000);
> > > + break;
> > > + default:
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static const struct hwmon_ops thermal_hwmon_ops = {
> > > + .is_visible = thermal_hwmon_attrs_visible,
> > > + .read = thermal_hwmon_read,
> > > +};
> > > +
> > > +static const u32 thermal_hwmon_temp_config[] = {
> > > + HWMON_T_INPUT | HWMON_T_ALARM | HWMON_T_CRIT | HWMON_T_EMERGENCY,
> > > + 0
> > > +};
> > > +
> > > +static const struct hwmon_channel_info hwmon_temp_info = {
> > > + .type = hwmon_temp,
> > > + .config = thermal_hwmon_temp_config,
> > > +};
> > > +
> > > +static const struct hwmon_channel_info *thermal_hwmon_info[] = {
> > > + &hwmon_temp_info,
> > > + NULL
> > > +};
> > > +
> > > +static const struct hwmon_chip_info thermal_hwmon_chip_info = {
> > > + .ops = &thermal_hwmon_ops,
> > > + .info = thermal_hwmon_info,
> > > +};
> > > +
> > > +static ssize_t temp1_alarm_status_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > > + u64 v;
> > > +
> > > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(TEMP_THRESHOLD1_STATUS, v));
> > > +}
> > > +
> > > +static ssize_t temp1_crit_status_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > > + u64 v;
> > > +
> > > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(TEMP_THRESHOLD2_STATUS, v));
> > > +}
> > > +
> > > +static ssize_t temp1_alarm_policy_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > > + u64 v;
> > > +
> > > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v));
> > > +}
> > > +
> > > +static DEVICE_ATTR_RO(temp1_alarm_status);
> > > +static DEVICE_ATTR_RO(temp1_crit_status);
> > > +static DEVICE_ATTR_RO(temp1_alarm_policy);
> > > +
> > > +static struct attribute *thermal_extra_attrs[] = {
> > > + &dev_attr_temp1_alarm_status.attr,
> > > + &dev_attr_temp1_crit_status.attr,
>
> Why not use standard attributes for the above ?
>
> > > + &dev_attr_temp1_alarm_policy.attr,
> > > + NULL,
> > > +};
> > > +
> > > +static umode_t thermal_extra_attrs_visible(struct kobject *kobj,
> > > + struct attribute *attr, int index)
> > > +{
> > > + struct device *dev = kobj_to_dev(kobj);
> > > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > > +
> > > + return fme_thermal_throttle_support(feature->ioaddr) ? attr->mode : 0;
> > > +}
> > > +
> > > +static const struct attribute_group thermal_extra_group = {
> > > + .attrs = thermal_extra_attrs,
> > > + .is_visible = thermal_extra_attrs_visible,
> > > +};
> > > +__ATTRIBUTE_GROUPS(thermal_extra);
> > > +
> > > +static int fme_thermal_mgmt_init(struct platform_device *pdev,
> > > + struct dfl_feature *feature)
> > > +{
> > > + struct device *hwmon;
> > > +
> > > + dev_dbg(&pdev->dev, "FME Thermal Management Init.\n");
> > > +
> > > + /*
> > > + * create hwmon to allow userspace monitoring temperature and other
> > > + * threshold information.
> > > + *
> > > + * temp1_alarm -> hardware threshold 1 -> 50% or 90% throttling
> > > + * temp1_crit -> hardware threshold 2 -> 100% throttling
> > > + * temp1_emergency -> hardware trip_threshold to shutdown FPGA
> > > + *
> > > + * create device specific sysfs interfaces, e.g. read temp1_alarm_policy
> > > + * to understand the actual hardware throttling action (50% vs 90%).
> > > + *
> > > + * If hardware doesn't support automatic throttling per thresholds,
> > > + * then all above sysfs interfaces are not visible except temp1_input
> > > + * for temperature.
> > > + */
> > > + hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
> > > + "dfl_fme_thermal", feature,
> > > + &thermal_hwmon_chip_info,
> > > + thermal_extra_groups);
> > > + if (IS_ERR(hwmon)) {
> > > + dev_err(&pdev->dev, "Fail to register thermal hwmon\n");
> > > + return PTR_ERR(hwmon);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static void fme_thermal_mgmt_uinit(struct platform_device *pdev,
> > > + struct dfl_feature *feature)
> > > +{
> > > + dev_dbg(&pdev->dev, "FME Thermal Management UInit.\n");
> > > +}
> > > +
> > > +static const struct dfl_feature_id fme_thermal_mgmt_id_table[] = {
> > > + {.id = FME_FEATURE_ID_THERMAL_MGMT,},
> > > + {0,}
> > > +};
> > > +
> > > +static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > > + .init = fme_thermal_mgmt_init,
> > > + .uinit = fme_thermal_mgmt_uinit,
> > > +};
> > > +
> > > static struct dfl_feature_driver fme_feature_drvs[] = {
> > > {
> > > .id_table = fme_hdr_id_table,
> > > @@ -227,6 +435,10 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> > > .ops = &fme_pr_mgmt_ops,
> > > },
> > > {
> > > + .id_table = fme_thermal_mgmt_id_table,
> > > + .ops = &fme_thermal_mgmt_ops,
> > > + },
> > > + {
> > > .ops = NULL,
> > > },
> > > };
> > > --
> > > 1.8.3.1
> > >
^ permalink raw reply
* Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters
From: Aleksa Sarai @ 2019-05-08 0:54 UTC (permalink / raw)
To: Jann Horn
Cc: Andy Lutomirski, Al Viro, Jeff Layton, J. Bruce Fields,
Arnd Bergmann, David Howells, Eric Biederman, Andrew Morton,
Alexei Starovoitov, Kees Cook, Christian Brauner, Tycho Andersen,
David Drysdale, Chanho Min, Oleg Nesterov, Aleksa Sarai,
Linus Torvalds, containers, linux-fsdevel, Linux API
In-Reply-To: <20190506191735.nmzf7kwfh7b6e2tf@yavin>
[-- Attachment #1: Type: text/plain, Size: 3207 bytes --]
On 2019-05-07, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2019-05-06, Jann Horn <jannh@google.com> wrote:
> > On Mon, May 6, 2019 at 6:56 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
> > > The need to be able to scope path resolution of interpreters became
> > > clear with one of the possible vectors used in CVE-2019-5736 (which
> > > most major container runtimes were vulnerable to).
> > >
> > > Naively, it might seem that openat(2) -- which supports path scoping --
> > > can be combined with execveat(AT_EMPTY_PATH) to trivially scope the
> > > binary being executed. Unfortunately, a "bad binary" (usually a symlink)
> > > could be written as a #!-style script with the symlink target as the
> > > interpreter -- which would be completely missed by just scoping the
> > > openat(2). An example of this being exploitable is CVE-2019-5736.
> > >
> > > In order to get around this, we need to pass down to each binfmt_*
> > > implementation the scoping flags requested in execveat(2). In order to
> > > maintain backwards-compatibility we only pass the scoping AT_* flags.
> > >
> > > To avoid breaking userspace (in the exceptionally rare cases where you
> > > have #!-scripts with a relative path being execveat(2)-ed with dfd !=
> > > AT_FDCWD), we only pass dfd down to binfmt_* if any of our new flags are
> > > set in execveat(2).
> >
> > This seems extremely dangerous. I like the overall series, but not this patch.
> >
> > > @@ -1762,6 +1774,12 @@ static int __do_execve_file(int fd, struct filename *filename,
> > >
> > > sched_exec();
> > >
> > > + bprm->flags = flags & (AT_XDEV | AT_NO_MAGICLINKS | AT_NO_SYMLINKS |
> > > + AT_THIS_ROOT);
> > [...]
> > > +#define AT_THIS_ROOT 0x100000 /* - Scope ".." resolution to dirfd (like chroot(2)). */
> >
> > So now what happens if there is a setuid root ELF binary with program
> > interpreter "/lib64/ld-linux-x86-64.so.2" (like /bin/su), and an
> > unprivileged user runs it with execveat(..., AT_THIS_ROOT)? Is that
> > going to let the unprivileged user decide which interpreter the
> > setuid-root process should use? From a high-level perspective, opening
> > the interpreter should be controlled by the program that is being
> > loaded, not by the program that invoked it.
>
> I went a bit nuts with openat_exec(), and I did end up adding it to the
> ELF interpreter lookup (and you're completely right that this is a bad
> idea -- I will drop it from this patch if it's included in the next
> series).
>
> The proposed solutions you give below are much nicer than this patch so
> I can drop it and work on fixing those issues separately.
Another possible solution would be to only allow (for instance)
AT_NO_MAGICLINKS for execveat(2). That way you cannot scope the
resolution but you can block the most concerning cases -- those
involving /proc-related access.
I've posted a v7 with this patch dropped (because we can always add AT_*
flags later in time), but I think having at least NO_MAGICLINKS would be
useful.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters
From: Eric W. Biederman @ 2019-05-08 0:38 UTC (permalink / raw)
To: Jann Horn
Cc: Aleksa Sarai, Andy Lutomirski, Al Viro, Jeff Layton,
J. Bruce Fields, Arnd Bergmann, David Howells, Andrew Morton,
Alexei Starovoitov, Kees Cook, Christian Brauner, Tycho Andersen,
David Drysdale, Chanho Min, Oleg Nesterov, Aleksa Sarai,
Linus Torvalds, containers, linux-fsdevel, Linux API
In-Reply-To: <CAG48ez0-CiODf6UBHWTaog97prx=VAd3HgHvEjdGNz344m1xKw@mail.gmail.com>
Jann Horn <jannh@google.com> writes:
>
> In my opinion, CVE-2019-5736 points out two different problems:
>
> The big problem: The __ptrace_may_access() logic has a special-case
> short-circuit for "introspection" that you can't opt out of;
Once upon a time in a galaxy far far away I fixed a bug where we missing
ptrace_may_access checks on various proc files and systems using selinux
stopped working. At the time selinux did not allow ptrace like access
to yourself. The "introspection" special case was the quick and simple
work-around.
There is nothing fundamental in having the "introspection" special case
except that various lsms have probably grown to depend upon it being
there. I expect without difficulty we could move the check down
into the various lsms. Which would get that check out of the core
kernel code.
Then the special case would the lsms challenge to keep or remove.
Eric
^ permalink raw reply
* Re: [PATCH] uapi: avoid namespace conflict in linux/posix_types.h
From: Joseph Myers @ 2019-05-07 22:50 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-api, linux-arch, netdev, Laura Abbott, Florian Weimer,
Paul Burton, Deepa Dinamani, linux-kernel
In-Reply-To: <20190319165123.3967889-1-arnd@arndb.de>
What happened with this patch (posted 19 March)? I found today that we
can't use Linux 5.1 headers in glibc testing because the namespace issues
are still present in the headers as of the release.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply
* Re: [PATCH v2 16/18] fpga: dfl: fme: add power management support
From: Guenter Roeck @ 2019-05-07 18:36 UTC (permalink / raw)
To: Alan Tull
Cc: Wu Hao, Moritz Fischer, linux-fpga, linux-kernel, linux-api,
Luwei Kang, Xu Yilun, Jean Delvare, Linux HWMON List
In-Reply-To: <CANk1AXQoYW_a_Jz_X_RCQtJpLn++dv8vAn4Yz+w+eRcZhrELGQ@mail.gmail.com>
On Tue, May 07, 2019 at 01:23:33PM -0500, Alan Tull wrote:
> On Mon, Apr 29, 2019 at 4:13 AM Wu Hao <hao.wu@intel.com> wrote:
>
> + hwmon folks
>
> >
> > This patch adds support for power management private feature under
> > FPGA Management Engine (FME). This private feature driver registers
> > a hwmon for power (power1_input), thresholds information, e.g.
> > (power1_cap / crit) and also read-only sysfs interfaces for other
> > power management information. For configuration, user could write
> > threshold values via above power1_cap / crit sysfs interface
> > under hwmon too.
> >
> > Signed-off-by: Luwei Kang <luwei.kang@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > ---
> > v2: create a dfl_fme_power hwmon to expose power sysfs interfaces.
> > move all sysfs interfaces under hwmon
> > consumed --> hwmon power1_input
> > threshold1 --> hwmon power1_cap
> > threshold2 --> hwmon power1_crit
> > threshold1_status --> hwmon power1_cap_status
> > threshold2_status --> hwmon power1_crit_status
> > xeon_limit --> hwmon power1_xeon_limit
> > fpga_limit --> hwmon power1_fpga_limit
> > ltr --> hwmon power1_ltr
Same response as before.
Guenter
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 67 ++++++
> > drivers/fpga/dfl-fme-main.c | 247 +++++++++++++++++++++++
> > 2 files changed, 314 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index dfbd315..e2ba92d 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -52,6 +52,7 @@ Contact: Wu Hao <hao.wu@intel.com>
> > Description: Read-Only. Read this file to get the name of hwmon device, it
> > supports values:
> > 'dfl_fme_thermal' - thermal hwmon device name
> > + 'dfl_fme_power' - power hwmon device name
> >
> > What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
> > Date: April 2019
> > @@ -108,3 +109,69 @@ Description: Read-Only. Read this file to get the policy of hardware threshold1
> > (see 'temp1_alarm'). It only supports two values (policies):
> > 0 - AP2 state (90% throttling)
> > 1 - AP1 state (50% throttling)
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_input
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns current FPGA power consumption in uW.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_cap
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Write. Read this file to get current hardware power
> > + threshold1 in uW. If power consumption rises at or above
> > + this threshold, hardware starts 50% throttling.
> > + Write this file to set current hardware power threshold1 in uW.
> > + As hardware only accepts values in Watts, so input value will
> > + be round down per Watts (< 1 watts part will be discarded).
> > + Write fails with -EINVAL if input parsing fails or input isn't
> > + in the valid range (0 - 127000000 uW).
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_crit
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Write. Read this file to get current hardware power
> > + threshold2 in uW. If power consumption rises at or above
> > + this threshold, hardware starts 90% throttling.
> > + Write this file to set current hardware power threshold2 in uW.
> > + As hardware only accepts values in Watts, so input value will
> > + be round down per Watts (< 1 watts part will be discarded).
> > + Write fails with -EINVAL if input parsing fails or input isn't
> > + in the valid range (0 - 127000000 uW).
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_cap_status
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. It returns 1 if power consumption is currently at or
> > + above hardware threshold1 (see 'power1_cap'), otherwise 0.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_crit_status
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. It returns 1 if power consumption is currently at or
> > + above hardware threshold2 (see 'power1_crit'), otherwise 0.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_xeon_limit
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns power limit for XEON in uW.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_fpga_limit
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns power limit for FPGA in uW.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/power1_ltr
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. Read this file to get current Latency Tolerance
> > + Reporting (ltr) value. This ltr impacts the CPU low power
> > + state in integrated solution.
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index b9a68b8..7005316 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -425,6 +425,249 @@ static void fme_thermal_mgmt_uinit(struct platform_device *pdev,
> > .uinit = fme_thermal_mgmt_uinit,
> > };
> >
> > +#define FME_PWR_STATUS 0x8
> > +#define FME_LATENCY_TOLERANCE BIT_ULL(18)
> > +#define PWR_CONSUMED GENMASK_ULL(17, 0)
> > +
> > +#define FME_PWR_THRESHOLD 0x10
> > +#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
> > +#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
> > +#define PWR_THRESHOLD_MAX 0x7f /* in Watts */
> > +#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
> > +#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
> > +
> > +#define FME_PWR_XEON_LIMIT 0x18
> > +#define XEON_PWR_LIMIT GENMASK_ULL(14, 0) /* in 0.1 Watts */
> > +#define XEON_PWR_EN BIT_ULL(15)
> > +#define FME_PWR_FPGA_LIMIT 0x20
> > +#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0) /* in 0.1 Watts */
> > +#define FPGA_PWR_EN BIT_ULL(15)
> > +
> > +#define PWR_THRESHOLD_MAX_IN_UW (PWR_THRESHOLD_MAX * 1000000)
> > +
> > +static int power_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + switch (attr) {
> > + case hwmon_power_input:
> > + v = readq(feature->ioaddr + FME_PWR_STATUS);
> > + *val = (long)(FIELD_GET(PWR_CONSUMED, v) * 1000000);
> > + break;
> > + case hwmon_power_cap:
> > + v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
> > + *val = (long)(FIELD_GET(PWR_THRESHOLD1, v) * 1000000);
> > + break;
> > + case hwmon_power_crit:
> > + v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
> > + *val = (long)(FIELD_GET(PWR_THRESHOLD2, v) * 1000000);
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int power_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long val)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + int ret = 0;
> > + u64 v;
> > +
> > + if (val < 0 || val > PWR_THRESHOLD_MAX_IN_UW)
> > + return -EINVAL;
> > +
> > + val = val / 1000000;
> > +
> > + mutex_lock(&pdata->lock);
> > +
> > + switch (attr) {
> > + case hwmon_power_cap:
> > + v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
> > + v &= ~PWR_THRESHOLD1;
> > + v |= FIELD_PREP(PWR_THRESHOLD1, val);
> > + writeq(v, feature->ioaddr + FME_PWR_THRESHOLD);
> > + break;
> > + case hwmon_power_crit:
> > + v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
> > + v &= ~PWR_THRESHOLD2;
> > + v |= FIELD_PREP(PWR_THRESHOLD2, val);
> > + writeq(v, feature->ioaddr + FME_PWR_THRESHOLD);
> > + break;
> > + default:
> > + ret = -EOPNOTSUPP;
> > + break;
> > + }
> > +
> > + mutex_unlock(&pdata->lock);
> > +
> > + return ret;
> > +}
> > +
> > +static umode_t power_hwmon_attrs_visible(const void *drvdata,
> > + enum hwmon_sensor_types type,
> > + u32 attr, int channel)
> > +{
> > + switch (attr) {
> > + case hwmon_power_input:
> > + return 0444;
> > + case hwmon_power_cap:
> > + case hwmon_power_crit:
> > + return 0644;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const u32 power_hwmon_config[] = {
> > + HWMON_P_INPUT | HWMON_P_CAP | HWMON_P_CRIT,
> > + 0
> > +};
> > +
> > +static const struct hwmon_channel_info hwmon_pwr_info = {
> > + .type = hwmon_power,
> > + .config = power_hwmon_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *power_hwmon_info[] = {
> > + &hwmon_pwr_info,
> > + NULL
> > +};
> > +
> > +static const struct hwmon_ops power_hwmon_ops = {
> > + .is_visible = power_hwmon_attrs_visible,
> > + .read = power_hwmon_read,
> > + .write = power_hwmon_write,
> > +};
> > +
> > +static const struct hwmon_chip_info power_hwmon_chip_info = {
> > + .ops = &power_hwmon_ops,
> > + .info = power_hwmon_info,
> > +};
> > +
> > +static ssize_t power1_cap_status_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
> > +}
> > +
> > +static ssize_t power1_crit_status_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
> > +}
> > +
> > +static ssize_t power1_xeon_limit_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u16 xeon_limit = 0;
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_PWR_XEON_LIMIT);
> > +
> > + if (FIELD_GET(XEON_PWR_EN, v))
> > + xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit * 100000);
> > +}
> > +
> > +static ssize_t power1_fpga_limit_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u16 fpga_limit = 0;
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_PWR_FPGA_LIMIT);
> > +
> > + if (FIELD_GET(FPGA_PWR_EN, v))
> > + fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit * 100000);
> > +}
> > +
> > +static ssize_t power1_ltr_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_PWR_STATUS);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
> > +}
> > +
> > +static DEVICE_ATTR_RO(power1_cap_status);
> > +static DEVICE_ATTR_RO(power1_crit_status);
> > +static DEVICE_ATTR_RO(power1_xeon_limit);
> > +static DEVICE_ATTR_RO(power1_fpga_limit);
> > +static DEVICE_ATTR_RO(power1_ltr);
> > +
> > +static struct attribute *power_extra_attrs[] = {
> > + &dev_attr_power1_cap_status.attr,
> > + &dev_attr_power1_crit_status.attr,
> > + &dev_attr_power1_xeon_limit.attr,
> > + &dev_attr_power1_fpga_limit.attr,
> > + &dev_attr_power1_ltr.attr,
> > + NULL
> > +};
> > +
> > +ATTRIBUTE_GROUPS(power_extra);
> > +
> > +static int fme_power_mgmt_init(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + struct device *hwmon;
> > +
> > + dev_dbg(&pdev->dev, "FME Power Management Init.\n");
> > +
> > + hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
> > + "dfl_fme_power", feature,
> > + &power_hwmon_chip_info,
> > + power_extra_groups);
> > + if (IS_ERR(hwmon)) {
> > + dev_err(&pdev->dev, "Fail to register power hwmon\n");
> > + return PTR_ERR(hwmon);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void fme_power_mgmt_uinit(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + dev_dbg(&pdev->dev, "FME Power Management UInit.\n");
> > +}
> > +
> > +static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
> > + {.id = FME_FEATURE_ID_POWER_MGMT,},
> > + {0,}
> > +};
> > +
> > +static const struct dfl_feature_ops fme_power_mgmt_ops = {
> > + .init = fme_power_mgmt_init,
> > + .uinit = fme_power_mgmt_uinit,
> > +};
> > +
> > static struct dfl_feature_driver fme_feature_drvs[] = {
> > {
> > .id_table = fme_hdr_id_table,
> > @@ -439,6 +682,10 @@ static void fme_thermal_mgmt_uinit(struct platform_device *pdev,
> > .ops = &fme_thermal_mgmt_ops,
> > },
> > {
> > + .id_table = fme_power_mgmt_id_table,
> > + .ops = &fme_power_mgmt_ops,
> > + },
> > + {
> > .ops = NULL,
> > },
> > };
> > --
> > 1.8.3.1
> >
^ permalink raw reply
* Re: [PATCH v2 15/18] fpga: dfl: fme: add thermal management support
From: Guenter Roeck @ 2019-05-07 18:35 UTC (permalink / raw)
To: Alan Tull
Cc: Wu Hao, Moritz Fischer, linux-fpga, linux-kernel, linux-api,
Luwei Kang, Russ Weight, Xu Yilun, Jean Delvare, Linux HWMON List
In-Reply-To: <CANk1AXR+wRoQB40ZFTAPPp4mmkzZY+03bTwdB24fL0mYNck2Ug@mail.gmail.com>
On Tue, May 07, 2019 at 01:20:52PM -0500, Alan Tull wrote:
> On Mon, Apr 29, 2019 at 4:13 AM Wu Hao <hao.wu@intel.com> wrote:
>
> + The hwmon people
>
> >
> > This patch adds support to thermal management private feature for DFL
> > FPGA Management Engine (FME). This private feature driver registers
> > a hwmon for thermal/temperature monitoring (hwmon temp1_input).
> > If hardware automatic throttling is supported by this hardware, then
> > driver also exposes sysfs interfaces under hwmon for thresholds
> > (temp1_alarm/ crit/ emergency), threshold status (temp1_alarm_status/
> > temp1_crit_status) and throttling policy (temp1_alarm_policy).
> >
> > Signed-off-by: Luwei Kang <luwei.kang@intel.com>
> > Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > ---
> > v2: create a dfl_fme_thermal hwmon to expose thermal information.
> > move all sysfs interfaces under hwmon
> > tempareture --> hwmon temp1_input
> > threshold1 --> hwmon temp1_alarm
> > threshold2 --> hwmon temp1_crit
> > trip_threshold --> hwmon temp1_emergency
> > threshold1_status --> hwmon temp1_alarm_status
> > threshold2_status --> hwmon temp1_crit_status
> > threshold1_policy --> hwmon temp1_alarm_policy
You should not write a hwmon driver if you don't want to follow the ABI.
The implementation will only confuse the sensors command, so what exactly
is the point ?
More on that below.
Guenter
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 64 +++++++
> > drivers/fpga/Kconfig | 2 +-
> > drivers/fpga/dfl-fme-main.c | 212 +++++++++++++++++++++++
> > 3 files changed, 277 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index d1aa375..dfbd315 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -44,3 +44,67 @@ Description: Read-only. It returns socket_id to indicate which socket
> > this FPGA belongs to, only valid for integrated solution.
> > User only needs this information, in case standard numa node
> > can't provide correct information.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/name
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. Read this file to get the name of hwmon device, it
> > + supports values:
> > + 'dfl_fme_thermal' - thermal hwmon device name
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_input
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns FPGA device temperature in millidegrees
> > + Celsius.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns hardware threshold1 temperature in
> > + millidegrees Celsius. If temperature rises at or above this
> > + threshold, hardware starts 50% or 90% throttling (see
> > + 'temp1_alarm_policy').
> > +
This does not follow the ABI. temp1_alarm is the alarm status, not the alarm
temperature. The ABI attribute name would be temp1_max.
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns hardware threshold2 temperature in
> > + millidegrees Celsius. If temperature rises at or above this
> > + threshold, hardware starts 100% throttling.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_emergency
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. It returns hardware trip threshold temperature in
> > + millidegrees Celsius. If temperature rises at or above this
> > + threshold, a fatal event will be triggered to board management
> > + controller (BMC) to shutdown FPGA.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm_status
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. It returns 1 if temperature is currently at or above
> > + hardware threshold1 (see 'temp1_alarm'), otherwise 0.
> > +
Why not follow the ABI and use temp1_alarm ?
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_crit_status
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-only. It returns 1 if temperature is currently at or above
> > + hardware threshold2 (see 'temp1_crit'), otherwise 0.
> > +
Why not follow the ABI and use temp1_crit_alarm ?
> > +What: /sys/bus/platform/devices/dfl-fme.0/hwmon/hwmonX/temp1_alarm_policy
> > +Date: April 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <hao.wu@intel.com>
> > +Description: Read-Only. Read this file to get the policy of hardware threshold1
> > + (see 'temp1_alarm'). It only supports two values (policies):
> > + 0 - AP2 state (90% throttling)
> > + 1 - AP1 state (50% throttling)
> > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> > index c20445b..a6d7588 100644
> > --- a/drivers/fpga/Kconfig
> > +++ b/drivers/fpga/Kconfig
> > @@ -154,7 +154,7 @@ config FPGA_DFL
> >
> > config FPGA_DFL_FME
> > tristate "FPGA DFL FME Driver"
> > - depends on FPGA_DFL
> > + depends on FPGA_DFL && HWMON
> > help
> > The FPGA Management Engine (FME) is a feature device implemented
> > under Device Feature List (DFL) framework. Select this option to
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 8339ee8..b9a68b8 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -14,6 +14,8 @@
> > * Henry Mitchel <henry.mitchel@intel.com>
> > */
> >
> > +#include <linux/hwmon.h>
> > +#include <linux/hwmon-sysfs.h>
> > #include <linux/kernel.h>
> > #include <linux/module.h>
> > #include <linux/uaccess.h>
> > @@ -217,6 +219,212 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> > .ioctl = fme_hdr_ioctl,
> > };
> >
> > +#define FME_THERM_THRESHOLD 0x8
> > +#define TEMP_THRESHOLD1 GENMASK_ULL(6, 0)
> > +#define TEMP_THRESHOLD1_EN BIT_ULL(7)
> > +#define TEMP_THRESHOLD2 GENMASK_ULL(14, 8)
> > +#define TEMP_THRESHOLD2_EN BIT_ULL(15)
> > +#define TRIP_THRESHOLD GENMASK_ULL(30, 24)
> > +#define TEMP_THRESHOLD1_STATUS BIT_ULL(32) /* threshold1 reached */
> > +#define TEMP_THRESHOLD2_STATUS BIT_ULL(33) /* threshold2 reached */
> > +/* threshold1 policy: 0 - AP2 (90% throttle) / 1 - AP1 (50% throttle) */
> > +#define TEMP_THRESHOLD1_POLICY BIT_ULL(44)
> > +
> > +#define FME_THERM_RDSENSOR_FMT1 0x10
> > +#define FPGA_TEMPERATURE GENMASK_ULL(6, 0)
> > +
> > +#define FME_THERM_CAP 0x20
> > +#define THERM_NO_THROTTLE BIT_ULL(0)
> > +
> > +#define MD_PRE_DEG
> > +
> > +static bool fme_thermal_throttle_support(void __iomem *base)
> > +{
> > + u64 v = readq(base + FME_THERM_CAP);
> > +
> > + return FIELD_GET(THERM_NO_THROTTLE, v) ? false : true;
> > +}
> > +
> > +static umode_t thermal_hwmon_attrs_visible(const void *drvdata,
> > + enum hwmon_sensor_types type,
> > + u32 attr, int channel)
> > +{
> > + const struct dfl_feature *feature = drvdata;
> > +
> > + /* temperature is always supported, and check hardware cap for others */
> > + if (attr == hwmon_temp_input)
> > + return 0444;
> > +
> > + return fme_thermal_throttle_support(feature->ioaddr) ? 0444 : 0;
> > +}
> > +
> > +static int thermal_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + switch (attr) {
> > + case hwmon_temp_input:
> > + v = readq(feature->ioaddr + FME_THERM_RDSENSOR_FMT1);
> > + *val = (long)(FIELD_GET(FPGA_TEMPERATURE, v) * 1000);
> > + break;
> > + case hwmon_temp_alarm:
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > + *val = (long)(FIELD_GET(TEMP_THRESHOLD1, v) * 1000);
This is supposed to return 0 or 1.
> > + break;
> > + case hwmon_temp_crit:
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > + *val = (long)(FIELD_GET(TEMP_THRESHOLD2, v) * 1000);
> > + break;
> > + case hwmon_temp_emergency:
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > + *val = (long)(FIELD_GET(TRIP_THRESHOLD, v) * 1000);
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct hwmon_ops thermal_hwmon_ops = {
> > + .is_visible = thermal_hwmon_attrs_visible,
> > + .read = thermal_hwmon_read,
> > +};
> > +
> > +static const u32 thermal_hwmon_temp_config[] = {
> > + HWMON_T_INPUT | HWMON_T_ALARM | HWMON_T_CRIT | HWMON_T_EMERGENCY,
> > + 0
> > +};
> > +
> > +static const struct hwmon_channel_info hwmon_temp_info = {
> > + .type = hwmon_temp,
> > + .config = thermal_hwmon_temp_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *thermal_hwmon_info[] = {
> > + &hwmon_temp_info,
> > + NULL
> > +};
> > +
> > +static const struct hwmon_chip_info thermal_hwmon_chip_info = {
> > + .ops = &thermal_hwmon_ops,
> > + .info = thermal_hwmon_info,
> > +};
> > +
> > +static ssize_t temp1_alarm_status_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(TEMP_THRESHOLD1_STATUS, v));
> > +}
> > +
> > +static ssize_t temp1_crit_status_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(TEMP_THRESHOLD2_STATUS, v));
> > +}
> > +
> > +static ssize_t temp1_alarm_policy_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > + u64 v;
> > +
> > + v = readq(feature->ioaddr + FME_THERM_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v));
> > +}
> > +
> > +static DEVICE_ATTR_RO(temp1_alarm_status);
> > +static DEVICE_ATTR_RO(temp1_crit_status);
> > +static DEVICE_ATTR_RO(temp1_alarm_policy);
> > +
> > +static struct attribute *thermal_extra_attrs[] = {
> > + &dev_attr_temp1_alarm_status.attr,
> > + &dev_attr_temp1_crit_status.attr,
Why not use standard attributes for the above ?
> > + &dev_attr_temp1_alarm_policy.attr,
> > + NULL,
> > +};
> > +
> > +static umode_t thermal_extra_attrs_visible(struct kobject *kobj,
> > + struct attribute *attr, int index)
> > +{
> > + struct device *dev = kobj_to_dev(kobj);
> > + struct dfl_feature *feature = dev_get_drvdata(dev);
> > +
> > + return fme_thermal_throttle_support(feature->ioaddr) ? attr->mode : 0;
> > +}
> > +
> > +static const struct attribute_group thermal_extra_group = {
> > + .attrs = thermal_extra_attrs,
> > + .is_visible = thermal_extra_attrs_visible,
> > +};
> > +__ATTRIBUTE_GROUPS(thermal_extra);
> > +
> > +static int fme_thermal_mgmt_init(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + struct device *hwmon;
> > +
> > + dev_dbg(&pdev->dev, "FME Thermal Management Init.\n");
> > +
> > + /*
> > + * create hwmon to allow userspace monitoring temperature and other
> > + * threshold information.
> > + *
> > + * temp1_alarm -> hardware threshold 1 -> 50% or 90% throttling
> > + * temp1_crit -> hardware threshold 2 -> 100% throttling
> > + * temp1_emergency -> hardware trip_threshold to shutdown FPGA
> > + *
> > + * create device specific sysfs interfaces, e.g. read temp1_alarm_policy
> > + * to understand the actual hardware throttling action (50% vs 90%).
> > + *
> > + * If hardware doesn't support automatic throttling per thresholds,
> > + * then all above sysfs interfaces are not visible except temp1_input
> > + * for temperature.
> > + */
> > + hwmon = devm_hwmon_device_register_with_info(&pdev->dev,
> > + "dfl_fme_thermal", feature,
> > + &thermal_hwmon_chip_info,
> > + thermal_extra_groups);
> > + if (IS_ERR(hwmon)) {
> > + dev_err(&pdev->dev, "Fail to register thermal hwmon\n");
> > + return PTR_ERR(hwmon);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void fme_thermal_mgmt_uinit(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + dev_dbg(&pdev->dev, "FME Thermal Management UInit.\n");
> > +}
> > +
> > +static const struct dfl_feature_id fme_thermal_mgmt_id_table[] = {
> > + {.id = FME_FEATURE_ID_THERMAL_MGMT,},
> > + {0,}
> > +};
> > +
> > +static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > + .init = fme_thermal_mgmt_init,
> > + .uinit = fme_thermal_mgmt_uinit,
> > +};
> > +
> > static struct dfl_feature_driver fme_feature_drvs[] = {
> > {
> > .id_table = fme_hdr_id_table,
> > @@ -227,6 +435,10 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> > .ops = &fme_pr_mgmt_ops,
> > },
> > {
> > + .id_table = fme_thermal_mgmt_id_table,
> > + .ops = &fme_thermal_mgmt_ops,
> > + },
> > + {
> > .ops = NULL,
> > },
> > };
> > --
> > 1.8.3.1
> >
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox