The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH 2/2] power: supply: bq24735: bring down the noise level
From: Peter Rosin @ 2016-12-21 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Rosin, Sebastian Reichel, Rob Herring, Mark Rutland,
	linux-pm, devicetree
In-Reply-To: <1482355793-16190-1-git-send-email-peda@axentia.se>

If there is no ti,ac-detect-gpios configured, it is normal to
have failed reads of the options register. So, hold back on the
log spamming.

Signed-off-by: Peter Rosin <peda@axentia.se>
---
 drivers/power/supply/bq24735-charger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/supply/bq24735-charger.c b/drivers/power/supply/bq24735-charger.c
index d8be81203837..eb0145380def 100644
--- a/drivers/power/supply/bq24735-charger.c
+++ b/drivers/power/supply/bq24735-charger.c
@@ -192,7 +192,7 @@ static bool bq24735_charger_is_present(struct bq24735 *charger)
 
 		ac = bq24735_read_word(charger->client, BQ24735_CHG_OPT);
 		if (ac < 0) {
-			dev_err(&charger->client->dev,
+			dev_dbg(&charger->client->dev,
 				"Failed to read charger options : %d\n",
 				ac);
 			return false;
-- 
2.1.4

^ permalink raw reply related

* [PATCH 0/2] power: supply: bq24735: poll register if no ac-detect gpio
From: Peter Rosin @ 2016-12-21 21:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Rosin, Sebastian Reichel, Rob Herring, Mark Rutland,
	linux-pm, devicetree

Hi!

My patch [1] "power: supply: bq24735-charger: allow chargers to share the
ac-detect gpio" is perhaps a bit hard to digest. And while I still think
some way of sharing the ac-detect gpio is worthwhile, I thought of another
way to solve the problem at hand. Instead of polling a shared gpio, it's
as simple as polling the chip for what it is outputting on the ACOK pin.
And the code is already there! It just needs a tweak to allow this mode of
operation.

This appears to work just fine for me, and the difference is in the noise
since my shared gpio pin happens to be on an expander on the i2c bus, so
I end up with i2c traffic for each poll either way.

Cheers,
peda

[1] https://lkml.org/lkml/2016/12/13/786

Peter Rosin (2):
  power: supply: bq24735: allow polling even if there is no ac-detect
    gpio
  power: supply: bq24735: bring down the noise level

 Documentation/devicetree/bindings/power/supply/ti,bq24735.txt | 4 ++--
 drivers/power/supply/bq24735-charger.c                        | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.1.4

^ permalink raw reply

* Re: [PATCH] MAINTAINERS: add myself as maintainer of fbdev
From: Sudip Mukherjee @ 2016-12-21 21:38 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: akpm, linux-fbdev, linux-kernel, dri-devel, Tomi Valkeinen,
	Daniel Vetter
In-Reply-To: <1915709.xWflxUXmem@amdc3058>

On Wed, Dec 21, 2016 at 05:50:05PM +0100, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Wednesday, December 21, 2016 03:06:55 PM Sudip Mukherjee wrote:
> > On Thu, Dec 15, 2016 at 03:45:47PM +0100, Bartlomiej Zolnierkiewicz wrote:
> > > I would like to help with fbdev maintenance.  I can dedicate some time
> > > for reviewing and handling patches but won't have time for much more.
> > > 
> > 
> > I thought usually someone takes over the maintainer role after
> > proving that he cares for a subsystem for a certain period of time.
> > 
> > I know Bartlomiej is here for a long time, but fbdev??
> 
> I'm not a fbdev expert and I'm not pretending to be one.
> 
> I decided to officially handle patches for fbdev after this
> subsystem has been marked as Orphan in upstream tree (nobody
> has volunteered to take over from Tomi after he sent his
> resignation patch to a list a week earlier).

ohhh.. right... i am seeing that thread now. Sorry for the noise.
I have two months of lkml and fbdev mails pending, will read them in
next two weeks of holidays. Only managed to see few mails that had me
in to or cc.

> 
> I won't have a lot of time even for this activity so help is
> very much welcomed.

time is the only thing which is very scarce nowadays. but please
do ping me if you need any help.

regards
sudip

^ permalink raw reply

* Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in /proc/self/smaps)
From: Dan Williams @ 2016-12-21 21:33 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Ross Zwisler, Nicholas Piggin, Jan Kara,
	Jeff Moyer, Yumei Huang, Michal Hocko, Xiaof Guangrong, KVM list,
	Dave Hansen, Gleb Natapov, linux-nvdimm@lists.01.org, mtosatti,
	linux-kernel@vger.kernel.org, Christoph Hellwig, Linux MM,
	Stefan Hajnoczi, linux-fsdevel, Paolo Bonzini, Andrew Morton
In-Reply-To: <20161221212412.GB4758@dastard>

On Wed, Dec 21, 2016 at 1:24 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Dec 21, 2016 at 08:53:46AM -0800, Dan Williams wrote:
>> On Tue, Dec 20, 2016 at 4:40 PM, Darrick J. Wong
>> <darrick.wong@oracle.com> wrote:
>> > On Mon, Dec 19, 2016 at 05:18:40PM -0800, Dan Williams wrote:
>> >> On Mon, Dec 19, 2016 at 5:09 PM, Darrick J. Wong
>> >> <darrick.wong@oracle.com> wrote:
>> >> > On Mon, Dec 19, 2016 at 02:11:49PM -0700, Ross Zwisler wrote:
>> >> >> On Fri, Sep 16, 2016 at 03:54:05PM +1000, Nicholas Piggin wrote:
>> >> >> <>
>> >> >> > Definitely the first step would be your simple preallocated per
>> >> >> > inode approach until it is shown to be insufficient.
>> >> >>
>> >> >> Reviving this thread a few months later...
>> >> >>
>> >> >> Dave, we're interested in taking a serious look at what it would take to get
>> >> >> PMEM_IMMUTABLE working.  Do you still hold the opinion that this is (or could
>> >> >> become, with some amount of work) a workable solution?
>> >> >>
>> >> >> We're happy to do the grunt work for this feature, but we will probably need
>> >> >> guidance from someone with more XFS experience.  With you out on extended leave
>> >> >> the first half of 2017, who would be the best person to ask for this guidance?
>> >> >> Darrick?
>> >> >
>> >> > Yes, probably. :)
>> >> >
>> >> > I think where we left off with this (on the XFS side) is some sort of
>> >> > fallocate mode that would allocate blocks, zero them, and then set the
>> >> > DAX and PMEM_IMMUTABLE on-disk inode flags.  After that, you'd mmap the
>> >> > file and thereby gain the ability to control write persistents behavior
>> >> > without having to worry about fs metadata updates.  As an added plus, I
>> >> > think zeroing the pmem also clears media errors, or something like that.
>> >> >
>> >> > <shrug> Is that a reasonable starting point?  My memory is a little foggy.
>> >> >
>> >> > Hmm, I see Dan just posted something about blockdev fallocate.  I'll go
>> >> > read that.
>> >>
>> >> That's for device-dax, which is basically a poor man's PMEM_IMMUTABLE
>> >> via a character device interface. It's useful for cases where you want
>> >> an entire nvdimm namespace/volume in "no fs-metadata to worry about"
>> >> mode.  But, for sub-allocations of a namespace and support for
>> >> existing tooling, PMEM_IMMUTABLE is much more usable.
>> >
>> > Well sure... but otoh I was thinking that it'd be pretty neat if we
>> > could use the same code regardless of whether the target file was a
>> > dax-device or an xfs file:
>> >
>> > fd = open("<some path>", O_RDWR);
>> > fstat(fd, &statbuf):
>> > fallocate(fd, FALLOC_FL_PMEM_IMMUTABLE, 0, statbuf.st_size);
>> > p = mmap(NULL, statbuf.st_size, PROT_READ | PROT_WRITE, fd, 0);
>> >
>> > *(p + 42) = 0xDEADBEEF;
>> > asm { clflush; } /* or whatever */
>> >
>> > ...so perhaps it would be a good idea to design the fallocate primitive
>> > around "prepare this fd for mmap-only pmem semantics" and let it the
>> > backend do zeroing and inode flag changes as necessary to make it
>> > happen.  We'd need to do some bikeshedding about what the other falloc
>> > flags mean when we're dealing with pmem files and devices, but I think
>> > we should try to keep the userland presentation the same unless there's
>> > a really good reason not to.
>>
>> It would be interesting to use fallocate to size device-dax files...
>
> No. device-dax needs to die, not poison a bunch of existing file and
> block device APIs and behaviours with special snowflakes.  Get
> DAX-enabled filesystems to do what you need, and get rid of this
> ugly, nasty hack.
>

Right, Christoph already killed fallocate for device-dax.

What we're looking for now is the next level of detail on how to get
started on PMEM_IMMUTABLE, as Ross asked a few messages back in this
thread, so we have a reasonable replacement for device-dax.

^ permalink raw reply

* Re: [GIT pull] x86/cache: Updates for 4.10
From: Fenghua Yu @ 2016-12-21 21:31 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner
  Cc: Tony Luck, Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	linux-kernel
In-Reply-To: <20161221210119.GA56305@linux.intel.com>

On Tue, Dec 13, 2016, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 12 Dec 2016, Linus Torvalds wrote:
> On Mon, Dec 12, 2016 at 1:53 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> It looks pretty self-contained (good), but it also looks majorly
>> strange. I will have to think about this. What are the main/expected
>> users?
> - Virtualization so a VM can only trash only the associated part of the
> cash w/o disturbing others
> 
> - Real-Time systems to seperate RT and general workloads.
> 
> - Latency sensitive enterprise workloads
> 
> - In theory this also can be used to protect against cache side channel
> attacks.

Quite a few companies and research groups are interested in the hardware
feature and/or the resctrl file system interface. They do demonstrate
performance and QoS improvements in real-time, VM, and native Linux
by partitioning cache for processes to avoid noisy neighbors.

Thanks.

-Fenghua

^ permalink raw reply

* Re: [PATCH v3 13/15] livepatch: change to a per-task consistency model
From: Josh Poimboeuf @ 2016-12-21 21:25 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jessica Yu, Jiri Kosina, Miroslav Benes, linux-kernel,
	live-patching, Michael Ellerman, Heiko Carstens, x86,
	linuxppc-dev, linux-s390, Vojtech Pavlik, Jiri Slaby,
	Chris J Arges, Andy Lutomirski, Ingo Molnar, Peter Zijlstra
In-Reply-To: <20161220173246.GC25166@pathway.suse.cz>

On Tue, Dec 20, 2016 at 06:32:46PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 12:08:38, Josh Poimboeuf wrote:
> > Change livepatch to use a basic per-task consistency model.  This is the
> > foundation which will eventually enable us to patch those ~10% of
> > security patches which change function or data semantics.  This is the
> > biggest remaining piece needed to make livepatch more generally useful.
> > 
> > [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
> > 
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> > diff --git a/Documentation/livepatch/livepatch.txt b/Documentation/livepatch/livepatch.txt
> > index 6c43f6e..f87e742 100644
> > --- a/Documentation/livepatch/livepatch.txt
> > +++ b/Documentation/livepatch/livepatch.txt
> 
> I like the description.
> 
> Just a note that we will also need to review the section about
> limitations. But I am not sure that we want to do it in this patch.
> It might open a long discussion on its own.
> 
> > diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> > index 1a5a93c..8e06fe5 100644
> > --- a/include/linux/livepatch.h
> > +++ b/include/linux/livepatch.h
> > @@ -28,18 +28,40 @@
> >  
> >  #include <asm/livepatch.h>
> >  
> > +/* task patch states */
> > +#define KLP_UNDEFINED	-1
> > +#define KLP_UNPATCHED	 0
> > +#define KLP_PATCHED	 1
> > +
> >  /**
> >   * struct klp_func - function structure for live patching
> >   * @old_name:	name of the function to be patched
> >   * @new_func:	pointer to the patched function code
> >   * @old_sympos: a hint indicating which symbol position the old function
> >   *		can be found (optional)
> > + * @immediate:  patch the func immediately, bypassing backtrace safety checks
> 
> There are more checks possible. I would use the same description
> as for klp_object.

Agreed.

> >   * @old_addr:	the address of the function being patched
> >   * @kobj:	kobject for sysfs resources
> >   * @stack_node:	list node for klp_ops func_stack list
> >   * @old_size:	size of the old function
> >   * @new_size:	size of the new function
> >   * @patched:	the func has been added to the klp_ops list
> > + * @transition:	the func is currently being applied or reverted
> > + *
> > @@ -86,6 +110,7 @@ struct klp_object {
> >   * struct klp_patch - patch structure for live patching
> >   * @mod:	reference to the live patch module
> >   * @objs:	object entries for kernel objects to be patched
> > + * @immediate:  patch all funcs immediately, bypassing safety mechanisms
> >   * @list:	list node for global list of registered patches
> >   * @kobj:	kobject for sysfs resources
> >   * @enabled:	the patch is enabled (but operation may be incomplete)
> 
> [...]
> 
> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index fc160c6..22c0c01 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -424,7 +477,10 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
> >  		goto err;
> >  	}
> >  
> > -	if (enabled) {
> > +	if (patch == klp_transition_patch) {
> > +		klp_reverse_transition();
> > +		mod_delayed_work(system_wq, &klp_transition_work, 0);
> 
> I would put this mod_delayed_work() into klp_reverse_transition().
> Also I would put that schedule_delayed_work() into
> klp_try_complete_transition().
> 
> If I did not miss anything, it will allow to move the
> klp_transition_work code to transition.c where it logically
> belongs.

Makes sense, I'll see if I can move all the klp_transition_work code to
transition.c.

> > +	} else if (enabled) {
> >  		ret = __klp_enable_patch(patch);
> >  		if (ret)
> >  			goto err;
> 
> [...]
> 
> > diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> > index 5efa262..e79ebb5 100644
> > --- a/kernel/livepatch/patch.c
> > +++ b/kernel/livepatch/patch.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/bug.h>
> >  #include <linux/printk.h>
> >  #include "patch.h"
> > +#include "transition.h"
> >  
> >  static LIST_HEAD(klp_ops);
> >  
> > @@ -54,15 +55,53 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> >  {
> >  	struct klp_ops *ops;
> >  	struct klp_func *func;
> > +	int patch_state;
> >  
> >  	ops = container_of(fops, struct klp_ops, fops);
> >  
> >  	rcu_read_lock();
> > +
> >  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
> >  				      stack_node);
> > -	if (WARN_ON_ONCE(!func))
> > +
> > +	if (!func)
> >  		goto unlock;
> 
> Why do you removed the WARN_ON_ONCE(), please?
> 
> We still add the function on the stack before registering
> the ftrace handler. Also we unregister the ftrace handler
> before removing the the last entry from the stack.
> 
> AFAIK, unregister_ftrace_function() calls rcu_synchronize()'
> to make sure that no-one is inside the handler once finished.
> Mirek knows more about it.

Hm, this is news to me.  Mirek, please share :-)

> If this is not true, we have a problem. For example,
> we call kfree(ops) after unregister_ftrace_function();

Agreed.

> BTW: I thought that this change was really needed because of
> klp_try_complete_transition(). But I think that the WARN
> could and should stay after all. See below.
> 
> 
> > +	/*
> > +	 * Enforce the order of the ops->func_stack and func->transition reads.
> > +	 * The corresponding write barrier is in __klp_enable_patch().
> > +	 */
> > +	smp_rmb();
> > +
> > +	if (unlikely(func->transition)) {
> > +
> > +		/*
> > +		 * Enforce the order of the func->transition and
> > +		 * current->patch_state reads.  Otherwise we could read an
> > +		 * out-of-date task state and pick the wrong function.  The
> > +		 * corresponding write barriers are in klp_init_transition()
> > +		 * and __klp_disable_patch().
> > +		 */
> > +		smp_rmb();
> > +
> > +		patch_state = current->patch_state;
> > +
> > +		WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > +
> > +		if (patch_state == KLP_UNPATCHED) {
> > +			/*
> > +			 * Use the previously patched version of the function.
> > +			 * If no previous patches exist, use the original
> > +			 * function.
> 
> s/use the original/continue with the original/  ?

Ok.

> > +			 */
> > +			func = list_entry_rcu(func->stack_node.next,
> > +					      struct klp_func, stack_node);
> > +
> > +			if (&func->stack_node == &ops->func_stack)
> > +				goto unlock;
> > +		}
> > +	}
> > +
> >  	klp_arch_set_pc(regs, (unsigned long)func->new_func);
> >  unlock:
> >  	rcu_read_unlock();
> > @@ -211,3 +250,12 @@ int klp_patch_object(struct klp_object *obj)
> >  
> >  	return 0;
> >  }
> > +
> > +void klp_unpatch_objects(struct klp_patch *patch)
> > +{
> > +	struct klp_object *obj;
> > +
> > +	klp_for_each_object(patch, obj)
> > +		if (obj->patched)
> > +			klp_unpatch_object(obj);
> > +}
> > --- /dev/null
> > +++ b/kernel/livepatch/transition.c
> > @@ -0,0 +1,479 @@
> > +/*
> > + * transition.c - Kernel Live Patching transition functions
> > + *
> > + * Copyright (C) 2015-2016 Josh Poimboeuf <jpoimboe@redhat.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/stacktrace.h>
> > +#include "patch.h"
> > +#include "transition.h"
> > +#include "../sched/sched.h"
> 
> Is this acceptable for the scheduler guys? 

I discussed the use of task_rq_lock() with Peter Zijlstra on IRC and he
seemed to think it was ok.  Peter, please speak up if you disagree :-)

> > +#define MAX_STACK_ENTRIES 100
> > +
> > +struct klp_patch *klp_transition_patch;
> > +
> > +static int klp_target_state = KLP_UNDEFINED;
> > +
> > +/* called from copy_process() during fork */
> > +void klp_copy_process(struct task_struct *child)
> > +{
> > +	child->patch_state = current->patch_state;
> > +
> > +	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
> > +}
> > +
> > +/*
> > + * klp_update_patch_state() - change the patched state of a task
> > + * @task:	The task to change
> > + *
> > + * Switches the patched state of the task to the set of functions in the target
> > + * patch state.
> > + */
> 
> Please, add here some warning. Something like:
> 
>  * This function must never be called in parallel with
>  * klp_ftrace_handler(). Otherwise, the handler might do random
>  * decisions and break the consistency.
>  *
>  * By other words, call this function only by the @task itself
>  * or make sure that it is not running.

Yeah, I'll add a comment here.  This goes back to our discussion from
last time:

  https://lkml.kernel.org/r/20160504172517.tdatoj2nlkqwyd4g@treble

> > +void klp_update_patch_state(struct task_struct *task)
> > +{
> > +	/*
> > +	 * The synchronize_rcu() call in klp_try_complete_transition() ensures
> > +	 * this critical section completes before the global patch transition
> > +	 * is considered complete so we don't have spurious patch_state updates
> > +	 * afterwards.
> > +	 */
> > +	rcu_read_lock();
> > +
> > +	/*
> > +	 * This test_and_clear_tsk_thread_flag() call also serves as a read
> > +	 * barrier to enforce the order of the TIF_PATCH_PENDING and
> > +	 * klp_target_state reads.  The corresponding write barriers are in
> > +	 * __klp_disable_patch() and klp_reverse_transition().
> > +	 */
> > +	if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
> > +		task->patch_state = READ_ONCE(klp_target_state);
> > +
> > +	rcu_read_unlock();
> > +}
> > +
> > +/*
> > + * Initialize the global target patch state and all tasks to the initial patch
> > + * state, and initialize all function transition states to true in preparation
> > + * for patching or unpatching.
> > + */
> > +void klp_init_transition(struct klp_patch *patch, int state)
> > +{
> > +	struct task_struct *g, *task;
> > +	unsigned int cpu;
> > +	struct klp_object *obj;
> > +	struct klp_func *func;
> > +	int initial_state = !state;
> > +
> > +	WARN_ON_ONCE(klp_target_state != KLP_UNDEFINED);
> > +
> > +	klp_transition_patch = patch;
> > +
> > +	/*
> > +	 * Set the global target patch state which tasks will switch to.  This
> > +	 * has no effect until the TIF_PATCH_PENDING flags get set later.
> > +	 */
> > +	klp_target_state = state;
> > +
> > +	/*
> > +	 * If the patch can be applied or reverted immediately, skip the
> > +	 * per-task transitions.
> > +	 */
> > +	if (patch->immediate)
> > +		return;
> > +
> > +	/*
> > +	 * Initialize all tasks to the initial patch state to prepare them for
> > +	 * switching to the target state.
> > +	 */
> > +	read_lock(&tasklist_lock);
> > +	for_each_process_thread(g, task) {
> > +		WARN_ON_ONCE(task->patch_state != KLP_UNDEFINED);
> > +		task->patch_state = initial_state;
> > +	}
> > +	read_unlock(&tasklist_lock);
> > +
> > +	/*
> > +	 * Ditto for the idle "swapper" tasks.
> > +	 */
> > +	get_online_cpus();
> > +	for_each_online_cpu(cpu) {
> > +		task = idle_task(cpu);
> > +		WARN_ON_ONCE(task->patch_state != KLP_UNDEFINED);
> > +		task->patch_state = initial_state;
> > +	}
> > +	put_online_cpus();
> 
> We allow to add/remove CPUs here. I am afraid that we will also need
> to add a cpu coming/going handler that will set the task->patch_state
> the right way. We must not set the klp_target_state until all ftrace
> handlers are ready.

What if we instead just change the above to use for_each_possible_cpu()?
We could do the same in klp_complete_transition().

> > +	/*
> > +	 * Enforce the order of the task->patch_state initializations and the
> > +	 * func->transition updates to ensure that, in the enable path,
> > +	 * klp_ftrace_handler() doesn't see a func in transition with a
> > +	 * task->patch_state of KLP_UNDEFINED.
> > +	 */
> > +	smp_wmb();
> > +
> > +	/*
> > +	 * Set the func transition states so klp_ftrace_handler() will know to
> > +	 * switch to the transition logic.
> > +	 *
> > +	 * When patching, the funcs aren't yet in the func_stack and will be
> > +	 * made visible to the ftrace handler shortly by the calls to
> > +	 * klp_patch_object().
> > +	 *
> > +	 * When unpatching, the funcs are already in the func_stack and so are
> > +	 * already visible to the ftrace handler.
> > +	 */
> > +	klp_for_each_object(patch, obj)
> > +		klp_for_each_func(obj, func)
> > +			func->transition = true;
> > +}
> > +
> > +/*
> > + * Start the transition to the specified target patch state so tasks can begin
> > + * switching to it.
> > + */
> > +void klp_start_transition(void)
> > +{
> > +	struct task_struct *g, *task;
> > +	unsigned int cpu;
> > +
> > +	WARN_ON_ONCE(klp_target_state == KLP_UNDEFINED);
> > +
> > +	pr_notice("'%s': %s...\n", klp_transition_patch->mod->name,
> > +		  klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
> > +
> > +	/*
> > +	 * If the patch can be applied or reverted immediately, skip the
> > +	 * per-task transitions.
> > +	 */
> > +	if (klp_transition_patch->immediate)
> > +		return;
> > +
> > +	/*
> > +	 * Mark all normal tasks as needing a patch state update.  As they pass
> > +	 * through the syscall barrier they'll switch over to the target state
> > +	 * (unless we switch them in klp_try_complete_transition() first).
> > +	 */
> > +	read_lock(&tasklist_lock);
> > +	for_each_process_thread(g, task)
> > +		set_tsk_thread_flag(task, TIF_PATCH_PENDING);
> 
> This is called also from klp_reverse_transition(). We should set it
> only when the task need migration. Also we should clear it when
> the task is in the right state already.
> 
> It is not only optimization. It actually solves a race between
> klp_complete_transition() and klp_update_patch_state(), see below.

I agree about the race, but if I did:

	for_each_process_thread(g, task) {
		if (task->patch_state != klp_target_state)
			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
		else
			clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
	}

It would still leave a small window where TIF_PATCH_PENDING gets set for
an already patched task, if klp_update_patch_state() is running at the
same time.

See below for another solution.

> > +	read_unlock(&tasklist_lock);
> > +
> > +	/*
> > +	 * Ditto for the idle "swapper" tasks, though they never cross the
> > +	 * syscall barrier.  Instead they switch over in cpu_idle_loop().
> > +	 */
> > +	get_online_cpus();
> > +	for_each_online_cpu(cpu)
> > +		set_tsk_thread_flag(idle_task(cpu), TIF_PATCH_PENDING);
> > +	put_online_cpus();
> 
> Also this stage need to be somehow handled by CPU coming/going
> handlers.

Here I think we could automatically switch any offline CPUs' idle tasks.
And something similar in klp_try_complete_transition().

> > +}
> > +
> > +/*
> > + * The transition to the target patch state is complete.  Clean up the data
> > + * structures.
> > + */
> > +void klp_complete_transition(void)
> > +{
> > +	struct klp_object *obj;
> > +	struct klp_func *func;
> > +	struct task_struct *g, *task;
> > +	unsigned int cpu;
> > +
> > +	if (klp_transition_patch->immediate)
> > +		goto done;
> > +
> > +	klp_for_each_object(klp_transition_patch, obj)
> > +		klp_for_each_func(obj, func)
> > +			func->transition = false;
> 
> We should call rcu_synchronize() here. Otherwise, there
> might be a race, see below:
> 
> CPU1					CPU2
> 
> klp_ftrace_handler()
>   if (unlikely(func->transition))
> 	// still true
> 
> 					klp_complete_transition()
> 					  func->transition = false;
> 					  task->patch_state =
> 					      KLP_UNDEFINED;
> 
>      patch_state = current->patch_state;
> 
>      WARN_ON(patch_state == KLP_UNDEFINED);
> 
> BANG!: We print the warning.

This shouldn't be possible because klp_try_complete_transition() calls
rcu_synchronize() before calling klp_complete_transition().  So by the
time klp_complete_transition() is called, the ftrace handler can no
longer see the affected func.  See the comment for rcu_synchronize() in
klp_try_complete_transition().

> Note that that smp_wmb() is enough in klp_init_transition()
> but it is not enough here. We need to wait longer once
> someone might be inside the if (true) code.
> 
> > +	read_lock(&tasklist_lock);
> > +	for_each_process_thread(g, task) {
> > +		clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> > +		task->patch_state = KLP_UNDEFINED;
> > +	}
> > +	read_unlock(&tasklist_lock);
> > +
> > +	get_online_cpus();
> > +	for_each_online_cpu(cpu) {
> > +		task = idle_task(cpu);
> > +		clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> 
> If TIF_PATCH_PENDING flag is set here it means that
> klp_update_patch_state() might get triggered and it might
> put wrong value into task->patch_state.
> 
> We must make sure that all task have this cleared before
> calling this function. This is another reason why
> klp_init_transition() should set the flag only when
> transition is needed.
> 
> We should only check the state here.
> 
> It still might make sense to clear it when it is set wrongly.
> But the question is if it is really safe to continue. I am
> afraid that it is not. It would mean that the consistency
> model is broken and we are in strange state.

As I mentioned above, with your proposal I think there could still be a
task with a spurious set TIF_PATCH_PENDING at this point.

Maybe instead we should clear all the TIF_PATCH_PENDING flags before the
synchronize_rcu() in klp_try_complete_transition().

> > +		task->patch_state = KLP_UNDEFINED;
> > +	}
> > +	put_online_cpus();
> > +
> > +done:
> > +	klp_target_state = KLP_UNDEFINED;
> > +	klp_transition_patch = NULL;
> > +}
> 
> [...]
> 
> > +
> > +/*
> > + * Try to switch all remaining tasks to the target patch state by walking the
> > + * stacks of sleeping tasks and looking for any to-be-patched or
> > + * to-be-unpatched functions.  If such functions are found, the task can't be
> > + * switched yet.
> > + *
> > + * If any tasks are still stuck in the initial patch state, schedule a retry.
> > + */
> > +bool klp_try_complete_transition(void)
> > +{
> > +	unsigned int cpu;
> > +	struct task_struct *g, *task;
> > +	bool complete = true;
> > +
> > +	WARN_ON_ONCE(klp_target_state == KLP_UNDEFINED);
> > +
> > +	/*
> > +	 * If the patch can be applied or reverted immediately, skip the
> > +	 * per-task transitions.
> > +	 */
> > +	if (klp_transition_patch->immediate)
> > +		goto success;
> > +
> > +	/*
> > +	 * Try to switch the tasks to the target patch state by walking their
> > +	 * stacks and looking for any to-be-patched or to-be-unpatched
> > +	 * functions.  If such functions are found on a stack, or if the stack
> > +	 * is deemed unreliable, the task can't be switched yet.
> > +	 *
> > +	 * Usually this will transition most (or all) of the tasks on a system
> > +	 * unless the patch includes changes to a very common function.
> > +	 */
> > +	read_lock(&tasklist_lock);
> > +	for_each_process_thread(g, task)
> > +		if (!klp_try_switch_task(task))
> > +			complete = false;
> > +	read_unlock(&tasklist_lock);
> > +
> > +	/*
> > +	 * Ditto for the idle "swapper" tasks.
> > +	 */
> > +	get_online_cpus();
> > +	for_each_online_cpu(cpu)
> > +		if (!klp_try_switch_task(idle_task(cpu)))
> > +			complete = false;
> > +	put_online_cpus();
> > +
> > +	/*
> > +	 * Some tasks weren't able to be switched over.  Try again later and/or
> > +	 * wait for other methods like syscall barrier switching.
> > +	 */
> > +	if (!complete)
> > +		return false;
> > +
> > +success:
> > +
> > +	/*
> > +	 * When unpatching, all tasks have transitioned to KLP_UNPATCHED so we
> > +	 * can now remove the new functions from the func_stack.
> > +	 */
> > +	if (klp_target_state == KLP_UNPATCHED)
> > +		klp_unpatch_objects(klp_transition_patch);
> > +
> > +	/*
> > +	 * Wait for all RCU read-side critical sections to complete.
> > +	 *
> > +	 * This has two purposes:
> > +	 *
> > +	 * 1) Ensure all existing critical sections in klp_update_patch_state()
> > +	 *    complete, so task->patch_state won't be unexpectedly updated
> > +	 *    later.
> 
> We should not be here if anyone still might be in klp_update_patch_state().

Depends on our discussion about conditionally setting TIF_PATCH_PENDING.

> 
> > +	 *
> > +	 * 2) When unpatching, don't allow any existing instances of
> > +	 *    klp_ftrace_handler() to access any obsolete funcs before we reset
> > +	 *    the func transition states to false.  Otherwise the handler may
> > +	 *    see the deleted "new" func, see that it's not in transition, and
> > +	 *    wrongly pick the new version of the function.
> > +	 */
> 
> This makes sense but it too me long time to understand. I wonder if
> this might be better:
> 
> 	/*
> 	 * Make sure that the function is removed from ops->func_stack
> 	 * before we clear func->transition. Otherwise the handler may
> 	 * pick the wrong version.
> 	 */

Sounds good.

> And I would call this only when the patch is being removed
> 
> 	if (klp_target_state = KLP_UNPATCHED)
> 		synchronize_rcu();

Depends on our discussion about conditionally setting TIF_PATCH_PENDING.

> I think that this was the reason to remove WARN_ON_ONCE(!func)
> in klp_ftrace_handler(). But this is not related. If this was
> the last entry in the list, we removed the ftrace_handler
> before removing the last entry. And unregister_ftrace_function()
> calls rcu_synchronize() to prevent calling the handler later.
> 
> 
> > +	synchronize_rcu();
> > +
> > +	pr_notice("'%s': %s complete\n", klp_transition_patch->mod->name,
> > +		  klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
> > +
> > +	/* we're done, now cleanup the data structures */
> > +	klp_complete_transition();
> > +
> > +	return true;
> > +}
> > +
> > +/*
> > + * This function can be called in the middle of an existing transition to
> > + * reverse the direction of the target patch state.  This can be done to
> > + * effectively cancel an existing enable or disable operation if there are any
> > + * tasks which are stuck in the initial patch state.
> > + */
> > +void klp_reverse_transition(void)
> > +{
> > +	klp_transition_patch->enabled = !klp_transition_patch->enabled;
> > +
> > +	klp_target_state = !klp_target_state;
> > +
> > +	/*
> > +	 * Enforce the order of the write to klp_target_state above and the
> > +	 * TIF_PATCH_PENDING writes in klp_start_transition() to ensure that
> > +	 * klp_update_patch_state() doesn't set a wrong task->patch_state.
> > +	 */
> > +	smp_wmb();
> 
> I would call rcu_synchronize() here to make sure that
> klp_update_patch_state() calls will not set
> an outdated task->patch_state.
> 
> Note that smp_wmb() is not enough. We do not check TIF_PATCH_PENDING
> in klp_try_switch_task(). There is a tiny race:
> 
> CPU1					CPU2
> 
> klp_update_patch_state()
> 
> 	if (test_and clear(task, TIF)
> 	     READ_ONCE(klp_target_state);
> 
> 					mutex_lock(klp_lock);
> 
> 					klp_reverse_transition()
> 					  klp_target_state =
> 					      !klp_target_state;
> 
> 					  klp_start_transition()
> 
> 					mutex_unlock(klp_lock);
> 
> 					 <switch to another process>
> 
> 					 klp_transition_work_fn()
> 					   mutex_lock(klp_lock);
> 					   klp_try_complete_transition()
> 					     klp_try_switch_task()
> 					       if (task->patch_state ==
> 						   klp_target_state)
> 						  return true;
> 
> 	    task->patch_state = <outdated_value>;
> 
> 	 klp_ftrace_handler()
> 
> BANG: klp_ftrace_handler() will use wrong implementation according
>       to the outdated task->patch_state. At the same time,
>       klp_transition() is not blocked by the task because it thinks
>       that it has a correct state.

Good find!

> > +
> > +	klp_start_transition();
> > +}
> > +
> > diff --git a/samples/livepatch/livepatch-sample.c b/samples/livepatch/livepatch-sample.c
> > index e34f871..bb61c65 100644
> > --- a/samples/livepatch/livepatch-sample.c
> > +++ b/samples/livepatch/livepatch-sample.c
> > @@ -17,6 +17,8 @@
> >   * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >   */
> >  
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> >  #include <linux/module.h>
> >  #include <linux/kernel.h>
> >  #include <linux/livepatch.h>
> > @@ -69,6 +71,11 @@ static int livepatch_init(void)
> >  {
> >  	int ret;
> >  
> > +	if (!klp_have_reliable_stack() && !patch.immediate) {
> > +		pr_notice("disabling consistency model!\n");
> > +		patch.immediate = true;
> > +	}
> 
> I am scared to have this in the sample module. It makes sense
> to use the consistency model even for immediate patches because
> it allows to remove them. But this must not be used for patches
> that really require the consistency model. We should add
> a big fat warning at least.

I did this so that the sample module would still work for non-x86_64
arches, for which there's currently no way to patch kthreads.

Notice I did add a warning:

  pr_notice("disabling consistency model!\n");

Is the warning not fat enough?

> > +
> >  	ret = klp_register_patch(&patch);
> >  	if (ret)
> >  		return ret;
> 
> I like the patch. All the problems that I found look solvable.
> I think that we are on the right way.

Thank you for the excellent review!

-- 
Josh

^ permalink raw reply

* Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in /proc/self/smaps)
From: Dave Chinner @ 2016-12-21 21:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Darrick J. Wong, Ross Zwisler, Nicholas Piggin, Jan Kara,
	Jeff Moyer, Yumei Huang, Michal Hocko, Xiaof Guangrong, KVM list,
	Dave Hansen, Gleb Natapov, linux-nvdimm@lists.01.org, mtosatti,
	linux-kernel@vger.kernel.org, Christoph Hellwig, Linux MM,
	Stefan Hajnoczi, linux-fsdevel, Paolo Bonzini, Andrew Morton
In-Reply-To: <CAPcyv4g3sgQ=Lh99RP7V5g-okgdC=BaeW8oLtHRfwU1gmh07=A@mail.gmail.com>

On Wed, Dec 21, 2016 at 08:53:46AM -0800, Dan Williams wrote:
> On Tue, Dec 20, 2016 at 4:40 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Mon, Dec 19, 2016 at 05:18:40PM -0800, Dan Williams wrote:
> >> On Mon, Dec 19, 2016 at 5:09 PM, Darrick J. Wong
> >> <darrick.wong@oracle.com> wrote:
> >> > On Mon, Dec 19, 2016 at 02:11:49PM -0700, Ross Zwisler wrote:
> >> >> On Fri, Sep 16, 2016 at 03:54:05PM +1000, Nicholas Piggin wrote:
> >> >> <>
> >> >> > Definitely the first step would be your simple preallocated per
> >> >> > inode approach until it is shown to be insufficient.
> >> >>
> >> >> Reviving this thread a few months later...
> >> >>
> >> >> Dave, we're interested in taking a serious look at what it would take to get
> >> >> PMEM_IMMUTABLE working.  Do you still hold the opinion that this is (or could
> >> >> become, with some amount of work) a workable solution?
> >> >>
> >> >> We're happy to do the grunt work for this feature, but we will probably need
> >> >> guidance from someone with more XFS experience.  With you out on extended leave
> >> >> the first half of 2017, who would be the best person to ask for this guidance?
> >> >> Darrick?
> >> >
> >> > Yes, probably. :)
> >> >
> >> > I think where we left off with this (on the XFS side) is some sort of
> >> > fallocate mode that would allocate blocks, zero them, and then set the
> >> > DAX and PMEM_IMMUTABLE on-disk inode flags.  After that, you'd mmap the
> >> > file and thereby gain the ability to control write persistents behavior
> >> > without having to worry about fs metadata updates.  As an added plus, I
> >> > think zeroing the pmem also clears media errors, or something like that.
> >> >
> >> > <shrug> Is that a reasonable starting point?  My memory is a little foggy.
> >> >
> >> > Hmm, I see Dan just posted something about blockdev fallocate.  I'll go
> >> > read that.
> >>
> >> That's for device-dax, which is basically a poor man's PMEM_IMMUTABLE
> >> via a character device interface. It's useful for cases where you want
> >> an entire nvdimm namespace/volume in "no fs-metadata to worry about"
> >> mode.  But, for sub-allocations of a namespace and support for
> >> existing tooling, PMEM_IMMUTABLE is much more usable.
> >
> > Well sure... but otoh I was thinking that it'd be pretty neat if we
> > could use the same code regardless of whether the target file was a
> > dax-device or an xfs file:
> >
> > fd = open("<some path>", O_RDWR);
> > fstat(fd, &statbuf):
> > fallocate(fd, FALLOC_FL_PMEM_IMMUTABLE, 0, statbuf.st_size);
> > p = mmap(NULL, statbuf.st_size, PROT_READ | PROT_WRITE, fd, 0);
> >
> > *(p + 42) = 0xDEADBEEF;
> > asm { clflush; } /* or whatever */
> >
> > ...so perhaps it would be a good idea to design the fallocate primitive
> > around "prepare this fd for mmap-only pmem semantics" and let it the
> > backend do zeroing and inode flag changes as necessary to make it
> > happen.  We'd need to do some bikeshedding about what the other falloc
> > flags mean when we're dealing with pmem files and devices, but I think
> > we should try to keep the userland presentation the same unless there's
> > a really good reason not to.
> 
> It would be interesting to use fallocate to size device-dax files...

No. device-dax needs to die, not poison a bunch of existing file and
block device APIs and behaviours with special snowflakes.  Get
DAX-enabled filesystems to do what you need, and get rid of this
ugly, nasty hack.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply

* Re: Patch to include/linux/kernel.h breaks 3rd party modules.
From: Jessica Yu @ 2016-12-21 21:23 UTC (permalink / raw)
  To: Valdis Kletnieks; +Cc: Petr Mladek, linux-kernel
In-Reply-To: <30992.1482352925@turing-police.cc.vt.edu>

+++ Valdis Kletnieks [21/12/16 15:42 -0500]:
>Yes, I know that usually out-of-tree modules are on their own.
>However, this one may require a rethink..
>
>(Sorry for not catching this sooner, I hadn't tried to deal with the
>affected module since this patch hit linux-next in next-20161128)
>
>commit 7fd8329ba502ef76dd91db561c7aed696b2c7720
>Author: Petr Mladek <pmladek@suse.com>
>Date:   Wed Sep 21 13:47:22 2016 +0200
>
>    taint/module: Clean up global and module taint flags handling
>
>Contains this chunk:
>
>--- a/include/linux/kernel.h
>+++ b/include/linux/kernel.h
>@@ -506,6 +506,15 @@ extern enum system_states {
> #define TAINT_UNSIGNED_MODULE          13
> #define TAINT_SOFTLOCKUP               14
> #define TAINT_LIVEPATCH                        15
>+#define TAINT_FLAGS_COUNT              16
>+
>+struct taint_flag {
>+       char true;      /* character printed when tainted */
>+       char false;     /* character printed when not tainted */
>+       bool module;    /* also show as a per-module taint flag */
>+};
>
>and hilarity ensues when an out-of-tree module has this:
>
># ifndef true
>#  define true  (1)
># endif
># ifndef false
>#  define false (0)
># endif
>
>My proposed fix: change true/false to tainted/untainted.  If this
>is agreeable, I'll code and submit the fix.

Sure, that's fine with me.

Jessica

^ permalink raw reply

* Re: [PATCH] at86rf230: Allow slow GPIO pins for "rstn"
From: Stefan Schmidt @ 2016-12-21 21:01 UTC (permalink / raw)
  To: Chris Healy
  Cc: Alexander Aring, Andrey Smirnov, linux-kernel, netdev, linux-wpan
In-Reply-To: <CAFXsbZpgiTM2qngYERVpfXHvePW8M8DN91YnZwc-voKO=FDUUA@mail.gmail.com>

Hello.

On 21/12/16 19:30, Chris Healy wrote:
>
>
> On Dec 21, 2016 5:11 AM, "Stefan Schmidt" <stefan@osg.samsung.com
> <mailto:stefan@osg.samsung.com>> wrote:
>
>     Hello.
>
>
>     On 19/12/16 00:25, Andrey Smirnov wrote:
>
>         Driver code never touches "rstn" signal in atomic context, so
>         there's
>         no need to implicitly put such restriction on it by using
>         gpio_set_value
>         to manipulate it. Replace gpio_set_value to
>         gpio_set_value_cansleep to
>         fix that.
>
>
>     We need to make sure we are not assuming it can be called  in such a
>     context in the future now. But that is something we can worry about
>     if it comes up.
>
>
>         As a an example of where such restriction might be inconvenient,
>         consider a hardware design where "rstn" is connected to a pin of
>         I2C/SPI
>         GPIO expander chip.
>
>
>     Is this a real life issue you run into?
>
>
> I have a platform with this configuration.  The DTS for the platform is
> in the process of being mainlined right now.

Thanks for letting us know. What platform is that? I'm always interested 
in hearing about devices that use the Linux ieee802154 subsystem. :)

regards
Stefan Schmidt

^ permalink raw reply

* Re: WARNING: CPU: 3 PID: 1568 at kernel/sched/core.c:7738 __might_sleep+0x69/0x7e
From: Michal Hocko @ 2016-12-21 20:57 UTC (permalink / raw)
  To: Cong Wang; +Cc: Shaohua Li, Tejun Heo, LKML
In-Reply-To: <CAM_iQpXzGpiOC0qHdUJ=8mq81ECcpyE+YF+HGSRFMc1jnOVY8A@mail.gmail.com>

On Wed 21-12-16 11:24:47, Cong Wang wrote:
> On Wed, Dec 21, 2016 at 2:47 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Is there anything I could test?
> 
> I am working on a patchset, will send them for you to test. Meanwhile,
> if you could provide me a reproducer, that would help a lot.

I just boot my kvm image. Nothing really special here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Patch to include/linux/kernel.h breaks 3rd party modules.
From: Valdis Kletnieks @ 2016-12-21 20:42 UTC (permalink / raw)
  To: Petr Mladek, Jessica Yu; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

Yes, I know that usually out-of-tree modules are on their own.
However, this one may require a rethink..

(Sorry for not catching this sooner, I hadn't tried to deal with the
affected module since this patch hit linux-next in next-20161128)

commit 7fd8329ba502ef76dd91db561c7aed696b2c7720
Author: Petr Mladek <pmladek@suse.com>
Date:   Wed Sep 21 13:47:22 2016 +0200

    taint/module: Clean up global and module taint flags handling

Contains this chunk:

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -506,6 +506,15 @@ extern enum system_states {
 #define TAINT_UNSIGNED_MODULE          13
 #define TAINT_SOFTLOCKUP               14
 #define TAINT_LIVEPATCH                        15
+#define TAINT_FLAGS_COUNT              16
+
+struct taint_flag {
+       char true;      /* character printed when tainted */
+       char false;     /* character printed when not tainted */
+       bool module;    /* also show as a per-module taint flag */
+};

and hilarity ensues when an out-of-tree module has this:

# ifndef true
#  define true  (1)
# endif
# ifndef false
#  define false (0)
# endif

My proposed fix: change true/false to tainted/untainted.  If this
is agreeable, I'll code and submit the fix.



[-- Attachment #2: Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply

* Re: [PATCH] net: fddi: skfp: use %p format specifier for addresses rather than %x
From: David Miller @ 2016-12-21 20:35 UTC (permalink / raw)
  To: colin.king; +Cc: netdev, linux-kernel
In-Reply-To: <20161221160323.27780-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Wed, 21 Dec 2016 16:03:23 +0000

> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix: Addresses should be printed using the %p format specifier
> rather than using %x.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied.

^ permalink raw reply

* Re: [PATCH 5/5] firmware: add DECLARE_FW_CUSTOM_FALLBACK() annotation
From: Jacek Anaszewski @ 2016-12-21 20:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Milo Kim, Luis R. Rodriguez, gregkh, ming.lei, daniel.wagner, teg,
	mchehab, zajec5, linux-kernel, markivx, stephen.boyd, broonie,
	zohar, tiwai, johannes, chunkeey, hauke, jwboyer, dmitry.torokhov,
	dwmw2, jslaby, torvalds, luto, fengguang.wu, rpurdie,
	j.anaszewski, Abhay_Salunke, Julia.Lawall, Gilles.Muller,
	nicolas.palix, dhowells, bjorn.andersson, arend.vanspriel, kvalo,
	linux-leds
In-Reply-To: <20161221184902.GA21636@amd>

Hi,

On 12/21/2016 07:49 PM, Pavel Machek wrote:
> Hi!
> 
>>>>> Milo if sysfs is used can't the old userspace be mapped to use the new
>>>>> sysfs interface through a wrapper of some sort ? What exactly would be
>>>>> needed to ensure old userspace will not break?
>>>>
>>>> LP5521 and LP5523 have two ways to load hex code from the userspace - the
>>>> sysfs and firmware I/F. So user program supports both interfaces. Even if
>>>> the firmware I/F is not available, user can still run LED effect through the
>>>> sysfs.
>>>>
>>>> However, LP5562 and LP8501 support only single way which is the firmware
>>>> I/F. So user-space program for LP5562/8501 should be modified if lp55xx
>>>> removes the interface. My idea is
>>>
>>> Actually... it would be good to have some reasonable interface for RGB
>>> LEDs. This way, we need separate "firmware" for each LED
>>> controller. It would be good to have common format for LED effects.
>>
>> We still haven't tried trigger approach discussed over half a year ago.
>> If we used firmware approach we would still have to overcome the problem
>> of defining the LED class drivers affected by the firmware program.
> 
> The firmware approach is in the tree today :-(.

to RGB LEDs? What exactly do you have on mind?

> 
>>>> Device manufactures in Asia & North America requested lp55xx drivers, but I
>>>> don't know how many vendors uses the firmware I/F. Some vendors embeds the
>>>> binary code inside the driver instead of using user-program.
>>>
>>> Nokia N900 uses lp55xx, and I have custom scripts interfacing sysfs.
>>>
>>> Maemo uses the LEDs, too, but maemo is not open source.
>>>
>>> So no, I don't think there's anything important that could be broken.
>>
>> We can't guarantee that. Is there any problem in just using the
>> currently introduced DECLARE_FW_CUSTOM_FALLBACK() in
>> drivers/leds/leds-lp55xx-common.c?
> 
> Well, it would be good to get rid of the custom fallback
> functionality. And no, we don't need to "guarantee" that.  Removing
> obscure functionality noone uses is far game... providing noone
> complains ;-).

As Milo explained:

> Why has no one cried
> after the v4.0 custom fallback mechanism breaking ?

"Well, I don't know the reason exactly but my guess is they maybe still
using old kernel."

and after that:

"Device manufactures in Asia & North America requested lp55xx drivers"

These should be sufficient arguments for us for keeping the API
unchanged. If the users decided to upgrade their kernel then they
would be surprised by the API change.

DECLARE_FW_CUSTOM_FALLBACK macro seems to have been designed for
handling exactly this type of cases.

-- 
Best regards,
Jacek Anaszewski

^ permalink raw reply

* [PATCH v4 3/3] Bluetooth: btusb: Configure Marvell to use one of the pins for oob wakeup
From: Rajat Jain @ 2016-12-21 20:33 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Marcel Holtmann, Gustavo Padovan,
	Johan Hedberg, Amitkumar Karwar, Wei-Ning Huang, Xinming Hu,
	netdev, devicetree, linux-bluetooth, Brian Norris, linux-kernel
  Cc: Rajat Jain, rajatxjain
In-Reply-To: <1482352432-38302-1-git-send-email-rajatja@google.com>

The Marvell devices may have many gpio pins, and hence for wakeup
on these out-of-band pins, the chip needs to be told which pin is
to be used for wakeup, using an hci command.

Thus, we read the pin number etc from the device tree node and send
a command to the chip.

Signed-off-by: Rajat Jain <rajatja@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Acked-by: Rob Herring <robh@kernel.org>
---
v4: same as v3
v3: * remove the Marvell specific id table and check
    * Add reference to marvell-bt-8xxx.txt in btusb.txt
    * Add "Reviewed-by" and "Acked-by"    
v2: Fix the binding document to specify to use "wakeup" interrupt-name

 Documentation/devicetree/bindings/net/btusb.txt    |  3 ++
 .../{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} | 46 +++++++++++++++----
 drivers/bluetooth/btusb.c                          | 51 ++++++++++++++++++++++
 3 files changed, 92 insertions(+), 8 deletions(-)
 rename Documentation/devicetree/bindings/net/{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} (50%)

diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt
index 2c0355c..01fa2d4 100644
--- a/Documentation/devicetree/bindings/net/btusb.txt
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -10,6 +10,9 @@ Required properties:
 
 		  "usb1286,204e" (Marvell 8997)
 
+Also, vendors that use btusb may have device additional properties, e.g:
+Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
+
 Optional properties:
 
   - interrupt-parent: phandle of the parent interrupt controller
diff --git a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
similarity index 50%
rename from Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt
rename to Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
index 6a9a63c..9be1059 100644
--- a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt
+++ b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
@@ -1,16 +1,21 @@
-Marvell 8897/8997 (sd8897/sd8997) bluetooth SDIO devices
+Marvell 8897/8997 (sd8897/sd8997) bluetooth devices (SDIO or USB based)
 ------
+The 8997 devices supports multiple interfaces. When used on SDIO interfaces,
+the btmrvl driver is used and when used on USB interface, the btusb driver is
+used.
 
 Required properties:
 
   - compatible : should be one of the following:
-	* "marvell,sd8897-bt"
-	* "marvell,sd8997-bt"
+	* "marvell,sd8897-bt" (for SDIO)
+	* "marvell,sd8997-bt" (for SDIO)
+	* "usb1286,204e"      (for USB)
 
 Optional properties:
 
   - marvell,cal-data: Calibration data downloaded to the device during
 		      initialization. This is an array of 28 values(u8).
+		      This is only applicable to SDIO devices.
 
   - marvell,wakeup-pin: It represents wakeup pin number of the bluetooth chip.
 		        firmware will use the pin to wakeup host system (u16).
@@ -18,10 +23,15 @@ Optional properties:
 		      platform. The value will be configured to firmware. This
 		      is needed to work chip's sleep feature as expected (u16).
   - interrupt-parent: phandle of the parent interrupt controller
-  - interrupts : interrupt pin number to the cpu. Driver will request an irq based
-		 on this interrupt number. During system suspend, the irq will be
-		 enabled so that the bluetooth chip can wakeup host platform under
-		 certain condition. During system resume, the irq will be disabled
+  - interrupt-names: Used only for USB based devices (See below)
+  - interrupts : specifies the interrupt pin number to the cpu. For SDIO, the
+		 driver will use the first interrupt specified in the interrupt
+		 array. For USB based devices, the driver will use the interrupt
+		 named "wakeup" from the interrupt-names and interrupt arrays.
+		 The driver will request an irq based on this interrupt number.
+		 During system suspend, the irq will be enabled so that the
+		 bluetooth chip can wakeup host platform under certain
+		 conditions. During system resume, the irq will be disabled
 		 to make sure unnecessary interrupt is not received.
 
 Example:
@@ -29,7 +39,9 @@ Example:
 IRQ pin 119 is used as system wakeup source interrupt.
 wakeup pin 13 and gap 100ms are configured so that firmware can wakeup host
 using this device side pin and wakeup latency.
-calibration data is also available in below example.
+
+Example for SDIO device follows (calibration data is also available in
+below example).
 
 &mmc3 {
 	status = "okay";
@@ -54,3 +66,21 @@ calibration data is also available in below example.
 		marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
 	};
 };
+
+Example for USB device:
+
+&usb_host1_ohci {
+    status = "okay";
+    #address-cells = <1>;
+    #size-cells = <0>;
+
+    mvl_bt1: bt@1 {
+	compatible = "usb1286,204e";
+	reg = <1>;
+	interrupt-parent = <&gpio0>;
+	interrupt-names = "wakeup";
+	interrupts = <119 IRQ_TYPE_LEVEL_LOW>;
+	marvell,wakeup-pin = /bits/ 16 <0x0d>;
+	marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
+    };
+};
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 2b45f1c..91dcd8a 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -2343,6 +2343,50 @@ static int btusb_shutdown_intel(struct hci_dev *hdev)
 	return 0;
 }
 
+#ifdef CONFIG_PM
+/* Configure an out-of-band gpio as wake-up pin, if specified in device tree */
+static int marvell_config_oob_wake(struct hci_dev *hdev)
+{
+	struct sk_buff *skb;
+	struct btusb_data *data = hci_get_drvdata(hdev);
+	struct device *dev = &data->udev->dev;
+	u16 pin, gap, opcode;
+	int ret;
+	u8 cmd[5];
+
+	/* Move on if no wakeup pin specified */
+	if (of_property_read_u16(dev->of_node, "marvell,wakeup-pin", &pin) ||
+	    of_property_read_u16(dev->of_node, "marvell,wakeup-gap-ms", &gap))
+		return 0;
+
+	/* Vendor specific command to configure a GPIO as wake-up pin */
+	opcode = hci_opcode_pack(0x3F, 0x59);
+	cmd[0] = opcode & 0xFF;
+	cmd[1] = opcode >> 8;
+	cmd[2] = 2; /* length of parameters that follow */
+	cmd[3] = pin;
+	cmd[4] = gap; /* time in ms, for which wakeup pin should be asserted */
+
+	skb = bt_skb_alloc(sizeof(cmd), GFP_KERNEL);
+	if (!skb) {
+		bt_dev_err(hdev, "%s: No memory\n", __func__);
+		return -ENOMEM;
+	}
+
+	memcpy(skb_put(skb, sizeof(cmd)), cmd, sizeof(cmd));
+	hci_skb_pkt_type(skb) = HCI_COMMAND_PKT;
+
+	ret = btusb_send_frame(hdev, skb);
+	if (ret) {
+		bt_dev_err(hdev, "%s: configuration failed\n", __func__);
+		kfree_skb(skb);
+		return ret;
+	}
+
+	return 0;
+}
+#endif
+
 static int btusb_set_bdaddr_marvell(struct hci_dev *hdev,
 				    const bdaddr_t *bdaddr)
 {
@@ -2917,6 +2961,13 @@ static int btusb_probe(struct usb_interface *intf,
 	err = btusb_config_oob_wake(hdev);
 	if (err)
 		goto out_free_dev;
+
+	/* Marvell devices may need a specific chip configuration */
+	if (id->driver_info & BTUSB_MARVELL && data->oob_wake_irq) {
+		err = marvell_config_oob_wake(hdev);
+		if (err)
+			goto out_free_dev;
+	}
 #endif
 	if (id->driver_info & BTUSB_CW6622)
 		set_bit(HCI_QUIRK_BROKEN_STORED_LINK_KEY, &hdev->quirks);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH v4 2/3] Bluetooth: btusb: Add out-of-band wakeup support
From: Rajat Jain @ 2016-12-21 20:33 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Marcel Holtmann, Gustavo Padovan,
	Johan Hedberg, Amitkumar Karwar, Wei-Ning Huang, Xinming Hu,
	netdev, devicetree, linux-bluetooth, Brian Norris, linux-kernel
  Cc: Rajat Jain, rajatxjain
In-Reply-To: <1482352432-38302-1-git-send-email-rajatja@google.com>

Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
can be connected to a gpio on the CPU side, and can be used to wakeup
the host out-of-band. This can be useful in situations where the
in-band wakeup is not possible or not preferable (e.g. the in-band
wakeup may require the USB host controller to remain active, and
hence consuming more system power during system sleep).

The oob gpio interrupt to be used for wakeup on the CPU side, is
read from the device tree node, (using standard interrupt descriptors).
A devcie tree binding document is also added for the driver. The
compatible string is in compliance with
Documentation/devicetree/bindings/usb/usb-device.txt

Signed-off-by: Rajat Jain <rajatja@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
---
v4: Move the set_bit(BTUSB_OOB_WAKE_DISABLED,..) call to the beginning of
    btusb_config_oob_wake() - caught by Brian.
v3: Add Brian's "Reviewed-by"
v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
    * Leave it on device tree to specify IRQ flags (level /edge triggered)
    * Mark the device as non wakeable on exit.

 Documentation/devicetree/bindings/net/btusb.txt | 40 ++++++++++++
 drivers/bluetooth/btusb.c                       | 84 +++++++++++++++++++++++++
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/btusb.txt

diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt
new file mode 100644
index 0000000..2c0355c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -0,0 +1,40 @@
+Generic Bluetooth controller over USB (btusb driver)
+---------------------------------------------------
+
+Required properties:
+
+  - compatible : should comply with the format "usbVID,PID" specified in
+		 Documentation/devicetree/bindings/usb/usb-device.txt
+		 At the time of writing, the only OF supported devices
+		 (more may be added later) are:
+
+		  "usb1286,204e" (Marvell 8997)
+
+Optional properties:
+
+  - interrupt-parent: phandle of the parent interrupt controller
+  - interrupt-names: (see below)
+  - interrupts : The interrupt specified by the name "wakeup" is the interrupt
+		 that shall be used for out-of-band wake-on-bt. Driver will
+		 request this interrupt for wakeup. During system suspend, the
+		 irq will be enabled so that the bluetooth chip can wakeup host
+		 platform out of band. During system resume, the irq will be
+		 disabled to make sure unnecessary interrupt is not received.
+
+Example:
+
+Following example uses irq pin number 3 of gpio0 for out of band wake-on-bt:
+
+&usb_host1_ehci {
+    status = "okay";
+    #address-cells = <1>;
+    #size-cells = <0>;
+
+    mvl_bt1: bt@1 {
+	compatible = "usb1286,204e";
+	reg = <1>;
+	interrupt-parent = <&gpio0>;
+	interrupt-name = "wakeup";
+	interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
+    };
+};
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index ce22cef..2b45f1c 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -24,6 +24,8 @@
 #include <linux/module.h>
 #include <linux/usb.h>
 #include <linux/firmware.h>
+#include <linux/of_device.h>
+#include <linux/of_irq.h>
 #include <asm/unaligned.h>
 
 #include <net/bluetooth/bluetooth.h>
@@ -369,6 +371,7 @@ static const struct usb_device_id blacklist_table[] = {
 #define BTUSB_BOOTING		9
 #define BTUSB_RESET_RESUME	10
 #define BTUSB_DIAG_RUNNING	11
+#define BTUSB_OOB_WAKE_DISABLED	12
 
 struct btusb_data {
 	struct hci_dev       *hdev;
@@ -416,6 +419,8 @@ struct btusb_data {
 	int (*recv_bulk)(struct btusb_data *data, void *buffer, int count);
 
 	int (*setup_on_usb)(struct hci_dev *hdev);
+
+	int oob_wake_irq;   /* irq for out-of-band wake-on-bt */
 };
 
 static inline void btusb_free_frags(struct btusb_data *data)
@@ -2728,6 +2733,65 @@ static int btusb_bcm_set_diag(struct hci_dev *hdev, bool enable)
 }
 #endif
 
+#ifdef CONFIG_PM
+static irqreturn_t btusb_oob_wake_handler(int irq, void *priv)
+{
+	struct btusb_data *data = priv;
+
+	/* Disable only if not already disabled (keep it balanced) */
+	if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
+		disable_irq_nosync(irq);
+		disable_irq_wake(irq);
+	}
+	pm_wakeup_event(&data->udev->dev, 0);
+	return IRQ_HANDLED;
+}
+
+static const struct of_device_id btusb_match_table[] = {
+	{ .compatible = "usb1286,204e" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, btusb_match_table);
+
+/* Use an oob wakeup pin? */
+static int btusb_config_oob_wake(struct hci_dev *hdev)
+{
+	struct btusb_data *data = hci_get_drvdata(hdev);
+	struct device *dev = &data->udev->dev;
+	int irq, ret;
+
+	set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags);
+
+	if (!of_match_device(btusb_match_table, dev))
+		return 0;
+
+	/* Move on if no IRQ specified */
+	irq = of_irq_get_byname(dev->of_node, "wakeup");
+	if (irq <= 0) {
+		bt_dev_dbg(hdev, "%s: no OOB Wakeup IRQ in DT", __func__);
+		return 0;
+	}
+
+	ret = devm_request_irq(&hdev->dev, irq, btusb_oob_wake_handler,
+			       0, "OOB Wake-on-BT", data);
+	if (ret) {
+		bt_dev_err(hdev, "%s: IRQ request failed", __func__);
+		return ret;
+	}
+
+	ret = device_init_wakeup(dev, true);
+	if (ret) {
+		bt_dev_err(hdev, "%s: failed to init_wakeup\n", __func__);
+		return ret;
+	}
+
+	data->oob_wake_irq = irq;
+	disable_irq(irq);
+	bt_dev_info(hdev, "OOB Wake-on-BT configured at IRQ %u\n", irq);
+	return 0;
+}
+#endif
+
 static int btusb_probe(struct usb_interface *intf,
 		       const struct usb_device_id *id)
 {
@@ -2849,6 +2913,11 @@ static int btusb_probe(struct usb_interface *intf,
 	hdev->send   = btusb_send_frame;
 	hdev->notify = btusb_notify;
 
+#ifdef CONFIG_PM
+	err = btusb_config_oob_wake(hdev);
+	if (err)
+		goto out_free_dev;
+#endif
 	if (id->driver_info & BTUSB_CW6622)
 		set_bit(HCI_QUIRK_BROKEN_STORED_LINK_KEY, &hdev->quirks);
 
@@ -3061,6 +3130,9 @@ static void btusb_disconnect(struct usb_interface *intf)
 			usb_driver_release_interface(&btusb_driver, data->isoc);
 	}
 
+	if (data->oob_wake_irq)
+		device_init_wakeup(&data->udev->dev, false);
+
 	hci_free_dev(hdev);
 }
 
@@ -3089,6 +3161,12 @@ static int btusb_suspend(struct usb_interface *intf, pm_message_t message)
 	btusb_stop_traffic(data);
 	usb_kill_anchored_urbs(&data->tx_anchor);
 
+	if (data->oob_wake_irq && device_may_wakeup(&data->udev->dev)) {
+		clear_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags);
+		enable_irq_wake(data->oob_wake_irq);
+		enable_irq(data->oob_wake_irq);
+	}
+
 	/* Optionally request a device reset on resume, but only when
 	 * wakeups are disabled. If wakeups are enabled we assume the
 	 * device will stay powered up throughout suspend.
@@ -3126,6 +3204,12 @@ static int btusb_resume(struct usb_interface *intf)
 	if (--data->suspend_count)
 		return 0;
 
+	/* Disable only if not already disabled (keep it balanced) */
+	if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
+		disable_irq(data->oob_wake_irq);
+		disable_irq_wake(data->oob_wake_irq);
+	}
+
 	if (!test_bit(HCI_RUNNING, &hdev->flags))
 		goto done;
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH v4 1/3] Bluetooth: btusb: Use an error label for error paths
From: Rajat Jain @ 2016-12-21 20:33 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Marcel Holtmann, Gustavo Padovan,
	Johan Hedberg, Amitkumar Karwar, Wei-Ning Huang, Xinming Hu,
	netdev, devicetree, linux-bluetooth, Brian Norris, linux-kernel
  Cc: Rajat Jain, rajatxjain

Use a label to remove the repetetive cleanup, for error cases.

Signed-off-by: Rajat Jain <rajatja@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
---
v4: same as v3
v3: Added Brian's "Reviewed-by"
v2: same as v1

 drivers/bluetooth/btusb.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 2f633df..ce22cef 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -2991,18 +2991,15 @@ static int btusb_probe(struct usb_interface *intf,
 		err = usb_set_interface(data->udev, 0, 0);
 		if (err < 0) {
 			BT_ERR("failed to set interface 0, alt 0 %d", err);
-			hci_free_dev(hdev);
-			return err;
+			goto out_free_dev;
 		}
 	}
 
 	if (data->isoc) {
 		err = usb_driver_claim_interface(&btusb_driver,
 						 data->isoc, data);
-		if (err < 0) {
-			hci_free_dev(hdev);
-			return err;
-		}
+		if (err < 0)
+			goto out_free_dev;
 	}
 
 #ifdef CONFIG_BT_HCIBTUSB_BCM
@@ -3016,14 +3013,16 @@ static int btusb_probe(struct usb_interface *intf,
 #endif
 
 	err = hci_register_dev(hdev);
-	if (err < 0) {
-		hci_free_dev(hdev);
-		return err;
-	}
+	if (err < 0)
+		goto out_free_dev;
 
 	usb_set_intfdata(intf, data);
 
 	return 0;
+
+out_free_dev:
+	hci_free_dev(hdev);
+	return err;
 }
 
 static void btusb_disconnect(struct usb_interface *intf)
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCHv2] crypto: testmgr: Use heap buffer for acomp test input
From: Laura Abbott @ 2016-12-21 20:32 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Ard Biesheuvel, Giovanni Cabiddu
  Cc: Laura Abbott, linux-crypto, linux-kernel, linux-arm-kernel,
	Mark Rutland, Christopher Covington


Christopher Covington reported a crash on aarch64 on recent Fedora
kernels:

kernel BUG at ./include/linux/scatterlist.h:140!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162
Hardware name: linux,dummy-virt (DT)
task: ffff80007c650080 task.stack: ffff800008910000
PC is at sg_init_one+0xa0/0xb8
LR is at sg_init_one+0x24/0xb8
...
[<ffff000008398db8>] sg_init_one+0xa0/0xb8
[<ffff000008350a44>] test_acomp+0x10c/0x438
[<ffff000008350e20>] alg_test_comp+0xb0/0x118
[<ffff00000834f28c>] alg_test+0x17c/0x2f0
[<ffff00000834c6a4>] cryptomgr_test+0x44/0x50
[<ffff0000080dac70>] kthread+0xf8/0x128
[<ffff000008082ec0>] ret_from_fork+0x10/0x50

The test vectors used for input are part of the kernel image. These
inputs are passed as a buffer to sg_init_one which eventually blows up
with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
false for the kernel image since virt_to_page will not return the
correct page. Fix this by copying the input vectors to heap buffer
before setting up the scatterlist.

Reported-by: Christopher Covington <cov@codeaurora.org>
Fixes: d7db7a882deb ("crypto: acomp - update testmgr with support for acomp")
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 crypto/testmgr.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index f616ad7..44e888b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1461,16 +1461,25 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 	for (i = 0; i < ctcount; i++) {
 		unsigned int dlen = COMP_BUF_SIZE;
 		int ilen = ctemplate[i].inlen;
+		void *input_vec;
 
+		input_vec = kmalloc(ilen, GFP_KERNEL);
+		if (!input_vec) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		memcpy(input_vec, ctemplate[i].input, ilen);
 		memset(output, 0, dlen);
 		init_completion(&result.completion);
-		sg_init_one(&src, ctemplate[i].input, ilen);
+		sg_init_one(&src, input_vec, ilen);
 		sg_init_one(&dst, output, dlen);
 
 		req = acomp_request_alloc(tfm);
 		if (!req) {
 			pr_err("alg: acomp: request alloc failed for %s\n",
 			       algo);
+			kfree(input_vec);
 			ret = -ENOMEM;
 			goto out;
 		}
@@ -1483,6 +1492,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 		if (ret) {
 			pr_err("alg: acomp: compression failed on test %d for %s: ret=%d\n",
 			       i + 1, algo, -ret);
+			kfree(input_vec);
 			acomp_request_free(req);
 			goto out;
 		}
@@ -1491,6 +1501,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 			pr_err("alg: acomp: Compression test %d failed for %s: output len = %d\n",
 			       i + 1, algo, req->dlen);
 			ret = -EINVAL;
+			kfree(input_vec);
 			acomp_request_free(req);
 			goto out;
 		}
@@ -1500,26 +1511,37 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 			       i + 1, algo);
 			hexdump(output, req->dlen);
 			ret = -EINVAL;
+			kfree(input_vec);
 			acomp_request_free(req);
 			goto out;
 		}
 
+		kfree(input_vec);
 		acomp_request_free(req);
 	}
 
 	for (i = 0; i < dtcount; i++) {
 		unsigned int dlen = COMP_BUF_SIZE;
 		int ilen = dtemplate[i].inlen;
+		void *input_vec;
+
+		input_vec = kmalloc(ilen, GFP_KERNEL);
+		if (!input_vec) {
+			ret = -ENOMEM;
+			goto out;
+		}
 
+		memcpy(input_vec, dtemplate[i].input, ilen);
 		memset(output, 0, dlen);
 		init_completion(&result.completion);
-		sg_init_one(&src, dtemplate[i].input, ilen);
+		sg_init_one(&src, input_vec, ilen);
 		sg_init_one(&dst, output, dlen);
 
 		req = acomp_request_alloc(tfm);
 		if (!req) {
 			pr_err("alg: acomp: request alloc failed for %s\n",
 			       algo);
+			kfree(input_vec);
 			ret = -ENOMEM;
 			goto out;
 		}
@@ -1532,6 +1554,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 		if (ret) {
 			pr_err("alg: acomp: decompression failed on test %d for %s: ret=%d\n",
 			       i + 1, algo, -ret);
+			kfree(input_vec);
 			acomp_request_free(req);
 			goto out;
 		}
@@ -1540,6 +1563,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 			pr_err("alg: acomp: Decompression test %d failed for %s: output len = %d\n",
 			       i + 1, algo, req->dlen);
 			ret = -EINVAL;
+			kfree(input_vec);
 			acomp_request_free(req);
 			goto out;
 		}
@@ -1549,10 +1573,12 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 			       i + 1, algo);
 			hexdump(output, req->dlen);
 			ret = -EINVAL;
+			kfree(input_vec);
 			acomp_request_free(req);
 			goto out;
 		}
 
+		kfree(input_vec);
 		acomp_request_free(req);
 	}
 
-- 
2.7.4

^ permalink raw reply related

* Re: [patch 10/10] irqchip/armada-xp: Consolidate hotplug state space
From: Thomas Gleixner @ 2016-12-21 20:27 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: LKML, Sebastian Siewior, Ingo Molnar, Peter Zijlstra,
	Marc Zyngier
In-Reply-To: <20161221212235.77e590c8@free-electrons.com>

On Wed, 21 Dec 2016, Thomas Petazzoni wrote:
> On Wed, 21 Dec 2016 20:19:57 +0100, Thomas Gleixner wrote:
> > The mpic is either the main interrupt controller or sits behind a GIC. But
> > there is no way that both variants are available on the same system.
> 
> By "both variants", you mean the MPIC acting as the main interrupt
> controller on one side, and the MPIC acting as a "cascaded" controller,
> child of the GIC on the other side ?
> 
> If that's what you meant, then indeed it's correct.

Yes. I'll rephrase that.

Thanks,

	tglx

^ permalink raw reply

* BUG/panic in ctnetlink_conntrack_event in 4.8.11
From: Chris Boot @ 2016-12-21 20:20 UTC (permalink / raw)
  To: Netfilter Development Mailing list
  Cc: Linux Kernel Network Developers, coreteam, linux-kernel

Hi all,

I've encountered this BUG three times in the last few days, though I
must admit I've only captured the trace once so far so I can't be
completely certain it was exactly this the last few times. I did not
experience this with a 4.7 kernel; it only seemed to start with 4.8.

For some background: I use conntrackd (this is an "HA" firewall pair),
plenty of IPv6, IPsec with vti6 interfaces, conntrack, some NAT on IPv4
but definitely not with IPv6.

Without further ado, here is my crash:

[147965.209318] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[147965.217347] IP: [<ffffffffb4bb8b19>] icmp6_send+0x229/0x9f0
[147965.223051] PGD 0 
[147965.225184] Oops: 0000 [#1] SMP
[147965.228424] Modules linked in: sch_fq_codel sch_htb pppoe pppox ppp_generic slhc ip6_vti ip6_tunnel tunnel6 drbg ansi_cprng seqiv esp6 xfrm4_mode_tunnel xfrm6_mode_tunnel ghash_generic gcm twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ctr des_generic cbc algif_skcipher camellia_generic camellia_x86_64 xts xcbc sha512_ssse3 sha512_generic md4 algif_hash af_alg xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo tun hmac xt_nat xt_policy xt_statistic xt_helper xt_CLASSIFY xt_recent ip6table_nat xt_dscp xt_length binfmt_misc ip6t_REJECT xt_hashlimit nf_reject_ipv6 ip6table_mangle xt_comment iptable_nat ipt_REJECT nf_reject_ipv4 xt_addrtype xt_set ip_set_hash_ip ip_set xt_connmark xt_mark iptable_mangle xt_tcpudp iptable_raw xt_CT ip6table_raw xt_multiport xt_conntrack nf_log_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp xt_NFLOG nf_nat_sip xt_LOG nf_log_ipv6 nf_nat_pptp nf_log_common nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack 8021q garp mrp stp llc dummy nfnetlink_log nfnetlink evdev kvm_amd kvm irqbypass pcspkr k10temp sp5100_tco i2c_piix4 sg shpchp acpi_cpufreq tpm_tis tpm_tis_core tpm button drbd lru_cache libcrc32c ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache dm_mod sd_mod uas usb_storage ohci_pci ehci_pci ohci_hcd ehci_hcd usbcore ahci libahci libata scsi_mod usb_common r8169 mii
[147965.409769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-2-amd64 #1 Debian 4.8.11-1
[147965.417773] Hardware name: PC Engines APU, BIOS SageBios_PCEngines_APU-45 04/05/2014
[147965.425607] task: ffff96d2d940aec0 task.stack: ffff96d2d9410000
[147965.431622] RIP: 0010:[<ffffffffb4bb8b19>]  [<ffffffffb4bb8b19>] icmp6_send+0x229/0x9f0
[147965.439742] RSP: 0018:ffff96d2ded03d30  EFLAGS: 00010246
[147965.445150] RAX: 0000000000000000 RBX: ffff96d2861e1700 RCX: 0000000000000020
[147965.452377] RDX: 0000000000000001 RSI: 0000000000000200 RDI: ffff96d1cb42799e
[147965.459597] RBP: ffff96d2ded03e60 R08: 0000000000000000 R09: ffff96d299832000
[147965.466823] R10: 0000000000000001 R11: 0000000000000000 R12: ffff96d1cb427996
[147965.474042] R13: ffffffffb50da680 R14: 0000000000000000 R15: 0000000000000003
[147965.481263] FS:  0000000000000000(0000) GS:ffff96d2ded00000(0000) knlGS:0000000000000000
[147965.489444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[147965.495283] CR2: 0000000000000018 CR3: 0000000116d59000 CR4: 00000000000006e0
[147965.502501] Stack:
[147965.504607]  ffff96d291d14880 0000000000000000 ffff96d289288400 0000000000000000
[147965.512189]  0000000000000000 0000000000000000 ffff96d1cb42799e 0000000000000000
[147965.519772]  0000000000000001 ffff96d200000001 ffff96d1cb4279ae ffff96d280fb2900
[147965.527356] Call Trace:
[147965.529895]  <IRQ> 
[147965.531932]  [<ffffffffc07330df>] ? ctnetlink_conntrack_event+0x3ff/0x620 [nf_conntrack_netlink]
[147965.541005]  [<ffffffffc068e94a>] ? nf_nat_cleanup_conntrack+0xea/0x1a0 [nf_nat]
[147965.548492]  [<ffffffffb4b74653>] ? get_frag_bucket_locked+0x43/0x70
[147965.554939]  [<ffffffffc0696330>] ? nf_ct_net_init+0x130/0x130 [nf_defrag_ipv6]
[147965.562338]  [<ffffffffb4bbff78>] ? ip6_expire_frag_queue+0xf8/0x100
[147965.568787]  [<ffffffffb46e6e80>] ? call_timer_fn+0x30/0x120
[147965.574539]  [<ffffffffb46e7406>] ? run_timer_softirq+0x216/0x4b0
[147965.580728]  [<ffffffffb46f7ac0>] ? tick_sched_handle.isra.12+0x20/0x50
[147965.587434]  [<ffffffffb46f7b28>] ? tick_sched_timer+0x38/0x70
[147965.593363]  [<ffffffffb4bf2598>] ? __do_softirq+0xf8/0x290
[147965.599030]  [<ffffffffb4681abb>] ? irq_exit+0x9b/0xa0
[147965.604266]  [<ffffffffb4bf23ae>] ? smp_apic_timer_interrupt+0x3e/0x50
[147965.610884]  [<ffffffffb4bf16c2>] ? apic_timer_interrupt+0x82/0x90
[147965.617155]  <EOI> 
[147965.619184]  [<ffffffffb4ab0f06>] ? cpuidle_enter_state+0x126/0x2d0
[147965.625740]  [<ffffffffb4ab0ef3>] ? cpuidle_enter_state+0x113/0x2d0
[147965.632100]  [<ffffffffb46bd742>] ? cpu_startup_entry+0x2a2/0x350
[147965.638291]  [<ffffffffb464eddd>] ? start_secondary+0x14d/0x190
[147965.644299] Code: 8b 44 24 38 75 46 f6 c2 02 74 05 f6 c2 30 75 3c 48 8b 43 58 4c 89 44 24 20 44 89 5c 24 38 44 89 54 24 40 89 54 24 48 48 83 e0 fe <48> 8b 78 18 e8 8e 88 02 00 8b 54 24 48 41 89 c1 44 8b 54 24 40 
[147965.664907] RIP  [<ffffffffb4bb8b19>] icmp6_send+0x229/0x9f0
[147965.670686]  RSP <ffff96d2ded03d30>
[147965.674267] CR2: 0000000000000018
[147965.677683] ---[ end trace d5725bb00a2f3d6b ]---
[147965.682396] Kernel panic - not syncing: Fatal exception in interrupt
[147965.688898] Kernel Offset: 0x33600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[147965.699764] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[147965.707009] ------------[ cut here ]------------
[147965.711730] WARNING: CPU: 1 PID: 0 at /build/linux-lIgGMF/linux-4.8.11/arch/x86/kernel/smp.c:125 check_preempt_curr+0x50/0x90
[147965.723108] Modules linked in: sch_fq_codel sch_htb pppoe pppox ppp_generic slhc ip6_vti ip6_tunnel tunnel6 drbg ansi_cprng seqiv esp6 xfrm4_mode_tunnel xfrm6_mode_tunnel ghash_generic gcm twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ctr des_generic cbc algif_skcipher camellia_generic camellia_x86_64 xts xcbc sha512_ssse3 sha512_generic md4 algif_hash af_alg xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo tun hmac xt_nat xt_policy xt_statistic xt_helper xt_CLASSIFY xt_recent ip6table_nat xt_dscp xt_length binfmt_misc ip6t_REJECT xt_hashlimit nf_reject_ipv6 ip6table_mangle xt_comment iptable_nat ipt_REJECT nf_reject_ipv4 xt_addrtype xt_set ip_set_hash_ip ip_set xt_connmark xt_mark iptable_mangle xt_tcpudp iptable_raw xt_CT ip6table_raw xt_multiport xt_conntrack nf_log_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp xt_NFLOG nf_nat_sip xt_LOG nf_log_ipv6 nf_nat_pptp nf_log_common nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack 8021q garp mrp stp llc dummy nfnetlink_log nfnetlink evdev kvm_amd kvm irqbypass pcspkr k10temp sp5100_tco i2c_piix4 sg shpchp acpi_cpufreq tpm_tis tpm_tis_core tpm button drbd lru_cache libcrc32c ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache dm_mod sd_mod uas usb_storage ohci_pci ehci_pci ohci_hcd ehci_hcd usbcore ahci libahci libata scsi_mod usb_common r8169 mii
[147965.904454] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.8.0-2-amd64 #1 Debian 4.8.11-1
[147965.913671] Hardware name: PC Engines APU, BIOS SageBios_PCEngines_APU-45 04/05/2014
[147965.921496]  0000000000000086 2c0493b5e5995666 ffffffffb49269f5 0000000000000000
[147965.929080]  0000000000000000 ffffffffb467c16e ffff96d2dec18180 ffff96d2c17fee40
[147965.936662]  ffff96d2dec18180 0000000000000004 0000000000000046 ffff96d2dec18180
[147965.944244] Call Trace:
[147965.946786]  <IRQ>  [<ffffffffb49269f5>] ? dump_stack+0x5c/0x77
[147965.952830]  [<ffffffffb467c16e>] ? __warn+0xbe/0xe0
[147965.957893]  [<ffffffffb46a4720>] ? check_preempt_curr+0x50/0x90
[147965.963993]  [<ffffffffb46a4774>] ? ttwu_do_wakeup+0x14/0xe0
[147965.969745]  [<ffffffffb46a5441>] ? try_to_wake_up+0x191/0x3a0
[147965.975675]  [<ffffffffb46bce93>] ? autoremove_wake_function+0x13/0x40
[147965.982293]  [<ffffffffb46bc76e>] ? __wake_up_common+0x4e/0x90
[147965.988221]  [<ffffffffb46bc7e4>] ? __wake_up+0x34/0x50
[147965.993545]  [<ffffffffb475b943>] ? irq_work_run_list+0x43/0x70
[147965.999557]  [<ffffffffb46318da>] ? smp_irq_work_interrupt+0x2a/0x30
[147966.006005]  [<ffffffffb4bf2202>] ? irq_work_interrupt+0x82/0x90
[147966.012108]  [<ffffffffb477a7b4>] ? panic+0x1e6/0x226
[147966.017254]  [<ffffffffb477a7ad>] ? panic+0x1df/0x226
[147966.022404]  [<ffffffffb462fa42>] ? oops_end+0xc2/0xd0
[147966.027635]  [<ffffffffb46651e8>] ? no_context+0x128/0x370
[147966.033217]  [<ffffffffb46a56d0>] ? wake_up_q+0x60/0x60
[147966.038538]  [<ffffffffb4bf0e58>] ? page_fault+0x28/0x30
[147966.043948]  [<ffffffffb4bb8b19>] ? icmp6_send+0x229/0x9f0
[147966.049532]  [<ffffffffc07330df>] ? ctnetlink_conntrack_event+0x3ff/0x620 [nf_conntrack_netlink]
[147966.058404]  [<ffffffffc068e94a>] ? nf_nat_cleanup_conntrack+0xea/0x1a0 [nf_nat]
[147966.065888]  [<ffffffffb4b74653>] ? get_frag_bucket_locked+0x43/0x70
[147966.072340]  [<ffffffffc0696330>] ? nf_ct_net_init+0x130/0x130 [nf_defrag_ipv6]
[147966.079738]  [<ffffffffb4bbff78>] ? ip6_expire_frag_queue+0xf8/0x100
[147966.086186]  [<ffffffffb46e6e80>] ? call_timer_fn+0x30/0x120
[147966.091941]  [<ffffffffb46e7406>] ? run_timer_softirq+0x216/0x4b0
[147966.098127]  [<ffffffffb46f7ac0>] ? tick_sched_handle.isra.12+0x20/0x50
[147966.104835]  [<ffffffffb46f7b28>] ? tick_sched_timer+0x38/0x70
[147966.110764]  [<ffffffffb4bf2598>] ? __do_softirq+0xf8/0x290
[147966.116431]  [<ffffffffb4681abb>] ? irq_exit+0x9b/0xa0
[147966.121664]  [<ffffffffb4bf23ae>] ? smp_apic_timer_interrupt+0x3e/0x50
[147966.128285]  [<ffffffffb4bf16c2>] ? apic_timer_interrupt+0x82/0x90
[147966.134557]  <EOI>  [<ffffffffb4ab0f06>] ? cpuidle_enter_state+0x126/0x2d0
[147966.141555]  [<ffffffffb4ab0ef3>] ? cpuidle_enter_state+0x113/0x2d0
[147966.147916]  [<ffffffffb46bd742>] ? cpu_startup_entry+0x2a2/0x350
[147966.154103]  [<ffffffffb464eddd>] ? start_secondary+0x14d/0x190
[147966.160117] ---[ end trace d5725bb00a2f3d6c ]---

Regards,
Chris

-- 
Chris Boot
bootc@bootc.net

^ permalink raw reply

* Re: [Patch v11 4/5] x86/cpufeature: enable RING3MWAIT for Knights Landing
From: Thomas Gleixner @ 2016-12-21 20:24 UTC (permalink / raw)
  To: Grzegorz Andrejczuk; +Cc: mingo, hpa, x86, linux-kernel, Piotr.Luc, dave.hansen
In-Reply-To: <1482258687-4582-1-git-send-email-grzegorz.andrejczuk@intel.com>

On Tue, 20 Dec 2016, Grzegorz Andrejczuk wrote:

So how am I supposed to know which version of these patches is the right
one? Both subject lines are identical. 

> Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200
> codenamed Knights Landing.
> 
> The patch:

>From your cover letter:

     Removed "This patch" from commit messages

Why is 'The patch any better' ?

> - Sets CPU feature X86_FEATURE_RING3MWAIT.
> - Sets HWCAP2_RING3MWAIT bit in ELF_HWCAP2.
> - Adds the ring3mwait=disable command line parameter.
> - Sets bit 1 of the MSR MISC_FEATURE_ENABLES or clears it when
>   the ring3mwait=disable command line parameter is used.

Changelogs should not describe WHAT the patch is doing. We can see that
from the patch. Changelogs should describe the WHY and CONCEPTS not
implementation details.

> +static void probe_xeon_phi_r3mwait(struct cpuinfo_x86 *c)
> +{
> +	/*
> +	 * Ring 3 MONITOR/MWAIT feature cannot be detected without
> +	 * cpu model and family comparison.
> +	 */
> +	if (c->x86 != 6 || c->x86_model != INTEL_FAM6_XEON_PHI_KNL)
> +		return;
> +
> +	if (ring3mwait_disabled) {
> +		msr_clear_bit(MSR_MISC_FEATURE_ENABLES,
> +			      MSR_MISC_FEATURE_ENABLES_RING3MWAIT_BIT);
> +		return;
> +	}
> +
> +	msr_set_bit(MSR_MISC_FEATURE_ENABLES,
> +		    MSR_MISC_FEATURE_ENABLES_RING3MWAIT_BIT);
> +	set_cpu_cap(c, X86_FEATURE_RING3MWAIT);
> +	set_bit(HWCAP2_RING3MWAIT, (unsigned long *)&ELF_HWCAP2);

>From your cover letter:

     "Removed warning from 32-bit build"

First of all, the warning

   arch/x86/include/asm/bitops.h:72:1: note: expected 'volatile long unsigned int *'
but argument is of type 'unsigned int *'
    set_bit(long nr, volatile unsigned long *addr)

is not at all 32bit specific.

Handing an unsigned int pointer to a function which expects a unsigned long
is even more wrong on 64bit.

So now for your 'removal fix': It's just as sloppy as anything else what
I've seen from you before.

Handing a typecasted unsigned int pointer to a function which expects an
unsigned long pointer is just broken and a clear sign of careless
tinkering.

The only reason why this 'works' is because x86 is a little endian
architecture and the bit number is a constant and set_bit gets translated
it into:

    orb 0x02, 0x0(%rip) 

Now if you look really close to that disassembly then you might notice,
that this sets bit 1 and not as you tell in patch 2/5:

   "Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose
    the ring 3 MONITOR/MWAIT."

So why does it not set bit 0?

Simply because you hand in HWCAP2_RING3MWAIT as bit number, which is
defined as:

+#define HWCAP2_RING3MWAIT              (1 << 0)

Crap, crap, crap.

What's so !$@&*(? wrong with doing the simple, obvious and correct:

       ELF_HWCAP2 |= HWCAP2_RING3MWAIT;

C is really hard, right?

Yours grumpy

      tglx



       

^ permalink raw reply

* Re: [PATCH v11 1/5] x86/msr: add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit
From: Thomas Gleixner @ 2016-12-21 20:23 UTC (permalink / raw)
  To: Grzegorz Andrejczuk; +Cc: mingo, hpa, x86, linux-kernel, Piotr.Luc, dave.hansen
In-Reply-To: <1482241726-27310-2-git-send-email-grzegorz.andrejczuk@intel.com>

On Tue, 20 Dec 2016, Grzegorz Andrejczuk wrote:
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 78f3760..55ffae0 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -539,6 +539,12 @@
>  #define MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE_BIT	39
>  #define MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE		(1ULL << MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE_BIT)
>  
> +/* MISC_FEATURE_ENABLES non-architectural features */
> +#define MSR_MISC_FEATURE_ENABLES		0x00000140
> +
> +#define MSR_MISC_FEATURE_ENABLES_RING3MWAIT_BIT		1
> +#define MSR_MISC_FEATURE_ENABLES_RING3MWAIT		(1ULL << MSR_MISC_FEATURE_ENABLES_RING3MWAIT_BIT)
> +

This last define is not used anywhere. I told you before, but addressing my
review comments completely is an unduly burden, or what?

Thanks,

	tglx

^ permalink raw reply

* [PATCH 2/3] KVM: nVMX: colocate guest vmcs checks
From: Radim Krčmář @ 2016-12-21 20:24 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, David Matlack
In-Reply-To: <20161221202404.24020-1-rkrcmar@redhat.com>

Few guest checks were done before entering the guest, for which there is
no reason.  Having them all in one place will be simpler.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 72 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ab65b31ce58c..050899431b5e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10499,39 +10499,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 		}
 	}
 
-	if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
-	    !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) {
-		nested_vmx_entry_failure(vcpu, vmcs12,
-			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-		return 1;
-	}
-	if (vmcs12->vmcs_link_pointer != -1ull) {
-		nested_vmx_entry_failure(vcpu, vmcs12,
-			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_VMCS_LINK_PTR);
-		return 1;
-	}
-
-	/*
-	 * If the load IA32_EFER VM-entry control is 1, the following checks
-	 * are performed on the field for the IA32_EFER MSR:
-	 * - Bits reserved in the IA32_EFER MSR must be 0.
-	 * - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
-	 *   the IA-32e mode guest VM-exit control. It must also be identical
-	 *   to bit 8 (LME) if bit 31 in the CR0 field (corresponding to
-	 *   CR0.PG) is 1.
-	 */
-	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER) {
-		ia32e = (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) != 0;
-		if (!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer) ||
-		    ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
-		    ((vmcs12->guest_cr0 & X86_CR0_PG) &&
-		     ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME))) {
-			nested_vmx_entry_failure(vcpu, vmcs12,
-				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-			return 1;
-		}
-	}
-
 	/*
 	 * We're finally done with prerequisite checking, and can start with
 	 * the nested entry.
@@ -10561,6 +10528,45 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 
 	vmx_segment_cache_clear(vmx);
 
+	if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
+	    !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) {
+		leave_guest_mode(vcpu);
+		vmx_load_vmcs01(vcpu);
+		nested_vmx_entry_failure(vcpu, vmcs12,
+			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
+		return 1;
+	}
+	if (vmcs12->vmcs_link_pointer != -1ull) {
+		leave_guest_mode(vcpu);
+		vmx_load_vmcs01(vcpu);
+		nested_vmx_entry_failure(vcpu, vmcs12,
+			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_VMCS_LINK_PTR);
+		return 1;
+	}
+
+	/*
+	 * If the load IA32_EFER VM-entry control is 1, the following checks
+	 * are performed on the field for the IA32_EFER MSR:
+	 * - Bits reserved in the IA32_EFER MSR must be 0.
+	 * - Bit 10 (corresponding to IA32_EFER.LMA) must equal the value of
+	 *   the IA-32e mode guest VM-exit control. It must also be identical
+	 *   to bit 8 (LME) if bit 31 in the CR0 field (corresponding to
+	 *   CR0.PG) is 1.
+	 */
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER) {
+		ia32e = (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE) != 0;
+		if (!kvm_valid_efer(vcpu, vmcs12->guest_ia32_efer) ||
+		    ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
+		    ((vmcs12->guest_cr0 & X86_CR0_PG) &&
+		     ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME))) {
+			leave_guest_mode(vcpu);
+			vmx_load_vmcs01(vcpu);
+			nested_vmx_entry_failure(vcpu, vmcs12,
+				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
+			return 1;
+		}
+	}
+
 	if (prepare_vmcs02(vcpu, vmcs12, &exit_qualification)) {
 		leave_guest_mode(vcpu);
 		vmx_load_vmcs01(vcpu);
-- 
2.11.0

^ permalink raw reply related

* [PATCH 3/3] KVM: nVMX: refactor nested_vmx_entry_failure()
From: Radim Krčmář @ 2016-12-21 20:24 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, David Matlack
In-Reply-To: <20161221202404.24020-1-rkrcmar@redhat.com>

Change recurring pattern

  leave_guest_mode(vcpu);
  vmx_load_vmcs01(vcpu);
  nested_vmx_entry_failure(vcpu, ...);
  return 1;

into

  return nested_vmx_entry_failure(vcpu, ...)

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 46 ++++++++++++++++------------------------------
 1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 050899431b5e..a74cde40e349 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1398,7 +1398,7 @@ static inline bool is_nmi(u32 intr_info)
 static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 			      u32 exit_intr_info,
 			      unsigned long exit_qualification);
-static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
+static int nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
 			struct vmcs12 *vmcs12,
 			u32 reason, unsigned long qualification);
 
@@ -10529,20 +10529,13 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	vmx_segment_cache_clear(vmx);
 
 	if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
-	    !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) {
-		leave_guest_mode(vcpu);
-		vmx_load_vmcs01(vcpu);
-		nested_vmx_entry_failure(vcpu, vmcs12,
+	    !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4))
+		return nested_vmx_entry_failure(vcpu, vmcs12,
 			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-		return 1;
-	}
-	if (vmcs12->vmcs_link_pointer != -1ull) {
-		leave_guest_mode(vcpu);
-		vmx_load_vmcs01(vcpu);
-		nested_vmx_entry_failure(vcpu, vmcs12,
+
+	if (vmcs12->vmcs_link_pointer != -1ull)
+		return nested_vmx_entry_failure(vcpu, vmcs12,
 			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_VMCS_LINK_PTR);
-		return 1;
-	}
 
 	/*
 	 * If the load IA32_EFER VM-entry control is 1, the following checks
@@ -10559,32 +10552,21 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 		    ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
 		    ((vmcs12->guest_cr0 & X86_CR0_PG) &&
 		     ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME))) {
-			leave_guest_mode(vcpu);
-			vmx_load_vmcs01(vcpu);
-			nested_vmx_entry_failure(vcpu, vmcs12,
+			return nested_vmx_entry_failure(vcpu, vmcs12,
 				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-			return 1;
 		}
 	}
 
-	if (prepare_vmcs02(vcpu, vmcs12, &exit_qualification)) {
-		leave_guest_mode(vcpu);
-		vmx_load_vmcs01(vcpu);
-		nested_vmx_entry_failure(vcpu, vmcs12,
+	if (prepare_vmcs02(vcpu, vmcs12, &exit_qualification))
+		return nested_vmx_entry_failure(vcpu, vmcs12,
 				EXIT_REASON_INVALID_STATE, exit_qualification);
-		return 1;
-	}
 
 	msr_entry_idx = nested_vmx_load_msr(vcpu,
 					    vmcs12->vm_entry_msr_load_addr,
 					    vmcs12->vm_entry_msr_load_count);
-	if (msr_entry_idx) {
-		leave_guest_mode(vcpu);
-		vmx_load_vmcs01(vcpu);
-		nested_vmx_entry_failure(vcpu, vmcs12,
+	if (msr_entry_idx)
+		return nested_vmx_entry_failure(vcpu, vmcs12,
 				EXIT_REASON_MSR_LOAD_FAIL, msr_entry_idx);
-		return 1;
-	}
 
 	vmcs12->launch_state = 1;
 
@@ -11173,16 +11155,20 @@ static void vmx_leave_nested(struct kvm_vcpu *vcpu)
  * It should only be called before L2 actually succeeded to run, and when
  * vmcs01 is current (it doesn't leave_guest_mode() or switch vmcss).
  */
-static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
+static int nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
 			struct vmcs12 *vmcs12,
 			u32 reason, unsigned long qualification)
 {
+	leave_guest_mode(vcpu);
+	vmx_load_vmcs01(vcpu);
 	load_vmcs12_host_state(vcpu, vmcs12);
 	vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
 	vmcs12->exit_qualification = qualification;
 	nested_vmx_succeed(vcpu);
 	if (enable_shadow_vmcs)
 		to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
+
+	return 1;
 }
 
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,
-- 
2.11.0

^ permalink raw reply related

* [PATCH 1/3] KVM: nVMX: fix handling of invalid host efer
From: Radim Krčmář @ 2016-12-21 20:24 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, David Matlack
In-Reply-To: <20161221202404.24020-1-rkrcmar@redhat.com>

Host state must be checked before attempting vm entry, which means it
throws a different error.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c9d7f1a1ba16..ab65b31ce58c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10481,6 +10481,24 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 		goto out;
 	}
 
+	/*
+	 * If the load IA32_EFER VM-exit control is 1, bits reserved in the
+	 * IA32_EFER MSR must be 0 in the field for that register. In addition,
+	 * the values of the LMA and LME bits in the field must each be that of
+	 * the host address-space size VM-exit control.
+	 */
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) {
+		ia32e = (vmcs12->vm_exit_controls &
+			 VM_EXIT_HOST_ADDR_SPACE_SIZE) != 0;
+		if (!kvm_valid_efer(vcpu, vmcs12->host_ia32_efer) ||
+		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA) ||
+		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LME)) {
+			nested_vmx_failValid(vcpu,
+				VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
+			goto out;
+		}
+	}
+
 	if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
 	    !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) {
 		nested_vmx_entry_failure(vcpu, vmcs12,
@@ -10515,24 +10533,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	}
 
 	/*
-	 * If the load IA32_EFER VM-exit control is 1, bits reserved in the
-	 * IA32_EFER MSR must be 0 in the field for that register. In addition,
-	 * the values of the LMA and LME bits in the field must each be that of
-	 * the host address-space size VM-exit control.
-	 */
-	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) {
-		ia32e = (vmcs12->vm_exit_controls &
-			 VM_EXIT_HOST_ADDR_SPACE_SIZE) != 0;
-		if (!kvm_valid_efer(vcpu, vmcs12->host_ia32_efer) ||
-		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA) ||
-		    ia32e != !!(vmcs12->host_ia32_efer & EFER_LME)) {
-			nested_vmx_entry_failure(vcpu, vmcs12,
-				EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
-			return 1;
-		}
-	}
-
-	/*
 	 * We're finally done with prerequisite checking, and can start with
 	 * the nested entry.
 	 */
-- 
2.11.0

^ permalink raw reply related

* [PATCH 0/3] KVM: nVMX: fix one guest vmcs check and refactor a bunch more
From: Radim Krčmář @ 2016-12-21 20:24 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, David Matlack

v1:
  Applies after David's "KVM: nVMX: fix instruction skipping during
  emulated vm-entry".


Radim Krčmář (3):
  KVM: nVMX: fix handling of invalid host efer
  KVM: nVMX: colocate guest vmcs checks
  KVM: nVMX: refactor nested_vmx_entry_failure()

 arch/x86/kvm/vmx.c | 92 +++++++++++++++++++++++++-----------------------------
 1 file changed, 42 insertions(+), 50 deletions(-)

-- 
2.11.0

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox