All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] tools/memory-model: document the "one-time init" pattern
From: Dave Chinner @ 2020-07-18  1:42 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-arch, Paul E . McKenney, linux-fsdevel,
	Akira Yokosawa, Alan Stern, Andrea Parri, Boqun Feng,
	Daniel Lustig, Darrick J . Wong, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Peter Zijlstra, Will Deacon
In-Reply-To: <20200717044427.68747-1-ebiggers@kernel.org>

On Thu, Jul 16, 2020 at 09:44:27PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> The "one-time init" pattern is implemented incorrectly in various places
> in the kernel.  And when people do try to implement it correctly, it is
> unclear what to use.  Try to give some proper guidance.
> 
> This is motivated by the discussion at
> https://lkml.kernel.org/linux-fsdevel/20200713033330.205104-1-ebiggers@kernel.org/T/#u
> regarding fixing the initialization of super_block::s_dio_done_wq.

You're still using words that the target audience of the
documentation will not understand.

This is known as the "curse of knowledge" cognative bias, where
subject matter experts try to explain something to non-experts using
terms only subject matter experts understand....

This is one of the reasons that the LKMM documetnation is so damn
difficult to read and understand: just understanding the vocabulary
it uses requires a huge learning curve, and it's not defined
anywhere. Understanding the syntax of examples requires a huge
learning curve, because it's not defined anywhere. 

Recipes are *not useful* if you need to understand the LKMM
documenation to select the correct recipe to use. Recipes are not
useful if you say "here's 5 different variations of the same thing,
up to you to understand which one you need to use". Recipes are not
useful if changes in other code can silently break the recipe that
was selected by the user by carefully considering the most optimal
variant at the time they selected it.

i.e. Recipes are not for experts who understand the LKMM - recipes
are for developers who don't really understand how the LKMM all
works and just want a single, solid, reliable pattern they can use
just about everywhere for that specific operation.

Performance and optimisation doesn't even enter the picture here -
we need to provide a simple, easy to use and understand pattern that
just works. We need to stop making this harder than it should be.

So....

> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  tools/memory-model/Documentation/recipes.txt | 151 +++++++++++++++++++
>  1 file changed, 151 insertions(+)
> 
> diff --git a/tools/memory-model/Documentation/recipes.txt b/tools/memory-model/Documentation/recipes.txt
> index 7fe8d7aa3029..04beb06dbfc7 100644
> --- a/tools/memory-model/Documentation/recipes.txt
> +++ b/tools/memory-model/Documentation/recipes.txt
> @@ -519,6 +519,157 @@ CPU1 puts the waiting task to sleep and CPU0 fails to wake it up.
>  
>  Note that use of locking can greatly simplify this pattern.
>  
> +One-time init
> +-------------
> +
> +The "one-time init" pattern is when multiple tasks can race to
> +initialize the same data structure(s) on first use.
> +
> +In many cases, it's best to just avoid the need for this by simply
> +initializing the data ahead of time.
> +
> +But in cases where the data would often go unused, one-time init can be
> +appropriate to avoid wasting kernel resources.  It can also be
> +appropriate if the initialization has other prerequisites which preclude
> +it being done ahead of time.
> +
> +First, consider if your data has (a) global or static scope, (b) can be
> +initialized from atomic context, and (c) cannot fail to be initialized.
> +If all of those apply, just use DO_ONCE() from <linux/once.h>:
> +
> +	DO_ONCE(func);
> +
> +If that doesn't apply, you'll have to implement one-time init yourself.
> +
> +The simplest implementation just uses a mutex and an 'inited' flag.
> +This implementation should be used where feasible:
> +
> +	static bool foo_inited;
> +	static DEFINE_MUTEX(foo_init_mutex);
> +
> +	int init_foo_if_needed(void)
> +	{
> +		int err = 0;
> +
> +		mutex_lock(&foo_init_mutex);
> +		if (!foo_inited) {
> +			err = init_foo();
> +			if (err == 0)
> +				foo_inited = true;
> +		}
> +		mutex_unlock(&foo_init_mutex);
> +		return err;
> +	}
> +
> +The above example uses static variables, but this solution also works
> +for initializing something that is part of another data structure.  The
> +mutex may still be static.

All good up to here - people will see this and understand that this
is the pattern they want to use, and DO_ONCE() is a great, simple
API that is easy to use.

What needs to follow is a canonical example of how to do it
locklessly and efficiently, without describing conditional use of it
using words like "initialised memory is transitively reachable" (I
don't know WTF that means!).  Don't discuss potential optimisations,
control flow/data dependencies, etc, because failing to understand
those details are the reason people are looking for a simple recipe
that does what they need in the first place ("curse of knowledge").

However, I think the whole problem around code like this is that it
is being open-coded and that is the reason people get it wrong.
Hence I agree with Willy that this needs to be wrapped in a simple,
easy to use and hard to get wrong APIs for the patterns we expect to
see people use.

And the recipes should doucment the use of that API for the
init-once pattern, not try to teach people how to open-code their
own init-once pattern that they will continue to screw up....

As a result, I think the examples should document correct use of the
API for the two main variants it would be used for. The first
variant has an external "inited" flag that handles multiple
structure initialisations, and the second variant handles allocation
and initialisation of a single structure that is stored and accessed
by a single location.

Work out an API to do these things correctly, then write the recipes
to use them. Then people like yourself can argue all day and night
on how to best optimise them, and people like myself can just ignore
that all knowing that my init_once() call will always do the right
thing.

> +For the single-pointer case, a further optimized implementation
> +eliminates the mutex and instead uses compare-and-exchange:
> +
> +	static struct foo *foo;
> +
> +	int init_foo_if_needed(void)
> +	{
> +		struct foo *p;
> +
> +		/* pairs with successful cmpxchg_release() below */
> +		if (smp_load_acquire(&foo))
> +			return 0;
> +
> +		p = alloc_foo();
> +		if (!p)
> +			return -ENOMEM;
> +
> +		/* on success, pairs with smp_load_acquire() above and below */
> +		if (cmpxchg_release(&foo, NULL, p) != NULL) {
> +			free_foo(p);
> +			/* pairs with successful cmpxchg_release() above */
> +			smp_load_acquire(&foo);

This is the failure path, not the success. So it describing it as
pairing with "successful cmpxchg_release() above" when the code that
is executing is in the path where  the cmpxchg_release() above -just
failed- doesn't help anyone understand exactly what it is pairing
with.

This is why I said in that other thread "just saying 'pairs with
<foo>' is not sufficient to explain to the reader exactly what the
memory barrier is doing. You needed two full paragraphs to explain
why this was put here and that, to me, indicate the example and
expected use case is wrong.

But even then, I think this example is incorrect and doesn't fit the
patterns people might expect. That is, if this init function
-returned foo for the caller to use-, then the smp_load_acquire() on
failure is necessary to ensure the initialisation done by the racing
context is correct seen. But this function doesn't return foo, and
so the smp_load_acquire() is required in whatever context is trying
to access the contents of foo, not the init function.

Hence I think this example is likely incorrect and will lead to bugs
because it does not, in any way, indicate that
smp_load_acquire(&foo) must always be used in the contexts where foo
may accessed before (or during) the init function has been run...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply

* Re: [PATCH] Fix memory overwriting issue when copy an address to user space
From: David Miller @ 2020-07-18  1:43 UTC (permalink / raw)
  To: lebon.zhou; +Cc: kuba, linux-kernel
In-Reply-To: <CAEQHRfB2f4x2r9Sk1+ixAFjUTcFQArv9wtWfjq3igGfUgZBhXw@mail.gmail.com>

From: lebon zhou <lebon.zhou@gmail.com>
Date: Fri, 17 Jul 2020 10:31:54 +0000

> When application provided buffer size less than sockaddr_storage, then
> kernel will overwrite some memory area which may cause memory corruption,
> e.g.: in recvmsg case, let msg_name=malloc(8) and msg_namelen=8, then
> usually application can call recvmsg successful but actually application
> memory get corrupted.
> 
> Fix to return EINVAL when application buffer size less than
> sockaddr_storage.
> 
> Signed-off-by: lebon.zhou <lebon.zhou@gmail.com>

Please post networking fixes to netdev@vger.kernel.org

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: bna: Remove unused variable 't'
From: David Miller @ 2020-07-18  1:43 UTC (permalink / raw)
  To: zhangchangzhong
  Cc: rmody, skalluru, GR-Linux-NIC-Dev, kuba, netdev, linux-kernel
In-Reply-To: <1594981384-30489-1-git-send-email-zhangchangzhong@huawei.com>

From: Zhang Changzhong <zhangchangzhong@huawei.com>
Date: Fri, 17 Jul 2020 18:23:04 +0800

> Gcc report warning as follows:
> 
> drivers/net/ethernet/brocade/bna/bfa_ioc.c:1538:6: warning:
>  variable 't' set but not used [-Wunused-but-set-variable]
>  1538 |  u32 t;
>       |      ^
> 
> After commit c107ba171f3d ("bna: Firmware Patch Simplification"),
> 't' is never used, so removing it to avoid build warning.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: ethernet: et131x: Remove unused variable 'pm_csr'
From: David Miller @ 2020-07-18  1:43 UTC (permalink / raw)
  To: zhangchangzhong; +Cc: mark.einon, kuba, netdev, linux-kernel
In-Reply-To: <1594982010-30679-1-git-send-email-zhangchangzhong@huawei.com>

From: Zhang Changzhong <zhangchangzhong@huawei.com>
Date: Fri, 17 Jul 2020 18:33:30 +0800

> Gcc report warning as follows:
> 
> drivers/net/ethernet/agere/et131x.c:953:6: warning:
>  variable 'pm_csr' set but not used [-Wunused-but-set-variable]
>   953 |  u32 pm_csr;
>       |      ^~~~~~
> drivers/net/ethernet/agere/et131x.c:1002:6:warning:
>  variable 'pm_csr' set but not used [-Wunused-but-set-variable]
>  1002 |  u32 pm_csr;
>       |      ^~~~~~
> drivers/net/ethernet/agere/et131x.c:3446:8: warning:
>  variable 'pm_csr' set but not used [-Wunused-but-set-variable]
>  3446 |    u32 pm_csr;
>       |        ^~~~~~
> 
> After commit 38df6492eb51 ("et131x: Add PCIe gigabit ethernet driver
> et131x to drivers/net"), 'pm_csr' is never used in these functions,
> so removing it to avoid build warning.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload
From: David Miller @ 2020-07-18  1:47 UTC (permalink / raw)
  To: grygorii.strashko; +Cc: netdev, kuba, m-karicheri2, nsekhar, linux-kernel
In-Reply-To: <20200717121932.26649-1-grygorii.strashko@ti.com>

From: Grygorii Strashko <grygorii.strashko@ti.com>
Date: Fri, 17 Jul 2020 15:19:32 +0300

> From: Murali Karicheri <m-karicheri2@ti.com>
> 
> Currently drive supports taprio offload which is a tc feature offloaded
> to cpsw hardware. So driver has to set the hw feature flag, NETIF_F_HW_TC
> in the net device to be compliant. This patch adds the flag.
> 
> Fixes: 8127224c2708 ("ethernet: ti: am65-cpsw-qos: add TAPRIO offload support")
> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>

How was the commit adding TAPRIO support even tested since without the
NETIF_F_HW_TC bit set tc_can_offload() always returns false?

^ permalink raw reply

* Re: [PATCH] net: ethernet: et131x: Remove redundant register read
From: David Miller @ 2020-07-18  1:48 UTC (permalink / raw)
  To: mark.einon; +Cc: netdev, linux-kernel
In-Reply-To: <20200717132135.361267-1-mark.einon@gmail.com>

From: Mark Einon <mark.einon@gmail.com>
Date: Fri, 17 Jul 2020 14:21:35 +0100

> Following the removal of an unused variable assignment (remove
> unused variable 'pm_csr') the associated register read can also go,
> as the read also occurs in the subsequent et1310_in_phy_coma()
> call.
> 
> Signed-off-by: Mark Einon <mark.einon@gmail.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH 1/2 v2] net: hsr: fix incorrect lsdu size in the tag of HSR frames for small frames
From: David Miller @ 2020-07-18  1:54 UTC (permalink / raw)
  To: m-karicheri2
  Cc: kuba, netdev, linux-kernel, nsekhar, grygorii.strashko,
	vinicius.gomes
In-Reply-To: <20200717145510.30433-1-m-karicheri2@ti.com>

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Fri, 17 Jul 2020 10:55:09 -0400

> For small Ethernet frames with size less than minimum size 66 for HSR
> vs 60 for regular Ethernet frames, hsr driver currently doesn't pad the
> frame to make it minimum size. This results in incorrect LSDU size being
> populated in the HSR tag for these frames. Fix this by padding the frame
> to the minimum size applicable for HSR.
> 
> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2 v2] net: hsr: validate address B before copying to skb
From: David Miller @ 2020-07-18  1:54 UTC (permalink / raw)
  To: m-karicheri2
  Cc: kuba, netdev, linux-kernel, nsekhar, grygorii.strashko,
	vinicius.gomes
In-Reply-To: <20200717145510.30433-2-m-karicheri2@ti.com>

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Fri, 17 Jul 2020 10:55:10 -0400

> Validate MAC address before copying the same to outgoing frame
> skb destination address. Since a node can have zero mac
> address for Link B until a valid frame is received over
> that link, this fix address the issue of a zero MAC address
> being in the packet.
> 
> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH v3 1/7] hsr: enhance netlink socket interface to support PRP
From: David Miller @ 2020-07-18  1:56 UTC (permalink / raw)
  To: m-karicheri2
  Cc: kuba, netdev, linux-kernel, linux-api, nsekhar, grygorii.strashko,
	vinicius.gomes
In-Reply-To: <20200717151511.329-2-m-karicheri2@ti.com>

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Fri, 17 Jul 2020 11:15:05 -0400

> @@ -32,7 +33,9 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev,
>  		       struct netlink_ext_ack *extack)
>  {
>  	struct net_device *link[2];
> -	unsigned char multicast_spec, hsr_version;
> +	unsigned char multicast_spec;
> +	enum hsr_version proto_version;
> +	u8 proto = HSR_PROTOCOL_HSR;

Please use reverse christmas tree ordering for local variables.

Thank you.

^ permalink raw reply

* Re: [net-next PATCH v3 2/7] net: hsr: introduce common code for skb initialization
From: David Miller @ 2020-07-18  1:56 UTC (permalink / raw)
  To: m-karicheri2
  Cc: kuba, netdev, linux-kernel, linux-api, nsekhar, grygorii.strashko,
	vinicius.gomes
In-Reply-To: <20200717151511.329-3-m-karicheri2@ti.com>

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Fri, 17 Jul 2020 11:15:06 -0400

> +static void send_hsr_supervision_frame(struct hsr_port *master,
> +				       u8 type, u8 hsr_ver)
> +{
> +	struct sk_buff *skb;
> +	struct hsr_tag *hsr_tag;
> +	struct hsr_sup_tag *hsr_stag;
> +	struct hsr_sup_payload *hsr_sp;
> +	unsigned long irqflags;

Reverse christmas tree please.

^ permalink raw reply

* Re: [PATCH] tools/memory-model: document the "one-time init" pattern
From: Dave Chinner @ 2020-07-18  2:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Eric Biggers, Darrick J. Wong, linux-kernel, linux-arch,
	Paul E . McKenney, linux-fsdevel, Akira Yokosawa, Andrea Parri,
	Boqun Feng, Daniel Lustig, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Peter Zijlstra, Will Deacon
In-Reply-To: <20200718012555.GA1168834@rowland.harvard.edu>

On Fri, Jul 17, 2020 at 09:25:55PM -0400, Alan Stern wrote:
> On Fri, Jul 17, 2020 at 05:58:57PM -0700, Eric Biggers wrote:
> > On Fri, Jul 17, 2020 at 01:53:40PM -0700, Darrick J. Wong wrote:
> > > > +There are also cases in which the smp_load_acquire() can be replaced by
> > > > +the more lightweight READ_ONCE().  (smp_store_release() is still
> > > > +required.)  Specifically, if all initialized memory is transitively
> > > > +reachable from the pointer itself, then there is no control dependency
> > > 
> > > I don't quite understand what "transitively reachable from the pointer
> > > itself" means?  Does that describe the situation where all the objects
> > > reachable through the object that the global struct foo pointer points
> > > at are /only/ reachable via that global pointer?
> > > 
> > 
> > The intent is that "transitively reachable" means that all initialized memory
> > can be reached by dereferencing the pointer in some way, e.g. p->a->b[5]->c.
> > 
> > It could also be the case that allocating the object initializes some global or
> > static data, which isn't reachable in that way.  Access to that data would then
> > be a control dependency, which a data dependency barrier wouldn't work for.
> > 
> > It's possible I misunderstood something.  (Note the next paragraph does say that
> > using READ_ONCE() is discouraged, exactly for this reason -- it can be hard to
> > tell whether it's correct.)  Suggestions of what to write here are appreciated.
> 
> Perhaps something like this:
> 
> 	Specifically, if the only way to reach the initialized memory 
> 	involves dereferencing the pointer itself then READ_ONCE() is 
> 	sufficient.  This is because there will be an address dependency 
> 	between reading the pointer and accessing the memory, which will 
> 	ensure proper ordering.  But if some of the initialized memory 
> 	is reachable some other way (for example, if it is global or 
> 	static data) then there need not be an address dependency, 
> 	merely a control dependency (checking whether the pointer is 
> 	non-NULL).  Control dependencies do not always ensure ordering 
> 	-- certainly not for reads, and depending on the compiler, 
> 	possibly not for some writes -- and therefore a load-acquire is 
> 	necessary.

Recipes are aimed at people who simply don't understand any of that
goobledegook. This won't help them -write correct code-.

> Perhaps this is more wordy than you want, but it does get the important 
> ideas across.

You think they are important because you understand what those words
mean.  Large numbers of developers do not understand what they mean,
nor how to put them into practise correctly.

Seriously: if you want people to use this stuff correctly, you need
to -dumb it down-, not make it even more challenging by explaining
words people don't understand with yet more words they don't
understand...

This is the "curse of knowledge" cognative bias in a nutshell.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply

* Re: [PATCH] tools/memory-model: document the "one-time init" pattern
From: Eric Biggers @ 2020-07-18  2:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Darrick J. Wong, linux-kernel, linux-arch, Paul E . McKenney,
	linux-fsdevel, Akira Yokosawa, Andrea Parri, Boqun Feng,
	Daniel Lustig, Dave Chinner, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Peter Zijlstra, Will Deacon
In-Reply-To: <20200718012555.GA1168834@rowland.harvard.edu>

On Fri, Jul 17, 2020 at 09:25:55PM -0400, Alan Stern wrote:
> On Fri, Jul 17, 2020 at 05:58:57PM -0700, Eric Biggers wrote:
> > On Fri, Jul 17, 2020 at 01:53:40PM -0700, Darrick J. Wong wrote:
> > > > +There are also cases in which the smp_load_acquire() can be replaced by
> > > > +the more lightweight READ_ONCE().  (smp_store_release() is still
> > > > +required.)  Specifically, if all initialized memory is transitively
> > > > +reachable from the pointer itself, then there is no control dependency
> > > 
> > > I don't quite understand what "transitively reachable from the pointer
> > > itself" means?  Does that describe the situation where all the objects
> > > reachable through the object that the global struct foo pointer points
> > > at are /only/ reachable via that global pointer?
> > > 
> > 
> > The intent is that "transitively reachable" means that all initialized memory
> > can be reached by dereferencing the pointer in some way, e.g. p->a->b[5]->c.
> > 
> > It could also be the case that allocating the object initializes some global or
> > static data, which isn't reachable in that way.  Access to that data would then
> > be a control dependency, which a data dependency barrier wouldn't work for.
> > 
> > It's possible I misunderstood something.  (Note the next paragraph does say that
> > using READ_ONCE() is discouraged, exactly for this reason -- it can be hard to
> > tell whether it's correct.)  Suggestions of what to write here are appreciated.
> 
> Perhaps something like this:
> 
> 	Specifically, if the only way to reach the initialized memory 
> 	involves dereferencing the pointer itself then READ_ONCE() is 
> 	sufficient.  This is because there will be an address dependency 
> 	between reading the pointer and accessing the memory, which will 
> 	ensure proper ordering.  But if some of the initialized memory 
> 	is reachable some other way (for example, if it is global or 
> 	static data) then there need not be an address dependency, 
> 	merely a control dependency (checking whether the pointer is 
> 	non-NULL).  Control dependencies do not always ensure ordering 
> 	-- certainly not for reads, and depending on the compiler, 
> 	possibly not for some writes -- and therefore a load-acquire is 
> 	necessary.
> 
> Perhaps this is more wordy than you want, but it does get the important 
> ideas across.
> 

How about:

	There are also cases in which the smp_load_acquire() can be replaced by
	the more lightweight READ_ONCE().  (smp_store_release() is still
	required.)  Specifically, if the only way to reach the initialized
	memory involves dereferencing the pointer itself, then the data
	dependency barrier provided by READ_ONCE() is sufficient.  However, if
	some of the initialized memory is reachable some other way (for example,
	if it is global or static data) then there need not be an address
	dependency, merely a control dependency (checking whether the pointer is
	non-NULL).  READ_ONCE() is *not* sufficient in that case.

	The optimization of replacing smp_load_acquire() with READ_ONCE() is
	discouraged for nontrivial data structures, since it can be difficult to
	determine if it is correct.  In particular, for complex data structures
	the correctness of the READ_ONCE() optimization may depend on internal
	implementation details of other kernel subsystems.

^ permalink raw reply

* Re: [PATCH] git-mv: improve error message for conflicted file
From: Elijah Newren @ 2020-07-18  2:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Chris Torek via GitGitGadget, Git Mailing List, Chris Torek
In-Reply-To: <xmqqeep9d6tm.fsf@gitster.c.googlers.com>

On Fri, Jul 17, 2020 at 5:12 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > -             } else if (cache_name_pos(src, length) < 0)
> > -                     bad = _("not under version control");
> > -             else if (lstat(dst, &st) == 0 &&
> > +             } else if (cache_name_pos(src, length) < 0) {
> > +                     /*
> > +                      * This occurs for both untracked files *and*
> > +                      * files that are in merge-conflict state, so
> > +                      * let's distinguish between those two.
> > +                      */
> > +                     struct cache_entry *ce = cache_file_exists(src, length, ignore_case);
> > +                     if (ce == NULL)
> > +                             bad = _("not under version control");
> > +                     else
> > +                             bad = _("must resolve merge conflict first");
>
>
> The original did not care about the cache entry itself, and that is
> why cache_name_pos() was used.  Now you care what cache entry is at
> that position, running both calls is quite wasteful.
>
> Would it work better to declare "struct cache_entry *ce" in the
> scope that surrounds this if/elseif cascade and then rewrite this
> part more like so:
>
>         } else if (!(ce = cache_file_exists(...)) {
>                 bad = _("not tracked");
>         } else if (ce_stage(ce)) {
>                 bad = _("conflicted");
>         } else if (lstat(...)) { ...

Or, even better, make ce_stage(ce) not be an error; see
https://lore.kernel.org/git/xmqqk1ozb6qy.fsf@gitster-ct.c.googlers.com/.

I hadn't gotten around to it yet, but it's still on my radar.

(That said, I've obviously taken a really long time to get to it, so
improving the error message as an interim step is perfectly fine.)

^ permalink raw reply

* Re: [PATCH net-next 0/2] net: atlantic: add support for FW 4.x
From: David Miller @ 2020-07-18  2:01 UTC (permalink / raw)
  To: mstarovoitov; +Cc: kuba, irusskikh, netdev, linux-kernel
In-Reply-To: <20200717180147.8854-1-mstarovoitov@marvell.com>

From: Mark Starovoytov <mstarovoitov@marvell.com>
Date: Fri, 17 Jul 2020 21:01:45 +0300

> This patch set adds support for FW 4.x, which is about to get into the
> production for some products.
> 4.x is mostly compatible with 3.x, save for soft reset, which requires
> the acquisition of 2 additional semaphores.
> Other differences (e.g. absence of PTP support) are handled via
> capabilities.
> 
> Note: 4.x targets specific products only. 3.x is still the main firmware
> branch, which should be used by most users (at least for now).

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] ne2k-pci: Use netif_msg_init to initialize msg_enable bits
From: David Miller @ 2020-07-18  2:04 UTC (permalink / raw)
  To: W_Armin; +Cc: netdev
In-Reply-To: <20200717182148.GA4905@mx-linux-amd>

From: Armin Wolf <W_Armin@gmx.de>
Date: Fri, 17 Jul 2020 20:21:48 +0200

> Use netif_msg_enable() to process param settings.
> 
> Signed-off-by: Armin Wolf <W_Armin@gmx.de>

Applied.

^ permalink raw reply

* [gpio:no-irqchip-nested-threading] BUILD SUCCESS d20743e24eec787d0333d423479ea2f137496f06
From: kernel test robot @ 2020-07-18  2:05 UTC (permalink / raw)
  To: Linus Walleij; +Cc: linux-gpio

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git  no-irqchip-nested-threading
branch HEAD: d20743e24eec787d0333d423479ea2f137496f06  gpio: crystalcove: Use irqchip template

elapsed time: 722m

configs tested: 86
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                               defconfig
arm64                            allmodconfig
arm64                             allnoconfig
nds32                             allnoconfig
powerpc                      ppc64e_defconfig
arm                           viper_defconfig
ia64                             alldefconfig
sh                           se7721_defconfig
ia64                          tiger_defconfig
mips                        jmr3927_defconfig
arm                            xcep_defconfig
h8300                            allyesconfig
powerpc                 linkstation_defconfig
sparc                       sparc32_defconfig
arm                          badge4_defconfig
powerpc                      pmac32_defconfig
riscv                          rv32_defconfig
i386                             allyesconfig
i386                                defconfig
i386                              debian-10.3
i386                              allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
openrisc                            defconfig
c6x                              allyesconfig
c6x                               allnoconfig
openrisc                         allyesconfig
nds32                               defconfig
csky                             allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allmodconfig
xtensa                              defconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                              allnoconfig
mips                             allmodconfig
parisc                            allnoconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                                defconfig
sparc                            allyesconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
x86_64                    rhel-7.6-kselftests
x86_64                               rhel-8.3
x86_64                                  kexec
x86_64                                   rhel
x86_64                                    lkp
x86_64                              fedora-25

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [gpio:for-next] BUILD SUCCESS 70d7cd6c82a906bfc45e5043fed5456d46a92662
From: kernel test robot @ 2020-07-18  2:05 UTC (permalink / raw)
  To: Linus Walleij; +Cc: linux-gpio

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git  for-next
branch HEAD: 70d7cd6c82a906bfc45e5043fed5456d46a92662  Merge branch 'devel' into for-next

elapsed time: 722m

configs tested: 86
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                               defconfig
arm64                            allmodconfig
arm64                             allnoconfig
nds32                             allnoconfig
powerpc                      ppc64e_defconfig
arm                           viper_defconfig
ia64                             alldefconfig
sh                           se7721_defconfig
ia64                          tiger_defconfig
mips                        jmr3927_defconfig
arm                            xcep_defconfig
h8300                            allyesconfig
powerpc                 linkstation_defconfig
sparc                       sparc32_defconfig
arm                          badge4_defconfig
powerpc                      pmac32_defconfig
riscv                          rv32_defconfig
i386                             allyesconfig
i386                                defconfig
i386                              debian-10.3
i386                              allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
openrisc                            defconfig
c6x                              allyesconfig
c6x                               allnoconfig
openrisc                         allyesconfig
nds32                               defconfig
csky                             allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allmodconfig
xtensa                              defconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                              allnoconfig
mips                             allmodconfig
parisc                            allnoconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                                defconfig
sparc                            allyesconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
x86_64                    rhel-7.6-kselftests
x86_64                               rhel-8.3
x86_64                                  kexec
x86_64                                   rhel
x86_64                                    lkp
x86_64                              fedora-25

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [gpio:devel] BUILD SUCCESS 80606cb24161d504acb4d89f406d68f72196575e
From: kernel test robot @ 2020-07-18  2:05 UTC (permalink / raw)
  To: Linus Walleij; +Cc: linux-gpio

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git  devel
branch HEAD: 80606cb24161d504acb4d89f406d68f72196575e  gpio: max77620: Use helper variable and clarify

elapsed time: 722m

configs tested: 86
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                               defconfig
arm64                            allmodconfig
arm64                             allnoconfig
nds32                             allnoconfig
powerpc                      ppc64e_defconfig
arm                           viper_defconfig
ia64                             alldefconfig
sh                           se7721_defconfig
ia64                          tiger_defconfig
mips                        jmr3927_defconfig
arm                            xcep_defconfig
h8300                            allyesconfig
powerpc                 linkstation_defconfig
sparc                       sparc32_defconfig
arm                          badge4_defconfig
powerpc                      pmac32_defconfig
riscv                          rv32_defconfig
i386                             allyesconfig
i386                                defconfig
i386                              debian-10.3
i386                              allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
openrisc                            defconfig
c6x                              allyesconfig
c6x                               allnoconfig
openrisc                         allyesconfig
nds32                               defconfig
csky                             allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allmodconfig
xtensa                              defconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                              allnoconfig
mips                             allmodconfig
parisc                            allnoconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                                defconfig
sparc                            allyesconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
x86_64                    rhel-7.6-kselftests
x86_64                               rhel-8.3
x86_64                                  kexec
x86_64                                   rhel
x86_64                                    lkp
x86_64                              fedora-25

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [gpio:gpio-descriptors-usb] BUILD SUCCESS b84efced84146e09ba546dcbbdf57cc9370f781c
From: kernel test robot @ 2020-07-18  2:05 UTC (permalink / raw)
  To: Linus Walleij; +Cc: linux-gpio

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git  gpio-descriptors-usb
branch HEAD: b84efced84146e09ba546dcbbdf57cc9370f781c  usb: ohci-omap: Convert to use GPIO descriptors

elapsed time: 722m

configs tested: 86
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                                 defconfig
arm                              allyesconfig
arm                              allmodconfig
arm                               allnoconfig
arm64                            allyesconfig
arm64                               defconfig
arm64                            allmodconfig
arm64                             allnoconfig
nds32                             allnoconfig
powerpc                      ppc64e_defconfig
arm                           viper_defconfig
ia64                             alldefconfig
sh                           se7721_defconfig
ia64                          tiger_defconfig
mips                        jmr3927_defconfig
arm                            xcep_defconfig
h8300                            allyesconfig
powerpc                 linkstation_defconfig
sparc                       sparc32_defconfig
arm                          badge4_defconfig
powerpc                      pmac32_defconfig
riscv                          rv32_defconfig
i386                             allyesconfig
i386                                defconfig
i386                              debian-10.3
i386                              allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                              allnoconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                              allnoconfig
m68k                           sun3_defconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nios2                            allyesconfig
openrisc                            defconfig
c6x                              allyesconfig
c6x                               allnoconfig
openrisc                         allyesconfig
nds32                               defconfig
csky                             allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allmodconfig
xtensa                              defconfig
arc                                 defconfig
arc                              allyesconfig
sh                               allmodconfig
sh                                allnoconfig
microblaze                        allnoconfig
mips                             allyesconfig
mips                              allnoconfig
mips                             allmodconfig
parisc                            allnoconfig
parisc                              defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                             defconfig
powerpc                          allyesconfig
powerpc                          rhel-kconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                            allmodconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                                defconfig
sparc                            allyesconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
x86_64                    rhel-7.6-kselftests
x86_64                               rhel-8.3
x86_64                                  kexec
x86_64                                   rhel
x86_64                                    lkp
x86_64                              fedora-25

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [joro:sev-es-client-tip 29/76] arch/x86/kernel/head64.c:67:20: sparse: sparse: symbol 'startup_gdt' was not declared. Should it be
From: kernel test robot @ 2020-07-18  2:07 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git sev-es-client-tip
head:   57349dbe54433697c5b96bd5d8b5bd7499754d2b
commit: c12cc81e6b52de5e4c779e29d7ec903eff341cbc [29/76] x86/head/64: Install startup GDT
config: x86_64-randconfig-s022-20200717 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-49-g707c5017-dirty
        git checkout c12cc81e6b52de5e4c779e29d7ec903eff341cbc
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> arch/x86/kernel/head64.c:67:20: sparse: sparse: symbol 'startup_gdt' was not declared. Should it be static?
>> arch/x86/kernel/head64.c:77:17: sparse: sparse: symbol 'startup_gdt_descr' was not declared. Should it be static?
   arch/x86/kernel/head64.c:68:43: sparse: sparse: cast truncates bits from constant value (fffff becomes ffff)
   arch/x86/kernel/head64.c:69:43: sparse: sparse: cast truncates bits from constant value (fffff becomes ffff)
   arch/x86/kernel/head64.c:70:43: sparse: sparse: cast truncates bits from constant value (fffff becomes ffff)

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36202 bytes --]

^ permalink raw reply

* [RFC PATCH joro] x86/head/64: startup_gdt[] can be static
From: kernel test robot @ 2020-07-18  2:07 UTC (permalink / raw)
  To: kbuild-all
In-Reply-To: <202007181014.Nc7hu4Gg%lkp@intel.com>

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]


Fixes: c12cc81e6b52 ("x86/head/64: Install startup GDT")
Signed-off-by: kernel test robot <lkp@intel.com>
---
 head64.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 74766728baf32..0f12adcc3bf83 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -64,7 +64,7 @@ EXPORT_SYMBOL(vmemmap_base);
 /*
  * GDT used on the boot CPU before switching to virtual addresses.
  */
-struct desc_struct startup_gdt[GDT_ENTRIES] = {
+static struct desc_struct startup_gdt[GDT_ENTRIES] = {
 	[GDT_ENTRY_KERNEL32_CS]         = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
 	[GDT_ENTRY_KERNEL_CS]           = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
 	[GDT_ENTRY_KERNEL_DS]           = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
@@ -74,7 +74,7 @@ struct desc_struct startup_gdt[GDT_ENTRIES] = {
  * Address needs to be set at runtime because it references the startup_gdt while
  * the kernel still uses a direct mapping.
  */
-struct desc_ptr startup_gdt_descr = {
+static struct desc_ptr startup_gdt_descr = {
 	.size = sizeof(startup_gdt),
 	.address = 0,
 };

^ permalink raw reply related

* Re: [PATCH net] mlxsw: core: Fix wrong SFP EEPROM reading for upper pages 1-3
From: David Miller @ 2020-07-18  2:08 UTC (permalink / raw)
  To: idosch; +Cc: netdev, kuba, jiri, vadimp, mlxsw, idosch
In-Reply-To: <20200717190143.563235-1-idosch@idosch.org>

From: Ido Schimmel <idosch@idosch.org>
Date: Fri, 17 Jul 2020 22:01:43 +0300

> From: Vadim Pasternak <vadimp@mellanox.com>
> 
> Fix wrong reading of upper pages for SFP EEPROM. According to "Memory
> Organization" figure in SFF-8472 spec: When reading upper pages 1, 2 and
> 3 the offset should be set relative to zero and I2C high address 0x51
> [1010001X (A2h)] is to be used.
> 
> Fixes: a45bfb5a5070 ("mlxsw: core: Extend QSFP EEPROM size for ethtool")
> Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] tools/memory-model: document the "one-time init" pattern
From: Matthew Wilcox @ 2020-07-18  2:13 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-arch, Paul E . McKenney, linux-fsdevel,
	Akira Yokosawa, Alan Stern, Andrea Parri, Boqun Feng,
	Daniel Lustig, Darrick J . Wong, Dave Chinner, David Howells,
	Jade Alglave, Luc Maranget, Nicholas Piggin, Peter Zijlstra,
	Will Deacon
In-Reply-To: <20200718013839.GD2183@sol.localdomain>

On Fri, Jul 17, 2020 at 06:38:39PM -0700, Eric Biggers wrote:
> On Fri, Jul 17, 2020 at 06:47:50PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 16, 2020 at 09:44:27PM -0700, Eric Biggers wrote:
> > > +If that doesn't apply, you'll have to implement one-time init yourself.
> > > +
> > > +The simplest implementation just uses a mutex and an 'inited' flag.
> > > +This implementation should be used where feasible:
> > 
> > I think some syntactic sugar should make it feasible for normal people
> > to implement the most efficient version of this just like they use locks.
> 
> Note that the cmpxchg version is not necessarily the "most efficient".
> 
> If the one-time initialization is expensive, e.g. if it allocates a lot of
> memory or if takes a long time, it could be better to use the mutex version so
> that at most one task does it.

Sure, but I think those are far less common than just allocating a single
thing.

> > How about something like this ...
> > 
> > once.h:
> > 
> > static struct init_once_pointer {
> > 	void *p;
> > };
> > 
> > static inline void *once_get(struct init_once_pointer *oncep)
> > { ... }
> > 
> > static inline bool once_store(struct init_once_pointer *oncep, void *p)
> > { ... }
> > 
> > --- foo.c ---
> > 
> > struct foo *get_foo(gfp_t gfp)
> > {
> > 	static struct init_once_pointer my_foo;
> > 	struct foo *foop;
> > 
> > 	foop = once_get(&my_foo);
> > 	if (foop)
> > 		return foop;
> > 
> > 	foop = alloc_foo(gfp);
> > 	if (!once_store(&my_foo, foop)) {
> > 		free_foo(foop);
> > 		foop = once_get(&my_foo);
> > 	}
> > 
> > 	return foop;
> > }
> > 
> > Any kernel programmer should be able to handle that pattern.  And no mutex!
> 
> I don't think this version would be worthwhile.  It eliminates type safety due
> to the use of 'void *', and doesn't actually save any lines of code.  Nor does
> it eliminate the need to correctly implement the cmpxchg failure case, which is
> tricky (it must free the object and get the new one) and will be rarely tested.

You're missing the point.  It prevents people from trying to optimise
"can I use READ_ONCE() here, or do I need to use smp_rmb()?"  The type
safety is provided by the get_foo() function.  I suppose somebody could
play some games with _Generic or something, but there's really no need to.
It's like using a list_head and casting to the container_of.

> It also forces all users of the struct to use this helper function to access it.
> That could be considered a good thing, but it's also bad because even with
> one-time init there's still usually some sort of ordering of "initialization"
> vs. "use".  Just taking a random example I'm familiar with, we do one-time init
> of inode::i_crypt_info when we open an encrypted file, so we guarantee it's set
> for all I/O to the file, where we then simply access ->i_crypt_info directly.
> We don't want the code to read like it's initializing ->i_crypt_info in the
> middle of ->writepages(), since that would be wrong.

Right, and I wouldn't use this pattern for that.  You can't get to
writepages without having opened the file, so just initialising the
pointer in open is fine.

> An improvement might be to make once_store() take the free function as a
> parameter so that it would handle the failure case for you:
> 
> struct foo *get_foo(gfp_t gfp)
> {
> 	static struct init_once_pointer my_foo;
> 	struct foo *foop;
>  
>  	foop = once_get(&my_foo);
>  	if (!foop) {
> 		foop = alloc_foo(gfp);
> 		if (foop)
> 			once_store(&my_foo, foop, free_foo);

Need to mark once_store as __must_check to avoid the bug you have here:

			foop = once_store(&my_foo, foop, free_foo);

Maybe we could use a macro for once_store so we could write:

void *once_get(struct init_pointer_once *);
int once_store(struct init_pointer_once *, void *);

#define once_alloc(s, o_alloc, o_free) ({                               \
        void *__p = o_alloc;                                            \
        if (__p) {                                                      \
                if (!once_store(s, __p)) {                              \
                        o_free(__p);                                    \
                        __p = once_get(s);                              \
                }                                                       \
        }                                                               \
        __p;                                                            \
})

---

struct foo *alloc_foo(gfp_t);
void free_foo(struct foo *);

struct foo *get_foo(gfp_t gfp)
{
        static struct init_pointer_once my_foo;
        struct foo *foop;

        foop = once_get(&my_foo);
        if (!foop)
                foop = once_alloc(&my_foo, alloc_foo(gfp), free_foo);
        return foop;
}

That's pretty hard to misuse (I compile-tested it, and it works).

^ permalink raw reply

* [RFC][PATCH] x86: optimization to avoid CAL+RES IPIs
From: Josh Don @ 2020-07-18  2:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: x86, H . Peter Anvin, linux-pm, linux-kernel, Rafael J . Wysocki,
	Daniel Lezcano, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Paul Turner, Josh Don

From: Venkatesh Pallipadi <venki@google.com>

smp_call_function_single and smp_send_reschedule send unconditional IPI
to target CPU. However, if the target CPU is in some form of poll based
idle, we can do IPI-less wakeups.

Doing this has certain advantages:
* Lower overhead on Async "no wait" IPI send path.
* Avoiding actual interrupts reduces system non-idle cycles.

Note that this only helps when target CPU is idle. When it is busy we
will still send an IPI as before.

*** RFC NOTE ***
This patch breaks idle time accounting (and to a lesser degree, softirq
accounting). This is because this patch violates the assumption that
softirq can only be run either on the tail of a hard IRQ or inline on
a non-idle thread via local_bh_enable(), since we can now process
softirq inline within the idle loop. These ssues can be resolved in a
later version of this patch.

Signed-off-by: Josh Don <joshdon@google.com>
---
 arch/x86/include/asm/mwait.h       |  5 +-
 arch/x86/include/asm/processor.h   |  1 +
 arch/x86/include/asm/thread_info.h |  2 +
 arch/x86/kernel/apic/ipi.c         |  8 +++
 arch/x86/kernel/smpboot.c          |  4 ++
 drivers/cpuidle/poll_state.c       |  5 +-
 include/linux/ipiless_wake.h       | 93 ++++++++++++++++++++++++++++++
 kernel/sched/idle.c                | 10 +++-
 8 files changed, 124 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/ipiless_wake.h

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index e039a933aca3..aed393f38a39 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_MWAIT_H
 #define _ASM_X86_MWAIT_H
 
+#include <linux/ipiless_wake.h>
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 
@@ -109,6 +110,7 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
 	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
+		enter_ipiless_idle();
 		if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
 			mb();
 			clflush((void *)&current_thread_info()->flags);
@@ -116,8 +118,9 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
+		if (!is_ipiless_wakeup_pending())
 			__mwait(eax, ecx);
+		exit_ipiless_idle();
 	}
 	current_clr_polling();
 }
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 03b7c4ca425a..045fc9bbd095 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -568,6 +568,7 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset,
  * have to worry about atomic accesses.
  */
 #define TS_COMPAT		0x0002	/* 32bit syscall active (64BIT)*/
+#define TS_IPILESS_WAKEUP	0x0010	/* pending IPI-work on idle exit */
 
 static inline void
 native_load_sp0(unsigned long sp0)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8de8ceccb8bc..b6d3fa3c1578 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -98,6 +98,7 @@ struct thread_info {
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
 #define TIF_FORCED_TF		24	/* true if TF in eflags artificially */
 #define TIF_BLOCKSTEP		25	/* set when we want DEBUGCTLMSR_BTF */
+#define TIF_IN_IPILESS_IDLE	26	/* task in IPIless idle state */
 #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
 #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
 #define TIF_ADDR32		29	/* 32-bit address space on 64 bits */
@@ -127,6 +128,7 @@ struct thread_info {
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
 #define _TIF_BLOCKSTEP		(1 << TIF_BLOCKSTEP)
+#define _TIF_IN_IPILESS_IDLE	(1 << TIF_IN_IPILESS_IDLE)
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_ADDR32		(1 << TIF_ADDR32)
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 6ca0f91372fd..6739aea98aee 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <linux/cpumask.h>
+#include <linux/ipiless_wake.h>
 #include <linux/smp.h>
 
 #include "local.h"
@@ -67,11 +68,18 @@ void native_smp_send_reschedule(int cpu)
 		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
 		return;
 	}
+
+	if (try_ipiless_wakeup(cpu))
+		return;
+
 	apic->send_IPI(cpu, RESCHEDULE_VECTOR);
 }
 
 void native_send_call_func_single_ipi(int cpu)
 {
+	if (try_ipiless_wakeup(cpu))
+		return;
+
 	apic->send_IPI(cpu, CALL_FUNCTION_SINGLE_VECTOR);
 }
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ffbd9a3d78d8..3e681f0359f7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -105,6 +105,8 @@ EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
 static unsigned int logical_die __read_mostly;
 
+DEFINE_PER_CPU(unsigned long *, idletask_ti_flags);
+
 /* Maximum number of SMT threads on any online core */
 int __read_mostly __max_smt_threads = 1;
 
@@ -1042,6 +1044,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
 	unsigned long timeout;
 
 	idle->thread.sp = (unsigned long)task_pt_regs(idle);
+	per_cpu(idletask_ti_flags, cpu) = &task_thread_info(idle)->flags;
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
 	initial_code = (unsigned long)start_secondary;
 	initial_stack  = idle->thread.sp;
@@ -1405,6 +1408,7 @@ void __init native_smp_prepare_boot_cpu(void)
 	cpumask_set_cpu(me, cpu_callout_mask);
 	cpu_set_state_online(me);
 	native_pv_lock_init();
+	per_cpu(idletask_ti_flags, me) = &task_thread_info(current)->flags;
 }
 
 void __init calculate_max_logical_packages(void)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index f7e83613ae94..e48cfa8fb15f 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/cpuidle.h>
+#include <linux/ipiless_wake.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/idle.h>
@@ -24,7 +25,8 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
 
 		limit = cpuidle_poll_time(drv, dev);
 
-		while (!need_resched()) {
+		enter_ipiless_idle();
+		while (!is_ipiless_wakeup_pending()) {
 			cpu_relax();
 			if (loop_count++ < POLL_IDLE_RELAX_COUNT)
 				continue;
@@ -35,6 +37,7 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
 				break;
 			}
 		}
+		exit_ipiless_idle();
 	}
 	current_clr_polling();
 
diff --git a/include/linux/ipiless_wake.h b/include/linux/ipiless_wake.h
new file mode 100644
index 000000000000..3854845a25a0
--- /dev/null
+++ b/include/linux/ipiless_wake.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IPILESS_WAKE_H
+#define _LINUX_IPILESS_WAKE_H
+
+#include <linux/hardirq.h>
+#include <linux/sched.h>
+#include <linux/thread_info.h>
+
+#if defined(CONFIG_SMP) && defined(TIF_IN_IPILESS_IDLE)
+
+DECLARE_PER_CPU(unsigned long *, idletask_ti_flags);
+/*
+ * TIF_IN_IPILESS_IDLE CPU being in a idle state with ipiless wakeup
+ * capability, without any pending IPIs.
+ * It is conditionally reset by an IPI source CPU and the reset automatically
+ * brings the target CPU out of its idle state.
+ *
+ * TS_IPILESS_WAKEUP is only changed by local CPU and is a place to store
+ * the info that there is a pending IPI work needed after complete idle exit.
+ */
+
+static inline void enter_ipiless_idle(void)
+{
+	set_thread_flag(TIF_IN_IPILESS_IDLE);
+}
+
+static inline void exit_ipiless_idle(void)
+{
+	if (!test_and_clear_thread_flag(TIF_IN_IPILESS_IDLE)) {
+		/*
+		 * Flag was already cleared, indicating that there is
+		 * a pending IPIless wakeup.
+		 * Save that info in status for later use.
+		 */
+		current_thread_info()->status |= TS_IPILESS_WAKEUP;
+	}
+}
+
+static inline int is_ipiless_wakeup_pending(void)
+{
+	return need_resched() ||
+		unlikely(!test_thread_flag(TIF_IN_IPILESS_IDLE));
+}
+
+static inline void do_ipiless_pending_work(void)
+{
+	if (unlikely(current_thread_info()->status & TS_IPILESS_WAKEUP)) {
+		current_thread_info()->status &= ~TS_IPILESS_WAKEUP;
+
+		local_bh_disable();
+		local_irq_disable();
+
+		/*
+		 * Note: we must be in some form of idle, so no need to perform
+		 * a kvm_set_cpu_l1tf_flush_l1d().
+		 */
+
+		/* CALL_FUNCTION_SINGLE_VECTOR */
+		irq_enter();
+		generic_smp_call_function_single_interrupt();
+		irq_exit();
+
+		/* RESCHEDULE_VECTOR */
+		scheduler_ipi();
+
+		local_irq_enable();
+		local_bh_enable();
+	}
+}
+
+static inline int try_ipiless_wakeup(int cpu)
+{
+	unsigned long *ti_flags = per_cpu(idletask_ti_flags, cpu);
+
+	if (!(*ti_flags & _TIF_IN_IPILESS_IDLE))
+		return 0;
+
+	return test_and_clear_bit(TIF_IN_IPILESS_IDLE,
+					(unsigned long *)ti_flags);
+}
+
+#else
+static inline void do_ipiless_pending_work(void) { }
+static inline void enter_ipiless_idle(void) { }
+static inline void exit_ipiless_idle(void) { }
+
+static inline int is_ipiless_wakeup_pending(void)
+{
+	return need_resched();
+}
+#endif
+
+#endif /* _LINUX_IPILESS_WAKE_H */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 1ae95b9150d3..8897721816d5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -8,6 +8,8 @@
  */
 #include "sched.h"
 
+#include <linux/ipiless_wake.h>
+
 #include <trace/events/power.h>
 
 /* Linker adds these: start and end of __cpuidle functions */
@@ -58,10 +60,12 @@ static noinline int __cpuidle cpu_idle_poll(void)
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
 	local_irq_enable();
 	stop_critical_timings();
-
-	while (!tif_need_resched() &&
+	/* caller will process ipiless work */
+	enter_ipiless_idle();
+	while (!is_ipiless_wakeup_pending() &&
 		(cpu_idle_force_poll || tick_check_broadcast_expired()))
 		cpu_relax();
+	exit_ipiless_idle();
 	start_critical_timings();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 	rcu_idle_exit();
@@ -276,6 +280,8 @@ static void do_idle(void)
 			cpuidle_idle_call();
 		}
 		arch_cpu_idle_exit();
+
+		do_ipiless_pending_work();
 	}
 
 	/*
-- 
2.28.0.rc0.105.gf9edc3c819-goog


^ permalink raw reply related

* Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
From: Yafang Shao @ 2020-07-18  2:15 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Tetsuo Handa, Andrew Morton, Johannes Weiner,
	Linux MM
In-Reply-To: <alpine.DEB.2.23.453.2007171212210.3398972@chino.kir.corp.google.com>

On Sat, Jul 18, 2020 at 3:26 AM David Rientjes <rientjes@google.com> wrote:
>
> On Fri, 17 Jul 2020, Yafang Shao wrote:
>
> > > > Actually the kernel is doing it now, see bellow,
> > > >
> > > > dump_header() <<<< dump lots of information
> > > > __oom_kill_process
> > > >     p = find_lock_task_mm(victim);
> > > >     if (!p)
> > > >        return;   <<<< without killing any process.
> > > >
> > >
> > > Ah, this is catching an instance where the chosen process has already done
> > > exit_mm(), good catch -- I can find examples of this by scraping kernel
> > > logs from our fleet.
> > >
> > > So it appears there is precedence for dumping all the oom info but not
> > > actually performing any action for it and I made the earlier point that
> > > diagnostic information in the kernel log here is still useful.  I think it
> > > is still preferable that the kernel at least tell us why it didn't do
> > > anything, but as you mention that already happens today.
> > >
> > > Would you like to send a patch that checks for mem_cgroup_margin() here as
> > > well?  A second patch could make the possible inaction more visibile,
> > > something like "Process ${pid} (${comm}) is already exiting" for the above
> > > check or "Memcg ${memcg} is no longer out of memory".
> > >
> > > Another thing that these messages indicate, beyond telling us why the oom
> > > killer didn't actually SIGKILL anything, is that we can expect some skew
> > > in the memory stats that shows an availability of memory.
> > >
> >
> > Agreed, these messages would be helpful.
> > I will send a patch for it.
> >
>
> Thanks Yafang.  We should also continue talking about challenges you
> encounter with the oom killer either at the system level or for memcg
> limit ooms in a separate thread.  It's clear that you are meeting several
> of the issues that we have previously seen ourselves.
>
> I could do a full audit of all our oom killer changes that may be
> interesting to you, but off the top of my head:
>
>  - A means of triggering a memcg oom through the kernel: think of sysrq+f
>    but scoped to processes attached to a memcg hierarchy.  This allows
>    userspace to reliably oom kill processes on overcommitted systems
>    (SIGKILL can be insufficient if we depend on oom reaping, for example,
>    to make forward progress)
>

memcg sysrq+f would be helpful.
But I'm wondering how about waking up the oom_reaper when we send
SIGKILL to a process ?

For the below three proposals, I think they would be helpful as well
and I don't have different opinions。

>  - Storing the state of a memcg's memory at the time reclaim has failed
>    and we must oom kill: when the memcg oom killer is disabled so that
>    userspace can handle it, if it triggers an oom kill through the kernel
>    because it prefers an oom kill on an overcommitted system, we need to
>    dump the state of the memory at oom rather than with the stack of the
>    explicit trigger
>
>  - Supplement memcg oom notification with an additional notification event
>    on kernel oom kill: allows users to register for an event that triggers
>    when the kernel oom killer kills something (and keeps a count of these
>    events available for read)
>
>  - Add a notion of an oom delay: on overcommitted systems, userspace may
>    become unreliable or unresponsive despite our best efforts, this
>    supplements the ability to disable the oom killer for a memcg hierarchy
>    with the ability to disable it for a set period of time until the oom
>    killer intervenes and kills something (last ditch effort).
>
> I'd be happy to discuss any of these topics if you are interested.

Pls. send these patches at your convenience.

-- 
Thanks
Yafang


^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.