* Re: [PATCH iproute2-rc 2/8] rdma: Add "stat qp show" support
From: Stephen Hemminger @ 2019-07-16 19:01 UTC
To: Leon Romanovsky
Cc: Leon Romanovsky, netdev, David Ahern, Mark Zhang,
RDMA mailing list
In-Reply-To: <20190710072455.9125-3-leon@kernel.org>
On Wed, 10 Jul 2019 10:24:49 +0300
Leon Romanovsky <leon@kernel.org> wrote:
> From: Mark Zhang <markz@mellanox.com>
>
> This patch presents link, id, task name, lqpn, as well as all sub
> counters of a QP counter.
> A QP counter is a dynamically allocated statistic counter that is
> bound with one or more QPs. It has several sub-counters, each is
> used for a different purpose.
>
> Examples:
> $ rdma stat qp show
> link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
> rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
> duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
> implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
> resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
> req_remote_access_errors 0 resp_remote_access_errors 0
> resp_cqe_flush_error 0 req_cqe_flush_error 0
> LQPN: <178>
> $ rdma stat show link rocep1s0f5/1
> link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 duplicate_request 0
> rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
> req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0 resp_cqe_flush_error 0
> req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
> $ rdma stat show link rocep1s0f5/1 -p
> link rocep1s0f5/1
> rx_write_requests 0
> rx_read_requests 0
> rx_atomic_requests 0
> out_of_buffer 0
> duplicate_request 0
> rnr_nak_retry_err 0
> packet_seq_err 0
> implied_nak_seq_err 0
> local_ack_timeout_err 0
> resp_local_length_error 0
> resp_cqe_error 0
> req_cqe_error 0
> req_remote_invalid_request 0
> req_remote_access_errors 0
> resp_remote_access_errors 0
> resp_cqe_flush_error 0
> req_cqe_flush_error 0
> rp_cnp_ignored 0
> rp_cnp_handled 0
> np_ecn_marked_roce_packets 0
> np_cnp_sent 0
>
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
> rdma/Makefile | 2 +-
> rdma/rdma.c | 3 +-
> rdma/rdma.h | 1 +
> rdma/stat.c | 268 ++++++++++++++++++++++++++++++++++++++++++++++++++
> rdma/utils.c | 7 ++
> 5 files changed, 279 insertions(+), 2 deletions(-)
> create mode 100644 rdma/stat.c
>
Headers have been merged, but this patch does not apply cleanly to current iproute2.
* Re: [PATCH 2/9] rcu: Add support for consolidated-RCU reader checking (v3)
From: Paul E. McKenney @ 2019-07-16 18:53 UTC
To: Joel Fernandes
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716184649.GA130463@google.com>
On Tue, Jul 16, 2019 at 02:46:49PM -0400, Joel Fernandes wrote:
> On Tue, Jul 16, 2019 at 11:38:33AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 15, 2019 at 10:36:58AM -0400, Joel Fernandes (Google) wrote:
> > > This patch adds support for checking RCU reader sections in list
> > > traversal macros. Optionally, if the list macro is called under SRCU or
> > > other lock/mutex protection, then appropriate lockdep expressions can be
> > > passed to make the checks pass.
> > >
> > > Existing list_for_each_entry_rcu() invocations don't need to pass the
> > > optional fourth argument (cond) unless they are under some non-RCU
> > > > protection and need to make the lockdep check pass.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > Now that I am on the correct version, again please fold in the checks
> > for the extra argument. The ability to have an optional argument looks
> > quite helpful, especially when compared to growing the RCU API!
>
> I did fold this and replied with a pull request URL based on /dev branch. But
> we can hold off on the pull requests until we decide on the below comments:
>
> > A few more things below.
> > > ---
> > > include/linux/rculist.h | 28 ++++++++++++++++++++-----
> > > include/linux/rcupdate.h | 7 +++++++
> > > kernel/rcu/Kconfig.debug | 11 ++++++++++
> > > kernel/rcu/update.c | 44 ++++++++++++++++++++++++----------------
> > > 4 files changed, 67 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > > index e91ec9ddcd30..1048160625bb 100644
> > > --- a/include/linux/rculist.h
> > > +++ b/include/linux/rculist.h
> > > @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
> > > */
> > > #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
> > >
> > > +/*
> > > + * Check during list traversal that we are within an RCU reader
> > > + */
> > > +
> > > +#ifdef CONFIG_PROVE_RCU_LIST
> >
> > This new Kconfig option is OK temporarily, but unless there is reason to
> > fear malfunction that a few weeks of rcutorture, 0day, and -next won't
> > find, it would be better to just use CONFIG_PROVE_RCU. The overall goal
> > > is to reduce the number of RCU knobs rather than grow them, much though
> > history might lead one to believe otherwise. :-/
>
> If you want, we can try to drop this option and just use PROVE_RCU however I
> must say there may be several warnings that need to be fixed in a short
> period of time (even a few weeks may be too short) considering the 1000+
> uses of RCU lists.
Do many people other than me build with CONFIG_PROVE_RCU? If so, then
that would be a good reason for a temporary CONFIG_PROVE_RCU_LIST,
as in going away in a release or two once the warnings get fixed.
> But I don't mind dropping it and it may just accelerate the fixing up of all
> callers.
I will let you decide based on the above question. But if you have
CONFIG_PROVE_RCU_LIST, as noted below, it needs to depend on RCU_EXPERT.
Thanx, Paul
> > > +#define __list_check_rcu(dummy, cond, ...) \
> > > + ({ \
> > > + RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
> > > + "RCU-list traversed in non-reader section!"); \
> > > + })
> > > +#else
> > > +#define __list_check_rcu(dummy, cond, ...) ({})
> > > +#endif
> > > +
> > > /*
> > > * Insert a new entry between two known consecutive entries.
> > > *
> > > @@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> > > * @pos: the type * to use as a loop cursor.
> > > * @head: the head for your list.
> > > * @member: the name of the list_head within the struct.
> > > + * @cond: optional lockdep expression if called from non-RCU protection.
> > > *
> > > * This list-traversal primitive may safely run concurrently with
> > > * the _rcu list-mutation primitives such as list_add_rcu()
> > > * as long as the traversal is guarded by rcu_read_lock().
> > > */
> > > -#define list_for_each_entry_rcu(pos, head, member) \
> > > - for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > > - &pos->member != (head); \
> > > +#define list_for_each_entry_rcu(pos, head, member, cond...) \
> > > + for (__list_check_rcu(dummy, ## cond, 0), \
> > > + pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > > + &pos->member != (head); \
> > > pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> > >
> > > /**
> > > @@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
> > > * @pos: the type * to use as a loop cursor.
> > > * @head: the head for your list.
> > > * @member: the name of the hlist_node within the struct.
> > > + * @cond: optional lockdep expression if called from non-RCU protection.
> > > *
> > > * This list-traversal primitive may safely run concurrently with
> > > * the _rcu list-mutation primitives such as hlist_add_head_rcu()
> > > * as long as the traversal is guarded by rcu_read_lock().
> > > */
> > > -#define hlist_for_each_entry_rcu(pos, head, member) \
> > > - for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> > > +#define hlist_for_each_entry_rcu(pos, head, member, cond...) \
> > > + for (__list_check_rcu(dummy, ## cond, 0), \
> > > + pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> > > typeof(*(pos)), member); \
> > > pos; \
> > > pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 8f7167478c1d..f3c29efdf19a 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
> > > int rcu_read_lock_held(void);
> > > int rcu_read_lock_bh_held(void);
> > > int rcu_read_lock_sched_held(void);
> > > +int rcu_read_lock_any_held(void);
> > >
> > > #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > >
> > > @@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
> > > {
> > > return !preemptible();
> > > }
> > > +
> > > +static inline int rcu_read_lock_any_held(void)
> > > +{
> > > + return !preemptible();
> > > +}
> > > +
> > > #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > >
> > > #ifdef CONFIG_PROVE_RCU
> > > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > > index 5ec3ea4028e2..7fbd21dbfcd0 100644
> > > --- a/kernel/rcu/Kconfig.debug
> > > +++ b/kernel/rcu/Kconfig.debug
> > > @@ -8,6 +8,17 @@ menu "RCU Debugging"
> > > config PROVE_RCU
> > > def_bool PROVE_LOCKING
> > >
> > > +config PROVE_RCU_LIST
> > > + bool "RCU list lockdep debugging"
> > > + depends on PROVE_RCU
> >
> > This must also depend on RCU_EXPERT.
>
> Sure.
>
> > > + default n
> > > + help
> > > + Enable RCU lockdep checking for list usages. By default it is
> > > + turned off since there are several list RCU users that still
> > > + need to be converted to pass a lockdep expression. To prevent
> > > + false-positive splats, we keep it default disabled but once all
> > > + users are converted, we can remove this config option.
> > > +
> > > config TORTURE_TEST
> > > tristate
> > > default n
> > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > index 9dd5aeef6e70..b7a4e3b5fa98 100644
> > > --- a/kernel/rcu/update.c
> > > +++ b/kernel/rcu/update.c
> > > @@ -91,14 +91,18 @@ module_param(rcu_normal_after_boot, int, 0);
> > > * Similarly, we avoid claiming an SRCU read lock held if the current
> > > * CPU is offline.
> > > */
> > > +#define rcu_read_lock_held_common() \
> > > + if (!debug_lockdep_rcu_enabled()) \
> > > + return 1; \
> > > + if (!rcu_is_watching()) \
> > > + return 0; \
> > > + if (!rcu_lockdep_current_cpu_online()) \
> > > + return 0;
> >
> > Nice abstraction of common code!
>
> Thanks!
>
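The rcu_read_lock_held_common() macro in the quoted patch hides early returns inside the functions that use it. A minimal user-space sketch of that pattern follows. It is only an illustration: the boolean flags and the read_lock_*_held() helpers are hypothetical stand-ins for debug_lockdep_rcu_enabled(), rcu_is_watching(), rcu_lockdep_current_cpu_online() and the per-flavor checks, not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real RCU API. */
static bool debug_enabled = true;	/* models debug_lockdep_rcu_enabled() */
static bool watching = true;		/* models rcu_is_watching() */
static bool cpu_online = true;		/* models rcu_lockdep_current_cpu_online() */
static bool bh_disabled;		/* models "in a bh reader section" */
static bool preempt_disabled;		/* models "in a sched reader section" */

/*
 * Shared prologue. Note that it returns from the *calling* function:
 * 1 when checking is disabled, 0 when no reader section can be claimed.
 */
#define read_lock_held_common()		\
	if (!debug_enabled)		\
		return 1;		\
	if (!watching)			\
		return 0;		\
	if (!cpu_online)		\
		return 0;

static int read_lock_bh_held(void)
{
	read_lock_held_common();
	return bh_disabled;	/* reached only if no early return fired */
}

static int read_lock_sched_held(void)
{
	read_lock_held_common();
	return preempt_disabled;
}
```

The macro removes the duplicated checks, at the cost of control flow that is not visible at the call site. A helper function that reports whether it already decided the result would be one alternative with explicit control flow.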
* Re: [PATCH 2/2] net: apply proc_net_mkdir() harder
From: Pablo Neira Ayuso @ 2019-07-16 18:52 UTC
To: Alexey Dobriyan
Cc: davem, netdev, netfilter-devel, linux-nfs, j.vosburgh, vfalico,
andy, kadlec, fw, bfields, chuck.lever
In-Reply-To: <20190706165521.GB10550@avx2>
On Sat, Jul 06, 2019 at 07:55:21PM +0300, Alexey Dobriyan wrote:
> From: "Hallsmark, Per" <Per.Hallsmark@windriver.com>
>
> proc_net_mkdir() should be used to create stuff under /proc/net,
> so that dentry revalidation kicks in.
>
> See
>
> commit 1fde6f21d90f8ba5da3cb9c54ca991ed72696c43
> proc: fix /proc/net/* after setns(2)
>
> [added more chunks --adobriyan]
I don't find this in the tree. If you split the netfilter part into an
independent patch, I could take it into the netfilter tree.
Or just keep it like this and ask David to take it.
* Re: [PATCH v2 2/9] rcu: Add support for consolidated-RCU reader checking
From: Paul E. McKenney @ 2019-07-16 18:50 UTC
To: Joel Fernandes
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716183517.GA129705@google.com>
On Tue, Jul 16, 2019 at 02:35:17PM -0400, Joel Fernandes wrote:
> On Tue, Jul 16, 2019 at 11:22:37AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 12, 2019 at 01:00:17PM -0400, Joel Fernandes (Google) wrote:
> > > This patch adds support for checking RCU reader sections in list
> > > traversal macros. Optionally, if the list macro is called under SRCU or
> > > other lock/mutex protection, then appropriate lockdep expressions can be
> > > passed to make the checks pass.
> > >
> > > Existing list_for_each_entry_rcu() invocations don't need to pass the
> > > optional fourth argument (cond) unless they are under some non-RCU
> > > > protection and need to make the lockdep check pass.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > If you fold in the checks for extra parameters, I will take this
> > one and also 1/9.
>
> I folded the checks in and also threw in the rcu-sync with Oleg's ack:
>
> Could you pull into /dev branch?
>
> git pull https://github.com/joelagnel/linux-kernel.git list-first-three
> (Based on your dev branch)
Given that I am going to have to rebase these a few times, please
email a v4.
Thanx, Paul
* Re: [PATCH 0/9] Harden list_for_each_entry_rcu() and family
From: Paul E. McKenney @ 2019-07-16 18:46 UTC
To: Joel Fernandes (Google)
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:36:56AM -0400, Joel Fernandes (Google) wrote:
> Hi,
> This series aims to provide lockdep checking to RCU list macros for additional
> kernel hardening.
>
> RCU has a number of primitives for "consumption" of an RCU protected pointer.
> Most of the time, these consumers make sure that such accesses are under an RCU
> reader-section (such as rcu_dereference{,sched,bh} or under a lock, such as
> with rcu_dereference_protected()).
>
> However, there are other ways to consume RCU pointers, such as by
> list_for_each_entry_rcu or hlist_for_each_entry_rcu. Unlike the rcu_dereference
> family, these consumers do no lockdep checking at all. And with the growing
> number of RCU list uses (1000+), it is possible for bugs to creep in and go
> unnoticed which lockdep checks can catch.
>
> Since RCU consolidation efforts last year, the different traditional RCU
> flavors (preempt, bh, sched) are all consolidated. In other words, any of these
> flavors can cause a reader section to occur and all of them must cease before
> the reader section is considered to be unlocked. Thanks to this, we can
> generically check if we are in an RCU reader. This is what patch 1 does. Note
> that the list_for_each_entry_rcu and family are different from the
> rcu_dereference family in that, there is no _bh or _sched version of this
> macro. They are used under many different RCU reader flavors, and also SRCU.
> Patch 1 adds a new internal function rcu_read_lock_any_held() which checks
> if any reader section is active at all, when these macros are called. If no
> reader section exists, then the optional fourth argument to
> list_for_each_entry_rcu() can be a lockdep expression which is evaluated
> (similar to how rcu_dereference_check() works). If no lockdep expression is
> passed, and we are not in a reader, then a splat occurs. Just take off the
> lockdep expression after applying the patches, by using the following diff and
> see what happens:
>
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -55,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
> struct pci_mmcfg_region *cfg;
>
> /* keep list sorted by segment and starting bus number */
> - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
> + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
>
>
> The optional argument trick to list_for_each_entry_rcu() can also be used in
> the future to possibly remove rcu_dereference_{,bh,sched}_protected() API and
> we can pass an optional lockdep expression to rcu_dereference() itself. Thus
> eliminating 3 more RCU APIs.
>
> Note that some list macro wrappers already do their own lockdep checking in the
> caller side. These can be eliminated in favor of the built-in lockdep checking
> in the list macro that this series adds. For example, workqueue code has an
> assert_rcu_or_wq_mutex() function which is called in for_each_wq(). This
> series replaces that in favor of the built-in check.
>
> Also in the future, we can extend these checks to list_entry_rcu() and other
> list macros as well, if needed.
>
> Please note that I have kept this option default-disabled under a new config:
> CONFIG_PROVE_RCU_LIST. The check should stay disabled until all users are
> converted to pass the optional argument. There are roughly 1000 users, and
> it is not possible to pass in the optional lockdep expression for all of them
> in a single series since it is done on a case-by-case basis. I did convert a
> few users in this series itself.
I do like the optional argument as opposed to the traditional practice
of expanding the RCU API! Good stuff!!!
Please resend incorporating the acks and the changes from feedback.
I will hold off on any patches not yet having their maintainer's ack,
but it is OK to include them in v4. (I will just avoid applying them.)
The documentation patch needs a bit of wordsmithing, but I can do that.
Feel free to take another pass on it if you wish, though.
Thanx, Paul
> v2->v3: Simplified rcu-sync logic after rebase (Paul)
> Added check for bh_map (Paul)
> Refactored out more of the common code (Joel)
> Added Oleg ack to rcu-sync patch.
>
> v1->v2: Have assert_rcu_or_wq_mutex deleted (Daniel Jordan)
> Simplify rcu_read_lock_any_held() (Peter Zijlstra)
> Simplified rcu-sync logic (Oleg Nesterov)
> Updated documentation and rculist comments.
> Added GregKH ack.
>
> RFC->v1:
> Simplify list checking macro (Rasmus Villemoes)
>
> Joel Fernandes (Google) (9):
> rcu/update: Remove useless check for debug_locks (v1)
> rcu: Add support for consolidated-RCU reader checking (v3)
> rcu/sync: Remove custom check for reader-section (v2)
> ipv4: add lockdep condition to fix for_each_entry (v1)
> driver/core: Convert to use built-in RCU list checking (v1)
> workqueue: Convert for_each_wq to use built-in list check (v2)
> x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
> acpi: Use built-in RCU list checking for acpi_ioremaps list (v1)
> doc: Update documentation about list_for_each_entry_rcu (v1)
>
> Documentation/RCU/lockdep.txt | 15 ++++++++---
> Documentation/RCU/whatisRCU.txt | 9 ++++++-
> arch/x86/pci/mmconfig-shared.c | 5 ++--
> drivers/acpi/osl.c | 6 +++--
> drivers/base/base.h | 1 +
> drivers/base/core.c | 10 +++++++
> drivers/base/power/runtime.c | 15 +++++++----
> include/linux/rcu_sync.h | 4 +--
> include/linux/rculist.h | 28 +++++++++++++++----
> include/linux/rcupdate.h | 7 +++++
> kernel/rcu/Kconfig.debug | 11 ++++++++
> kernel/rcu/update.c | 48 ++++++++++++++++++---------------
> kernel/workqueue.c | 10 ++-----
> net/ipv4/fib_frontend.c | 3 ++-
> 14 files changed, 119 insertions(+), 53 deletions(-)
>
> --
> 2.22.0.510.g264f2c817a-goog
>
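The optional fourth argument described in the cover letter relies on a GNU C preprocessor idiom: a named variadic parameter plus `, ## cond` to supply a default of 0 when the caller passes nothing. The following is a minimal user-space sketch of that idiom, assuming GCC or Clang. The names for_each_index, check_reader, warn_if_unprotected and the three state variables are hypothetical and only model the behavior; they are not part of the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is the real RCU API. */
static bool in_reader;	/* models rcu_read_lock_any_held() */
static bool lock_held;	/* models a lockdep expression such as lock_is_held() */
static int warnings;	/* counts simulated lockdep splats */

static void warn_if_unprotected(bool cond)
{
	if (!cond && !in_reader)
		warnings++;
}

/*
 * Use the second argument and ignore the rest. When the caller passes no
 * condition, the trailing 0 becomes 'cond'; otherwise the caller's
 * expression does and the 0 falls into '...'.
 */
#define check_reader(dummy, cond, ...) warn_if_unprotected(cond)

/* GNU C named variadic parameter; ', ## cond' drops the comma when empty. */
#define for_each_index(i, n, cond...) \
	for (check_reader(dummy, ## cond, 0), (i) = 0; (i) < (n); (i)++)

static int sum_plain(const int *v, int n)
{
	int i, sum = 0;

	assert(n >= 0);
	for_each_index(i, n)		/* no condition: relies on in_reader */
		sum += v[i];
	return sum;
}

static int sum_locked(const int *v, int n)
{
	int i, sum = 0;

	assert(n >= 0);
	for_each_index(i, n, lock_held)	/* caller vouches for lock protection */
		sum += v[i];
	return sum;
}
```

In this sketch, calling sum_plain() with in_reader false increments warnings once, while sum_locked() with lock_held true does not, which mirrors how existing three-argument callers keep compiling and only non-RCU callers need to add a condition.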
* Re: [PATCH 2/9] rcu: Add support for consolidated-RCU reader checking (v3)
From: Joel Fernandes @ 2019-07-16 18:46 UTC
To: Paul E. McKenney
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716183833.GD14271@linux.ibm.com>
On Tue, Jul 16, 2019 at 11:38:33AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 15, 2019 at 10:36:58AM -0400, Joel Fernandes (Google) wrote:
> > This patch adds support for checking RCU reader sections in list
> > traversal macros. Optionally, if the list macro is called under SRCU or
> > other lock/mutex protection, then appropriate lockdep expressions can be
> > passed to make the checks pass.
> >
> > Existing list_for_each_entry_rcu() invocations don't need to pass the
> > optional fourth argument (cond) unless they are under some non-RCU
> > > protection and need to make the lockdep check pass.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> Now that I am on the correct version, again please fold in the checks
> for the extra argument. The ability to have an optional argument looks
> quite helpful, especially when compared to growing the RCU API!
I did fold this and replied with a pull request URL based on /dev branch. But
we can hold off on the pull requests until we decide on the below comments:
> A few more things below.
> > ---
> > include/linux/rculist.h | 28 ++++++++++++++++++++-----
> > include/linux/rcupdate.h | 7 +++++++
> > kernel/rcu/Kconfig.debug | 11 ++++++++++
> > kernel/rcu/update.c | 44 ++++++++++++++++++++++++----------------
> > 4 files changed, 67 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index e91ec9ddcd30..1048160625bb 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
> > */
> > #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
> >
> > +/*
> > + * Check during list traversal that we are within an RCU reader
> > + */
> > +
> > +#ifdef CONFIG_PROVE_RCU_LIST
>
> This new Kconfig option is OK temporarily, but unless there is reason to
> fear malfunction that a few weeks of rcutorture, 0day, and -next won't
> find, it would be better to just use CONFIG_PROVE_RCU. The overall goal
> is to reduce the number of RCU knobs rather than grow them, much though
> history might lead one to believe otherwise. :-/
If you want, we can try to drop this option and just use PROVE_RCU however I
must say there may be several warnings that need to be fixed in a short
period of time (even a few weeks may be too short) considering the 1000+
uses of RCU lists.
But I don't mind dropping it and it may just accelerate the fixing up of all
callers.
> > +#define __list_check_rcu(dummy, cond, ...) \
> > + ({ \
> > + RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
> > + "RCU-list traversed in non-reader section!"); \
> > + })
> > +#else
> > +#define __list_check_rcu(dummy, cond, ...) ({})
> > +#endif
> > +
> > /*
> > * Insert a new entry between two known consecutive entries.
> > *
> > @@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> > * @pos: the type * to use as a loop cursor.
> > * @head: the head for your list.
> > * @member: the name of the list_head within the struct.
> > + * @cond: optional lockdep expression if called from non-RCU protection.
> > *
> > * This list-traversal primitive may safely run concurrently with
> > * the _rcu list-mutation primitives such as list_add_rcu()
> > * as long as the traversal is guarded by rcu_read_lock().
> > */
> > -#define list_for_each_entry_rcu(pos, head, member) \
> > - for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > - &pos->member != (head); \
> > +#define list_for_each_entry_rcu(pos, head, member, cond...) \
> > + for (__list_check_rcu(dummy, ## cond, 0), \
> > + pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > + &pos->member != (head); \
> > pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> >
> > /**
> > @@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
> > * @pos: the type * to use as a loop cursor.
> > * @head: the head for your list.
> > * @member: the name of the hlist_node within the struct.
> > + * @cond: optional lockdep expression if called from non-RCU protection.
> > *
> > * This list-traversal primitive may safely run concurrently with
> > * the _rcu list-mutation primitives such as hlist_add_head_rcu()
> > * as long as the traversal is guarded by rcu_read_lock().
> > */
> > -#define hlist_for_each_entry_rcu(pos, head, member) \
> > - for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> > +#define hlist_for_each_entry_rcu(pos, head, member, cond...) \
> > + for (__list_check_rcu(dummy, ## cond, 0), \
> > + pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> > typeof(*(pos)), member); \
> > pos; \
> > pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 8f7167478c1d..f3c29efdf19a 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
> > int rcu_read_lock_held(void);
> > int rcu_read_lock_bh_held(void);
> > int rcu_read_lock_sched_held(void);
> > +int rcu_read_lock_any_held(void);
> >
> > #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >
> > @@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
> > {
> > return !preemptible();
> > }
> > +
> > +static inline int rcu_read_lock_any_held(void)
> > +{
> > + return !preemptible();
> > +}
> > +
> > #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >
> > #ifdef CONFIG_PROVE_RCU
> > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > index 5ec3ea4028e2..7fbd21dbfcd0 100644
> > --- a/kernel/rcu/Kconfig.debug
> > +++ b/kernel/rcu/Kconfig.debug
> > @@ -8,6 +8,17 @@ menu "RCU Debugging"
> > config PROVE_RCU
> > def_bool PROVE_LOCKING
> >
> > +config PROVE_RCU_LIST
> > + bool "RCU list lockdep debugging"
> > + depends on PROVE_RCU
>
> This must also depend on RCU_EXPERT.
Sure.
> > + default n
> > + help
> > + Enable RCU lockdep checking for list usages. By default it is
> > + turned off since there are several list RCU users that still
> > + need to be converted to pass a lockdep expression. To prevent
> > + false-positive splats, we keep it default disabled but once all
> > + users are converted, we can remove this config option.
> > +
> > config TORTURE_TEST
> > tristate
> > default n
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 9dd5aeef6e70..b7a4e3b5fa98 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -91,14 +91,18 @@ module_param(rcu_normal_after_boot, int, 0);
> > * Similarly, we avoid claiming an SRCU read lock held if the current
> > * CPU is offline.
> > */
> > +#define rcu_read_lock_held_common() \
> > + if (!debug_lockdep_rcu_enabled()) \
> > + return 1; \
> > + if (!rcu_is_watching()) \
> > + return 0; \
> > + if (!rcu_lockdep_current_cpu_online()) \
> > + return 0;
>
> Nice abstraction of common code!
Thanks!
* Re: [PATCH 8/9] acpi: Use built-in RCU list checking for acpi_ioremaps list (v1)
From: Paul E. McKenney @ 2019-07-16 18:43 UTC
To: Joel Fernandes (Google)
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-9-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:37:04AM -0400, Joel Fernandes (Google) wrote:
> list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
> it for acpi_ioremaps list traversal.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Given that Rafael acked it, this one looks ready.
Thanx, Paul
> ---
> drivers/acpi/osl.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 9c0edf2fc0dd..2f9d0d20b836 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -14,6 +14,7 @@
> #include <linux/slab.h>
> #include <linux/mm.h>
> #include <linux/highmem.h>
> +#include <linux/lockdep.h>
> #include <linux/pci.h>
> #include <linux/interrupt.h>
> #include <linux/kmod.h>
> @@ -80,6 +81,7 @@ struct acpi_ioremap {
>
> static LIST_HEAD(acpi_ioremaps);
> static DEFINE_MUTEX(acpi_ioremap_lock);
> +#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
>
> static void __init acpi_request_region (struct acpi_generic_address *gas,
> unsigned int length, char *desc)
> @@ -206,7 +208,7 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
> {
> struct acpi_ioremap *map;
>
> - list_for_each_entry_rcu(map, &acpi_ioremaps, list)
> + list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
> if (map->phys <= phys &&
> phys + size <= map->phys + map->size)
> return map;
> @@ -249,7 +251,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
> {
> struct acpi_ioremap *map;
>
> - list_for_each_entry_rcu(map, &acpi_ioremaps, list)
> + list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
> if (map->virt <= virt &&
> virt + size <= map->virt + map->size)
> return map;
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH 6/9] workqueue: Convert for_each_wq to use built-in list check (v2)
From: Paul E. McKenney @ 2019-07-16 18:41 UTC
To: Joel Fernandes (Google)
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-7-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:37:02AM -0400, Joel Fernandes (Google) wrote:
> list_for_each_entry_rcu now has support to check for RCU reader sections
> as well as lock. Just use the support in it, instead of explicitly
> checking in the caller.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
We need an ack from one of the subsystem maintainers on this one.
Thanx, Paul
> ---
> kernel/workqueue.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 601d61150b65..e882477ebf6e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -364,11 +364,6 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
> !lockdep_is_held(&wq_pool_mutex), \
> "RCU or wq_pool_mutex should be held")
>
> -#define assert_rcu_or_wq_mutex(wq) \
> - RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
> - !lockdep_is_held(&wq->mutex), \
> - "RCU or wq->mutex should be held")
> -
> #define assert_rcu_or_wq_mutex_or_pool_mutex(wq) \
> RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
> !lockdep_is_held(&wq->mutex) && \
> @@ -425,9 +420,8 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
> * ignored.
> */
> #define for_each_pwq(pwq, wq) \
> - list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node) \
> - if (({ assert_rcu_or_wq_mutex(wq); false; })) { } \
> - else
> + list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node, \
> + lock_is_held(&(wq->mutex).dep_map))
>
> #ifdef CONFIG_DEBUG_OBJECTS_WORK
>
> --
> 2.22.0.510.g264f2c817a-goog
>
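As an aside, the removed for_each_pwq() body used an if/else wrapped
around a statement expression to run its assertion on every iteration,
whereas the check inside list_for_each_entry_rcu() sits in the for-loop
initializer and so runs once per traversal. Below is a rough userspace
sketch of the two shapes, assuming GNU C statement expressions. The
names for_each_old(), for_each_new(), and checks are hypothetical and
are not part of the patch.

```c
int checks; /* hypothetical: how many times the assertion ran */

/* Old shape: the check is evaluated on every loop iteration. */
#define for_each_old(i, n) \
	for ((i) = 0; (i) < (n); (i)++) \
		if (({ checks++; 0; })) { } \
		else

/* New shape: the check sits in the initializer and runs once. */
#define for_each_new(i, n) \
	for (checks++, (i) = 0; (i) < (n); (i)++)

int checks_old(int n)
{
	int i, sum = 0;

	checks = 0;
	for_each_old(i, n)
		sum += i;
	(void)sum;
	return checks;
}

int checks_new(int n)
{
	int i, sum = 0;

	checks = 0;
	for_each_new(i, n)
		sum += i;
	(void)sum;
	return checks;
}
```

With these definitions the old shape counts one check per element,
while the new shape counts a single check however long the list is.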
* Re: [PATCH 7/9] x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
From: Paul E. McKenney @ 2019-07-16 18:42 UTC (permalink / raw)
To: Joel Fernandes
Cc: Bjorn Helgaas, linux-kernel, Alexey Kuznetsov, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716040303.GA73383@google.com>
On Tue, Jul 16, 2019 at 12:03:03AM -0400, Joel Fernandes wrote:
> On Mon, Jul 15, 2019 at 03:02:35PM -0500, Bjorn Helgaas wrote:
> > On Mon, Jul 15, 2019 at 10:37:03AM -0400, Joel Fernandes (Google) wrote:
> > > The pci_mmcfg_list is traversed with list_for_each_entry_rcu without a
> > > reader-lock held, because the pci_mmcfg_lock is already held. Make this
> > > known to the list macro so that it fixes new lockdep warnings that
> > > trigger due to lockdep checks added to list_for_each_entry_rcu().
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > Ingo takes care of most patches to this file, but FWIW,
> >
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Thanks.
>
> > I would personally prefer if you capitalized the subject to match the
> > "x86/PCI:" convention that's used fairly consistently in
> > arch/x86/pci/.
> >
> > Also, I didn't apply this to be sure, but it looks like this might
> > make a line or two wider than 80 columns, which I would rewrap if I
> > were applying this.
>
> Updated below is the patch with the nits corrected:
I am OK with this going either way, but it does depend on an earlier
patch.
Thanx, Paul
> ---8<-----------------------
>
> From 73fab09d7e33ca2110c24215f8ed428c12625dbe Mon Sep 17 00:00:00 2001
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Date: Sat, 1 Jun 2019 15:05:49 -0400
> Subject: [PATCH] x86/PCI: Pass lockdep condition to pcm_mmcfg_list iterator
> (v1)
>
> The pci_mmcfg_list is traversed with list_for_each_entry_rcu without a
> reader-lock held, because the pci_mmcfg_lock is already held. Make this
> known to the list macro so that it fixes new lockdep warnings that
> trigger due to lockdep checks added to list_for_each_entry_rcu().
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> arch/x86/pci/mmconfig-shared.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 7389db538c30..9e3250ec5a37 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -29,6 +29,7 @@
> static bool pci_mmcfg_running_state;
> static bool pci_mmcfg_arch_init_failed;
> static DEFINE_MUTEX(pci_mmcfg_lock);
> +#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
>
> LIST_HEAD(pci_mmcfg_list);
>
> @@ -54,7 +55,8 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
> struct pci_mmcfg_region *cfg;
>
> /* keep list sorted by segment and starting bus number */
> - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
> + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list,
> + pci_mmcfg_lock_held()) {
> if (cfg->segment > new->segment ||
> (cfg->segment == new->segment &&
> cfg->start_bus >= new->start_bus)) {
> @@ -118,7 +120,8 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
> {
> struct pci_mmcfg_region *cfg;
>
> - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
> + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list,
> + pci_mmcfg_lock_held())
> if (cfg->segment == segment &&
> cfg->start_bus <= bus && bus <= cfg->end_bus)
> return cfg;
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH 5/9] driver/core: Convert to use built-in RCU list checking (v1)
From: Paul E. McKenney @ 2019-07-16 18:40 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Greg Kroah-Hartman, Alexey Kuznetsov, Bjorn Helgaas,
Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-6-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:37:01AM -0400, Joel Fernandes (Google) wrote:
> list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
> it in driver core.
>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
This one looks ready.
Thanx, Paul
> ---
> drivers/base/base.h | 1 +
> drivers/base/core.c | 10 ++++++++++
> drivers/base/power/runtime.c | 15 ++++++++++-----
> 3 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index b405436ee28e..0d32544b6f91 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -165,6 +165,7 @@ static inline int devtmpfs_init(void) { return 0; }
> /* Device links support */
> extern int device_links_read_lock(void);
> extern void device_links_read_unlock(int idx);
> +extern int device_links_read_lock_held(void);
> extern int device_links_check_suppliers(struct device *dev);
> extern void device_links_driver_bound(struct device *dev);
> extern void device_links_driver_cleanup(struct device *dev);
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index da84a73f2ba6..85e82f38717f 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -68,6 +68,11 @@ void device_links_read_unlock(int idx)
> {
> srcu_read_unlock(&device_links_srcu, idx);
> }
> +
> +int device_links_read_lock_held(void)
> +{
> + return srcu_read_lock_held(&device_links_srcu);
> +}
> #else /* !CONFIG_SRCU */
> static DECLARE_RWSEM(device_links_lock);
>
> @@ -91,6 +96,11 @@ void device_links_read_unlock(int not_used)
> {
> up_read(&device_links_lock);
> }
> +
> +int device_links_read_lock_held(void)
> +{
> + return lockdep_is_held(&device_links_lock);
> +}
> #endif /* !CONFIG_SRCU */
>
> /**
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 952a1e7057c7..7a10e8379a70 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -287,7 +287,8 @@ static int rpm_get_suppliers(struct device *dev)
> {
> struct device_link *link;
>
> - list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
> + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> + device_links_read_lock_held()) {
> int retval;
>
> if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> @@ -309,7 +310,8 @@ static void rpm_put_suppliers(struct device *dev)
> {
> struct device_link *link;
>
> - list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
> + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> + device_links_read_lock_held()) {
> if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> continue;
>
> @@ -1640,7 +1642,8 @@ void pm_runtime_clean_up_links(struct device *dev)
>
> idx = device_links_read_lock();
>
> - list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> + list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
> + device_links_read_lock_held()) {
> if (link->flags & DL_FLAG_STATELESS)
> continue;
>
> @@ -1662,7 +1665,8 @@ void pm_runtime_get_suppliers(struct device *dev)
>
> idx = device_links_read_lock();
>
> - list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> + device_links_read_lock_held())
> if (link->flags & DL_FLAG_PM_RUNTIME) {
> link->supplier_preactivated = true;
> refcount_inc(&link->rpm_active);
> @@ -1683,7 +1687,8 @@ void pm_runtime_put_suppliers(struct device *dev)
>
> idx = device_links_read_lock();
>
> - list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> + device_links_read_lock_held())
> if (link->supplier_preactivated) {
> link->supplier_preactivated = false;
> if (refcount_dec_not_one(&link->rpm_active))
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH 4/9] ipv4: add lockdep condition to fix for_each_entry (v1)
From: Paul E. McKenney @ 2019-07-16 18:39 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-5-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:37:00AM -0400, Joel Fernandes (Google) wrote:
> Use the support added earlier in this series to pass a lockdep
> condition to the hlist traversal here.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
We need an ack or better from the subsystem maintainer for this one.
Thanx, Paul
> ---
> net/ipv4/fib_frontend.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 317339cd7f03..26b0fb24e2c2 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -124,7 +124,8 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
> h = id & (FIB_TABLE_HASHSZ - 1);
>
> head = &net->ipv4.fib_table_hash[h];
> - hlist_for_each_entry_rcu(tb, head, tb_hlist) {
> + hlist_for_each_entry_rcu(tb, head, tb_hlist,
> + lockdep_rtnl_is_held()) {
> if (tb->tb_id == id)
> return tb;
> }
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH 2/9] rcu: Add support for consolidated-RCU reader checking (v3)
From: Paul E. McKenney @ 2019-07-16 18:38 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-3-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:36:58AM -0400, Joel Fernandes (Google) wrote:
> This patch adds support for checking RCU reader sections in list
> traversal macros. Optionally, if the list macro is called under SRCU or
> other lock/mutex protection, then appropriate lockdep expressions can be
> passed to make the checks pass.
>
> Existing list_for_each_entry_rcu() invocations don't need to pass the
> optional fourth argument (cond) unless they are under some non-RCU
> protection and need to make the lockdep check pass.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Now that I am on the correct version, again please fold in the checks
for the extra argument. The ability to have an optional argument looks
quite helpful, especially when compared to growing the RCU API!
A few more things below.
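For readers unfamiliar with the trick, the optional fourth argument
relies on a GNU C named variadic parameter together with the ", ##"
comma-deletion extension. Below is a minimal userspace sketch, not
kernel code. traverse(), list_check(), warnings, and in_reader are
hypothetical stand-ins for list_for_each_entry_rcu(),
__list_check_rcu(), a lockdep splat, and rcu_read_lock_any_held().

```c
#include <stdbool.h>

int warnings;   /* hypothetical: counts simulated lockdep splats */
bool in_reader; /* hypothetical stand-in for rcu_read_lock_any_held() */

/* Uses only its second argument; anything after it is ignored. */
#define list_check(dummy, cond, ...) \
	do { \
		if (!(cond) && !in_reader) \
			warnings++; \
	} while (0)

/*
 * "cond..." is a GNU named variadic parameter. When the caller omits it,
 * ", ## cond" deletes the comma after dummy, so the literal 0 becomes cond.
 */
#define traverse(name, cond...) list_check(dummy, ## cond, 0)

/* Traversal without the optional argument: only reader state counts. */
int splats_without_cond(bool reader)
{
	warnings = 0;
	in_reader = reader;
	traverse(some_list);
	return warnings;
}

/* Traversal that passes a lock-held expression as the optional argument. */
int splats_with_cond(bool reader, bool lock_held)
{
	warnings = 0;
	in_reader = reader;
	traverse(some_list, lock_held);
	return warnings;
}
```

The dummy first argument matters here: without it the expansion would
begin with "##" when no condition is supplied.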
> ---
> include/linux/rculist.h | 28 ++++++++++++++++++++-----
> include/linux/rcupdate.h | 7 +++++++
> kernel/rcu/Kconfig.debug | 11 ++++++++++
> kernel/rcu/update.c | 44 ++++++++++++++++++++++++----------------
> 4 files changed, 67 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index e91ec9ddcd30..1048160625bb 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
> */
> #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
>
> +/*
> + * Check during list traversal that we are within an RCU reader
> + */
> +
> +#ifdef CONFIG_PROVE_RCU_LIST
This new Kconfig option is OK temporarily, but unless there is reason to
fear malfunction that a few weeks of rcutorture, 0day, and -next won't
find, it would be better to just use CONFIG_PROVE_RCU. The overall goal
is to reduce the number of RCU knobs rather than grow them, much though
history might lead one to believe otherwise. :-/
> +#define __list_check_rcu(dummy, cond, ...) \
> + ({ \
> + RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
> + "RCU-list traversed in non-reader section!"); \
> + })
> +#else
> +#define __list_check_rcu(dummy, cond, ...) ({})
> +#endif
> +
> /*
> * Insert a new entry between two known consecutive entries.
> *
> @@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> * @pos: the type * to use as a loop cursor.
> * @head: the head for your list.
> * @member: the name of the list_head within the struct.
> + * @cond: optional lockdep expression if called from non-RCU protection.
> *
> * This list-traversal primitive may safely run concurrently with
> * the _rcu list-mutation primitives such as list_add_rcu()
> * as long as the traversal is guarded by rcu_read_lock().
> */
> -#define list_for_each_entry_rcu(pos, head, member) \
> - for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> - &pos->member != (head); \
> +#define list_for_each_entry_rcu(pos, head, member, cond...) \
> + for (__list_check_rcu(dummy, ## cond, 0), \
> + pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> + &pos->member != (head); \
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> /**
> @@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
> * @pos: the type * to use as a loop cursor.
> * @head: the head for your list.
> * @member: the name of the hlist_node within the struct.
> + * @cond: optional lockdep expression if called from non-RCU protection.
> *
> * This list-traversal primitive may safely run concurrently with
> * the _rcu list-mutation primitives such as hlist_add_head_rcu()
> * as long as the traversal is guarded by rcu_read_lock().
> */
> -#define hlist_for_each_entry_rcu(pos, head, member) \
> - for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> +#define hlist_for_each_entry_rcu(pos, head, member, cond...) \
> + for (__list_check_rcu(dummy, ## cond, 0), \
> + pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> typeof(*(pos)), member); \
> pos; \
> pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 8f7167478c1d..f3c29efdf19a 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
> int rcu_read_lock_held(void);
> int rcu_read_lock_bh_held(void);
> int rcu_read_lock_sched_held(void);
> +int rcu_read_lock_any_held(void);
>
> #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> @@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
> {
> return !preemptible();
> }
> +
> +static inline int rcu_read_lock_any_held(void)
> +{
> + return !preemptible();
> +}
> +
> #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> #ifdef CONFIG_PROVE_RCU
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 5ec3ea4028e2..7fbd21dbfcd0 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -8,6 +8,17 @@ menu "RCU Debugging"
> config PROVE_RCU
> def_bool PROVE_LOCKING
>
> +config PROVE_RCU_LIST
> + bool "RCU list lockdep debugging"
> + depends on PROVE_RCU
This must also depend on RCU_EXPERT.
> + default n
> + help
> + Enable RCU lockdep checking for list usages. By default it is
> + turned off since there are several list RCU users that still
> + need to be converted to pass a lockdep expression. To prevent
> + false-positive splats, we keep it default disabled but once all
> + users are converted, we can remove this config option.
> +
> config TORTURE_TEST
> tristate
> default n
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 9dd5aeef6e70..b7a4e3b5fa98 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -91,14 +91,18 @@ module_param(rcu_normal_after_boot, int, 0);
> * Similarly, we avoid claiming an SRCU read lock held if the current
> * CPU is offline.
> */
> +#define rcu_read_lock_held_common() \
> + if (!debug_lockdep_rcu_enabled()) \
> + return 1; \
> + if (!rcu_is_watching()) \
> + return 0; \
> + if (!rcu_lockdep_current_cpu_online()) \
> + return 0;
Nice abstraction of common code!
Thanx, Paul
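One property of rcu_read_lock_held_common() worth keeping in mind is
that its return statements leave the calling function, not just the
macro. A minimal userspace sketch of that behaviour follows. The names
struct rcu_dbg_state, held_common(), and read_lock_held() are
hypothetical stand-ins and not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three conditions the kernel macro tests. */
struct rcu_dbg_state {
	bool lockdep_enabled;
	bool watching;
	bool cpu_online;
};

/* Each "return" below returns from the function that invokes the macro. */
#define held_common(s) \
	if (!(s)->lockdep_enabled) \
		return 1; \
	if (!(s)->watching) \
		return 0; \
	if (!(s)->cpu_online) \
		return 0;

int read_lock_held(const struct rcu_dbg_state *s, bool lock_held)
{
	held_common(s);

	/* Reached only when none of the early exits fired. */
	return lock_held;
}
```

Because the control flow is hidden, a macro of this shape only makes
sense inside a function that returns int.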
> +
> int rcu_read_lock_sched_held(void)
> {
> - if (!debug_lockdep_rcu_enabled())
> - return 1;
> - if (!rcu_is_watching())
> - return 0;
> - if (!rcu_lockdep_current_cpu_online())
> - return 0;
> + rcu_read_lock_held_common();
> +
> return lock_is_held(&rcu_sched_lock_map) || !preemptible();
> }
> EXPORT_SYMBOL(rcu_read_lock_sched_held);
> @@ -257,12 +261,8 @@ NOKPROBE_SYMBOL(debug_lockdep_rcu_enabled);
> */
> int rcu_read_lock_held(void)
> {
> - if (!debug_lockdep_rcu_enabled())
> - return 1;
> - if (!rcu_is_watching())
> - return 0;
> - if (!rcu_lockdep_current_cpu_online())
> - return 0;
> + rcu_read_lock_held_common();
> +
> return lock_is_held(&rcu_lock_map);
> }
> EXPORT_SYMBOL_GPL(rcu_read_lock_held);
> @@ -284,16 +284,24 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
> */
> int rcu_read_lock_bh_held(void)
> {
> - if (!debug_lockdep_rcu_enabled())
> - return 1;
> - if (!rcu_is_watching())
> - return 0;
> - if (!rcu_lockdep_current_cpu_online())
> - return 0;
> + rcu_read_lock_held_common();
> +
> return in_softirq() || irqs_disabled();
> }
> EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
>
> +int rcu_read_lock_any_held(void)
> +{
> + rcu_read_lock_held_common();
> +
> + if (lock_is_held(&rcu_lock_map) ||
> + lock_is_held(&rcu_bh_lock_map) ||
> + lock_is_held(&rcu_sched_lock_map))
> + return 1;
> + return !preemptible();
> +}
> +EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
> +
> #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> /**
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH 3/9] rcu/sync: Remove custom check for reader-section (v2)
From: Paul E. McKenney @ 2019-07-16 18:39 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-4-joel@joelfernandes.org>
On Mon, Jul 15, 2019 at 10:36:59AM -0400, Joel Fernandes (Google) wrote:
> The rcu/sync code was doing its own check whether we are in a reader
> section. With RCU consolidating flavors and the generic helper added in
> this series, this is no longer needed. We can just use the generic helper
> and it results in a nice cleanup.
>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
This one looks good!
Thanx, Paul
> ---
> include/linux/rcu_sync.h | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> index 9b83865d24f9..0027d4c8087c 100644
> --- a/include/linux/rcu_sync.h
> +++ b/include/linux/rcu_sync.h
> @@ -31,9 +31,7 @@ struct rcu_sync {
> */
> static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> {
> - RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> - !rcu_read_lock_bh_held() &&
> - !rcu_read_lock_sched_held(),
> + RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> "suspicious rcu_sync_is_idle() usage");
> return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
> }
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH v2 2/9] rcu: Add support for consolidated-RCU reader checking
From: Joel Fernandes @ 2019-07-16 18:35 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716182237.GA22819@linux.ibm.com>
On Tue, Jul 16, 2019 at 11:22:37AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 01:00:17PM -0400, Joel Fernandes (Google) wrote:
> > This patch adds support for checking RCU reader sections in list
> > traversal macros. Optionally, if the list macro is called under SRCU or
> > other lock/mutex protection, then appropriate lockdep expressions can be
> > passed to make the checks pass.
> >
> > Existing list_for_each_entry_rcu() invocations don't need to pass the
> > optional fourth argument (cond) unless they are under some non-RCU
> > protection and need to make the lockdep check pass.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> If you fold in the checks for extra parameters, I will take this
> one and also 1/9.
I folded the checks in and also threw in the rcu-sync with Oleg's ack:
Could you pull into /dev branch?
git pull https://github.com/joelagnel/linux-kernel.git list-first-three
(Based on your dev branch)
* Re: [PATCH v2 2/9] rcu: Add support for consolidated-RCU reader checking
From: Paul E. McKenney @ 2019-07-16 18:22 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
Josh Triplett, keescook, kernel-hardening, kernel-team,
Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712170024.111093-3-joel@joelfernandes.org>
On Fri, Jul 12, 2019 at 01:00:17PM -0400, Joel Fernandes (Google) wrote:
> This patch adds support for checking RCU reader sections in list
> traversal macros. Optionally, if the list macro is called under SRCU or
> other lock/mutex protection, then appropriate lockdep expressions can be
> passed to make the checks pass.
>
> Existing list_for_each_entry_rcu() invocations don't need to pass the
> optional fourth argument (cond) unless they are under some non-RCU
> protection and need to make the lockdep check pass.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
If you fold in the checks for extra parameters, I will take this
one and also 1/9.
Thanx, Paul
> ---
> include/linux/rculist.h | 28 +++++++++++++++++++++++-----
> include/linux/rcupdate.h | 7 +++++++
> kernel/rcu/Kconfig.debug | 11 +++++++++++
> kernel/rcu/update.c | 14 ++++++++++++++
> 4 files changed, 55 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index e91ec9ddcd30..1048160625bb 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
> */
> #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
>
> +/*
> + * Check during list traversal that we are within an RCU reader
> + */
> +
> +#ifdef CONFIG_PROVE_RCU_LIST
> +#define __list_check_rcu(dummy, cond, ...) \
> + ({ \
> + RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
> + "RCU-list traversed in non-reader section!"); \
> + })
> +#else
> +#define __list_check_rcu(dummy, cond, ...) ({})
> +#endif
> +
> /*
> * Insert a new entry between two known consecutive entries.
> *
> @@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
> * @pos: the type * to use as a loop cursor.
> * @head: the head for your list.
> * @member: the name of the list_head within the struct.
> + * @cond: optional lockdep expression if called from non-RCU protection.
> *
> * This list-traversal primitive may safely run concurrently with
> * the _rcu list-mutation primitives such as list_add_rcu()
> * as long as the traversal is guarded by rcu_read_lock().
> */
> -#define list_for_each_entry_rcu(pos, head, member) \
> - for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> - &pos->member != (head); \
> +#define list_for_each_entry_rcu(pos, head, member, cond...) \
> + for (__list_check_rcu(dummy, ## cond, 0), \
> + pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> + &pos->member != (head); \
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> /**
> @@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
> * @pos: the type * to use as a loop cursor.
> * @head: the head for your list.
> * @member: the name of the hlist_node within the struct.
> + * @cond: optional lockdep expression if called from non-RCU protection.
> *
> * This list-traversal primitive may safely run concurrently with
> * the _rcu list-mutation primitives such as hlist_add_head_rcu()
> * as long as the traversal is guarded by rcu_read_lock().
> */
> -#define hlist_for_each_entry_rcu(pos, head, member) \
> - for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> +#define hlist_for_each_entry_rcu(pos, head, member, cond...) \
> + for (__list_check_rcu(dummy, ## cond, 0), \
> + pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> typeof(*(pos)), member); \
> pos; \
> pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 922bb6848813..712b464ab960 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -223,6 +223,7 @@ int debug_lockdep_rcu_enabled(void);
> int rcu_read_lock_held(void);
> int rcu_read_lock_bh_held(void);
> int rcu_read_lock_sched_held(void);
> +int rcu_read_lock_any_held(void);
>
> #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> @@ -243,6 +244,12 @@ static inline int rcu_read_lock_sched_held(void)
> {
> return !preemptible();
> }
> +
> +static inline int rcu_read_lock_any_held(void)
> +{
> + return !preemptible();
> +}
> +
> #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> #ifdef CONFIG_PROVE_RCU
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 0ec7d1d33a14..b20d0e2903d1 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -7,6 +7,17 @@ menu "RCU Debugging"
> config PROVE_RCU
> def_bool PROVE_LOCKING
>
> +config PROVE_RCU_LIST
> + bool "RCU list lockdep debugging"
> + depends on PROVE_RCU
> + default n
> + help
> + Enable RCU lockdep checking for list usages. By default it is
> + turned off since there are several list RCU users that still
> + need to be converted to pass a lockdep expression. To prevent
> + false-positive splats, we keep it default disabled but once all
> + users are converted, we can remove this config option.
> +
> config TORTURE_TEST
> tristate
> default n
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index bb961cd89e76..0cc7be0fb6b5 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -294,6 +294,20 @@ int rcu_read_lock_bh_held(void)
> }
> EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
>
> +int rcu_read_lock_any_held(void)
> +{
> + if (!debug_lockdep_rcu_enabled())
> + return 1;
> + if (!rcu_is_watching())
> + return 0;
> + if (!rcu_lockdep_current_cpu_online())
> + return 0;
> + if (lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_sched_lock_map))
> + return 1;
> + return !preemptible();
> +}
> +EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
> +
> #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> /**
> --
> 2.22.0.510.g264f2c817a-goog
>
* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-16 18:28 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716182642.GB22819@linux.ibm.com>
On Tue, Jul 16, 2019 at 11:26:42AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > The rcu/sync code was doing its own check whether we are in a reader
> > section. With RCU consolidating flavors and the generic helper added in
> > this series, this is no longer needed. We can just use the generic helper
> > and it results in a nice cleanup.
> >
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> This needs to be forward-ported to current mainline. (Or, I believe
> equivalently for this file, to branch "dev" of -rcu.)
>
> Especially given that you have Oleg's Ack, I would be happy to
> take the forward-ported version.
Never mind, I am one version behind. Apologies for the noise!
Thanx, Paul
> > ---
> > Please note: Only build and boot tested this particular patch so far.
> >
> > include/linux/rcu_sync.h | 5 ++---
> > kernel/rcu/sync.c | 22 ----------------------
> > 2 files changed, 2 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > index 6fc53a1345b3..c954f1efc919 100644
> > --- a/include/linux/rcu_sync.h
> > +++ b/include/linux/rcu_sync.h
> > @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
> > */
> > static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > {
> > -#ifdef CONFIG_PROVE_RCU
> > - rcu_sync_lockdep_assert(rsp);
> > -#endif
> > + RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > + "suspicious rcu_sync_is_idle() usage");
> > return !rsp->gp_state; /* GP_IDLE */
> > }
> >
> > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > index a8304d90573f..535e02601f56 100644
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -10,37 +10,25 @@
> > #include <linux/rcu_sync.h>
> > #include <linux/sched.h>
> >
> > -#ifdef CONFIG_PROVE_RCU
> > -#define __INIT_HELD(func) .held = func,
> > -#else
> > -#define __INIT_HELD(func)
> > -#endif
> > -
> > static const struct {
> > void (*sync)(void);
> > void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> > void (*wait)(void);
> > -#ifdef CONFIG_PROVE_RCU
> > - int (*held)(void);
> > -#endif
> > } gp_ops[] = {
> > [RCU_SYNC] = {
> > .sync = synchronize_rcu,
> > .call = call_rcu,
> > .wait = rcu_barrier,
> > - __INIT_HELD(rcu_read_lock_held)
> > },
> > [RCU_SCHED_SYNC] = {
> > .sync = synchronize_rcu,
> > .call = call_rcu,
> > .wait = rcu_barrier,
> > - __INIT_HELD(rcu_read_lock_sched_held)
> > },
> > [RCU_BH_SYNC] = {
> > .sync = synchronize_rcu,
> > .call = call_rcu,
> > .wait = rcu_barrier,
> > - __INIT_HELD(rcu_read_lock_bh_held)
> > },
> > };
> >
> > @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> >
> > #define rss_lock gp_wait.lock
> >
> > -#ifdef CONFIG_PROVE_RCU
> > -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> > -{
> > - RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> > - "suspicious rcu_sync_is_idle() usage");
> > -}
> > -
> > -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> > -#endif
> > -
> > /**
> > * rcu_sync_init() - Initialize an rcu_sync structure
> > * @rsp: Pointer to rcu_sync structure to be initialized
> > --
> > 2.22.0.510.g264f2c817a-goog
> >
^ permalink raw reply
* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-16 18:26 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712170024.111093-4-joel@joelfernandes.org>
On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> The rcu/sync code was doing its own check whether we are in a reader
> section. With RCU consolidating flavors and the generic helper added in
> this series, this is no longer needed. We can just use the generic helper
> and it results in a nice cleanup.
>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
This needs to be forward-ported to current mainline. (Or, I believe
equivalently for this file, to branch "dev" of -rcu.)
Especially given that you have Oleg's Ack, I would be happy to
take the forward-ported version.
Thanx, Paul
> ---
> Please note: Only build and boot tested this particular patch so far.
>
> include/linux/rcu_sync.h | 5 ++---
> kernel/rcu/sync.c | 22 ----------------------
> 2 files changed, 2 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> index 6fc53a1345b3..c954f1efc919 100644
> --- a/include/linux/rcu_sync.h
> +++ b/include/linux/rcu_sync.h
> @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
> */
> static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> {
> -#ifdef CONFIG_PROVE_RCU
> - rcu_sync_lockdep_assert(rsp);
> -#endif
> + RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> + "suspicious rcu_sync_is_idle() usage");
> return !rsp->gp_state; /* GP_IDLE */
> }
>
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index a8304d90573f..535e02601f56 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -10,37 +10,25 @@
> #include <linux/rcu_sync.h>
> #include <linux/sched.h>
>
> -#ifdef CONFIG_PROVE_RCU
> -#define __INIT_HELD(func) .held = func,
> -#else
> -#define __INIT_HELD(func)
> -#endif
> -
> static const struct {
> void (*sync)(void);
> void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> void (*wait)(void);
> -#ifdef CONFIG_PROVE_RCU
> - int (*held)(void);
> -#endif
> } gp_ops[] = {
> [RCU_SYNC] = {
> .sync = synchronize_rcu,
> .call = call_rcu,
> .wait = rcu_barrier,
> - __INIT_HELD(rcu_read_lock_held)
> },
> [RCU_SCHED_SYNC] = {
> .sync = synchronize_rcu,
> .call = call_rcu,
> .wait = rcu_barrier,
> - __INIT_HELD(rcu_read_lock_sched_held)
> },
> [RCU_BH_SYNC] = {
> .sync = synchronize_rcu,
> .call = call_rcu,
> .wait = rcu_barrier,
> - __INIT_HELD(rcu_read_lock_bh_held)
> },
> };
>
> @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
>
> #define rss_lock gp_wait.lock
>
> -#ifdef CONFIG_PROVE_RCU
> -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> -{
> - RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> - "suspicious rcu_sync_is_idle() usage");
> -}
> -
> -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> -#endif
> -
> /**
> * rcu_sync_init() - Initialize an rcu_sync structure
> * @rsp: Pointer to rcu_sync structure to be initialized
> --
> 2.22.0.510.g264f2c817a-goog
>
^ permalink raw reply
* Re: [PATCH v12 1/5] can: m_can: Create a m_can platform framework
From: Dan Murphy @ 2019-07-16 18:23 UTC (permalink / raw)
To: wg, mkl, davem; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <dbb7bdef-820d-5dcc-d7b5-a82bc1b076fb@ti.com>
Hello
On 5/15/19 3:54 PM, Dan Murphy wrote:
> Marc
>
> On 5/9/19 11:11 AM, Dan Murphy wrote:
>> Create a m_can platform framework that peripheral
>> devices can register to and use common code and register sets.
>> The peripheral devices may provide read/write and configuration
>> support of the IP.
>>
>> Acked-by: Wolfgang Grandegger <wg@grandegger.com>
>> Signed-off-by: Dan Murphy <dmurphy@ti.com>
>> ---
>>
>> v12 - Update the m_can_read/write functions to create a backtrace if the callback
>> pointer is NULL. - https://lore.kernel.org/patchwork/patch/1052302/
>>
> Is this able to be merged now?
Is there anyone out there maintaining this subsystem?
Dan
> Dan
>
> <snip>
^ permalink raw reply
* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Andrii Nakryiko @ 2019-07-16 17:49 UTC (permalink / raw)
To: Jiong Wang
Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
Yonghong Song
In-Reply-To: <87wogitlbi.fsf@netronome.com>
On Tue, Jul 16, 2019 at 1:50 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >>
> >> Andrii Nakryiko writes:
> >>
> >> > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
> >> >>
> >> >>
> >> >> Andrii Nakryiko writes:
> >> >>
> >> >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >> >> >>
> >> >> >> This is an RFC based on latest bpf-next about accelerating insn patching
> >> >> >> speed. It is now near the shape of the final PATCH set, and we can see the
> >> >> >> changes that migrating to list patching would bring, so I am sending it out
> >> >> >> for comments. Most of the info is in the cover letter. I split the code in a
> >> >> >> way that shows the API migration more easily.
> >> >> >
> >> >> >
> >> >> > Hey Jiong,
> >> >> >
> >> >> >
> >> >> > Sorry, took me a while to get to this and learn more about instruction
> >> >> > patching. Overall this looks good and I think is a good direction.
> >> >> > I'll post high-level feedback here, and some more
> >> >> > implementation-specific ones in corresponding patches.
> >> >>
> >> >> Great, thanks very much for the feedback. Most of it hits exactly
> >> >> the pain points I had run into. For some of them, I had thought of
> >> >> solutions similar to yours, but they failed for various reasons.
> >> >> Let's go through them again; I could have missed some important
> >> >> things.
> >> >>
> >> >> Please see my replies below.
> >> >
> >> > Thanks for thoughtful reply :)
> >> >
> >> >>
> >> >> >>
> >> >> >> Test Results
> >> >> >> ===
> >> >> >> - Full pass on test_verifier/test_prog/test_prog_32 under all three
> >> >> >> modes (interpreter, JIT, JIT with blinding).
> >> >> >>
> >> >> >> - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> >> >> >> patching time from 5100s (nearly one and a half hour) to less than
> >> >> >> 0.5s for 1M insn patching.
> >> >> >>
> >> >> >> Known Issues
> >> >> >> ===
> >> >> >> - The following warning is triggered when running scale test which
> >> >> >> contains 1M insns and patching:
> >> >> >> warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> >> >> >>
> >> >> >> This is caused by existing code. It can be reproduced on bpf-next
> >> >> >> master by enabling JIT blinding and running the scale unit test; it
> >> >> >> shows up after half an hour. After this set, patching is very fast, so
> >> >> >> it shows up quickly.
> >> >> >>
> >> >> >> - No line info adjustment support when doing insn delete, and subprog
> >> >> >> adjustment is buggy when doing insn delete as well. Generally, removal of
> >> >> >> insns could remove an entire line or subprog, so the corresponding
> >> >> >> entries of prog->aux->linfo or env->subprog need to be deleted. I
> >> >> >> don't have a good idea or clean code for integrating this into the
> >> >> >> linearization code at the moment. I will do more experimenting and
> >> >> >> appreciate ideas and suggestions on this.
> >> >> >
> >> >> > Is there any specific problem to detect which line info to delete? Or
> >> >> > what am I missing besides careful implementation?
> >> >>
> >> >> Line info and subprog info are mostly range info, each entry covering a
> >> >> range of insns. Deleting insns could require adjusting a range or removing
> >> >> a range entirely. Subprog info could be fully recalculated during
> >> >> linearization, while line info needs a careful implementation. I
> >> >> failed to write clean code for this during linearization, and, as said,
> >> >> there are no unit tests to help me check whether the code is correct.
> >> >>
> >> >
> >> > Ok, that's good that it's just about clean implementation. Try to
> >> > implement it as clearly as possible. Then post it here, and if it can
> >> > be improved someone (me?) will try to help to clean it up further.
> >> >
> >> > Not a big expert on line info, so can't comment on that,
> >> > unfortunately. Maybe Yonghong can chime in (cc'ed)
> >> >
> >> >
> >> >> I will describe this later; I spent too much time writing the following
> >> >> reply. It might be worth a separate discussion thread.
> >> >>
> >> >> >>
> >> >> >> Insn delete doesn't happen on normal programs, for example Cilium
> >> >> >> benchmarks, and happens rarely on test_progs, so the test coverage is
> >> >> >> not good. That's also why this RFC has a full pass on selftests despite
> >> >> >> this known issue.
> >> >> >
> >> >> > I hope you'll add test for deletion (and w/ corresponding line info)
> >> >> > in final patch set :)
> >> >>
> >> >> Will try. Need to spend some time on BTF format.
> >> >> >
> >> >> >>
> >> >> >> - Could further use mem pool to accelerate the speed, changes are trivial
> >> >> >> on top of this RFC, and could be 2x extra faster. Not included in this
> >> >> >> RFC as reducing the algo complexity from quadratic to linear of insn
> >> >> >> number is the first step.
> >> >> >
> >> >> > Honestly, I think that would add more complexity than necessary, and I
> >> >> > think we can further speed up performance without that, see below.
> >> >> >
> >> >> >>
> >> >> >> Background
> >> >> >> ===
> >> >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> >> >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> >> >> >> remove insns.
> >> >> >>
> >> >> >> At the moment, insn patching is quadratic in the number of insns. This is
> >> >> >> because branch targets of jump insns need to be adjusted, and the algo used is:
> >> >> >>
> >> >> >>   for insn inside prog
> >> >> >>     patch insn + regenerate bpf prog
> >> >> >>     for insn inside new prog
> >> >> >>       adjust jump target
> >> >> >>
> >> >> >> This costs significant time when a bpf prog requires a large
> >> >> >> amount of patching on different insns. Benchmarking shows it could take
> >> >> >> more than half a minute to finish patching when the patching number is more
> >> >> >> than 50K, and the time spent could be more than one hour when the patching
> >> >> >> number is around 1M.
> >> >> >>
> >> >> >> 15000 : 3s
> >> >> >> 45000 : 29s
> >> >> >> 95000 : 125s
> >> >> >> 195000 : 712s
> >> >> >> 1000000 : 5100s
> >> >> >>
> >> >> >> This RFC introduces new patching infrastructure. Before doing insn
> >> >> >> patching, insns in bpf prog are turned into a singly linked list, insert
> >> >> >> new insns just insert new list node, delete insns just set delete flag.
> >> >> >> And finally, the list is linearized back into array, and branch target
> >> >> >> adjustment is done for all jump insns during linearization. This algo
> >> >> >> brings the time complexity from quadratic to linear of insn number.
> >> >> >>
> >> >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> >> >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> >> >> >> to less than 0.5s.
> >> >> >>
> >> >> >> Patching API
> >> >> >> ===
> >> >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> >> >> >> where only BPF insns are patched. The other is "verification layer" where
> >> >> >> insns have corresponding aux info as well high level subprog info, so
> >> >> >> insn patching means aux info needs to be patched as well, and subprog info
> >> >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> >> >> >> should always be updated after insn patching.
> >> >> >>
> >> >> >> So, list creation, destroy, insert, delete is the same for both layers,
> >> >> >> but linearization is different. "verification layer" patching requires extra
> >> >> >> work. Therefore the patch APIs are:
> >> >> >>
> >> >> >> list creation: bpf_create_list_insn
> >> >> >> list patch: bpf_patch_list_insn
> >> >> >> list pre-patch: bpf_prepatch_list_insn
> >> >> >
> >> >> > I think pre-patch name is very confusing, until I read full
> >> >> > description I couldn't understand what it's supposed to be used for.
> >> >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> >> >> > me wondering whether instruction buffer is inserted after instruction,
> >> >> > or instruction is replaced with a bunch of instructions.
> >> >> >
> >> >> > So how about two more specific names:
> >> >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> >> >> > instruction with a list of patch instructions)
> >> >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> >> >> > one is pretty clear).
> >> >>
> >> >> My sense on English word is not great, will switch to above which indeed
> >> >> reads more clear.
> >> >>
> >> >> >> list linearization (core layer): prog = bpf_linearize_list_insn(prog, list)
> >> >> >> list linearization (veri layer): env = verifier_linearize_list_insn(env, list)
> >> >> >
> >> >> > These two functions are both quite involved, as well as share a lot of
> >> >> > common code. I'd rather have one linearize instruction, that takes env
> >> >> > as an optional parameter. If env is specified (which is the case for
> >> >> > all cases except for constant blinding pass), then adjust aux_data and
> >> >> > subprogs along the way.
> >> >>
> >> >> Two versions of linearization and how to unify them was a pain point for me. I
> >> >> thought of factoring out some of the common code, but it actually doesn't
> >> >> amount to much: the final size counting + insnsi resize parts are the same,
> >> >> then things start to diverge from the "Copy over insn" loop.
> >> >>
> >> >> verifier layer needs to copy and initialize aux data etc. And jump
> >> >> relocation is different. At core layer, the use case is JIT blinding which
> >> >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
> >> >
> >> > Sorry, I didn't get what "could expand an jump_imm insn into a
> >> > and/or/jump_reg sequence", maybe you can clarify if I'm missing
> >> > something.
> >> >
> >> > But from your cover letter description, core layer has no jumps at
> >> > all, while verifier has jumps inside patch buffer. So, if you support
> >> > jumps inside of patch buffer, it will automatically work for core
> >> > layer. Or what am I missing?
> >>
> >> I meant in core layer (JIT blinding), there is the following patching:
> >>
> >> input:
> >>   insn 0             insn 0
> >>   insn 1             insn 1
> >>   jmp_imm     >>     mov_imm  \
> >>   insn 2             xor_imm   insn seq expanded from jmp_imm
> >>   insn 3             jmp_reg  /
> >>                      insn 2
> >>                      insn 3
> >>
> >>
> >> jmp_imm is the insn that will be patched, and the actual transformation
> >> is to expand it into a mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
> >> at the end of the patch buffer, must jump to the same destination as the
> >> original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
> >> be relocated, and the jump destination is outside of patch buffer.
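
The mov_imm/xor_imm pair in the diagram above can be illustrated with a small user-space sketch. It assumes the usual constant-blinding trick of loading `imm ^ rnd` and then XOR-ing with `rnd`, so the original immediate never appears in the emitted insns. The names `struct blinded_imm`, `blind_imm` and `unblind` are hypothetical and exist only for this illustration; they are not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical result of blinding one immediate: the two constants that
 * would appear in the emitted mov_imm/xor_imm pair.
 */
struct blinded_imm {
	uint32_t mov_imm;	/* imm ^ rnd, loaded into the scratch register */
	uint32_t xor_imm;	/* rnd, XOR-ed into the scratch register */
};

/* Split imm into two constants, neither of which equals imm when rnd != 0. */
static struct blinded_imm blind_imm(uint32_t imm, uint32_t rnd)
{
	struct blinded_imm b = { imm ^ rnd, rnd };

	return b;
}

/* Value held by the scratch register once mov and xor have executed. */
static uint32_t unblind(struct blinded_imm b)
{
	return b.mov_imm ^ b.xor_imm;
}
```

The trailing jmp_reg then compares against the scratch register, which is why it has to keep the jump destination of the original jmp_imm.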
> >
> >
> > Ok, great, thanks for explaining, yeah it's definitely something that
> > we should be able to support. BUT. It got me thinking a bit more and I
> > think I have simpler and more elegant solution now, again, supporting
> > both core-layer and verifier-layer operations.
> >
> > struct bpf_patchable_insn {
> >     struct bpf_patchable_insn *next;
> >     struct bpf_insn insn;
> >     int orig_idx; /* original non-patched index */
> >     int new_idx; /* new index, will be filled only during linearization */
> > };
> >
> > struct bpf_patcher {
> >     /* dummy head node of a chain of patchable instructions */
> >     struct bpf_patchable_insn insn_head;
> >     /* dynamic array of size(original instruction count)
> >      * this is a map from original instruction index to a first
> >      * patchable instruction that replaced that instruction (or
> >      * just original instruction as bpf_patchable_insn).
> >      */
> >     struct bpf_patchable_insn **orig_idx_to_patchable_insn;
> >     int cnt;
> > };
> >
> > Few points, but it should be pretty clear just from comments and definitions:
> > 1. When you create bpf_patcher, you create the patchable_insn list and fill
> > orig_idx_to_patchable_insn map to store proper pointers. This array is
> > NEVER changed after that.
> > 2. When replacing an instruction, you re-use struct bpf_patchable_insn
> > for the first patched instruction, then append after that (not prepend to
> > next instruction to not disrupt orig_idx -> patchable_insn mapping).
> > 3. During linearization, you first traverse the chain of instructions
> > and trivially assign new_idxs.
> > 4. No need for patchable_insn->target anymore. All jumps use relative
> > instruction offsets, right?
>
> Yes, all jumps are pc-relative.
>
> > So when you need to determine new
> > instruction index during linearization, you just do (after you
> > calculated new instruction indices):
> >
> > func adjust_jmp(struct bpf_patcher* patcher, struct bpf_patchable_insn *insn) {
> >     int old_jmp_idx = insn->orig_idx + jmp_offset_of(insn->insn);
> >     int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;
> >     /* the new relative offset is measured from this insn's new index */
> >     adjust_jmp_offset(insn->insn, new_jmp_idx - insn->new_idx);
> > }
>
> Hmm, this algo is kind of the same as this RFC, just that we have organized "new_index"
> as "idx_map". And in this RFC, only new_idx of one original insn matters,
> no space is allocated for patched insns. (As mentioned, JIT blinding
It's not really about saving space. It's about having a mapping from
original index to a new one (in this case, through struct
bpf_patchable_insn *), which stays correct at all times, thus allowing
to not linearize between patching passes.
> requires the last insn inside patch buffer relocated to original jump
> offset, so there was a little special handling in the relocation loop in
> core layer linearization code)
>
> > The idea is that we want to support quick look-up by original
> > instruction index. That's what orig_idx_to_patchable_insn provides. On
> > the other hand, no existing instruction is ever referencing newly
> > patched instruction by its new offset, so with careful implementation,
> > you can transparently support all the cases, regardless if it's in
> > core layer or verifier layer (so, e.g., verifier layer patched
> > instructions now will be able to jump out of patched buffer, if
> > necessary, neat, right?).
> >
> > It is cleaner than everything we've discussed so far. Unless I missed
> > something critical (it's all quite convoluted, so I might have
> > forgotten some parts already). Let me know what you think.
>
> Let me digest a little bit and do some coding, then I will come back. Some
Sure, give it some thought and give it a go at coding, I bet overall
it will turn out more succinct and simpler. Please post an updated
version when you are done. Thanks!
> issues can only show up during in-depth coding. I kind of feel handling
> aux reference in verifier layer is the part that will still introduce some
> un-clean code.
>
> <snip>
> >> If there is no dead insn elimination opt, then we could just adjust
> >> offsets. When there is insn deleting, I feel the logic becomes more
> >> complex. One subprog could be completely deleted or partially deleted, so
> >> I feel just recalculate the whole subprog info as a side-product is
> >> much simpler.
> >
> > What's the situation where entirety of subprog can be deleted?
>
> Suppose you have conditional jmp_imm, true path calls one subprog, false
> path calls the other. If insn walker later found it is also true, then the
> subprog at false path won't be marked as "seen", so it is entirely deleted.
>
> I actually thought it is in theory one subprog could be deleted entirely,
> so if we support insn deletion inside verifier, then range info like
> line_info/subprog_info needs to consider one range is deleted.
Seems like this is not a problem, according to Alexei. But in the
worst case, it's now simple to re-calculate all this, given that we
have this simple operation to get new insn idx by old insn idx.
>
> Thanks.
> Regards,
> Jiong
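
A minimal user-space sketch of the scheme discussed above, under simplifying assumptions: an instruction is reduced to an `is_jmp` flag and a pc-relative `off` (target = index + off + 1, as in BPF), every jump target lies inside the original program, and inserted insns inherit the `orig_idx` of the insn they replace. All names (`struct pinsn`, `struct patcher`, `patcher_init`, `replace_insn`, `linearize`) are hypothetical, not kernel API, and cleanup on allocation failure is left out for brevity.

```c
#include <stdlib.h>

/* Simplified stand-in for struct bpf_insn: only what jump fixup needs. */
struct insn {
	int is_jmp;	/* non-zero for a jump */
	int off;	/* pc-relative: target = idx + off + 1 */
};

struct pinsn {
	struct pinsn *next;
	struct insn insn;
	int orig_idx;	/* index of the original insn this node belongs to */
	int new_idx;	/* filled in during linearization only */
};

struct patcher {
	struct pinsn *head;
	struct pinsn **orig_to_node;	/* fixed after init: orig idx -> first node */
	int orig_cnt;
};

/* Turn the insn array into a singly linked list plus the orig-index map. */
static int patcher_init(struct patcher *p, const struct insn *prog, int cnt)
{
	struct pinsn **link = &p->head;

	p->head = NULL;
	p->orig_cnt = cnt;
	p->orig_to_node = calloc(cnt, sizeof(*p->orig_to_node));
	if (!p->orig_to_node)
		return -1;
	for (int i = 0; i < cnt; i++) {
		struct pinsn *n = calloc(1, sizeof(*n));

		if (!n)
			return -1;
		n->insn = prog[i];
		n->orig_idx = i;
		p->orig_to_node[i] = n;
		*link = n;
		link = &n->next;
	}
	return 0;
}

/* Replace original insn idx by buf[0..len): reuse its node, append the rest. */
static int replace_insn(struct patcher *p, int idx, const struct insn *buf, int len)
{
	struct pinsn *cur = p->orig_to_node[idx];

	if (len < 1)
		return -1;
	cur->insn = buf[0];
	for (int i = 1; i < len; i++) {
		struct pinsn *n = calloc(1, sizeof(*n));

		if (!n)
			return -1;
		n->insn = buf[i];
		n->orig_idx = idx;
		n->next = cur->next;
		cur->next = n;
		cur = n;
	}
	return 0;
}

/* One pass to assign new indices, one pass to copy and fix up jumps. */
static struct insn *linearize(struct patcher *p, int *new_cnt)
{
	struct insn *out;
	int cnt = 0;

	for (struct pinsn *n = p->head; n; n = n->next)
		n->new_idx = cnt++;
	out = malloc(cnt * sizeof(*out));
	if (!out)
		return NULL;
	for (struct pinsn *n = p->head; n; n = n->next) {
		out[n->new_idx] = n->insn;
		if (n->insn.is_jmp) {
			int old_target = n->orig_idx + n->insn.off + 1;
			int new_target = p->orig_to_node[old_target]->new_idx;

			out[n->new_idx].off = new_target - n->new_idx - 1;
		}
	}
	*new_cnt = cnt;
	return out;
}
```

Since the map from original index to node never changes, several patching passes could run back to back, with `linearize` called once at the end; both of its loops are linear in the number of insns.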
^ permalink raw reply
* Re: [PATCH bpf] selftests/bpf: make directory prerequisites order-only
From: Alexei Starovoitov @ 2019-07-16 17:49 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Ilya Leoshkevich, bpf, Network Development, gor, Heiko Carstens
In-Reply-To: <a3823fec-3816-9c38-bb2d-a8391766e64d@iogearbox.net>
On Mon, Jul 15, 2019 at 3:22 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/12/19 3:56 PM, Ilya Leoshkevich wrote:
> > When directories are used as prerequisites in Makefiles, they can cause
> > a lot of unnecessary rebuilds, because a directory is considered changed
> > whenever a file in this directory is added, removed or modified.
> >
> > If the only thing a target is interested in is the existence of the
> > directory it depends on, which is the case for selftests/bpf, this
> > directory should be specified as an order-only prerequisite: it would
> > still be created in case it does not exist, but it would not trigger a
> > rebuild of a target in case it's considered changed.
> >
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>
> Applied, thanks!
Hi Ilya,
this commit breaks map_tests.
To reproduce:
rm map_tests/tests.h
make
tests.h will not be regenerated.
Please provide a fix asap.
We cannot ship bpf tree with such failure.
^ permalink raw reply
* Re: [PATCH bpf] selftests/bpf: fix perf_buffer on s390
From: Andrii Nakryiko @ 2019-07-16 17:42 UTC (permalink / raw)
To: Ilya Leoshkevich; +Cc: bpf, Networking, gor, heiko.carstens
In-Reply-To: <20190716125827.24413-1-iii@linux.ibm.com>
On Tue, Jul 16, 2019 at 5:59 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> perf_buffer test fails for exactly the same reason test_attach_probe
> used to fail: different nanosleep syscall kprobe name.
>
> Reuse the test_attach_probe fix.
>
> Fixes: ee5cf82ce04a ("selftests/bpf: test perf buffer API")
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
Thanks for the fix!
Acked-by: Andrii Nakryiko <andriin@fb.com>
> .../testing/selftests/bpf/prog_tests/attach_probe.c | 12 ++----------
> tools/testing/selftests/bpf/prog_tests/perf_buffer.c | 8 +-------
> tools/testing/selftests/bpf/test_progs.h | 8 ++++++++
> 3 files changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> index 47af4afc5013..5ecc267d98b0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> +++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> @@ -21,14 +21,6 @@ ssize_t get_base_addr() {
> return -EINVAL;
> }
>
> -#ifdef __x86_64__
> -#define SYS_KPROBE_NAME "__x64_sys_nanosleep"
> -#elif defined(__s390x__)
> -#define SYS_KPROBE_NAME "__s390x_sys_nanosleep"
> -#else
> -#define SYS_KPROBE_NAME "sys_nanosleep"
> -#endif
> -
> void test_attach_probe(void)
> {
> const char *kprobe_name = "kprobe/sys_nanosleep";
> @@ -86,7 +78,7 @@ void test_attach_probe(void)
>
> kprobe_link = bpf_program__attach_kprobe(kprobe_prog,
> false /* retprobe */,
> - SYS_KPROBE_NAME);
> + SYS_NANOSLEEP_KPROBE_NAME);
> if (CHECK(IS_ERR(kprobe_link), "attach_kprobe",
> "err %ld\n", PTR_ERR(kprobe_link))) {
> kprobe_link = NULL;
> @@ -94,7 +86,7 @@ void test_attach_probe(void)
> }
> kretprobe_link = bpf_program__attach_kprobe(kretprobe_prog,
> true /* retprobe */,
> - SYS_KPROBE_NAME);
> + SYS_NANOSLEEP_KPROBE_NAME);
> if (CHECK(IS_ERR(kretprobe_link), "attach_kretprobe",
> "err %ld\n", PTR_ERR(kretprobe_link))) {
> kretprobe_link = NULL;
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> index 3f1ef95865ff..3003fddc0613 100644
> --- a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> @@ -5,12 +5,6 @@
> #include <sys/socket.h>
> #include <test_progs.h>
>
> -#ifdef __x86_64__
> -#define SYS_KPROBE_NAME "__x64_sys_nanosleep"
> -#else
> -#define SYS_KPROBE_NAME "sys_nanosleep"
> -#endif
> -
> static void on_sample(void *ctx, int cpu, void *data, __u32 size)
> {
> int cpu_data = *(int *)data, duration = 0;
> @@ -56,7 +50,7 @@ void test_perf_buffer(void)
>
> /* attach kprobe */
> link = bpf_program__attach_kprobe(prog, false /* retprobe */,
> - SYS_KPROBE_NAME);
> + SYS_NANOSLEEP_KPROBE_NAME);
> if (CHECK(IS_ERR(link), "attach_kprobe", "err %ld\n", PTR_ERR(link)))
> goto out_close;
>
> diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> index f095e1d4c657..49e0f7d85643 100644
> --- a/tools/testing/selftests/bpf/test_progs.h
> +++ b/tools/testing/selftests/bpf/test_progs.h
> @@ -92,3 +92,11 @@ int compare_map_keys(int map1_fd, int map2_fd);
> int compare_stack_ips(int smap_fd, int amap_fd, int stack_trace_len);
> int extract_build_id(char *build_id, size_t size);
> void *spin_lock_thread(void *arg);
> +
> +#ifdef __x86_64__
> +#define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep"
> +#elif defined(__s390x__)
> +#define SYS_NANOSLEEP_KPROBE_NAME "__s390x_sys_nanosleep"
> +#else
> +#define SYS_NANOSLEEP_KPROBE_NAME "sys_nanosleep"
> +#endif
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH bpf] libbpf: fix another GCC8 warning for strncpy
From: Alexei Starovoitov @ 2019-07-16 17:35 UTC (permalink / raw)
To: Magnus Karlsson
Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, bpf,
Network Development, Andrii Nakryiko, Kernel Team,
Magnus Karlsson
In-Reply-To: <CAJ8uoz03xFA4TW7GNmLAw_A0wMjHUjYU2rG3pRWsEX-sAX8BFw@mail.gmail.com>
On Tue, Jul 16, 2019 at 10:31 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 5:59 AM Andrii Nakryiko <andriin@fb.com> wrote:
> >
> > Similar issue was fixed in cdfc7f888c2a ("libbpf: fix GCC8 warning for
> > strncpy") already. This one was missed. Fixing now.
>
> Thanks Andrii.
>
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Applied. Thanks
^ permalink raw reply
* Re: [PATCH bpf] bpf: net: Set sk_bpf_storage back to NULL for cloned sk
From: Stanislav Fomichev @ 2019-07-16 17:32 UTC (permalink / raw)
To: Martin Lau
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Alexei Starovoitov,
Daniel Borkmann, David Miller, Kernel Team
In-Reply-To: <20190716054624.ea6sbbzn62grde2n@kafai-mbp>
On 07/16, Martin Lau wrote:
> On Tue, Jul 09, 2019 at 09:33:21AM -0700, Stanislav Fomichev wrote:
> > On 06/11, Martin KaFai Lau wrote:
> > > The cloned sk should not carry its parent-listener's sk_bpf_storage.
> > > This patch fixes it by setting it back to NULL.
> > Have you thought about some kind of inheritance for listener sockets'
> > storage? Suppose I have a situation where I write something
> > to listener's sk storage (directly or via recently added sockopts hooks)
> > and I want to inherit that state for a freshly established connection.
> >
> > > I was looking into adding the possibility to call bpf_get_listener_sock from
> > > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB callback to manually
> > > copy some data from the listener socket, but I don't think
> > at this point there is any association between newly established
> > socket and the listener.
> Right, at that point, the child sk has no reference back
> to the listener's sk.
>
> After a quick look, the listener sk may not always be available
> also (e.g. the backlog processing case). Hence, adding
> the listener sk to the bpf running ctx is not obvious
> either.
>
> >
> > Thoughts/ideas?
> I think cloning the listener's bpf sk storage could be added
> to the existing sk cloning logic. It seems to be a more straight
> forward approach instead of figuring out the right place to call
> another bpf prog to clone it.
>
> Quick thoughts out of my head:
> 1. Default should be not-to-clone. Have a way (a map's flag?) to opt-in.
> 2. The listener's sk storage could be being modified while being cloned.
> One possibility is to check if the value has bpf_spin_lock.
> If there is, lock it before cloning.
Thanks for suggestion! An optional inherit/clone flag to
bpf_sk_storage_get seems like a good option. I'll try to play with it,
will probably get back with an rfc at some point.
^ permalink raw reply
* Re: [PATCH bpf] libbpf: fix another GCC8 warning for strncpy
From: Magnus Karlsson @ 2019-07-16 17:31 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
Andrii Nakryiko, kernel-team, Magnus Karlsson
In-Reply-To: <20190716035704.948081-1-andriin@fb.com>
On Tue, Jul 16, 2019 at 5:59 AM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Similar issue was fixed in cdfc7f888c2a ("libbpf: fix GCC8 warning for
> strncpy") already. This one was missed. Fixing now.
Thanks Andrii.
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
> tools/lib/bpf/xsk.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index b33740221b7e..5007b5d4fd2c 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -517,7 +517,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> err = -errno;
> goto out_socket;
> }
> - strncpy(xsk->ifname, ifname, IFNAMSIZ);
> + strncpy(xsk->ifname, ifname, IFNAMSIZ - 1);
> + xsk->ifname[IFNAMSIZ - 1] = '\0';
>
> err = xsk_set_xdp_socket_config(&xsk->config, usr_config);
> if (err)
> --
> 2.17.1
>
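
For readers outside the thread, the pattern in the hunk above can be tried
in isolation. A minimal sketch, assuming a 16-byte buffer (DEMO_IFNAMSIZ is
a hypothetical stand-in for IFNAMSIZ) and a hypothetical helper name; the
warning being silenced is presumably GCC 8's -Wstringop-truncation, which
the patch itself does not name.

```c
#include <assert.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16	/* hypothetical stand-in for IFNAMSIZ */

/*
 * Copy src into a fixed-size name buffer. strncpy() does not write a
 * terminating NUL when src has DEMO_IFNAMSIZ or more characters, so copy
 * at most DEMO_IFNAMSIZ - 1 bytes and terminate explicitly.
 */
void demo_copy_ifname(char dst[DEMO_IFNAMSIZ], const char *src)
{
	assert(dst && src);
	strncpy(dst, src, DEMO_IFNAMSIZ - 1);
	dst[DEMO_IFNAMSIZ - 1] = '\0';
}

/* Usage: short names are copied whole, long names are truncated safely. */
void demo_copy_ifname_usage(void)
{
	char name[DEMO_IFNAMSIZ];

	demo_copy_ifname(name, "eth0");
	assert(strcmp(name, "eth0") == 0);

	demo_copy_ifname(name, "averyveryverylonginterfacename");
	assert(strlen(name) == DEMO_IFNAMSIZ - 1);
}
```

The explicit terminator matters only in the truncation case; for short
names strncpy() already pads the rest of the copied range with NUL bytes.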
* Re: [PATCH -next] iwlwifi: dbg: work around clang bug by marking debug strings static
From: Nick Desaulniers @ 2019-07-16 17:28 UTC (permalink / raw)
To: Joe Perches, Kalle Valo
Cc: Arnd Bergmann, Nathan Chancellor, Johannes Berg,
Emmanuel Grumbach, Luca Coelho, Intel Linux Wireless,
David S. Miller, Shahar S Matityahu, Sara Sharon, linux-wireless,
netdev, LKML, clang-built-linux
In-Reply-To: <b219cf41933b2f965572af515cf9d3119293bfba.camel@perches.com>
On Thu, Jul 11, 2019 at 7:15 PM Joe Perches <joe@perches.com> wrote:
>
> On Thu, 2019-07-11 at 17:17 -0700, Nick Desaulniers wrote:
> > Commit r353569 in prerelease Clang-9 is producing a linkage failure:
> >
> > ld: drivers/net/wireless/intel/iwlwifi/fw/dbg.o:
> > in function `_iwl_fw_dbg_apply_point':
> > dbg.c:(.text+0x827a): undefined reference to `__compiletime_assert_2387'
> >
> > when the following configs are enabled:
> > - CONFIG_IWLWIFI
> > - CONFIG_IWLMVM
> > - CONFIG_KASAN
> >
> > Work around the issue for now by marking the debug strings as `static`,
> > which they probably should be anyway.
> >
> > Link: https://bugs.llvm.org/show_bug.cgi?id=42580
> > Link: https://github.com/ClangBuiltLinux/linux/issues/580
> > Reported-by: Arnd Bergmann <arnd@arndb.de>
> > Reported-by: Nathan Chancellor <natechancellor@gmail.com>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > ---
> > drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> > index e411ac98290d..f8c90ea4e9b4 100644
> > --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> > @@ -2438,7 +2438,7 @@ static void iwl_fw_dbg_info_apply(struct iwl_fw_runtime *fwrt,
> > {
> > u32 img_name_len = le32_to_cpu(dbg_info->img_name_len);
> > u32 dbg_cfg_name_len = le32_to_cpu(dbg_info->dbg_cfg_name_len);
> > - const char err_str[] =
> > + static const char err_str[] =
> > "WRT: ext=%d. Invalid %s name length %d, expected %d\n";
>
> Better still would be to use the format string directly
> in both locations instead of trying to deduplicate it
> via storing it into a separate pointer.
>
> Let the compiler/linker consolidate the format.
> It's smaller object code, allows format/argument verification,
> and is simpler for humans to understand.
Whichever Kalle prefers, I just want my CI green again.
>
> ---
> diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> index e411ac98290d..25e6712932b8 100644
> --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
> @@ -2438,17 +2438,17 @@ static void iwl_fw_dbg_info_apply(struct iwl_fw_runtime *fwrt,
> {
> u32 img_name_len = le32_to_cpu(dbg_info->img_name_len);
> u32 dbg_cfg_name_len = le32_to_cpu(dbg_info->dbg_cfg_name_len);
> - const char err_str[] =
> - "WRT: ext=%d. Invalid %s name length %d, expected %d\n";
>
> if (img_name_len != IWL_FW_INI_MAX_IMG_NAME_LEN) {
> - IWL_WARN(fwrt, err_str, ext, "image", img_name_len,
> + IWL_WARN(fwrt, "WRT: ext=%d. Invalid %s name length %d, expected %d\n",
> + ext, "image", img_name_len,
> IWL_FW_INI_MAX_IMG_NAME_LEN);
> return;
> }
>
> if (dbg_cfg_name_len != IWL_FW_INI_MAX_DBG_CFG_NAME_LEN) {
> - IWL_WARN(fwrt, err_str, ext, "debug cfg", dbg_cfg_name_len,
> + IWL_WARN(fwrt, "WRT: ext=%d. Invalid %s name length %d, expected %d\n",
> + ext, "debug cfg", dbg_cfg_name_len,
> IWL_FW_INI_MAX_DBG_CFG_NAME_LEN);
> return;
> }
> @@ -2775,8 +2775,6 @@ static void _iwl_fw_dbg_apply_point(struct iwl_fw_runtime *fwrt,
> struct iwl_ucode_tlv *tlv = iter;
> void *ini_tlv = (void *)tlv->data;
> u32 type = le32_to_cpu(tlv->type);
> - const char invalid_ap_str[] =
> - "WRT: ext=%d. Invalid apply point %d for %s\n";
>
> switch (type) {
> case IWL_UCODE_TLV_TYPE_DEBUG_INFO:
> @@ -2786,8 +2784,8 @@ static void _iwl_fw_dbg_apply_point(struct iwl_fw_runtime *fwrt,
> struct iwl_fw_ini_allocation_data *buf_alloc = ini_tlv;
>
> if (pnt != IWL_FW_INI_APPLY_EARLY) {
> - IWL_ERR(fwrt, invalid_ap_str, ext, pnt,
> - "buffer allocation");
> + IWL_ERR(fwrt, "WRT: ext=%d. Invalid apply point %d for %s\n",
> + ext, pnt, "buffer allocation");
> goto next;
> }
>
> @@ -2797,8 +2795,8 @@ static void _iwl_fw_dbg_apply_point(struct iwl_fw_runtime *fwrt,
> }
> case IWL_UCODE_TLV_TYPE_HCMD:
> if (pnt < IWL_FW_INI_APPLY_AFTER_ALIVE) {
> - IWL_ERR(fwrt, invalid_ap_str, ext, pnt,
> - "host command");
> + IWL_ERR(fwrt, "WRT: ext=%d. Invalid apply point %d for %s\n",
> + ext, pnt, "host command");
> goto next;
> }
> iwl_fw_dbg_send_hcmd(fwrt, tlv, ext);
>
>
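
The two options discussed above (a shared `static const char[]` format
versus repeating the literal at each call site) can be compared outside
the driver. A minimal sketch, assuming a hypothetical DEMO_WARN() macro
built on snprintf() in place of IWL_WARN(); the function names are made up
for illustration, and the sketch says nothing about the Clang linkage
failure itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for IWL_WARN(): formats into a caller buffer. */
#define DEMO_WARN(buf, len, fmt, ...) snprintf((buf), (len), fmt, __VA_ARGS__)

/* Variant A: one shared format kept in a static const array. */
int demo_warn_shared(char *buf, size_t len, int ext, const char *what,
		     int got, int want)
{
	static const char err_str[] =
		"WRT: ext=%d. Invalid %s name length %d, expected %d\n";

	return DEMO_WARN(buf, len, err_str, ext, what, got, want);
}

/* Variant B: the format literal is written directly at the call site. */
int demo_warn_inline(char *buf, size_t len, int ext, const char *what,
		     int got, int want)
{
	return DEMO_WARN(buf, len,
			 "WRT: ext=%d. Invalid %s name length %d, expected %d\n",
			 ext, what, got, want);
}

/* Usage: both variants produce the same message text. */
void demo_warn_usage(void)
{
	char a[128], b[128];
	int na = demo_warn_shared(a, sizeof(a), 1, "image", 3, 32);
	int nb = demo_warn_inline(b, sizeof(b), 1, "image", 3, 32);

	assert(na == nb);
	assert(strcmp(a, b) == 0);
}
```

Both variants emit identical text, so the choice between them comes down
to the object-size, format-checking and readability points raised in the
reply above.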
--
Thanks,
~Nick Desaulniers