Linux Documentation

* Re: [PATCH 8/9] acpi: Use built-in RCU list checking for acpi_ioremaps list (v1)
From: Rafael J. Wysocki @ 2019-07-15 21:44 UTC
  To: Joel Fernandes (Google)
  Cc: Linux Kernel Mailing List, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, Kees Cook,
	Kernel Hardening, Cc: Android Kernel, Lai Jiangshan, Len Brown,
	ACPI Devel Mailing List, open list:DOCUMENTATION, Linux PCI,
	Linux PM, Mathieu Desnoyers, NeilBrown, netdev, Oleg Nesterov,
	Paul E. McKenney, Pavel Machek, Peter Zijlstra, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Will Deacon, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-9-joel@joelfernandes.org>

On Mon, Jul 15, 2019 at 4:43 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
> it for acpi_ioremaps list traversal.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/osl.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 9c0edf2fc0dd..2f9d0d20b836 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -14,6 +14,7 @@
>  #include <linux/slab.h>
>  #include <linux/mm.h>
>  #include <linux/highmem.h>
> +#include <linux/lockdep.h>
>  #include <linux/pci.h>
>  #include <linux/interrupt.h>
>  #include <linux/kmod.h>
> @@ -80,6 +81,7 @@ struct acpi_ioremap {
>
>  static LIST_HEAD(acpi_ioremaps);
>  static DEFINE_MUTEX(acpi_ioremap_lock);
> +#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
>
>  static void __init acpi_request_region (struct acpi_generic_address *gas,
>         unsigned int length, char *desc)
> @@ -206,7 +208,7 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
>  {
>         struct acpi_ioremap *map;
>
> -       list_for_each_entry_rcu(map, &acpi_ioremaps, list)
> +       list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
>                 if (map->phys <= phys &&
>                     phys + size <= map->phys + map->size)
>                         return map;
> @@ -249,7 +251,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
>  {
>         struct acpi_ioremap *map;
>
> -       list_for_each_entry_rcu(map, &acpi_ioremaps, list)
> +       list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
>                 if (map->virt <= virt &&
>                     virt + size <= map->virt + map->size)
>                         return map;
> --
> 2.22.0.510.g264f2c817a-goog
>
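
The effect of the extra iterator argument can be illustrated with a small userspace model. This is only a sketch: the `model_*` names are hypothetical stand-ins, and the kernel's real lockdep and RCU checks are implemented differently. The model only captures the rule that a traversal is acceptable when either a read-side section is active or the supplied condition (here, "the writer lock is held") is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real lockdep or RCU APIs. */
static int model_rcu_depth;    /* > 0 while inside a modelled read-side section */
static bool model_lock_held;   /* true while the modelled writer lock is held */
static int model_violations;   /* traversals made with neither protection */

struct model_map {
	unsigned long phys;
	unsigned long size;
	struct model_map *next;
};

/* Count a violation unless a reader section is active or cond is true. */
static void model_check(bool cond)
{
	if (model_rcu_depth == 0 && !cond)
		model_violations++;
}

/* Iterator that runs the check once, before the first element is visited. */
#define model_for_each(pos, head, cond) \
	for (model_check(cond), (pos) = (head); (pos); (pos) = (pos)->next)

/* Same range test as the quoted acpi_map_lookup(), on the modelled list. */
static struct model_map *model_lookup(struct model_map *head,
				      unsigned long phys, unsigned long size)
{
	struct model_map *map;

	model_for_each(map, head, model_lock_held)
		if (map->phys <= phys && phys + size <= map->phys + map->size)
			return map;
	return NULL;
}
```

In this model a lookup made with the lock held, or inside a read-side section, leaves `model_violations` untouched, while a lookup with neither increments it.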

* Re: [PATCH v9 01/18] kunit: test: add KUnit test runner core
From: Brendan Higgins @ 2019-07-15 21:25 UTC
  To: Stephen Boyd
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <20190715201054.C69AA2086C@mail.kernel.org>

On Mon, Jul 15, 2019 at 1:10 PM Stephen Boyd <sboyd@kernel.org> wrote:
>
> Quoting Brendan Higgins (2019-07-12 01:17:27)
> > Add core facilities for defining unit tests; this provides a common way
> > to define test cases, functions that execute code which is under test
> > and determine whether the code under test behaves as expected; this also
> > provides a way to group together related test cases in test suites (here
> > we call them test_modules).
> >
> > Just define test cases and how to execute them for now; setting
> > expectations on code will be defined later.
> >
> > Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> > Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> > Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
>
> Reviewed-by: Stephen Boyd <sboyd@kernel.org>
>
> Minor nits below.
>
> > diff --git a/kunit/test.c b/kunit/test.c
> > new file mode 100644
> > index 0000000000000..571e4c65deb5c
> > --- /dev/null
> > +++ b/kunit/test.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Base unit test (KUnit) API.
> > + *
> > + * Copyright (C) 2019, Google LLC.
> > + * Author: Brendan Higgins <brendanhiggins@google.com>
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <kunit/test.h>
> > +
> > +static void kunit_set_failure(struct kunit *test)
> > +{
> > +       WRITE_ONCE(test->success, false);
> > +}
> > +
> [...]
> > +
> > +void kunit_init_test(struct kunit *test, const char *name)
> > +{
> > +       test->name = name;
> > +       test->success = true;
> > +}
> > +
> > +/*
> > + * Performs all logic to run a test case.
> > + */
> > +static void kunit_run_case(struct kunit_suite *suite,
> > +                          struct kunit_case *test_case)
> > +{
> > +       struct kunit test;
> > +       int ret = 0;
> > +
> > +       kunit_init_test(&test, test_case->name);
> > +
> > +       if (suite->init) {
> > +               ret = suite->init(&test);
>
> Can you push the ret definition into this if scope? That way we can
> avoid default-initializing it to 0.

Sure! I would actually prefer that from a cosmetic standpoint. I just
thought that mixing declarations and code was against the style guide.

> > +               if (ret) {
> > +                       kunit_err(&test, "failed to initialize: %d\n", ret);
> > +                       kunit_set_failure(&test);
>
> Do we need to 'test_case->success = test.success' here too? Or is the
> test failure extracted somewhere else?

Er, yes. That's kind of embarrassing. Good catch.

> > +                       return;
> > +               }
> > +       }
> > +
> > +       test_case->run_case(&test);
> > +
> > +       if (suite->exit)
> > +               suite->exit(&test);
> > +
> > +       test_case->success = test.success;

Thanks!
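
The two review points above (scoping `ret` inside the `if`, and recording the failure before the early return) could look roughly like the following. This is a userspace sketch, not the code from the series: the `sketch_*` types keep only the fields needed here, and `sketch_failing_init` and `sketch_passing_case` are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical versions of the structures in the quoted patch. */
struct sketch_kunit {
	const char *name;
	bool success;
};

struct sketch_case {
	void (*run_case)(struct sketch_kunit *test);
	const char *name;
	bool success;
};

struct sketch_suite {
	int (*init)(struct sketch_kunit *test);
	void (*exit)(struct sketch_kunit *test);
};

static void sketch_run_case(struct sketch_suite *suite,
			    struct sketch_case *test_case)
{
	struct sketch_kunit test = { .name = test_case->name, .success = true };

	if (suite->init) {
		/* Declared in the scope that uses it, so no dummy "= 0". */
		int ret = suite->init(&test);

		if (ret) {
			test.success = false;
			/* Propagate the failure before returning early. */
			test_case->success = test.success;
			return;
		}
	}

	test_case->run_case(&test);

	if (suite->exit)
		suite->exit(&test);

	test_case->success = test.success;
}

/* Hypothetical helpers used only to exercise the sketch. */
static int sketch_failing_init(struct sketch_kunit *test)
{
	(void)test;
	return -1;
}

static void sketch_passing_case(struct sketch_kunit *test)
{
	(void)test;
}
```

With a suite whose init fails, the case ends up marked as failed even though `run_case` is never called.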

* Re: [PATCH v9 03/18] kunit: test: add string_stream a std::stream like string builder
From: Brendan Higgins @ 2019-07-15 21:11 UTC
  To: Stephen Boyd
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <20190715204356.4E3F92145D@mail.kernel.org>

On Mon, Jul 15, 2019 at 1:43 PM Stephen Boyd <sboyd@kernel.org> wrote:
>
> Quoting Brendan Higgins (2019-07-12 01:17:29)
> > diff --git a/include/kunit/string-stream.h b/include/kunit/string-stream.h
> > new file mode 100644
> > index 0000000000000..0552a05781afe
> > --- /dev/null
> > +++ b/include/kunit/string-stream.h
> > @@ -0,0 +1,49 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * C++ stream style string builder used in KUnit for building messages.
> > + *
> > + * Copyright (C) 2019, Google LLC.
> > + * Author: Brendan Higgins <brendanhiggins@google.com>
> > + */
> > +
> > +#ifndef _KUNIT_STRING_STREAM_H
> > +#define _KUNIT_STRING_STREAM_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/kref.h>
>
> What is this include for? I'd expect to see linux/list.h instead.

Sorry about that. I used to reference count this before I made it a
kunit managed resource.

> > +#include <stdarg.h>
> > +
> > +struct string_stream_fragment {
> > +       struct list_head node;
> > +       char *fragment;
> > +};
> > +
> > +struct string_stream {
> > +       size_t length;
> > +       struct list_head fragments;
> > +       /* length and fragments are protected by this lock */
> > +       spinlock_t lock;
> > +};
> > +
> > diff --git a/kunit/string-stream.c b/kunit/string-stream.c
> > new file mode 100644
> > index 0000000000000..0463a92dad74b
> > --- /dev/null
> > +++ b/kunit/string-stream.c
> > @@ -0,0 +1,147 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * C++ stream style string builder used in KUnit for building messages.
> > + *
> > + * Copyright (C) 2019, Google LLC.
> > + * Author: Brendan Higgins <brendanhiggins@google.com>
> > + */
> > +
> > +#include <linux/list.h>
> > +#include <linux/slab.h>
> > +#include <kunit/string-stream.h>
> > +#include <kunit/test.h>
> > +
> > +int string_stream_vadd(struct string_stream *stream,
> > +                      const char *fmt,
> > +                      va_list args)
> > +{
> > +       struct string_stream_fragment *frag_container;
> > +       int len;
> > +       va_list args_for_counting;
> > +       unsigned long flags;
> > +
> > +       /* Make a copy because `vsnprintf` could change it */
> > +       va_copy(args_for_counting, args);
> > +
> > +       /* Need space for null byte. */
> > +       len = vsnprintf(NULL, 0, fmt, args_for_counting) + 1;
> > +
> > +       va_end(args_for_counting);
> > +
> > +       frag_container = kmalloc(sizeof(*frag_container), GFP_KERNEL);
>
> This is confusing in that it allocates with GFP_KERNEL but then grabs a
> spinlock to add and remove from the fragment list. Is it ever going to
> be called from a place where it can't sleep? If so, the GFP_KERNEL needs
> to be changed. Otherwise, maybe a mutex would work better to protect
> access to the fragment list.

Right, using a mutex here would be fine. Sorry, I meant to go through
my remaining uses of spinlocks after you asked me to remove them in
patch 01, but evidently I forgot to do so. Will fix.

> I also wonder if it would be better to just have a big slop buffer of a
> 4K page or something so that we almost never have to allocate anything
> with a string_stream and we can just rely on a reader consuming data
> while writers are writing. That might work out better, but I don't quite
> understand the use case for the string stream.

That makes sense, but might that also waste memory since we will
almost never need that much memory?

> > +       if (!frag_container)
> > +               return -ENOMEM;
> > +
> > +       frag_container->fragment = kmalloc(len, GFP_KERNEL);
> > +       if (!frag_container->fragment) {
> > +               kfree(frag_container);
> > +               return -ENOMEM;
> > +       }
> > +
> > +       len = vsnprintf(frag_container->fragment, len, fmt, args);
> > +       spin_lock_irqsave(&stream->lock, flags);
> > +       stream->length += len;
> > +       list_add_tail(&frag_container->node, &stream->fragments);
> > +       spin_unlock_irqrestore(&stream->lock, flags);
> > +
> > +       return 0;
> > +}
> > +
> [...]
> > +
> > +bool string_stream_is_empty(struct string_stream *stream)
> > +{
> > +       bool is_empty;
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&stream->lock, flags);
>
> I'm not sure what benefit grabbing the lock has here. If the list
> isn't empty after this is called then the race isn't resolved by
> grabbing and releasing the lock. The function is returning stale data in
> that case.

Good point, I didn't realize list_empty() already does its read with
READ_ONCE(). Will fix.

> > +       is_empty = list_empty(&stream->fragments);
> > +       spin_unlock_irqrestore(&stream->lock, flags);
> > +
> > +       return is_empty;
> > +}
> > +
> > +static int string_stream_init(struct kunit_resource *res, void *context)
> > +{
> > +       struct string_stream *stream;
> > +
> > +       stream = kzalloc(sizeof(*stream), GFP_KERNEL);
> > +       if (!stream)
> > +               return -ENOMEM;
> > +
> > +       res->allocation = stream;
> > +       INIT_LIST_HEAD(&stream->fragments);
> > +       spin_lock_init(&stream->lock);
> > +
> > +       return 0;
> > +}
> > +
> > +static void string_stream_free(struct kunit_resource *res)
> > +{
> > +       struct string_stream *stream = res->allocation;
> > +
> > +       string_stream_clear(stream);
> > +       kfree(stream);
> > +}
> > +
> > +struct string_stream *alloc_string_stream(struct kunit *test)
> > +{
> > +       struct kunit_resource *res;
> > +
> > +       res = kunit_alloc_resource(test,
> > +                                  string_stream_init,
> > +                                  string_stream_free,
> > +                                  NULL);
> > +
> > +       if (!res)
> > +               return NULL;
> > +
> > +       return res->allocation;
>
> Maybe kunit_alloc_resource() should just return res->allocation, or
> NULL, so that these functions can be simplified to 'return
> kunit_alloc_resource()'? Does the caller ever care to do anything with
> struct kunit_resource anyway?

Another good point. I originally thought a caller might need the
struct kunit_resource, but now that the init function is mandatory,
callers already have a place to do any init work, so they might as
well do it there. Will fix.
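
A rough userspace sketch of the simplification agreed on above: the allocator hands back the allocation itself, so a typed wrapper becomes a single return statement. Everything named `sketch_*` is hypothetical. Locking is omitted, and freeing the bookkeeping node when init fails is an assumption of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_resource {
	void *allocation;
	void (*release)(struct sketch_resource *res);
	struct sketch_resource *next;
};

struct sketch_test {
	struct sketch_resource *resources;   /* singly linked, newest first */
};

typedef int (*sketch_init_t)(struct sketch_resource *res, void *context);
typedef void (*sketch_release_t)(struct sketch_resource *res);

/* Returns the allocation (or NULL); the bookkeeping node stays internal. */
static void *sketch_alloc_resource(struct sketch_test *test,
				   sketch_init_t init,
				   sketch_release_t release,
				   void *context)
{
	struct sketch_resource *res = calloc(1, sizeof(*res));

	if (!res)
		return NULL;

	if (init(res, context)) {
		free(res);   /* assumption: do not leak the node on failure */
		return NULL;
	}

	res->release = release;
	res->next = test->resources;
	test->resources = res;

	return res->allocation;
}

/* Releases every tracked resource, as a test teardown would. */
static void sketch_cleanup(struct sketch_test *test)
{
	while (test->resources) {
		struct sketch_resource *res = test->resources;

		test->resources = res->next;
		res->release(res);
		free(res);
	}
}

static int sketch_buf_init(struct sketch_resource *res, void *context)
{
	res->allocation = malloc(*(size_t *)context);
	return res->allocation ? 0 : -1;
}

static void sketch_buf_release(struct sketch_resource *res)
{
	free(res->allocation);
}

/* The wrapper collapses to a single return statement. */
static void *sketch_alloc_buf(struct sketch_test *test, size_t size)
{
	return sketch_alloc_resource(test, sketch_buf_init,
				     sketch_buf_release, &size);
}
```

The trade-off is that callers can no longer reach the bookkeeping node, which is acceptable only if, as the reply suggests, all setup happens inside the init callback.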

* Re: [PATCH v9 02/18] kunit: test: add test resource management API
From: Stephen Boyd @ 2019-07-15 20:51 UTC
  To: Brendan Higgins
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <CAFd5g45iHnMLOGQbXwzX6F74pkQGKBCSufkpYPOcw_iNSeiQKg@mail.gmail.com>

Quoting Brendan Higgins (2019-07-15 13:30:22)
> On Mon, Jul 15, 2019 at 1:24 PM Stephen Boyd <sboyd@kernel.org> wrote:
> >
> > Quoting Brendan Higgins (2019-07-12 01:17:28)
> > > diff --git a/kunit/test.c b/kunit/test.c
> > > index 571e4c65deb5c..f165c9d8e10b0 100644
> 
> > One solution would be to piggyback on all the existing devres allocation
> > logic we already have and make each struct kunit a device that we pass
> > into the devres functions. A far simpler solution would be to just
> > copy/paste what devres does and use a spinlock and an allocation
> > function that takes GFP flags.
> 
> Yeah, that's what I did originally, but I thought from the discussion
> on patch 01 that you thought a spinlock was overkill for struct kunit.
> I take it you only meant in that initial patch?

Correct. I was only talking about the success bit in there.


* Re: [PATCH v9 06/18] kbuild: enable building KUnit
From: Stephen Boyd @ 2019-07-15 20:49 UTC
  To: Brendan Higgins, frowand.list, gregkh, jpoimboe, keescook,
	kieran.bingham, mcgrof, peterz, robh, shuah, tytso,
	yamada.masahiro
  Cc: devicetree, dri-devel, kunit-dev, linux-doc, linux-fsdevel,
	linux-kbuild, linux-kernel, linux-kselftest, linux-nvdimm,
	linux-um, Alexander.Levin, Tim.Bird, amir73il, dan.carpenter,
	daniel, jdike, joel, julia.lawall, khilman, knut.omang, logang,
	mpe, pmladek, rdunlap, richard, rientjes, rostedt, wfg,
	Brendan Higgins, Michal Marek
In-Reply-To: <20190712081744.87097-7-brendanhiggins@google.com>

Quoting Brendan Higgins (2019-07-12 01:17:32)
> KUnit is a new unit testing framework for the kernel and when used is
> built into the kernel as a part of it. Add KUnit to the root Kconfig and
> Makefile to allow it to be actually built.
> 
> Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Michal Marek <michal.lkml@markovi.net>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> ---

Reviewed-by: Stephen Boyd <sboyd@kernel.org>


* Re: [PATCH v9 03/18] kunit: test: add string_stream a std::stream like string builder
From: Stephen Boyd @ 2019-07-15 20:43 UTC
  To: Brendan Higgins, frowand.list, gregkh, jpoimboe, keescook,
	kieran.bingham, mcgrof, peterz, robh, shuah, tytso,
	yamada.masahiro
  Cc: devicetree, dri-devel, kunit-dev, linux-doc, linux-fsdevel,
	linux-kbuild, linux-kernel, linux-kselftest, linux-nvdimm,
	linux-um, Alexander.Levin, Tim.Bird, amir73il, dan.carpenter,
	daniel, jdike, joel, julia.lawall, khilman, knut.omang, logang,
	mpe, pmladek, rdunlap, richard, rientjes, rostedt, wfg,
	Brendan Higgins
In-Reply-To: <20190712081744.87097-4-brendanhiggins@google.com>

Quoting Brendan Higgins (2019-07-12 01:17:29)
> diff --git a/include/kunit/string-stream.h b/include/kunit/string-stream.h
> new file mode 100644
> index 0000000000000..0552a05781afe
> --- /dev/null
> +++ b/include/kunit/string-stream.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * C++ stream style string builder used in KUnit for building messages.
> + *
> + * Copyright (C) 2019, Google LLC.
> + * Author: Brendan Higgins <brendanhiggins@google.com>
> + */
> +
> +#ifndef _KUNIT_STRING_STREAM_H
> +#define _KUNIT_STRING_STREAM_H
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +#include <linux/kref.h>

What is this include for? I'd expect to see linux/list.h instead.

> +#include <stdarg.h>
> +
> +struct string_stream_fragment {
> +       struct list_head node;
> +       char *fragment;
> +};
> +
> +struct string_stream {
> +       size_t length;
> +       struct list_head fragments;
> +       /* length and fragments are protected by this lock */
> +       spinlock_t lock;
> +};
> +
> diff --git a/kunit/string-stream.c b/kunit/string-stream.c
> new file mode 100644
> index 0000000000000..0463a92dad74b
> --- /dev/null
> +++ b/kunit/string-stream.c
> @@ -0,0 +1,147 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * C++ stream style string builder used in KUnit for building messages.
> + *
> + * Copyright (C) 2019, Google LLC.
> + * Author: Brendan Higgins <brendanhiggins@google.com>
> + */
> +
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <kunit/string-stream.h>
> +#include <kunit/test.h>
> +
> +int string_stream_vadd(struct string_stream *stream,
> +                      const char *fmt,
> +                      va_list args)
> +{
> +       struct string_stream_fragment *frag_container;
> +       int len;
> +       va_list args_for_counting;
> +       unsigned long flags;
> +
> +       /* Make a copy because `vsnprintf` could change it */
> +       va_copy(args_for_counting, args);
> +
> +       /* Need space for null byte. */
> +       len = vsnprintf(NULL, 0, fmt, args_for_counting) + 1;
> +
> +       va_end(args_for_counting);
> +
> +       frag_container = kmalloc(sizeof(*frag_container), GFP_KERNEL);

This is confusing in that it allocates with GFP_KERNEL but then grabs a
spinlock to add and remove from the fragment list. Is it ever going to
be called from a place where it can't sleep? If so, the GFP_KERNEL needs
to be changed. Otherwise, maybe a mutex would work better to protect
access to the fragment list.

I also wonder if it would be better to just have a big slop buffer of a
4K page or something so that we almost never have to allocate anything
with a string_stream and we can just rely on a reader consuming data
while writers are writing. That might work out better, but I don't quite
understand the use case for the string stream.

> +       if (!frag_container)
> +               return -ENOMEM;
> +
> +       frag_container->fragment = kmalloc(len, GFP_KERNEL);
> +       if (!frag_container->fragment) {
> +               kfree(frag_container);
> +               return -ENOMEM;
> +       }
> +
> +       len = vsnprintf(frag_container->fragment, len, fmt, args);
> +       spin_lock_irqsave(&stream->lock, flags);
> +       stream->length += len;
> +       list_add_tail(&frag_container->node, &stream->fragments);
> +       spin_unlock_irqrestore(&stream->lock, flags);
> +
> +       return 0;
> +}
> +
[...]
> +
> +bool string_stream_is_empty(struct string_stream *stream)
> +{
> +       bool is_empty;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&stream->lock, flags);

I'm not sure what benefit grabbing the lock has here. If the list
isn't empty after this is called then the race isn't resolved by
grabbing and releasing the lock. The function is returning stale data in
that case.

> +       is_empty = list_empty(&stream->fragments);
> +       spin_unlock_irqrestore(&stream->lock, flags);
> +
> +       return is_empty;
> +}
> +
> +static int string_stream_init(struct kunit_resource *res, void *context)
> +{
> +       struct string_stream *stream;
> +
> +       stream = kzalloc(sizeof(*stream), GFP_KERNEL);
> +       if (!stream)
> +               return -ENOMEM;
> +
> +       res->allocation = stream;
> +       INIT_LIST_HEAD(&stream->fragments);
> +       spin_lock_init(&stream->lock);
> +
> +       return 0;
> +}
> +
> +static void string_stream_free(struct kunit_resource *res)
> +{
> +       struct string_stream *stream = res->allocation;
> +
> +       string_stream_clear(stream);
> +       kfree(stream);
> +}
> +
> +struct string_stream *alloc_string_stream(struct kunit *test)
> +{
> +       struct kunit_resource *res;
> +
> +       res = kunit_alloc_resource(test,
> +                                  string_stream_init,
> +                                  string_stream_free,
> +                                  NULL);
> +
> +       if (!res)
> +               return NULL;
> +
> +       return res->allocation;

Maybe kunit_alloc_resource() should just return res->allocation, or
NULL, so that these functions can be simplified to 'return
kunit_alloc_resource()'? Does the caller ever care to do anything with
struct kunit_resource anyway?
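
For readers unfamiliar with the sizing step in the quoted string_stream_vadd(), the same two-pass pattern can be sketched in userspace C: copy the `va_list`, measure with a zero-sized `vsnprintf()` call, then allocate and format. The names `sketch_vformat` and `sketch_format` are hypothetical, and `malloc()` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats into a freshly allocated buffer; the caller frees the result. */
static char *sketch_vformat(const char *fmt, va_list args)
{
	va_list args_for_counting;
	char *buf;
	int len;

	/* The measuring pass consumes its va_list, so work on a copy. */
	va_copy(args_for_counting, args);
	len = vsnprintf(NULL, 0, fmt, args_for_counting);
	va_end(args_for_counting);
	if (len < 0)
		return NULL;

	/* One extra byte for the terminating null. */
	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	vsnprintf(buf, (size_t)len + 1, fmt, args);
	return buf;
}

static char *sketch_format(const char *fmt, ...)
{
	va_list args;
	char *buf;

	va_start(args, fmt);
	buf = sketch_vformat(fmt, args);
	va_end(args);

	return buf;
}
```

Each call allocates exactly what the formatted text needs, which is the memory-versus-allocation-count trade-off that the "slop buffer" suggestion above is weighing.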


* Re: [PATCH v9 02/18] kunit: test: add test resource management API
From: Brendan Higgins @ 2019-07-15 20:30 UTC
  To: Stephen Boyd
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <20190715202425.CE64C20665@mail.kernel.org>

On Mon, Jul 15, 2019 at 1:24 PM Stephen Boyd <sboyd@kernel.org> wrote:
>
> Quoting Brendan Higgins (2019-07-12 01:17:28)
> > diff --git a/kunit/test.c b/kunit/test.c
> > index 571e4c65deb5c..f165c9d8e10b0 100644
> > --- a/kunit/test.c
> > +++ b/kunit/test.c
> > @@ -171,6 +175,96 @@ int kunit_run_tests(struct kunit_suite *suite)
> >         return 0;
> >  }
> >
> > +struct kunit_resource *kunit_alloc_resource(struct kunit *test,
> > +                                           kunit_resource_init_t init,
> > +                                           kunit_resource_free_t free,
> > +                                           void *context)
> > +{
> > +       struct kunit_resource *res;
> > +       int ret;
> > +
> > +       res = kzalloc(sizeof(*res), GFP_KERNEL);
>
> This uses GFP_KERNEL.
>
> > +       if (!res)
> > +               return NULL;
> > +
> > +       ret = init(res, context);
> > +       if (ret)
> > +               return NULL;
> > +
> > +       res->free = free;
> > +       mutex_lock(&test->lock);
>
> And this can sleep.
>
> > +       list_add_tail(&res->node, &test->resources);
> > +       mutex_unlock(&test->lock);
> > +
> > +       return res;
> > +}
> > +
> > +void kunit_free_resource(struct kunit *test, struct kunit_resource *res)
>
> Should probably add a note that we assume the test lock is held here, or
> even add a lockdep_assert_held(&test->lock) into the function to
> document that and assert it at the same time.

Seems reasonable.

> > +{
> > +       res->free(res);
> > +       list_del(&res->node);
> > +       kfree(res);
> > +}
> > +
> > +struct kunit_kmalloc_params {
> > +       size_t size;
> > +       gfp_t gfp;
> > +};
> > +
> > +static int kunit_kmalloc_init(struct kunit_resource *res, void *context)
> > +{
> > +       struct kunit_kmalloc_params *params = context;
> > +
> > +       res->allocation = kmalloc(params->size, params->gfp);
> > +       if (!res->allocation)
> > +               return -ENOMEM;
> > +
> > +       return 0;
> > +}
> > +
> > +static void kunit_kmalloc_free(struct kunit_resource *res)
> > +{
> > +       kfree(res->allocation);
> > +}
> > +
> > +void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
> > +{
> > +       struct kunit_kmalloc_params params;
> > +       struct kunit_resource *res;
> > +
> > +       params.size = size;
> > +       params.gfp = gfp;
> > +
> > +       res = kunit_alloc_resource(test,
>
> This calls that sleeping function above...
>
> > +                                  kunit_kmalloc_init,
> > +                                  kunit_kmalloc_free,
> > +                                  &params);
>
> but this passes a GFP flags parameter through to the
> kunit_kmalloc_init() function. How is this going to work if some code
> uses GFP_ATOMIC, but then we try to allocate and sleep in
> kunit_alloc_resource() with GFP_KERNEL?

Yeah, that's an inconsistency. I need to fix that.

> One solution would be to piggyback on all the existing devres allocation
> logic we already have and make each struct kunit a device that we pass
> into the devres functions. A far simpler solution would be to just
> copy/paste what devres does and use a spinlock and an allocation
> function that takes GFP flags.

Yeah, that's what I did originally, but I thought from the discussion
on patch 01 that you thought a spinlock was overkill for struct kunit.
I take it you only meant in that initial patch?

> > +
> > +       if (res)
> > +               return res->allocation;
> > +
> > +       return NULL;
> > +}

Cheers
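
The inconsistency discussed above can be made concrete with a small userspace model. It is a sketch under stated assumptions: `sketch_in_atomic` models a context that must not sleep, `SKETCH_GFP_KERNEL` models an allocation that may sleep, and none of the `sketch_*` names exist in the kernel. The first function mirrors the shape of the reviewed code (the bookkeeping node always uses the sleeping flag); the second passes the caller's flag to every allocation, which is one way to read the suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum sketch_gfp { SKETCH_GFP_ATOMIC, SKETCH_GFP_KERNEL };

static bool sketch_in_atomic;   /* modelled "must not sleep" context */
static int sketch_sleep_bugs;   /* sleeping allocations made in that context */

/* Allocator model: a sleeping allocation in atomic context is a bug. */
static void *sketch_alloc(size_t size, enum sketch_gfp gfp)
{
	if (gfp == SKETCH_GFP_KERNEL && sketch_in_atomic)
		sketch_sleep_bugs++;
	return malloc(size);
}

struct sketch_res {
	void *allocation;
};

/* Shape of the reviewed code: the node ignores the caller's flag. */
static struct sketch_res *sketch_alloc_fixed_flag(size_t size,
						  enum sketch_gfp gfp)
{
	struct sketch_res *res = sketch_alloc(sizeof(*res), SKETCH_GFP_KERNEL);

	if (!res)
		return NULL;
	res->allocation = sketch_alloc(size, gfp);
	if (!res->allocation) {
		free(res);
		return NULL;
	}
	return res;
}

/* Alternative: every allocation honours the caller's flag. */
static struct sketch_res *sketch_alloc_caller_flag(size_t size,
						   enum sketch_gfp gfp)
{
	struct sketch_res *res = sketch_alloc(sizeof(*res), gfp);

	if (!res)
		return NULL;
	res->allocation = sketch_alloc(size, gfp);
	if (!res->allocation) {
		free(res);
		return NULL;
	}
	return res;
}

static void sketch_res_free(struct sketch_res *res)
{
	free(res->allocation);
	free(res);
}
```

The model leaves out the list lock, which has the same problem in the reviewed code: a mutex may sleep, so an atomic caller would also need a non-sleeping lock around the list update.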

* Re: [PATCH v9 02/18] kunit: test: add test resource management API
From: Stephen Boyd @ 2019-07-15 20:24 UTC
  To: Brendan Higgins, frowand.list, gregkh, jpoimboe, keescook,
	kieran.bingham, mcgrof, peterz, robh, shuah, tytso,
	yamada.masahiro
  Cc: devicetree, dri-devel, kunit-dev, linux-doc, linux-fsdevel,
	linux-kbuild, linux-kernel, linux-kselftest, linux-nvdimm,
	linux-um, Alexander.Levin, Tim.Bird, amir73il, dan.carpenter,
	daniel, jdike, joel, julia.lawall, khilman, knut.omang, logang,
	mpe, pmladek, rdunlap, richard, rientjes, rostedt, wfg,
	Brendan Higgins
In-Reply-To: <20190712081744.87097-3-brendanhiggins@google.com>

Quoting Brendan Higgins (2019-07-12 01:17:28)
> diff --git a/kunit/test.c b/kunit/test.c
> index 571e4c65deb5c..f165c9d8e10b0 100644
> --- a/kunit/test.c
> +++ b/kunit/test.c
> @@ -171,6 +175,96 @@ int kunit_run_tests(struct kunit_suite *suite)
>         return 0;
>  }
>  
> +struct kunit_resource *kunit_alloc_resource(struct kunit *test,
> +                                           kunit_resource_init_t init,
> +                                           kunit_resource_free_t free,
> +                                           void *context)
> +{
> +       struct kunit_resource *res;
> +       int ret;
> +
> +       res = kzalloc(sizeof(*res), GFP_KERNEL);

This uses GFP_KERNEL.

> +       if (!res)
> +               return NULL;
> +
> +       ret = init(res, context);
> +       if (ret)
> +               return NULL;
> +
> +       res->free = free;
> +       mutex_lock(&test->lock);

And this can sleep.

> +       list_add_tail(&res->node, &test->resources);
> +       mutex_unlock(&test->lock);
> +
> +       return res;
> +}
> +
> +void kunit_free_resource(struct kunit *test, struct kunit_resource *res)

Should probably add a note that we assume the test lock is held here, or
even add a lockdep_assert_held(&test->lock) into the function to
document that and assert it at the same time.

> +{
> +       res->free(res);
> +       list_del(&res->node);
> +       kfree(res);
> +}
> +
> +struct kunit_kmalloc_params {
> +       size_t size;
> +       gfp_t gfp;
> +};
> +
> +static int kunit_kmalloc_init(struct kunit_resource *res, void *context)
> +{
> +       struct kunit_kmalloc_params *params = context;
> +
> +       res->allocation = kmalloc(params->size, params->gfp);
> +       if (!res->allocation)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +static void kunit_kmalloc_free(struct kunit_resource *res)
> +{
> +       kfree(res->allocation);
> +}
> +
> +void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
> +{
> +       struct kunit_kmalloc_params params;
> +       struct kunit_resource *res;
> +
> +       params.size = size;
> +       params.gfp = gfp;
> +
> +       res = kunit_alloc_resource(test,

This calls that sleeping function above...

> +                                  kunit_kmalloc_init,
> +                                  kunit_kmalloc_free,
> +                                  &params);

but this passes a GFP flags parameter through to the
kunit_kmalloc_init() function. How is this going to work if some code
uses GFP_ATOMIC, but then we try to allocate and sleep in
kunit_alloc_resource() with GFP_KERNEL? 

One solution would be to piggyback on all the existing devres allocation
logic we already have and make each struct kunit a device that we pass
into the devres functions. A far simpler solution would be to just
copy/paste what devres does and use a spinlock and an allocation
function that takes GFP flags.

> +
> +       if (res)
> +               return res->allocation;
> +
> +       return NULL;
> +}
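
The suggestion to document the locking assumption of kunit_free_resource() with an assertion can be modelled in userspace as follows. This is only a sketch: `sketch_assert_held` counts failures instead of warning, the lock is a plain flag, and all `sketch_*` names are hypothetical rather than kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_lock {
	bool held;
};

static int sketch_assert_failures;   /* calls made without the lock held */

static void sketch_lock_acquire(struct sketch_lock *lock)
{
	lock->held = true;
}

static void sketch_lock_release(struct sketch_lock *lock)
{
	lock->held = false;
}

/* Records a failure instead of warning, so the effect is observable. */
static void sketch_assert_held(const struct sketch_lock *lock)
{
	if (!lock->held)
		sketch_assert_failures++;
}

struct sketch_node {
	struct sketch_node *prev;
	struct sketch_node *next;
};

/* Caller must hold the lock; the assertion states and checks that rule. */
static void sketch_remove(struct sketch_lock *lock, struct sketch_node *node)
{
	sketch_assert_held(lock);

	node->prev->next = node->next;
	node->next->prev = node->prev;
}
```

Placing the check at the top of the function turns the comment "caller holds the lock" into something that fails visibly when a new call site forgets it.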

* Re: [PATCH v9 01/18] kunit: test: add KUnit test runner core
From: Stephen Boyd @ 2019-07-15 20:10 UTC
  To: Brendan Higgins, frowand.list, gregkh, jpoimboe, keescook,
	kieran.bingham, mcgrof, peterz, robh, shuah, tytso,
	yamada.masahiro
  Cc: devicetree, dri-devel, kunit-dev, linux-doc, linux-fsdevel,
	linux-kbuild, linux-kernel, linux-kselftest, linux-nvdimm,
	linux-um, Alexander.Levin, Tim.Bird, amir73il, dan.carpenter,
	daniel, jdike, joel, julia.lawall, khilman, knut.omang, logang,
	mpe, pmladek, rdunlap, richard, rientjes, rostedt, wfg,
	Brendan Higgins
In-Reply-To: <20190712081744.87097-2-brendanhiggins@google.com>

Quoting Brendan Higgins (2019-07-12 01:17:27)
> Add core facilities for defining unit tests; this provides a common way
> to define test cases, functions that execute code which is under test
> and determine whether the code under test behaves as expected; this also
> provides a way to group together related test cases in test suites (here
> we call them test_modules).
> 
> Just define test cases and how to execute them for now; setting
> expectations on code will be defined later.
> 
> Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

Reviewed-by: Stephen Boyd <sboyd@kernel.org>

Minor nits below.

> diff --git a/kunit/test.c b/kunit/test.c
> new file mode 100644
> index 0000000000000..571e4c65deb5c
> --- /dev/null
> +++ b/kunit/test.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Base unit test (KUnit) API.
> + *
> + * Copyright (C) 2019, Google LLC.
> + * Author: Brendan Higgins <brendanhiggins@google.com>
> + */
> +
> +#include <linux/kernel.h>
> +#include <kunit/test.h>
> +
> +static void kunit_set_failure(struct kunit *test)
> +{
> +       WRITE_ONCE(test->success, false);
> +}
> +
[...]
> +
> +void kunit_init_test(struct kunit *test, const char *name)
> +{
> +       test->name = name;
> +       test->success = true;
> +}
> +
> +/*
> + * Performs all logic to run a test case.
> + */
> +static void kunit_run_case(struct kunit_suite *suite,
> +                          struct kunit_case *test_case)
> +{
> +       struct kunit test;
> +       int ret = 0;
> +
> +       kunit_init_test(&test, test_case->name);
> +
> +       if (suite->init) {
> +               ret = suite->init(&test);

Can you push the ret definition into this if scope? That way we can
avoid default-initializing it to 0.

> +               if (ret) {
> +                       kunit_err(&test, "failed to initialize: %d\n", ret);
> +                       kunit_set_failure(&test);

Do we need to 'test_case->success = test.success' here too? Or is the
test failure extracted somewhere else?

> +                       return;
> +               }
> +       }
> +
> +       test_case->run_case(&test);
> +
> +       if (suite->exit)
> +               suite->exit(&test);
> +
> +       test_case->success = test.success;
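
For illustration only, here is one way the two nits above could look if
the answer to the second question is "yes". This is a userspace sketch,
not the patch: the structures are pared-down hypothetical stand-ins that
reuse the patch's names, and run_case_sketch(), failing_init() and
passing_case() are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the structures in the patch. */
struct kunit {
	const char *name;
	bool success;
};

struct kunit_case {
	void (*run_case)(struct kunit *test);
	const char *name;
	bool success;
};

struct kunit_suite {
	int (*init)(struct kunit *test);
	void (*exit)(struct kunit *test);
};

static void run_case_sketch(struct kunit_suite *suite,
			    struct kunit_case *test_case)
{
	struct kunit test = { .name = test_case->name, .success = true };

	if (suite->init) {
		/* ret lives only in the block that uses it; no "= 0" needed. */
		int ret = suite->init(&test);

		if (ret) {
			test.success = false;
			/* Record the failure before the early return. */
			test_case->success = test.success;
			return;
		}
	}

	test_case->run_case(&test);

	if (suite->exit)
		suite->exit(&test);

	test_case->success = test.success;
}

/* Helpers used to exercise the sketch. */
static int failing_init(struct kunit *test)
{
	(void)test;
	return -12;	/* any non-zero value counts as an init failure */
}

static void passing_case(struct kunit *test)
{
	(void)test;
}
```

With a suite whose init is failing_init, run_case_sketch() leaves
test_case->success false even though run_case never executes; whether
the real code needs this depends on where the result is read.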

* Re: [PATCH 7/9] x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
From: Bjorn Helgaas @ 2019-07-15 20:02 UTC
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Alexey Kuznetsov, Borislav Petkov, c0d1n61at3,
	David S. Miller, edumazet, Greg Kroah-Hartman, Hideaki YOSHIFUJI,
	H. Peter Anvin, Ingo Molnar, Jonathan Corbet, Josh Triplett,
	keescook, kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-8-joel@joelfernandes.org>

On Mon, Jul 15, 2019 at 10:37:03AM -0400, Joel Fernandes (Google) wrote:
> The pci_mmcfg_list is traversed with list_for_each_entry_rcu without a
> reader-lock held, because the pci_mmcfg_lock is already held. Make this
> known to the list macro so that it fixes new lockdep warnings that
> trigger due to lockdep checks added to list_for_each_entry_rcu().
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Ingo takes care of most patches to this file, but FWIW,

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

I would personally prefer if you capitalized the subject to match the
"x86/PCI:" convention that's used fairly consistently in
arch/x86/pci/.

Also, I didn't apply this to be sure, but it looks like this might
make a line or two wider than 80 columns, which I would rewrap if I
were applying this.

> ---
>  arch/x86/pci/mmconfig-shared.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 7389db538c30..6fa42e9c4e6f 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -29,6 +29,7 @@
>  static bool pci_mmcfg_running_state;
>  static bool pci_mmcfg_arch_init_failed;
>  static DEFINE_MUTEX(pci_mmcfg_lock);
> +#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
>  
>  LIST_HEAD(pci_mmcfg_list);
>  
> @@ -54,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
>  	struct pci_mmcfg_region *cfg;
>  
>  	/* keep list sorted by segment and starting bus number */
> -	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
> +	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
>  		if (cfg->segment > new->segment ||
>  		    (cfg->segment == new->segment &&
>  		     cfg->start_bus >= new->start_bus)) {
> @@ -118,7 +119,7 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
>  {
>  	struct pci_mmcfg_region *cfg;
>  
> -	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
> +	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held())
>  		if (cfg->segment == segment &&
>  		    cfg->start_bus <= bus && bus <= cfg->end_bus)
>  			return cfg;
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

* Re: [PATCH v8] Documentation: filesystem: Convert xfs.txt to ReST
From: Darrick J. Wong @ 2019-07-15 16:13 UTC
  To: Sheriff Esseson
  Cc: skhan, linux-xfs, corbet, linux-doc, linux-kernel,
	linux-kernel-mentees
In-Reply-To: <20190714125831.GA19200@localhost>

On Sun, Jul 14, 2019 at 01:58:31PM +0100, Sheriff Esseson wrote:
> Move xfs.txt to admin-guide, convert xfs.txt to ReST and fix broken references
> 
> Signed-off-by: Sheriff Esseson <sheriffesseson@gmail.com>

Looks ok, will pull through the XFS tree.  Thanks for the submission!
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> 
> changes in v8:
> 	- fix table of Deprecated and Removed options.
> 
>  Documentation/admin-guide/index.rst           |   1 +
>  .../xfs.txt => admin-guide/xfs.rst}           | 132 +++++++++---------
>  Documentation/filesystems/dax.txt             |   2 +-
>  MAINTAINERS                                   |   2 +-
>  4 files changed, 67 insertions(+), 70 deletions(-)
>  rename Documentation/{filesystems/xfs.txt => admin-guide/xfs.rst} (80%)
> 
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 24fbe0568eff..0615ea3a744c 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -70,6 +70,7 @@ configure specific aspects of kernel behavior to your liking.
>     ras
>     bcache
>     ext4
> +   xfs
>     binderfs
>     pm/index
>     thunderbolt
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/admin-guide/xfs.rst
> similarity index 80%
> rename from Documentation/filesystems/xfs.txt
> rename to Documentation/admin-guide/xfs.rst
> index a5cbb5e0e3db..e76665a8f2f2 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/admin-guide/xfs.rst
> @@ -1,4 +1,6 @@
> +.. SPDX-License-Identifier: GPL-2.0
>  
> +======================
>  The SGI XFS Filesystem
>  ======================
>  
> @@ -18,8 +20,6 @@ Mount Options
>  =============
>  
>  When mounting an XFS filesystem, the following options are accepted.
> -For boolean mount options, the names with the (*) suffix is the
> -default behaviour.
>  
>    allocsize=size
>  	Sets the buffered I/O end-of-file preallocation size when
> @@ -31,46 +31,43 @@ default behaviour.
>  	preallocation size, which uses a set of heuristics to
>  	optimise the preallocation size based on the current
>  	allocation patterns within the file and the access patterns
> -	to the file. Specifying a fixed allocsize value turns off
> +	to the file. Specifying a fixed ``allocsize`` value turns off
>  	the dynamic behaviour.
>  
> -  attr2
> -  noattr2
> +  attr2 or noattr2
>  	The options enable/disable an "opportunistic" improvement to
>  	be made in the way inline extended attributes are stored
>  	on-disk.  When the new form is used for the first time when
> -	attr2 is selected (either when setting or removing extended
> +	``attr2`` is selected (either when setting or removing extended
>  	attributes) the on-disk superblock feature bit field will be
>  	updated to reflect this format being in use.
>  
>  	The default behaviour is determined by the on-disk feature
> -	bit indicating that attr2 behaviour is active. If either
> -	mount option it set, then that becomes the new default used
> +	bit indicating that ``attr2`` behaviour is active. If either
> +	mount option is set, then that becomes the new default used
>  	by the filesystem.
>  
> -	CRC enabled filesystems always use the attr2 format, and so
> -	will reject the noattr2 mount option if it is set.
> +	CRC enabled filesystems always use the ``attr2`` format, and so
> +	will reject the ``noattr2`` mount option if it is set.
>  
> -  discard
> -  nodiscard (*)
> +  discard or nodiscard (default)
>  	Enable/disable the issuing of commands to let the block
>  	device reclaim space freed by the filesystem.  This is
>  	useful for SSD devices, thinly provisioned LUNs and virtual
>  	machine images, but may have a performance impact.
>  
> -	Note: It is currently recommended that you use the fstrim
> -	application to discard unused blocks rather than the discard
> +	Note: It is currently recommended that you use the ``fstrim``
> +	application to ``discard`` unused blocks rather than the ``discard``
>  	mount option because the performance impact of this option
>  	is quite severe.
>  
> -  grpid/bsdgroups
> -  nogrpid/sysvgroups (*)
> +  grpid/bsdgroups or nogrpid/sysvgroups (default)
>  	These options define what group ID a newly created file
> -	gets.  When grpid is set, it takes the group ID of the
> +	gets.  When ``grpid`` is set, it takes the group ID of the
>  	directory in which it is created; otherwise it takes the
> -	fsgid of the current process, unless the directory has the
> -	setgid bit set, in which case it takes the gid from the
> -	parent directory, and also gets the setgid bit set if it is
> +	``fsgid`` of the current process, unless the directory has the
> +	``setgid`` bit set, in which case it takes the ``gid`` from the
> +	parent directory, and also gets the ``setgid`` bit set if it is
>  	a directory itself.
>  
>    filestreams
> @@ -78,46 +75,42 @@ default behaviour.
>  	across the entire filesystem rather than just on directories
>  	configured to use it.
>  
> -  ikeep
> -  noikeep (*)
> -	When ikeep is specified, XFS does not delete empty inode
> -	clusters and keeps them around on disk.  When noikeep is
> +  ikeep or noikeep (default)
> +	When ``ikeep`` is specified, XFS does not delete empty inode
> +	clusters and keeps them around on disk.  When ``noikeep`` is
>  	specified, empty inode clusters are returned to the free
>  	space pool.
>  
> -  inode32
> -  inode64 (*)
> -	When inode32 is specified, it indicates that XFS limits
> +  inode32 or inode64 (default)
> +	When ``inode32`` is specified, it indicates that XFS limits
>  	inode creation to locations which will not result in inode
>  	numbers with more than 32 bits of significance.
>  
> -	When inode64 is specified, it indicates that XFS is allowed
> +	When ``inode64`` is specified, it indicates that XFS is allowed
>  	to create inodes at any location in the filesystem,
>  	including those which will result in inode numbers occupying
> -	more than 32 bits of significance. 
> +	more than 32 bits of significance.
>  
> -	inode32 is provided for backwards compatibility with older
> +	``inode32`` is provided for backwards compatibility with older
>  	systems and applications, since 64 bits inode numbers might
>  	cause problems for some applications that cannot handle
>  	large inode numbers.  If applications are in use which do
> -	not handle inode numbers bigger than 32 bits, the inode32
> +	not handle inode numbers bigger than 32 bits, the ``inode32``
>  	option should be specified.
>  
> -
> -  largeio
> -  nolargeio (*)
> -	If "nolargeio" is specified, the optimal I/O reported in
> -	st_blksize by stat(2) will be as small as possible to allow
> +  largeio or nolargeio (default)
> +	If ``nolargeio`` is specified, the optimal I/O reported in
> +	``st_blksize`` by **stat(2)** will be as small as possible to allow
>  	user applications to avoid inefficient read/modify/write
>  	I/O.  This is typically the page size of the machine, as
>  	this is the granularity of the page cache.
>  
> -	If "largeio" specified, a filesystem that was created with a
> -	"swidth" specified will return the "swidth" value (in bytes)
> -	in st_blksize. If the filesystem does not have a "swidth"
> -	specified but does specify an "allocsize" then "allocsize"
> +	If ``largeio`` is specified, a filesystem that was created with a
> +	``swidth`` specified will return the ``swidth`` value (in bytes)
> +	in ``st_blksize``. If the filesystem does not have a ``swidth``
> +	specified but does specify an ``allocsize`` then ``allocsize``
>  	(in bytes) will be returned instead. Otherwise the behaviour
> -	is the same as if "nolargeio" was specified.
> +	is the same as if ``nolargeio`` was specified.
>  
>    logbufs=value
>  	Set the number of in-memory log buffers.  Valid numbers
> @@ -127,7 +120,7 @@ default behaviour.
>  
>  	If the memory cost of 8 log buffers is too high on small
>  	systems, then it may be reduced at some cost to performance
> -	on metadata intensive workloads. The logbsize option below
> +	on metadata intensive workloads. The ``logbsize`` option below
>  	controls the size of each buffer and so is also relevant to
>  	this case.
>  
> @@ -138,7 +131,7 @@ default behaviour.
>  	and 32768 (32k).  Valid sizes for version 2 logs also
>  	include 65536 (64k), 131072 (128k) and 262144 (256k). The
>  	logbsize must be an integer multiple of the log
> -	stripe unit configured at mkfs time.
> +	stripe unit configured at **mkfs(8)** time.
>  
>  	The default value for for version 1 logs is 32768, while the
>  	default value for version 2 logs is MAX(32768, log_sunit).
> @@ -153,21 +146,21 @@ default behaviour.
>    noalign
>  	Data allocations will not be aligned at stripe unit
>  	boundaries. This is only relevant to filesystems created
> -	with non-zero data alignment parameters (sunit, swidth) by
> -	mkfs.
> +	with non-zero data alignment parameters (``sunit``, ``swidth``) by
> +	**mkfs(8)**.
>  
>    norecovery
>  	The filesystem will be mounted without running log recovery.
>  	If the filesystem was not cleanly unmounted, it is likely to
> -	be inconsistent when mounted in "norecovery" mode.
> +	be inconsistent when mounted in ``norecovery`` mode.
>  	Some files or directories may not be accessible because of this.
> -	Filesystems mounted "norecovery" must be mounted read-only or
> +	Filesystems mounted ``norecovery`` must be mounted read-only or
>  	the mount will fail.
>  
>    nouuid
>  	Don't check for double mounted file systems using the file
> -	system uuid.  This is useful to mount LVM snapshot volumes,
> -	and often used in combination with "norecovery" for mounting
> +	system ``uuid``.  This is useful to mount LVM snapshot volumes,
> +	and often used in combination with ``norecovery`` for mounting
>  	read-only snapshots.
>  
>    noquota
> @@ -176,15 +169,15 @@ default behaviour.
>  
>    uquota/usrquota/uqnoenforce/quota
>  	User disk quota accounting enabled, and limits (optionally)
> -	enforced.  Refer to xfs_quota(8) for further details.
> +	enforced.  Refer to **xfs_quota(8)** for further details.
>  
>    gquota/grpquota/gqnoenforce
>  	Group disk quota accounting enabled and limits (optionally)
> -	enforced.  Refer to xfs_quota(8) for further details.
> +	enforced.  Refer to **xfs_quota(8)** for further details.
>  
>    pquota/prjquota/pqnoenforce
>  	Project disk quota accounting enabled and limits (optionally)
> -	enforced.  Refer to xfs_quota(8) for further details.
> +	enforced.  Refer to **xfs_quota(8)** for further details.
>  
>    sunit=value and swidth=value
>  	Used to specify the stripe unit and width for a RAID device
> @@ -192,11 +185,11 @@ default behaviour.
>  	block units. These options are only relevant to filesystems
>  	that were created with non-zero data alignment parameters.
>  
> -	The sunit and swidth parameters specified must be compatible
> +	The ``sunit`` and ``swidth`` parameters specified must be compatible
>  	with the existing filesystem alignment characteristics.  In
> -	general, that means the only valid changes to sunit are
> -	increasing it by a power-of-2 multiple. Valid swidth values
> -	are any integer multiple of a valid sunit value.
> +	general, that means the only valid changes to ``sunit`` are
> +	increasing it by a power-of-2 multiple. Valid ``swidth`` values
> +	are any integer multiple of a valid ``sunit`` value.
>  
>  	Typically the only time these mount options are necessary if
>  	after an underlying RAID device has had it's geometry
> @@ -221,22 +214,25 @@ default behaviour.
>  Deprecated Mount Options
>  ========================
>  
> +===========================     ================
>    Name				Removal Schedule
> -  ----				----------------
> +===========================     ================
> +===========================     ================
>  
>  
>  Removed Mount Options
>  =====================
>  
> +===========================     =======
>    Name				Removed
> -  ----				-------
> +===========================	=======
>    delaylog/nodelaylog		v4.0
>    ihashsize			v4.0
>    irixsgid			v4.0
>    osyncisdsync/osyncisosync	v4.0
>    barrier			v4.19
>    nobarrier			v4.19
> -
> +===========================     =======
>  
>  sysctls
>  =======
> @@ -302,27 +298,27 @@ The following sysctls are available for the XFS filesystem:
>  
>    fs.xfs.inherit_sync		(Min: 0  Default: 1  Max: 1)
>  	Setting this to "1" will cause the "sync" flag set
> -	by the xfs_io(8) chattr command on a directory to be
> +	by the **xfs_io(8)** chattr command on a directory to be
>  	inherited by files in that directory.
>  
>    fs.xfs.inherit_nodump		(Min: 0  Default: 1  Max: 1)
>  	Setting this to "1" will cause the "nodump" flag set
> -	by the xfs_io(8) chattr command on a directory to be
> +	by the **xfs_io(8)** chattr command on a directory to be
>  	inherited by files in that directory.
>  
>    fs.xfs.inherit_noatime	(Min: 0  Default: 1  Max: 1)
>  	Setting this to "1" will cause the "noatime" flag set
> -	by the xfs_io(8) chattr command on a directory to be
> +	by the **xfs_io(8)** chattr command on a directory to be
>  	inherited by files in that directory.
>  
>    fs.xfs.inherit_nosymlinks	(Min: 0  Default: 1  Max: 1)
>  	Setting this to "1" will cause the "nosymlinks" flag set
> -	by the xfs_io(8) chattr command on a directory to be
> +	by the **xfs_io(8)** chattr command on a directory to be
>  	inherited by files in that directory.
>  
>    fs.xfs.inherit_nodefrag	(Min: 0  Default: 1  Max: 1)
>  	Setting this to "1" will cause the "nodefrag" flag set
> -	by the xfs_io(8) chattr command on a directory to be
> +	by the **xfs_io(8)** chattr command on a directory to be
>  	inherited by files in that directory.
>  
>    fs.xfs.rotorstep		(Min: 1  Default: 1  Max: 256)
> @@ -368,7 +364,7 @@ handler:
>   -error handlers:
>  	Defines the behavior for a specific error.
>  
> -The filesystem behavior during an error can be set via sysfs files. Each
> +The filesystem behavior during an error can be set via ``sysfs`` files. Each
>  error handler works independently - the first condition met by an error handler
>  for a specific class will cause the error to be propagated rather than reset and
>  retried.
> @@ -419,7 +415,7 @@ level directory:
>  	handler configurations.
>  
>  	Note: there is no guarantee that fail_at_unmount can be set while an
> -	unmount is in progress. It is possible that the sysfs entries are
> +	unmount is in progress. It is possible that the ``sysfs`` entries are
>  	removed by the unmounting filesystem before a "retry forever" error
>  	handler configuration causes unmount to hang, and hence the filesystem
>  	must be configured appropriately before unmount begins to prevent
> @@ -428,7 +424,7 @@ level directory:
>  Each filesystem has specific error class handlers that define the error
>  propagation behaviour for specific errors. There is also a "default" error
>  handler defined, which defines the behaviour for all errors that don't have
> -specific handlers defined. Where multiple retry constraints are configuredi for
> +specific handlers defined. Where multiple retry constraints are configured for
>  a single error, the first retry configuration that expires will cause the error
>  to be propagated. The handler configurations are found in the directory:
>  
> @@ -463,7 +459,7 @@ to be propagated. The handler configurations are found in the directory:
>  	Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the
>  	operation for up to "N" seconds before propagating the error.
>  
> -Note: The default behaviour for a specific error handler is dependent on both
> +**Note:** The default behaviour for a specific error handler is dependent on both
>  the class and error context. For example, the default values for
>  "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
>  to "fail immediately" behaviour. This is done because ENODEV is a fatal,
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 6d2c0d340dea..679729442fd2 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -76,7 +76,7 @@ exposure of uninitialized data through mmap.
>  These filesystems may be used for inspiration:
>  - ext2: see Documentation/filesystems/ext2.txt
>  - ext4: see Documentation/filesystems/ext4/
> -- xfs:  see Documentation/filesystems/xfs.txt
> +- xfs:  see Documentation/admin-guide/xfs.rst
>  
>  
>  Handling Media Errors
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 43ca94856944..3b6e0b6d8cbd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17453,7 +17453,7 @@ L:	linux-xfs@vger.kernel.org
>  W:	http://xfs.org/
>  T:	git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
>  S:	Supported
> -F:	Documentation/filesystems/xfs.txt
> +F:	Documentation/admin-guide/xfs.rst
>  F:	fs/xfs/
>  
>  XILINX AXI ETHERNET DRIVER
> -- 
> 2.22.0
> 

* Re: [PATCH v5 1/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
From: Dave Chiluk @ 2019-07-15 15:44 UTC
  To: Ben Segall
  Cc: Peter Zijlstra, Phil Auld, Peter Oskolkov, Ingo Molnar, cgroups,
	Linux Kernel Mailing List, Brendan Gregg, Kyle Anderson,
	Gabriel Munos, John Hammond, Cong Wang, Jonathan Corbet,
	linux-doc, Paul Turner
In-Reply-To: <xm26muhikiq2.fsf@bsegall-linux.svl.corp.google.com>

On Fri, Jul 12, 2019 at 5:10 PM <bsegall@google.com> wrote:
> Ugh. Maybe we /do/ just give up and say that most people don't seem to
> be using cfs_b in a way that expiration of the leftover 1ms matters.

That was my conclusion as well.  Does this mean you want to proceed
with my patch set?  Do you have any changes you want made to the
proposed documentation changes, or any other changes for that matter?

* Re: [PATCH v8] Documentation: filesystem: Convert xfs.txt to ReST
From: Matthew Wilcox @ 2019-07-15 15:37 UTC
  To: Sheriff Esseson
  Cc: skhan, darrick.wong, linux-xfs, corbet, linux-doc, linux-kernel,
	linux-kernel-mentees
In-Reply-To: <20190714125831.GA19200@localhost>

On Sun, Jul 14, 2019 at 01:58:31PM +0100, Sheriff Esseson wrote:
> Move xfs.txt to admin-guide, convert xfs.txt to ReST and fix broken references
> 
> Signed-off-by: Sheriff Esseson <sheriffesseson@gmail.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

* [PATCH AUTOSEL 5.1 099/219] sched/fair: Fix "runnable_avg_yN_inv" not used warnings
From: Sasha Levin @ 2019-07-15 14:01 UTC
  To: linux-kernel, stable
  Cc: Qian Cai, Peter Zijlstra, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, Sasha Levin, linux-doc
In-Reply-To: <20190715140341.6443-1-sashal@kernel.org>

From: Qian Cai <cai@lca.pw>

[ Upstream commit 509466b7d480bc5d22e90b9fbe6122ae0e2fbe39 ]

runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c, but it is
pulled into several other files because they need other macros that all
come from kernel/sched/sched-pelt.h, which is generated by
Documentation/scheduler/sched-pelt. As a result, compilation produces
a lot of warnings:

  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  ...

Silence them by adding the __maybe_unused attribute to the array, so all
generated variables and macros can still be kept in the same file.
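
A minimal userspace sketch of the situation described above, assuming a
GCC-compatible compiler. The table name demo_inv, the DEMO_HALFLIFE
macro and demo_halflife() are made up for the example, and the
__maybe_unused definition is supplied locally because the kernel's
compiler headers are not available outside the kernel tree.

```c
#include <stdint.h>

/* Local definition so the sketch builds in userspace. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

typedef uint32_t u32;

/*
 * Imagine everything from here to DEMO_HALFLIFE living in a generated
 * header. A file that includes it only for the macro never touches the
 * table; without the attribute, GCC can then report the table as
 * "defined but not used".
 */
static const u32 demo_inv[] __maybe_unused = {
	0xffffffff, 0xfa83b2da, 0xf5257d14,
};

#define DEMO_HALFLIFE 3

/* An "includer" that needs only the macro, not the table. */
int demo_halflife(void)
{
	return DEMO_HALFLIFE;
}
```

The attribute changes only the diagnostics: the table stays a static
const object in every translation unit that includes the header, and
the compiler remains free to drop it where it is unreferenced.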

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Documentation/scheduler/sched-pelt.c | 3 ++-
 kernel/sched/sched-pelt.h            | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/scheduler/sched-pelt.c b/Documentation/scheduler/sched-pelt.c
index e4219139386a..7238b355919c 100644
--- a/Documentation/scheduler/sched-pelt.c
+++ b/Documentation/scheduler/sched-pelt.c
@@ -20,7 +20,8 @@ void calc_runnable_avg_yN_inv(void)
 	int i;
 	unsigned int x;
 
-	printf("static const u32 runnable_avg_yN_inv[] = {");
+	/* To silence -Wunused-but-set-variable warnings. */
+	printf("static const u32 runnable_avg_yN_inv[] __maybe_unused = {");
 	for (i = 0; i < HALFLIFE; i++) {
 		x = ((1UL<<32)-1)*pow(y, i);
 
diff --git a/kernel/sched/sched-pelt.h b/kernel/sched/sched-pelt.h
index a26473674fb7..c529706bed11 100644
--- a/kernel/sched/sched-pelt.h
+++ b/kernel/sched/sched-pelt.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Generated by Documentation/scheduler/sched-pelt; do not modify. */
 
-static const u32 runnable_avg_yN_inv[] = {
+static const u32 runnable_avg_yN_inv[] __maybe_unused = {
 	0xffffffff, 0xfa83b2da, 0xf5257d14, 0xefe4b99a, 0xeac0c6e6, 0xe5b906e6,
 	0xe0ccdeeb, 0xdbfbb796, 0xd744fcc9, 0xd2a81d91, 0xce248c14, 0xc9b9bd85,
 	0xc5672a10, 0xc12c4cc9, 0xbd08a39e, 0xb8fbaf46, 0xb504f333, 0xb123f581,
-- 
2.20.1


* [PATCH AUTOSEL 4.19 075/158] sched/fair: Fix "runnable_avg_yN_inv" not used warnings
From: Sasha Levin @ 2019-07-15 14:16 UTC
  To: linux-kernel, stable
  Cc: Qian Cai, Peter Zijlstra, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, Sasha Levin, linux-doc
In-Reply-To: <20190715141809.8445-1-sashal@kernel.org>

From: Qian Cai <cai@lca.pw>

[ Upstream commit 509466b7d480bc5d22e90b9fbe6122ae0e2fbe39 ]

runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c, but it is
pulled into several other files because they need other macros that all
come from kernel/sched/sched-pelt.h, which is generated by
Documentation/scheduler/sched-pelt. As a result, compilation produces
a lot of warnings:

  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=]
  ...

Silence them by adding the __maybe_unused attribute to the array, so all
generated variables and macros can still be kept in the same file.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Documentation/scheduler/sched-pelt.c | 3 ++-
 kernel/sched/sched-pelt.h            | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/scheduler/sched-pelt.c b/Documentation/scheduler/sched-pelt.c
index e4219139386a..7238b355919c 100644
--- a/Documentation/scheduler/sched-pelt.c
+++ b/Documentation/scheduler/sched-pelt.c
@@ -20,7 +20,8 @@ void calc_runnable_avg_yN_inv(void)
 	int i;
 	unsigned int x;
 
-	printf("static const u32 runnable_avg_yN_inv[] = {");
+	/* To silence -Wunused-but-set-variable warnings. */
+	printf("static const u32 runnable_avg_yN_inv[] __maybe_unused = {");
 	for (i = 0; i < HALFLIFE; i++) {
 		x = ((1UL<<32)-1)*pow(y, i);
 
diff --git a/kernel/sched/sched-pelt.h b/kernel/sched/sched-pelt.h
index a26473674fb7..c529706bed11 100644
--- a/kernel/sched/sched-pelt.h
+++ b/kernel/sched/sched-pelt.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Generated by Documentation/scheduler/sched-pelt; do not modify. */
 
-static const u32 runnable_avg_yN_inv[] = {
+static const u32 runnable_avg_yN_inv[] __maybe_unused = {
 	0xffffffff, 0xfa83b2da, 0xf5257d14, 0xefe4b99a, 0xeac0c6e6, 0xe5b906e6,
 	0xe0ccdeeb, 0xdbfbb796, 0xd744fcc9, 0xd2a81d91, 0xce248c14, 0xc9b9bd85,
 	0xc5672a10, 0xc12c4cc9, 0xbd08a39e, 0xb8fbaf46, 0xb504f333, 0xb123f581,
-- 
2.20.1


* [PATCH 2/9] rcu: Add support for consolidated-RCU reader checking (v3)
From: Joel Fernandes (Google) @ 2019-07-15 14:36 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

This patch adds support for checking RCU reader sections in list
traversal macros. Optionally, if the list macro is called under SRCU or
other lock/mutex protection, then appropriate lockdep expressions can be
passed to make the checks pass.

Existing list_for_each_entry_rcu() invocations don't need to pass the
optional fourth argument (cond) unless they are under some non-RCU
protection and need to make the lockdep check pass.
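
The optional argument relies on two GNU C preprocessor extensions: a
named variadic parameter ("cond...") and ", ##" comma deletion. The
userspace sketch below mimics the mechanism only. The demo_* names, the
two boolean flags and the warning counter are hypothetical stand-ins for
rcu_read_lock_any_held(), a lockdep expression and RCU_LOCKDEP_WARN();
none of them are kernel definitions, and the loop walks a plain singly
linked list rather than an RCU list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
static bool in_rcu_reader;	/* pretend rcu_read_lock_any_held() */
static bool demo_lock_held;	/* pretend lock_is_held(&lock.dep_map) */
static int demo_warnings;	/* counts would-be lockdep warnings */

/* The second parameter is the caller's condition, or the trailing 0. */
#define demo_list_check(dummy, cond, ...)				\
	((void)((!(cond) && !in_rcu_reader) ? demo_warnings++ : 0))

struct demo_node {
	int val;
	struct demo_node *next;
};

/*
 * With no third argument, ", ## cond" deletes the comma, so the check
 * sees cond == 0. With one, the caller's expression fills the cond slot
 * and the 0 is swallowed by "...".
 */
#define demo_for_each(pos, head, cond...)				\
	for (demo_list_check(dummy, ## cond, 0), pos = (head);		\
	     pos; pos = pos->next)

/* Traversal that relies on being inside a reader section. */
static int demo_sum(struct demo_node *head)
{
	struct demo_node *pos;
	int sum = 0;

	demo_for_each(pos, head)
		sum += pos->val;
	return sum;
}

/* Traversal protected by a lock instead of a reader section. */
static int demo_sum_locked(struct demo_node *head)
{
	struct demo_node *pos;
	int sum = 0;

	demo_for_each(pos, head, demo_lock_held)
		sum += pos->val;
	return sum;
}
```

The check sits in the for-loop initializer, so it runs once per
traversal rather than once per element. The sketch parenthesizes cond
inside the check so that compound expressions negate as a whole.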

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rculist.h  | 28 ++++++++++++++++++++-----
 include/linux/rcupdate.h |  7 +++++++
 kernel/rcu/Kconfig.debug | 11 ++++++++++
 kernel/rcu/update.c      | 44 ++++++++++++++++++++++++----------------
 4 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index e91ec9ddcd30..1048160625bb 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
  */
 #define list_next_rcu(list)	(*((struct list_head __rcu **)(&(list)->next)))
 
+/*
+ * Check during list traversal that we are within an RCU reader
+ */
+
+#ifdef CONFIG_PROVE_RCU_LIST
+#define __list_check_rcu(dummy, cond, ...)				\
+	({								\
+	RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(),		\
+			 "RCU-list traversed in non-reader section!");	\
+	 })
+#else
+#define __list_check_rcu(dummy, cond, ...) ({})
+#endif
+
 /*
  * Insert a new entry between two known consecutive entries.
  *
@@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
  * @member:	the name of the list_head within the struct.
+ * @cond:	optional lockdep expression if called from non-RCU protection.
  *
  * This list-traversal primitive may safely run concurrently with
  * the _rcu list-mutation primitives such as list_add_rcu()
  * as long as the traversal is guarded by rcu_read_lock().
  */
-#define list_for_each_entry_rcu(pos, head, member) \
-	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
-		&pos->member != (head); \
+#define list_for_each_entry_rcu(pos, head, member, cond...)		\
+	for (__list_check_rcu(dummy, ## cond, 0),			\
+	     pos = list_entry_rcu((head)->next, typeof(*pos), member);	\
+		&pos->member != (head);					\
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
@@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
  * @member:	the name of the hlist_node within the struct.
+ * @cond:	optional lockdep expression if called from non-RCU protection.
  *
  * This list-traversal primitive may safely run concurrently with
  * the _rcu list-mutation primitives such as hlist_add_head_rcu()
  * as long as the traversal is guarded by rcu_read_lock().
  */
-#define hlist_for_each_entry_rcu(pos, head, member)			\
-	for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
+#define hlist_for_each_entry_rcu(pos, head, member, cond...)		\
+	for (__list_check_rcu(dummy, ## cond, 0),			\
+	     pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
 			typeof(*(pos)), member);			\
 		pos;							\
 		pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 8f7167478c1d..f3c29efdf19a 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
 int rcu_read_lock_held(void);
 int rcu_read_lock_bh_held(void);
 int rcu_read_lock_sched_held(void);
+int rcu_read_lock_any_held(void);
 
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
@@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
 {
 	return !preemptible();
 }
+
+static inline int rcu_read_lock_any_held(void)
+{
+	return !preemptible();
+}
+
 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
 #ifdef CONFIG_PROVE_RCU
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 5ec3ea4028e2..7fbd21dbfcd0 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -8,6 +8,17 @@ menu "RCU Debugging"
 config PROVE_RCU
 	def_bool PROVE_LOCKING
 
+config PROVE_RCU_LIST
+	bool "RCU list lockdep debugging"
+	depends on PROVE_RCU
+	default n
+	help
+	  Enable RCU lockdep checking for list usages. By default it is
+	  turned off since there are several list RCU users that still
+	  need to be converted to pass a lockdep expression. To prevent
+	  false-positive splats, we keep it default disabled but once all
+	  users are converted, we can remove this config option.
+
 config TORTURE_TEST
 	tristate
 	default n
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 9dd5aeef6e70..b7a4e3b5fa98 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -91,14 +91,18 @@ module_param(rcu_normal_after_boot, int, 0);
  * Similarly, we avoid claiming an SRCU read lock held if the current
  * CPU is offline.
  */
+#define rcu_read_lock_held_common()		\
+	if (!debug_lockdep_rcu_enabled())	\
+		return 1;			\
+	if (!rcu_is_watching())			\
+		return 0;			\
+	if (!rcu_lockdep_current_cpu_online())	\
+		return 0;
+
 int rcu_read_lock_sched_held(void)
 {
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	if (!rcu_is_watching())
-		return 0;
-	if (!rcu_lockdep_current_cpu_online())
-		return 0;
+	rcu_read_lock_held_common();
+
 	return lock_is_held(&rcu_sched_lock_map) || !preemptible();
 }
 EXPORT_SYMBOL(rcu_read_lock_sched_held);
@@ -257,12 +261,8 @@ NOKPROBE_SYMBOL(debug_lockdep_rcu_enabled);
  */
 int rcu_read_lock_held(void)
 {
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	if (!rcu_is_watching())
-		return 0;
-	if (!rcu_lockdep_current_cpu_online())
-		return 0;
+	rcu_read_lock_held_common();
+
 	return lock_is_held(&rcu_lock_map);
 }
 EXPORT_SYMBOL_GPL(rcu_read_lock_held);
@@ -284,16 +284,24 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
  */
 int rcu_read_lock_bh_held(void)
 {
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	if (!rcu_is_watching())
-		return 0;
-	if (!rcu_lockdep_current_cpu_online())
-		return 0;
+	rcu_read_lock_held_common();
+
 	return in_softirq() || irqs_disabled();
 }
 EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
 
+int rcu_read_lock_any_held(void)
+{
+	rcu_read_lock_held_common();
+
+	if (lock_is_held(&rcu_lock_map) ||
+	    lock_is_held(&rcu_bh_lock_map) ||
+	    lock_is_held(&rcu_sched_lock_map))
+		return 1;
+	return !preemptible();
+}
+EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
+
 #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
 /**
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 0/9] Harden list_for_each_entry_rcu() and family
From: Joel Fernandes (Google) @ 2019-07-15 14:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi,
This series aims to provide lockdep checking to RCU list macros for additional
kernel hardening.

RCU has a number of primitives for "consumption" of an RCU-protected pointer.
Most of the time, these consumers make sure that such accesses happen either
in an RCU reader section (as with rcu_dereference{,_sched,_bh}()) or under a
lock (as with rcu_dereference_protected()).

However, there are other ways to consume RCU pointers, such as
list_for_each_entry_rcu() or hlist_for_each_entry_rcu(). Unlike the
rcu_dereference() family, these consumers do no lockdep checking at all. With
the growing number of RCU list uses (1000+), bugs that lockdep checks could
catch can creep in and go unnoticed.

Since the RCU consolidation efforts last year, the traditional RCU flavors
(preempt, bh, sched) are all consolidated. In other words, any of these
flavors can start a reader section, and all of them must end before the
reader section is considered to be unlocked. Thanks to this, we can
generically check whether we are in an RCU reader. This is what patch 2 does.

Note that list_for_each_entry_rcu() and family differ from the
rcu_dereference() family in that there is no _bh or _sched version of these
macros. They are used under many different RCU reader flavors, and also under
SRCU. Patch 2 adds a new internal function, rcu_read_lock_any_held(), which
checks whether any reader section is active at all when these macros are
called. If no reader section is active, the optional fourth argument to
list_for_each_entry_rcu() can be a lockdep expression, which is then
evaluated (similar to how rcu_dereference_check() works). If no lockdep
expression is passed and we are not in a reader, a splat occurs. To see this,
remove the lockdep expression after applying the patches, using the following
diff:

+++ b/arch/x86/pci/mmconfig-shared.c
@@ -55,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
        struct pci_mmcfg_region *cfg;

        /* keep list sorted by segment and starting bus number */
-       list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
+       list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {

The optional-argument trick used for list_for_each_entry_rcu() could also be
used in the future to remove the rcu_dereference_{,bh,sched}_protected() APIs:
an optional lockdep expression could be passed to rcu_dereference() itself,
eliminating 3 more RCU APIs.
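
For readers unfamiliar with the trick, here is a minimal userspace sketch of
how the optional argument works. It relies on the same GNU C extensions the
kernel macro uses (a named variadic parameter and ", ##" comma removal), so
it assumes gcc or clang in their default GNU mode. The names in_reader,
splat_count, list_check(), traverse() and splats_for() are made up for this
example:

```c
#include <assert.h>
#include <stdbool.h>

bool in_reader;		/* hypothetical stand-in for rcu_read_lock_any_held() */
int splat_count;	/* counts would-be lockdep splats */

/* Same shape as __list_check_rcu(): 'dummy' is never expanded. */
#define list_check(dummy, cond, ...)					\
	do {								\
		if (!(cond) && !in_reader)				\
			splat_count++;					\
	} while (0)

/*
 * 'cond...' is a GNU named variadic parameter. When it is empty,
 * ', ## cond' drops the preceding comma, so the trailing 0 becomes 'cond'.
 * When it is given, the 0 lands in the ignored '...' slot instead.
 * A real macro would walk the list after the check.
 */
#define traverse(head, cond...)						\
	do {								\
		list_check(dummy, ## cond, 0);				\
		(void)(head);						\
	} while (0)

/* Returns the number of splats caused by one traversal. */
int splats_for(bool reader, bool lock_held, bool pass_cond)
{
	int head = 0;

	in_reader = reader;
	splat_count = 0;
	if (pass_cond)
		traverse(&head, lock_held);
	else
		traverse(&head);
	return splat_count;
}
```

With this sketch, a caller that holds the lock but does not pass the
expression still gets a splat, which is what the mmconfig diff above
demonstrates on a real kernel.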

Note that some list macro wrappers already do their own lockdep checking on
the caller side. These checks can be eliminated in favor of the built-in
lockdep checking in the list macro that this series adds. For example, the
workqueue code has an assert_rcu_or_wq_mutex() macro which is called in
for_each_pwq(). This series replaces it with the built-in check.
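
One visible difference between the two styles is when the check runs. The
sketch below is a userspace model, not the workqueue code: check_protection(),
for_each_old(), for_each_new() and count_checks() are hypothetical names, and
the statement expression is a GNU C extension. It shows that the caller-side
idiom checks on every iteration, while a check placed in the for-init clause
runs once per traversal, even for an empty list:

```c
#include <assert.h>
#include <stdbool.h>

int check_calls;	/* how many times the stand-in assertion ran */

void check_protection(void)	/* hypothetical stand-in for a lockdep assert */
{
	check_calls++;
}

/* Caller-side idiom: the check sits in the loop body, so it runs per entry. */
#define for_each_old(i, n)						\
	for ((i) = 0; (i) < (n); (i)++)					\
		if (({ check_protection(); false; })) { }		\
		else

/* Built-in idiom: the check sits in the for-init clause and runs once. */
#define for_each_new(i, n)						\
	for (check_protection(), (i) = 0; (i) < (n); (i)++)

/* Returns how often the check ran while visiting n entries. */
int count_checks(bool use_new, int n)
{
	int i, visited = 0;

	check_calls = 0;
	if (use_new) {
		for_each_new(i, n)
			visited++;
	} else {
		for_each_old(i, n)
			visited++;
	}
	assert(visited == n);	/* both forms visit every entry */
	return check_calls;
}
```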

Also in the future, we can extend these checks to list_entry_rcu() and other
list macros as well, if needed.

Please note that I have kept this check default-disabled under a new config
option, CONFIG_PROVE_RCU_LIST, so that it stays off until all users are
converted to pass the optional argument where needed. There are about 1000
users, and the lockdep expression has to be worked out case by case, so they
cannot all be converted in a single series. I did convert a few users in this
series itself.

v2->v3: Simplified rcu-sync logic after rebase (Paul)
	Added check for bh_map (Paul)
	Refactored out more of the common code (Joel)
	Added Oleg ack to rcu-sync patch.

v1->v2: Have assert_rcu_or_wq_mutex deleted (Daniel Jordan)
	Simplify rcu_read_lock_any_held()   (Peter Zijlstra)
	Simplified rcu-sync logic	    (Oleg Nesterov)
	Updated documentation and rculist comments.
	Added GregKH ack.

RFC->v1: 
	Simplify list checking macro (Rasmus Villemoes)

Joel Fernandes (Google) (9):
rcu/update: Remove useless check for debug_locks (v1)
rcu: Add support for consolidated-RCU reader checking (v3)
rcu/sync: Remove custom check for reader-section (v2)
ipv4: add lockdep condition to fix for_each_entry (v1)
driver/core: Convert to use built-in RCU list checking (v1)
workqueue: Convert for_each_wq to use built-in list check (v2)
x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
acpi: Use built-in RCU list checking for acpi_ioremaps list (v1)
doc: Update documentation about list_for_each_entry_rcu (v1)

Documentation/RCU/lockdep.txt   | 15 ++++++++---
Documentation/RCU/whatisRCU.txt |  9 ++++++-
arch/x86/pci/mmconfig-shared.c  |  5 ++--
drivers/acpi/osl.c              |  6 +++--
drivers/base/base.h             |  1 +
drivers/base/core.c             | 10 +++++++
drivers/base/power/runtime.c    | 15 +++++++----
include/linux/rcu_sync.h        |  4 +--
include/linux/rculist.h         | 28 +++++++++++++++----
include/linux/rcupdate.h        |  7 +++++
kernel/rcu/Kconfig.debug        | 11 ++++++++
kernel/rcu/update.c             | 48 ++++++++++++++++++---------------
kernel/workqueue.c              | 10 ++-----
net/ipv4/fib_frontend.c         |  3 ++-
14 files changed, 119 insertions(+), 53 deletions(-)

--
2.22.0.510.g264f2c817a-goog


* [PATCH 3/9] rcu/sync: Remove custom check for reader-section (v2)
From: Joel Fernandes (Google) @ 2019-07-15 14:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Oleg Nesterov, Alexey Kuznetsov,
	Bjorn Helgaas, Borislav Petkov, c0d1n61at3, David S. Miller,
	edumazet, Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Paul E. McKenney, Pavel Machek, peterz,
	Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

The rcu/sync code was doing its own check of whether we are in a reader
section. With the RCU flavors consolidated and the generic helper added in
this series, this is no longer needed. We can just use the generic helper,
which results in a nice cleanup.

Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcu_sync.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
index 9b83865d24f9..0027d4c8087c 100644
--- a/include/linux/rcu_sync.h
+++ b/include/linux/rcu_sync.h
@@ -31,9 +31,7 @@ struct rcu_sync {
  */
 static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
-			 !rcu_read_lock_bh_held() &&
-			 !rcu_read_lock_sched_held(),
+	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
 			 "suspicious rcu_sync_is_idle() usage");
 	return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
 }
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 4/9] ipv4: add lockdep condition to fix for_each_entry (v1)
From: Joel Fernandes (Google) @ 2019-07-15 14:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

Use the support added earlier in this series to pass a lockdep condition,
lockdep_rtnl_is_held(), to the hlist traversal in fib_get_table().

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 net/ipv4/fib_frontend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 317339cd7f03..26b0fb24e2c2 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -124,7 +124,8 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
 	h = id & (FIB_TABLE_HASHSZ - 1);
 
 	head = &net->ipv4.fib_table_hash[h];
-	hlist_for_each_entry_rcu(tb, head, tb_hlist) {
+	hlist_for_each_entry_rcu(tb, head, tb_hlist,
+				 lockdep_rtnl_is_held()) {
 		if (tb->tb_id == id)
 			return tb;
 	}
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 1/9] rcu/update: Remove useless check for debug_locks (v1)
From: Joel Fernandes (Google) @ 2019-07-15 14:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

In rcu_read_lock_sched_held(), debug_locks can never be false at the
point we check it, because we already check debug_locks in
debug_lockdep_rcu_enabled() at the beginning. Remove the check.
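
The redundancy can be checked exhaustively with a small userspace model.
This is only a sketch: struct model and its fields are hypothetical stand-ins
for the kernel state, other_enabled lumps together the remaining terms of
debug_lockdep_rcu_enabled(), and the model is single-threaded, so it assumes
debug_locks does not change between the two checks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flat model of the state the kernel function consults. */
struct model {
	bool debug_locks;	/* global lockdep on/off flag */
	bool other_enabled;	/* other terms of debug_lockdep_rcu_enabled() */
	bool watching;		/* rcu_is_watching() */
	bool online;		/* rcu_lockdep_current_cpu_online() */
	bool sched_held;	/* lock_is_held(&rcu_sched_lock_map) */
	bool preemptible;	/* preemptible() */
};

bool lockdep_rcu_enabled(const struct model *m)
{
	return m->other_enabled && m->debug_locks;
}

/* Shape of the function before this patch. */
int sched_held_old(const struct model *m)
{
	int lockdep_opinion = 0;

	if (!lockdep_rcu_enabled(m))
		return 1;
	if (!m->watching)
		return 0;
	if (!m->online)
		return 0;
	if (m->debug_locks)
		lockdep_opinion = m->sched_held;
	return lockdep_opinion || !m->preemptible;
}

/* Shape of the function after this patch. */
int sched_held_new(const struct model *m)
{
	if (!lockdep_rcu_enabled(m))
		return 1;
	if (!m->watching)
		return 0;
	if (!m->online)
		return 0;
	return m->sched_held || !m->preemptible;
}

/* Returns how many of the 64 input combinations give different results. */
int count_mismatches(void)
{
	int bits, mismatches = 0;

	for (bits = 0; bits < 64; bits++) {
		struct model m = {
			.debug_locks	= (bits >> 0) & 1,
			.other_enabled	= (bits >> 1) & 1,
			.watching	= (bits >> 2) & 1,
			.online		= (bits >> 3) & 1,
			.sched_held	= (bits >> 4) & 1,
			.preemptible	= (bits >> 5) & 1,
		};

		if (sched_held_old(&m) != sched_held_new(&m))
			mismatches++;
	}
	return mismatches;
}
```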

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/update.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 61df2bf08563..9dd5aeef6e70 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -93,17 +93,13 @@ module_param(rcu_normal_after_boot, int, 0);
  */
 int rcu_read_lock_sched_held(void)
 {
-	int lockdep_opinion = 0;
-
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
 	if (!rcu_is_watching())
 		return 0;
 	if (!rcu_lockdep_current_cpu_online())
 		return 0;
-	if (debug_locks)
-		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
-	return lockdep_opinion || !preemptible();
+	return lock_is_held(&rcu_sched_lock_map) || !preemptible();
 }
 EXPORT_SYMBOL(rcu_read_lock_sched_held);
 #endif
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 6/9] workqueue: Convert for_each_wq to use built-in list check (v2)
From: Joel Fernandes (Google) @ 2019-07-15 14:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

list_for_each_entry_rcu() now has support for checking for RCU reader
sections as well as held locks. Use that support instead of explicitly
checking in the caller.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/workqueue.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 601d61150b65..e882477ebf6e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -364,11 +364,6 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
 			 !lockdep_is_held(&wq_pool_mutex),		\
 			 "RCU or wq_pool_mutex should be held")
 
-#define assert_rcu_or_wq_mutex(wq)					\
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
-			 !lockdep_is_held(&wq->mutex),			\
-			 "RCU or wq->mutex should be held")
-
 #define assert_rcu_or_wq_mutex_or_pool_mutex(wq)			\
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
 			 !lockdep_is_held(&wq->mutex) &&		\
@@ -425,9 +420,8 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
  * ignored.
  */
 #define for_each_pwq(pwq, wq)						\
-	list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node)		\
-		if (({ assert_rcu_or_wq_mutex(wq); false; })) { }	\
-		else
+	list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,		\
+				 lock_is_held(&(wq->mutex).dep_map))
 
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
 
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 8/9] acpi: Use built-in RCU list checking for acpi_ioremaps list (v1)
From: Joel Fernandes (Google) @ 2019-07-15 14:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
it for acpi_ioremaps list traversal.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 drivers/acpi/osl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 9c0edf2fc0dd..2f9d0d20b836 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/lockdep.h>
 #include <linux/pci.h>
 #include <linux/interrupt.h>
 #include <linux/kmod.h>
@@ -80,6 +81,7 @@ struct acpi_ioremap {
 
 static LIST_HEAD(acpi_ioremaps);
 static DEFINE_MUTEX(acpi_ioremap_lock);
+#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
 
 static void __init acpi_request_region (struct acpi_generic_address *gas,
 	unsigned int length, char *desc)
@@ -206,7 +208,7 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
 {
 	struct acpi_ioremap *map;
 
-	list_for_each_entry_rcu(map, &acpi_ioremaps, list)
+	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
 		if (map->phys <= phys &&
 		    phys + size <= map->phys + map->size)
 			return map;
@@ -249,7 +251,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
 {
 	struct acpi_ioremap *map;
 
-	list_for_each_entry_rcu(map, &acpi_ioremaps, list)
+	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
 		if (map->virt <= virt &&
 		    virt + size <= map->virt + map->size)
 			return map;
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 9/9] doc: Update documentation about list_for_each_entry_rcu (v1)
From: Joel Fernandes (Google) @ 2019-07-15 14:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

This patch updates the documentation with information about
usage of lockdep with list_for_each_entry_rcu().

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 Documentation/RCU/lockdep.txt   | 15 +++++++++++----
 Documentation/RCU/whatisRCU.txt |  9 ++++++++-
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt
index da51d3068850..3d967df3a801 100644
--- a/Documentation/RCU/lockdep.txt
+++ b/Documentation/RCU/lockdep.txt
@@ -96,7 +96,14 @@ other flavors of rcu_dereference().  On the other hand, it is illegal
 to use rcu_dereference_protected() if either the RCU-protected pointer
 or the RCU-protected data that it points to can change concurrently.
 
-There are currently only "universal" versions of the rcu_assign_pointer()
-and RCU list-/tree-traversal primitives, which do not (yet) check for
-being in an RCU read-side critical section.  In the future, separate
-versions of these primitives might be created.
+Similar to rcu_dereference_protected(), the RCU list and hlist traversal
+primitives also check whether they are called from within a reader
+section. However, an optional lockdep expression can be passed to them as
+the last argument in case they are called under other non-RCU protection.
+
+For example, the workqueue for_each_pwq() macro is implemented as follows.
+It is safe to call for_each_pwq() outside a reader section but under protection
+of wq->mutex:
+#define for_each_pwq(pwq, wq)
+	list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,
+				lock_is_held(&(wq->mutex).dep_map))
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 7e1a8721637a..00fe77ede1e2 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -290,7 +290,7 @@ rcu_dereference()
 	at any time, including immediately after the rcu_dereference().
 	And, again like rcu_assign_pointer(), rcu_dereference() is
 	typically used indirectly, via the _rcu list-manipulation
-	primitives, such as list_for_each_entry_rcu().
+	primitives, such as list_for_each_entry_rcu() [2].
 
 	[1] The variant rcu_dereference_protected() can be used outside
 	of an RCU read-side critical section as long as the usage is
@@ -305,6 +305,13 @@ rcu_dereference()
 	a lockdep splat is emitted.  See RCU/Design/Requirements/Requirements.html
 	and the API's code comments for more details and example usage.
 
+	[2] In case the list_for_each_entry_rcu() primitive is intended
+	to be used outside of an RCU reader section such as when
+	protected by a lock, then an additional lockdep expression can be
+	passed as the last argument to it so that RCU lockdep checking code
+	knows that the dereference of the list pointers is safe. If the
+	indicated protection is not provided, a lockdep splat is emitted.
+
 The following diagram shows how each API communicates among the
 reader, updater, and reclaimer.
 
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 7/9] x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
From: Joel Fernandes (Google) @ 2019-07-15 14:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

The pci_mmcfg_list is traversed with list_for_each_entry_rcu() without an
RCU reader lock held, because the pci_mmcfg_lock is already held. Make this
known to the list macro so that the lockdep checks newly added to
list_for_each_entry_rcu() do not trigger warnings here.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 arch/x86/pci/mmconfig-shared.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 7389db538c30..6fa42e9c4e6f 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -29,6 +29,7 @@
 static bool pci_mmcfg_running_state;
 static bool pci_mmcfg_arch_init_failed;
 static DEFINE_MUTEX(pci_mmcfg_lock);
+#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
 
 LIST_HEAD(pci_mmcfg_list);
 
@@ -54,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
 	struct pci_mmcfg_region *cfg;
 
 	/* keep list sorted by segment and starting bus number */
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
 		if (cfg->segment > new->segment ||
 		    (cfg->segment == new->segment &&
 		     cfg->start_bus >= new->start_bus)) {
@@ -118,7 +119,7 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
 {
 	struct pci_mmcfg_region *cfg;
 
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held())
 		if (cfg->segment == segment &&
 		    cfg->start_bus <= bus && bus <= cfg->end_bus)
 			return cfg;
-- 
2.22.0.510.g264f2c817a-goog



* [PATCH 5/9] driver/core: Convert to use built-in RCU list checking (v1)
From: Joel Fernandes (Google) @ 2019-07-15 14:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Greg Kroah-Hartman, Alexey Kuznetsov,
	Bjorn Helgaas, Borislav Petkov, c0d1n61at3, David S. Miller,
	edumazet, Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar,
	Jonathan Corbet, Josh Triplett, keescook, kernel-hardening,
	kernel-team, Lai Jiangshan, Len Brown, linux-acpi, linux-doc,
	linux-pci, linux-pm, Mathieu Desnoyers, neilb, netdev,
	Oleg Nesterov, Paul E. McKenney, Pavel Machek, peterz,
	Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-1-joel@joelfernandes.org>

list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
it in driver core.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 drivers/base/base.h          |  1 +
 drivers/base/core.c          | 10 ++++++++++
 drivers/base/power/runtime.c | 15 ++++++++++-----
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index b405436ee28e..0d32544b6f91 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -165,6 +165,7 @@ static inline int devtmpfs_init(void) { return 0; }
 /* Device links support */
 extern int device_links_read_lock(void);
 extern void device_links_read_unlock(int idx);
+extern int device_links_read_lock_held(void);
 extern int device_links_check_suppliers(struct device *dev);
 extern void device_links_driver_bound(struct device *dev);
 extern void device_links_driver_cleanup(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index da84a73f2ba6..85e82f38717f 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -68,6 +68,11 @@ void device_links_read_unlock(int idx)
 {
 	srcu_read_unlock(&device_links_srcu, idx);
 }
+
+int device_links_read_lock_held(void)
+{
+	return srcu_read_lock_held(&device_links_srcu);
+}
 #else /* !CONFIG_SRCU */
 static DECLARE_RWSEM(device_links_lock);
 
@@ -91,6 +96,11 @@ void device_links_read_unlock(int not_used)
 {
 	up_read(&device_links_lock);
 }
+
+int device_links_read_lock_held(void)
+{
+	return lockdep_is_held(&device_links_lock);
+}
 #endif /* !CONFIG_SRCU */
 
 /**
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 952a1e7057c7..7a10e8379a70 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -287,7 +287,8 @@ static int rpm_get_suppliers(struct device *dev)
 {
 	struct device_link *link;
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held()) {
 		int retval;
 
 		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
@@ -309,7 +310,8 @@ static void rpm_put_suppliers(struct device *dev)
 {
 	struct device_link *link;
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held()) {
 		if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
 			continue;
 
@@ -1640,7 +1642,8 @@ void pm_runtime_clean_up_links(struct device *dev)
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
+				device_links_read_lock_held()) {
 		if (link->flags & DL_FLAG_STATELESS)
 			continue;
 
@@ -1662,7 +1665,8 @@ void pm_runtime_get_suppliers(struct device *dev)
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held())
 		if (link->flags & DL_FLAG_PM_RUNTIME) {
 			link->supplier_preactivated = true;
 			refcount_inc(&link->rpm_active);
@@ -1683,7 +1687,8 @@ void pm_runtime_put_suppliers(struct device *dev)
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held())
 		if (link->supplier_preactivated) {
 			link->supplier_preactivated = false;
 			if (refcount_dec_not_one(&link->rpm_active))
-- 
2.22.0.510.g264f2c817a-goog

