Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] sit: use dst_cache in ipip6_tunnel_xmit
From: Haishuang Yan @ 2019-07-14 13:31 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan

Same as other ip tunnel, use dst_cache in xmit action to avoid
unnecessary fib lookups.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/sit.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 8061089..b2ccbc4 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -900,12 +900,17 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			   RT_TOS(tos), RT_SCOPE_UNIVERSE, IPPROTO_IPV6,
 			   0, dst, tiph->saddr, 0, 0,
 			   sock_net_uid(tunnel->net, NULL));
-	rt = ip_route_output_flow(tunnel->net, &fl4, NULL);
 
-	if (IS_ERR(rt)) {
-		dev->stats.tx_carrier_errors++;
-		goto tx_error_icmp;
+	rt = dst_cache_get_ip4(&tunnel->dst_cache, &fl4.saddr);
+	if (!rt) {
+		rt = ip_route_output_flow(tunnel->net, &fl4, NULL);
+		if (IS_ERR(rt)) {
+			dev->stats.tx_carrier_errors++;
+			goto tx_error_icmp;
+		}
+		dst_cache_set_ip4(&tunnel->dst_cache, &rt->dst, fl4.saddr);
 	}
+
 	if (rt->rt_type != RTN_UNICAST) {
 		ip_rt_put(rt);
 		dev->stats.tx_carrier_errors++;
-- 
1.8.3.1




^ permalink raw reply related

* Re: [PATCH net v2] net: neigh: fix multiple neigh timer scheduling
From: David Ahern @ 2019-07-14 13:58 UTC (permalink / raw)
  To: Lorenzo Bianconi, davem; +Cc: netdev, marek
In-Reply-To: <793a1166667e00a3553577e1505bebd435e22c88.1563041150.git.lorenzo.bianconi@redhat.com>

On 7/14/19 2:45 AM, Lorenzo Bianconi wrote:
> @@ -1124,7 +1125,9 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
>  
>  			atomic_set(&neigh->probes,
>  				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
> -			neigh->nud_state     = NUD_INCOMPLETE;
> +			if (check_timer)
> +				neigh_del_timer(neigh);

Why not just always call neigh_del_timer and avoid the check_timer flag?
Let the NUD_IN_TIMER flag handle whether anything needs to be done.

> +			neigh->nud_state = NUD_INCOMPLETE;
>  			neigh->updated = now;
>  			next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME),
>  					 HZ/2);
> @@ -1140,6 +1143,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
>  		}
>  	} else if (neigh->nud_state & NUD_STALE) {
>  		neigh_dbg(2, "neigh %p is delayed\n", neigh);
> +		if (check_timer)
> +			neigh_del_timer(neigh);
>  		neigh->nud_state = NUD_DELAY;
>  		neigh->updated = jiffies;
>  		neigh_add_timer(neigh, jiffies +


^ permalink raw reply

* Re: INFO: rcu detected stall in ext4_write_checks
From: Dmitry Vyukov @ 2019-07-14 14:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Theodore Ts'o, syzbot, Andreas Dilger, David Miller, eladr,
	Ido Schimmel, Jiri Pirko, John Stultz, linux-ext4, LKML, netdev,
	syzkaller-bugs, Thomas Gleixner, Peter Zijlstra, Ingo Molnar
In-Reply-To: <20190707011655.GA22081@linux.ibm.com>

On Sun, Jul 7, 2019 at 4:17 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> > > > I suppose RCU could take the dueling-banjos approach and use increasingly
> > > > aggressive scheduler policies itself, up to and including SCHED_DEADLINE,
> > > > until it started getting decent forward progress.  However, that
> > > > sounds like the something that just might have unintended consequences,
> > > > particularly if other kernel subsystems were to also play similar
> > > > games of dueling banjos.
> > >
> > > So long as the RCU threads are well-behaved, using SCHED_DEADLINE
> > > shouldn't have much of an impact on the system --- and the scheduling
> > > parameters that you can specify on SCHED_DEADLINE allows you to
> > > specify the worst-case impact on the system while also guaranteeing
> > > that the SCHED_DEADLINE tasks will urn in the first place.  After all,
> > > that's the whole point of SCHED_DEADLINE.
> > >
> > > So I wonder if the right approach is during the the first userspace
> > > system call to shced_setattr to enable a (any) real-time priority
> > > scheduler (SCHED_DEADLINE, SCHED_FIFO or SCHED_RR) on a userspace
> > > thread, before that's allowed to proceed, the RCU kernel threads are
> > > promoted to be SCHED_DEADLINE with appropriately set deadline
> > > parameters.  That way, a root user won't be able to shoot the system
> > > in the foot, and since the vast majority of the time, there shouldn't
> > > be any processes running with real-time priorities, we won't be
> > > changing the behavior of a normal server system.
> >
> > It might well be.  However, running the RCU kthreads at real-time
> > priority does not come for free.  For example, it tends to crank up the
> > context-switch rate.
> >
> > Plus I have taken several runs at computing SCHED_DEADLINE parameters,
> > but things like the rcuo callback-offload threads have computational
> > requirements that are controlled not by RCU, and not just by the rest of
> > the kernel, but also by userspace (keeping in mind the example of opening
> > and closing a file in a tight loop, each pass of which queues a callback).
> > I suspect that RCU is not the only kernel subsystem whose computational
> > requirements are set not by the subsystem, but rather by external code.
> >
> > OK, OK, I suppose I could just set insanely large SCHED_DEADLINE
> > parameters, following syzkaller's example, and then trust my ability to
> > keep the RCU code from abusing the resulting awesome power.  But wouldn't
> > a much nicer approach be to put SCHED_DEADLINE between SCHED_RR/SCHED_FIFO
> > priorities 98 and 99 or some such?  Then the same (admittedly somewhat
> > scary) result could be obtained much more simply via SCHED_FIFO or
> > SCHED_RR priority 99.
> >
> > Some might argue that this is one of those situations where simplicity
> > is not necessarily an advantage, but then again, you can find someone
> > who will complain about almost anything.  ;-)
> >
> > > (I suspect there might be some audio applications that might try to
> > > set real-time priorities, but for desktop systems, it's probably more
> > > important that the system not tie its self into knots since the
> > > average desktop user isn't going to be well equipped to debug the
> > > problem.)
> >
> > Not only that, but if core counts continue to increase, and if reliance
> > on cloud computing continues to grow, there are going to be an increasing
> > variety of mixed workloads in increasingly less-controlled environments.
> >
> > So, yes, it would be good to solve this problem in some reasonable way.
> >
> > I don't see this as urgent just yet, but I am sure you all will let
> > me know if I am mistaken on that point.
> >
> > > > Alternatively, is it possible to provide stricter admission control?
> > >
> > > I think that's an orthogonal issue; better admission control would be
> > > nice, but it looks to me that it's going to be fundamentally an issue
> > > of tweaking hueristics, and a fool-proof solution that will protect
> > > against all malicious userspace applications (including syzkaller) is
> > > going to require solving the halting problem.  So while it would be
> > > nice to improve the admission control, I don't think that's a going to
> > > be a general solution.
> >
> > Agreed, and my earlier point about the need to trust the coding abilities
> > of those writing ultimate-priority code is all too consistent with your
> > point about needing to solve the halting problem.  Nevertheless,  I believe
> > that we could make something that worked reasonably well in practice.
> >
> > Here are a few components of a possible solution, in practice, but
> > of course not in theory:
> >
> > 1.    We set limits to SCHED_DEADLINE parameters, perhaps novel ones.
> >       For one example, insist on (say) 10 milliseconds of idle time
> >       every second on each CPU.  Yes, you can configure beyond that
> >       given sufficient permissions, but if you do so, you just voided
> >       your warranty.
> >
> > 2.    Only allow SCHED_DEADLINE on nohz_full CPUs.  (Partial solution,
> >       given that such a CPU might be running in the kernel or have
> >       more than one runnable task.  Just for fun, I will suggest the
> >       option of disabling SCHED_DEADLINE during such times.)
> >
> > 3.    RCU detects slowdowns, and does something TBD to increase its
> >       priority, but only while the slowdown persists.  This likely
> >       relies on scheduling-clock interrupts to detect the slowdowns,
> >       so there might be additional challenges on a fully nohz_full
> >       system.
>
> 4.      SCHED_DEADLINE treats the other three scheduling classes as each
>         having a period, deadline, and a modest CPU consumption budget
>         for the members of the class in aggregate.  But this has to have
>         been discussed before.  How did that go?
>
> > 5.    Your idea here.

Trying to digest this thread.

Do I understand correctly that setting rcutree.kthread_prio=99 won't
help because the deadline priority is higher?
And there are no other existing mechanisms to either fix the stalls
nor make kernel reject the non well-behaving parameters? Kernel tries
to filter out non well-behaving parameters, but the check detects only
obvious misconfigurations, right?
This reminds of priority inversion/inheritance problem. I wonder if
there are other kernel subsystems that suffer from the same problem.
E.g. the background kernel thread that destroys net namespaces and any
other type of async work. A high prio user process can overload the
queue and make kernel eat all memory. May be relatively easy to do
even unintentionally. I suspect the problem is not specific to rcu and
plumbing just rcu may just make the next problem pop up.
Should user be able to starve basic kernel services? User should be
able to prioritize across user processes (potentially in radical
ways), but perhaps it should not be able to badly starve kernel
functions that just happened to be asynchronous? I guess it's not as
simple as setting the highest prio for all kernel threads because in
normal case we want to reduce latency of user work by making the work
async. But user must not be able to starve kernel threads
infinitely... sounds like something similar to the deadline scheduling
-- kernel threads need to get at least some time slice per unit of
time.

But short term I don't see any other solution than stop testing
sched_setattr because it does not check arguments enough to prevent
system misbehavior. Which is a pity because syzkaller has found some
bad misconfigurations that were oversight on checking side.
Any other suggestions?

^ permalink raw reply

* Re: [PATCH iproute2] utils: don't match empty strings as prefixes
From: Matteo Croce @ 2019-07-14 14:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David Ahern
In-Reply-To: <CAGnkfhz+p1o_yHxk2jkY9ggNwLSO-Jk4BcxPuWhSHw1YXoJsSw@mail.gmail.com>

On Wed, Jul 10, 2019 at 1:18 AM Matteo Croce <mcroce@redhat.com> wrote:
>
> On Tue, Jul 9, 2019 at 11:38 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Tue,  9 Jul 2019 22:40:40 +0200
> > Matteo Croce <mcroce@redhat.com> wrote:
> >
> > > iproute has an utility function which checks if a string is a prefix for
> > > another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
> > > instead of 'address'.
> > >
> > > This routine unfortunately considers an empty string as prefix
> > > of any pattern, leading to undefined behaviour when an empty
> > > argument is passed to ip:
> > >
> > >     # ip ''
> > >     1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> > >         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > >         inet 127.0.0.1/8 scope host lo
> > >            valid_lft forever preferred_lft forever
> > >         inet6 ::1/128 scope host
> > >            valid_lft forever preferred_lft forever
> > >
> > >     # tc ''
> > >     qdisc noqueue 0: dev lo root refcnt 2
> > >
> > >     # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
> > >     # ip addr show dev dummy0
> > >     6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > >         link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
> > >         inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
> > >            valid_lft forever preferred_lft forever
> > >
> > > Rewrite matches() so it takes care of an empty input, and doesn't
> > > scan the input strings three times: the actual implementation
> > > does 2 strlen and a memcpy to accomplish the same task.
> > >
> > > Signed-off-by: Matteo Croce <mcroce@redhat.com>
> > > ---
> > >  include/utils.h |  2 +-
> > >  lib/utils.c     | 14 +++++++++-----
> > >  2 files changed, 10 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/utils.h b/include/utils.h
> > > index 927fdc17..f4d12abb 100644
> > > --- a/include/utils.h
> > > +++ b/include/utils.h
> > > @@ -198,7 +198,7 @@ int nodev(const char *dev);
> > >  int check_ifname(const char *);
> > >  int get_ifname(char *, const char *);
> > >  const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
> > > -int matches(const char *arg, const char *pattern);
> > > +int matches(const char *prefix, const char *string);
> > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
> > >  int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
> > >
> > > diff --git a/lib/utils.c b/lib/utils.c
> > > index be0f11b0..73ce19bb 100644
> > > --- a/lib/utils.c
> > > +++ b/lib/utils.c
> > > @@ -887,13 +887,17 @@ const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
> > >       return name;
> > >  }
> > >
> > > -int matches(const char *cmd, const char *pattern)
> > > +/* Check if 'prefix' is a non empty prefix of 'string' */
> > > +int matches(const char *prefix, const char *string)
> > >  {
> > > -     int len = strlen(cmd);
> > > +     if (!*prefix)
> > > +             return 1;
> > > +     while(*string && *prefix == *string) {
> > > +             prefix++;
> > > +             string++;
> > > +     }
> > >
> > > -     if (len > strlen(pattern))
> > > -             return -1;
> > > -     return memcmp(pattern, cmd, len);
> > > +     return *prefix;
> > >  }
> > >
> > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits)
> >
> > ERROR: space required before the open parenthesis '('
> > #134: FILE: lib/utils.c:895:
> > +       while(*string && *prefix == *string) {
> >
> > total: 1 errors, 1 warnings, 30 lines checked
> >
> > The empty prefix string is a bug and should not be allowed.
> > Also return value should be same as old code (yours isn't).
> >
> >
> >
>
> The old return value was the difference between the first pair of
> bytes, according to the memcmp manpage.
> All calls only checks if the matches() return value is 0 or not 0:
>
> iproute2$ git grep 'matches(' |grep -v -e '== 0' -e '= 0' -e '!matches('
> include/utils.h:int matches(const char *prefix, const char *string);
> include/xtables.h:extern void xtables_register_matches(struct
> xtables_match *, unsigned int);
> lib/color.c:    if (matches(dup, "-color"))
> lib/utils.c:int matches(const char *prefix, const char *string)
> tc/tc.c:                if (matches(argv[0], iter->c))
>
> Is it a problem if it returns a non negative value for non matching strings?
>
> Regards,
>
>
> --
> Matteo Croce
> per aspera ad upstream

Hi Stephen,

should I send a v2 which keeps the old behaviour, even if noone checks
for all the values?
Just to clarify, the old behaviour of matches(cmd, pattern) was:

-1 if len(cmd) > len(pattern)
0 if pattern is equal to cmd
0 if pattern starts with cmd
< 0 if pattern is alphabetically lower than cmd
> 0 if pattern is alphabetically higher than cmd

Regards,
-- 
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* [PATCH 8/8] docs: remove extra conf.py files
From: Mauro Carvalho Chehab @ 2019-07-14 15:10 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet, Herbert Xu, David S. Miller, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Sean Paul,
	Dmitry Torokhov, Yoshinori Sato, Rich Felker, Jaroslav Kysela,
	Takashi Iwai, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-crypto, dri-devel, linux-input, netdev,
	linux-sh, alsa-devel
In-Reply-To: <cover.1563115732.git.mchehab+samsung@kernel.org>

Now that the latex_documents are handled automatically, we can
remove those extra conf.py files.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/conf.py      | 10 ----------
 Documentation/core-api/conf.py         | 10 ----------
 Documentation/crypto/conf.py           | 10 ----------
 Documentation/dev-tools/conf.py        | 10 ----------
 Documentation/doc-guide/conf.py        | 10 ----------
 Documentation/driver-api/80211/conf.py | 10 ----------
 Documentation/driver-api/conf.py       | 10 ----------
 Documentation/driver-api/pm/conf.py    | 10 ----------
 Documentation/filesystems/conf.py      | 10 ----------
 Documentation/gpu/conf.py              | 10 ----------
 Documentation/input/conf.py            | 10 ----------
 Documentation/kernel-hacking/conf.py   | 10 ----------
 Documentation/maintainer/conf.py       | 10 ----------
 Documentation/media/conf.py            | 12 ------------
 Documentation/networking/conf.py       | 10 ----------
 Documentation/process/conf.py          | 10 ----------
 Documentation/sh/conf.py               | 10 ----------
 Documentation/sound/conf.py            | 10 ----------
 Documentation/userspace-api/conf.py    | 10 ----------
 Documentation/vm/conf.py               | 10 ----------
 Documentation/x86/conf.py              | 10 ----------
 21 files changed, 212 deletions(-)
 delete mode 100644 Documentation/admin-guide/conf.py
 delete mode 100644 Documentation/core-api/conf.py
 delete mode 100644 Documentation/crypto/conf.py
 delete mode 100644 Documentation/dev-tools/conf.py
 delete mode 100644 Documentation/doc-guide/conf.py
 delete mode 100644 Documentation/driver-api/80211/conf.py
 delete mode 100644 Documentation/driver-api/conf.py
 delete mode 100644 Documentation/driver-api/pm/conf.py
 delete mode 100644 Documentation/filesystems/conf.py
 delete mode 100644 Documentation/gpu/conf.py
 delete mode 100644 Documentation/input/conf.py
 delete mode 100644 Documentation/kernel-hacking/conf.py
 delete mode 100644 Documentation/maintainer/conf.py
 delete mode 100644 Documentation/media/conf.py
 delete mode 100644 Documentation/networking/conf.py
 delete mode 100644 Documentation/process/conf.py
 delete mode 100644 Documentation/sh/conf.py
 delete mode 100644 Documentation/sound/conf.py
 delete mode 100644 Documentation/userspace-api/conf.py
 delete mode 100644 Documentation/vm/conf.py
 delete mode 100644 Documentation/x86/conf.py

diff --git a/Documentation/admin-guide/conf.py b/Documentation/admin-guide/conf.py
deleted file mode 100644
index 86f738953799..000000000000
--- a/Documentation/admin-guide/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel User Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'linux-user.tex', 'Linux Kernel User Documentation',
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/core-api/conf.py b/Documentation/core-api/conf.py
deleted file mode 100644
index db1f7659f3da..000000000000
--- a/Documentation/core-api/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Core-API Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'core-api.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/crypto/conf.py b/Documentation/crypto/conf.py
deleted file mode 100644
index 4335d251ddf3..000000000000
--- a/Documentation/crypto/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Crypto API'
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/dev-tools/conf.py b/Documentation/dev-tools/conf.py
deleted file mode 100644
index 7faafa3f7888..000000000000
--- a/Documentation/dev-tools/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Development tools for the kernel"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'dev-tools.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/doc-guide/conf.py b/Documentation/doc-guide/conf.py
deleted file mode 100644
index fd3731182d5a..000000000000
--- a/Documentation/doc-guide/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Documentation Guide'
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/driver-api/80211/conf.py b/Documentation/driver-api/80211/conf.py
deleted file mode 100644
index 4424b4b0b9c3..000000000000
--- a/Documentation/driver-api/80211/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux 802.11 Driver Developer's Guide"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', '80211.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/driver-api/conf.py b/Documentation/driver-api/conf.py
deleted file mode 100644
index 202726d20088..000000000000
--- a/Documentation/driver-api/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "The Linux driver implementer's API guide"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'driver-api.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/driver-api/pm/conf.py b/Documentation/driver-api/pm/conf.py
deleted file mode 100644
index a89fac11272f..000000000000
--- a/Documentation/driver-api/pm/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Device Power Management"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'pm.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/filesystems/conf.py b/Documentation/filesystems/conf.py
deleted file mode 100644
index ea44172af5c4..000000000000
--- a/Documentation/filesystems/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Filesystems API"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'filesystems.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/gpu/conf.py b/Documentation/gpu/conf.py
deleted file mode 100644
index 1757b040fb32..000000000000
--- a/Documentation/gpu/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux GPU Driver Developer's Guide"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'gpu.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/input/conf.py b/Documentation/input/conf.py
deleted file mode 100644
index d2352fdc92ed..000000000000
--- a/Documentation/input/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "The Linux input driver subsystem"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'linux-input.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/kernel-hacking/conf.py b/Documentation/kernel-hacking/conf.py
deleted file mode 100644
index 3d8acf0f33ad..000000000000
--- a/Documentation/kernel-hacking/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Kernel Hacking Guides"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'kernel-hacking.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/maintainer/conf.py b/Documentation/maintainer/conf.py
deleted file mode 100644
index 81e9eb7a7884..000000000000
--- a/Documentation/maintainer/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Development Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'maintainer.tex', 'Linux Kernel Development Documentation',
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/media/conf.py b/Documentation/media/conf.py
deleted file mode 100644
index 1f194fcd2cae..000000000000
--- a/Documentation/media/conf.py
+++ /dev/null
@@ -1,12 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-# SPDX-License-Identifier: GPL-2.0
-
-project = 'Linux Media Subsystem Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'media.tex', 'Linux Media Subsystem Documentation',
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/networking/conf.py b/Documentation/networking/conf.py
deleted file mode 100644
index 40f69e67a883..000000000000
--- a/Documentation/networking/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Networking Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'networking.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/process/conf.py b/Documentation/process/conf.py
deleted file mode 100644
index 1b01a80ad9ce..000000000000
--- a/Documentation/process/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Development Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'process.tex', 'Linux Kernel Development Documentation',
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/sh/conf.py b/Documentation/sh/conf.py
deleted file mode 100644
index 1eb684a13ac8..000000000000
--- a/Documentation/sh/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "SuperH architecture implementation manual"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'sh.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/sound/conf.py b/Documentation/sound/conf.py
deleted file mode 100644
index 3f1fc5e74e7b..000000000000
--- a/Documentation/sound/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Sound Subsystem Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'sound.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/userspace-api/conf.py b/Documentation/userspace-api/conf.py
deleted file mode 100644
index 2eaf59f844e5..000000000000
--- a/Documentation/userspace-api/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "The Linux kernel user-space API guide"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'userspace-api.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/vm/conf.py b/Documentation/vm/conf.py
deleted file mode 100644
index 3b0b601af558..000000000000
--- a/Documentation/vm/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Memory Management Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'memory-management.tex', project,
-     'The kernel development community', 'manual'),
-]
diff --git a/Documentation/x86/conf.py b/Documentation/x86/conf.py
deleted file mode 100644
index 33c5c3142e20..000000000000
--- a/Documentation/x86/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "X86 architecture specific documentation"
-
-tags.add("subproject")
-
-latex_documents = [
-    ('index', 'x86.tex', project,
-     'The kernel development community', 'manual'),
-]
-- 
2.21.0


^ permalink raw reply related

* [PATCH 0/8] docs: some improvements when producing PDF files
From: Mauro Carvalho Chehab @ 2019-07-14 15:10 UTC (permalink / raw)
  To: Linux Doc Mailing List, Jonathan Corbet
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel,
	Herbert Xu, netdev, Sean Paul, Maarten Lankhorst, Rich Felker,
	Takashi Iwai, Daniel Vetter, x86, alsa-devel, Maxime Ripard,
	linux-sh, David Airlie, linux-crypto, Jaroslav Kysela, dri-devel,
	Yoshinori Sato, Ingo Molnar, Borislav Petkov, David S. Miller,
	linux-input, H. Peter Anvin, Dmitry Torokhov, Thomas Gleixner

Hi Jon,

This series addresses your concerns related to CJK fonts that are
needed for translations.pdf.

It touches only the documentation build system, not the docs
themselves.

It ended to be bigger than I originally foreseen, as I found several issues
when running "make pdfdocs" for the distros that are recognized by
the  scripts/sphinx-pre-install script.

It also took a lot of time, as I tested it with several VMs (each
one updated to latest packages):

- Fedora 30, CentOS 7, Mageia 7, ArchLinux, Ubuntu 18.04, Gentoo, 
  OpenSuse Tumbleweed.

Patch 1 addresses an issue that could be related to the fact that I
don't use openSUSE. Basically, I was unable to find the right package
for texlive to use CJK fonts on openSUSE. [1]. So, the first patch on this
series adds a workaround: if the needed CJK font is not found on a
system, conf.py won't use xeCjk extension. That sounds a good
thing to have, as other distros may not package it, or maybe the
one building the doc is not that interested on translations.pdf file;

[1] I actually found some, but they are not recognized with the
    font name conf.py is expecting ("Noto Sans CJK SC"). Perhaps
    SUSE uses a different name for those fonts?

Patch 2 fixes the logic with recognizes CentOS/RHEL;

Patch 3 is another workaround: CentOS 7 (and similar distros) don't
package all texlive packages we need. So, it just ignores PDF when
recommending packages on such distros, and point to a URL with
explains how to install TexLive outside distro-specific package
management (for the brave enough people);

Patch 4 fixes latexmk dependency on a few distros;

Patch 5 suppreses a Gentoo specific instruction if the user already
followed in the past;

Patch 6 is the one that actually does what you requested.

Patch 7 solves an issue when SPHINXDIRS is used with make pdfdocs:
right now, using it will produce a lot of warnings and won't do anything,
if a dir-specific conf.py file is not found. With the patch, latex_documents
are now properly updated when SPHINXDIRS is used.

Patch 8 is a cleanup: with patch 7 applied, we don't need to have anymore
any conf.py file due to pdfdocs. 

With regard to the load_config.py extension, It keeps accepting custom
configuration. That's helpful if someone wants, for example, to have
something like:

	Documentation/media/conf_nitpick.py

with would enable extra nitpick options if one wants that.

-

Jon,

Please let me know if you prefer if I submit those together with the big
pile of doc files I have, or if you prefer adding (some of?) them on your
tree after the merge window.

Regards,
Mauro

Mauro Carvalho Chehab (8):
  docs: conf.py: only use CJK if the font is available
  scripts/sphinx-pre-install: fix script for RHEL/CentOS
  scripts/sphinx-pre-install: don't use LaTeX with CentOS 7
  scripts/sphinx-pre-install: fix latexmk dependencies
  scripts/sphinx-pre-install: cleanup Gentoo checks
  scripts/sphinx-pre-install: seek for Noto CJK fonts for pdf output
  docs: load_config.py: avoid needing a conf.py just due to LaTeX docs
  docs: remove extra conf.py files

 Documentation/admin-guide/conf.py      |  10 ---
 Documentation/conf.py                  |  13 ++-
 Documentation/core-api/conf.py         |  10 ---
 Documentation/crypto/conf.py           |  10 ---
 Documentation/dev-tools/conf.py        |  10 ---
 Documentation/doc-guide/conf.py        |  10 ---
 Documentation/driver-api/80211/conf.py |  10 ---
 Documentation/driver-api/conf.py       |  10 ---
 Documentation/driver-api/pm/conf.py    |  10 ---
 Documentation/filesystems/conf.py      |  10 ---
 Documentation/gpu/conf.py              |  10 ---
 Documentation/input/conf.py            |  10 ---
 Documentation/kernel-hacking/conf.py   |  10 ---
 Documentation/maintainer/conf.py       |  10 ---
 Documentation/media/conf.py            |  12 ---
 Documentation/networking/conf.py       |  10 ---
 Documentation/process/conf.py          |  10 ---
 Documentation/sh/conf.py               |  10 ---
 Documentation/sound/conf.py            |  10 ---
 Documentation/sphinx/load_config.py    |  25 +++++-
 Documentation/userspace-api/conf.py    |  10 ---
 Documentation/vm/conf.py               |  10 ---
 Documentation/x86/conf.py              |  10 ---
 scripts/sphinx-pre-install             | 118 ++++++++++++++++++++-----
 24 files changed, 131 insertions(+), 237 deletions(-)
 delete mode 100644 Documentation/admin-guide/conf.py
 delete mode 100644 Documentation/core-api/conf.py
 delete mode 100644 Documentation/crypto/conf.py
 delete mode 100644 Documentation/dev-tools/conf.py
 delete mode 100644 Documentation/doc-guide/conf.py
 delete mode 100644 Documentation/driver-api/80211/conf.py
 delete mode 100644 Documentation/driver-api/conf.py
 delete mode 100644 Documentation/driver-api/pm/conf.py
 delete mode 100644 Documentation/filesystems/conf.py
 delete mode 100644 Documentation/gpu/conf.py
 delete mode 100644 Documentation/input/conf.py
 delete mode 100644 Documentation/kernel-hacking/conf.py
 delete mode 100644 Documentation/maintainer/conf.py
 delete mode 100644 Documentation/media/conf.py
 delete mode 100644 Documentation/networking/conf.py
 delete mode 100644 Documentation/process/conf.py
 delete mode 100644 Documentation/sh/conf.py
 delete mode 100644 Documentation/sound/conf.py
 delete mode 100644 Documentation/userspace-api/conf.py
 delete mode 100644 Documentation/vm/conf.py
 delete mode 100644 Documentation/x86/conf.py

-- 
2.21.0

^ permalink raw reply

* Re: [PATCH iproute2-next] Makefile: pass -pipe to the compiler
From: Matteo Croce @ 2019-07-14 16:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David Ahern
In-Reply-To: <20190611122800.56d72eba@hermes.lan>

On Tue, Jun 11, 2019 at 9:28 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 11 Jun 2019 20:05:13 +0200
> Matteo Croce <mcroce@redhat.com> wrote:
>
> > Pass the -pipe option to GCC, to use pipes instead of temp files.
> > On a slow AMD G-T40E CPU we get a non negligible 6% improvement
> > in build time.
> >
> > real    1m15,111s
> > user    1m2,521s
> > sys     0m12,465s
> >
> > real    1m10,861s
> > user    1m2,520s
> > sys     0m12,901s
> >
> > Signed-off-by: Matteo Croce <mcroce@redhat.com>
>
> Why bother, on my machine (make -j12).
>
> Before
> real    0m6.320s
> user    0m30.674s
> sys     0m3.649s
>
>
> After (with -pipe)
> real    0m6.158s
> user    0m31.197s
> sys     0m3.532s
>
>
> So it is slower. Get a faster disk :-)
>

I did it :)

root@apu:~# hdparm -t /dev/sda

/dev/sda:
 Timing buffered disk reads: 1086 MB in  3.00 seconds = 361.58 MB/sec

No change at all thought. It's really a CPU bound job, and this
machine is not a number cruncher:

root@apu:~# dd if=/dev/zero bs=1G count=1 status=none |time -p sha1sum
2a492f15396a6768bcbca016993f4b4c8b0b5307  -
real 16.00
user 12.38
sys 2.05

I think that the slight increase is due to the fact that the processes
starts in parallel, instead of being serialized.

> Maybe allow "EXTRA_CFLAGS" to be passed to Makefile for those that
> have a burning need for this.

Just out of curiosity, I checked the linux history repo to discover
when it was introduced in the kernel, it tooks me a while to really
find it:

Linux-0.99.13k (September 19, 1993)

-- 
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* [PATCH] sis900: correct a few typos
From: Sergej Benilov @ 2019-07-14 16:56 UTC (permalink / raw)
  To: venza, netdev, andrew; +Cc: Sergej Benilov

Correct a few typos in comments and debug text.

Signed-off-by: Sergej Benilov <sergej.benilov@googlemail.com>
---
 drivers/net/ethernet/sis/sis900.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sis/sis900.c b/drivers/net/ethernet/sis/sis900.c
index aba6eea72..6e07f5eba 100644
--- a/drivers/net/ethernet/sis/sis900.c
+++ b/drivers/net/ethernet/sis/sis900.c
@@ -262,7 +262,7 @@ static int sis900_get_mac_addr(struct pci_dev *pci_dev,
 	/* check to see if we have sane EEPROM */
 	signature = (u16) read_eeprom(ioaddr, EEPROMSignature);
 	if (signature == 0xffff || signature == 0x0000) {
-		printk (KERN_WARNING "%s: Error EERPOM read %x\n",
+		printk (KERN_WARNING "%s: Error EEPROM read %x\n",
 			pci_name(pci_dev), signature);
 		return 0;
 	}
@@ -359,9 +359,9 @@ static int sis635_get_mac_addr(struct pci_dev *pci_dev,
  *
  *	SiS962 or SiS963 model, use EEPROM to store MAC address. And EEPROM
  *	is shared by
- *	LAN and 1394. When access EEPROM, send EEREQ signal to hardware first
+ *	LAN and 1394. When accessing EEPROM, send EEREQ signal to hardware first
  *	and wait for EEGNT. If EEGNT is ON, EEPROM is permitted to be accessed
- *	by LAN, otherwise is not. After MAC address is read from EEPROM, send
+ *	by LAN, otherwise it is not. After MAC address is read from EEPROM, send
  *	EEDONE signal to refuse EEPROM access by LAN.
  *	The EEPROM map of SiS962 or SiS963 is different to SiS900.
  *	The signature field in SiS962 or SiS963 spec is meaningless.
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-14 18:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713212812.GH26519@linux.ibm.com>

On Sat, Jul 13, 2019 at 02:28:12PM -0700, Paul E. McKenney wrote:
> On Sat, Jul 13, 2019 at 12:13:16PM -0400, Joel Fernandes wrote:
> > On Sat, Jul 13, 2019 at 08:50:10AM -0700, Paul E. McKenney wrote:
> > > On Sat, Jul 13, 2019 at 11:36:06AM -0400, Joel Fernandes wrote:
> > > > On Sat, Jul 13, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> > > > > On Sat, Jul 13, 2019 at 09:30:49AM -0400, Joel Fernandes wrote:
> > > > > > On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> > > > > > > On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > > > > > > > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > > > > > > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > > > > > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > > > > > > and it results in a nice cleanup.
> > > > > > > > > > > > 
> > > > > > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > > > 
> > > > > > > > > > > Hi Oleg,
> > > > > > > > > > > Slightly unrelated to the patch,
> > > > > > > > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > > > > > > > 
> > > > > > > > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > > > > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > > > > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > > > > > > > reflect latest changes?
> > > > > > > > > > > Also could you help understand how is a writer not able to change
> > > > > > > > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > > > > > > > comment tries to say?
> > > > > > > > > > > 
> > > > > > > > > > > 	/*
> > > > > > > > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > > > > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > > > > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > > > > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > > > > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > > > > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > > > > > > > 	 */
> > > > > > > > > > > 
> > > > > > > > > > > Also,
> > > > > > > > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > > > > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > > > > > > > series.
> > > > > > > > > > 
> > > > > > > > > > Hello, Joel,
> > > > > > > > > > 
> > > > > > > > > > Oleg has a set of patches updating this code that just hit mainline
> > > > > > > > > > this week.  These patches get rid of the code that previously handled
> > > > > > > > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > > > > > > > me just missing your point?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Hi Paul,
> > > > > > > > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > > > > > > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > > > > > > > say, I need to rebase my tree.
> > > > > > > > > 
> > > > > > > > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > > > > > > > rebase just fine.
> > > > > > > > > 
> > > > > > > > > The 2 options I see are:
> > > > > > > > > 1. Let us drop this patch for now and I resend it later.
> > > > > > > > > 2. I resend all patches based on Linus's master branch.
> > > > > > > > 
> > > > > > > > Below is the updated patch based on Linus master branch:
> > > > > > > > 
> > > > > > > > ---8<-----------------------
> > > > > > > > 
> > > > > > > > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > > > > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > > > > > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > > > > > > > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > > > > > > > 
> > > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > > and it results in a nice cleanup.
> > > > > > > > 
> > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > ---
> > > > > > > >  include/linux/rcu_sync.h | 4 +---
> > > > > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > > > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > > > > > --- a/include/linux/rcu_sync.h
> > > > > > > > +++ b/include/linux/rcu_sync.h
> > > > > > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > > > > > >   */
> > > > > > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > > > > > >  {
> > > > > > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > > > > > -			 !rcu_read_lock_bh_held() &&
> > > > > > > > -			 !rcu_read_lock_sched_held(),
> > > > > > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > > > > > 
> > > > > > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > > > > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > > > > > If you have not already done so, could you please give it a try?
> > > > > > 
> > > > > > Hi Paul,
> > > > > > I don't think it will cause splats for !CONFIG_PREEMPT.
> > > > > > 
> > > > > > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > > > > > !preemptible(). This means that:
> > > > > > 
> > > > > > The following expression above:
> > > > > > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > > > > > 
> > > > > > Becomes:
> > > > > > RCU_LOCKDEP_WARN(preemptible(), ...)
> > > > > > 
> > > > > > For, CONFIG_PREEMPT=n kernels, this means:
> > > > > > RCU_LOCKDEP_WARN(0, ...)
> > > > > > 
> > > > > > Which would mean no splats. Or, did I miss the point?
> > > > > 
> > > > > I suggest trying it out on a CONFIG_PREEMPT=n kernel.
> > > > 
> > > > Sure, will do, sorry did not try it out yet because was busy with weekend
> > > > chores but will do soon, thanks!
> > > 
> > > I am not faulting you for taking the weekend off, actually.  ;-)
> > 
> > ;-) 
> > 
> > I tried doing RCU_LOCKDEP_WARN(preemptible(), ...) in this code path and I
> > don't get any splats. I also disassembled the code and it seems to me
> > RCU_LOCKDEP_WARN() becomes a NOOP which also the above reasoning confirms.
> 
> OK, very good.  Could you do the same thing for the RCU_LOCKDEP_WARN()
> in synchronize_rcu()?  Why or why not?
> 

Hi Paul,

Yes synchronize_rcu() can also make use of this technique since it is
strictly illegal to call synchronize_rcu() within a reader section.

I will add this to the set of my patches as well and send them all out next
week, along with the rcu-sync and bh clean ups we discussed.

thanks,

- Joel


^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-14 18:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190714181053.GB34501@google.com>

On Sun, Jul 14, 2019 at 02:10:53PM -0400, Joel Fernandes wrote:
> On Sat, Jul 13, 2019 at 02:28:12PM -0700, Paul E. McKenney wrote:
[snip]
> > > > > > > > > 
> > > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > ---
> > > > > > > > >  include/linux/rcu_sync.h | 4 +---
> > > > > > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > > > > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > > > > > > --- a/include/linux/rcu_sync.h
> > > > > > > > > +++ b/include/linux/rcu_sync.h
> > > > > > > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > > > > > > >   */
> > > > > > > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > > > > > > >  {
> > > > > > > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > > > > > > -			 !rcu_read_lock_bh_held() &&
> > > > > > > > > -			 !rcu_read_lock_sched_held(),
> > > > > > > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > > > > > > 
> > > > > > > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > > > > > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > > > > > > If you have not already done so, could you please give it a try?
> > > > > > > 
> > > > > > > Hi Paul,
> > > > > > > I don't think it will cause splats for !CONFIG_PREEMPT.
> > > > > > > 
> > > > > > > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > > > > > > !preemptible(). This means that:
> > > > > > > 
> > > > > > > The following expression above:
> > > > > > > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > > > > > > 
> > > > > > > Becomes:
> > > > > > > RCU_LOCKDEP_WARN(preemptible(), ...)
> > > > > > > 
> > > > > > > For, CONFIG_PREEMPT=n kernels, this means:
> > > > > > > RCU_LOCKDEP_WARN(0, ...)
> > > > > > > 
> > > > > > > Which would mean no splats. Or, did I miss the point?
> > > > > > 
> > > > > > I suggest trying it out on a CONFIG_PREEMPT=n kernel.
> > > > > 
> > > > > Sure, will do, sorry did not try it out yet because was busy with weekend
> > > > > chores but will do soon, thanks!
> > > > 
> > > > I am not faulting you for taking the weekend off, actually.  ;-)
> > > 
> > > ;-) 
> > > 
> > > I tried doing RCU_LOCKDEP_WARN(preemptible(), ...) in this code path and I
> > > don't get any splats. I also disassembled the code and it seems to me
> > > RCU_LOCKDEP_WARN() becomes a NOOP which also the above reasoning confirms.
> > 
> > OK, very good.  Could you do the same thing for the RCU_LOCKDEP_WARN()
> > in synchronize_rcu()?  Why or why not?
> > 
> 
> Hi Paul,
> 
> Yes synchronize_rcu() can also make use of this technique since it is
> strictly illegal to call synchronize_rcu() within a reader section.
> 
> I will add this to the set of my patches as well and send them all out next
> week, along with the rcu-sync and bh clean ups we discussed.

After sending this email, it occurs to me it wont work in synchronize_rcu()
for !CONFIG_PREEMPT kernels. This is because in a !CONFIG_PREEMPT kernel,
executing in kernel mode itself looks like being in an RCU reader. So we
should leave that as is. However it will work fine for rcu_sync_is_idle (for
CONFIG_PREEMPT=n kernels) as I mentioned earlier.

Were trying to throw me a Quick-Quiz ? ;-) In that case, hope I passed!

thanks,

 - Joel


^ permalink raw reply

* Re: INFO: rcu detected stall in ext4_write_checks
From: Paul E. McKenney @ 2019-07-14 18:49 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Theodore Ts'o, syzbot, Andreas Dilger, David Miller, eladr,
	Ido Schimmel, Jiri Pirko, John Stultz, linux-ext4, LKML, netdev,
	syzkaller-bugs, Thomas Gleixner, Peter Zijlstra, Ingo Molnar
In-Reply-To: <CACT4Y+asYe-uH9OV5R0Nkb-JKP4erYUZ68S9gYNnGg6v+fD20w@mail.gmail.com>

On Sun, Jul 14, 2019 at 05:48:00PM +0300, Dmitry Vyukov wrote:
> On Sun, Jul 7, 2019 at 4:17 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> > > > > I suppose RCU could take the dueling-banjos approach and use increasingly
> > > > > aggressive scheduler policies itself, up to and including SCHED_DEADLINE,
> > > > > until it started getting decent forward progress.  However, that
> > > > > sounds like the something that just might have unintended consequences,
> > > > > particularly if other kernel subsystems were to also play similar
> > > > > games of dueling banjos.
> > > >
> > > > So long as the RCU threads are well-behaved, using SCHED_DEADLINE
> > > > shouldn't have much of an impact on the system --- and the scheduling
> > > > parameters that you can specify on SCHED_DEADLINE allows you to
> > > > specify the worst-case impact on the system while also guaranteeing
> > > > that the SCHED_DEADLINE tasks will urn in the first place.  After all,
> > > > that's the whole point of SCHED_DEADLINE.
> > > >
> > > > So I wonder if the right approach is during the the first userspace
> > > > system call to shced_setattr to enable a (any) real-time priority
> > > > scheduler (SCHED_DEADLINE, SCHED_FIFO or SCHED_RR) on a userspace
> > > > thread, before that's allowed to proceed, the RCU kernel threads are
> > > > promoted to be SCHED_DEADLINE with appropriately set deadline
> > > > parameters.  That way, a root user won't be able to shoot the system
> > > > in the foot, and since the vast majority of the time, there shouldn't
> > > > be any processes running with real-time priorities, we won't be
> > > > changing the behavior of a normal server system.
> > >
> > > It might well be.  However, running the RCU kthreads at real-time
> > > priority does not come for free.  For example, it tends to crank up the
> > > context-switch rate.
> > >
> > > Plus I have taken several runs at computing SCHED_DEADLINE parameters,
> > > but things like the rcuo callback-offload threads have computational
> > > requirements that are controlled not by RCU, and not just by the rest of
> > > the kernel, but also by userspace (keeping in mind the example of opening
> > > and closing a file in a tight loop, each pass of which queues a callback).
> > > I suspect that RCU is not the only kernel subsystem whose computational
> > > requirements are set not by the subsystem, but rather by external code.
> > >
> > > OK, OK, I suppose I could just set insanely large SCHED_DEADLINE
> > > parameters, following syzkaller's example, and then trust my ability to
> > > keep the RCU code from abusing the resulting awesome power.  But wouldn't
> > > a much nicer approach be to put SCHED_DEADLINE between SCHED_RR/SCHED_FIFO
> > > priorities 98 and 99 or some such?  Then the same (admittedly somewhat
> > > scary) result could be obtained much more simply via SCHED_FIFO or
> > > SCHED_RR priority 99.
> > >
> > > Some might argue that this is one of those situations where simplicity
> > > is not necessarily an advantage, but then again, you can find someone
> > > who will complain about almost anything.  ;-)
> > >
> > > > (I suspect there might be some audio applications that might try to
> > > > set real-time priorities, but for desktop systems, it's probably more
> > > > important that the system not tie its self into knots since the
> > > > average desktop user isn't going to be well equipped to debug the
> > > > problem.)
> > >
> > > Not only that, but if core counts continue to increase, and if reliance
> > > on cloud computing continues to grow, there are going to be an increasing
> > > variety of mixed workloads in increasingly less-controlled environments.
> > >
> > > So, yes, it would be good to solve this problem in some reasonable way.
> > >
> > > I don't see this as urgent just yet, but I am sure you all will let
> > > me know if I am mistaken on that point.
> > >
> > > > > Alternatively, is it possible to provide stricter admission control?
> > > >
> > > > I think that's an orthogonal issue; better admission control would be
> > > > nice, but it looks to me that it's going to be fundamentally an issue
> > > > of tweaking hueristics, and a fool-proof solution that will protect
> > > > against all malicious userspace applications (including syzkaller) is
> > > > going to require solving the halting problem.  So while it would be
> > > > nice to improve the admission control, I don't think that's a going to
> > > > be a general solution.
> > >
> > > Agreed, and my earlier point about the need to trust the coding abilities
> > > of those writing ultimate-priority code is all too consistent with your
> > > point about needing to solve the halting problem.  Nevertheless,  I believe
> > > that we could make something that worked reasonably well in practice.
> > >
> > > Here are a few components of a possible solution, in practice, but
> > > of course not in theory:
> > >
> > > 1.    We set limits to SCHED_DEADLINE parameters, perhaps novel ones.
> > >       For one example, insist on (say) 10 milliseconds of idle time
> > >       every second on each CPU.  Yes, you can configure beyond that
> > >       given sufficient permissions, but if you do so, you just voided
> > >       your warranty.
> > >
> > > 2.    Only allow SCHED_DEADLINE on nohz_full CPUs.  (Partial solution,
> > >       given that such a CPU might be running in the kernel or have
> > >       more than one runnable task.  Just for fun, I will suggest the
> > >       option of disabling SCHED_DEADLINE during such times.)
> > >
> > > 3.    RCU detects slowdowns, and does something TBD to increase its
> > >       priority, but only while the slowdown persists.  This likely
> > >       relies on scheduling-clock interrupts to detect the slowdowns,
> > >       so there might be additional challenges on a fully nohz_full
> > >       system.
> >
> > 4.      SCHED_DEADLINE treats the other three scheduling classes as each
> >         having a period, deadline, and a modest CPU consumption budget
> >         for the members of the class in aggregate.  But this has to have
> >         been discussed before.  How did that go?
> >
> > > 5.    Your idea here.
> 
> Trying to digest this thread.
> 
> Do I understand correctly that setting rcutree.kthread_prio=99 won't
> help because the deadline priority is higher?
> And there are no other existing mechanisms to either fix the stalls
> nor make kernel reject the non well-behaving parameters? Kernel tries
> to filter out non well-behaving parameters, but the check detects only
> obvious misconfigurations, right?
> This reminds of priority inversion/inheritance problem. I wonder if
> there are other kernel subsystems that suffer from the same problem.
> E.g. the background kernel thread that destroys net namespaces and any
> other type of async work. A high prio user process can overload the
> queue and make kernel eat all memory. May be relatively easy to do
> even unintentionally. I suspect the problem is not specific to rcu and
> plumbing just rcu may just make the next problem pop up.
> Should user be able to starve basic kernel services? User should be
> able to prioritize across user processes (potentially in radical
> ways), but perhaps it should not be able to badly starve kernel
> functions that just happened to be asynchronous? I guess it's not as
> simple as setting the highest prio for all kernel threads because in
> normal case we want to reduce latency of user work by making the work
> async. But user must not be able to starve kernel threads
> infinitely... sounds like something similar to the deadline scheduling
> -- kernel threads need to get at least some time slice per unit of
> time.

As I understand it, there is provision for giving other threads slack
time even in SCHED_DEADLINE, but the timing of that slack time depends
on the other tasks' SCHED_DEADLINE settings.  And RCU's kthreads do
need some response time: Optimally a few milliseconds, preferably about
a hundred milliseconds, but definitely a second.  With the huge cycle
time specified, RCU might not get that.

And yes, I suspect that RCU is not the only thing in the system needing
a little CPU time fairly frequently, for some ill-defined notion of
"fairly frequently".

> But short term I don't see any other solution than stop testing
> sched_setattr because it does not check arguments enough to prevent
> system misbehavior. Which is a pity because syzkaller has found some
> bad misconfigurations that were oversight on checking side.
> Any other suggestions?

Keep the times down to a few seconds?  Of course, that might also
fail to find interesting bugs.

							Thanx, Paul


^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-14 18:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190714183820.GD34501@google.com>

On Sun, Jul 14, 2019 at 02:38:20PM -0400, Joel Fernandes wrote:
> On Sun, Jul 14, 2019 at 02:10:53PM -0400, Joel Fernandes wrote:
> > On Sat, Jul 13, 2019 at 02:28:12PM -0700, Paul E. McKenney wrote:
> [snip]
> > > > > > > > > > 
> > > > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > > ---
> > > > > > > > > >  include/linux/rcu_sync.h | 4 +---
> > > > > > > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > > > > > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > > > > > > > --- a/include/linux/rcu_sync.h
> > > > > > > > > > +++ b/include/linux/rcu_sync.h
> > > > > > > > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > > > > > > > >   */
> > > > > > > > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > > > > > > > >  {
> > > > > > > > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > > > > > > > -			 !rcu_read_lock_bh_held() &&
> > > > > > > > > > -			 !rcu_read_lock_sched_held(),
> > > > > > > > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > > > > > > > 
> > > > > > > > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > > > > > > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > > > > > > > If you have not already done so, could you please give it a try?
> > > > > > > > 
> > > > > > > > Hi Paul,
> > > > > > > > I don't think it will cause splats for !CONFIG_PREEMPT.
> > > > > > > > 
> > > > > > > > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > > > > > > > !preemptible(). This means that:
> > > > > > > > 
> > > > > > > > The following expression above:
> > > > > > > > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > > > > > > > 
> > > > > > > > Becomes:
> > > > > > > > RCU_LOCKDEP_WARN(preemptible(), ...)
> > > > > > > > 
> > > > > > > > For, CONFIG_PREEMPT=n kernels, this means:
> > > > > > > > RCU_LOCKDEP_WARN(0, ...)
> > > > > > > > 
> > > > > > > > Which would mean no splats. Or, did I miss the point?
> > > > > > > 
> > > > > > > I suggest trying it out on a CONFIG_PREEMPT=n kernel.
> > > > > > 
> > > > > > Sure, will do, sorry did not try it out yet because was busy with weekend
> > > > > > chores but will do soon, thanks!
> > > > > 
> > > > > I am not faulting you for taking the weekend off, actually.  ;-)
> > > > 
> > > > ;-) 
> > > > 
> > > > I tried doing RCU_LOCKDEP_WARN(preemptible(), ...) in this code path and I
> > > > don't get any splats. I also disassembled the code and it seems to me
> > > > RCU_LOCKDEP_WARN() becomes a NOOP which also the above reasoning confirms.
> > > 
> > > OK, very good.  Could you do the same thing for the RCU_LOCKDEP_WARN()
> > > in synchronize_rcu()?  Why or why not?
> > > 
> > 
> > Hi Paul,
> > 
> > Yes synchronize_rcu() can also make use of this technique since it is
> > strictly illegal to call synchronize_rcu() within a reader section.
> > 
> > I will add this to the set of my patches as well and send them all out next
> > week, along with the rcu-sync and bh clean ups we discussed.
> 
> After sending this email, it occurs to me it wont work in synchronize_rcu()
> for !CONFIG_PREEMPT kernels. This is because in a !CONFIG_PREEMPT kernel,
> executing in kernel mode itself looks like being in an RCU reader. So we
> should leave that as is. However it will work fine for rcu_sync_is_idle (for
> CONFIG_PREEMPT=n kernels) as I mentioned earlier.
> 
> Were trying to throw me a Quick-Quiz ? ;-) In that case, hope I passed!

You did pass.  This time.  ;-)

							Thanx, Paul

^ permalink raw reply

* Re: INFO: rcu detected stall in ext4_write_checks
From: Theodore Ts'o @ 2019-07-14 19:05 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Paul E. McKenney, syzbot, Andreas Dilger, David Miller, eladr,
	Ido Schimmel, Jiri Pirko, John Stultz, linux-ext4, LKML, netdev,
	syzkaller-bugs, Thomas Gleixner, Peter Zijlstra, Ingo Molnar
In-Reply-To: <CACT4Y+asYe-uH9OV5R0Nkb-JKP4erYUZ68S9gYNnGg6v+fD20w@mail.gmail.com>

On Sun, Jul 14, 2019 at 05:48:00PM +0300, Dmitry Vyukov wrote:
> But short term I don't see any other solution than stop testing
> sched_setattr because it does not check arguments enough to prevent
> system misbehavior. Which is a pity because syzkaller has found some
> bad misconfigurations that were oversight on checking side.
> Any other suggestions?

Or maybe syzkaller can put its own limitations on what parameters are
sent to sched_setattr?  In practice, there are any number of ways a
root user can shoot themselves in the foot when using sched_setattr or
sched_setaffinity, for that matter.  I imagine there must be some such
constraints already --- or else syzkaller might have set a kernel
thread to run with priority SCHED_BATCH, with similar catastrophic
effects --- or do similar configurations to make system threads
completely unschedulable.

Real time administrators who know what they are doing --- and who know
that their real-time threads are well behaved --- will always want to
be able to do things that will be catastrophic if the real-time thread
is *not* well behaved.  I don't it is possible to add safety checks
which would allow the kernel to automatically detect and reject unsafe
configurations.

An apt analogy might be civilian versus military aircraft.  Most
airplanes are designed to be "inherently stable"; that way, modulo
buggy/insane control systems like on the 737 Max, the airplane will
automatically return to straight and level flight.  On the other hand,
some military planes (for example, the F-16, F-22, F-36, the
Eurofighter, etc.) are sometimes designed to be unstable, since that
way they can be more maneuverable.

There are use cases for real-time Linux where this flexibility/power
vs. stability tradeoff is going to argue for giving root the
flexibility to crash the system.  Some of these systems might
literally involve using real-time Linux in military applications,
something for which Paul and I have had some experience.  :-)

Speaking of sched_setaffinity, one thing which we can do is have
syzkaller move all of the system threads to they run on the "system
CPU's", and then move the syzkaller processes which are testing the
kernel to be on the "system under test CPU's".  Then regardless of
what priority the syzkaller test programs try to run themselves at,
they can't crash the system.

Some real-time systems do actually run this way, and it's a
recommended configuration which is much safer than letting the
real-time threads take over the whole system:

http://linuxrealtime.org/index.php/Improving_the_Real-Time_Properties#Isolating_the_Application

					- Ted

^ permalink raw reply

* [PATCH] mm/gup: Use put_user_page*() instead of put_page*()
From: Bharath Vedartham @ 2019-07-14 19:08 UTC (permalink / raw)
  To: akpm, ira.weiny, jhubbard
  Cc: Bharath Vedartham, Mauro Carvalho Chehab, Dimitri Sivanich,
	Arnd Bergmann, Greg Kroah-Hartman, Alex Williamson, Cornelia Huck,
	Jens Axboe, Alexander Viro, Björn Töpel,
	Magnus Karlsson, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Enrico Weigelt, Thomas Gleixner, Alexios Zavras,
	Dan Carpenter, Max Filippov, Matt Sickler, Kirill A. Shutemov,
	Keith Busch, YueHaibing, linux-media, linux-kernel, devel, kvm,
	linux-block, linux-fsdevel, linux-mm, netdev, bpf, xdp-newbies

This patch converts all call sites of get_user_pages
to use put_user_page*() instead of put_page*() functions to
release reference to gup pinned pages.

This is a bunch of trivial conversions which is a part of an effort
by John Hubbard to solve issues with gup pinned pages and 
filesystem writeback.

The issue is more clearly described in John Hubbard's patch[1] where
put_user_page*() functions are introduced.

Currently put_user_page*() simply does put_page but future implementations
look to change that once treewide change of put_page callsites to 
put_user_page*() is finished.

The lwn article describing the issue with gup pinned pages and filesystem 
writeback [2].

This patch has been tested by building and booting the kernel as I don't
have the required hardware to test the device drivers.

I did not modify gpu/drm drivers which use release_pages instead of
put_page() to release reference of gup pinned pages as I am not clear
whether release_pages and put_page are interchangable. 

[1] https://lkml.org/lkml/2019/3/26/1396

[2] https://lwn.net/Articles/784574/

Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
---
 drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +--
 drivers/misc/sgi-gru/grufault.c           | 2 +-
 drivers/staging/kpc2000/kpc_dma/fileops.c | 4 +---
 drivers/vfio/vfio_iommu_type1.c           | 2 +-
 fs/io_uring.c                             | 7 +++----
 mm/gup_benchmark.c                        | 6 +-----
 net/xdp/xdp_umem.c                        | 7 +------
 7 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 66a6c6c..d6eeb43 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,7 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
 	BUG_ON(dma->sglen);
 
 	if (dma->pages) {
-		for (i = 0; i < dma->nr_pages; i++)
-			put_page(dma->pages[i]);
+		put_user_pages(dma->pages, dma->nr_pages);
 		kfree(dma->pages);
 		dma->pages = NULL;
 	}
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..61b3447 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
 	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
 		return -EFAULT;
 	*paddr = page_to_phys(page);
-	put_page(page);
+	put_user_page(page);
 	return 0;
 }
 
diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 6166587..26dceed 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
 	sg_free_table(&acd->sgt);
  err_dma_map_sg:
  err_alloc_sg_table:
-	for (i = 0 ; i < acd->page_count ; i++){
-		put_page(acd->user_pages[i]);
-	}
+	put_user_pages(acd->user_pages, acd->page_count);
  err_get_user_pages:
 	kfree(acd->user_pages);
  err_alloc_userpages:
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index add34ad..c491524 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -369,7 +369,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 		 */
 		if (ret > 0 && vma_is_fsdax(vmas[0])) {
 			ret = -EOPNOTSUPP;
-			put_page(page[0]);
+			put_user_page(page[0]);
 		}
 	}
 	up_read(&mm->mmap_sem);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4ef62a4..b4a4549 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2694,10 +2694,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 			 * if we did partial map, or found file backed vmas,
 			 * release any pages we did get
 			 */
-			if (pret > 0) {
-				for (j = 0; j < pret; j++)
-					put_page(pages[j]);
-			}
+			if (pret > 0)
+				put_user_pages(pages, pret);
+
 			if (ctx->account_mem)
 				io_unaccount_mem(ctx->user, nr_pages);
 			kvfree(imu->bvec);
diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 7dd602d..15fc7a2 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -76,11 +76,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 	gup->size = addr - gup->addr;
 
 	start_time = ktime_get();
-	for (i = 0; i < nr_pages; i++) {
-		if (!pages[i])
-			break;
-		put_page(pages[i]);
-	}
+	put_user_pages(pages, nr_pages);
 	end_time = ktime_get();
 	gup->put_delta_usec = ktime_us_delta(end_time, start_time);
 
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 9c6de4f..6103e19 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -173,12 +173,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
 	unsigned int i;
 
-	for (i = 0; i < umem->npgs; i++) {
-		struct page *page = umem->pgs[i];
-
-		set_page_dirty_lock(page);
-		put_page(page);
-	}
+	put_user_pages_dirty_lock(umem->pgs, umem->npgs);
 
 	kfree(umem->pgs);
 	umem->pgs = NULL;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH] gve: Remove the exporting of gve_probe
From: David Miller @ 2019-07-14 19:13 UTC (permalink / raw)
  To: efremov; +Cc: csully, sagis, jonolson, netdev, linux-kernel
In-Reply-To: <20190714120225.15279-1-efremov@linux.com>

From: Denis Efremov <efremov@linux.com>
Date: Sun, 14 Jul 2019 15:02:25 +0300

> The function gve_probe is declared static and marked EXPORT_SYMBOL, which
> is at best an odd combination. Because the function is not used outside of
> the drivers/net/ethernet/google/gve/gve_main.c file it is defined in, this
> commit removes the EXPORT_SYMBOL() marking.
> 
> Signed-off-by: Denis Efremov <efremov@linux.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] sit: use dst_cache in ipip6_tunnel_xmit
From: David Miller @ 2019-07-14 19:15 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <1563111082-10721-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sun, 14 Jul 2019 21:31:22 +0800

> Same as other ip tunnel, use dst_cache in xmit action to avoid
> unnecessary fib lookups.
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Looks good, applied, thanks.

^ permalink raw reply

* Re: [PATCH net v2] net: neigh: fix multiple neigh timer scheduling
From: David Miller @ 2019-07-14 19:15 UTC (permalink / raw)
  To: dsahern; +Cc: lorenzo.bianconi, netdev, marek
In-Reply-To: <26f58e35-f1f8-9543-819f-ef7f52da1e49@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Sun, 14 Jul 2019 07:58:19 -0600

> On 7/14/19 2:45 AM, Lorenzo Bianconi wrote:
>> @@ -1124,7 +1125,9 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
>>  
>>  			atomic_set(&neigh->probes,
>>  				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
>> -			neigh->nud_state     = NUD_INCOMPLETE;
>> +			if (check_timer)
>> +				neigh_del_timer(neigh);
> 
> Why not just always call neigh_del_timer and avoid the check_timer flag?
> Let the NUD_IN_TIMER flag handle whether anything needs to be done.

Agreed.

^ permalink raw reply

* Re: [PATCH] sis900: correct a few typos
From: David Miller @ 2019-07-14 19:22 UTC (permalink / raw)
  To: sergej.benilov; +Cc: venza, netdev, andrew
In-Reply-To: <20190714165627.32765-1-sergej.benilov@googlemail.com>

From: Sergej Benilov <sergej.benilov@googlemail.com>
Date: Sun, 14 Jul 2019 18:56:27 +0200

> Correct a few typos in comments and debug text.
> 
> Signed-off-by: Sergej Benilov <sergej.benilov@googlemail.com>

Applied.

^ permalink raw reply

* Re: INFO: rcu detected stall in ext4_write_checks
From: Paul E. McKenney @ 2019-07-14 19:29 UTC (permalink / raw)
  To: Theodore Ts'o, Dmitry Vyukov, syzbot, Andreas Dilger,
	David Miller, eladr, Ido Schimmel, Jiri Pirko, John Stultz,
	linux-ext4, LKML, netdev, syzkaller-bugs, Thomas Gleixner,
	Peter Zijlstra, Ingo Molnar
In-Reply-To: <20190714190522.GA24049@mit.edu>

On Sun, Jul 14, 2019 at 03:05:22PM -0400, Theodore Ts'o wrote:
> On Sun, Jul 14, 2019 at 05:48:00PM +0300, Dmitry Vyukov wrote:
> > But short term I don't see any other solution than stop testing
> > sched_setattr because it does not check arguments enough to prevent
> > system misbehavior. Which is a pity because syzkaller has found some
> > bad misconfigurations that were oversight on checking side.
> > Any other suggestions?
> 
> Or maybe syzkaller can put its own limitations on what parameters are
> sent to sched_setattr?  In practice, there are any number of ways a
> root user can shoot themselves in the foot when using sched_setattr or
> sched_setaffinity, for that matter.  I imagine there must be some such
> constraints already --- or else syzkaller might have set a kernel
> thread to run with priority SCHED_BATCH, with similar catastrophic
> effects --- or do similar configurations to make system threads
> completely unschedulable.
> 
> Real time administrators who know what they are doing --- and who know
> that their real-time threads are well behaved --- will always want to
> be able to do things that will be catastrophic if the real-time thread
> is *not* well behaved.  I don't it is possible to add safety checks
> which would allow the kernel to automatically detect and reject unsafe
> configurations.
> 
> An apt analogy might be civilian versus military aircraft.  Most
> airplanes are designed to be "inherently stable"; that way, modulo
> buggy/insane control systems like on the 737 Max, the airplane will
> automatically return to straight and level flight.  On the other hand,
> some military planes (for example, the F-16, F-22, F-36, the
> Eurofighter, etc.) are sometimes designed to be unstable, since that
> way they can be more maneuverable.
> 
> There are use cases for real-time Linux where this flexibility/power
> vs. stability tradeoff is going to argue for giving root the
> flexibility to crash the system.  Some of these systems might
> literally involve using real-time Linux in military applications,
> something for which Paul and I have had some experience.  :-)
> 
> Speaking of sched_setaffinity, one thing which we can do is have
> syzkaller move all of the system threads to they run on the "system
> CPU's", and then move the syzkaller processes which are testing the
> kernel to be on the "system under test CPU's".  Then regardless of
> what priority the syzkaller test programs try to run themselves at,
> they can't crash the system.
> 
> Some real-time systems do actually run this way, and it's a
> recommended configuration which is much safer than letting the
> real-time threads take over the whole system:
> 
> http://linuxrealtime.org/index.php/Improving_the_Real-Time_Properties#Isolating_the_Application

Good point!  We might still have issues with some per-CPU kthreads,
but perhaps use of nohz_full would help at least reduce these sorts
of problems.  (There could still be issues on CPUs with more than
one runnable threads.)

							Thanx, Paul


^ permalink raw reply

* Re: [PATCH net v2] net: neigh: fix multiple neigh timer scheduling
From: Lorenzo Bianconi @ 2019-07-14 20:04 UTC (permalink / raw)
  To: David Ahern; +Cc: davem, netdev, marek
In-Reply-To: <26f58e35-f1f8-9543-819f-ef7f52da1e49@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1136 bytes --]

> On 7/14/19 2:45 AM, Lorenzo Bianconi wrote:
> > @@ -1124,7 +1125,9 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
> >  
> >  			atomic_set(&neigh->probes,
> >  				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
> > -			neigh->nud_state     = NUD_INCOMPLETE;
> > +			if (check_timer)
> > +				neigh_del_timer(neigh);
> 
> Why not just always call neigh_del_timer and avoid the check_timer flag?
> Let the NUD_IN_TIMER flag handle whether anything needs to be done.

ack, I have been too paranoid here. I will post a v3 fixing it.

Regards,
Lorenzo

> 
> > +			neigh->nud_state = NUD_INCOMPLETE;
> >  			neigh->updated = now;
> >  			next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME),
> >  					 HZ/2);
> > @@ -1140,6 +1143,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
> >  		}
> >  	} else if (neigh->nud_state & NUD_STALE) {
> >  		neigh_dbg(2, "neigh %p is delayed\n", neigh);
> > +		if (check_timer)
> > +			neigh_del_timer(neigh);
> >  		neigh->nud_state = NUD_DELAY;
> >  		neigh->updated = jiffies;
> >  		neigh_add_timer(neigh, jiffies +
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH] sky2: Disable MSI on P5W DH Deluxe
From: David Miller @ 2019-07-14 20:46 UTC (permalink / raw)
  To: tasos; +Cc: netdev, linux-kernel, stephen, mlindner
In-Reply-To: <14c872c3-09ac-7f51-dc3d-e68319459fcf@tasossah.com>

From: Tasos Sahanidis <tasos@tasossah.com>
Date: Sun, 14 Jul 2019 13:31:11 +0300

> The onboard sky2 NICs send IRQs after S3, resulting in ethernet not
> working after resume.
> Maskable MSI and MSI-X are also not supported, so fall back to INTx.
> 
> Signed-off-by: Tasos Sahanidis <tasos@tasossah.com>

Applied, thanks.

^ permalink raw reply

* [PATCH net v3] net: neigh: fix multiple neigh timer scheduling
From: Lorenzo Bianconi @ 2019-07-14 21:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, dsahern, marek

Neigh timer can be scheduled multiple times from userspace adding
multiple neigh entries and forcing the neigh timer scheduling passing
NTF_USE in the netlink requests.
This will result in a refcount leak and in the following dump stack:

[   32.465295] NEIGH: BUG, double timer add, state is 8
[   32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65
[   32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[   32.465313] Call Trace:
[   32.465318]  dump_stack+0x7c/0xc0
[   32.465323]  __neigh_event_send+0x20c/0x880
[   32.465326]  ? ___neigh_create+0x846/0xfb0
[   32.465329]  ? neigh_lookup+0x2a9/0x410
[   32.465332]  ? neightbl_fill_info.constprop.0+0x800/0x800
[   32.465334]  neigh_add+0x4f8/0x5e0
[   32.465337]  ? neigh_xmit+0x620/0x620
[   32.465341]  ? find_held_lock+0x85/0xa0
[   32.465345]  rtnetlink_rcv_msg+0x204/0x570
[   32.465348]  ? rtnl_dellink+0x450/0x450
[   32.465351]  ? mark_held_locks+0x90/0x90
[   32.465354]  ? match_held_lock+0x1b/0x230
[   32.465357]  netlink_rcv_skb+0xc4/0x1d0
[   32.465360]  ? rtnl_dellink+0x450/0x450
[   32.465363]  ? netlink_ack+0x420/0x420
[   32.465366]  ? netlink_deliver_tap+0x115/0x560
[   32.465369]  ? __alloc_skb+0xc9/0x2f0
[   32.465372]  netlink_unicast+0x270/0x330
[   32.465375]  ? netlink_attachskb+0x2f0/0x2f0
[   32.465378]  netlink_sendmsg+0x34f/0x5a0
[   32.465381]  ? netlink_unicast+0x330/0x330
[   32.465385]  ? move_addr_to_kernel.part.0+0x20/0x20
[   32.465388]  ? netlink_unicast+0x330/0x330
[   32.465391]  sock_sendmsg+0x91/0xa0
[   32.465394]  ___sys_sendmsg+0x407/0x480
[   32.465397]  ? copy_msghdr_from_user+0x200/0x200
[   32.465401]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[   32.465404]  ? lockdep_hardirqs_on+0x17d/0x250
[   32.465407]  ? __wake_up_common_lock+0xcb/0x110
[   32.465410]  ? __wake_up_common+0x230/0x230
[   32.465413]  ? netlink_bind+0x3e1/0x490
[   32.465416]  ? netlink_setsockopt+0x540/0x540
[   32.465420]  ? __fget_light+0x9c/0xf0
[   32.465423]  ? sockfd_lookup_light+0x8c/0xb0
[   32.465426]  __sys_sendmsg+0xa5/0x110
[   32.465429]  ? __ia32_sys_shutdown+0x30/0x30
[   32.465432]  ? __fd_install+0xe1/0x2c0
[   32.465435]  ? lockdep_hardirqs_off+0xb5/0x100
[   32.465438]  ? mark_held_locks+0x24/0x90
[   32.465441]  ? do_syscall_64+0xf/0x270
[   32.465444]  do_syscall_64+0x63/0x270
[   32.465448]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
receiving a netlink request with NTF_USE flag set

Reported-by: Marek Majkowski <marek@cloudflare.com>
Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
Changes since v2:
- remove check_timer flag and run neigh_del_timer directly
Changes since v1:
- fix compilation errors defining neigh_event_send_check_timer routine
---
 net/core/neighbour.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 742cea4ce72e..0dfc97bc8760 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1124,6 +1124,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 
 			atomic_set(&neigh->probes,
 				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
+			neigh_del_timer(neigh);
 			neigh->nud_state     = NUD_INCOMPLETE;
 			neigh->updated = now;
 			next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME),
@@ -1140,6 +1141,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 		}
 	} else if (neigh->nud_state & NUD_STALE) {
 		neigh_dbg(2, "neigh %p is delayed\n", neigh);
+		neigh_del_timer(neigh);
 		neigh->nud_state = NUD_DELAY;
 		neigh->updated = jiffies;
 		neigh_add_timer(neigh, jiffies +
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] mm/gup: Use put_user_page*() instead of put_page*()
From: John Hubbard @ 2019-07-14 23:33 UTC (permalink / raw)
  To: Bharath Vedartham, akpm, ira.weiny
  Cc: Mauro Carvalho Chehab, Dimitri Sivanich, Arnd Bergmann,
	Greg Kroah-Hartman, Alex Williamson, Cornelia Huck, Jens Axboe,
	Alexander Viro, Björn Töpel, Magnus Karlsson,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Enrico Weigelt, Thomas Gleixner, Alexios Zavras, Dan Carpenter,
	Max Filippov, Matt Sickler, Kirill A. Shutemov, Keith Busch,
	YueHaibing, linux-media, linux-kernel, devel, kvm, linux-block,
	linux-fsdevel, linux-mm, netdev, bpf, xdp-newbies,
	Jason Gunthorpe
In-Reply-To: <1563131456-11488-1-git-send-email-linux.bhar@gmail.com>

On 7/14/19 12:08 PM, Bharath Vedartham wrote:
> This patch converts all call sites of get_user_pages
> to use put_user_page*() instead of put_page*() functions to
> release reference to gup pinned pages.

Hi Bharath,

Thanks for jumping in to help, and welcome to the party!

You've caught everyone in the middle of a merge window, btw.  As a
result, I'm busy rebasing and reworking the get_user_pages call sites, 
and gup tracking, in the wake of some semi-traumatic changes to bio 
and gup and such. I plan to re-post right after 5.3-rc1 shows up, from 
here:

    https://github.com/johnhubbard/linux/commits/gup_dma_core

...which you'll find already covers the changes you've posted, except for:

    drivers/misc/sgi-gru/grufault.c
    drivers/staging/kpc2000/kpc_dma/fileops.c

...and this one, which is undergoing to larger local changes, due to
bvec, so let's leave it out of the choices:

    fs/io_uring.c

Therefore, until -rc1, if you'd like to help, I'd recommend one or more
of the following ideas:

1. Pull down https://github.com/johnhubbard/linux/commits/gup_dma_core
and find missing conversions: look for any additional missing 
get_user_pages/put_page conversions. You've already found a couple missing 
ones. I haven't re-run a search in a long time, so there's probably even more.

	a) And find more, after I rebase to 5.3-rc1: people probably are adding
	get_user_pages() calls as we speak. :)

2. Patches: Focus on just one subsystem at a time, and perfect the patch for
it. For example, I think this the staging driver would be perfect to start with:

    drivers/staging/kpc2000/kpc_dma/fileops.c

	a) verify that you've really, corrected converted the whole
	driver. (Hint: I think you might be overlooking a put_page call.)

	b) Attempt to test it if you can (I'm being hypocritical in
	the extreme here, but one of my problems is that testing
	has been light, so any help is very valuable). qemu...?
	OTOH, maybe even qemu cannot easily test a kpc2000, but
	perhaps `git blame` and talking to the authors would help
	figure out a way to validate the changes.

	Thinking about whether you can run a test that would prove or
	disprove my claim in (a), above, could be useful in coming up
	with tests to run.

In other words, a few very high quality conversions (even just one) that
we can really put our faith in, is what I value most here. Tested patches
are awesome.

3. Once I re-post, turn on the new CONFIG_DEBUG_GET_USER_PAGES_REFERENCES
and run things such as xfstest/fstest. (Again, doing so would be going
further than I have yet--very helpful). Help clarify what conversions have
actually been tested and work, and which ones remain unvalidated.

Other: Please note that this:

    https://github.com/johnhubbard/linux/commits/gup_dma_core

    a) gets rebased often, and

    b) has a bunch of commits (iov_iter and related) that conflict
       with the latest linux.git,

    c) has some bugs in the bio area, that I'm fixing, so I don't trust
       that's it's safely runnable, for a few more days.

One note below, for the future:

> 
> This is a bunch of trivial conversions which is a part of an effort
> by John Hubbard to solve issues with gup pinned pages and 
> filesystem writeback.
> 
> The issue is more clearly described in John Hubbard's patch[1] where
> put_user_page*() functions are introduced.
> 
> Currently put_user_page*() simply does put_page but future implementations
> look to change that once treewide change of put_page callsites to 
> put_user_page*() is finished.
> 
> The lwn article describing the issue with gup pinned pages and filesystem 
> writeback [2].
> 
> This patch has been tested by building and booting the kernel as I don't
> have the required hardware to test the device drivers.
> 
> I did not modify gpu/drm drivers which use release_pages instead of
> put_page() to release reference of gup pinned pages as I am not clear
> whether release_pages and put_page are interchangable. 
> 
> [1] https://lkml.org/lkml/2019/3/26/1396

When referring to patches in a commit description, please use the 
commit hash, not an external link. See Submitting Patches [1] for details.

Also, once you figure out the right maintainers and other involved people,
putting Cc: in the commit description is common practice, too.

[1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html

thanks,
-- 
John Hubbard
NVIDIA

> 
> [2] https://lwn.net/Articles/784574/
> 
> Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> ---
>  drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +--
>  drivers/misc/sgi-gru/grufault.c           | 2 +-
>  drivers/staging/kpc2000/kpc_dma/fileops.c | 4 +---
>  drivers/vfio/vfio_iommu_type1.c           | 2 +-
>  fs/io_uring.c                             | 7 +++----
>  mm/gup_benchmark.c                        | 6 +-----
>  net/xdp/xdp_umem.c                        | 7 +------
>  7 files changed, 9 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> index 66a6c6c..d6eeb43 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> @@ -349,8 +349,7 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
>  	BUG_ON(dma->sglen);
>  
>  	if (dma->pages) {
> -		for (i = 0; i < dma->nr_pages; i++)
> -			put_page(dma->pages[i]);
> +		put_user_pages(dma->pages, dma->nr_pages);
>  		kfree(dma->pages);
>  		dma->pages = NULL;
>  	}
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 4b713a8..61b3447 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>  	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
>  		return -EFAULT;
>  	*paddr = page_to_phys(page);
> -	put_page(page);
> +	put_user_page(page);
>  	return 0;
>  }
>  
> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> index 6166587..26dceed 100644
> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> @@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
>  	sg_free_table(&acd->sgt);
>   err_dma_map_sg:
>   err_alloc_sg_table:
> -	for (i = 0 ; i < acd->page_count ; i++){
> -		put_page(acd->user_pages[i]);
> -	}
> +	put_user_pages(acd->user_pages, acd->page_count);
>   err_get_user_pages:
>  	kfree(acd->user_pages);
>   err_alloc_userpages:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index add34ad..c491524 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -369,7 +369,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  		 */
>  		if (ret > 0 && vma_is_fsdax(vmas[0])) {
>  			ret = -EOPNOTSUPP;
> -			put_page(page[0]);
> +			put_user_page(page[0]);
>  		}
>  	}
>  	up_read(&mm->mmap_sem);
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4ef62a4..b4a4549 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2694,10 +2694,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
>  			 * if we did partial map, or found file backed vmas,
>  			 * release any pages we did get
>  			 */
> -			if (pret > 0) {
> -				for (j = 0; j < pret; j++)
> -					put_page(pages[j]);
> -			}
> +			if (pret > 0)
> +				put_user_pages(pages, pret);
> +
>  			if (ctx->account_mem)
>  				io_unaccount_mem(ctx->user, nr_pages);
>  			kvfree(imu->bvec);
> diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> index 7dd602d..15fc7a2 100644
> --- a/mm/gup_benchmark.c
> +++ b/mm/gup_benchmark.c
> @@ -76,11 +76,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
>  	gup->size = addr - gup->addr;
>  
>  	start_time = ktime_get();
> -	for (i = 0; i < nr_pages; i++) {
> -		if (!pages[i])
> -			break;
> -		put_page(pages[i]);
> -	}
> +	put_user_pages(pages, nr_pages);
>  	end_time = ktime_get();
>  	gup->put_delta_usec = ktime_us_delta(end_time, start_time);
>  
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 9c6de4f..6103e19 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -173,12 +173,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>  {
>  	unsigned int i;
>  
> -	for (i = 0; i < umem->npgs; i++) {
> -		struct page *page = umem->pgs[i];
> -
> -		set_page_dirty_lock(page);
> -		put_page(page);
> -	}
> +	put_user_pages_dirty_lock(umem->pgs, umem->npgs);
>  
>  	kfree(umem->pgs);
>  	umem->pgs = NULL;
> 

^ permalink raw reply

* Re: [PATCH net-next 01/16] qlge: Remove irq_cnt
From: Benjamin Poirier @ 2019-07-15  1:40 UTC (permalink / raw)
  To: David Miller, Greg Kroah-Hartman; +Cc: Manish Chopra, GR-Linux-NIC-Dev, netdev
In-Reply-To: <20190617074858.32467-1-bpoirier@suse.com>

On 2019/06/17 16:48, Benjamin Poirier wrote:
> qlge uses an irq enable/disable refcounting scheme that is:
> * poorly implemented
> 	Uses a spin_lock to protect accesses to the irq_cnt atomic variable
> * buggy
> 	Breaks when there is not a 1:1 sequence of irq - napi_poll, such as
> 	when using SO_BUSY_POLL.
> * unnecessary
> 	The purpose or irq_cnt is to reduce irq control writes when
> 	multiple work items result from one irq: the irq is re-enabled
> 	after all work is done.
> 	Analysis of the irq handler shows that there is only one case where
> 	there might be two workers scheduled at once, and those have
> 	separate irq masking bits.
> 
> Therefore, remove irq_cnt.
> 
> Additionally, we get a performance improvement:
> perf stat -e cycles -a -r5 super_netperf 100 -H 192.168.33.1 -t TCP_RR
> 
> Before:
> 628560
> 628056
> 622103
> 622744
> 627202
> [...]
>    268,803,947,669      cycles                 ( +-  0.09% )
> 
> After:
> 636300
> 634106
> 634984
> 638555
> 634188
> [...]
>    259,237,291,449      cycles                 ( +-  0.19% )
> 
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

David, Greg,

Before I send v2 of this patchset, how about moving this driver out to
staging? The hardware has been declared EOL by the vendor but what's
more relevant to the Linux kernel is that the quality of this driver is
not on par with many other mainline drivers, IMO. Over in staging, the
code might benefit from the attention of interested parties (drive-by
fixers) or fade away into obscurity.

Now that I check, the code has >500 basic checkpatch issues. While
working on the rx processing code for the current patchset, I noticed
the following more structural issues:
* commit 7c734359d350 ("qlge: Size RX buffers based on MTU.",
  v2.6.33-rc1) introduced dead code in the receive routines, which
  should be rewritten anyways by the admission of the author himself
  (see the comment above ql_build_rx_skb())
* truesize accounting is incorrect (ex: a 9000B frame has truesize 10280
  while containing two frags of order-1 allocations, ie. >16K)
* in some cases the driver allocates skbs with head room but only puts
  data in the frags
* there is an inordinate amount of disparate debugging code, most of
  which is of questionable value. In particular, qlge_dbg.c has hundreds
  of lines of code bitrotting away in ifdef land (doesn't compile since
  commit 18c49b91777c ("qlge: do vlan cleanup", v3.1-rc1), 8 years ago).
* triggering an ethtool regdump will instead hexdump a 176k struct to
  dmesg depending on some module parameters
* many structures are defined full of holes, as we found in this
  thread already; are used inefficiently (struct rx_ring is used for rx
  and tx completions, with some members relevant to one case only); are
  initialized redundantly (ex. memset 0 after alloc_etherdev). The
  driver also has a habit of using runtime checks where compile time
  checks are possible.
* in terms of namespace, the driver uses either qlge_, ql_ (used by
  other qlogic drivers, with clashes, ex: ql_sem_spinlock) or nothing (with
  clashes, ex: struct ob_mac_iocb_req)

I'm guessing other parts of the driver have more issues of the same
nature. The driver also has many smaller issues like deprecated apis,
duplicate or useless comments, weird code structure (some "while" loops
that could be rewritten with simple "for", ex. ql_wait_reg_rdy()) and
weird line wrapping (all over, ex. the ql_set_routing_reg() calls in
qlge_set_multicast_list()).

I'm aware of some precedents for code moving from mainline to staging:
1bb8155080c6 ncpfs: move net/ncpfs to drivers/staging/ncpfs (v4.16-rc1)
6c391ff758eb irda: move drivers/net/irda to drivers/staging/irda/drivers
(v4.14-rc1)

What do you think is the best course of action in this case?

Thanks,
-Benjamin

^ permalink raw reply

* Re: [PATCH] mm/gup: Use put_user_page*() instead of put_page*()
From: Jens Axboe @ 2019-07-15  2:33 UTC (permalink / raw)
  To: Bharath Vedartham, akpm, ira.weiny, jhubbard
  Cc: Mauro Carvalho Chehab, Dimitri Sivanich, Arnd Bergmann,
	Greg Kroah-Hartman, Alex Williamson, Cornelia Huck,
	Alexander Viro, Björn Töpel, Magnus Karlsson,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Enrico Weigelt, Thomas Gleixner, Alexios Zavras, Dan Carpenter,
	Max Filippov, Matt Sickler, Kirill A. Shutemov, Keith Busch,
	YueHaibing, linux-media, linux-kernel, devel, kvm, linux-block,
	linux-fsdevel, linux-mm, netdev, bpf, xdp-newbies
In-Reply-To: <1563131456-11488-1-git-send-email-linux.bhar@gmail.com>

On 7/14/19 1:08 PM, Bharath Vedartham wrote:
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4ef62a4..b4a4549 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2694,10 +2694,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
>   			 * if we did partial map, or found file backed vmas,
>   			 * release any pages we did get
>   			 */
> -			if (pret > 0) {
> -				for (j = 0; j < pret; j++)
> -					put_page(pages[j]);
> -			}
> +			if (pret > 0)
> +				put_user_pages(pages, pret);
> +
>   			if (ctx->account_mem)
>   				io_unaccount_mem(ctx->user, nr_pages);
>   			kvfree(imu->bvec);

You handled just the failure case of the buffer registration, but not
the actual free in io_sqe_buffer_unregister().

-- 
Jens Axboe


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox