Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Dmitry Vyukov @ 2018-11-01 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Trond Myklebust, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	"anna.schumaker@netapp.com
In-Reply-To: <20181101171846.GI3178@hirez.programming.kicks-ass.net>

On Thu, Nov 1, 2018 at 6:18 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > > > > > My one question (and the reason why I went with cmpxchg() in the
>> > > > > > first place) would be about the overflow behaviour for
>> > > > > > atomic_fetch_inc() and friends. I believe those functions should
>> > > > > > be OK on x86, so that when we overflow the counter, it behaves
>> > > > > > like an unsigned value and wraps back around.  Is that the case
>> > > > > > for all architectures?
>> > > > > >
>> > > > > > i.e. are atomic_t/atomic64_t always guaranteed to behave like
>> > > > > > u32/u64 on increment?
>> > > > > >
>> > > > > > I could not find any documentation that explicitly stated that
>> > > > > > they should.
>> > > > >
>> > > > > Peter, Will, I understand that the atomic_t/atomic64_t ops are
>> > > > > required to wrap per 2's-complement. IIUC the refcount code relies
>> > > > > on this.
>> > > > >
>> > > > > Can you confirm?
>> > > >
>> > > > There is quite a bit of core code that hard assumes 2s-complement.
>> > > > Not only for atomics but for any signed integer type. Also see the
>> > > > kernel using -fno-strict-overflow which implies -fwrapv, which
>> > > > defines signed overflow to behave like 2s-complement (and rids us of
>> > > > that particular UB).
>> > >
>> > > Fair enough, but there have also been bugfixes to explicitly fix unsafe
>> > > C standards assumptions for signed integers. See, for instance commit
>> > > 5a581b367b5d "jiffies: Avoid undefined behavior from signed overflow"
>> > > from Paul McKenney.
>> >
>> > Yes, I feel Paul has been to too many C/C++ committee meetings and got
>> > properly paranoid. Which isn't always a bad thing :-)
>>
>> Even the C standard defines 2s complement for atomics.
>
> Ooh good to know.
>
>> Just not for
>> normal arithmetic, where yes, signed overflow is UB.  And yes, I do
>> know about -fwrapv, but I would like to avoid at least some copy-pasta
>> UB from my kernel code to who knows what user-mode environment.  :-/
>>
>> At least where it is reasonably easy to do so.
>
> Fair enough I suppose; I just always make sure to include the same
> -fknobs for the userspace thing when I lift code.
>
>> And there is a push to define C++ signed arithmetic as 2s complement,
>> but there are still 1s complement systems with C compilers.  Just not
>> C++ compilers.  Legacy...
>
> *groan*; how about those ancient hardwares keep using ancient compilers
> and we all move on to the 70s :-)
>
>> > But for us using -fno-strict-overflow which actually defines signed
>> > overflow, I myself am really not worried. I'm also not sure if KASAN has
>> > been taught about this, or if it will still (incorrectly) warn about UB
>> > for signed types.
>>
>> UBSAN gave me a signed-overflow warning a few days ago.  Which I have
>> fixed, even though 2s complement did the right thing.  I am also taking
>> advantage of the change to use better naming.
>
> Oh too many *SANs I suppose; and yes, if you can make the code better,
> why not.

If there is a warning that we don't want to see at all, then we can
disable it. It supposed to be a useful tool, rather than a thing in
itself that lives own life. We already I think removed 1 particularly
noisy warning and made another optional via a config.
But the thing with overflows is that, even if it's defined, it's not
necessary the intended behavior. For example, take allocation size
calculation done via unsigned size_t. If it overflows it does not help
if C defines result or not, it still gives a user controlled write
primitive. We've seen similar cases with timeout/deadline calculation
in kernel, we really don't want it to just wrap modulo-2, right. Some
user-space projects even test with unsigned overflow warnings or
implicit truncation warnings, which are formally legal, but frequently
bugs.

^ permalink raw reply

* [PATCH iproute2-next] rdma: Refresh help section of resource information
From: Leon Romanovsky @ 2018-11-01  8:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise

From: Leon Romanovsky <leonro@mellanox.com>

After commit 4060e4c0d257 ("rdma: Add PD resource tracking
information"), the resource information shows PDs and MRs,
but help pages didn't fully reflect it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 rdma/res.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/rdma/res.c b/rdma/res.c
index 0d8c1c38..cbb2efe6 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -16,13 +16,17 @@ static int res_help(struct rd *rd)
 {
 	pr_out("Usage: %s resource\n", rd->filename);
 	pr_out("          resource show [DEV]\n");
-	pr_out("          resource show [qp|cm_id]\n");
+	pr_out("          resource show [qp|cm_id|pd|mr|cq]\n");
 	pr_out("          resource show qp link [DEV/PORT]\n");
 	pr_out("          resource show qp link [DEV/PORT] [FILTER-NAME FILTER-VALUE]\n");
 	pr_out("          resource show cm_id link [DEV/PORT]\n");
 	pr_out("          resource show cm_id link [DEV/PORT] [FILTER-NAME FILTER-VALUE]\n");
 	pr_out("          resource show cq link [DEV/PORT]\n");
 	pr_out("          resource show cq link [DEV/PORT] [FILTER-NAME FILTER-VALUE]\n");
+	pr_out("          resource show pd dev [DEV]\n");
+	pr_out("          resource show pd dev [DEV] [FILTER-NAME FILTER-VALUE]\n");
+	pr_out("          resource show mr dev [DEV]\n");
+	pr_out("          resource show mr dev [DEV] [FILTER-NAME FILTER-VALUE]\n");
 	return 0;
 }

^ permalink raw reply related

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Peter Zijlstra @ 2018-11-01 17:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Trond Myklebust, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	anna.schumaker@netapp.com, "jhogan@kerne
In-Reply-To: <20181101171432.GH3178@hirez.programming.kicks-ass.net>

On Thu, Nov 01, 2018 at 06:14:32PM +0100, Peter Zijlstra wrote:
> > This reminds me of this sooooo silly patch :/
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=adb03115f4590baa280ddc440a8eff08a6be0cb7

You'd probably want to write it like so; +- some ordering stuff, that
code didn't look like it really needs the memory barriers implied by
these, but I didn't look too hard.

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c0a9d26c06ce..11deb1d7e96b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -485,16 +485,10 @@ u32 ip_idents_reserve(u32 hash, int segs)
 	u32 now = (u32)jiffies;
 	u32 new, delta = 0;
 
-	if (old != now && cmpxchg(p_tstamp, old, now) == old)
+	if (old != now && try_cmpxchg(p_tstamp, &old, now))
 		delta = prandom_u32_max(now - old);
 
-	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
-	do {
-		old = (u32)atomic_read(p_id);
-		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
-
-	return new - segs;
+	return atomic_fetch_add(segs + delta, p_id) + delta;
 }
 EXPORT_SYMBOL(ip_idents_reserve);
 

^ permalink raw reply related

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Peter Zijlstra @ 2018-11-01 17:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Trond Myklebust, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	anna.schumaker@netapp.com, "jhogan@kerne
In-Reply-To: <20181101170146.GQ4170@linux.ibm.com>

On Thu, Nov 01, 2018 at 10:01:46AM -0700, Paul E. McKenney wrote:
> On Thu, Nov 01, 2018 at 05:32:12PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 01, 2018 at 03:22:15PM +0000, Trond Myklebust wrote:
> > > On Thu, 2018-11-01 at 15:59 +0100, Peter Zijlstra wrote:
> > > > On Thu, Nov 01, 2018 at 01:18:46PM +0000, Mark Rutland wrote:
> > 
> > > > > > My one question (and the reason why I went with cmpxchg() in the
> > > > > > first place) would be about the overflow behaviour for
> > > > > > atomic_fetch_inc() and friends. I believe those functions should
> > > > > > be OK on x86, so that when we overflow the counter, it behaves
> > > > > > like an unsigned value and wraps back around.  Is that the case
> > > > > > for all architectures?
> > > > > > 
> > > > > > i.e. are atomic_t/atomic64_t always guaranteed to behave like
> > > > > > u32/u64 on increment?
> > > > > > 
> > > > > > I could not find any documentation that explicitly stated that
> > > > > > they should.
> > > > > 
> > > > > Peter, Will, I understand that the atomic_t/atomic64_t ops are
> > > > > required to wrap per 2's-complement. IIUC the refcount code relies
> > > > > on this.
> > > > > 
> > > > > Can you confirm?
> > > > 
> > > > There is quite a bit of core code that hard assumes 2s-complement.
> > > > Not only for atomics but for any signed integer type. Also see the
> > > > kernel using -fno-strict-overflow which implies -fwrapv, which
> > > > defines signed overflow to behave like 2s-complement (and rids us of
> > > > that particular UB).
> > > 
> > > Fair enough, but there have also been bugfixes to explicitly fix unsafe
> > > C standards assumptions for signed integers. See, for instance commit
> > > 5a581b367b5d "jiffies: Avoid undefined behavior from signed overflow"
> > > from Paul McKenney.
> > 
> > Yes, I feel Paul has been to too many C/C++ committee meetings and got
> > properly paranoid. Which isn't always a bad thing :-)
> 
> Even the C standard defines 2s complement for atomics.  

Ooh good to know.

> Just not for
> normal arithmetic, where yes, signed overflow is UB.  And yes, I do
> know about -fwrapv, but I would like to avoid at least some copy-pasta
> UB from my kernel code to who knows what user-mode environment.  :-/
> 
> At least where it is reasonably easy to do so.

Fair enough I suppose; I just always make sure to include the same
-fknobs for the userspace thing when I lift code.

> And there is a push to define C++ signed arithmetic as 2s complement,
> but there are still 1s complement systems with C compilers.  Just not
> C++ compilers.  Legacy...

*groan*; how about those ancient hardwares keep using ancient compilers
and we all move on to the 70s :-)

> > But for us using -fno-strict-overflow which actually defines signed
> > overflow, I myself am really not worried. I'm also not sure if KASAN has
> > been taught about this, or if it will still (incorrectly) warn about UB
> > for signed types.
> 
> UBSAN gave me a signed-overflow warning a few days ago.  Which I have
> fixed, even though 2s complement did the right thing.  I am also taking
> advantage of the change to use better naming.

Oh too many *SANs I suppose; and yes, if you can make the code better,
why not.

> > > Anyhow, if the atomic maintainers are willing to stand up and state for
> > > the record that the atomic counters are guaranteed to wrap modulo 2^n
> > > just like unsigned integers, then I'm happy to take Paul's patch.
> > 
> > I myself am certainly relying on it.
> 
> Color me confused.  My 5a581b367b5d is from 2013.  Or is "Paul" instead
> intended to mean Paul Mackerras, who happens to be on CC?

Paul Burton I think, on a part of the thread before we joined :-)

^ permalink raw reply

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Peter Zijlstra @ 2018-11-01 17:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Trond Myklebust, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	anna.schumaker@netapp.com, "jhogan@kerne
In-Reply-To: <b0160f4b-b996-b0ee-405a-3d5f1866272e@gmail.com>

On Thu, Nov 01, 2018 at 09:59:38AM -0700, Eric Dumazet wrote:
> On 11/01/2018 09:32 AM, Peter Zijlstra wrote:
> 
> >> Anyhow, if the atomic maintainers are willing to stand up and state for
> >> the record that the atomic counters are guaranteed to wrap modulo 2^n
> >> just like unsigned integers, then I'm happy to take Paul's patch.
> > 
> > I myself am certainly relying on it.
> 
> Could we get uatomic_t support maybe ?

Whatever for; it'd be the exact identical same functions as for
atomic_t, except for a giant amount of code duplication to deal with the
new type.

That is; today we merged a bunch of scripts that generates most of
atomic*_t, so we could probably script uatomic*_t wrappers with minimal
effort, but it would add several thousand lines of code to each compile
for absolutely no reason what so ever.

> This reminds me of this sooooo silly patch :/
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=adb03115f4590baa280ddc440a8eff08a6be0cb7

Yes, that's stupid. UBSAN is just wrong there.

^ permalink raw reply

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Paul E. McKenney @ 2018-11-01 17:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Trond Myklebust, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	anna.schumaker@netapp.com, "jhogan@kerne
In-Reply-To: <20181101163212.GF3159@hirez.programming.kicks-ass.net>

On Thu, Nov 01, 2018 at 05:32:12PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 01, 2018 at 03:22:15PM +0000, Trond Myklebust wrote:
> > On Thu, 2018-11-01 at 15:59 +0100, Peter Zijlstra wrote:
> > > On Thu, Nov 01, 2018 at 01:18:46PM +0000, Mark Rutland wrote:
> 
> > > > > My one question (and the reason why I went with cmpxchg() in the
> > > > > first place) would be about the overflow behaviour for
> > > > > atomic_fetch_inc() and friends. I believe those functions should
> > > > > be OK on x86, so that when we overflow the counter, it behaves
> > > > > like an unsigned value and wraps back around.  Is that the case
> > > > > for all architectures?
> > > > > 
> > > > > i.e. are atomic_t/atomic64_t always guaranteed to behave like
> > > > > u32/u64 on increment?
> > > > > 
> > > > > I could not find any documentation that explicitly stated that
> > > > > they should.
> > > > 
> > > > Peter, Will, I understand that the atomic_t/atomic64_t ops are
> > > > required to wrap per 2's-complement. IIUC the refcount code relies
> > > > on this.
> > > > 
> > > > Can you confirm?
> > > 
> > > There is quite a bit of core code that hard assumes 2s-complement.
> > > Not only for atomics but for any signed integer type. Also see the
> > > kernel using -fno-strict-overflow which implies -fwrapv, which
> > > defines signed overflow to behave like 2s-complement (and rids us of
> > > that particular UB).
> > 
> > Fair enough, but there have also been bugfixes to explicitly fix unsafe
> > C standards assumptions for signed integers. See, for instance commit
> > 5a581b367b5d "jiffies: Avoid undefined behavior from signed overflow"
> > from Paul McKenney.
> 
> Yes, I feel Paul has been to too many C/C++ committee meetings and got
> properly paranoid. Which isn't always a bad thing :-)

Even the C standard defines 2s complement for atomics.  Just not for
normal arithmetic, where yes, signed overflow is UB.  And yes, I do
know about -fwrapv, but I would like to avoid at least some copy-pasta
UB from my kernel code to who knows what user-mode environment.  :-/

At least where it is reasonably easy to do so.

And there is a push to define C++ signed arithmetic as 2s complement,
but there are still 1s complement systems with C compilers.  Just not
C++ compilers.  Legacy...

> But for us using -fno-strict-overflow which actually defines signed
> overflow, I myself am really not worried. I'm also not sure if KASAN has
> been taught about this, or if it will still (incorrectly) warn about UB
> for signed types.

UBSAN gave me a signed-overflow warning a few days ago.  Which I have
fixed, even though 2s complement did the right thing.  I am also taking
advantage of the change to use better naming.

> > Anyhow, if the atomic maintainers are willing to stand up and state for
> > the record that the atomic counters are guaranteed to wrap modulo 2^n
> > just like unsigned integers, then I'm happy to take Paul's patch.
> 
> I myself am certainly relying on it.

Color me confused.  My 5a581b367b5d is from 2013.  Or is "Paul" instead
intended to mean Paul Mackerras, who happens to be on CC?

							Thanx, Paul

^ permalink raw reply

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Eric Dumazet @ 2018-11-01 16:59 UTC (permalink / raw)
  To: Peter Zijlstra, Trond Myklebust
  Cc: mark.rutland@arm.com, linux-kernel@vger.kernel.org,
	ralf@linux-mips.org, jlayton@kernel.org,
	linuxppc-dev@lists.ozlabs.org, bfields@fieldses.org,
	linux-mips@linux-mips.org, linux@roeck-us.net,
	linux-nfs@vger.kernel.org, akpm@linux-foundation.org,
	will.deacon@arm.com, boqun.feng@gmail.com, paul.burton@mips.com,
	anna.schumaker@netapp.com, jhogan@kernel.org, "netdev@vger.ke
In-Reply-To: <20181101163212.GF3159@hirez.programming.kicks-ass.net>



On 11/01/2018 09:32 AM, Peter Zijlstra wrote:

>> Anyhow, if the atomic maintainers are willing to stand up and state for
>> the record that the atomic counters are guaranteed to wrap modulo 2^n
>> just like unsigned integers, then I'm happy to take Paul's patch.
> 
> I myself am certainly relying on it.


Could we get uatomic_t support maybe ?

This reminds me of this sooooo silly patch :/

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=adb03115f4590baa280ddc440a8eff08a6be0cb7

^ permalink raw reply

* Re: [PATCH v1] net: ipv6: fix racey clock check in route cache aging logic
From: Brendan Higgins @ 2018-11-01 16:56 UTC (permalink / raw)
  To: eric.dumazet
  Cc: David S . Miller, Alexey Kuznetsov, yoshfuji, netdev,
	Linux Kernel Mailing List
In-Reply-To: <9ce74f5b-387d-b02f-2efa-f12c1450577c@gmail.com>

On Thu, Oct 25, 2018 at 3:15 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/25/2018 02:46 PM, Brendan Higgins wrote:
> > On Thu, Oct 25, 2018 at 2:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> On 10/25/2018 02:13 PM, Brendan Higgins wrote:
> > <snip>
> >>>
> >>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> >>> index 2a7423c394560..54d28b91fd840 100644
> >>> --- a/net/ipv6/route.c
> >>> +++ b/net/ipv6/route.c
> >>> @@ -1734,7 +1734,7 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
> >>>                       rt6_remove_exception(bucket, rt6_ex);
> >>>                       return;
> >>>               }
> >>> -     } else if (time_after(jiffies, rt->dst.expires)) {
> >>> +     } else if (time_after(now, rt->dst.expires)) {
> >>>               RT6_TRACE("purging expired route %p\n", rt);
> >>>               rt6_remove_exception(bucket, rt6_ex);
> >>>               return;
> >>>
> >>
> >>
> >> I do not think there is a bug here ?
> >>
> >> As a matter of fact, using the latest value of jiffies is probably better,
> >> since in some cases the @now variable could be quite in the past.
> >
> > Then why do we pass the `now` parameter in at all and use it at all,
> > like here: https://elixir.bootlin.com/linux/latest/source/net/ipv6/route.c#L1764
> > ?
> >
> > I am still skeptical that we should check jiffies in each check, but
> > we should at least be consistent.
>
> Well, this is a case where we do not really care.
>
> When a bug is fixed (you added a Fixes: tag which is good), we want
> to understand the real problem that needs to be fixed on stable kernels.

So by bug you mean user visible bug?
>
> Since this does not seem to be a real issue, I would suggest you send a cleanup
> patch when net-next is open (few days after linux-4.20-rc1 is release)

Sounds good, I will resend shortly.

Thanks!

^ permalink raw reply

* Re: [GIT] Networking
From: Linus Torvalds @ 2018-11-01 16:17 UTC (permalink / raw)
  To: David Miller; +Cc: Andrew Morton, netdev, Linux Kernel Mailing List
In-Reply-To: <20181031.184402.2213867913967695313.davem@davemloft.net>

On Wed, Oct 31, 2018 at 6:44 PM David Miller <davem@davemloft.net> wrote:
>
> Please pull, thanks a lot!

Pulled,

                    Linus

^ permalink raw reply

* [PATCH bpf 1/3] bpf: show real jited prog address in /proc/kallsyms
From: Song Liu @ 2018-11-01  7:00 UTC (permalink / raw)
  To: netdev; +Cc: kernel-team, Song Liu, ast, daniel, sandipan
In-Reply-To: <20181101070058.2760251-1-songliubraving@fb.com>

Currently, /proc/kallsyms shows page address of jited bpf program. This
is not ideal for detailed profiling (find hot instructions from stack
traces). This patch replaces the page address with real prog start
address.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/core.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 6377225b2082..1a796e0799ec 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -553,7 +553,6 @@ bool is_bpf_text_address(unsigned long addr)
 int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
 		    char *sym)
 {
-	unsigned long symbol_start, symbol_end;
 	struct bpf_prog_aux *aux;
 	unsigned int it = 0;
 	int ret = -ERANGE;
@@ -566,10 +565,9 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
 		if (it++ != symnum)
 			continue;
 
-		bpf_get_prog_addr_region(aux->prog, &symbol_start, &symbol_end);
 		bpf_get_prog_name(aux->prog, sym);
 
-		*value = symbol_start;
+		*value = (unsigned long)aux->prog->bpf_func;
 		*type  = BPF_SYM_ELF_TYPE;
 
 		ret = 0;
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf 0/3] show more accurrate bpf program address
From: Song Liu @ 2018-11-01  7:00 UTC (permalink / raw)
  To: netdev; +Cc: kernel-team, Song Liu, ast, daniel, sandipan

This set improves bpf program address showed in /proc/kallsyms and in
bpf_prog_info. First, real program address is showed instead of page
address. Second, when there is no subprogram, bpf_prog_info->jited_ksyms
returns the main prog address.

Song Liu (3):
  bpf: show real jited prog address in /proc/kallsyms
  bpf: show real jited address in bpf_prog_info->jited_ksyms
  bpf: show main program address in bpf_prog_info->jited_ksyms

 kernel/bpf/core.c    |  4 +---
 kernel/bpf/syscall.c | 17 ++++++++++++-----
 2 files changed, 13 insertions(+), 8 deletions(-)

^ permalink raw reply

* [PATCH bpf 2/3] bpf: show real jited address in bpf_prog_info->jited_ksyms
From: Song Liu @ 2018-11-01  7:00 UTC (permalink / raw)
  To: netdev; +Cc: kernel-team, Song Liu, ast, daniel, sandipan
In-Reply-To: <20181101070058.2760251-1-songliubraving@fb.com>

Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
bpf program. This is not ideal for detailed profiling (find hot
instructions from stack traces). This patch replaces the page address
with real prog start address.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/syscall.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ccb93277aae2..34a9eef5992c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2172,7 +2172,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
 			for (i = 0; i < ulen; i++) {
 				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
-				ksym_addr &= PAGE_MASK;
 				if (put_user((u64) ksym_addr, &user_ksyms[i]))
 					return -EFAULT;
 			}
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf 3/3] bpf: show main program address in bpf_prog_info->jited_ksyms
From: Song Liu @ 2018-11-01  7:00 UTC (permalink / raw)
  To: netdev; +Cc: kernel-team, Song Liu, ast, daniel, sandipan
In-Reply-To: <20181101070058.2760251-1-songliubraving@fb.com>

Currently, when there is not subprog (prog->aux->func_cnt == 0),
bpf_prog_info does not return any jited_ksyms. This patch adds
main program address (prog->bpf_func) to jited_ksyms.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/syscall.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 34a9eef5992c..7293b17ca62a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2158,7 +2158,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 	}
 
 	ulen = info.nr_jited_ksyms;
-	info.nr_jited_ksyms = prog->aux->func_cnt;
+	info.nr_jited_ksyms = prog->aux->func_cnt ? : 1;
 	if (info.nr_jited_ksyms && ulen) {
 		if (bpf_dump_raw_ok()) {
 			u64 __user *user_ksyms;
@@ -2170,9 +2170,17 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 			 */
 			ulen = min_t(u32, info.nr_jited_ksyms, ulen);
 			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
-			for (i = 0; i < ulen; i++) {
-				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
-				if (put_user((u64) ksym_addr, &user_ksyms[i]))
+			if (prog->aux->func_cnt) {
+				for (i = 0; i < ulen; i++) {
+					ksym_addr = (ulong)
+						prog->aux->func[i]->bpf_func;
+					if (put_user((u64) ksym_addr,
+						     &user_ksyms[i]))
+						return -EFAULT;
+				}
+			} else {
+				ksym_addr = (ulong) prog->bpf_func;
+				if (put_user((u64) ksym_addr, &user_ksyms[0]))
 					return -EFAULT;
 			}
 		} else {
-- 
2.17.1

^ permalink raw reply related

* RE: [PATCH net-next v2 5/6] net/ncsi: Reset channel state in ncsi_start_dev()
From: Justin.Lee1 @ 2018-11-01 15:51 UTC (permalink / raw)
  To: sam, netdev; +Cc: davem, linux-kernel, openbmc
In-Reply-To: <8004d6c236582253a82004ce72425b45f09d7d56.camel@mendozajonas.com>


> For your trace below can you share exactly which commands you were
> sending? Those messages aren't upstream so it's not 100% clear what's
> being sent.
> 
> Thanks!
> Sam
> 

It is sending via netlink directly. I have two packages (0 and 1) and each package has
two channels (0 and 1) on my system. If I convert it to command line, it would be
as below.

/home/root# ncsi_netlink -l 2 -m -a 0x01
Set cmd - ifindex 2, multi 1, mask 0x00000001

/home/root# ncsi_netlink -l 2 -m -b 0x03 -p 0
Set cmd - ifindex 2, package 0, channel -1, multi 1, mask 0x00000003

/home/root# ncsi_netlink -l 2 -m -b 0x00 -p 1
Set cmd - ifindex 2, package 1, channel -1, multi 1, mask 0x00000000

/home/root# ncsi_netlink
Usage: ncsi_netlink COMMAND [OPTION]...
Send messages to the NCSI driver via Netlink (0.4)

Mandatory arguments to long options are mandatory for short options too.
COMMAND:
  -i, --info                 Display info for packages and channels
  -s, --set                  Force the usage of a certain package/channel combination
  -x, --clear                Clear the above setting
  -a, --package-mask MASK    Set package selection mask
  -b, --channel-mask MASK    Set channel selection mask
  -o, --cmd DATA0 [DATA1]... Send NC-SI command
  -d, --discovery            Request to discover packages and channels
  -k, --lookup               Look up channel id or name
  -u, --status               Display status
  -h, --help                 Print this help text

OPTION:
  -l, --ifindex INDEX        Specify the interface index
  -p, --package PACKAGE      Package number
  -c, --channel CHANNEL      Channel number (aka. port number)
  -n, --channel NAME         Channel name (aka. ncsiX)
  -m, --multi-mode           Set multi-mode

Example: ncsi_netlink -l 2 --info


> > 
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-package enabled on ifindex 2, mask 0x00000001
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 0
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0400
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: pkg 0 ch 0 set as preferred channel
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000003
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 1
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 1 state 0400
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: Package 1 set to all channels disabled
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000000
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel()
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 0
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass pkg whitelist
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 0
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 1
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - next pkg
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 1
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: No channel found to configure!
> > > > npcm7xx-emc f0825000.eth eth2: NCSI interface down
> > > > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > > > npcm7xx-emc f0825000.eth eth2: Wrong NCSI state 0x100 in workqueue
> > > > 
> > > > All masks are set correctly, but you can see the PS column is not right and channel doesn't
> > > > configure correctly.
> > > > 
> > > > /sys/kernel/debug/ncsi_protocol# cat ncsi_device_status
> > > > IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC PS LS RU CR NQ HA
> > > > ===================================================================
> > > >   2   eth2   ncsi0  000 000 1  1  1  1  1  1  1  0  1  1  1  0  1
> > > >   2   eth2   ncsi1  000 001 1  0  1  1  1  1  0  0  1  1  1  0  1
> > > >   2   eth2   ncsi2  001 000 0  0  1  1  0  0  0  0  1  1  1  0  1
> > > >   2   eth2   ncsi3  001 001 0  0  1  1  0  0  0  0  1  1  1  0  1
> > > > ===================================================================
> > > > MP: Multi-mode Package     WP: Whitelist Package
> > > > MC: Multi-mode Channel     WC: Whitelist Channel
> > > > PC: Primary Channel
> > > > PS: Poll Status
> > > > LS: Link Status
> > > > RU: Running
> > > > CR: Carrier OK
> > > > NQ: Queue Stopped
> > > > HA: Hardware Arbitration
> > > > 
> > > > PS column is getting from (int)nc->monitor.enabled.
> 
> 



^ permalink raw reply

* Re: [RFC PATCH v3 04/10] ip: factor out protocol delivery helper
From: Subash Abhinov Kasiviswanathan @ 2018-11-01  6:35 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Willem de Bruijn, Steffen Klassert
In-Reply-To: <06f628363fc53443f30f1d3120c5e844800b7718.1540920083.git.pabeni@redhat.com>

On 2018-10-30 11:24, Paolo Abeni wrote:
> So that we can re-use it at the UDP lavel in a later patch
> 

Hi Paolo

Minor queries -
Should it be "level" instead of "lavel"? Similar comment for the ipv6
patch as well.

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv4/ip_input.c | 73 ++++++++++++++++++++++-----------------------
>  1 file changed, 36 insertions(+), 37 deletions(-)
> 
> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index 35a786c0aaa0..72250b4e466d 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -188,51 +188,50 @@ bool ip_call_ra_chain(struct sk_buff *skb)
>  	return false;
>  }
> 
> -static int ip_local_deliver_finish(struct net *net, struct sock *sk,
> struct sk_buff *skb)
> +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb,
> int protocol)

Would it be better if this function was declared in include/net/ip.h &
include/net/ipv6.h rather than in net/ipv4/udp.c & net/ipv6/udp.c as in
the patch "udp: cope with UDP GRO packet misdirection"

diff --git a/include/net/ip.h b/include/net/ip.h
index 72593e1..3d7fdb4 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -717,4 +717,6 @@ static inline void ip_cmsg_recv(struct msghdr *msg, 
struct sk_buff *skb)
  int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
                                 struct netlink_ext_ack *extack);

+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int 
proto);
+
  #endif /* _IP_H */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 8296505..4d4d2cfe 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1101,4 +1101,8 @@ int ipv6_sock_mc_join_ssm(struct sock *sk, int 
ifindex,
                           const struct in6_addr *addr, unsigned int 
mode);
  int ipv6_sock_mc_drop(struct sock *sk, int ifindex,
                       const struct in6_addr *addr);
+
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int 
nexthdr,
+                              bool have_final);
+
  #endif /* _NET_IPV6_H */

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related

* [PATCH] cw1200: fix small typo
From: Yangtao Li @ 2018-11-01 15:33 UTC (permalink / raw)
  To: pizza, kvalo, davem; +Cc: linux-wireless, netdev, linux-kernel, Yangtao Li

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
---
 drivers/net/wireless/st/cw1200/sta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/st/cw1200/sta.c b/drivers/net/wireless/st/cw1200/sta.c
index 38678e9a0562..8dae92a79fe1 100644
--- a/drivers/net/wireless/st/cw1200/sta.c
+++ b/drivers/net/wireless/st/cw1200/sta.c
@@ -1123,7 +1123,7 @@ int cw1200_setup_mac(struct cw1200_common *priv)
 	 *
 	 * NOTE2: RSSI based reports have been switched to RCPI, since
 	 * FW has a bug and RSSI reported values are not stable,
-	 * what can leads to signal level oscilations in user-end applications
+	 * what can lead to signal level oscilations in user-end applications
 	 */
 	struct wsm_rcpi_rssi_threshold threshold = {
 		.rssiRcpiMode = WSM_RCPI_RSSI_THRESHOLD_ENABLE |
-- 
2.17.0

^ permalink raw reply related

* [PATCH] ath10k: fix some typo
From: Yangtao Li @ 2018-11-01 15:29 UTC (permalink / raw)
  To: kvalo, davem; +Cc: ath10k, linux-wireless, netdev, linux-kernel, Yangtao Li

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
---
 drivers/net/wireless/ath/ath10k/wow.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wow.c b/drivers/net/wireless/ath/ath10k/wow.c
index a6b179f88d36..e834c25a9330 100644
--- a/drivers/net/wireless/ath/ath10k/wow.c
+++ b/drivers/net/wireless/ath/ath10k/wow.c
@@ -135,7 +135,7 @@ static void ath10k_wow_convert_8023_to_80211
 	       &old_hdr_mask->h_proto,
 	       sizeof(old_hdr_mask->h_proto));
 
-	/* Caculate new pkt_offset */
+	/* Calculate new pkt_offset */
 	if (old->pkt_offset < ETH_ALEN)
 		new->pkt_offset = old->pkt_offset +
 			offsetof(struct ieee80211_hdr_3addr, addr1);
@@ -146,7 +146,7 @@ static void ath10k_wow_convert_8023_to_80211
 	else
 		new->pkt_offset = old->pkt_offset + hdr_len + rfc_len - ETH_HLEN;
 
-	/* Caculate new hdr end offset */
+	/* Calculate new hdr end offset */
 	if (total_len > ETH_HLEN)
 		hdr_80211_end_offset = hdr_len + rfc_len;
 	else if (total_len > offsetof(struct ethhdr, h_proto))
-- 
2.17.0

^ permalink raw reply related

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Guenter Roeck @ 2018-11-01 15:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: paul.burton@mips.com, linux-kernel@vger.kernel.org,
	ralf@linux-mips.org, jlayton@kernel.org,
	linuxppc-dev@lists.ozlabs.org, bfields@fieldses.org,
	linux-mips@linux-mips.org, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, anna.schumaker@netapp.com,
	jhogan@kernel.org, netdev@vger.kernel.org, davem@davemloft.net,
	arnd@arndb.de, paulus@samba.org, mpe@ellerman.id.au
In-Reply-To: <d7fe095d8d1f848b5742a5b3e8cce9f89e0c1c8d.camel@hammerspace.com>

On Thu, Nov 01, 2018 at 06:30:08AM +0000, Trond Myklebust wrote:
[ ... ]
> > 
> > For my part I agree that this would be a much better solution. The
> > argument
> > that it is not always absolutely guaranteed that atomics don't wrap
> > doesn't
> > really hold for me because it looks like they all do. On top of that,
> > there
> > is an explicit atomic_dec_if_positive() and
> > atomic_fetch_add_unless(),
> > which to me strongly suggests that they _are_ supposed to wrap.
> > Given the cost of adding a comparison to each atomic operation to
> > prevent it from wrapping, anything else would not really make sense
> > to me.
> 
> That's a hypothesis, not a proven fact. There are architectures out
> there that do not wrap signed integers, hence my question.
> 

If what you say is correct, the kernel is in big trouble on those architectures.
atomic_inc_return() is used all over the place in the kernel with the assumption
that each returned value differs from the previous value (ie the value is used
as cookie, session ID, or for similar purposes).

Guenter

^ permalink raw reply

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Trond Myklebust @ 2018-11-01 15:22 UTC (permalink / raw)
  To: mark.rutland@arm.com, peterz@infradead.org
  Cc: linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	anna.schumaker@netapp.com, jhogan@kernel.org,
	netdev@vger.kernel.org, "davem@dave
In-Reply-To: <20181101145926.GE3178@hirez.programming.kicks-ass.net>

On Thu, 2018-11-01 at 15:59 +0100, Peter Zijlstra wrote:
> On Thu, Nov 01, 2018 at 01:18:46PM +0000, Mark Rutland wrote:
> > > My one question (and the reason why I went with cmpxchg() in the
> > > first
> > > place) would be about the overflow behaviour for
> > > atomic_fetch_inc() and
> > > friends. I believe those functions should be OK on x86, so that
> > > when we
> > > overflow the counter, it behaves like an unsigned value and wraps
> > > back
> > > around.  Is that the case for all architectures?
> > > 
> > > i.e. are atomic_t/atomic64_t always guaranteed to behave like
> > > u32/u64
> > > on increment?
> > > 
> > > I could not find any documentation that explicitly stated that
> > > they
> > > should.
> > 
> > Peter, Will, I understand that the atomic_t/atomic64_t ops are
> > required
> > to wrap per 2's-complement. IIUC the refcount code relies on this.
> > 
> > Can you confirm?
> 
> There is quite a bit of core code that hard assumes 2s-complement.
> Not
> only for atomics but for any signed integer type. Also see the kernel
> using -fno-strict-overflow which implies -fwrapv, which defines
> signed
> overflow to behave like 2s-complement (and rids us of that particular
> UB).

Fair enough, but there have also been bugfixes to explicitly fix unsafe
C standards assumptions for signed integers. See, for instance commit
5a581b367b5d "jiffies: Avoid undefined behavior from signed overflow"
from Paul McKenney.

Anyhow, if the atomic maintainers are willing to stand up and state for
the record that the atomic counters are guaranteed to wrap modulo 2^n
just like unsigned integers, then I'm happy to take Paul's patch.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply

* Re: [PATCH v3 1/2] kretprobe: produce sane stack traces
From: Masami Hiramatsu @ 2018-11-01 15:20 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Steven Rostedt, Shuah Khan, Alexei Starovoitov,
	Daniel Borkmann, Brendan Gregg, Christian Brauner, Aleksa Sarai,
	netdev, linux-doc, linux-kernel
In-Reply-To: <20181101083551.3805-2-cyphar@cyphar.com>

Hi Aleksa,

On Thu,  1 Nov 2018 19:35:50 +1100
Aleksa Sarai <cyphar@cyphar.com> wrote:
[...]
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc

Please split the test case as an independent patch.

> new file mode 100644
> index 000000000000..03146c6a1a3c
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
> @@ -0,0 +1,25 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0+
> +# description: Kretprobe dynamic event with a stacktrace
> +
> +[ -f kprobe_events ] || exit_unsupported # this is configurable
> +
> +echo 0 > events/enable
> +echo 1 > options/stacktrace
> +
> +echo 'r:teststackprobe sched_fork $retval' > kprobe_events
> +grep teststackprobe kprobe_events
> +test -d events/kprobes/teststackprobe

Hmm, what happen if we have 2 or more kretprobes on same stack?
It seems you just save stack in pre_handler, but that stack can already
includes another kretprobe's trampline address...

Thank you,

> +
> +clear_trace
> +echo 1 > events/kprobes/teststackprobe/enable
> +( echo "forked")
> +echo 0 > events/kprobes/teststackprobe/enable
> +
> +# Make sure we don't see kretprobe_trampoline and we see _do_fork.
> +! grep 'kretprobe' trace
> +grep '_do_fork' trace
> +
> +echo '-:teststackprobe' >> kprobe_events
> +clear_trace
> +test -d events/kprobes/teststackprobe && exit_fail || exit_pass
> -- 
> 2.19.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Peter Zijlstra @ 2018-11-01 14:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Trond Myklebust, linux@roeck-us.net, paul.burton@mips.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux-nfs@vger.kernel.org, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, jhogan@kernel.org,
	netdev@vger.kernel.org, davem@davemloft.net,
	"arnd@arndb.de"
In-Reply-To: <20181101131846.biyilr2msonljmij@lakrids.cambridge.arm.com>

On Thu, Nov 01, 2018 at 01:18:46PM +0000, Mark Rutland wrote:
> > My one question (and the reason why I went with cmpxchg() in the first
> > place) would be about the overflow behaviour for atomic_fetch_inc() and
> > friends. I believe those functions should be OK on x86, so that when we
> > overflow the counter, it behaves like an unsigned value and wraps back
> > around.  Is that the case for all architectures?
> > 
> > i.e. are atomic_t/atomic64_t always guaranteed to behave like u32/u64
> > on increment?
> > 
> > I could not find any documentation that explicitly stated that they
> > should.
> 
> Peter, Will, I understand that the atomic_t/atomic64_t ops are required
> to wrap per 2's-complement. IIUC the refcount code relies on this.
> 
> Can you confirm?

There is quite a bit of core code that hard assumes 2s-complement. Not
only for atomics but for any signed integer type. Also see the kernel
using -fno-strict-overflow which implies -fwrapv, which defines signed
overflow to behave like 2s-complement (and rids us of that particular
UB).

^ permalink raw reply

* Re: [PATCH v2 3/3] Bluetooth: hci_qca: Set HCI_QUIRK_USE_BDADDR_PROPERTY for wcn3990
From: Balakrishna Godavarthi @ 2018-11-01 14:37 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg
In-Reply-To: <20181030004415.237101-4-mka@chromium.org>

On 2018-10-30 06:14, Matthias Kaehlcke wrote:
> Set quirk for wcn3990 to read BD_ADDR from a firmware node property.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> Changes in v2:
> - patch added to the series
> 
> tested with https://lore.kernel.org/patchwork/patch/1003830
> ("Bluetooth: hci_qca: Add helper to set device address.")
> ---
>  drivers/bluetooth/hci_qca.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index f036c8f98ea3..0535833caa52 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -1193,6 +1193,7 @@ static int qca_setup(struct hci_uart *hu)
>  		 * setup for every hci up.
>  		 */
>  		set_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks);
> +		set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
>  		hu->hdev->shutdown = qca_power_off;
>  		ret = qca_wcn3990_init(hu);
>  		if (ret)

Tested-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>

-- 
Regards
Balakrishna.

^ permalink raw reply

* Re: [PATCH v2 1/3] Bluetooth: Add quirk for reading BD_ADDR from fwnode property
From: Balakrishna Godavarthi @ 2018-11-01 14:37 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg
In-Reply-To: <20181030004415.237101-2-mka@chromium.org>

On 2018-10-30 06:14, Matthias Kaehlcke wrote:
> Add HCI_QUIRK_USE_BDADDR_PROPERTY to allow controllers to retrieve
> the public Bluetooth address from the firmware node property
> 'local-bd-address'. If quirk is set and the property does not exist
> or is invalid the controller is marked as unconfigured.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
> ---
> Changes in v2:
> - added check for return value of ->setup()
> - only read BD_ADDR from the property if it isn't assigned yet. This
>   is needed to support configuration from user space
> - refactored the branch of the new quirk to get rid of 'bd_addr_set'
> - added 'Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>' 
> tag
> ---
>  include/net/bluetooth/hci.h | 12 ++++++++++
>  net/bluetooth/hci_core.c    | 45 +++++++++++++++++++++++++++++++++++++
>  net/bluetooth/mgmt.c        |  6 +++--
>  3 files changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index c36dc1e20556..fbba43e9bef5 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -158,6 +158,18 @@ enum {
>  	 */
>  	HCI_QUIRK_INVALID_BDADDR,
> 
> +	/* When this quirk is set, the public Bluetooth address
> +	 * initially reported by HCI Read BD Address command
> +	 * is considered invalid. The public BD Address can be
> +	 * specified in the fwnode property 'local-bd-address'.
> +	 * If this property does not exist or is invalid controller
> +	 * configuration is required before this device can be used.
> +	 *
> +	 * This quirk can be set before hci_register_dev is called or
> +	 * during the hdev->setup vendor callback.
> +	 */
> +	HCI_QUIRK_USE_BDADDR_PROPERTY,
> +
>  	/* When this quirk is set, the duplicate filtering during
>  	 * scanning is based on Bluetooth devices addresses. To allow
>  	 * RSSI based updates, restart scanning if needed.
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 7352fe85674b..d4149005a661 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -30,6 +30,7 @@
>  #include <linux/rfkill.h>
>  #include <linux/debugfs.h>
>  #include <linux/crypto.h>
> +#include <linux/property.h>
>  #include <asm/unaligned.h>
> 
>  #include <net/bluetooth/bluetooth.h>
> @@ -1355,6 +1356,36 @@ int hci_inquiry(void __user *arg)
>  	return err;
>  }
> 
> +/**
> + * hci_dev_get_bd_addr_from_property - Get the Bluetooth Device 
> Address
> + *				       (BD_ADDR) for a HCI device from
> + *				       a firmware node property.
> + * @hdev:	The HCI device
> + *
> + * Search the firmware node for 'local-bd-address'.
> + *
> + * All-zero BD addresses are rejected, because those could be 
> properties
> + * that exist in the firmware tables, but were not updated by the 
> firmware. For
> + * example, the DTS could define 'local-bd-address', with zero BD 
> addresses.
> + */
> +static int hci_dev_get_bd_addr_from_property(struct hci_dev *hdev)
> +{
> +	struct fwnode_handle *fwnode = dev_fwnode(hdev->dev.parent);
> +	bdaddr_t ba;
> +	int ret;
> +
> +	ret = fwnode_property_read_u8_array(fwnode, "local-bd-address",
> +					    (u8 *)&ba, sizeof(ba));
> +	if (ret < 0)
> +		return ret;
> +	if (!bacmp(&ba, BDADDR_ANY))
> +		return -ENODATA;
> +
> +	hdev->public_addr = ba;
> +
> +	return 0;
> +}
> +
>  static int hci_dev_do_open(struct hci_dev *hdev)
>  {
>  	int ret = 0;
> @@ -1422,6 +1453,20 @@ static int hci_dev_do_open(struct hci_dev *hdev)
>  		if (hdev->setup)
>  			ret = hdev->setup(hdev);
> 
> +		if (ret)
> +			goto setup_failed;
> +
> +		if (test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) {
> +			if (!bacmp(&hdev->public_addr, BDADDR_ANY))
> +				hci_dev_get_bd_addr_from_property(hdev);
> +
> +			if (!bacmp(&hdev->public_addr, BDADDR_ANY) ||
> +			    !hdev->set_bdaddr ||
> +			    hdev->set_bdaddr(hdev, &hdev->public_addr))
> +				hci_dev_set_flag(hdev, HCI_UNCONFIGURED);
> +		}
> +
> +setup_failed:
>  		/* The transport driver can set these quirks before
>  		 * creating the HCI device or in its setup callback.
>  		 *
> diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
> index ccce954f8146..fae84353d030 100644
> --- a/net/bluetooth/mgmt.c
> +++ b/net/bluetooth/mgmt.c
> @@ -551,7 +551,8 @@ static bool is_configured(struct hci_dev *hdev)
>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>  		return false;
> 
> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>  		return false;
> 
> @@ -566,7 +567,8 @@ static __le32 get_missing_options(struct hci_dev 
> *hdev)
>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>  		options |= MGMT_OPTION_EXTERNAL_CONFIG;
> 
> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>  		options |= MGMT_OPTION_PUBLIC_ADDRESS;

Tested by dts entry and also via btmgnt tool

Tested-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>

-- 
Regards
Balakrishna.

^ permalink raw reply

* Re: [PATCH net v6] net/ipv6: Add anycast addresses to a global hashtable
From: Stephen Hemminger @ 2018-11-01  5:34 UTC (permalink / raw)
  To: Jeff Barnhill; +Cc: netdev, davem, kuznet, yoshfuji
In-Reply-To: <20181101001438.9446-1-0xeffeff@gmail.com>

On Thu,  1 Nov 2018 00:14:38 +0000
Jeff Barnhill <0xeffeff@gmail.com> wrote:

> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index 14b789a123e7..799af1a037d1 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -317,6 +317,8 @@ bool ipv6_chk_acast_addr(struct net *net, struct net_device *dev,
>  			 const struct in6_addr *addr);
>  bool ipv6_chk_acast_addr_src(struct net *net, struct net_device *dev,
>  			     const struct in6_addr *addr);
> +int anycast_init(void);
> +void anycast_cleanup(void);

One minor nit that should be fixed.


To avoid any potential naming conflicts, please prefix all ipv6 global symbols
with ipv6_

^ permalink raw reply

* Re: [RFC 0/2] Delayed binding of UDP sockets for Quic per-connection sockets
From: Christoph Paasch @ 2018-11-01  3:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ian Swett, Leif Hedstrom, Jana Iyengar
In-Reply-To: <0ce864f0-38b9-59cc-18ea-e071afca347d@gmail.com>

On 31/10/18 - 17:53:22, Eric Dumazet wrote:
> On 10/31/2018 04:26 PM, Christoph Paasch wrote:
> > Implementations of Quic might want to create a separate socket for each
> > Quic-connection by creating a connected UDP-socket.
> > 
> 
> Nice proposal, but I doubt a QUIC server can afford having one UDP socket per connection ?
> 
> It would add a huge overhead in term of memory usage in the kernel,
> and lots of epoll events to manage (say a QUIC server with one million flows, receiving
> very few packets per second per flow)
> 
> Maybe you could elaborate on the need of having one UDP socket per connection.

I let Leif chime in on that as the ask came from him. Leif & his team are
implementing Quic in the Apache Traffic Server.


One advantage I can see is that it would allow to benefit from fq_pacing as
one could set sk_pacing_rate simply on the socket. That way there is no need
to implement the pacing in the user-space anymore.


> > To achieve that on the server-side, a "master-socket" needs to wait for
> > incoming new connections and then creates a new socket that will be a
> > connected UDP-socket. To create that latter one, the server needs to
> > first bind() and then connect(). However, after the bind() the server
> > might already receive traffic on that new socket that is unrelated to the
> > Quic-connection at hand. Only after the connect() a full 4-tuple match
> > is happening. So, one can't really create this kind of a server that has
> > a connected UDP-socket per Quic connection.
> > 
> > So, what is needed is an "atomic bind & connect" that basically
> > prevents any incoming traffic until the connect() call has been issued
> > at which point the full 4-tuple is known.
> > 
> > 
> > This patchset implements this functionality and exposes a socket-option
> > to do this.
> > 
> > Usage would be:
> > 
> >         int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
> > 
> >         int val = 1;
> >         setsockopt(fd, SOL_SOCKET, SO_DELAYED_BIND, &val, sizeof(val));
> > 
> >         bind(fd, (struct sockaddr *)&src, sizeof(src));
> > 
> > 	/* At this point, incoming traffic will never match on this socket */
> > 
> >         connect(fd, (struct sockaddr *)&dst, sizeof(dst));
> > 
> > 	/* Only now incoming traffic will reach the socket */
> > 
> > 
> > 
> > There is literally an infinite number of ways on how to implement it,
> > which is why I first send it out as an RFC. With this approach here I
> > chose the least invasive one, just preventing the match on the incoming
> > path.
> > 
> > 
> > The reason for choosing a SOL_SOCKET socket-option and not at the
> > SOL_UDP-level is because that functionality actually could be useful for
> > other protocols as well. E.g., TCP wants to better use the full 4-tuple space
> > by binding to the source-IP and the destination-IP at the same time.
> 
> Passive TCP flows can not benefit from this idea.
> 
> Active TCP flows can already do that, I do not really understand what you are suggesting.

What we had here is that we wanted to let a server initiate more than 64K
connections *while* binding also to a source-IP.
With TCP the bind() would then pick a source-port and we ended up hitting the
64K limit. If we could do an atomic "bind + connect", source-port selection
could ensure that the 4-tuple is unique.

Or has something changed in recent times that allows to use the 4-tuple
matching when doing this with TCP?


Christoph

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox