Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next] bpf: flowlabel in bpf_fib_lookup should be flowinfo
From: David Ahern @ 2018-06-04  1:47 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, borkmann, ast, Michal Kubecek
In-Reply-To: <20180604014118.5nazn4s43443esio@ast-mbp>

On 6/3/18 7:41 PM, Alexei Starovoitov wrote:
> On Sun, Jun 03, 2018 at 08:15:19AM -0700, dsahern@kernel.org wrote:
>> From: David Ahern <dsahern@gmail.com>
>>
>> As Michal noted the flow struct takes both the flow label and priority.
>> Update the bpf_fib_lookup API to note that it is flowinfo and not just
>> the flow label.
>>
>> Cc: Michal Kubecek <mkubecek@suse.cz>
>> Signed-off-by: David Ahern <dsahern@gmail.com>
> 
> Applied, Thanks
> 

I noticed 4.17 was released. Just to make sure we are on the same page,
this patch needs to be 4.18.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: flowlabel in bpf_fib_lookup should be flowinfo
From: Alexei Starovoitov @ 2018-06-04  1:48 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, borkmann, ast, Michal Kubecek
In-Reply-To: <32d91faa-a963-43f5-1dea-8f238abc434c@gmail.com>

On Sun, Jun 03, 2018 at 07:47:11PM -0600, David Ahern wrote:
> On 6/3/18 7:41 PM, Alexei Starovoitov wrote:
> > On Sun, Jun 03, 2018 at 08:15:19AM -0700, dsahern@kernel.org wrote:
> >> From: David Ahern <dsahern@gmail.com>
> >>
> >> As Michal noted the flow struct takes both the flow label and priority.
> >> Update the bpf_fib_lookup API to note that it is flowinfo and not just
> >> the flow label.
> >>
> >> Cc: Michal Kubecek <mkubecek@suse.cz>
> >> Signed-off-by: David Ahern <dsahern@gmail.com>
> > 
> > Applied, Thanks
> > 
> 
> I noticed 4.17 was released. Just to make sure we are on the same page,
> this patch needs to be 4.18.

It was applied to bpf-next obviously.

As soon as we resolve the situation with af_xdp the PR will be sent to Dave
for net-next, so net-next can be sent to Linus.

^ permalink raw reply

* Re: [PATCH 10/18] rhashtable: remove rhashtable_walk_peek()
From: NeilBrown @ 2018-06-04  2:09 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Herbert Xu, Thomas Graf, Linux Kernel Network Developers, LKML,
	Tom Herbert
In-Reply-To: <CALx6S37AYrFoJNrdToRc84J=B4f57jaHS077UsxdEoQ-xwyCeg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4320 bytes --]

On Sun, Jun 03 2018, Tom Herbert wrote:

> On Sun, Jun 3, 2018 at 5:30 PM, NeilBrown <neilb@suse.com> wrote:
>> On Sat, Jun 02 2018, Herbert Xu wrote:
>>
>>> On Fri, Jun 01, 2018 at 02:44:09PM +1000, NeilBrown wrote:
>>>> This function has a somewhat confused behavior that is not properly
>>>> described by the documentation.
>>>> Sometimes is returns the previous object, sometimes it returns the
>>>> next one.
>>>> Sometimes it changes the iterator, sometimes it doesn't.
>>>>
>>>> This function is not currently used and is not worth keeping, so
>>>> remove it.
>>>>
>>>> A future patch will introduce a new function with a
>>>> simpler interface which can meet the same need that
>>>> this was added for.
>>>>
>>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>>
>>> Please keep Tom Herbert in the loop.  IIRC he had an issue with
>>> this patch.
>>
>> Yes you are right - sorry for forgetting to add Tom.
>>
>> My understanding of where this issue stands is that Tom raised issue and
>> asked for clarification, I replied, nothing further happened.
>>
>> It summary, my position is that:
>> - most users of my new rhashtable_walk_prev() will use it like
>>    rhasthable_talk_prev() ?: rhashtable_walk_next()
>>   which is close to what rhashtable_walk_peek() does
>> - I know of no use-case that could not be solved if we only had
>>   the combined operation
>> - BUT it is hard to document the combined operation, as it really
>>   does two things.  If it is hard to document, then it might be
>>   hard to understand.
>>
>> So provide the most understandable/maintainable solution, I think
>> we should provide rhashtable_walk_prev() as a separate interface.
>>
> I'm still missing why requiring two API operations instead of one is
> simpler or easier to document. Also, I disagree that
> rhashtable_walk_peek does two things-- it just does one which is to
> return the current element in the walk without advancing to the next
> one. The fact that the iterator may or may not move is immaterial in
> the API, that is an implementation detail. In fact, it's conceivable
> that we might completely reimplement this someday such that the
> iterator works completely differently implementation semantics but the
> API doesn't change. Also the naming in your proposal is confusing,
> we'd have operations to get the previous, and the next next object--
> so the user may ask where's the API to get the current object in the
> walk? The idea that we get it by first trying to get the previous
> object, and then if that fails getting the next object seems
> counterintuitive.

To respond to your points out of order:

- I accept that "rhashtable_walk_prev" is not a perfect name.  It
  suggests a stronger symmetry with rhasthable_walk_next than actually
  exist.  I cannot think of a better name, but I think the
  description "Return the previously returned object if it is
  still in the table" is clear and simple and explains the name.
  I'm certainly open to suggestions for a better name.

- I don't think it is meaningful to talk about a "current" element in a
  table where asynchronous insert/remove is to be expected.
  The best we can hope for is a "current location" is the sequence of
  objects in the table - a location which is after some objects and
  before all others.  rhashtable_walk_next() returns the next object
  after the current location, and advances the location pointer past
  that object.
  rhashtable_walk_prev() *doesn't* return the previous object in the
  table.  It returns the previously returned object. ("previous" in
  time, but not in space, if you like).

- rhashtable_walk_peek() currently does one of two different things.
  It either returns the previously returned object (iter->p) if that
  is still in the table, or it find the next object, steps over it, and
  returns it.

- I would like to suggest that when an API acts on a iterator object,
  the question of whether or not the iterator is advanced *must* be a
  fundamental question, not one that might change from time to time.

Maybe a useful way forward would be for you to write documentation for
the rhashtable_walk_peek() interface which correctly describes what it
does and how it is used.  Given that, I can implement that interface
with the stability improvements that I'm working on.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH net] net: ipv6: prevent use after free in ip6_route_mpath_notify()
From: Eric Dumazet @ 2018-06-04  2:39 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, David S . Miller; +Cc: netdev
In-Reply-To: <26bd52c0-1218-3c65-95ca-b2a908466841@cumulusnetworks.com>

On 06/03/2018 07:46 AM, David Ahern wrote:

> It was a mistake to set rt_last before checking err. So the
> use-after-free exposed the semantic error.
> 

SGTM, please send the formal patch then, thanks !

^ permalink raw reply

* [PATCH 15a/18] rhashtables: add lockdep tracking to bucket bit-spin-locks.
From: NeilBrown @ 2018-06-04  2:52 UTC (permalink / raw)
  To: Eric Dumazet, Herbert Xu
  Cc: Thomas Graf, netdev, linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <871sdnqty4.fsf@notabene.neil.brown.name>

[-- Attachment #1: Type: text/plain, Size: 6838 bytes --]


Native bit_spin_locks are not tracked by lockdep.

The bit_spin_locks used for rhashtable buckets are local
to the rhashtable implementation, so there is little opportunity
for the sort of misuse that lockdep might detect.
However locks are held while a hash function or compare
function is called, and if one of these took a lock,
a misbehaviour is possible.

As it is quite easy to add lockdep support this unlikely
possibility see to be enough justification.

So create a lockdep class for bucket bit_spin_lock as attach
through a lockdep_map in each bucket_table.

With the 'nested' annotation in rhashtable_rehash_one(), lockdep
correctly reports a possible problem as this lock it taken
while another bucket lock (in another table) is held.  This
confirms that the added support works.
With the correct nested annotation in place, lockdep reports
no problems.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/rhashtable.h | 40 +++++++++++++++++++++++++++-------------
 lib/rhashtable.c           | 17 ++++++++++-------
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 42b5d037ad2e..b12c2cd30a56 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -70,6 +70,8 @@ struct bucket_table {
 
 	struct bucket_table __rcu *future_tbl;
 
+	struct lockdep_map	dep_map;
+
 	struct rhash_head __rcu *buckets[] ____cacheline_aligned_in_smp;
 };
 
@@ -90,20 +92,32 @@ struct bucket_table {
  * will have provided that.
  */
 
-static inline void rht_lock(struct rhash_head **bucket)
+static inline void rht_lock(struct bucket_table *tbl, struct rhash_head **bucket)
+{
+	local_bh_disable();
+	bit_spin_lock(1, (unsigned long *)bucket);
+	lock_map_acquire(&tbl->dep_map);
+}
+
+static inline void rht_lock_nested(struct bucket_table *tbl,
+				   struct rhash_head **bucket,
+				   unsigned int subclass)
 {
 	local_bh_disable();
 	bit_spin_lock(1, (unsigned long *)bucket);
+	lock_acquire_exclusive(&tbl->dep_map, subclass, 0, NULL, _THIS_IP_);
 }
 
-static inline void rht_unlock(struct rhash_head **bucket)
+static inline void rht_unlock(struct bucket_table *tbl, struct rhash_head **bucket)
 {
+	lock_map_release(&tbl->dep_map);
 	bit_spin_unlock(1, (unsigned long *)bucket);
 	local_bh_enable();
 }
 
-static inline void rht_unlocked(void)
+static inline void rht_unlocked(struct bucket_table *tbl)
 {
+	lock_map_release(&tbl->dep_map);
 	preempt_enable();
 	__release(bitlock);
 	local_bh_enable();
@@ -742,11 +756,11 @@ static inline void *__rhashtable_insert_fast(
 	if (!headp)
 		goto out;
 	lock = pprev = headp;
-	rht_lock(lock);
+	rht_lock(tbl, lock);
 
 	if (unlikely(rcu_access_pointer(tbl->future_tbl))) {
 slow_path:
-		rht_unlock(lock);
+		rht_unlock(tbl, lock);
 		rcu_read_unlock();
 		return rhashtable_insert_slow(ht, key, obj);
 	}
@@ -825,10 +839,10 @@ static inline void *__rhashtable_insert_fast(
 		/* Assigning to *headp unlocked the chain, so we
 		 * don't need to do it again.
 		 */
-		rht_unlocked();
+		rht_unlocked(tbl);
 	} else {
 out:
-		rht_unlock(lock);
+		rht_unlock(tbl, lock);
 	}
 	rcu_read_unlock();
 
@@ -1033,7 +1047,7 @@ static inline int __rhashtable_remove_fast_one(
 	if (!pprev)
 		return -ENOENT;
 	lock = pprev;
-	rht_lock(lock);
+	rht_lock(tbl, lock);
 
 	rht_for_each_from(he, rht_ptr(*pprev), tbl, hash) {
 		struct rhlist_head *list;
@@ -1078,13 +1092,13 @@ static inline int __rhashtable_remove_fast_one(
 		rcu_assign_pointer(*pprev, obj);
 		if (lock == pprev) {
 			/* That rcu_assign_pointer() unlocked the chain */
-			rht_unlocked();
+			rht_unlocked(tbl);
 			goto unlocked;
 		}
 		break;
 	}
 
-	rht_unlock(lock);
+	rht_unlock(tbl, lock);
 unlocked:
 	if (err > 0) {
 		if (params.percpu_count)
@@ -1195,7 +1209,7 @@ static inline int __rhashtable_replace_fast(
 		return -ENOENT;
 
 	lock = pprev;
-	rht_lock(lock);
+	rht_lock(tbl, lock);
 
 	rht_for_each_from(he, rht_ptr(*pprev), tbl, hash) {
 		if (he != obj_old) {
@@ -1208,13 +1222,13 @@ static inline int __rhashtable_replace_fast(
 		err = 0;
 		if (pprev == lock) {
 			/* We just unlocked the chain by assigning to *pprev */
-			rht_unlocked();
+			rht_unlocked(tbl);
 			goto unlocked;
 		}
 		break;
 	}
 
-	rht_unlock(lock);
+	rht_unlock(tbl, lock);
 
 unlocked:
 	return err;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index aa1e7be8fc5b..1fc235f5d04e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -174,6 +174,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	struct bucket_table *tbl = NULL;
 	size_t size;
 	int i;
+	static struct lock_class_key __key;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
 	if (gfp != GFP_KERNEL)
@@ -190,6 +191,8 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	if (tbl == NULL)
 		return NULL;
 
+	lockdep_init_map(&tbl->dep_map, "rhashtable_bucket", &__key, 0);
+
 	tbl->size = size;
 
 	INIT_LIST_HEAD(&tbl->walkers);
@@ -249,7 +252,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht,
 
 	inspos = &new_tbl->buckets[new_hash];
 	lock = inspos;
-	rht_lock(lock);
+	rht_lock_nested(new_tbl, lock, SINGLE_DEPTH_NESTING);
 
 	head = rht_ptr(rht_dereference_bucket(*inspos, new_tbl, new_hash));
 	while (!rht_is_a_nulls(head) && head < entry) {
@@ -260,9 +263,9 @@ static int rhashtable_rehash_one(struct rhashtable *ht,
 
 	rcu_assign_pointer(*inspos, entry);
 	if (inspos != lock)
-		rht_unlock(lock);
+		rht_unlock(new_tbl, lock);
 	else
-		rht_unlocked();
+		rht_unlocked(new_tbl);
 
 	/* Need to preserved the bit lock. */
 	if (rht_is_locked(*pprev))
@@ -283,14 +286,14 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,
 
 	if (!pprev)
 		return 0;
-	rht_lock(pprev);
+	rht_lock(old_tbl, pprev);
 
 	while (!(err = rhashtable_rehash_one(ht, pprev, old_hash)))
 		;
 
 	if (err == -ENOENT)
 		err = 0;
-	rht_unlock(pprev);
+	rht_unlock(old_tbl, pprev);
 
 	return err;
 }
@@ -658,7 +661,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 			new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 			data = ERR_PTR(-EAGAIN);
 		} else {
-			rht_lock(pprev);
+			rht_lock(tbl, pprev);
 			data = rhashtable_lookup_one(ht, pprev, tbl,
 						     hash, key, obj);
 			new_tbl = rhashtable_insert_one(ht, pprev, tbl,
@@ -666,7 +669,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
 			if (PTR_ERR(new_tbl) != -EEXIST)
 				data = ERR_CAST(new_tbl);
 
-			rht_unlock(pprev);
+			rht_unlock(tbl, pprev);
 		}
 	} while (!IS_ERR_OR_NULL(new_tbl));
 
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related

* Re: [PATCH 04/18] rhashtable: detect when object movement might have invalidated a lookup
From: NeilBrown @ 2018-06-04  3:38 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Thomas Graf, netdev, linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <20180601160613.7ud25g2ux55k3bma@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

On Sat, Jun 02 2018, Herbert Xu wrote:

> On Fri, Jun 01, 2018 at 02:44:09PM +1000, NeilBrown wrote:
>> Some users of rhashtable might need to change the key
>> of an object and move it to a different location in the table.
>> Other users might want to allocate objects using
>> SLAB_TYPESAFE_BY_RCU which can result in the same memory allocation
>> being used for a different (type-compatible) purpose and similarly
>> end up in a different hash-chain.
>>
>> To support these, we store a unique NULLS_MARKER at the end of
>> each chain, and when a search fails to find a match, we check
>> if the NULLS marker found was the expected one.  If not,
>> the search is repeated.
>> 
>> The unique NULLS_MARKER is derived from the address of the
>> head of the chain.
>
> Yes I thinks makes a lot more sense than the existing rhashtable
> nulls code.  The current rhashtable nulls code harkens back to the
> time of the old rhashtable implementation where the same chain
> existed in two different tables and that is no longer the case.
>
>> If an object is removed and re-added to the same hash chain, we won't
>> notice by looking that the NULLS marker.  In this case we must be sure
>
> This is not currently required by TCP/UDP.  I'd rather not add
> extra constraints that aren't actually used.

I assume you are referring here to the change to insert at the head of
the chain, rather than at the end.
This is important for me to be able to use SLAB_TYPESAFE_BY_RCU
for the kmem_cache.  Without that, I need a new rcu-free callback
for every different kmem_cache.
When using SLAB_TYPESAFE_BY_RCU, there is no way to know where
a freed and re-allocated object might be used again - it could
be used in the same table with the same hash value.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH] NFC: fix attrs checks in netlink interface
From: Samuel Ortiz @ 2018-06-04  4:29 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: David S . Miller, linux-wireless, netdev, linux-kernel,
	Dmitry Vyukov, Kostya Serebryany
In-Reply-To: <75ce3040b4086ffa2d2e088ad7f24f5e4a87be56.1523552145.git.andreyknvl@google.com>

Hi Andrey,

On Thu, Apr 12, 2018 at 06:56:56PM +0200, Andrey Konovalov wrote:
> nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
> attribute being present, but doesn't check whether it is actually
> provided by the user. Same goes for nfc_genl_fw_download() and
> NFC_ATTR_FIRMWARE_NAME.
> 
> This patch adds appropriate checks.
> 
> Found with syzkaller.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  net/nfc/netlink.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
Thanks, applied to nfc-next.

Cheers,
Samuel.

^ permalink raw reply

* Re: ANNOUNCE: Enhanced IP v1.4
From: Willy Tarreau @ 2018-06-04  4:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, Sam Patton, Linux Kernel Network Developers
In-Reply-To: <b70e6448-8d34-1d18-0ec0-10e803f489b5@gmail.com>

On Sun, Jun 03, 2018 at 03:41:08PM -0700, Eric Dumazet wrote:
> 
> 
> On 06/03/2018 01:37 PM, Tom Herbert wrote:
> 
> > This is not an inconsequential mechanism that is being proposed. It's
> > a modification to IP protocol that is intended to work on the
> > Internet, but it looks like the draft hasn't been updated for two
> > years and it is not adopted by any IETF working group. I don't see how
> > this can go anywhere without IETF support. Also, I suggest that you
> > look at the IPv10 proposal since that was very similar in intent. One
> > of the reasons that IPv10 shot down was because protocol transition
> > mechanisms were more interesting ten years ago than today. IPv6 has
> > good traction now. In fact, it's probably the case that it's now
> > easier to bring up IPv6 than to try to make IPv4 options work over the
> > Internet.
> 
> +1
> 
> Many hosts do not use IPv4 anymore.
> 
> We even have the project making IPv4 support in linux optional.

I agree on these points, but I'd like to figure what can be done to put
a bit more pressure on ISPs to *always* provide IPv6. It's still very
hard to have decent connectivity at home and without this it will continue
to be marginalize.

I do have IPv6 at home (a /48, waste of addressing space, I'd be fine
with less), there's none at work (I don't even know if the ISP supports
it, at least it was never ever mentioned so probably they don't know
about this), and some ISPs only provide a /64 which is as ridiculous
as providing a single address as it forces the end user to NAT thus
breaking the end-to-end principle. Ideally with IoT at the door, every
home connection should have at least a /60 and enterprises should have
a /56, and this by default, without having to request anything.

Maybe setting up a public list of ISPs where users don't have at least
a /60 by default could help, but I suspect that most of them will
consider that as long as their competitors are on the list there's no
emergency.

Willy

^ permalink raw reply

* Re: KASAN: slab-out-of-bounds Read in bpf_csum_update
From: Dmitry Vyukov @ 2018-06-04  5:36 UTC (permalink / raw)
  To: syzbot
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <000000000000b2a7ea056dc54779@google.com>

On Mon, Jun 4, 2018 at 1:36 AM, syzbot
<syzbot+efae31b384d5badbd620@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    0512e0134582 Merge tag 'xfs-4.17-fixes-3' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=17eb2d7b800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=968b0b23c7854c0b
> dashboard link: https://syzkaller.appspot.com/bug?extid=efae31b384d5badbd620
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=162c6def800000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14fe3db7800000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+efae31b384d5badbd620@syzkaller.appspotmail.com
>
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> random: sshd: uninitialized urandom read (32 bytes read)
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in ____bpf_csum_update net/core/filter.c:1679
> [inline]
> BUG: KASAN: slab-out-of-bounds in bpf_csum_update+0xb4/0xc0
> net/core/filter.c:1673
> Read of size 1 at addr ffff8801d9235b50 by task syz-executor507/4513
>
> CPU: 0 PID: 4513 Comm: syz-executor507 Not tainted 4.17.0-rc7+ #78
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
>  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
>  ____bpf_csum_update net/core/filter.c:1679 [inline]
>  bpf_csum_update+0xb4/0xc0 net/core/filter.c:1673

/\/\/\/\/\

Are there any known bugs with unwind through bpf functions?

> Allocated by task 0:
> (stack is not available)
>
> Freed by task 0:
> (stack is not available)
>
> The buggy address belongs to the object at ffff8801d9235a40
>  which belongs to the cache skbuff_head_cache of size 232
> The buggy address is located 40 bytes to the right of
>  232-byte region [ffff8801d9235a40, ffff8801d9235b28)
> The buggy address belongs to the page:
> page:ffffea0007648d40 count:1 mapcount:0 mapping:ffff8801d9235040 index:0x0
> flags: 0x2fffc0000000100(slab)
> raw: 02fffc0000000100 ffff8801d9235040 0000000000000000 000000010000000c
> raw: ffffea00074360a0 ffff8801d944d848 ffff8801d9bdd6c0 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffff8801d9235a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8801d9235a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>
>> ffff8801d9235b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
>                                                  ^
>  ffff8801d9235b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8801d9235c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/000000000000b2a7ea056dc54779%40google.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* 答复: ANNOUNCE: Enhanced IP v1.4
From: PKU.孙斌 @ 2018-06-04  5:58 UTC (permalink / raw)
  To: 'Willy Tarreau', 'Eric Dumazet'
  Cc: 'Linux Kernel Network Developers'
In-Reply-To: <20180604043426.GB11775@1wt.eu>

On Sun, Jun 03, 2018 at 03:41:08PM -0700, Eric Dumazet wrote:
> 
> 
> On 06/03/2018 01:37 PM, Tom Herbert wrote:
> 
> > This is not an inconsequential mechanism that is being proposed. It's
> > a modification to IP protocol that is intended to work on the
> > Internet, but it looks like the draft hasn't been updated for two
> > years and it is not adopted by any IETF working group. I don't see how
> > this can go anywhere without IETF support. Also, I suggest that you
> > look at the IPv10 proposal since that was very similar in intent. One
> > of the reasons that IPv10 shot down was because protocol transition
> > mechanisms were more interesting ten years ago than today. IPv6 has
> > good traction now. In fact, it's probably the case that it's now
> > easier to bring up IPv6 than to try to make IPv4 options work over the
> > Internet.
> 
> +1
> 
> Many hosts do not use IPv4 anymore.
> 
> We even have the project making IPv4 support in linux optional.

I guess then Linux kernel wouldn't be able to boot itself without IPv4 built in, e.g., when we only have old L2 links (without the IPv6 frame type)...

^ permalink raw reply

* [PATCH RFC ipsec-next 1/3] flow: Extend flow informations with xfrm interface id.
From: Steffen Klassert @ 2018-06-04  6:09 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
	Lorenzo Colitti, Shannon Nelson
In-Reply-To: <20180604060910.13896-1-steffen.klassert@secunet.com>

Add a new flowi_xfrm structure with informations needed to do
a xfrm lookup. At the moment it keeps the informations about
the xfrm interface id.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
---
 include/net/flow.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 8ce21793094e..187c9bef672f 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -26,6 +26,10 @@ struct flowi_tunnel {
 	__be64			tun_id;
 };
 
+struct flowi_xfrm {
+	__u32			if_id;
+};
+
 struct flowi_common {
 	int	flowic_oif;
 	int	flowic_iif;
@@ -39,6 +43,7 @@ struct flowi_common {
 #define FLOWI_FLAG_SKIP_NH_OIF		0x04
 	__u32	flowic_secid;
 	struct flowi_tunnel flowic_tun_key;
+	struct flowi_xfrm xfrm;
 	kuid_t  flowic_uid;
 };
 
@@ -78,6 +83,7 @@ struct flowi4 {
 #define flowi4_secid		__fl_common.flowic_secid
 #define flowi4_tun_key		__fl_common.flowic_tun_key
 #define flowi4_uid		__fl_common.flowic_uid
+#define flowi4_xfrm		__fl_common.xfrm
 
 	/* (saddr,daddr) must be grouped, same order as in IP header */
 	__be32			saddr;
@@ -109,6 +115,7 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
 	fl4->flowi4_flags = flags;
 	fl4->flowi4_secid = 0;
 	fl4->flowi4_tun_key.tun_id = 0;
+	fl4->flowi4_xfrm.if_id = 0;
 	fl4->flowi4_uid = uid;
 	fl4->daddr = daddr;
 	fl4->saddr = saddr;
@@ -138,6 +145,7 @@ struct flowi6 {
 #define flowi6_secid		__fl_common.flowic_secid
 #define flowi6_tun_key		__fl_common.flowic_tun_key
 #define flowi6_uid		__fl_common.flowic_uid
+#define flowi6_xfrm		__fl_common.xfrm
 	struct in6_addr		daddr;
 	struct in6_addr		saddr;
 	/* Note: flowi6_tos is encoded in flowlabel, too. */
@@ -185,6 +193,7 @@ struct flowi {
 #define flowi_secid	u.__fl_common.flowic_secid
 #define flowi_tun_key	u.__fl_common.flowic_tun_key
 #define flowi_uid	u.__fl_common.flowic_uid
+#define flowi_xfrm	u.__fl_common.xfrm
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
-- 
2.14.1

^ permalink raw reply related

* [PATCH RFC ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-04  6:09 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
	Lorenzo Colitti, Shannon Nelson

This patchset introduces new virtual xfrm interfaces.
The design of virtual xfrm interfaces interfaces was
discussed at the Linux IPsec workshop 2018. This patchset
implements these interfaces as the IPsec userspace and
kernel developers agreed. The purpose of these interfaces
is to overcome the design limitations that the existing
VTI devices have.

We had two presentations about xfrm interfaces at
the workshop. Slides with further informations
can be found at the workshop homepage:

https://workshop.linux-ipsec.org/2018/

^ permalink raw reply

* [PATCH RFC ipsec-next 2/3] xfrm: Add a new lookup key to match xfrm interfaces.
From: Steffen Klassert @ 2018-06-04  6:09 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
	Lorenzo Colitti, Shannon Nelson
In-Reply-To: <20180604060910.13896-1-steffen.klassert@secunet.com>

This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
---
 include/net/xfrm.h        | 21 ++++++++++++++-----
 include/uapi/linux/xfrm.h |  1 +
 net/core/pktgen.c         |  2 +-
 net/key/af_key.c          |  6 +++---
 net/xfrm/xfrm_policy.c    | 18 +++++++++++-----
 net/xfrm/xfrm_state.c     | 19 ++++++++++++-----
 net/xfrm/xfrm_user.c      | 53 ++++++++++++++++++++++++++++++++++++++++++-----
 7 files changed, 96 insertions(+), 24 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 45e75c36b738..26df8d6d507c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -147,6 +147,7 @@ struct xfrm_state {
 	struct xfrm_id		id;
 	struct xfrm_selector	sel;
 	struct xfrm_mark	mark;
+	u32			if_id;
 	u32			tfcpad;
 
 	u32			genid;
@@ -574,6 +575,7 @@ struct xfrm_policy {
 	atomic_t		genid;
 	u32			priority;
 	u32			index;
+	u32			if_id;
 	struct xfrm_mark	mark;
 	struct xfrm_selector	selector;
 	struct xfrm_lifetime_cfg lft;
@@ -1533,7 +1535,7 @@ struct xfrm_state *xfrm_state_find(const xfrm_address_t *daddr,
 				   struct xfrm_tmpl *tmpl,
 				   struct xfrm_policy *pol, int *err,
 				   unsigned short family);
-struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark,
+struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark, u32 if_id,
 				       xfrm_address_t *daddr,
 				       xfrm_address_t *saddr,
 				       unsigned short family,
@@ -1690,20 +1692,20 @@ int xfrm_policy_walk(struct net *net, struct xfrm_policy_walk *walk,
 		     void *);
 void xfrm_policy_walk_done(struct xfrm_policy_walk *walk, struct net *net);
 int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl);
-struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
+struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u32 if_id,
 					  u8 type, int dir,
 					  struct xfrm_selector *sel,
 					  struct xfrm_sec_ctx *ctx, int delete,
 					  int *err);
-struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
-				     u32 id, int delete, int *err);
+struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u32 if_id, u8,
+				     int dir, u32 id, int delete, int *err);
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
 void xfrm_policy_hash_rebuild(struct net *net);
 u32 xfrm_get_acqseq(void);
 int verify_spi_info(u8 proto, u32 min, u32 max);
 int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
 struct xfrm_state *xfrm_find_acq(struct net *net, const struct xfrm_mark *mark,
-				 u8 mode, u32 reqid, u8 proto,
+				 u8 mode, u32 reqid, u32 if_id, u8 proto,
 				 const xfrm_address_t *daddr,
 				 const xfrm_address_t *saddr, int create,
 				 unsigned short family);
@@ -2012,6 +2014,15 @@ static inline int xfrm_mark_put(struct sk_buff *skb, const struct xfrm_mark *m)
 	return ret;
 }
 
+static inline int xfrm_if_id_put(struct sk_buff *skb, __u32 if_id)
+{
+	int ret = 0;
+
+	if (if_id)
+		ret = nla_put_u32(skb, XFRMA_IF_ID, if_id);
+	return ret;
+}
+
 static inline int xfrm_tunnel_check(struct sk_buff *skb, struct xfrm_state *x,
 				    unsigned int family)
 {
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index e3af2859188b..690df6e7580b 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -306,6 +306,7 @@ enum xfrm_attr_type_t {
 	XFRMA_PAD,
 	XFRMA_OFFLOAD_DEV,	/* struct xfrm_state_offload */
 	XFRMA_OUTPUT_MARK,	/* __u32 */
+	XFRMA_IF_ID,		/* __u32 */
 	__XFRMA_MAX
 
 #define XFRMA_MAX (__XFRMA_MAX - 1)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 7e4ede34cc52..a1b2d769fa36 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2255,7 +2255,7 @@ static void get_ipsec_sa(struct pktgen_dev *pkt_dev, int flow)
 			x = xfrm_state_lookup_byspi(pn->net, htonl(pkt_dev->spi), AF_INET);
 		} else {
 			/* slow path: we dont already have xfrm_state */
-			x = xfrm_stateonly_find(pn->net, DUMMY_MARK,
+			x = xfrm_stateonly_find(pn->net, DUMMY_MARK, 0,
 						(xfrm_address_t *)&pkt_dev->cur_daddr,
 						(xfrm_address_t *)&pkt_dev->cur_saddr,
 						AF_INET,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index e62e52e8f141..bf38b358cf38 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1383,7 +1383,7 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_
 	}
 
 	if (!x)
-		x = xfrm_find_acq(net, &dummy_mark, mode, reqid, proto, xdaddr, xsaddr, 1, family);
+		x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, proto, xdaddr, xsaddr, 1, family);
 
 	if (x == NULL)
 		return -ENOENT;
@@ -2414,7 +2414,7 @@ static int pfkey_spddelete(struct sock *sk, struct sk_buff *skb, const struct sa
 			return err;
 	}
 
-	xp = xfrm_policy_bysel_ctx(net, DUMMY_MARK, XFRM_POLICY_TYPE_MAIN,
+	xp = xfrm_policy_bysel_ctx(net, DUMMY_MARK, 0, XFRM_POLICY_TYPE_MAIN,
 				   pol->sadb_x_policy_dir - 1, &sel, pol_ctx,
 				   1, &err);
 	security_xfrm_policy_free(pol_ctx);
@@ -2663,7 +2663,7 @@ static int pfkey_spdget(struct sock *sk, struct sk_buff *skb, const struct sadb_
 		return -EINVAL;
 
 	delete = (hdr->sadb_msg_type == SADB_X_SPDDELETE2);
-	xp = xfrm_policy_byid(net, DUMMY_MARK, XFRM_POLICY_TYPE_MAIN,
+	xp = xfrm_policy_byid(net, DUMMY_MARK, 0, XFRM_POLICY_TYPE_MAIN,
 			      dir, pol->sadb_x_policy_id, delete, &err);
 	if (xp == NULL)
 		return -ENOENT;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 40b54cc64243..08b745aade43 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -747,6 +747,7 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 	newpos = NULL;
 	hlist_for_each_entry(pol, chain, bydst) {
 		if (pol->type == policy->type &&
+		    pol->if_id == policy->if_id &&
 		    !selector_cmp(&pol->selector, &policy->selector) &&
 		    xfrm_policy_mark_match(policy, pol) &&
 		    xfrm_sec_ctx_match(pol->security, policy->security) &&
@@ -798,8 +799,9 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 }
 EXPORT_SYMBOL(xfrm_policy_insert);
 
-struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
-					  int dir, struct xfrm_selector *sel,
+struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u32 if_id,
+					  u8 type, int dir,
+					  struct xfrm_selector *sel,
 					  struct xfrm_sec_ctx *ctx, int delete,
 					  int *err)
 {
@@ -812,6 +814,7 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
 	ret = NULL;
 	hlist_for_each_entry(pol, chain, bydst) {
 		if (pol->type == type &&
+		    pol->if_id == if_id &&
 		    (mark & pol->mark.m) == pol->mark.v &&
 		    !selector_cmp(sel, &pol->selector) &&
 		    xfrm_sec_ctx_match(ctx, pol->security)) {
@@ -837,8 +840,9 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
 }
 EXPORT_SYMBOL(xfrm_policy_bysel_ctx);
 
-struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8 type,
-				     int dir, u32 id, int delete, int *err)
+struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u32 if_id,
+				     u8 type, int dir, u32 id, int delete,
+				     int *err)
 {
 	struct xfrm_policy *pol, *ret;
 	struct hlist_head *chain;
@@ -853,6 +857,7 @@ struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8 type,
 	ret = NULL;
 	hlist_for_each_entry(pol, chain, byidx) {
 		if (pol->type == type && pol->index == id &&
+		    pol->if_id == if_id &&
 		    (mark & pol->mark.m) == pol->mark.v) {
 			xfrm_pol_hold(pol);
 			if (delete) {
@@ -1063,6 +1068,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 	bool match;
 
 	if (pol->family != family ||
+	    pol->if_id != fl->flowi_xfrm.if_id ||
 	    (fl->flowi_mark & pol->mark.m) != pol->mark.v ||
 	    pol->type != type)
 		return ret;
@@ -1177,7 +1183,8 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir,
 
 		match = xfrm_selector_match(&pol->selector, fl, family);
 		if (match) {
-			if ((sk->sk_mark & pol->mark.m) != pol->mark.v) {
+			if ((sk->sk_mark & pol->mark.m) != pol->mark.v ||
+			    pol->if_id != fl->flowi_xfrm.if_id) {
 				pol = NULL;
 				goto out;
 			}
@@ -1305,6 +1312,7 @@ static struct xfrm_policy *clone_policy(const struct xfrm_policy *old, int dir)
 		newp->lft = old->lft;
 		newp->curlft = old->curlft;
 		newp->mark = old->mark;
+		newp->if_id = old->if_id;
 		newp->action = old->action;
 		newp->flags = old->flags;
 		newp->xfrm_nr = old->xfrm_nr;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8308281f3253..3803b6813fc5 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -941,6 +941,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	int error = 0;
 	struct xfrm_state *best = NULL;
 	u32 mark = pol->mark.v & pol->mark.m;
+	u32 if_id = fl->flowi_xfrm.if_id;
 	unsigned short encap_family = tmpl->encap_family;
 	unsigned int sequence;
 	struct km_event c;
@@ -955,6 +956,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		if (x->props.family == encap_family &&
 		    x->props.reqid == tmpl->reqid &&
 		    (mark & x->mark.m) == x->mark.v &&
+		    x->if_id == if_id &&
 		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
 		    xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
 		    tmpl->mode == x->props.mode &&
@@ -971,6 +973,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		if (x->props.family == encap_family &&
 		    x->props.reqid == tmpl->reqid &&
 		    (mark & x->mark.m) == x->mark.v &&
+		    x->if_id == if_id &&
 		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
 		    xfrm_addr_equal(&x->id.daddr, daddr, encap_family) &&
 		    tmpl->mode == x->props.mode &&
@@ -1010,6 +1013,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		 * to current session. */
 		xfrm_init_tempstate(x, fl, tmpl, daddr, saddr, family);
 		memcpy(&x->mark, &pol->mark, sizeof(x->mark));
+		x->if_id = if_id;
 
 		error = security_xfrm_state_alloc_acquire(x, pol->security, fl->flowi_secid);
 		if (error) {
@@ -1067,7 +1071,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 }
 
 struct xfrm_state *
-xfrm_stateonly_find(struct net *net, u32 mark,
+xfrm_stateonly_find(struct net *net, u32 mark, u32 if_id,
 		    xfrm_address_t *daddr, xfrm_address_t *saddr,
 		    unsigned short family, u8 mode, u8 proto, u32 reqid)
 {
@@ -1080,6 +1084,7 @@ xfrm_stateonly_find(struct net *net, u32 mark,
 		if (x->props.family == family &&
 		    x->props.reqid == reqid &&
 		    (mark & x->mark.m) == x->mark.v &&
+		    x->if_id == if_id &&
 		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
 		    xfrm_state_addr_check(x, daddr, saddr, family) &&
 		    mode == x->props.mode &&
@@ -1160,11 +1165,13 @@ static void __xfrm_state_bump_genids(struct xfrm_state *xnew)
 	struct xfrm_state *x;
 	unsigned int h;
 	u32 mark = xnew->mark.v & xnew->mark.m;
+	u32 if_id = xnew->if_id;
 
 	h = xfrm_dst_hash(net, &xnew->id.daddr, &xnew->props.saddr, reqid, family);
 	hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) {
 		if (x->props.family	== family &&
 		    x->props.reqid	== reqid &&
+		    x->if_id		== if_id &&
 		    (mark & x->mark.m) == x->mark.v &&
 		    xfrm_addr_equal(&x->id.daddr, &xnew->id.daddr, family) &&
 		    xfrm_addr_equal(&x->props.saddr, &xnew->props.saddr, family))
@@ -1187,7 +1194,7 @@ EXPORT_SYMBOL(xfrm_state_insert);
 static struct xfrm_state *__find_acq_core(struct net *net,
 					  const struct xfrm_mark *m,
 					  unsigned short family, u8 mode,
-					  u32 reqid, u8 proto,
+					  u32 reqid, u32 if_id, u8 proto,
 					  const xfrm_address_t *daddr,
 					  const xfrm_address_t *saddr,
 					  int create)
@@ -1242,6 +1249,7 @@ static struct xfrm_state *__find_acq_core(struct net *net,
 		x->props.family = family;
 		x->props.mode = mode;
 		x->props.reqid = reqid;
+		x->if_id = if_id;
 		x->mark.v = m->v;
 		x->mark.m = m->m;
 		x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
@@ -1296,7 +1304,7 @@ int xfrm_state_add(struct xfrm_state *x)
 
 	if (use_spi && !x1)
 		x1 = __find_acq_core(net, &x->mark, family, x->props.mode,
-				     x->props.reqid, x->id.proto,
+				     x->props.reqid, x->if_id, x->id.proto,
 				     &x->id.daddr, &x->props.saddr, 0);
 
 	__xfrm_state_bump_genids(x);
@@ -1395,6 +1403,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
 	x->props.flags = orig->props.flags;
 	x->props.extra_flags = orig->props.extra_flags;
 
+	x->if_id = orig->if_id;
 	x->tfcpad = orig->tfcpad;
 	x->replay_maxdiff = orig->replay_maxdiff;
 	x->replay_maxage = orig->replay_maxage;
@@ -1619,13 +1628,13 @@ EXPORT_SYMBOL(xfrm_state_lookup_byaddr);
 
 struct xfrm_state *
 xfrm_find_acq(struct net *net, const struct xfrm_mark *mark, u8 mode, u32 reqid,
-	      u8 proto, const xfrm_address_t *daddr,
+	      u32 if_id, u8 proto, const xfrm_address_t *daddr,
 	      const xfrm_address_t *saddr, int create, unsigned short family)
 {
 	struct xfrm_state *x;
 
 	spin_lock_bh(&net->xfrm.xfrm_state_lock);
-	x = __find_acq_core(net, mark, family, mode, reqid, proto, daddr, saddr, create);
+	x = __find_acq_core(net, mark, family, mode, reqid, if_id, proto, daddr, saddr, create);
 	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
 
 	return x;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 080035f056d9..08ec1ab8e36c 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -582,6 +582,9 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
 	if (attrs[XFRMA_OUTPUT_MARK])
 		x->props.output_mark = nla_get_u32(attrs[XFRMA_OUTPUT_MARK]);
 
+	if (attrs[XFRMA_IF_ID])
+		x->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
 	err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV]);
 	if (err)
 		goto error;
@@ -905,6 +908,11 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 		if (ret)
 			goto out;
 	}
+	if (x->if_id) {
+		ret = nla_put_u32(skb, XFRMA_IF_ID, x->if_id);
+		if (ret)
+			goto out;
+	}
 	if (x->security)
 		ret = copy_sec_ctx(x->security, skb);
 out:
@@ -1253,6 +1261,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 	int err;
 	u32 mark;
 	struct xfrm_mark m;
+	u32 if_id = 0;
 
 	p = nlmsg_data(nlh);
 	err = verify_spi_info(p->info.id.proto, p->min, p->max);
@@ -1265,6 +1274,10 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 	x = NULL;
 
 	mark = xfrm_mark_get(attrs, &m);
+
+	if (attrs[XFRMA_IF_ID])
+		if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
 	if (p->info.seq) {
 		x = xfrm_find_acq_byseq(net, mark, p->info.seq);
 		if (x && !xfrm_addr_equal(&x->id.daddr, daddr, family)) {
@@ -1275,7 +1288,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	if (!x)
 		x = xfrm_find_acq(net, &m, p->info.mode, p->info.reqid,
-				  p->info.id.proto, daddr,
+				  if_id, p->info.id.proto, daddr,
 				  &p->info.saddr, 1,
 				  family);
 	err = -ENOENT;
@@ -1563,6 +1576,9 @@ static struct xfrm_policy *xfrm_policy_construct(struct net *net, struct xfrm_us
 
 	xfrm_mark_get(attrs, &xp->mark);
 
+	if (attrs[XFRMA_IF_ID])
+		xp->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
 	return xp;
  error:
 	*errp = err;
@@ -1708,6 +1724,8 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
 		err = copy_to_user_policy_type(xp->type, skb);
 	if (!err)
 		err = xfrm_mark_put(skb, &xp->mark);
+	if (!err)
+		err = xfrm_if_id_put(skb, xp->if_id);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -1789,6 +1807,7 @@ static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 	int delete;
 	struct xfrm_mark m;
 	u32 mark = xfrm_mark_get(attrs, &m);
+	u32 if_id = 0;
 
 	p = nlmsg_data(nlh);
 	delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY;
@@ -1801,8 +1820,11 @@ static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err)
 		return err;
 
+	if (attrs[XFRMA_IF_ID])
+		if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
 	if (p->index)
-		xp = xfrm_policy_byid(net, mark, type, p->dir, p->index, delete, &err);
+		xp = xfrm_policy_byid(net, mark, if_id, type, p->dir, p->index, delete, &err);
 	else {
 		struct nlattr *rt = attrs[XFRMA_SEC_CTX];
 		struct xfrm_sec_ctx *ctx;
@@ -1819,7 +1841,7 @@ static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 			if (err)
 				return err;
 		}
-		xp = xfrm_policy_bysel_ctx(net, mark, type, p->dir, &p->sel,
+		xp = xfrm_policy_bysel_ctx(net, mark, if_id, type, p->dir, &p->sel,
 					   ctx, delete, &err);
 		security_xfrm_policy_free(ctx);
 	}
@@ -1942,6 +1964,10 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
 	if (err)
 		goto out_cancel;
 
+	err = xfrm_if_id_put(skb, x->if_id);
+	if (err)
+		goto out_cancel;
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
@@ -2084,6 +2110,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	int err = -ENOENT;
 	struct xfrm_mark m;
 	u32 mark = xfrm_mark_get(attrs, &m);
+	u32 if_id = 0;
 
 	err = copy_from_user_policy_type(&type, attrs);
 	if (err)
@@ -2093,8 +2120,11 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err)
 		return err;
 
+	if (attrs[XFRMA_IF_ID])
+		if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
 	if (p->index)
-		xp = xfrm_policy_byid(net, mark, type, p->dir, p->index, 0, &err);
+		xp = xfrm_policy_byid(net, mark, if_id, type, p->dir, p->index, 0, &err);
 	else {
 		struct nlattr *rt = attrs[XFRMA_SEC_CTX];
 		struct xfrm_sec_ctx *ctx;
@@ -2111,7 +2141,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 			if (err)
 				return err;
 		}
-		xp = xfrm_policy_bysel_ctx(net, mark, type, p->dir,
+		xp = xfrm_policy_bysel_ctx(net, mark, if_id, type, p->dir,
 					   &p->sel, ctx, 0, &err);
 		security_xfrm_policy_free(ctx);
 	}
@@ -2494,6 +2524,7 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_ADDRESS_FILTER]	= { .len = sizeof(struct xfrm_address_filter) },
 	[XFRMA_OFFLOAD_DEV]	= { .len = sizeof(struct xfrm_user_offload) },
 	[XFRMA_OUTPUT_MARK]	= { .type = NLA_U32 },
+	[XFRMA_IF_ID]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
@@ -2625,6 +2656,10 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
 	if (err)
 		return err;
 
+	err = xfrm_if_id_put(skb, x->if_id);
+	if (err)
+		return err;
+
 	nlmsg_end(skb, nlh);
 	return 0;
 }
@@ -2721,6 +2756,8 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 		 l += nla_total_size(sizeof(x->xso));
 	if (x->props.output_mark)
 		l += nla_total_size(sizeof(x->props.output_mark));
+	if (x->if_id)
+		l += nla_total_size(sizeof(x->if_id));
 
 	/* Must count x->lastused as it may become non-zero behind our back. */
 	l += nla_total_size_64bit(sizeof(u64));
@@ -2850,6 +2887,8 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 		err = copy_to_user_policy_type(xp->type, skb);
 	if (!err)
 		err = xfrm_mark_put(skb, &xp->mark);
+	if (!err)
+		err = xfrm_if_id_put(skb, xp->if_id);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -2966,6 +3005,8 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
 		err = copy_to_user_policy_type(xp->type, skb);
 	if (!err)
 		err = xfrm_mark_put(skb, &xp->mark);
+	if (!err)
+		err = xfrm_if_id_put(skb, xp->if_id);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3047,6 +3088,8 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 		err = copy_to_user_policy_type(xp->type, skb);
 	if (!err)
 		err = xfrm_mark_put(skb, &xp->mark);
+	if (!err)
+		err = xfrm_if_id_put(skb, xp->if_id);
 	if (err)
 		goto out_free_skb;
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH RFC ipsec-next 3/3] xfrm: Add virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-04  6:09 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Eyal Birger, Antony Antony, Benedict Wong,
	Lorenzo Colitti, Shannon Nelson
In-Reply-To: <20180604060910.13896-1-steffen.klassert@secunet.com>

This patch adds support for virtual xfrm interfaces.
Packets that are routed through such an interface
are guaranteed to be IPsec transformed or dropped.

Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Co-developed-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
---
 include/net/xfrm.h           |  24 ++
 include/uapi/linux/if_link.h |  10 +
 net/xfrm/Kconfig             |   8 +
 net/xfrm/Makefile            |   1 +
 net/xfrm/xfrm_input.c        |   3 +
 net/xfrm/xfrm_interface.c    | 971 +++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_policy.c       |  43 ++
 7 files changed, 1060 insertions(+)
 create mode 100644 net/xfrm/xfrm_interface.c

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 26df8d6d507c..211b15c55a92 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -23,6 +23,7 @@
 #include <net/ipv6.h>
 #include <net/ip6_fib.h>
 #include <net/flow.h>
+#include <net/gro_cells.h>
 
 #include <linux/interrupt.h>
 
@@ -293,6 +294,13 @@ struct xfrm_replay {
 	int	(*overflow)(struct xfrm_state *x, struct sk_buff *skb);
 };
 
+struct xfrm_if_cb {
+	struct xfrm_if	*(*decode_session)(struct sk_buff *skb);
+};
+
+void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb);
+void xfrm_if_unregister_cb(void);
+
 struct net_device;
 struct xfrm_type;
 struct xfrm_dst;
@@ -1039,6 +1047,22 @@ static inline void xfrm_dst_destroy(struct xfrm_dst *xdst)
 
 void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev);
 
+struct xfrm_if_parms {
+	char name[IFNAMSIZ];	/* name of XFRM device */
+	int link;		/* ifindex of underlying L2 interface */
+	u32 if_id;		/* interface identifyer */
+};
+
+struct xfrm_if {
+	struct xfrm_if __rcu *next;	/* next interface in list */
+	struct net_device *dev;		/* virtual device associated with interface */
+	struct net_device *phydev;	/* physical device */
+	struct net *net;		/* netns for packet i/o */
+	struct xfrm_if_parms p;		/* interface parms */
+
+	struct gro_cells gro_cells;
+};
+
 struct xfrm_offload {
 	/* Output sequence number for replay protection on offloading. */
 	struct {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cf01b6824244..bff0af507b32 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -459,6 +459,16 @@ enum {
 
 #define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1)
 
+/* XFRM section */
+enum {
+	IFLA_XFRM_UNSPEC,
+	IFLA_XFRM_LINK,
+	IFLA_XFRM_IF_ID,
+	__IFLA_XFRM_MAX
+};
+
+#define IFLA_XFRM_MAX (__IFLA_XFRM_MAX - 1)
+
 enum macsec_validation_type {
 	MACSEC_VALIDATE_DISABLED = 0,
 	MACSEC_VALIDATE_CHECK = 1,
diff --git a/net/xfrm/Kconfig b/net/xfrm/Kconfig
index 286ed25c1a69..53381888a7b3 100644
--- a/net/xfrm/Kconfig
+++ b/net/xfrm/Kconfig
@@ -25,6 +25,14 @@ config XFRM_USER
 
 	  If unsure, say Y.
 
+config XFRM_INTERFACE
+	tristate "Transformation virtual interface"
+	depends on XFRM && IPV6
+	---help---
+	  This provides a virtual interface to route IPsec traffic.
+
+	  If unsure, say N.
+
 config XFRM_SUB_POLICY
 	bool "Transformation sub policy support"
 	depends on XFRM
diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index 0bd2465a8c5a..fbc4552d17b8 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_XFRM_STATISTICS) += xfrm_proc.o
 obj-$(CONFIG_XFRM_ALGO) += xfrm_algo.o
 obj-$(CONFIG_XFRM_USER) += xfrm_user.o
 obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
+obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 352abca2605f..b104724d1caa 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -320,6 +320,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	seq = 0;
 	if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0) {
+		secpath_reset(skb);
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
 		goto drop;
 	}
@@ -328,12 +329,14 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 				   XFRM_SPI_SKB_CB(skb)->daddroff);
 	do {
 		if (skb->sp->len == XFRM_MAX_DEPTH) {
+			secpath_reset(skb);
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
 			goto drop;
 		}
 
 		x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
 		if (x == NULL) {
+			secpath_reset(skb);
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
 			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
new file mode 100644
index 000000000000..57df8f087132
--- /dev/null
+++ b/net/xfrm/xfrm_interface.c
@@ -0,0 +1,971 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *	XFRM virtual interface
+ *
+ *	Copyright (C) 2018 secunet Security Networks AG
+ *
+ *	Author:
+ *	Steffen Klassert <steffen.klassert@secunet.com>
+ */
+
+#include <linux/module.h>
+#include <linux/capability.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/sockios.h>
+#include <linux/icmp.h>
+#include <linux/if.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/net.h>
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/if_link.h>
+#include <linux/if_arp.h>
+#include <linux/icmpv6.h>
+#include <linux/init.h>
+#include <linux/route.h>
+#include <linux/rtnetlink.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+
+#include <linux/uaccess.h>
+#include <linux/atomic.h>
+
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/addrconf.h>
+#include <net/xfrm.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <linux/etherdevice.h>
+
+static int xfrmi_dev_init(struct net_device *dev);
+static void xfrmi_dev_setup(struct net_device *dev);
+static struct rtnl_link_ops xfrmi_link_ops __read_mostly;
+static unsigned int xfrmi_net_id __read_mostly;
+
+struct xfrmi_net {
+	/* lists for storing interfaces in use */
+	struct xfrm_if __rcu *xfrmi[1];
+};
+
+#define for_each_xfrmi_rcu(start, xi) \
+	for (xi = rcu_dereference(start); xi; xi = rcu_dereference(xi->next))
+
+static struct xfrm_if *xfrmi_lookup(struct net *net, struct xfrm_state *x)
+{
+	struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+	struct xfrm_if *xi;
+
+	for_each_xfrmi_rcu(xfrmn->xfrmi[0], xi) {
+		if (x->if_id == xi->p.if_id &&
+		    (xi->dev->flags & IFF_UP))
+			return xi;
+	}
+
+	return NULL;
+}
+
+static struct xfrm_if *xfrmi_decode_session(struct sk_buff *skb)
+{
+	struct xfrmi_net *xfrmn;
+	int ifindex;
+	struct xfrm_if *xi;
+
+	if (!skb->dev)
+		return NULL;
+
+	xfrmn = net_generic(dev_net(skb->dev), xfrmi_net_id);
+	ifindex = skb->dev->ifindex;
+
+	for_each_xfrmi_rcu(xfrmn->xfrmi[0], xi) {
+		if (ifindex == xi->dev->ifindex &&
+			(xi->dev->flags & IFF_UP))
+				return xi;
+	}
+
+	return NULL;
+}
+
+static void xfrmi_link(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
+{
+	struct xfrm_if __rcu **xip = &xfrmn->xfrmi[0];
+
+	rcu_assign_pointer(xi->next , rtnl_dereference(*xip));
+	rcu_assign_pointer(*xip, xi);
+}
+
+static void xfrmi_unlink(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
+{
+	struct xfrm_if __rcu **xip;
+	struct xfrm_if *iter;
+
+	for (xip = &xfrmn->xfrmi[0];
+	     (iter = rtnl_dereference(*xip)) != NULL;
+	     xip = &iter->next) {
+		if (xi == iter) {
+			rcu_assign_pointer(*xip, xi->next);
+			break;
+		}
+	}
+}
+
+static void xfrmi_dev_free(struct net_device *dev)
+{
+	free_percpu(dev->tstats);
+}
+
+static int xfrmi_create2(struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct net *net = dev_net(dev);
+	struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+	int err;
+
+	dev->rtnl_link_ops = &xfrmi_link_ops;
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto out;
+
+	strcpy(xi->p.name, dev->name);
+
+	dev_hold(dev);
+	xfrmi_link(xfrmn, xi);
+
+	return 0;
+
+out:
+	return err;
+}
+
+static struct xfrm_if *xfrmi_create(struct net *net, struct xfrm_if_parms *p)
+{
+	struct net_device *dev;
+	struct xfrm_if *xi;
+	char name[IFNAMSIZ];
+	int err;
+
+	if (p->name[0])
+		strlcpy(name, p->name, IFNAMSIZ);
+	else
+		goto failed;
+
+	dev = alloc_netdev(sizeof(*xi), name, NET_NAME_UNKNOWN, xfrmi_dev_setup);
+	if (!dev)
+		goto failed;
+
+	dev_net_set(dev, net);
+
+	xi = netdev_priv(dev);
+	xi->p = *p;
+	xi->net = net;
+	xi->dev = dev;
+	xi->phydev = dev_get_by_index(net, p->link);
+	if (!xi->phydev)
+		goto failed_free;
+
+	err = xfrmi_create2(dev);
+	if (err < 0)
+		goto failed_dev_put;
+
+	return xi;
+
+failed_dev_put:
+	dev_put(xi->phydev);
+failed_free:
+	free_netdev(dev);
+failed:
+	return NULL;
+}
+
+static struct xfrm_if *xfrmi_locate(struct net *net, struct xfrm_if_parms *p,
+				   int create)
+{
+	struct xfrm_if __rcu **xip;
+	struct xfrm_if *xi;
+	struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+	for (xip = &xfrmn->xfrmi[0];
+	     (xi = rtnl_dereference(*xip)) != NULL;
+	     xip = &xi->next) {
+		if (xi->p.if_id == p->if_id) {
+			if (create)
+				return NULL;
+
+			return xi;
+		}
+	}
+	if (!create)
+		return NULL;
+	return xfrmi_create(net, p);
+}
+
+static void xfrmi_dev_uninit(struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct xfrmi_net *xfrmn = net_generic(xi->net, xfrmi_net_id);
+
+	xfrmi_unlink(xfrmn, xi);
+	dev_put(xi->phydev);
+	dev_put(dev);
+}
+
+static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet)
+{
+	skb->tstamp = 0;
+	skb->pkt_type = PACKET_HOST;
+	skb->skb_iif = 0;
+	skb->ignore_df = 0;
+	skb_dst_drop(skb);
+	nf_reset(skb);
+	nf_reset_trace(skb);
+
+	if (!xnet)
+		return;
+
+	ipvs_reset(skb);
+	secpath_reset(skb);
+	skb_orphan(skb);
+	skb->mark = 0;
+}
+
+static int xfrmi_rcv_cb(struct sk_buff *skb, int err)
+{
+	struct pcpu_sw_netstats *tstats;
+	struct xfrm_mode *inner_mode;
+	struct net_device *dev;
+	struct xfrm_state *x;
+	struct xfrm_if *xi;
+	bool xnet;
+
+	if (err && !skb->sp)
+		return 0;
+
+	x = xfrm_input_state(skb);
+
+	xi = xfrmi_lookup(xs_net(x), x);
+	if (!xi)
+		return 1;
+
+	dev = xi->dev;
+	skb->dev = dev;
+
+	if (err) {
+		dev->stats.rx_errors++;
+		dev->stats.rx_dropped++;
+
+		return 0;
+	}
+
+	xnet = !net_eq(xi->net, dev_net(skb->dev));
+
+	if (xnet) {
+		inner_mode = x->inner_mode;
+
+		if (x->sel.family == AF_UNSPEC) {
+			inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol);
+			if (inner_mode == NULL) {
+				XFRM_INC_STATS(dev_net(skb->dev),
+					       LINUX_MIB_XFRMINSTATEMODEERROR);
+				return -EINVAL;
+			}
+		}
+
+		if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb,
+				       inner_mode->afinfo->family))
+			return -EPERM;
+	}
+
+	xfrmi_scrub_packet(skb, xnet);
+
+	tstats = this_cpu_ptr(dev->tstats);
+
+	u64_stats_update_begin(&tstats->syncp);
+	tstats->rx_packets++;
+	tstats->rx_bytes += skb->len;
+	u64_stats_update_end(&tstats->syncp);
+
+	return 0;
+}
+
+static int
+xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct net_device_stats *stats = &xi->dev->stats;
+	struct dst_entry *dst = skb_dst(skb);
+	struct net_device *tdev;
+	struct xfrm_state *x;
+	int err = -1;
+	int mtu;
+
+	if (!dst)
+		goto tx_err_link_failure;
+
+	fl->flowi_xfrm.if_id = xi->p.if_id;
+
+	dst_hold(dst);
+	dst = xfrm_lookup(xi->net, dst, fl, NULL, 0);
+	if (IS_ERR(dst)) {
+		err = PTR_ERR(dst);
+		dst = NULL;
+		goto tx_err_link_failure;
+	}
+
+	x = dst->xfrm;
+	if (!x)
+		goto tx_err_link_failure;
+
+	if (x->if_id != xi->p.if_id)
+		goto tx_err_link_failure;
+
+	tdev = dst->dev;
+
+	if (tdev == dev) {
+		stats->collisions++;
+		net_warn_ratelimited("%s: Local routing loop detected!\n",
+				     xi->p.name);
+		goto tx_err_dst_release;
+	}
+
+	mtu = dst_mtu(dst);
+	if (!skb->ignore_df && skb->len > mtu) {
+		skb_dst_update_pmtu(skb, mtu);
+
+		if (skb->protocol == htons(ETH_P_IPV6)) {
+			if (mtu < IPV6_MIN_MTU)
+				mtu = IPV6_MIN_MTU;
+
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		} else {
+			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				  htonl(mtu));
+		}
+
+		dst_release(dst);
+		return -EMSGSIZE;
+	}
+
+	xfrmi_scrub_packet(skb, !net_eq(xi->net, dev_net(dev)));
+	skb_dst_set(skb, dst);
+	skb->dev = tdev;
+
+	err = dst_output(xi->net, skb->sk, skb);
+	if (net_xmit_eval(err) == 0) {
+		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
+
+		u64_stats_update_begin(&tstats->syncp);
+		tstats->tx_bytes += skb->len;
+		tstats->tx_packets++;
+		u64_stats_update_end(&tstats->syncp);
+	} else {
+		stats->tx_errors++;
+		stats->tx_aborted_errors++;
+	}
+
+	return 0;
+tx_err_link_failure:
+	stats->tx_carrier_errors++;
+	dst_link_failure(skb);
+tx_err_dst_release:
+	dst_release(dst);
+	return err;
+}
+
+static netdev_tx_t xfrmi_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct net_device_stats *stats = &xi->dev->stats;
+	struct flowi fl;
+	int ret;
+
+	memset(&fl, 0, sizeof(fl));
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IPV6):
+		xfrm_decode_session(skb, &fl, AF_INET6);
+		memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
+		break;
+	case htons(ETH_P_IP):
+		xfrm_decode_session(skb, &fl, AF_INET);
+		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
+		break;
+	default:
+		goto tx_err;
+	}
+
+	fl.flowi_oif = xi->phydev->ifindex;
+
+	ret = xfrmi_xmit2(skb, dev, &fl);
+	if (ret < 0)
+		goto tx_err;
+
+	return NETDEV_TX_OK;
+
+tx_err:
+	stats->tx_errors++;
+	stats->tx_dropped++;
+	kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static int xfrmi4_err(struct sk_buff *skb, u32 info)
+{
+	const struct iphdr *iph = (const struct iphdr *)skb->data;
+	struct net *net = dev_net(skb->dev);
+	int protocol = iph->protocol;
+	struct ip_comp_hdr *ipch;
+	struct ip_esp_hdr *esph;
+	struct ip_auth_hdr *ah ;
+	struct xfrm_state *x;
+	struct xfrm_if *xi;
+	__be32 spi;
+
+	switch (protocol) {
+	case IPPROTO_ESP:
+		esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
+		spi = esph->spi;
+		break;
+	case IPPROTO_AH:
+		ah = (struct ip_auth_hdr *)(skb->data+(iph->ihl<<2));
+		spi = ah->spi;
+		break;
+	case IPPROTO_COMP:
+		ipch = (struct ip_comp_hdr *)(skb->data+(iph->ihl<<2));
+		spi = htonl(ntohs(ipch->cpi));
+		break;
+	default:
+		return 0;
+	}
+
+	switch (icmp_hdr(skb)->type) {
+	case ICMP_DEST_UNREACH:
+		if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+			return 0;
+	case ICMP_REDIRECT:
+		break;
+	default:
+		return 0;
+	}
+
+	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
+			      spi, protocol, AF_INET);
+	if (!x)
+		return 0;
+
+	xi = xfrmi_lookup(net, x);
+	if (!xi) {
+		xfrm_state_put(x);
+		return -1;
+	}
+
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+		ipv4_update_pmtu(skb, net, info, 0, 0, protocol, 0);
+	else
+		ipv4_redirect(skb, net, 0, 0, protocol, 0);
+	xfrm_state_put(x);
+
+	return 0;
+}
+
+static int xfrmi6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+		    u8 type, u8 code, int offset, __be32 info)
+{
+	const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data;
+	struct net *net = dev_net(skb->dev);
+	int protocol = iph->nexthdr;
+	struct ip_comp_hdr *ipch;
+	struct ip_esp_hdr *esph;
+	struct ip_auth_hdr *ah;
+	struct xfrm_state *x;
+	struct xfrm_if *xi;
+	__be32 spi;
+
+	switch (protocol) {
+	case IPPROTO_ESP:
+		esph = (struct ip_esp_hdr *)(skb->data + offset);
+		spi = esph->spi;
+		break;
+	case IPPROTO_AH:
+		ah = (struct ip_auth_hdr *)(skb->data + offset);
+		spi = ah->spi;
+		break;
+	case IPPROTO_COMP:
+		ipch = (struct ip_comp_hdr *)(skb->data + offset);
+		spi = htonl(ntohs(ipch->cpi));
+		break;
+	default:
+		return 0;
+	}
+
+	if (type != ICMPV6_PKT_TOOBIG &&
+	    type != NDISC_REDIRECT)
+		return 0;
+
+	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
+			      spi, protocol, AF_INET6);
+	if (!x)
+		return 0;
+
+	xi = xfrmi_lookup(net, x);
+	if (!xi) {
+		xfrm_state_put(x);
+		return -1;
+	}
+
+	if (type == NDISC_REDIRECT)
+		ip6_redirect(skb, net, skb->dev->ifindex, 0,
+			     sock_net_uid(net, NULL));
+	else
+		ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
+	xfrm_state_put(x);
+
+	return 0;
+}
+
+static int xfrmi_change(struct xfrm_if *xi, const struct xfrm_if_parms *p)
+{
+	if (xi->p.link != p->link)
+		return -EINVAL;
+
+	xi->p.if_id = p->if_id;
+
+	return 0;
+}
+
+static int xfrmi_update(struct xfrm_if *xi, struct xfrm_if_parms *p)
+{
+	struct net *net = dev_net(xi->dev);
+	struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+	int err;
+
+	xfrmi_unlink(xfrmn, xi);
+	synchronize_net();
+	err = xfrmi_change(xi, p);
+	xfrmi_link(xfrmn, xi);
+	netdev_state_change(xi->dev);
+	return err;
+}
+
+static void xfrmi_get_stats64(struct net_device *dev,
+			       struct rtnl_link_stats64 *s)
+{
+	int cpu;
+
+	if (!dev->tstats)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct pcpu_sw_netstats *stats;
+		struct pcpu_sw_netstats tmp;
+		int start;
+
+		stats = per_cpu_ptr(dev->tstats, cpu);
+		do {
+			start = u64_stats_fetch_begin_irq(&stats->syncp);
+			tmp.rx_packets = stats->rx_packets;
+			tmp.rx_bytes   = stats->rx_bytes;
+			tmp.tx_packets = stats->tx_packets;
+			tmp.tx_bytes   = stats->tx_bytes;
+		} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+		s->rx_packets += tmp.rx_packets;
+		s->rx_bytes   += tmp.rx_bytes;
+		s->tx_packets += tmp.tx_packets;
+		s->tx_bytes   += tmp.tx_bytes;
+	}
+
+	s->rx_dropped = dev->stats.rx_dropped;
+	s->tx_dropped = dev->stats.tx_dropped;
+}
+
+static int xfrmi_get_iflink(const struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+
+	return xi->phydev->ifindex;
+}
+
+
+static const struct net_device_ops xfrmi_netdev_ops = {
+	.ndo_init	= xfrmi_dev_init,
+	.ndo_uninit	= xfrmi_dev_uninit,
+	.ndo_start_xmit = xfrmi_xmit,
+	.ndo_get_stats64 = xfrmi_get_stats64,
+	.ndo_get_iflink = xfrmi_get_iflink,
+};
+
+static void xfrmi_dev_setup(struct net_device *dev)
+{
+	dev->netdev_ops 	= &xfrmi_netdev_ops;
+	dev->type		= ARPHRD_NONE;
+	dev->hard_header_len 	= ETH_HLEN;
+	dev->min_header_len	= ETH_HLEN;
+	dev->mtu		= ETH_DATA_LEN;
+	dev->min_mtu		= ETH_MIN_MTU;
+	dev->max_mtu		= ETH_DATA_LEN;
+	dev->addr_len		= ETH_ALEN;
+	dev->flags 		= IFF_NOARP;
+	dev->needs_free_netdev	= true;
+	dev->priv_destructor	= xfrmi_dev_free;
+	netif_keep_dst(dev);
+}
+
+static int xfrmi_dev_init(struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct net_device *phydev = xi->phydev;
+	int err;
+
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats)
+		return -ENOMEM;
+
+	err = gro_cells_init(&xi->gro_cells, dev);
+	if (err) {
+		free_percpu(dev->tstats);
+		return err;
+	}
+
+	dev->features |= NETIF_F_LLTX;
+
+	dev->needed_headroom = phydev->needed_headroom;
+	dev->needed_tailroom = phydev->needed_tailroom;
+
+	if (is_zero_ether_addr(dev->dev_addr))
+		eth_hw_addr_inherit(dev, phydev);
+	if (is_zero_ether_addr(dev->broadcast))
+		memcpy(dev->broadcast, phydev->broadcast, dev->addr_len);
+
+	return 0;
+}
+
+static int xfrmi_validate(struct nlattr *tb[], struct nlattr *data[],
+			 struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static void xfrmi_netlink_parms(struct nlattr *data[],
+			       struct xfrm_if_parms *parms)
+{
+	memset(parms, 0, sizeof(*parms));
+
+	if (!data)
+		return;
+
+	if (data[IFLA_XFRM_LINK])
+		parms->link = nla_get_u32(data[IFLA_XFRM_LINK]);
+
+	if (data[IFLA_XFRM_IF_ID])
+		parms->if_id = nla_get_u32(data[IFLA_XFRM_IF_ID]);
+}
+
+static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
+			struct nlattr *tb[], struct nlattr *data[],
+			struct netlink_ext_ack *extack)
+{
+	struct net *net = dev_net(dev);
+	struct xfrm_if_parms *p;
+	struct xfrm_if *xi;
+
+	xi = netdev_priv(dev);
+	p = &xi->p;
+
+	xfrmi_netlink_parms(data, p);
+
+	if (!tb[IFLA_IFNAME])
+		return -EINVAL;
+
+	nla_strlcpy(p->name, tb[IFLA_IFNAME], IFNAMSIZ);
+
+	if (!xfrmi_locate(net, p, 1))
+		return -EEXIST;
+
+	return 0;
+}
+
+static void xfrmi_dellink(struct net_device *dev, struct list_head *head)
+{
+	unregister_netdevice_queue(dev, head);
+}
+
+static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
+			   struct nlattr *data[],
+			   struct netlink_ext_ack *extack)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct net *net = dev_net(dev);
+
+	xfrmi_netlink_parms(data, &xi->p);
+
+	xi = xfrmi_locate(net, &xi->p, 0);
+
+	if (xi) {
+		if (xi->dev != dev)
+			return -EEXIST;
+	} else
+		xi = netdev_priv(dev);
+
+	return xfrmi_update(xi, &xi->p);
+}
+
+static size_t xfrmi_get_size(const struct net_device *dev)
+{
+	return
+		/* IFLA_XFRM_LINK */
+		nla_total_size(4) +
+		/* IFLA_XFRM_IF_ID */
+		nla_total_size(4) +
+		0;
+}
+
+static int xfrmi_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+	struct xfrm_if_parms *parm = &xi->p;
+
+	if (nla_put_u32(skb, IFLA_XFRM_LINK, parm->link) ||
+	    nla_put_u32(skb, IFLA_XFRM_IF_ID, parm->if_id))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+struct net *xfrmi_get_link_net(const struct net_device *dev)
+{
+	struct xfrm_if *xi = netdev_priv(dev);
+
+	return dev_net(xi->phydev);
+}
+
+static const struct nla_policy xfrmi_policy[IFLA_XFRM_MAX + 1] = {
+	[IFLA_XFRM_LINK]	= { .type = NLA_U32 },
+	[IFLA_XFRM_IF_ID]	= { .type = NLA_U32 },
+};
+
+static struct rtnl_link_ops xfrmi_link_ops __read_mostly = {
+	.kind		= "xfrm",
+	.maxtype	= IFLA_XFRM_MAX,
+	.policy		= xfrmi_policy,
+	.priv_size	= sizeof(struct xfrm_if),
+	.setup		= xfrmi_dev_setup,
+	.validate	= xfrmi_validate,
+	.newlink	= xfrmi_newlink,
+	.dellink	= xfrmi_dellink,
+	.changelink	= xfrmi_changelink,
+	.get_size	= xfrmi_get_size,
+	.fill_info	= xfrmi_fill_info,
+	.get_link_net	= xfrmi_get_link_net,
+};
+
+static void __net_exit xfrmi_destroy_interfaces(struct xfrmi_net *xfrmn)
+{
+	struct xfrm_if *xi;
+	LIST_HEAD(list);
+
+	xi = rtnl_dereference(xfrmn->xfrmi[0]);
+	if (!xi)
+		return;
+
+	unregister_netdevice_queue(xi->dev, &list);
+	unregister_netdevice_many(&list);
+}
+
+static int __net_init xfrmi_init_net(struct net *net)
+{
+	return 0;
+}
+
+static void __net_exit xfrmi_exit_net(struct net *net)
+{
+	struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+	rtnl_lock();
+	xfrmi_destroy_interfaces(xfrmn);
+	rtnl_unlock();
+}
+
+static struct pernet_operations xfrmi_net_ops = {
+	.init = xfrmi_init_net,
+	.exit = xfrmi_exit_net,
+	.id   = &xfrmi_net_id,
+	.size = sizeof(struct xfrmi_net),
+};
+
+static struct xfrm6_protocol xfrmi_esp6_protocol __read_mostly = {
+	.handler	=	xfrm6_rcv,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi6_err,
+	.priority	=	10,
+};
+
+static struct xfrm6_protocol xfrmi_ah6_protocol __read_mostly = {
+	.handler	=	xfrm6_rcv,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi6_err,
+	.priority	=	10,
+};
+
+static struct xfrm6_protocol xfrmi_ipcomp6_protocol __read_mostly = {
+	.handler	=	xfrm6_rcv,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi6_err,
+	.priority	=	10,
+};
+
+static struct xfrm4_protocol xfrmi_esp4_protocol __read_mostly = {
+	.handler	=	xfrm4_rcv,
+	.input_handler	=	xfrm_input,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi4_err,
+	.priority	=	10,
+};
+
+static struct xfrm4_protocol xfrmi_ah4_protocol __read_mostly = {
+	.handler	=	xfrm4_rcv,
+	.input_handler	=	xfrm_input,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi4_err,
+	.priority	=	10,
+};
+
+static struct xfrm4_protocol xfrmi_ipcomp4_protocol __read_mostly = {
+	.handler	=	xfrm4_rcv,
+	.input_handler	=	xfrm_input,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi4_err,
+	.priority	=	10,
+};
+
+static int __init xfrmi4_init(void)
+{
+	int err;
+
+	err = xfrm4_protocol_register(&xfrmi_esp4_protocol, IPPROTO_ESP);
+	if (err < 0)
+		goto xfrm_proto_esp_failed;
+	err = xfrm4_protocol_register(&xfrmi_ah4_protocol, IPPROTO_AH);
+	if (err < 0)
+		goto xfrm_proto_ah_failed;
+	err = xfrm4_protocol_register(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
+	if (err < 0)
+		goto xfrm_proto_comp_failed;
+
+	return 0;
+
+xfrm_proto_comp_failed:
+	xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
+xfrm_proto_ah_failed:
+	xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
+xfrm_proto_esp_failed:
+	return err;
+}
+
+static void xfrmi4_fini(void)
+{
+	xfrm4_protocol_deregister(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
+	xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
+	xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
+}
+
+static int __init xfrmi6_init(void)
+{
+	int err;
+
+	err = xfrm6_protocol_register(&xfrmi_esp6_protocol, IPPROTO_ESP);
+	if (err < 0)
+		goto xfrm_proto_esp_failed;
+	err = xfrm6_protocol_register(&xfrmi_ah6_protocol, IPPROTO_AH);
+	if (err < 0)
+		goto xfrm_proto_ah_failed;
+	err = xfrm6_protocol_register(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
+	if (err < 0)
+		goto xfrm_proto_comp_failed;
+
+	return 0;
+
+xfrm_proto_comp_failed:
+	xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
+xfrm_proto_ah_failed:
+	xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
+xfrm_proto_esp_failed:
+	return err;
+}
+
+static void xfrmi6_fini(void)
+{
+	xfrm6_protocol_deregister(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
+	xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
+	xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
+}
+
+static const struct xfrm_if_cb xfrm_if_cb = {
+	.decode_session =	xfrmi_decode_session,
+};
+
+static int __init xfrmi_init(void)
+{
+	const char *msg;
+	int err;
+
+	pr_info("IPsec XFRM device driver\n");
+
+	msg = "tunnel device";
+	err = register_pernet_device(&xfrmi_net_ops);
+	if (err < 0)
+		goto pernet_dev_failed;
+
+	msg = "xfrm4 protocols";
+	err = xfrmi4_init();
+	if (err < 0)
+		goto xfrmi4_failed;
+
+	msg = "xfrm6 protocols";
+	err = xfrmi6_init();
+	if (err < 0)
+		goto xfrmi6_failed;
+
+
+	msg = "netlink interface";
+	err = rtnl_link_register(&xfrmi_link_ops);
+	if (err < 0)
+		goto rtnl_link_failed;
+
+	xfrm_if_register_cb(&xfrm_if_cb);
+
+	return err;
+
+rtnl_link_failed:
+	xfrmi6_fini();
+xfrmi6_failed:
+	xfrmi4_fini();
+xfrmi4_failed:
+	unregister_pernet_device(&xfrmi_net_ops);
+pernet_dev_failed:
+	pr_err("xfrmi init: failed to register %s\n", msg);
+	return err;
+}
+
+static void __exit xfrmi_fini(void)
+{
+	xfrm_if_unregister_cb();
+	rtnl_link_unregister(&xfrmi_link_ops);
+	xfrmi4_fini();
+	xfrmi6_fini();
+	unregister_pernet_device(&xfrmi_net_ops);
+}
+
+module_init(xfrmi_init);
+module_exit(xfrmi_fini);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_RTNL_LINK("xfrm");
+MODULE_ALIAS_NETDEV("xfrm0");
+MODULE_AUTHOR("Steffen Klassert");
+MODULE_DESCRIPTION("XFRM virtual interface");
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 08b745aade43..47f776840df1 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -47,6 +47,9 @@ struct xfrm_flo {
 
 static DEFINE_PER_CPU(struct xfrm_dst *, xfrm_last_dst);
 static struct work_struct *xfrm_pcpu_work __read_mostly;
+static DEFINE_SPINLOCK(xfrm_if_cb_lock);
+static struct xfrm_if_cb const __rcu *xfrm_if_cb __read_mostly;
+
 static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock);
 static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
@@ -119,6 +122,12 @@ static const struct xfrm_policy_afinfo *xfrm_policy_get_afinfo(unsigned short fa
 	return afinfo;
 }
 
+/* Called with rcu_read_lock(). */
+static const struct xfrm_if_cb *xfrm_if_get_cb(void)
+{
+	return rcu_dereference(xfrm_if_cb);
+}
+
 struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif,
 				    const xfrm_address_t *saddr,
 				    const xfrm_address_t *daddr,
@@ -2083,6 +2092,11 @@ xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
 
 	if (IS_ERR(xdst)) {
 		err = PTR_ERR(xdst);
+		if (err == -EREMOTE) {
+			xfrm_pols_put(pols, num_pols);
+			return NULL;
+		}
+
 		if (err != -EAGAIN)
 			goto error;
 		goto make_dummy_bundle;
@@ -2176,6 +2190,9 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 			if (IS_ERR(xdst)) {
 				xfrm_pols_put(pols, num_pols);
 				err = PTR_ERR(xdst);
+				if (err == -EREMOTE)
+					goto nopol;
+
 				goto dropdst;
 			} else if (xdst == NULL) {
 				num_xfrms = 0;
@@ -2368,12 +2385,20 @@ int __xfrm_decode_session(struct sk_buff *skb, struct flowi *fl,
 			  unsigned int family, int reverse)
 {
 	const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family);
+	const struct xfrm_if_cb *ifcb = xfrm_if_get_cb();
+	struct xfrm_if *xi;
 	int err;
 
 	if (unlikely(afinfo == NULL))
 		return -EAFNOSUPPORT;
 
 	afinfo->decode_session(skb, fl, reverse);
+	if (ifcb) {
+		xi = ifcb->decode_session(skb);
+		if (xi)
+			fl->flowi_xfrm.if_id = xi->p.if_id;
+	}
+
 	err = security_xfrm_decode_session(skb, &fl->flowi_secid);
 	rcu_read_unlock();
 	return err;
@@ -2828,6 +2853,21 @@ void xfrm_policy_unregister_afinfo(const struct xfrm_policy_afinfo *afinfo)
 }
 EXPORT_SYMBOL(xfrm_policy_unregister_afinfo);
 
+void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb)
+{
+	spin_lock(&xfrm_if_cb_lock);
+	rcu_assign_pointer(xfrm_if_cb, ifcb);
+	spin_unlock(&xfrm_if_cb_lock);
+}
+EXPORT_SYMBOL(xfrm_if_register_cb);
+
+void xfrm_if_unregister_cb(void)
+{
+	RCU_INIT_POINTER(xfrm_if_cb, NULL);
+	synchronize_rcu();
+}
+EXPORT_SYMBOL(xfrm_if_unregister_cb);
+
 #ifdef CONFIG_XFRM_STATISTICS
 static int __net_init xfrm_statistics_init(struct net *net)
 {
@@ -3008,6 +3048,9 @@ void __init xfrm_init(void)
 	xfrm_dev_init();
 	seqcount_init(&xfrm_policy_hash_generation);
 	xfrm_input_init();
+
+	RCU_INIT_POINTER(xfrm_if_cb, NULL);
+	synchronize_rcu();
 }
 
 #ifdef CONFIG_AUDITSYSCALL
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Michal Hocko @ 2018-06-04  6:27 UTC (permalink / raw)
  To: Qing Huang
  Cc: Eric Dumazet, David Miller, tariqt, haakon.bugge, yanjun.zhu,
	netdev, linux-rdma, linux-kernel, gi-oh.kim,
	santosh.shilimkar@oracle.com, Vlastimil Babka
In-Reply-To: <f0bfa913-1837-6c5d-6e77-2a188fd094e1@oracle.com>

On Fri 01-06-18 15:05:26, Qing Huang wrote:
> 
> 
> On 6/1/2018 12:31 AM, Michal Hocko wrote:
> > On Thu 31-05-18 19:04:46, Qing Huang wrote:
> > > 
> > > On 5/31/2018 2:10 AM, Michal Hocko wrote:
> > > > On Thu 31-05-18 10:55:32, Michal Hocko wrote:
> > > > > On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
> > > > [...]
> > > > > > I merely copied/pasted from alloc_skb_with_frags() :/
> > > > > I will have a look at it. Thanks!
> > > > OK, so this is an example of an incremental development ;).
> > > > 
> > > > __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
> > > > high order allocations") to prevent from OOM killer. Yet this was
> > > > not enough because fb05e7a89f50 ("net: don't wait for order-3 page
> > > > allocation") didn't want an excessive reclaim for non-costly orders
> > > > so it made it completely NOWAIT while it preserved __GFP_NORETRY in
> > > > place which is now redundant. Should I send a patch?
> > > > 
> > > Just curious, how about GFP_ATOMIC flag? Would it work in a similar fashion?
> > > We experimented
> > > with it a bit in the past but it seemed to cause other issue in our tests.
> > > :-)
> > GFP_ATOMIC is a non-sleeping (aka no reclaim) context with an access to
> > memory reserves. So the risk is that you deplete those reserves and
> > cause issues to other subsystems which need them as well.
> > 
> > > By the way, we didn't encounter any OOM killer events. It seemed that the
> > > mlx4_alloc_icm() triggered slowpath.
> > > We still had about 2GB free memory while it was highly fragmented.
> > The compaction was able to make a reasonable forward progress for you.
> > But considering mlx4_alloc_icm is called with GFP_KERNEL resp. GFP_HIGHUSER
> > then the OOM killer is clearly possible as long as the order is lower
> > than 4.
> 
> The allocation was 256KB so the order was much higher than 4. The compaction
> seemed to be the root
> cause for our problem. It took too long to finish its work while putting
> mlx4_alloc_icm to sleep in a heavily
> fragmented memory situation . Will NORETRY flag avoid the compaction ops and
> fail the 256KB allocation
> immediately so mlx4_alloc_icm can enter adjustable lower order allocation
> code path quickly?

Costly orders should only perform a light compaction attempt unless
__GFP_RETRY_MAY_FAIL is used IIRC. CCing Vlastimil. So __GFP_NORETRY
shouldn't make any difference.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* [PATCH] net/dns_resolver: dns_query Modify parameter checking to avoid dead code
From: nixiaoming @ 2018-06-04  6:40 UTC (permalink / raw)
  To: davem, dhowells, manuel.schoelling, wang840925
  Cc: nixiaoming, linux-kernel, netdev

After commit 1a4240f4764a ("DNS: Separate out CIFS DNS Resolver code")
a dead code exists in function dns_query

code show as below:
	if (!name || namelen == 0)
		return -EINVAL;
	/*Now the value of "namelen" cannot be equal to 0*/
	....
	if (!namelen) /*The condition "!namelen"" cannot be true*/
		namelen = strnlen(name, 256); /*deadcode*/

Modify parameter checking to avoid dead code

Signed-off-by: nixiaoming <nixiaoming@huawei.com>
---
 net/dns_resolver/dns_query.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/dns_resolver/dns_query.c b/net/dns_resolver/dns_query.c
index 49da670..f2acee2 100644
--- a/net/dns_resolver/dns_query.c
+++ b/net/dns_resolver/dns_query.c
@@ -81,7 +81,9 @@ int dns_query(const char *type, const char *name, size_t namelen,
 	kenter("%s,%*.*s,%zu,%s",
 	       type, (int)namelen, (int)namelen, name, namelen, options);
 
-	if (!name || namelen == 0)
+	if (!name || namelen < 3 || namelen > 255)
+		return -EINVAL;
+	if (namelen > strnlen(name, 256)) /*maybe only need part of name*/
 		return -EINVAL;
 
 	/* construct the query key description as "[<type>:]<name>" */
@@ -94,10 +96,6 @@ int dns_query(const char *type, const char *name, size_t namelen,
 		desclen += typelen + 1;
 	}
 
-	if (!namelen)
-		namelen = strnlen(name, 256);
-	if (namelen < 3 || namelen > 255)
-		return -EINVAL;
 	desclen += namelen + 1;
 
 	desc = kmalloc(desclen, GFP_KERNEL);
-- 
2.10.1

^ permalink raw reply related

* [PATCH 00/10] Support DPAA PTP clock and timestamping
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu

This patchset is to support DPAA FMAN PTP clock and HW timestamping.
- The patch #1 to patch #5 are to support DPAA FMAN 1588 timer in
  ptp_qoriq driver.
- The patch #6 to patch #10 are to add HW timestamping support in
  DPAA ethernet driver.

Yangbo Lu (10):
  fsl/fman: share the event interrupt
  ptp: support DPAA FMan 1588 timer in ptp_qoriq
  dt-binding: ptp_qoriq: add DPAA FMan support
  powerpc/mpc85xx: move ptp timer out of fman in dts
  arm64: dts: fsl: move ptp timer out of fman
  fsl/fman: add set_tstamp interface
  fsl/fman_port: support getting timestamp field
  fsl/fman: define frame description command UPD
  dpaa_eth: add support for hardware timestamping
  dpaa_eth: add the get_ts_info interface for ethtool

 Documentation/devicetree/bindings/net/fsl-fman.txt |   25 +----
 .../devicetree/bindings/ptp/ptp-qoriq.txt          |   15 ++-
 arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi   |   14 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi        |   14 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi        |   14 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi       |   14 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi       |   14 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi      |   14 ++-
 drivers/net/ethernet/freescale/dpaa/Kconfig        |   12 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     |  119 +++++++++++++++++++-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |    5 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |   44 +++++++
 drivers/net/ethernet/freescale/fman/fman.c         |    3 +-
 drivers/net/ethernet/freescale/fman/fman.h         |    1 +
 drivers/net/ethernet/freescale/fman/fman_dtsec.c   |   27 +++++
 drivers/net/ethernet/freescale/fman/fman_dtsec.h   |    1 +
 drivers/net/ethernet/freescale/fman/fman_memac.c   |    5 +
 drivers/net/ethernet/freescale/fman/fman_memac.h   |    1 +
 drivers/net/ethernet/freescale/fman/fman_port.c    |   12 ++
 drivers/net/ethernet/freescale/fman/fman_port.h    |    3 +
 drivers/net/ethernet/freescale/fman/fman_tgec.c    |   21 ++++
 drivers/net/ethernet/freescale/fman/fman_tgec.h    |    1 +
 drivers/net/ethernet/freescale/fman/mac.c          |    3 +
 drivers/net/ethernet/freescale/fman/mac.h          |    1 +
 drivers/ptp/Kconfig                                |    2 +-
 drivers/ptp/ptp_qoriq.c                            |  104 +++++++++++------
 include/linux/fsl/ptp_qoriq.h                      |   38 +++++-
 27 files changed, 414 insertions(+), 113 deletions(-)

^ permalink raw reply

* [PATCH 01/10] fsl/fman: share the event interrupt
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-kernel, linux-arm-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to share fman event interrupt because
the 1588 timer driver will also use this interrupt.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index 9530405..c415ac6 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -2801,7 +2801,8 @@ static irqreturn_t fman_irq(int irq, void *handle)
 	of_node_put(muram_node);
 	of_node_put(fm_node);
 
-	err = devm_request_irq(&of_dev->dev, irq, fman_irq, 0, "fman", fman);
+	err = devm_request_irq(&of_dev->dev, irq, fman_irq, IRQF_SHARED,
+			       "fman", fman);
 	if (err < 0) {
 		dev_err(&of_dev->dev, "%s: irq %d allocation failed (error = %d)\n",
 			__func__, irq, err);
-- 
1.7.1

^ permalink raw reply related

* [PATCH 02/10] ptp: support DPAA FMan 1588 timer in ptp_qoriq
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to support DPAA (Data Path Acceleration Architecture)
1588 timer by adding "fsl,fman-ptp-timer" compatible, sharing
interrupt with FMan, adding FSL_DPAA_ETH dependency, and fixing
up register offset.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/ptp/Kconfig           |    2 +-
 drivers/ptp/ptp_qoriq.c       |  104 ++++++++++++++++++++++++++---------------
 include/linux/fsl/ptp_qoriq.h |   38 ++++++++++++---
 3 files changed, 98 insertions(+), 46 deletions(-)

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 474c988..d137c48 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -43,7 +43,7 @@ config PTP_1588_CLOCK_DTE
 
 config PTP_1588_CLOCK_QORIQ
 	tristate "Freescale QorIQ 1588 timer as PTP clock"
-	depends on GIANFAR
+	depends on GIANFAR || FSL_DPAA_ETH
 	depends on PTP_1588_CLOCK
 	default y
 	help
diff --git a/drivers/ptp/ptp_qoriq.c b/drivers/ptp/ptp_qoriq.c
index 1468a16..c4e3545 100644
--- a/drivers/ptp/ptp_qoriq.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -39,11 +39,12 @@
 /* Caller must hold qoriq_ptp->lock. */
 static u64 tmr_cnt_read(struct qoriq_ptp *qoriq_ptp)
 {
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 	u64 ns;
 	u32 lo, hi;
 
-	lo = qoriq_read(&qoriq_ptp->regs->tmr_cnt_l);
-	hi = qoriq_read(&qoriq_ptp->regs->tmr_cnt_h);
+	lo = qoriq_read(&regs->ctrl_regs->tmr_cnt_l);
+	hi = qoriq_read(&regs->ctrl_regs->tmr_cnt_h);
 	ns = ((u64) hi) << 32;
 	ns |= lo;
 	return ns;
@@ -52,16 +53,18 @@ static u64 tmr_cnt_read(struct qoriq_ptp *qoriq_ptp)
 /* Caller must hold qoriq_ptp->lock. */
 static void tmr_cnt_write(struct qoriq_ptp *qoriq_ptp, u64 ns)
 {
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 	u32 hi = ns >> 32;
 	u32 lo = ns & 0xffffffff;
 
-	qoriq_write(&qoriq_ptp->regs->tmr_cnt_l, lo);
-	qoriq_write(&qoriq_ptp->regs->tmr_cnt_h, hi);
+	qoriq_write(&regs->ctrl_regs->tmr_cnt_l, lo);
+	qoriq_write(&regs->ctrl_regs->tmr_cnt_h, hi);
 }
 
 /* Caller must hold qoriq_ptp->lock. */
 static void set_alarm(struct qoriq_ptp *qoriq_ptp)
 {
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 	u64 ns;
 	u32 lo, hi;
 
@@ -70,16 +73,18 @@ static void set_alarm(struct qoriq_ptp *qoriq_ptp)
 	ns -= qoriq_ptp->tclk_period;
 	hi = ns >> 32;
 	lo = ns & 0xffffffff;
-	qoriq_write(&qoriq_ptp->regs->tmr_alarm1_l, lo);
-	qoriq_write(&qoriq_ptp->regs->tmr_alarm1_h, hi);
+	qoriq_write(&regs->alarm_regs->tmr_alarm1_l, lo);
+	qoriq_write(&regs->alarm_regs->tmr_alarm1_h, hi);
 }
 
 /* Caller must hold qoriq_ptp->lock. */
 static void set_fipers(struct qoriq_ptp *qoriq_ptp)
 {
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
+
 	set_alarm(qoriq_ptp);
-	qoriq_write(&qoriq_ptp->regs->tmr_fiper1, qoriq_ptp->tmr_fiper1);
-	qoriq_write(&qoriq_ptp->regs->tmr_fiper2, qoriq_ptp->tmr_fiper2);
+	qoriq_write(&regs->fiper_regs->tmr_fiper1, qoriq_ptp->tmr_fiper1);
+	qoriq_write(&regs->fiper_regs->tmr_fiper2, qoriq_ptp->tmr_fiper2);
 }
 
 /*
@@ -89,16 +94,17 @@ static void set_fipers(struct qoriq_ptp *qoriq_ptp)
 static irqreturn_t isr(int irq, void *priv)
 {
 	struct qoriq_ptp *qoriq_ptp = priv;
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 	struct ptp_clock_event event;
 	u64 ns;
 	u32 ack = 0, lo, hi, mask, val;
 
-	val = qoriq_read(&qoriq_ptp->regs->tmr_tevent);
+	val = qoriq_read(&regs->ctrl_regs->tmr_tevent);
 
 	if (val & ETS1) {
 		ack |= ETS1;
-		hi = qoriq_read(&qoriq_ptp->regs->tmr_etts1_h);
-		lo = qoriq_read(&qoriq_ptp->regs->tmr_etts1_l);
+		hi = qoriq_read(&regs->etts_regs->tmr_etts1_h);
+		lo = qoriq_read(&regs->etts_regs->tmr_etts1_l);
 		event.type = PTP_CLOCK_EXTTS;
 		event.index = 0;
 		event.timestamp = ((u64) hi) << 32;
@@ -108,8 +114,8 @@ static irqreturn_t isr(int irq, void *priv)
 
 	if (val & ETS2) {
 		ack |= ETS2;
-		hi = qoriq_read(&qoriq_ptp->regs->tmr_etts2_h);
-		lo = qoriq_read(&qoriq_ptp->regs->tmr_etts2_l);
+		hi = qoriq_read(&regs->etts_regs->tmr_etts2_h);
+		lo = qoriq_read(&regs->etts_regs->tmr_etts2_l);
 		event.type = PTP_CLOCK_EXTTS;
 		event.index = 1;
 		event.timestamp = ((u64) hi) << 32;
@@ -130,16 +136,16 @@ static irqreturn_t isr(int irq, void *priv)
 			hi = ns >> 32;
 			lo = ns & 0xffffffff;
 			spin_lock(&qoriq_ptp->lock);
-			qoriq_write(&qoriq_ptp->regs->tmr_alarm2_l, lo);
-			qoriq_write(&qoriq_ptp->regs->tmr_alarm2_h, hi);
+			qoriq_write(&regs->alarm_regs->tmr_alarm2_l, lo);
+			qoriq_write(&regs->alarm_regs->tmr_alarm2_h, hi);
 			spin_unlock(&qoriq_ptp->lock);
 			qoriq_ptp->alarm_value = ns;
 		} else {
-			qoriq_write(&qoriq_ptp->regs->tmr_tevent, ALM2);
+			qoriq_write(&regs->ctrl_regs->tmr_tevent, ALM2);
 			spin_lock(&qoriq_ptp->lock);
-			mask = qoriq_read(&qoriq_ptp->regs->tmr_temask);
+			mask = qoriq_read(&regs->ctrl_regs->tmr_temask);
 			mask &= ~ALM2EN;
-			qoriq_write(&qoriq_ptp->regs->tmr_temask, mask);
+			qoriq_write(&regs->ctrl_regs->tmr_temask, mask);
 			spin_unlock(&qoriq_ptp->lock);
 			qoriq_ptp->alarm_value = 0;
 			qoriq_ptp->alarm_interval = 0;
@@ -153,7 +159,7 @@ static irqreturn_t isr(int irq, void *priv)
 	}
 
 	if (ack) {
-		qoriq_write(&qoriq_ptp->regs->tmr_tevent, ack);
+		qoriq_write(&regs->ctrl_regs->tmr_tevent, ack);
 		return IRQ_HANDLED;
 	} else
 		return IRQ_NONE;
@@ -169,6 +175,7 @@ static int ptp_qoriq_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 	u32 tmr_add;
 	int neg_adj = 0;
 	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 
 	if (scaled_ppm < 0) {
 		neg_adj = 1;
@@ -186,7 +193,7 @@ static int ptp_qoriq_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 
 	tmr_add = neg_adj ? tmr_add - diff : tmr_add + diff;
 
-	qoriq_write(&qoriq_ptp->regs->tmr_add, tmr_add);
+	qoriq_write(&regs->ctrl_regs->tmr_add, tmr_add);
 
 	return 0;
 }
@@ -250,6 +257,7 @@ static int ptp_qoriq_enable(struct ptp_clock_info *ptp,
 			      struct ptp_clock_request *rq, int on)
 {
 	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 	unsigned long flags;
 	u32 bit, mask;
 
@@ -266,23 +274,23 @@ static int ptp_qoriq_enable(struct ptp_clock_info *ptp,
 			return -EINVAL;
 		}
 		spin_lock_irqsave(&qoriq_ptp->lock, flags);
-		mask = qoriq_read(&qoriq_ptp->regs->tmr_temask);
+		mask = qoriq_read(&regs->ctrl_regs->tmr_temask);
 		if (on)
 			mask |= bit;
 		else
 			mask &= ~bit;
-		qoriq_write(&qoriq_ptp->regs->tmr_temask, mask);
+		qoriq_write(&regs->ctrl_regs->tmr_temask, mask);
 		spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 		return 0;
 
 	case PTP_CLK_REQ_PPS:
 		spin_lock_irqsave(&qoriq_ptp->lock, flags);
-		mask = qoriq_read(&qoriq_ptp->regs->tmr_temask);
+		mask = qoriq_read(&regs->ctrl_regs->tmr_temask);
 		if (on)
 			mask |= PP1EN;
 		else
 			mask &= ~PP1EN;
-		qoriq_write(&qoriq_ptp->regs->tmr_temask, mask);
+		qoriq_write(&regs->ctrl_regs->tmr_temask, mask);
 		spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 		return 0;
 
@@ -313,10 +321,12 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 {
 	struct device_node *node = dev->dev.of_node;
 	struct qoriq_ptp *qoriq_ptp;
+	struct qoriq_ptp_registers *regs;
 	struct timespec64 now;
 	int err = -ENOMEM;
 	u32 tmr_ctrl;
 	unsigned long flags;
+	void __iomem *base;
 
 	qoriq_ptp = kzalloc(sizeof(*qoriq_ptp), GFP_KERNEL);
 	if (!qoriq_ptp)
@@ -351,7 +361,7 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 		pr_err("irq not in device tree\n");
 		goto no_node;
 	}
-	if (request_irq(qoriq_ptp->irq, isr, 0, DRIVER, qoriq_ptp)) {
+	if (request_irq(qoriq_ptp->irq, isr, IRQF_SHARED, DRIVER, qoriq_ptp)) {
 		pr_err("request_irq failed\n");
 		goto no_node;
 	}
@@ -368,12 +378,27 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 
 	spin_lock_init(&qoriq_ptp->lock);
 
-	qoriq_ptp->regs = ioremap(qoriq_ptp->rsrc->start,
-				resource_size(qoriq_ptp->rsrc));
-	if (!qoriq_ptp->regs) {
+	base = ioremap(qoriq_ptp->rsrc->start,
+		       resource_size(qoriq_ptp->rsrc));
+	if (!base) {
 		pr_err("ioremap ptp registers failed\n");
 		goto no_ioremap;
 	}
+
+	qoriq_ptp->base = base;
+
+	if (of_device_is_compatible(node, "fsl,fman-ptp-timer")) {
+		qoriq_ptp->regs.ctrl_regs = base + FMAN_CTRL_REGS_OFFSET;
+		qoriq_ptp->regs.alarm_regs = base + FMAN_ALARM_REGS_OFFSET;
+		qoriq_ptp->regs.fiper_regs = base + FMAN_FIPER_REGS_OFFSET;
+		qoriq_ptp->regs.etts_regs = base + FMAN_ETTS_REGS_OFFSET;
+	} else {
+		qoriq_ptp->regs.ctrl_regs = base + CTRL_REGS_OFFSET;
+		qoriq_ptp->regs.alarm_regs = base + ALARM_REGS_OFFSET;
+		qoriq_ptp->regs.fiper_regs = base + FIPER_REGS_OFFSET;
+		qoriq_ptp->regs.etts_regs = base + ETTS_REGS_OFFSET;
+	}
+
 	getnstimeofday64(&now);
 	ptp_qoriq_settime(&qoriq_ptp->caps, &now);
 
@@ -383,13 +408,14 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 
 	spin_lock_irqsave(&qoriq_ptp->lock, flags);
 
-	qoriq_write(&qoriq_ptp->regs->tmr_ctrl,   tmr_ctrl);
-	qoriq_write(&qoriq_ptp->regs->tmr_add,    qoriq_ptp->tmr_add);
-	qoriq_write(&qoriq_ptp->regs->tmr_prsc,   qoriq_ptp->tmr_prsc);
-	qoriq_write(&qoriq_ptp->regs->tmr_fiper1, qoriq_ptp->tmr_fiper1);
-	qoriq_write(&qoriq_ptp->regs->tmr_fiper2, qoriq_ptp->tmr_fiper2);
+	regs = &qoriq_ptp->regs;
+	qoriq_write(&regs->ctrl_regs->tmr_ctrl,   tmr_ctrl);
+	qoriq_write(&regs->ctrl_regs->tmr_add,    qoriq_ptp->tmr_add);
+	qoriq_write(&regs->ctrl_regs->tmr_prsc,   qoriq_ptp->tmr_prsc);
+	qoriq_write(&regs->fiper_regs->tmr_fiper1, qoriq_ptp->tmr_fiper1);
+	qoriq_write(&regs->fiper_regs->tmr_fiper2, qoriq_ptp->tmr_fiper2);
 	set_alarm(qoriq_ptp);
-	qoriq_write(&qoriq_ptp->regs->tmr_ctrl,   tmr_ctrl|FIPERST|RTPE|TE|FRD);
+	qoriq_write(&regs->ctrl_regs->tmr_ctrl,   tmr_ctrl|FIPERST|RTPE|TE|FRD);
 
 	spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 
@@ -405,7 +431,7 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 	return 0;
 
 no_clock:
-	iounmap(qoriq_ptp->regs);
+	iounmap(qoriq_ptp->base);
 no_ioremap:
 	release_resource(qoriq_ptp->rsrc);
 no_resource:
@@ -419,12 +445,13 @@ static int qoriq_ptp_probe(struct platform_device *dev)
 static int qoriq_ptp_remove(struct platform_device *dev)
 {
 	struct qoriq_ptp *qoriq_ptp = platform_get_drvdata(dev);
+	struct qoriq_ptp_registers *regs = &qoriq_ptp->regs;
 
-	qoriq_write(&qoriq_ptp->regs->tmr_temask, 0);
-	qoriq_write(&qoriq_ptp->regs->tmr_ctrl,   0);
+	qoriq_write(&regs->ctrl_regs->tmr_temask, 0);
+	qoriq_write(&regs->ctrl_regs->tmr_ctrl,   0);
 
 	ptp_clock_unregister(qoriq_ptp->clock);
-	iounmap(qoriq_ptp->regs);
+	iounmap(qoriq_ptp->base);
 	release_resource(qoriq_ptp->rsrc);
 	free_irq(qoriq_ptp->irq, qoriq_ptp);
 	kfree(qoriq_ptp);
@@ -434,6 +461,7 @@ static int qoriq_ptp_remove(struct platform_device *dev)
 
 static const struct of_device_id match_table[] = {
 	{ .compatible = "fsl,etsec-ptp" },
+	{ .compatible = "fsl,fman-ptp-timer" },
 	{},
 };
 MODULE_DEVICE_TABLE(of, match_table);
diff --git a/include/linux/fsl/ptp_qoriq.h b/include/linux/fsl/ptp_qoriq.h
index b462d9e..dc3dac4 100644
--- a/include/linux/fsl/ptp_qoriq.h
+++ b/include/linux/fsl/ptp_qoriq.h
@@ -11,9 +11,8 @@
 
 /*
  * qoriq ptp registers
- * Generated by regen.tcl on Thu May 13 01:38:57 PM CEST 2010
  */
-struct qoriq_ptp_registers {
+struct ctrl_regs {
 	u32 tmr_ctrl;     /* Timer control register */
 	u32 tmr_tevent;   /* Timestamp event register */
 	u32 tmr_temask;   /* Timer event mask register */
@@ -28,22 +27,47 @@ struct qoriq_ptp_registers {
 	u8  res1[4];
 	u32 tmroff_h;     /* Timer offset high */
 	u32 tmroff_l;     /* Timer offset low */
-	u8  res2[8];
+};
+
+struct alarm_regs {
 	u32 tmr_alarm1_h; /* Timer alarm 1 high register */
 	u32 tmr_alarm1_l; /* Timer alarm 1 high register */
 	u32 tmr_alarm2_h; /* Timer alarm 2 high register */
 	u32 tmr_alarm2_l; /* Timer alarm 2 high register */
-	u8  res3[48];
+};
+
+struct fiper_regs {
 	u32 tmr_fiper1;   /* Timer fixed period interval */
 	u32 tmr_fiper2;   /* Timer fixed period interval */
 	u32 tmr_fiper3;   /* Timer fixed period interval */
-	u8  res4[20];
+};
+
+struct etts_regs {
 	u32 tmr_etts1_h;  /* Timestamp of general purpose external trigger */
 	u32 tmr_etts1_l;  /* Timestamp of general purpose external trigger */
 	u32 tmr_etts2_h;  /* Timestamp of general purpose external trigger */
 	u32 tmr_etts2_l;  /* Timestamp of general purpose external trigger */
 };
 
+struct qoriq_ptp_registers {
+	struct ctrl_regs __iomem *ctrl_regs;
+	struct alarm_regs __iomem *alarm_regs;
+	struct fiper_regs __iomem *fiper_regs;
+	struct etts_regs __iomem *etts_regs;
+};
+
+/* Offset definitions for the four register groups */
+#define CTRL_REGS_OFFSET	0x0
+#define ALARM_REGS_OFFSET	0x40
+#define FIPER_REGS_OFFSET	0x80
+#define ETTS_REGS_OFFSET	0xa0
+
+#define FMAN_CTRL_REGS_OFFSET	0x80
+#define FMAN_ALARM_REGS_OFFSET	0xb8
+#define FMAN_FIPER_REGS_OFFSET	0xd0
+#define FMAN_ETTS_REGS_OFFSET	0xe0
+
+
 /* Bit definitions for the TMR_CTRL register */
 #define ALM1P                 (1<<31) /* Alarm1 output polarity */
 #define ALM2P                 (1<<30) /* Alarm2 output polarity */
@@ -105,10 +129,10 @@ struct qoriq_ptp_registers {
 #define DRIVER		"ptp_qoriq"
 #define DEFAULT_CKSEL	1
 #define N_EXT_TS	2
-#define REG_SIZE	sizeof(struct qoriq_ptp_registers)
 
 struct qoriq_ptp {
-	struct qoriq_ptp_registers __iomem *regs;
+	void __iomem *base;
+	struct qoriq_ptp_registers regs;
 	spinlock_t lock; /* protects regs */
 	struct ptp_clock *clock;
 	struct ptp_clock_info caps;
-- 
1.7.1

^ permalink raw reply related

* [PATCH 03/10] dt-binding: ptp_qoriq: add DPAA FMan support
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to add bindings description for DPAA
FMan 1588 timer, and also remove its description in
fsl-fman dt-bindings document.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 Documentation/devicetree/bindings/net/fsl-fman.txt |   25 +-------------------
 .../devicetree/bindings/ptp/ptp-qoriq.txt          |   15 +++++++++--
 2 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/fsl-fman.txt b/Documentation/devicetree/bindings/net/fsl-fman.txt
index df873d1..74603dd 100644
--- a/Documentation/devicetree/bindings/net/fsl-fman.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fman.txt
@@ -356,30 +356,7 @@ ethernet@e0000 {
 ============================================================================
 FMan IEEE 1588 Node
 
-DESCRIPTION
-
-The FMan interface to support IEEE 1588
-
-
-PROPERTIES
-
-- compatible
-		Usage: required
-		Value type: <stringlist>
-		Definition: A standard property.
-		Must include "fsl,fman-ptp-timer".
-
-- reg
-		Usage: required
-		Value type: <prop-encoded-array>
-		Definition: A standard property.
-
-EXAMPLE
-
-ptp-timer@fe000 {
-	compatible = "fsl,fman-ptp-timer";
-	reg = <0xfe000 0x1000>;
-};
+Refer to Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
 
 =============================================================================
 FMan MDIO Node
diff --git a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
index 0f569d8..c5d0e79 100644
--- a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
+++ b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
@@ -2,7 +2,8 @@
 
 General Properties:
 
-  - compatible   Should be "fsl,etsec-ptp"
+  - compatible   Should be "fsl,etsec-ptp" for eTSEC
+                 Should be "fsl,fman-ptp-timer" for DPAA FMan
   - reg          Offset and length of the register set for the device
   - interrupts   There should be at least two interrupts. Some devices
                  have as many as four PTP related interrupts.
@@ -43,14 +44,22 @@ Clock Properties:
   value, which will be directly written in those bits, that is why,
   according to reference manual, the next clock sources can be used:
 
+  For eTSEC,
   <0> - external high precision timer reference clock (TSEC_TMR_CLK
         input is used for this purpose);
   <1> - eTSEC system clock;
   <2> - eTSEC1 transmit clock;
   <3> - RTC clock input.
 
-  When this attribute is not used, eTSEC system clock will serve as
-  IEEE 1588 timer reference clock.
+  For DPAA FMan,
+  <0> - external high precision timer reference clock (TMR_1588_CLK)
+  <1> - MAC system clock (1/2 FMan clock)
+  <2> - reserved
+  <3> - RTC clock oscillator
+
+  When this attribute is not used, the IEEE 1588 timer reference clock
+  will use the eTSEC system clock (for Gianfar) or the MAC system
+  clock (for DPAA).
 
 Example:
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH 04/10] powerpc/mpc85xx: move ptp timer out of fman in dts
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to move ptp timer node out of fman.
Because ptp timer will be probed by ptp_qoriq driver,
it should be an independent device in case of conflict
memory mapping.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi   |   14 ++++++++------
 arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi   |   14 ++++++++------
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi  |   14 ++++++++------
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi  |   14 ++++++++------
 arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi |   14 ++++++++------
 5 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
index abd01d4..6b124f7 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-0.dtsi
@@ -37,12 +37,13 @@ fman0: fman@400000 {
 	#size-cells = <1>;
 	cell-index = <0>;
 	compatible = "fsl,fman";
-	ranges = <0 0x400000 0x100000>;
-	reg = <0x400000 0x100000>;
+	ranges = <0 0x400000 0xfe000>;
+	reg = <0x400000 0xfe000>;
 	interrupts = <96 2 0 0>, <16 2 1 1>;
 	clocks = <&clockgen 3 0>;
 	clock-names = "fmanclk";
 	fsl,qman-channel-range = <0x40 0xc>;
+	ptimer-handle = <&ptp_timer0>;
 
 	muram@0 {
 		compatible = "fsl,fman-muram";
@@ -93,9 +94,10 @@ fman0: fman@400000 {
 		reg = <0x87000 0x1000>;
 		status = "disabled";
 	};
+};
 
-	ptp_timer0: ptp-timer@fe000 {
-		compatible = "fsl,fman-ptp-timer";
-		reg = <0xfe000 0x1000>;
-	};
+ptp_timer0: ptp-timer@4fe000 {
+	compatible = "fsl,fman-ptp-timer";
+	reg = <0x4fe000 0x1000>;
+	interrupts = <96 2 0 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
index debea75..b80aaf5 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman-1.dtsi
@@ -37,12 +37,13 @@ fman1: fman@500000 {
 	#size-cells = <1>;
 	cell-index = <1>;
 	compatible = "fsl,fman";
-	ranges = <0 0x500000 0x100000>;
-	reg = <0x500000 0x100000>;
+	ranges = <0 0x500000 0xfe000>;
+	reg = <0x500000 0xfe000>;
 	interrupts = <97 2 0 0>, <16 2 1 0>;
 	clocks = <&clockgen 3 1>;
 	clock-names = "fmanclk";
 	fsl,qman-channel-range = <0x60 0xc>;
+	ptimer-handle = <&ptp_timer1>;
 
 	muram@0 {
 		compatible = "fsl,fman-muram";
@@ -93,9 +94,10 @@ fman1: fman@500000 {
 		reg = <0x87000 0x1000>;
 		status = "disabled";
 	};
+};
 
-	ptp_timer1: ptp-timer@fe000 {
-		compatible = "fsl,fman-ptp-timer";
-		reg = <0xfe000 0x1000>;
-	};
+ptp_timer1: ptp-timer@5fe000 {
+	compatible = "fsl,fman-ptp-timer";
+	reg = <0x5fe000 0x1000>;
+	interrupts = <97 2 0 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
index 3a20e0d..d3720fd 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0.dtsi
@@ -37,12 +37,13 @@ fman0: fman@400000 {
 	#size-cells = <1>;
 	cell-index = <0>;
 	compatible = "fsl,fman";
-	ranges = <0 0x400000 0x100000>;
-	reg = <0x400000 0x100000>;
+	ranges = <0 0x400000 0xfe000>;
+	reg = <0x400000 0xfe000>;
 	interrupts = <96 2 0 0>, <16 2 1 1>;
 	clocks = <&clockgen 3 0>;
 	clock-names = "fmanclk";
 	fsl,qman-channel-range = <0x800 0x10>;
+	ptimer-handle = <&ptp_timer0>;
 
 	muram@0 {
 		compatible = "fsl,fman-muram";
@@ -98,9 +99,10 @@ fman0: fman@400000 {
 		compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
 		reg = <0xfd000 0x1000>;
 	};
+};
 
-	ptp_timer0: ptp-timer@fe000 {
-		compatible = "fsl,fman-ptp-timer";
-		reg = <0xfe000 0x1000>;
-	};
+ptp_timer0: ptp-timer@4fe000 {
+	compatible = "fsl,fman-ptp-timer";
+	reg = <0x4fe000 0x1000>;
+	interrupts = <96 2 0 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
index 82750ac..ae34c20 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-1.dtsi
@@ -37,12 +37,13 @@ fman1: fman@500000 {
 	#size-cells = <1>;
 	cell-index = <1>;
 	compatible = "fsl,fman";
-	ranges = <0 0x500000 0x100000>;
-	reg = <0x500000 0x100000>;
+	ranges = <0 0x500000 0xfe000>;
+	reg = <0x500000 0xfe000>;
 	interrupts = <97 2 0 0>, <16 2 1 0>;
 	clocks = <&clockgen 3 1>;
 	clock-names = "fmanclk";
 	fsl,qman-channel-range = <0x820 0x10>;
+	ptimer-handle = <&ptp_timer1>;
 
 	muram@0 {
 		compatible = "fsl,fman-muram";
@@ -98,9 +99,10 @@ fman1: fman@500000 {
 		compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
 		reg = <0xfd000 0x1000>;
 	};
+};
 
-	ptp_timer1: ptp-timer@fe000 {
-		compatible = "fsl,fman-ptp-timer";
-		reg = <0xfe000 0x1000>;
-	};
+ptp_timer1: ptp-timer@5fe000 {
+	compatible = "fsl,fman-ptp-timer";
+	reg = <0x5fe000 0x1000>;
+	interrupts = <97 2 0 0>;
 };
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
index 7f60b60..02f2755 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3l-0.dtsi
@@ -37,12 +37,13 @@ fman0: fman@400000 {
 	#size-cells = <1>;
 	cell-index = <0>;
 	compatible = "fsl,fman";
-	ranges = <0 0x400000 0x100000>;
-	reg = <0x400000 0x100000>;
+	ranges = <0 0x400000 0xfe000>;
+	reg = <0x400000 0xfe000>;
 	interrupts = <96 2 0 0>, <16 2 1 1>;
 	clocks = <&clockgen 3 0>;
 	clock-names = "fmanclk";
 	fsl,qman-channel-range = <0x800 0x10>;
+	ptimer-handle = <&ptp_timer0>;
 
 	muram@0 {
 		compatible = "fsl,fman-muram";
@@ -86,9 +87,10 @@ fman0: fman@400000 {
 		compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
 		reg = <0xfd000 0x1000>;
 	};
+};
 
-	ptp_timer0: ptp-timer@fe000 {
-		compatible = "fsl,fman-ptp-timer";
-		reg = <0xfe000 0x1000>;
-	};
+ptp_timer0: ptp-timer@4fe000 {
+	compatible = "fsl,fman-ptp-timer";
+	reg = <0x4fe000 0x1000>;
+	interrupts = <96 2 0 0>;
 };
-- 
1.7.1

^ permalink raw reply related

* [PATCH 05/10] arm64: dts: fsl: move ptp timer out of fman
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to move ptp timer node out of fman.
Because ptp timer will be probed by ptp_qoriq driver,
it should be an independent device in case of conflict
memory mapping.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
index 4dd0676..e745c0b 100644
--- a/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
+++ b/arch/arm64/boot/dts/freescale/qoriq-fman3-0.dtsi
@@ -11,13 +11,14 @@ fman0: fman@1a00000 {
 	#size-cells = <1>;
 	cell-index = <0>;
 	compatible = "fsl,fman";
-	ranges = <0x0 0x0 0x1a00000 0x100000>;
-	reg = <0x0 0x1a00000 0x0 0x100000>;
+	ranges = <0x0 0x0 0x1a00000 0xfe000>;
+	reg = <0x0 0x1a00000 0x0 0xfe000>;
 	interrupts = <GIC_SPI 44 IRQ_TYPE_LEVEL_HIGH>,
 		     <GIC_SPI 45 IRQ_TYPE_LEVEL_HIGH>;
 	clocks = <&clockgen 3 0>;
 	clock-names = "fmanclk";
 	fsl,qman-channel-range = <0x800 0x10>;
+	ptimer-handle = <&ptp_timer0>;
 
 	muram@0 {
 		compatible = "fsl,fman-muram";
@@ -73,9 +74,10 @@ fman0: fman@1a00000 {
 		compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
 		reg = <0xfd000 0x1000>;
 	};
+};
 
-	ptp_timer0: ptp-timer@fe000 {
-		compatible = "fsl,fman-ptp-timer";
-		reg = <0xfe000 0x1000>;
-	};
+ptp_timer0: ptp-timer@1afe000 {
+	compatible = "fsl,fman-ptp-timer";
+	reg = <0x1afe000 0x1000>;
+	interrupts = <GIC_SPI 44 IRQ_TYPE_LEVEL_HIGH>;
 };
-- 
1.7.1

^ permalink raw reply related

* [PATCH 06/10] fsl/fman: add set_tstamp interface
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to add set_tstamp interface for memac,
dtsec, and 10GEC controllers to configure HW timestamping.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman_dtsec.c |   27 ++++++++++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_dtsec.h |    1 +
 drivers/net/ethernet/freescale/fman/fman_memac.c |    5 ++++
 drivers/net/ethernet/freescale/fman/fman_memac.h |    1 +
 drivers/net/ethernet/freescale/fman/fman_tgec.c  |   21 +++++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_tgec.h  |    1 +
 drivers/net/ethernet/freescale/fman/mac.c        |    3 ++
 drivers/net/ethernet/freescale/fman/mac.h        |    1 +
 8 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman_dtsec.c b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
index 57b1e2b..1ca543a 100644
--- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
@@ -123,11 +123,13 @@
 #define DTSEC_ECNTRL_R100M		0x00000008
 #define DTSEC_ECNTRL_QSGMIIM		0x00000001
 
+#define TCTRL_TTSE			0x00000040
 #define TCTRL_GTS			0x00000020
 
 #define RCTRL_PAL_MASK			0x001f0000
 #define RCTRL_PAL_SHIFT			16
 #define RCTRL_GHTX			0x00000400
+#define RCTRL_RTSE			0x00000040
 #define RCTRL_GRS			0x00000020
 #define RCTRL_MPROM			0x00000008
 #define RCTRL_RSF			0x00000004
@@ -1136,6 +1138,31 @@ int dtsec_set_allmulti(struct fman_mac *dtsec, bool enable)
 	return 0;
 }
 
+int dtsec_set_tstamp(struct fman_mac *dtsec, bool enable)
+{
+	struct dtsec_regs __iomem *regs = dtsec->regs;
+	u32 rctrl, tctrl;
+
+	if (!is_init_done(dtsec->dtsec_drv_param))
+		return -EINVAL;
+
+	rctrl = ioread32be(&regs->rctrl);
+	tctrl = ioread32be(&regs->tctrl);
+
+	if (enable) {
+		rctrl |= RCTRL_RTSE;
+		tctrl |= TCTRL_TTSE;
+	} else {
+		rctrl &= ~RCTRL_RTSE;
+		tctrl &= ~TCTRL_TTSE;
+	}
+
+	iowrite32be(rctrl, &regs->rctrl);
+	iowrite32be(tctrl, &regs->tctrl);
+
+	return 0;
+}
+
 int dtsec_del_hash_mac_address(struct fman_mac *dtsec, enet_addr_t *eth_addr)
 {
 	struct dtsec_regs __iomem *regs = dtsec->regs;
diff --git a/drivers/net/ethernet/freescale/fman/fman_dtsec.h b/drivers/net/ethernet/freescale/fman/fman_dtsec.h
index 1a689ad..5149d96 100644
--- a/drivers/net/ethernet/freescale/fman/fman_dtsec.h
+++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.h
@@ -56,5 +56,6 @@ int dtsec_set_exception(struct fman_mac *dtsec,
 int dtsec_del_hash_mac_address(struct fman_mac *dtsec, enet_addr_t *eth_addr);
 int dtsec_get_version(struct fman_mac *dtsec, u32 *mac_version);
 int dtsec_set_allmulti(struct fman_mac *dtsec, bool enable);
+int dtsec_set_tstamp(struct fman_mac *dtsec, bool enable);
 
 #endif /* __DTSEC_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c
index 446a97b..bc6eb30 100644
--- a/drivers/net/ethernet/freescale/fman/fman_memac.c
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
@@ -964,6 +964,11 @@ int memac_set_allmulti(struct fman_mac *memac, bool enable)
 	return 0;
 }
 
+int memac_set_tstamp(struct fman_mac *memac, bool enable)
+{
+	return 0; /* Always enabled. */
+}
+
 int memac_del_hash_mac_address(struct fman_mac *memac, enet_addr_t *eth_addr)
 {
 	struct memac_regs __iomem *regs = memac->regs;
diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.h b/drivers/net/ethernet/freescale/fman/fman_memac.h
index b5a5033..b2c671e 100644
--- a/drivers/net/ethernet/freescale/fman/fman_memac.h
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.h
@@ -58,5 +58,6 @@ int memac_set_exception(struct fman_mac *memac,
 int memac_add_hash_mac_address(struct fman_mac *memac, enet_addr_t *eth_addr);
 int memac_del_hash_mac_address(struct fman_mac *memac, enet_addr_t *eth_addr);
 int memac_set_allmulti(struct fman_mac *memac, bool enable);
+int memac_set_tstamp(struct fman_mac *memac, bool enable);
 
 #endif /* __MEMAC_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_tgec.c b/drivers/net/ethernet/freescale/fman/fman_tgec.c
index 284735d..4070593 100644
--- a/drivers/net/ethernet/freescale/fman/fman_tgec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_tgec.c
@@ -44,6 +44,7 @@
 #define TGEC_TX_IPG_LENGTH_MASK	0x000003ff
 
 /* Command and Configuration Register (COMMAND_CONFIG) */
+#define CMD_CFG_EN_TIMESTAMP		0x00100000
 #define CMD_CFG_NO_LEN_CHK		0x00020000
 #define CMD_CFG_PAUSE_IGNORE		0x00000100
 #define CMF_CFG_CRC_FWD			0x00000040
@@ -588,6 +589,26 @@ int tgec_set_allmulti(struct fman_mac *tgec, bool enable)
 	return 0;
 }
 
+int tgec_set_tstamp(struct fman_mac *tgec, bool enable)
+{
+	struct tgec_regs __iomem *regs = tgec->regs;
+	u32 tmp;
+
+	if (!is_init_done(tgec->cfg))
+		return -EINVAL;
+
+	tmp = ioread32be(&regs->command_config);
+
+	if (enable)
+		tmp |= CMD_CFG_EN_TIMESTAMP;
+	else
+		tmp &= ~CMD_CFG_EN_TIMESTAMP;
+
+	iowrite32be(tmp, &regs->command_config);
+
+	return 0;
+}
+
 int tgec_del_hash_mac_address(struct fman_mac *tgec, enet_addr_t *eth_addr)
 {
 	struct tgec_regs __iomem *regs = tgec->regs;
diff --git a/drivers/net/ethernet/freescale/fman/fman_tgec.h b/drivers/net/ethernet/freescale/fman/fman_tgec.h
index cbbd3b4..3bfd106 100644
--- a/drivers/net/ethernet/freescale/fman/fman_tgec.h
+++ b/drivers/net/ethernet/freescale/fman/fman_tgec.h
@@ -52,5 +52,6 @@ int tgec_set_exception(struct fman_mac *tgec,
 int tgec_del_hash_mac_address(struct fman_mac *tgec, enet_addr_t *eth_addr);
 int tgec_get_version(struct fman_mac *tgec, u32 *mac_version);
 int tgec_set_allmulti(struct fman_mac *tgec, bool enable);
+int tgec_set_tstamp(struct fman_mac *tgec, bool enable);
 
 #endif /* __TGEC_H */
diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
index 7b5b95f..a847b9c 100644
--- a/drivers/net/ethernet/freescale/fman/mac.c
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -471,6 +471,7 @@ static void setup_dtsec(struct mac_device *mac_dev)
 	mac_dev->set_rx_pause		= dtsec_accept_rx_pause_frames;
 	mac_dev->set_exception		= dtsec_set_exception;
 	mac_dev->set_allmulti		= dtsec_set_allmulti;
+	mac_dev->set_tstamp		= dtsec_set_tstamp;
 	mac_dev->set_multi		= set_multi;
 	mac_dev->start			= start;
 	mac_dev->stop			= stop;
@@ -490,6 +491,7 @@ static void setup_tgec(struct mac_device *mac_dev)
 	mac_dev->set_rx_pause		= tgec_accept_rx_pause_frames;
 	mac_dev->set_exception		= tgec_set_exception;
 	mac_dev->set_allmulti		= tgec_set_allmulti;
+	mac_dev->set_tstamp		= tgec_set_tstamp;
 	mac_dev->set_multi		= set_multi;
 	mac_dev->start			= start;
 	mac_dev->stop			= stop;
@@ -509,6 +511,7 @@ static void setup_memac(struct mac_device *mac_dev)
 	mac_dev->set_rx_pause		= memac_accept_rx_pause_frames;
 	mac_dev->set_exception		= memac_set_exception;
 	mac_dev->set_allmulti		= memac_set_allmulti;
+	mac_dev->set_tstamp		= memac_set_tstamp;
 	mac_dev->set_multi		= set_multi;
 	mac_dev->start			= start;
 	mac_dev->stop			= stop;
diff --git a/drivers/net/ethernet/freescale/fman/mac.h b/drivers/net/ethernet/freescale/fman/mac.h
index b520cec..824a81a 100644
--- a/drivers/net/ethernet/freescale/fman/mac.h
+++ b/drivers/net/ethernet/freescale/fman/mac.h
@@ -68,6 +68,7 @@ struct mac_device {
 	int (*set_promisc)(struct fman_mac *mac_dev, bool enable);
 	int (*change_addr)(struct fman_mac *mac_dev, enet_addr_t *enet_addr);
 	int (*set_allmulti)(struct fman_mac *mac_dev, bool enable);
+	int (*set_tstamp)(struct fman_mac *mac_dev, bool enable);
 	int (*set_multi)(struct net_device *net_dev,
 			 struct mac_device *mac_dev);
 	int (*set_rx_pause)(struct fman_mac *mac_dev, bool en);
-- 
1.7.1

^ permalink raw reply related

* [PATCH 07/10] fsl/fman_port: support getting timestamp field
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

This patch is to add fman_port_get_tstamp_field() interface
to get timestamp field data.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman_port.c |   12 ++++++++++++
 drivers/net/ethernet/freescale/fman/fman_port.h |    3 +++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index ce6e24c..86f0094 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1731,6 +1731,18 @@ int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset)
 }
 EXPORT_SYMBOL(fman_port_get_hash_result_offset);
 
+int fman_port_get_tstamp_field(struct fman_port *port, const void *data,
+			       u64 *tstamp)
+{
+	if (port->buffer_offsets.time_stamp_offset == ILLEGAL_BASE)
+		return -EINVAL;
+
+	*tstamp = *(u64 *)(data + port->buffer_offsets.time_stamp_offset);
+
+	return 0;
+}
+EXPORT_SYMBOL(fman_port_get_tstamp_field);
+
 static int fman_port_probe(struct platform_device *of_dev)
 {
 	struct fman_port *port;
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index e86ca6a..d10e48d 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -153,6 +153,9 @@ int fman_port_cfg_buf_prefix_content(struct fman_port *port,
 
 int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset);
 
+int fman_port_get_tstamp_field(struct fman_port *port, const void *data,
+			       u64 *tstamp);
+
 struct fman_port *fman_port_bind(struct device *dev);
 
 #endif /* __FMAN_PORT_H */
-- 
1.7.1

^ permalink raw reply related

* [PATCH 08/10] fsl/fman: define frame description command UPD
From: Yangbo Lu @ 2018-06-04  7:08 UTC (permalink / raw)
  To: netdev, madalin.bucur, Richard Cochran, Rob Herring, Shawn Guo,
	David S . Miller
  Cc: devicetree, linuxppc-dev, linux-arm-kernel, linux-kernel,
	Yangbo Lu
In-Reply-To: <20180604070837.19265-1-yangbo.lu@nxp.com>

Defined frame description command FM_FD_CMD_UPD for
prepended data updating.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h
index bfa02e0..935c317 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -41,6 +41,7 @@
 /* Frame queue Context Override */
 #define FM_FD_CMD_FCO                   0x80000000
 #define FM_FD_CMD_RPD                   0x40000000  /* Read Prepended Data */
+#define FM_FD_CMD_UPD			0x20000000  /* Update Prepended Data */
 #define FM_FD_CMD_DTC                   0x10000000  /* Do L4 Checksum */
 
 /* TX-Port: Unsupported Format */
-- 
1.7.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox