Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/2] mac80211: only remove AP VLAN frames from TXQ
From: Toke Høiland-Jørgensen @ 2017-10-06 10:30 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Michał Kazior, Johannes Berg
In-Reply-To: <20171006095333.26335-2-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>

Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> writes:

> From: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> When removing an AP VLAN interface, mac80211 currently purges
> the entire TXQ for the AP interface. Fix this by using the FQ
> API introduced in the previous patch to filter frames.
>
> Signed-off-by: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Acked-by: Toke Høiland-Jørgensen <toke-LJ9M9ZcSy1A@public.gmane.org>

^ permalink raw reply

* Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Jesper Dangaard Brouer @ 2017-10-06 10:50 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Alexei Starovoitov, Andy Gospodarek, brouer,
	Tobias Klauser
In-Reply-To: <59D5FDFF.5040002@iogearbox.net>

On Thu, 05 Oct 2017 11:40:15 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/04/2017 02:03 PM, Jesper Dangaard Brouer wrote:
> [...]
> > +#define CPU_MAP_BULK_SIZE 8  /* 8 == one cacheline on 64-bit archs */
> > +struct xdp_bulk_queue {
> > +	void *q[CPU_MAP_BULK_SIZE];
> > +	unsigned int count;
> > +};
> > +
> > +/* Struct for every remote "destination" CPU in map */
> > +struct bpf_cpu_map_entry {
> > +	u32 cpu;    /* kthread CPU and map index */
> > +	int map_id; /* Back reference to map */  
> 
> map_id is not used here if I read it correctly? We should
> then remove it.

It is actually used in a later patch. Notice, there is no unused
members in the final patch.  I did consider adding back in the later
patch, but it was annoying to during the devel and split-up patch
phase, as it creates conflicts when I move between the different
patches, that need to modify this struct.  Thus, I choose to keep the
end-struct in this cpumap-base-patch.  If you insist, I can go though
the patch-stack and carefully introduce changes to the struct in steps?


> > +	u32 qsize;  /* Redundant queue size for map lookup */
> > +
> > +	/* XDP can run multiple RX-ring queues, need __percpu enqueue store */
> > +	struct xdp_bulk_queue __percpu *bulkq;
> > +
> > +	/* Queue with potential multi-producers, and single-consumer kthread */
> > +	struct ptr_ring *queue;
> > +	struct task_struct *kthread;
> > +	struct work_struct kthread_stop_wq;
> > +
> > +	atomic_t refcnt; /* Control when this struct can be free'ed */
> > +	struct rcu_head rcu;
> > +};
> > +
> > +struct bpf_cpu_map {
> > +	struct bpf_map map;
> > +	/* Below members specific for map type */
> > +	struct bpf_cpu_map_entry **cpu_map;
> > +	unsigned long __percpu *flush_needed;
> > +};
> > +
> > +static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,
> > +			     struct xdp_bulk_queue *bq);  
> 
> Could we avoid forward declaration?

This forward declaration helps me avoid mixing the enqueue code with
the bpf-map code.  I like that all the enqueue code is located after
the struct bpf_map_ops cpu_map_ops deceleration.  If you insist, I can
reorder the code?

 
> > +static u64 cpu_map_bitmap_size(const union bpf_attr *attr)
> > +{
> > +	return BITS_TO_LONGS(attr->max_entries) * sizeof(unsigned long);
> > +}
> > +
> > +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
> > +{
> > +	struct bpf_cpu_map *cmap;
> > +	u64 cost;
> > +	int err;
> > +
> > +	/* check sanity of attributes */
> > +	if (attr->max_entries == 0 || attr->key_size != 4 ||
> > +	    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	cmap = kzalloc(sizeof(*cmap), GFP_USER);
> > +	if (!cmap)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	/* mandatory map attributes */
> > +	cmap->map.map_type = attr->map_type;
> > +	cmap->map.key_size = attr->key_size;
> > +	cmap->map.value_size = attr->value_size;
> > +	cmap->map.max_entries = attr->max_entries;
> > +	cmap->map.map_flags = attr->map_flags;
> > +	cmap->map.numa_node = bpf_map_attr_numa_node(attr);
> > +
> > +	/* Pre-limit array size based on NR_CPUS, not final CPU check */
> > +	if (cmap->map.max_entries > NR_CPUS)
> > +		return ERR_PTR(-E2BIG);
> > +
> > +	/* make sure page count doesn't overflow */
> > +	cost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);
> > +	cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
> > +	if (cost >= U32_MAX - PAGE_SIZE)
> > +		goto free_cmap;
> > +	cmap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> > +
> > +	/* if map size is larger than memlock limit, reject it early */
> > +	err = bpf_map_precharge_memlock(cmap->map.pages);
> > +	if (err)
> > +		goto free_cmap;  
> 
> Given this is almost the same as devmap and touches user land reporting,
> we should probably go and do the same as in 582db7e0c4c2 ("bpf: devmap:
> pass on return value of bpf_map_precharge_memlock").

I guess we have to do this... even-though I absolutely HATE that this
will return -EPERM which user land will see as "Operation not permitted".

Even-though I know this, I got confused and spend several hours hunting
the wrong kind of error:
 https://github.com/netoptimizer/prototype-kernel/commit/cf4694792bf9807f48dc174a149ab0805133184a

Even the iovisor/BCC tool have a work-around for this ambiguous ABI
that we provide, and explicitly call their work-around a "hack":

https://github.com/iovisor/bcc/blob/2e20494f63a/src/cc/libbpf.c#L366-L373

 if (ret < 0 && errno == EPERM) {
    // When EPERM is returned, two reasons are possible:
    //  1. user has no permissions for bpf()
    //  2. user has insufficent rlimit for locked memory
    // Unfortunately, there is no api to inspect the current usage of locked
    // mem for the user, so an accurate calculation of how much memory to lock
    // for this new program is difficult to calculate. As a hack, bump the limit
    // to unlimited. If program load fails again, return the error.

 
> > +	/* A per cpu bitfield with a bit per possible CPU in map  */
> > +	cmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),
> > +					    __alignof__(unsigned long));
> > +	if (!cmap->flush_needed)
> > +		goto free_cmap;
> > +
> > +	/* Alloc array for possible remote "destination" CPUs */
> > +	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
> > +					   sizeof(struct bpf_cpu_map_entry *),
> > +					   cmap->map.numa_node);
> > +	if (!cmap->cpu_map)
> > +		goto free_cmap;
> > +
> > +	return &cmap->map;
> > +free_cmap:
> > +	free_percpu(cmap->flush_needed);
> > +	kfree(cmap);
> > +	return ERR_PTR(-ENOMEM);
> > +}
> > +
> > +void __cpu_map_queue_destructor(void *ptr)
> > +{
> > +	/* For now, just catch this as an error */
> > +	if (!ptr)
> > +		return;
> > +	pr_err("ERROR: %s() cpu_map queue was not empty\n", __func__);  
> 
> Can you elaborate on this "for now" condition? Is this a race
> when kthread doesn't consume queue on thread exit, or should it
> be impossible to trigger (should it then be replaced with a
> 'if (WARN_ON_ONCE(ptr)) page_frag_free(ptr)' and a more elaborate
> comment)?

The "for now" is an old comment while developing and testing this.
In this final state of the patchset it _should_ not be possible to
trigger this situation.  I like your idea of replacing it with a
WARN_ON_ONCE.  (as it might be good to keep in some form, as it would
catch is someone changing the code which breaks the RCU+WQ+kthread
tear-down procedure).


> > +	page_frag_free(ptr);
> > +}
> > +
> > +static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> > +{
> > +	if (atomic_dec_and_test(&rcpu->refcnt)) {
> > +		/* The queue should be empty at this point */
> > +		ptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor);
> > +		kfree(rcpu->queue);
> > +		kfree(rcpu);
> > +	}
> > +}
> > +
> > +static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> > +{
> > +	atomic_inc(&rcpu->refcnt);
> > +}
> > +
> > +/* called from workqueue, to workaround syscall using preempt_disable */
> > +static void cpu_map_kthread_stop(struct work_struct *work)
> > +{
> > +	struct bpf_cpu_map_entry *rcpu;
> > +
> > +	rcpu = container_of(work, struct bpf_cpu_map_entry, kthread_stop_wq);
> > +	synchronize_rcu(); /* wait for flush in __cpu_map_entry_free() */
> > +	kthread_stop(rcpu->kthread); /* calls put_cpu_map_entry */
> > +}
> > +
> > +static int cpu_map_kthread_run(void *data)
> > +{
> > +	struct bpf_cpu_map_entry *rcpu = data;
> > +
> > +	set_current_state(TASK_INTERRUPTIBLE);
> > +	while (!kthread_should_stop()) {
> > +		struct xdp_pkt *xdp_pkt;
> > +
> > +		schedule();
> > +		/* Do work */
> > +		while ((xdp_pkt = ptr_ring_consume(rcpu->queue))) {
> > +			/* For now just "refcnt-free" */
> > +			page_frag_free(xdp_pkt);
> > +		}
> > +		__set_current_state(TASK_INTERRUPTIBLE);
> > +	}
> > +	put_cpu_map_entry(rcpu);
> > +
> > +	__set_current_state(TASK_RUNNING);
> > +	return 0;
> > +}
> > +
> > +struct bpf_cpu_map_entry *__cpu_map_entry_alloc(u32 qsize, u32 cpu, int map_id)
> > +{
> > +	gfp_t gfp = GFP_ATOMIC|__GFP_NOWARN;
> > +	struct bpf_cpu_map_entry *rcpu;
> > +	int numa, err;
> > +
> > +	/* Have map->numa_node, but choose node of redirect target CPU */
> > +	numa = cpu_to_node(cpu);
> > +
> > +	rcpu = kzalloc_node(sizeof(*rcpu), gfp, numa);
> > +	if (!rcpu)
> > +		return NULL;
> > +
> > +	/* Alloc percpu bulkq */
> > +	rcpu->bulkq = __alloc_percpu_gfp(sizeof(*rcpu->bulkq),
> > +					 sizeof(void *), gfp);
> > +	if (!rcpu->bulkq)
> > +		goto fail;
> > +
> > +	/* Alloc queue */
> > +	rcpu->queue = kzalloc_node(sizeof(*rcpu->queue), gfp, numa);
> > +	if (!rcpu->queue)
> > +		goto fail;
> > +
> > +	err = ptr_ring_init(rcpu->queue, qsize, gfp);
> > +	if (err)
> > +		goto fail;
> > +	rcpu->qsize = qsize;
> > +
> > +	/* Setup kthread */
> > +	rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,
> > +					       "cpumap/%d/map:%d", cpu, map_id);
> > +	if (IS_ERR(rcpu->kthread))
> > +		goto fail;  
> 
> What about ptr_ring_cleanup() when we fail here?

Thanks for catching this ... added.

> > +
> > +	/* Make sure kthread runs on a single CPU */
> > +	kthread_bind(rcpu->kthread, cpu);
> > +	wake_up_process(rcpu->kthread);
> > +
> > +	get_cpu_map_entry(rcpu); /* 1-refcnt for being in cmap->cpu_map[] */
> > +	get_cpu_map_entry(rcpu); /* 1-refcnt for kthread */
> > +
> > +	return rcpu;
> > +
> > +fail:   /* Hint: free API detect NULL values */
> > +	free_percpu(rcpu->bulkq);
> > +	kfree(rcpu->queue);
> > +	kfree(rcpu);
> > +	return NULL;
> > +}
> > +
> > +void __cpu_map_entry_free(struct rcu_head *rcu)
> > +{
> > +	struct bpf_cpu_map_entry *rcpu;
> > +	int cpu;
> > +
> > +	/* This cpu_map_entry have been disconnected from map and one
> > +	 * RCU graze-period have elapsed.  Thus, XDP cannot queue any
> > +	 * new packets and cannot change/set flush_needed that can
> > +	 * find this entry.
> > +	 */
> > +	rcpu = container_of(rcu, struct bpf_cpu_map_entry, rcu);
> > +
> > +	/* Flush remaining packets in percpu bulkq */
> > +	for_each_online_cpu(cpu) {
> > +		struct xdp_bulk_queue *bq = per_cpu_ptr(rcpu->bulkq, cpu);
> > +
> > +		/* No concurrent bq_enqueue can run at this point */
> > +		bq_flush_to_queue(rcpu, bq);
> > +	}
> > +	free_percpu(rcpu->bulkq);
> > +	/* Cannot kthread_stop() here, last put free rcpu resources */
> > +	put_cpu_map_entry(rcpu);
> > +}
> > +
> > +/* After xchg pointer to bpf_cpu_map_entry, use the call_rcu() to
> > + * ensure any driver rcu critical sections have completed, but this
> > + * does not guarantee a flush has happened yet. Because driver side
> > + * rcu_read_lock/unlock only protects the running XDP program.  The
> > + * atomic xchg and NULL-ptr check in __cpu_map_flush() makes sure a
> > + * pending flush op doesn't fail.
> > + *
> > + * The bpf_cpu_map_entry is still used by the kthread, and there can
> > + * still be pending packets (in queue and percpu bulkq).  A refcnt
> > + * makes sure to last user (kthread_stop vs. call_rcu) free memory
> > + * resources.
> > + *
> > + * The rcu callback __cpu_map_entry_free flush remaining packets in
> > + * percpu bulkq to queue.  Due to caller map_delete_elem() disable
> > + * preemption, cannot call kthread_stop() to make sure queue is empty.
> > + * Instead a work_queue is started for stopping kthread,
> > + * cpu_map_kthread_stop, which waits for an RCU graze period before
> > + * stopping kthread, emptying the queue.
> > + */
> > +void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
> > +			     u32 key_cpu, struct bpf_cpu_map_entry *rcpu)
> > +{
> > +	struct bpf_cpu_map_entry *old_rcpu;
> > +
> > +	old_rcpu = xchg(&cmap->cpu_map[key_cpu], rcpu);
> > +	if (old_rcpu) {
> > +		call_rcu(&old_rcpu->rcu, __cpu_map_entry_free);
> > +		INIT_WORK(&old_rcpu->kthread_stop_wq, cpu_map_kthread_stop);
> > +		schedule_work(&old_rcpu->kthread_stop_wq);
> > +	}
> > +}
> > +
> > +int cpu_map_delete_elem(struct bpf_map *map, void *key)
> > +{
> > +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> > +	u32 key_cpu = *(u32 *)key;
> > +
> > +	if (key_cpu >= map->max_entries)
> > +		return -EINVAL;
> > +
> > +	/* notice caller map_delete_elem() use preempt_disable() */
> > +	__cpu_map_entry_replace(cmap, key_cpu, NULL);
> > +	return 0;
> > +}
> > +
> > +int cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
> > +				u64 map_flags)
> > +{
> > +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> > +	struct bpf_cpu_map_entry *rcpu;
> > +
> > +	/* Array index key correspond to CPU number */
> > +	u32 key_cpu = *(u32 *)key;
> > +	/* Value is the queue size */
> > +	u32 qsize = *(u32 *)value;
> > +
> > +	/* Make sure CPU is a valid possible cpu */
> > +	if (!cpu_possible(key_cpu))
> > +		return -ENODEV;
> > +
> > +	if (unlikely(map_flags > BPF_EXIST))
> > +		return -EINVAL;
> > +	if (unlikely(key_cpu >= cmap->map.max_entries))
> > +		return -E2BIG;
> > +	if (unlikely(map_flags == BPF_NOEXIST))
> > +		return -EEXIST;
> > +	if (unlikely(qsize > 16384)) /* sanity limit on qsize */
> > +		return -EOVERFLOW;
> > +
> > +	if (qsize == 0) {
> > +		rcpu = NULL; /* Same as deleting */
> > +	} else {
> > +		/* Updating qsize cause re-allocation of bpf_cpu_map_entry */
> > +		rcpu = __cpu_map_entry_alloc(qsize, key_cpu, map->id);
> > +		if (!rcpu)
> > +			return -ENOMEM;
> > +	}
> > +	rcu_read_lock();
> > +	__cpu_map_entry_replace(cmap, key_cpu, rcpu);
> > +	rcu_read_unlock();
> > +	return 0;  
> 
> You need to update verifier such that this function cannot be called
> out of an BPF program,

In the example BPF program, I do a lookup into the map, but only to
verify that an entry exist (I don't look at the value).  I would like
to support such usage.


> otherwise it would be possible under full RCU
> read context, which is explicitly avoided here and also it would otherwise
> be allowed for other maps of different type as well, which needs to
> be avoided.

Sorry, I don't follow this.

 
> > +}
> > +
> > +void cpu_map_free(struct bpf_map *map)
> > +{
> > +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> > +	int cpu;
> > +	u32 i;
> > +
> > +	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
> > +	 * so the bpf programs (can be more than one that used this map) were
> > +	 * disconnected from events. Wait for outstanding critical sections in
> > +	 * these programs to complete. The rcu critical section only guarantees
> > +	 * no further "XDP/bpf-side" reads against bpf_cpu_map->cpu_map.
> > +	 * It does __not__ ensure pending flush operations (if any) are
> > +	 * complete.
> > +	 */
> > +	synchronize_rcu();
> > +
> > +	/* To ensure all pending flush operations have completed wait for flush
> > +	 * bitmap to indicate all flush_needed bits to be zero on _all_ cpus.
> > +	 * Because the above synchronize_rcu() ensures the map is disconnected
> > +	 * from the program we can assume no new bits will be set.
> > +	 */
> > +	for_each_online_cpu(cpu) {
> > +		unsigned long *bitmap = per_cpu_ptr(cmap->flush_needed, cpu);
> > +
> > +		while (!bitmap_empty(bitmap, cmap->map.max_entries))
> > +			cond_resched();
> > +	}
> > +
> > +	/* For cpu_map the remote CPUs can still be using the entries
> > +	 * (struct bpf_cpu_map_entry).
> > +	 */
> > +	for (i = 0; i < cmap->map.max_entries; i++) {
> > +		struct bpf_cpu_map_entry *rcpu;
> > +
> > +		rcpu = READ_ONCE(cmap->cpu_map[i]);
> > +		if (!rcpu)
> > +			continue;
> > +
> > +		/* bq flush and cleanup happens after RCU graze-period */
> > +		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
> > +	}
> > +	free_percpu(cmap->flush_needed);
> > +	bpf_map_area_free(cmap->cpu_map);
> > +	kfree(cmap);
> > +}
> > +
> > +struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
> > +{
> > +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> > +	struct bpf_cpu_map_entry *rcpu;
> > +
> > +	if (key >= map->max_entries)
> > +		return NULL;
> > +
> > +	rcpu = READ_ONCE(cmap->cpu_map[key]);
> > +	return rcpu;
> > +}
> > +
> > +static void *cpu_map_lookup_elem(struct bpf_map *map, void *key)
> > +{
> > +	struct bpf_cpu_map_entry *rcpu =
> > +		__cpu_map_lookup_elem(map, *(u32 *)key);
> > +
> > +	return rcpu ? &rcpu->qsize : NULL;  
> 
> The qsize doesn't seem used anywhere else besides here, but you
> probably should update verifier such that this cannot be called
> out of the BPF program, which could then mangle qsize value.

It is true that the BPF prog can modify this qsize value, but it's not
the authoritative value, so it doesn't really matter.

As I said above, I do want to do a lookup from a BPF program.  To allow
to BPF program to know, if an entry is valid, else it will blindly
send to a cpu destination.  Maybe bpf_prog's just have to use a
map-on-the-side to coordinate this(?), but then a sysadm modifying the
real cpumap will be invisible to the program.


Maybe we should just disable BPF-progs from reading this in the first
iteration?  It would allow for more advanced usage schemes later..

One crazy idea is to have the cpu_map_lookup_elem() return if any
packets are in-flight on this cpu-queue. (Making it easier to avoid OoO
packets, when switching target CPU, but it can also be implemented by
the BPF-programmer herself via maps, although via some extra atomic
cost).


> > +}
> > +
> > +static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> > +{
> > +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> > +	u32 index = key ? *(u32 *)key : U32_MAX;
> > +	u32 *next = next_key;
> > +
> > +	if (index >= cmap->map.max_entries) {
> > +		*next = 0;
> > +		return 0;
> > +	}
> > +
> > +	if (index == cmap->map.max_entries - 1)
> > +		return -ENOENT;
> > +	*next = index + 1;
> > +	return 0;
> > +}


I would have liked to have implemented next_key so it only returned the
next valid cpu_entry, and used it as a simple round-robin scheduler.
But AFAIK the next_key function is not allowed from bpf_prog's, right?


> > +
> > +const struct bpf_map_ops cpu_map_ops = {
> > +	.map_alloc		= cpu_map_alloc,
> > +	.map_free		= cpu_map_free,
> > +	.map_delete_elem	= cpu_map_delete_elem,
> > +	.map_update_elem	= cpu_map_update_elem,
> > +	.map_lookup_elem	= cpu_map_lookup_elem,
> > +	.map_get_next_key	= cpu_map_get_next_key,
> > +};
> > +
> > +static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,
> > +			     struct xdp_bulk_queue *bq)
> > +{
> > +	struct ptr_ring *q;
> > +	int i;
> > +
> > +	if (unlikely(!bq->count))
> > +		return 0;
> > +
> > +	q = rcpu->queue;
> > +	spin_lock(&q->producer_lock);
> > +
> > +	for (i = 0; i < bq->count; i++) {
> > +		void *xdp_pkt = bq->q[i];
> > +		int err;
> > +
> > +		err = __ptr_ring_produce(q, xdp_pkt);
> > +		if (err) {
> > +			/* Free xdp_pkt */
> > +			page_frag_free(xdp_pkt);
> > +		}
> > +	}
> > +	bq->count = 0;
> > +	spin_unlock(&q->producer_lock);
> > +
> > +	return 0;
> > +}
> > +
[...]


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH] bnx2x: Use pci_ari_enabled() instead of local copy
From: Bjorn Helgaas @ 2017-10-06 11:00 UTC (permalink / raw)
  To: Ariel Elior; +Cc: netdev, everest-linux-l2, linux-kernel

From: Bjorn Helgaas <bhelgaas@google.com>

Use pci_ari_enabled() from the PCI core instead of the identical local copy
bnx2x_ari_enabled().  No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index 9ca994d0bab6..3591077a5f6b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -1074,11 +1074,6 @@ static void bnx2x_vf_set_bars(struct bnx2x *bp, struct bnx2x_virtf *vf)
 	}
 }
 
-static int bnx2x_ari_enabled(struct pci_dev *dev)
-{
-	return dev->bus->self && dev->bus->self->ari_enabled;
-}
-
 static int
 bnx2x_get_vf_igu_cam_info(struct bnx2x *bp)
 {
@@ -1212,7 +1207,7 @@ int bnx2x_iov_init_one(struct bnx2x *bp, int int_mode_param,
 
 	err = -EIO;
 	/* verify ari is enabled */
-	if (!bnx2x_ari_enabled(bp->pdev)) {
+	if (!pci_ari_enabled(bp->pdev->bus)) {
 		BNX2X_ERR("ARI not supported (check pci bridge ARI forwarding), SRIOV can not be enabled\n");
 		return 0;
 	}

^ permalink raw reply related

* Re: [net-next,v2] ip_gre: check packet length and mtu correctly in erspan tx
From: Xin Long @ 2017-10-06 11:03 UTC (permalink / raw)
  To: William Tu; +Cc: network dev, David Laight
In-Reply-To: <1507230432-56495-1-git-send-email-u9012063@gmail.com>

On Fri, Oct 6, 2017 at 3:07 AM, William Tu <u9012063@gmail.com> wrote:
> Similarly to early patch for erspan_xmit(), the ARPHDR_ETHER device
> is the length of the whole ether packet.  So skb->len should subtract
> the dev->hard_header_len.
>
> Fixes: 1a66a836da63 ("gre: add collect_md mode to ERSPAN tunnel")
> Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
> Signed-off-by: William Tu <u9012063@gmail.com>
> Cc: Xin Long <lucien.xin@gmail.com>
> Cc: David Laight <David.Laight@aculab.com>
> ---
> v1->v2:
> use addition to avoid overflow
> fix pskb_trim size
> ---
>  net/ipv4/ip_gre.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index b279c325c7f6..fb95f68d6e53 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -579,8 +579,8 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
>         if (gre_handle_offloads(skb, false))
>                 goto err_free_rt;
>
> -       if (skb->len > dev->mtu) {
> -               pskb_trim(skb, dev->mtu);
> +       if (skb->len > dev->mtu + dev->hard_header_len) {
> +               pskb_trim(skb, dev->mtu + dev->hard_header_len);
>                 truncate = true;
>         }
>
> @@ -731,8 +731,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb,
>         if (skb_cow_head(skb, dev->needed_headroom))
>                 goto free_skb;
>
> -       if (skb->len - dev->hard_header_len > dev->mtu) {
> -               pskb_trim(skb, dev->mtu);
> +       if (skb->len > dev->mtu + dev->hard_header_len) {
> +               pskb_trim(skb, dev->mtu + dev->hard_header_len);
>                 truncate = true;
>         }
>
> --
> 2.7.4
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>

^ permalink raw reply

* Re: [PATCH net v2 3/9] net/mac89x0: Fix and modernize log messages
From: Finn Thain @ 2017-10-06 11:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20171005.210842.1348457661309929796.davem@davemloft.net>

On Thu, 5 Oct 2017, David Miller wrote:

> From: Finn Thain <fthain@telegraphics.com.au>
> Date: Thu, 5 Oct 2017 21:11:05 -0400 (EDT)
> 
> > Fix misplaced newlines in conditional log messages.
> 
> Please don't do this, the way the author formatted the strings was 
> intentional, they intended to print out:
> 
> 	NAME: cs89%c0%s rev %c found at %#8lx IRQ %d ADDR %pM
> 
> But now you are splitting it into multiple lines.

Right.

> Also, you're printing the IRQ information after register_netdev() which 
> is bad.  As soon as register_netdev() is called, the driver's
> ->open() routine can be invoked, and during which time some
> log messages could be emitted during that operation.
> 
> And that would cut the probe messages up.
> 

Yes and no. The thing is, "IRQ %d" isn't really a "probe message" and 
doesn't need to be logged at all: the IRQ is entirely fixed. Actually the 
same is true for the macmace driver. There is value in this information 
but it can be found in /proc/interrupts so I'd happily drop the "IRQ %d" 
portion from these log messages.

> I know how you got to this state, you saw a reference to dev->name 
> before it had a real value.  You just removed the "eth%d" string 
> entirely.  And since you removed the dev->name reference, you had no 
> reason to move log messages after register_netdev() at all.
> 

Not quite. I used the "MAC %pM, IRQ %d" style for consistency with other 
NIC drivers. Though consistency in itself may be insufficient 
justification. More importantly, I wanted the MAC address logged together 
with the actual interface name. That's how I arrived at this code.

> Anyways, you can also see the intention of the author here becuase they 
> have _explicit_ leading newlines in the error path messages that come 
> after the inital probe printk.
> 

Of course. I do understand the existing code. And the code actually 
reflects the intentions of the author of the ISA driver. Having the IRQ 
logged could be really valuable to the typical ISA card user but this 
platform is not ISA.

> The real way to fix the early dev->name reference is to replace it with 
> a dev_info() call and have it use the struct device name rather than the 
> netdev device one.
> 

This driver only runs on machines with one expansion slot (called a "Comm 
Slot"). So I figured that either pr_info() or printk(KERN_INFO ...) would 
do just fine here (always has done). I did consider dev_info() but I don't 
see the benefit. I'm probably missing something, so would you elaborate 
please?

BTW I've also used pr_info() elsewhere in this series in platform drivers. 
It's not yet clear to me whether the mac89x0 driver should ultimately bind 
to a platform device or a nubus device: comm slot cards are a bit of 
each.

> Again, I think you really shouldn't be making these small weird changes 
> to these old drivers.
> 

These are weird changes befitting a weird platform.

I understand your reluctance to touch legacy drivers, but my intention is 
not change for the sake of change. There is bitrot here. Sometimes that's 
due to the rest of the kernel having changed and sometimes it's due to 
actual damage of the kind you seem to fear. I'm trying to address both.

-- 

^ permalink raw reply

* Re: [net-next V4 PATCH 2/5] bpf: XDP_REDIRECT enable use of cpumap
From: Jesper Dangaard Brouer @ 2017-10-06 11:17 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Alexei Starovoitov, Andy Gospodarek, brouer
In-Reply-To: <59D60505.2040004@iogearbox.net>


On Thu, 05 Oct 2017 12:10:13 +0200 Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/04/2017 02:03 PM, Jesper Dangaard Brouer wrote:
> > This patch connects cpumap to the xdp_do_redirect_map infrastructure.
> >
> > Still no SKB allocation are done yet.  The XDP frames are transferred
> > to the other CPU, but they are simply refcnt decremented on the remote
> > CPU.  This served as a good benchmark for measuring the overhead of
> > remote refcnt decrement.  If driver page recycle cache is not
> > efficient then this, exposes a bottleneck in the page allocator.
> >
> > A shout-out to MST's ptr_ring, which is the secret behind is being so
> > efficient to transfer memory pointers between CPUs, without constantly
> > bouncing cache-lines between CPUs.
> >
> > V3: Handle !CONFIG_BPF_SYSCALL pointed out by kbuild test robot.
> >
> > V4: Make Generic-XDP aware of cpumap type, but don't allow redirect yet,
> >   as implementation require a separate upstream discussion.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>  
> [...]
> > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> > index ae8e29352261..4926a9971f90 100644
> > --- a/kernel/bpf/cpumap.c
> > +++ b/kernel/bpf/cpumap.c
> > @@ -493,7 +493,8 @@ static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_pkt *xdp_pkt)
> >   	return 0;
> >   }
> >
> > -int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp)
> > +int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
> > +		    struct net_device *dev_rx)
> >   {
> >   	struct xdp_pkt *xdp_pkt;
> >   	int headroom;
> > @@ -505,7 +506,7 @@ int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp)
> >   	xdp_pkt = xdp->data_hard_start;
> >   	xdp_pkt->data = xdp->data;
> >   	xdp_pkt->len  = xdp->data_end - xdp->data;
> > -	xdp_pkt->headroom = headroom;
> > +	xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);  
> 
> (Just a note, bit confusing that first two patches add and extend
>   this, and only in the third you add the xdp->data_meta handling,
>   makes it harder to review at least.)

Sorry.  This is a left-overs from rebasing and measuring the cost of
transferring only the pointer to the page, and remote put_page().
And your xdp->data_meta, happen basically while my patches was in-flight.

I'll move this one-line back to patch 2, to spreading over too many
patches.

> [...]
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 9b6e7e84aafd..dbf2ae071108 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2521,10 +2521,36 @@ static int __bpf_tx_xdp(struct net_device *dev,
> >   	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdp);
> >   	if (err)
> >   		return err;
> > -	if (map)
> > +	dev->netdev_ops->ndo_xdp_flush(dev);
> > +	return 0;
> > +}
> > +
> > +static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
> > +			    struct bpf_map *map,
> > +			    struct xdp_buff *xdp,
> > +			    u32 index)
> > +{
> > +	int err;
> > +
> > +	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
> > +		struct net_device *dev = fwd;
> > +
> > +		if (!dev->netdev_ops->ndo_xdp_xmit)
> > +			return -EOPNOTSUPP;
> > +
> > +		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdp);
> > +		if (err)
> > +			return err;
> >   		__dev_map_insert_ctx(map, index);
> > -	else
> > -		dev->netdev_ops->ndo_xdp_flush(dev);
> > +
> > +	} else if (map->map_type == BPF_MAP_TYPE_CPUMAP) {
> > +		struct bpf_cpu_map_entry *rcpu = fwd;
> > +
> > +		err = cpu_map_enqueue(rcpu, xdp, dev_rx);
> > +		if (err)
> > +			return err;
> > +		__cpu_map_insert_ctx(map, index);
> > +	}
> >   	return 0;
> >   }
> >
> > @@ -2534,11 +2560,33 @@ void xdp_do_flush_map(void)
> >   	struct bpf_map *map = ri->map_to_flush;
> >
> >   	ri->map_to_flush = NULL;
> > -	if (map)
> > -		__dev_map_flush(map);
> > +	if (map) {
> > +		switch (map->map_type) {
> > +		case BPF_MAP_TYPE_DEVMAP:
> > +			__dev_map_flush(map);
> > +			break;
> > +		case BPF_MAP_TYPE_CPUMAP:
> > +			__cpu_map_flush(map);
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +	}
> >   }
> >   EXPORT_SYMBOL_GPL(xdp_do_flush_map);
> >
> > +static void *__xdp_map_lookup_elem(struct bpf_map *map, u32 index)
> > +{
> > +	switch (map->map_type) {
> > +	case BPF_MAP_TYPE_DEVMAP:
> > +		return __dev_map_lookup_elem(map, index);
> > +	case BPF_MAP_TYPE_CPUMAP:
> > +		return __cpu_map_lookup_elem(map, index);
> > +	default:
> > +		return NULL;
> > +	}  
> 
> Should we just have a callback and instead of the above use
> map->ptr_lookup_elem() (or however we name it) ... lot of it
> is pretty much the same logic as with devmap.

We could extend struct bpf_map *map with such a callback, I was just
afraid that this would be too invasive.

Performance wise, I don't thinks will hurt too much.
http://www.cipht.net/2017/10/03/are-jump-tables-always-fastest.html


> > +}
> > +
> >   static inline bool xdp_map_invalid(const struct bpf_prog *xdp_prog,
> >   				   unsigned long aux)
> >   {
> > @@ -2551,8 +2599,8 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
> >   	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> >   	unsigned long map_owner = ri->map_owner;
> >   	struct bpf_map *map = ri->map;
> > -	struct net_device *fwd = NULL;
> >   	u32 index = ri->ifindex;
> > +	void *fwd = NULL;
> >   	int err;
> >
> >   	ri->ifindex = 0;  
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH 2/3 v2] net: phy: DP83822 initial driver submission
From: Dan Murphy @ 2017-10-06 12:00 UTC (permalink / raw)
  To: Andrew Lunn, Woojung.Huh; +Cc: f.fainelli, netdev, afd
In-Reply-To: <20171004235307.GD16612@lunn.ch>

Andrew

On 10/04/2017 06:53 PM, Andrew Lunn wrote:
> On Wed, Oct 04, 2017 at 10:44:36PM +0000, Woojung.Huh@microchip.com wrote:
>>> +static int dp83822_suspend(struct phy_device *phydev)
>>> +{
>>> +	int value;
>>> +
>>> +	mutex_lock(&phydev->lock);
>>> +	value = phy_read_mmd(phydev, DP83822_DEVADDR,
>>> MII_DP83822_WOL_CFG);
>>> +	mutex_unlock(&phydev->lock);
> 
>> Would we need mutex to access phy_read_mmd()?
>> phy_read_mmd() has mdio_lock for indirect access.
> 
> Hi Woojung
> 
> The mdio lock is not sufficient. It protects against two mdio
> accesses. But here we need to protect against two phy operations.
> There is a danger something else tries to access the phy during
> suspend.
> 
>>> +	if (!(value & DP83822_WOL_EN))
>>> +		genphy_suspend(phydev);
> 
> Releasing the lock before calling genphy_suspend() is not so nice.
> Maybe add a version which assumes the lock has already been taken?
> 

The marvell driver does not take a lock and calls genphy_suspend/resume
so I am wondering if this driver needs to take a lock.

The at803x needs to take the lock because it does not call into the genphy
functions.

I will submit a new version with the lock removed.

Dan

>       Andrew
> 


-- 
------------------
Dan Murphy

^ permalink raw reply

* Re: [net-next V4 PATCH 2/5] bpf: XDP_REDIRECT enable use of cpumap
From: Jesper Dangaard Brouer @ 2017-10-06 12:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Alexei Starovoitov, Andy Gospodarek, brouer
In-Reply-To: <20171006131748.75185f65@redhat.com>

On Fri, 6 Oct 2017 13:17:48 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> > > -int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp)
> > > +int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
> > > +		    struct net_device *dev_rx)
> > >   {
> > >   	struct xdp_pkt *xdp_pkt;
> > >   	int headroom;
> > > @@ -505,7 +506,7 @@ int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp)
> > >   	xdp_pkt = xdp->data_hard_start;
> > >   	xdp_pkt->data = xdp->data;
> > >   	xdp_pkt->len  = xdp->data_end - xdp->data;
> > > -	xdp_pkt->headroom = headroom;
> > > +	xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);    
> > 
> > (Just a note, bit confusing that first two patches add and extend
> >   this, and only in the third you add the xdp->data_meta handling,
> >   makes it harder to review at least.)  
> 
> Sorry.  This is a left-overs from rebasing and measuring the cost of
> transferring only the pointer to the page, and remote put_page().
> And your xdp->data_meta, happen basically while my patches was in-flight.
> 
> I'll move this one-line back to patch 2, to spreading over too many
> patches.

I instead choose to move the creation of cpu_map_enqueue() into this
patch, but in a more simple version stating explicit that this is only
seen as a void pointer enqueue.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] ipv6: gso: fix payload length when gso_size is zero
From: Alexey Kodanev @ 2017-10-06 12:16 UTC (permalink / raw)
  To: Duyck, Alexander H, netdev@vger.kernel.org
  Cc: davem@davemloft.net, steffen.klassert@secunet.com
In-Reply-To: <1507229878.2098.56.camel@intel.com>

On 10/05/2017 09:58 PM, Duyck, Alexander H wrote:
> On Thu, 2017-10-05 at 20:06 +0300, Alexey Kodanev wrote:
>> When gso_size reset to zero for the tail segment in skb_segment(), later
>> in ipv6_gso_segment(), we will get incorrect payload_len for that segment.
>> inet_gso_segment() already has a check for gso_size before calculating
>> payload so fixing only IPv6 part.
>>
>> The issue was found with LTP vxlan & gre tests over ixgbe NIC.
>>
>> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>> ---
>>  net/ipv6/ip6_offload.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
>> index cdb3728..4a87f94 100644
>> --- a/net/ipv6/ip6_offload.c
>> +++ b/net/ipv6/ip6_offload.c
>> @@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
>>  
>>  	for (skb = segs; skb; skb = skb->next) {
>>  		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
>> -		if (gso_partial)
>> +		if (gso_partial && skb_is_gso(skb))
>>  			payload_len = skb_shinfo(skb)->gso_size +
>>  				      SKB_GSO_CB(skb)->data_offset +
>>  				      skb->head - (unsigned char *)(ipv6h + 1);
> So looking over this change it looks good to me. I'm just wondering if
> you have looked at the code in __skb_udp_tunnel_segment or
> gre_gso_segment? It seems like if you needed this change here you
> should need to make similar changes to those functions as well. I'm
> wondering if we just aren't seeing issues due to the segments already
> being MSS sized before being handed to us for segmentation.

Right, it can happen in __skb_udp_tunnel_segment as well. I wasn't able
to reproduce with gre, it looks like it doesn't go to that part of code,
skipping it, possibly, on gso_type & SKB_GSO_GRE_CSUM check, though the
NIC has tx-gre-csum-segmentation enabled...

I'll send new version after testing completed.

Thanks,
Alexey

^ permalink raw reply

* Re: [net-next V4 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation
From: Jesper Dangaard Brouer @ 2017-10-06 12:11 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Alexei Starovoitov, Andy Gospodarek, brouer
In-Reply-To: <59D607F3.6090306@iogearbox.net>

On Thu, 05 Oct 2017 12:22:43 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/04/2017 02:03 PM, Jesper Dangaard Brouer wrote:
> [...]
> >   static int cpu_map_kthread_run(void *data)
> >   {
> >   	struct bpf_cpu_map_entry *rcpu = data;
> >
> >   	set_current_state(TASK_INTERRUPTIBLE);
> >   	while (!kthread_should_stop()) {
> > +		unsigned int processed = 0, drops = 0;
> >   		struct xdp_pkt *xdp_pkt;
> >
> > -		schedule();
> > -		/* Do work */
> > -		while ((xdp_pkt = ptr_ring_consume(rcpu->queue))) {
> > -			/* For now just "refcnt-free" */
> > -			page_frag_free(xdp_pkt);
> > +		/* Release CPU reschedule checks */
> > +		if (__ptr_ring_empty(rcpu->queue)) {
> > +			schedule();
> > +		} else {
> > +			cond_resched();
> > +		}
> > +
> > +		/* Process packets in rcpu->queue */
> > +		local_bh_disable();
> > +		/*
> > +		 * The bpf_cpu_map_entry is single consumer, with this
> > +		 * kthread CPU pinned. Lockless access to ptr_ring
> > +		 * consume side valid as no-resize allowed of queue.
> > +		 */
> > +		while ((xdp_pkt = __ptr_ring_consume(rcpu->queue))) {
> > +			struct sk_buff *skb;
> > +			int ret;
> > +
> > +			skb = cpu_map_build_skb(rcpu, xdp_pkt);
> > +			if (!skb) {
> > +				page_frag_free(xdp_pkt);
> > +				continue;
> > +			}
> > +
> > +			/* Inject into network stack */
> > +			ret = netif_receive_skb_core(skb);  
> 
> Don't we need to hold RCU read lock for above netif_receive_skb_core()?

Yes, I guess we do, but I'll place it in netif_receive_skb_core() before
invoking __netif_receive_skb_core(), like netif_receive_skb() does
around __netif_receive_skb().

It looks like the RCU section protects:
 	rx_handler = rcu_dereference(skb->dev->rx_handler);

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH 3/3] ARM: dts: gr-peach: Add ETHER pin group
From: jacopo mondi @ 2017-10-06 12:24 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Geert Uytterhoeven, Chris Brandt, f.fainelli, netdev
In-Reply-To: <20171005164826.GL13247@lunn.ch>

Hi Andrew,
   thanks for the suggestion

On Thu, Oct 05, 2017 at 06:48:26PM +0200, Andrew Lunn wrote:
> On Thu, Oct 05, 2017 at 05:42:39PM +0200, jacopo mondi wrote:
> > Hi Andrew,

[snip]

> > > Hi Jocopo
> > >
> > > So what is this reset resetting?
> > >
> > > The MAC?
> > > The PHY?
> >
> > The reset line goes from our SoC to LAN8710A PHY chip external reset pin.
>
> So yes, this is a PHY property, and should be in the PHY node.
>
> Documentation/devicetree/bindings/net/mdio.txt does not apply here
> anyway. That is for an MDIO binding. This node is an ethernet MAC.
>
> So your binding whats to look something like
>
>         ether: ethernet@e8203000 {
>                 compatible = "renesas,ether-r7s72100";
>                 reg = <0xe8203000 0x800>,
>                       <0xe8204800 0x200>;
>                 interrupts = <GIC_SPI 327 IRQ_TYPE_LEVEL_HIGH>;
>                 clocks = <&mstp7_clks R7S72100_CLK_ETHER>;
>                 power-domains = <&cpg_clocks>;
>                 phy-mode = "mii";
> 		          phy-handle = <&phy0>;
>                 #address-cells = <1>;
>                 #size-cells = <0>;
>
>                 mdio: bus-bus {
>                       #address-cells = <1>;
>                       #size-cells = <0>;
>
>                       phy0: ethernet-phy@1 {
>                             reg = <1>;

Why reg = <1> ?
Shouldn't this be 0, or even better with no reg property at all?

        mdio: bus-bus {
              phy-0 {
                  reset-gpios = <&port4 2 GPIO_ACTIVE_LOW>;
                  reset-delay-us = <5>;
              };
        };

Thanks
   j

>                             reset-gpios = <&port4 2 GPIO_ACTIVE_LOW>;
>                             reset-delay-us = <5>;
>                       };
>                 };
>         };
>
> 	Andrew

^ permalink raw reply

* [PATCH 0/4] RCU: introduce noref debug
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev

The networking subsystem is currently using some kind of long-lived
RCU-protected, references to avoid the overhead of full book-keeping.

Such references - skb_dst() noref - are stored inside the skbs and can be
moved across relevant slices of the network stack, with the users
being in charge of properly clearing the relevant skb - or properly refcount
the related dst references - before the skb escapes the RCU section.

We currently don't have any deterministic debug infrastructure to check
the dst noref usages - and the introduction of others noref artifact is
currently under discussion.

This series tries to tackle the above introducing an RCU debug infrastructure
aimed at spotting incorrect noref pointer usage, in patch one. The
infrastructure is small and must be explicitly enabled via a newly introduced
build option.

Patch two uses such infrastructure to track dst noref usage in the networking
stack.

Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
on basic scenarios.

Paolo Abeni (4):
  rcu: introduce noref debug
  net: use RCU noref infrastructure to track dst noref
  ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref()
  tcp: avoid noref dst leak on input path

 include/linux/rcupdate.h | 11 ++++++
 include/linux/skbuff.h   |  1 +
 include/net/dst.h        |  5 +++
 kernel/rcu/Kconfig.debug | 15 ++++++++
 kernel/rcu/Makefile      |  1 +
 kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/route.c         |  7 +---
 net/ipv4/tcp_input.c     |  5 ++-
 8 files changed, 127 insertions(+), 7 deletions(-)
 create mode 100644 kernel/rcu/noref_debug.c

-- 
2.13.6

^ permalink raw reply

* [PATCH 1/4] rcu: introduce noref debug
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <cover.1507294365.git.pabeni@redhat.com>

We currently lack a debugging infrastructure to ensure that
long-lived noref dst are properly handled - namely dropped
or converted to a refcounted version before escaping the current
RCU section.

This changeset implements such infra, tracking the noref pointer
on a per CPU store and checking that all noref tracked at any
given RCU nesting level are cleared before leaving such RCU
section.

Each noref entity is identified using the entity pointer and an
additional, optional, key/opaque pointer. This is needed to cope
with usage case scenario where the same noref entity is stored
in different places (e.g. noref dst on skb_clone()).

To keep the patch/implementation simple RCU_NOREF_DEBUG depends
on PREEMPT_RCU=n in Kconfig.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/rcupdate.h | 11 ++++++
 kernel/rcu/Kconfig.debug | 15 ++++++++
 kernel/rcu/Makefile      |  1 +
 kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+)
 create mode 100644 kernel/rcu/noref_debug.c

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de50d8a4cf41..20c1ce08e3eb 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -60,6 +60,15 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
 void rcu_barrier_tasks(void);
 
+#ifdef CONFIG_RCU_NOREF_DEBUG
+void rcu_track_noref(void *key, void *noref, bool track);
+void __rcu_check_noref(void);
+
+#else
+static inline void rcu_track_noref(void *key, void *noref, bool add) { }
+static inline void __rcu_check_noref(void) { }
+#endif
+
 #ifdef CONFIG_PREEMPT_RCU
 
 void __rcu_read_lock(void);
@@ -85,6 +94,7 @@ static inline void __rcu_read_lock(void)
 
 static inline void __rcu_read_unlock(void)
 {
+	__rcu_check_noref();
 	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
 		preempt_enable();
 }
@@ -723,6 +733,7 @@ static inline void rcu_read_unlock_bh(void)
 			 "rcu_read_unlock_bh() used illegally while idle");
 	rcu_lock_release(&rcu_bh_lock_map);
 	__release(RCU_BH);
+	__rcu_check_noref();
 	local_bh_enable();
 }
 
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..6c7f52a3e809 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -68,6 +68,21 @@ config RCU_TRACE
 	  Say Y here if you want to enable RCU tracing
 	  Say N if you are unsure.
 
+config RCU_NOREF_DEBUG
+	bool "Debugging for RCU-protected noref entries"
+	depends on PREEMPT_RCU=n
+	select PREEMPT_COUNT
+	default n
+	help
+	  In case of long lasting rcu_read_lock sections this debug
+	  feature enables one to ensure that no rcu managed dereferenced
+	  data leaves the locked section. One use case is the tracking
+	  of dst_entries in struct sk_buff ->dst, which is used to pass
+	  the dst_entry around in the whole stack while under RCU lock.
+
+	  Say Y here if you want to enable noref RCU debugging
+	  Say N if you are unsure.
+
 config RCU_EQS_DEBUG
 	bool "Provide debugging asserts for adding NO_HZ support to an arch"
 	depends on DEBUG_KERNEL
diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
index 13c0fc852767..c67d7c65c582 100644
--- a/kernel/rcu/Makefile
+++ b/kernel/rcu/Makefile
@@ -11,3 +11,4 @@ obj-$(CONFIG_TREE_RCU) += tree.o
 obj-$(CONFIG_PREEMPT_RCU) += tree.o
 obj-$(CONFIG_TINY_RCU) += tiny.o
 obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
+obj-$(CONFIG_RCU_NOREF_DEBUG) += noref_debug.o
\ No newline at end of file
diff --git a/kernel/rcu/noref_debug.c b/kernel/rcu/noref_debug.c
new file mode 100644
index 000000000000..ae2e104b11d6
--- /dev/null
+++ b/kernel/rcu/noref_debug.c
@@ -0,0 +1,89 @@
+#include <linux/bug.h>
+#include <linux/percpu.h>
+#include <linux/skbuff.h>
+#include <linux/bitops.h>
+
+#define NOREF_MAX 7
+struct noref_entry {
+	void *noref;
+	void *key;
+	int nesting;
+};
+
+struct noref_cache {
+	struct noref_entry store[NOREF_MAX];
+};
+
+static DEFINE_PER_CPU(struct noref_cache, per_cpu_noref_cache);
+
+static int noref_cache_lookup(struct noref_cache *cache, void *noref, void *key)
+{
+	int i;
+
+	for (i = 0; i < NOREF_MAX; ++i)
+		if (cache->store[i].noref == noref &&
+		    cache->store[i].key == key)
+			return i;
+
+	return -1;
+}
+
+void rcu_track_noref(void *key, void *noref, bool track)
+{
+	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
+	int i;
+
+	if (track) {
+		/* find the first empty slot */
+		i = noref_cache_lookup(cache, NULL, 0);
+		if (i < 0) {
+			WARN_ONCE(1, "can't find empty an slot to track a noref"
+				  " noref tracking will be inaccurate");
+			return;
+		}
+
+		cache->store[i].noref = noref;
+		cache->store[i].key = key;
+		cache->store[i].nesting = preempt_count();
+		return;
+	}
+
+	i = noref_cache_lookup(cache, noref, key);
+	if (i == -1) {
+		WARN_ONCE(1, "to-be-untracked noref entity %p not found in "
+			  "cache\n", noref);
+		return;
+	}
+
+	cache->store[i].noref = NULL;
+	cache->store[i].key = NULL;
+	cache->store[i].nesting = 0;
+}
+EXPORT_SYMBOL_GPL(rcu_track_noref);
+
+void __rcu_check_noref(void)
+{
+	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
+	char *cur, buf[strlen("0xffffffffffffffff") * NOREF_MAX + 1];
+	int nesting = preempt_count();
+	int i, ret, cnt = 0;
+
+	cur = buf;
+	for (i = 0; i < NOREF_MAX; ++i) {
+		if (!cache->store[i].noref ||
+		    cache->store[i].nesting != nesting)
+			continue;
+
+		cnt++;
+		ret = sprintf(cur, " %p", cache->store[i].noref);
+		if (ret >= 0)
+			cur += ret;
+		cache->store[i].noref = NULL;
+		cache->store[i].key = NULL;
+		cache->store[i].nesting = 0;
+	}
+
+	WARN_ONCE(cnt, "%d noref entities escaped an RCU section, "
+		  "nesting %d, leaked noref list %s", cnt, nesting, buf);
+}
+EXPORT_SYMBOL_GPL(__rcu_check_noref);
-- 
2.13.6

^ permalink raw reply related

* [PATCH 2/4] net: use RCU noref infrastructure to track dst noref
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <cover.1507294365.git.pabeni@redhat.com>

We can now ensure noref dsts do no escape the relevant RCU
section. This does not introduce any functional changes, it adds
the relevant debug code enabled by CONFIG_RCU_NOREF_DEBUG.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/skbuff.h | 1 +
 include/net/dst.h      | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72299ef00061..5df77c8a0319 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -905,6 +905,7 @@ static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
 static inline void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst)
 {
 	WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+	rcu_track_noref(skb, dst, true);
 	skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF;
 }
 
diff --git a/include/net/dst.h b/include/net/dst.h
index 06a6765da074..6bc6fefe178e 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -294,6 +294,8 @@ static inline void refdst_drop(unsigned long refdst)
 static inline void skb_dst_drop(struct sk_buff *skb)
 {
 	if (skb->_skb_refdst) {
+		if (skb->_skb_refdst & SKB_DST_NOREF)
+			rcu_track_noref(skb, skb_dst(skb), false);
 		refdst_drop(skb->_skb_refdst);
 		skb->_skb_refdst = 0UL;
 	}
@@ -304,6 +306,8 @@ static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst)
 	nskb->_skb_refdst = refdst;
 	if (!(nskb->_skb_refdst & SKB_DST_NOREF))
 		dst_clone(skb_dst(nskb));
+	else
+		rcu_track_noref(nskb, skb_dst(nskb), true);
 }
 
 static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
@@ -335,6 +339,7 @@ static inline void skb_dst_force(struct sk_buff *skb)
 		struct dst_entry *dst = skb_dst(skb);
 
 		WARN_ON(!rcu_read_lock_held());
+		rcu_track_noref(skb, dst, false);
 		if (!dst_hold_safe(dst))
 			dst = NULL;
 
-- 
2.13.6

^ permalink raw reply related

* [PATCH 3/4] ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref()
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <cover.1507294365.git.pabeni@redhat.com>

Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat on the
first ingress IPv4 packet:

       1 noref entities escaped an RCU section, nesting 259, leaked noref list  ffff8edcefb1dc00
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 0 at kernel/rcu/noref_debug.c:87 __rcu_check_noref+0xf8/0x100
       Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd mei_me ipmi_ssif sg mei mxm_wmi iTCO_wdt iTCO_vendor_support dcdbas lpc_ich ipmi_si pcspkr ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio ahci ptp crc32c_intel i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1.noref_3+ #1609
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
       task: ffffffffae019500 task.stack: ffffffffae000000
       RIP: 0010:__rcu_check_noref+0x7b/0xd0
       RSP: 0018:ffff900afbe03b30 EFLAGS: 00010246
       RAX: 0000000000000034 RBX: ffff900afbfd2500 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
       RBP: ffff900afbe03b48 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 00000000c388883e R12: 0000000000000103
       R13: 000000001d24100a R14: ffff900af13e0000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff900afbe00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00005589ce876398 CR3: 0000001ff52f9005 CR4: 00000000003606f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <IRQ>
        ip_route_input_noref+0xa6/0x150
        ? ip_route_input_noref+0x5/0x150
        ip_rcv_finish+0x78/0x5e0
        ip_rcv+0x2a7/0x540
        ? packet_rcv+0x52/0x450
        __netif_receive_skb_core+0x3b9/0xe10
        ? netif_receive_skb_internal+0x40/0x390
        __netif_receive_skb+0x18/0x60
        netif_receive_skb_internal+0x8d/0x390
        ? netif_receive_skb_internal+0x40/0x390
        napi_gro_receive+0x15c/0x1f0
        igb_clean_rx_irq+0x36d/0x7f0 [igb]
        igb_poll+0x303/0x780 [igb]
        ? save_stack_trace+0x1b/0x20
        ? __lock_acquire+0xcf2/0x11c0
        ? net_rx_action+0xb4/0x520
        net_rx_action+0x27d/0x520
        __do_softirq+0xd1/0x4f5
        irq_exit+0xfb/0x110
        do_IRQ+0x67/0x120
        common_interrupt+0xa7/0xa7
        </IRQ>
       RIP: 0010:cpuidle_enter_state+0xd0/0x360
       RSP: 0018:ffffffffae003df8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff79
       RAX: ffffffffae019500 RBX: ffffd7d67c604400 RCX: 0000000000000000
       RDX: ffffffffae019500 RSI: 0000000000000001 RDI: ffffffffae019500
       RBP: ffffffffae003e30 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000004
       R13: 0000000000000000 R14: ffffd7d67c604400 R15: 00000009c8eafd50
        ? cpuidle_enter_state+0xc9/0x360
        cpuidle_enter+0x17/0x20
        call_cpuidle+0x23/0x40
        do_idle+0x183/0x200
        cpu_startup_entry+0x73/0x80
        rest_init+0xc3/0xd0
        start_kernel+0x4f7/0x518
        ? set_init_arg+0x5a/0x5a
        x86_64_start_reservations+0x24/0x26
        x86_64_start_kernel+0x6f/0x72
        secondary_startup_64+0xa5/0xa5
       Code: f6 75 07 5b 41 5c 41 5d 5d c3 80 3d eb e4 ff 00 00 75 1a 44 89 e2 48 c7 c7 88 54 e7 ad 31 c0 c6 05 d6 e4 ff 00 01 e8 28 af fe ff <0f> ff 41 bd 07 00 00 00 48 8b 33 48 85 f6 74 06 44 39 63 10 74

The rcu protection in ip_route_input_noref() is unneeded and
misleading: the caller still needs to acquire and retain the rcu
lock until the skb - carrying a noref dst on successful return -
is either dropped or the relevant dst is forced to a ref-counted
version.

This change just drops the unneeded lock.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/route.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..5a6ca1f16d3f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2069,14 +2069,9 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			 u8 tos, struct net_device *dev)
 {
 	struct fib_result res;
-	int err;
 
 	tos &= IPTOS_RT_MASK;
-	rcu_read_lock();
-	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
-	rcu_read_unlock();
-
-	return err;
+	return ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
 }
 EXPORT_SYMBOL(ip_route_input_noref);
 
-- 
2.13.6

^ permalink raw reply related

* [PATCH 4/4] tcp: avoid noref dst leak on input path
From: Paolo Abeni @ 2017-10-06 12:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <cover.1507294365.git.pabeni@redhat.com>

Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
processing tcp packets:

           to-be-untracked noref entity ffff942cb71ea300 not found in cache
           ------------[ cut here ]------------
           WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 rcu_track_noref+0xa4/0xf0
           Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
           CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 4.14.0-rc1.noref_route+ #1610
           Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
           task: ffff940e48300000 task.stack: ffffaec406a20000
           RIP: 0010:rcu_track_noref+0xa4/0xf0
           RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246
           RAX: 0000000000000040 RBX: 0000000000000000 RCX: 0000000000000002
           RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292
           RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 0000000000000000
           R10: 0000000000000001 R11: 0000000000000003 R12: 0000000000000000
           R13: ffff942cb5110000 R14: 000000000000fe88 R15: ffff942cb1a20200
           FS:  0000000000000000(0000) GS:ffff942cbee00000(0000) knlGS:0000000000000000
           CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
           CR2: 00007febc072d140 CR3: 0000001feebd6002 CR4: 00000000003606e0
           DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
           DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
           Call Trace:
            tcp_data_queue+0x82a/0xce0
            tcp_rcv_established+0x283/0x570
            tcp_v4_do_rcv+0x102/0x1e0
            tcp_v4_rcv+0xa9e/0xc10
            ip_local_deliver_finish+0x128/0x380
            ? ip_local_deliver_finish+0x41/0x380
            ip_local_deliver+0x74/0x230
            ip_rcv_finish+0x105/0x5e0
            ip_rcv+0x2a7/0x540
            __netif_receive_skb_core+0x3b9/0xe10
            ? netif_receive_skb_internal+0x40/0x390
            __netif_receive_skb+0x18/0x60
            netif_receive_skb_internal+0x8d/0x390
            ? netif_receive_skb_internal+0x40/0x390
            napi_gro_complete+0x127/0x1d0
            ? napi_gro_complete+0x2a/0x1d0
            napi_gro_flush+0x5f/0x80
            napi_complete_done+0x96/0x100
            ixgbe_poll+0x5f8/0x7c0 [ixgbe]
            net_rx_action+0x27d/0x520
            __do_softirq+0xd1/0x4f5
            run_ksoftirqd+0x25/0x70
            smpboot_thread_fn+0x11a/0x1f0
            kthread+0x155/0x190
            ? sort_range+0x30/0x30
            ? kthread_create_on_node+0x70/0x70
            ret_from_fork+0x2a/0x40
           Code: e9 83 c2 01 48 83 c0 18 83 fa 07 75 ef 80 3d b0 e5 ff 00 00 75 d2 48 c7 c7 50 54 07 9d 31 c0 c6 05 9e e5 ff 00 01 e8 ef af fe ff <0f> ff 5d c3 80 3d 8f e5 ff 00 00 75 b0 48 c7 c7 00 54 07 9d 31

we must clear the skb dst before enqueuing the skb somewhere,
but currently the dst entry for packets delivered
via tcp_rcv_established() -> tcp_queue_rcv() is not cleared.

Fix it by adding the explicit drop in tcp_queue_rcv() and moving
the current skb_dst_drop() just before the other enqueuing
operation, do avoid unneeded double skb_dst_drop() for some
path.

The leak itself is not harmful, because the tcp recvmsg() code
should not access such info.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/tcp_input.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656beeee..bf4e17edfe7a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4422,6 +4422,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		return;
 	}
 
+	/* drop the -possibly noref - dst before delivery the skb to ofo tree */
+	skb_dst_drop(skb);
+
 	/* Stash tstamp to avoid being stomped on by rbnode */
 	if (TCP_SKB_CB(skb)->has_rxtstamp)
 		TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
@@ -4560,6 +4563,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
 				  skb, fragstolen)) ? 1 : 0;
 	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
+		skb_dst_drop(skb);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		skb_set_owner_r(skb, sk);
 	}
@@ -4626,7 +4630,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		__kfree_skb(skb);
 		return;
 	}
-	skb_dst_drop(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
 
 	tcp_ecn_accept_cwr(tp, skb);
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH net] netfilter: x_tables: avoid stack-out-of-bounds read in xt_copy_counters_from_user
From: Pablo Neira Ayuso @ 2017-10-06 13:05 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eric Dumazet, Jozsef Kadlecsik, netfilter-devel, netdev,
	Willem de Bruijn
In-Reply-To: <20171005095644.GB27522@breakpoint.cc>

On Thu, Oct 05, 2017 at 11:56:44AM +0200, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > syzkaller reports an out of bound read in strlcpy(), triggered
> > by xt_copy_counters_from_user()
> > 
> > Fix this by using memcpy(), then forcing a zero byte at the last position
> > of the destination, as Florian did for the non COMPAT code.
> > 
> > Fixes: d7591f0c41ce ("netfilter: x_tables: introduce and use xt_copy_counters_from_user")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Willem de Bruijn <willemb@google.com>
> > ---
> >  net/netfilter/x_tables.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index c83a3b5e1c6c2a91b713b6681a794bd79ab3fa08..d8571f4142080a3c121fc90f0b52d81ee9df6712 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -892,7 +892,7 @@ void *xt_copy_counters_from_user(const void __user *user, unsigned int len,
> >  		if (copy_from_user(&compat_tmp, user, sizeof(compat_tmp)) != 0)
> >  			return ERR_PTR(-EFAULT);
> >  
> > -		strlcpy(info->name, compat_tmp.name, sizeof(info->name));
> > +		memcpy(info->name, compat_tmp.name, sizeof(info->name) - 1);
> 
> Argh, right, compat_tmp.name might not be 0 terminated :-/
> 
> Acked-by: Florian Westphal <fw@strlen.de>

Applied to nf, thanks.

^ permalink raw reply

* Re: [PATCH] mwifiex: Use put_unaligned_le32
From: Kalle Valo @ 2017-10-06 13:31 UTC (permalink / raw)
  To: Himanshu Jha
  Cc: Brian Norris, amitkarwar, nishants, gbhat, huxm, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20171005190730.GA8043@himanshu-Vostro-3559>

Himanshu Jha <himanshujha199640@gmail.com> writes:

> On Thu, Oct 05, 2017 at 11:02:50AM -0700, Brian Norris wrote:
>> On Thu, Oct 05, 2017 at 08:52:33PM +0530, Himanshu Jha wrote:
>> > There are various instances where a function used in file say for eg
>> > int func_align (void* a)
>> > is used and it is defined in align.h
>> > But many files don't *directly* include align.h and rather include
>> > any other header which includes align.h
>> 
>> I believe the general rule is that you should included headers for all
>> symbols you use, and not rely on implicit includes.
>> 
>> The modification to the general rule is that not all headers are
>> intended to be included directly, and in such cases there's likely a
>> parent header that is the more appropriate target.
>> 
>> In this case, the key is CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. It
>> seems that asm-generic/unaligned.h is set up to include different
>> headers, based on the expected architecture behavior.
>>
> Yes, asm-generic/unaligned.h looks more appopriate and is most generic
> implementation of unaligned accesses and  arc specific.
>
> Let's see what Kalle Valo recommends! And then I will send v2 of the
> patch.

Not sure what you are asking from me. But if you are asking should it
be:

#include <asm/unaligned.h>

or:

#include <asm-generic/unaligned.h>

I think it should be the former.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH 0/4] RCU: introduce noref debug
From: Paul E. McKenney @ 2017-10-06 13:34 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <cover.1507294365.git.pabeni@redhat.com>

On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> The networking subsystem is currently using some kind of long-lived
> RCU-protected, references to avoid the overhead of full book-keeping.
> 
> Such references - skb_dst() noref - are stored inside the skbs and can be
> moved across relevant slices of the network stack, with the users
> being in charge of properly clearing the relevant skb - or properly refcount
> the related dst references - before the skb escapes the RCU section.
> 
> We currently don't have any deterministic debug infrastructure to check
> the dst noref usages - and the introduction of others noref artifact is
> currently under discussion.
> 
> This series tries to tackle the above introducing an RCU debug infrastructure
> aimed at spotting incorrect noref pointer usage, in patch one. The
> infrastructure is small and must be explicitly enabled via a newly introduced
> build option.
> 
> Patch two uses such infrastructure to track dst noref usage in the networking
> stack.
> 
> Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> on basic scenarios.

This patchset does not look like it handles rcu_read_lock() nesting.
For example, given code like this:

	void foo(void)
	{
		rcu_read_lock();
		rcu_track_noref(&key2, &noref2, true);
		do_something();
		rcu_track_noref(&key2, &noref2, false);
		rcu_read_unlock();
	}

	void bar(void)
	{
		rcu_read_lock();
		rcu_track_noref(&key1, &noref1, true);
		do_something_more();
		foo();
		do_something_else();
		rcu_track_noref(&key1, &noref1, false);
		rcu_read_unlock();
	}

	void grill(void)
	{
		foo();
	}

It looks like foo()'s rcu_read_unlock() will complain about key1.
You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
that will break the call from grill().

Or am I missing something subtle here?  Given patch 3/4, I suspect not...

							Thanx, Paul

> Paolo Abeni (4):
>   rcu: introduce noref debug
>   net: use RCU noref infrastructure to track dst noref
>   ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref()
>   tcp: avoid noref dst leak on input path
> 
>  include/linux/rcupdate.h | 11 ++++++
>  include/linux/skbuff.h   |  1 +
>  include/net/dst.h        |  5 +++
>  kernel/rcu/Kconfig.debug | 15 ++++++++
>  kernel/rcu/Makefile      |  1 +
>  kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/route.c         |  7 +---
>  net/ipv4/tcp_input.c     |  5 ++-
>  8 files changed, 127 insertions(+), 7 deletions(-)
>  create mode 100644 kernel/rcu/noref_debug.c
> 
> -- 
> 2.13.6
> 

^ permalink raw reply

* [PATCH 1/4] batman-adv: Start new development cycle
From: Simon Wunderlich @ 2017-10-06 13:54 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
In-Reply-To: <20171006135437.26736-1-sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX@public.gmane.org>

Signed-off-by: Simon Wunderlich <sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX@public.gmane.org>
---
 net/batman-adv/main.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index 05cc7637c064..edb2f239d04d 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -24,7 +24,7 @@
 #define BATADV_DRIVER_DEVICE "batman-adv"
 
 #ifndef BATADV_SOURCE_VERSION
-#define BATADV_SOURCE_VERSION "2017.3"
+#define BATADV_SOURCE_VERSION "2017.4"
 #endif
 
 /* B.A.T.M.A.N. parameters */
-- 
2.11.0

^ permalink raw reply related

* [PATCH 4/4] batman-adv: Add argument names for function ptr definitions
From: Simon Wunderlich @ 2017-10-06 13:54 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
In-Reply-To: <20171006135437.26736-1-sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX@public.gmane.org>

From: Sven Eckelmann <sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org>

checkpatch started to report unnamed arguments in function pointer
definitions. Add the corresponding names to these definitions to avoid this
warning.

Signed-off-by: Sven Eckelmann <sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org>
Signed-off-by: Simon Wunderlich <sw-2YrNx6rUIHYiY0qSoAWiAoQuADTiUCJX@public.gmane.org>
---
 net/batman-adv/main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 033819aefc39..4daed7ad46f2 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -73,8 +73,8 @@
  * list traversals just rcu-locked
  */
 struct list_head batadv_hardif_list;
-static int (*batadv_rx_handler[256])(struct sk_buff *,
-				     struct batadv_hard_iface *);
+static int (*batadv_rx_handler[256])(struct sk_buff *skb,
+				     struct batadv_hard_iface *recv_if);
 
 unsigned char batadv_broadcast_addr[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
@@ -540,8 +540,8 @@ batadv_recv_handler_register(u8 packet_type,
 			     int (*recv_handler)(struct sk_buff *,
 						 struct batadv_hard_iface *))
 {
-	int (*curr)(struct sk_buff *,
-		    struct batadv_hard_iface *);
+	int (*curr)(struct sk_buff *skb,
+		    struct batadv_hard_iface *recv_if);
 	curr = batadv_rx_handler[packet_type];
 
 	if (curr != batadv_recv_unhandled_packet &&
-- 
2.11.0

^ permalink raw reply related

* [PATCH 3/4] batman-adv: Fix "line over 80 characters" checkpatch warning
From: Simon Wunderlich @ 2017-10-06 13:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Sven Eckelmann, Simon Wunderlich
In-Reply-To: <20171006135437.26736-1-sw@simonwunderlich.de>

From: Sven Eckelmann <sven@narfation.org>

Fixes: 242c1a28eb61 ("net: Remove useless function skb_header_release")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
---
 net/batman-adv/soft-interface.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 7c8288245f80..3af4b0b29b23 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -69,8 +69,8 @@ int batadv_skb_head_push(struct sk_buff *skb, unsigned int len)
 	int result;
 
 	/* TODO: We must check if we can release all references to non-payload
-	 * data using __skb_header_release in our skbs to allow skb_cow_header to
-	 * work optimally. This means that those skbs are not allowed to read
+	 * data using __skb_header_release in our skbs to allow skb_cow_header
+	 * to work optimally. This means that those skbs are not allowed to read
 	 * or write any data which is before the current position of skb->data
 	 * after that call and thus allow other skbs with the same data buffer
 	 * to write freely in that area.
-- 
2.11.0

^ permalink raw reply related

* [PATCH 0/4] pull request for net-next: batman-adv 2017-10-06
From: Simon Wunderlich @ 2017-10-06 13:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Simon Wunderlich

Hi David,

here is a small cleanup pull request of batman-adv to go into net-next.

Please pull or let me know of any problem!

Thank you,
      Simon

The following changes since commit 242c1a28eb61cb34974e8aa05235d84355940a8a:

  net: Remove useless function skb_header_release (2017-09-22 20:43:13 -0700)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batadv-next-for-davem-20171006

for you to fetch changes up to 706cc9f51d9a22528af18d4b3ffbd17b30a1d3b0:

  batman-adv: Add argument names for function ptr definitions (2017-09-30 09:31:34 +0200)

----------------------------------------------------------------
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Cleanup patches to make checkpatch happy, by Sven Eckelmann (3 patches)

----------------------------------------------------------------
Simon Wunderlich (1):
      batman-adv: Start new development cycle

Sven Eckelmann (3):
      batman-adv: Remove unnecessary parentheses
      batman-adv: Fix "line over 80 characters" checkpatch warning
      batman-adv: Add argument names for function ptr definitions

 net/batman-adv/bat_iv_ogm.c            | 24 ++++++++++++------------
 net/batman-adv/bat_v.c                 |  2 +-
 net/batman-adv/bat_v_elp.c             |  6 +++---
 net/batman-adv/bat_v_ogm.c             | 12 ++++++------
 net/batman-adv/distributed-arp-table.c |  4 ++--
 net/batman-adv/gateway_client.c        |  8 ++++----
 net/batman-adv/gateway_common.c        | 18 +++++++++---------
 net/batman-adv/hard-interface.c        | 12 ++++++------
 net/batman-adv/icmp_socket.c           |  4 ++--
 net/batman-adv/main.c                  | 12 ++++++------
 net/batman-adv/main.h                  |  2 +-
 net/batman-adv/multicast.c             |  2 +-
 net/batman-adv/originator.c            | 26 +++++++++++++-------------
 net/batman-adv/routing.c               |  6 +++---
 net/batman-adv/send.c                  |  6 +++---
 net/batman-adv/soft-interface.c        |  6 +++---
 net/batman-adv/sysfs.c                 |  4 ++--
 net/batman-adv/tp_meter.c              |  2 +-
 18 files changed, 78 insertions(+), 78 deletions(-)

^ permalink raw reply

* [PATCH 2/4] batman-adv: Remove unnecessary parentheses
From: Simon Wunderlich @ 2017-10-06 13:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Sven Eckelmann, Simon Wunderlich
In-Reply-To: <20171006135437.26736-1-sw@simonwunderlich.de>

From: Sven Eckelmann <sven@narfation.org>

checkpatch introduced with commit 63b7c73ec86b ("checkpatch: add --strict
check for ifs with unnecessary parentheses") an additional test which
identifies some unnecessary parentheses.

Remove these unnecessary parentheses to avoid the warnings and to unify the
coding style slightly more.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
---
 net/batman-adv/bat_iv_ogm.c            | 24 ++++++++++++------------
 net/batman-adv/bat_v.c                 |  2 +-
 net/batman-adv/bat_v_elp.c             |  6 +++---
 net/batman-adv/bat_v_ogm.c             | 12 ++++++------
 net/batman-adv/distributed-arp-table.c |  4 ++--
 net/batman-adv/gateway_client.c        |  8 ++++----
 net/batman-adv/gateway_common.c        | 18 +++++++++---------
 net/batman-adv/hard-interface.c        | 12 ++++++------
 net/batman-adv/icmp_socket.c           |  4 ++--
 net/batman-adv/main.c                  |  4 ++--
 net/batman-adv/multicast.c             |  2 +-
 net/batman-adv/originator.c            | 26 +++++++++++++-------------
 net/batman-adv/routing.c               |  6 +++---
 net/batman-adv/send.c                  |  6 +++---
 net/batman-adv/soft-interface.c        |  2 +-
 net/batman-adv/sysfs.c                 |  4 ++--
 net/batman-adv/tp_meter.c              |  2 +-
 17 files changed, 71 insertions(+), 71 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 83ba5483455a..1b659ab652fb 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -916,8 +916,8 @@ static void batadv_iv_ogm_schedule(struct batadv_hard_iface *hard_iface)
 	u16 tvlv_len = 0;
 	unsigned long send_time;
 
-	if ((hard_iface->if_status == BATADV_IF_NOT_IN_USE) ||
-	    (hard_iface->if_status == BATADV_IF_TO_BE_REMOVED))
+	if (hard_iface->if_status == BATADV_IF_NOT_IN_USE ||
+	    hard_iface->if_status == BATADV_IF_TO_BE_REMOVED)
 		return;
 
 	/* the interface gets activated here to avoid race conditions between
@@ -1264,7 +1264,7 @@ static bool batadv_iv_ogm_calc_tq(struct batadv_orig_node *orig_node,
 	 * drops as they can't send and receive at the same time.
 	 */
 	tq_iface_penalty = BATADV_TQ_MAX_VALUE;
-	if (if_outgoing && (if_incoming == if_outgoing) &&
+	if (if_outgoing && if_incoming == if_outgoing &&
 	    batadv_is_wifi_hardif(if_outgoing))
 		tq_iface_penalty = batadv_hop_penalty(BATADV_TQ_MAX_VALUE,
 						      bat_priv);
@@ -1369,7 +1369,7 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr,
 				ret = BATADV_NEIGH_DUP;
 		} else {
 			set_mark = 0;
-			if (is_dup && (ret != BATADV_NEIGH_DUP))
+			if (is_dup && ret != BATADV_NEIGH_DUP)
 				ret = BATADV_ORIG_DUP;
 		}
 
@@ -1515,7 +1515,7 @@ batadv_iv_ogm_process_per_outif(const struct sk_buff *skb, int ogm_offset,
 	/* drop packet if sender is not a direct neighbor and if we
 	 * don't route towards it
 	 */
-	if (!is_single_hop_neigh && (!orig_neigh_router)) {
+	if (!is_single_hop_neigh && !orig_neigh_router) {
 		batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
 			   "Drop packet: OGM via unknown neighbor!\n");
 		goto out_neigh;
@@ -1535,7 +1535,7 @@ batadv_iv_ogm_process_per_outif(const struct sk_buff *skb, int ogm_offset,
 	sameseq = orig_ifinfo->last_real_seqno == ntohl(ogm_packet->seqno);
 	similar_ttl = (orig_ifinfo->last_ttl - 3) <= ogm_packet->ttl;
 
-	if (is_bidirect && ((dup_status == BATADV_NO_DUP) ||
+	if (is_bidirect && (dup_status == BATADV_NO_DUP ||
 			    (sameseq && similar_ttl))) {
 		batadv_iv_ogm_orig_update(bat_priv, orig_node,
 					  orig_ifinfo, ethhdr,
@@ -1553,8 +1553,8 @@ batadv_iv_ogm_process_per_outif(const struct sk_buff *skb, int ogm_offset,
 		/* OGMs from secondary interfaces should only scheduled once
 		 * per interface where it has been received, not multiple times
 		 */
-		if ((ogm_packet->ttl <= 2) &&
-		    (if_incoming != if_outgoing)) {
+		if (ogm_packet->ttl <= 2 &&
+		    if_incoming != if_outgoing) {
 			batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
 				   "Drop packet: OGM from secondary interface and wrong outgoing interface\n");
 			goto out_neigh;
@@ -1590,7 +1590,7 @@ batadv_iv_ogm_process_per_outif(const struct sk_buff *skb, int ogm_offset,
 			      if_incoming, if_outgoing);
 
 out_neigh:
-	if ((orig_neigh_node) && (!is_single_hop_neigh))
+	if (orig_neigh_node && !is_single_hop_neigh)
 		batadv_orig_node_put(orig_neigh_node);
 out:
 	if (router_ifinfo)
@@ -2523,9 +2523,9 @@ batadv_iv_gw_get_best_gw_node(struct batadv_priv *bat_priv)
 			tmp_gw_factor *= 100 * 100;
 			tmp_gw_factor >>= 18;
 
-			if ((tmp_gw_factor > max_gw_factor) ||
-			    ((tmp_gw_factor == max_gw_factor) &&
-			     (tq_avg > max_tq))) {
+			if (tmp_gw_factor > max_gw_factor ||
+			    (tmp_gw_factor == max_gw_factor &&
+			     tq_avg > max_tq)) {
 				if (curr_gw)
 					batadv_gw_node_put(curr_gw);
 				curr_gw = gw_node;
diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c
index 4e2724c5b33d..93ef1c06227e 100644
--- a/net/batman-adv/bat_v.c
+++ b/net/batman-adv/bat_v.c
@@ -767,7 +767,7 @@ batadv_v_gw_get_best_gw_node(struct batadv_priv *bat_priv)
 		if (batadv_v_gw_throughput_get(gw_node, &bw) < 0)
 			goto next;
 
-		if (curr_gw && (bw <= max_bw))
+		if (curr_gw && bw <= max_bw)
 			goto next;
 
 		if (curr_gw)
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index bd1064d98e16..1de992c58b35 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -134,7 +134,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
 			hard_iface->bat_v.flags &= ~BATADV_FULL_DUPLEX;
 
 		throughput = link_settings.base.speed;
-		if (throughput && (throughput != SPEED_UNKNOWN))
+		if (throughput && throughput != SPEED_UNKNOWN)
 			return throughput * 10;
 	}
 
@@ -263,8 +263,8 @@ static void batadv_v_elp_periodic_work(struct work_struct *work)
 		goto out;
 
 	/* we are in the process of shutting this interface down */
-	if ((hard_iface->if_status == BATADV_IF_NOT_IN_USE) ||
-	    (hard_iface->if_status == BATADV_IF_TO_BE_REMOVED))
+	if (hard_iface->if_status == BATADV_IF_NOT_IN_USE ||
+	    hard_iface->if_status == BATADV_IF_TO_BE_REMOVED)
 		goto out;
 
 	/* the interface was enabled but may not be ready yet */
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c
index 8be61734fc43..c251445a42a0 100644
--- a/net/batman-adv/bat_v_ogm.c
+++ b/net/batman-adv/bat_v_ogm.c
@@ -304,8 +304,8 @@ static u32 batadv_v_forward_penalty(struct batadv_priv *bat_priv,
 	 * due to the store & forward characteristics of WIFI.
 	 * Very low throughput values are the exception.
 	 */
-	if ((throughput > 10) &&
-	    (if_incoming == if_outgoing) &&
+	if (throughput > 10 &&
+	    if_incoming == if_outgoing &&
 	    !(if_incoming->bat_v.flags & BATADV_FULL_DUPLEX))
 		return throughput / 2;
 
@@ -455,7 +455,7 @@ static int batadv_v_ogm_metric_update(struct batadv_priv *bat_priv,
 	/* drop packets with old seqnos, however accept the first packet after
 	 * a host has been rebooted.
 	 */
-	if ((seq_diff < 0) && !protection_started)
+	if (seq_diff < 0 && !protection_started)
 		goto out;
 
 	neigh_node->last_seen = jiffies;
@@ -568,8 +568,8 @@ static bool batadv_v_ogm_route_update(struct batadv_priv *bat_priv,
 		router_throughput = router_ifinfo->bat_v.throughput;
 		neigh_throughput = neigh_ifinfo->bat_v.throughput;
 
-		if ((neigh_seq_diff < BATADV_OGM_MAX_ORIGDIFF) &&
-		    (router_throughput >= neigh_throughput))
+		if (neigh_seq_diff < BATADV_OGM_MAX_ORIGDIFF &&
+		    router_throughput >= neigh_throughput)
 			goto out;
 	}
 
@@ -621,7 +621,7 @@ batadv_v_ogm_process_per_outif(struct batadv_priv *bat_priv,
 		return;
 
 	/* only unknown & newer OGMs contain TVLVs we are interested in */
-	if ((seqno_age > 0) && (if_outgoing == BATADV_IF_DEFAULT))
+	if (seqno_age > 0 && if_outgoing == BATADV_IF_DEFAULT)
 		batadv_tvlv_containers_process(bat_priv, true, orig_node,
 					       NULL, NULL,
 					       (unsigned char *)(ogm2 + 1),
diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index b6cfa78e9381..760c0de72582 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -492,8 +492,8 @@ static bool batadv_is_orig_node_eligible(struct batadv_dat_candidate *res,
 	/* this is an hash collision with the temporary selected node. Choose
 	 * the one with the lowest address
 	 */
-	if ((tmp_max == max) && max_orig_node &&
-	    (batadv_compare_eth(candidate->orig, max_orig_node->orig) > 0))
+	if (tmp_max == max && max_orig_node &&
+	    batadv_compare_eth(candidate->orig, max_orig_node->orig) > 0)
 		goto out;
 
 	ret = true;
diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index de9955d5224d..10d521f0b17f 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -248,12 +248,12 @@ void batadv_gw_election(struct batadv_priv *bat_priv)
 		}
 	}
 
-	if ((curr_gw) && (!next_gw)) {
+	if (curr_gw && !next_gw) {
 		batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
 			   "Removing selected gateway - no gateway in range\n");
 		batadv_throw_uevent(bat_priv, BATADV_UEV_GW, BATADV_UEV_DEL,
 				    NULL);
-	} else if ((!curr_gw) && (next_gw)) {
+	} else if (!curr_gw && next_gw) {
 		batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
 			   "Adding route to gateway %pM (bandwidth: %u.%u/%u.%u MBit, tq: %i)\n",
 			   next_gw->orig_node->orig,
@@ -411,8 +411,8 @@ void batadv_gw_node_update(struct batadv_priv *bat_priv,
 		goto out;
 	}
 
-	if ((gw_node->bandwidth_down == ntohl(gateway->bandwidth_down)) &&
-	    (gw_node->bandwidth_up == ntohl(gateway->bandwidth_up)))
+	if (gw_node->bandwidth_down == ntohl(gateway->bandwidth_down) &&
+	    gw_node->bandwidth_up == ntohl(gateway->bandwidth_up))
 		goto out;
 
 	batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
diff --git a/net/batman-adv/gateway_common.c b/net/batman-adv/gateway_common.c
index 33940c5c74a8..2c26039c23fc 100644
--- a/net/batman-adv/gateway_common.c
+++ b/net/batman-adv/gateway_common.c
@@ -56,8 +56,8 @@ bool batadv_parse_throughput(struct net_device *net_dev, char *buff,
 		if (strncasecmp(tmp_ptr, "mbit", 4) == 0)
 			bw_unit_type = BATADV_BW_UNIT_MBIT;
 
-		if ((strncasecmp(tmp_ptr, "kbit", 4) == 0) ||
-		    (bw_unit_type == BATADV_BW_UNIT_MBIT))
+		if (strncasecmp(tmp_ptr, "kbit", 4) == 0 ||
+		    bw_unit_type == BATADV_BW_UNIT_MBIT)
 			*tmp_ptr = '\0';
 	}
 
@@ -190,7 +190,7 @@ ssize_t batadv_gw_bandwidth_set(struct net_device *net_dev, char *buff,
 	if (!up_new)
 		up_new = 1;
 
-	if ((down_curr == down_new) && (up_curr == up_new))
+	if (down_curr == down_new && up_curr == up_new)
 		return count;
 
 	batadv_gw_reselect(bat_priv);
@@ -224,16 +224,16 @@ static void batadv_gw_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 	/* only fetch the tvlv value if the handler wasn't called via the
 	 * CIFNOTFND flag and if there is data to fetch
 	 */
-	if ((flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) ||
-	    (tvlv_value_len < sizeof(gateway))) {
+	if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND ||
+	    tvlv_value_len < sizeof(gateway)) {
 		gateway.bandwidth_down = 0;
 		gateway.bandwidth_up = 0;
 	} else {
 		gateway_ptr = tvlv_value;
 		gateway.bandwidth_down = gateway_ptr->bandwidth_down;
 		gateway.bandwidth_up = gateway_ptr->bandwidth_up;
-		if ((gateway.bandwidth_down == 0) ||
-		    (gateway.bandwidth_up == 0)) {
+		if (gateway.bandwidth_down == 0 ||
+		    gateway.bandwidth_up == 0) {
 			gateway.bandwidth_down = 0;
 			gateway.bandwidth_up = 0;
 		}
@@ -242,8 +242,8 @@ static void batadv_gw_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 	batadv_gw_node_update(bat_priv, orig, &gateway);
 
 	/* restart gateway selection */
-	if ((gateway.bandwidth_down != 0) &&
-	    (atomic_read(&bat_priv->gw.mode) == BATADV_GW_MODE_CLIENT))
+	if (gateway.bandwidth_down != 0 &&
+	    atomic_read(&bat_priv->gw.mode) == BATADV_GW_MODE_CLIENT)
 		batadv_gw_check_election(bat_priv, orig);
 }
 
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index e348f76ea8c1..d4aa99c060f9 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -504,8 +504,8 @@ static void batadv_check_known_mac_addr(const struct net_device *net_dev)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(hard_iface, &batadv_hardif_list, list) {
-		if ((hard_iface->if_status != BATADV_IF_ACTIVE) &&
-		    (hard_iface->if_status != BATADV_IF_TO_BE_ACTIVATED))
+		if (hard_iface->if_status != BATADV_IF_ACTIVE &&
+		    hard_iface->if_status != BATADV_IF_TO_BE_ACTIVATED)
 			continue;
 
 		if (hard_iface->net_dev == net_dev)
@@ -568,8 +568,8 @@ int batadv_hardif_min_mtu(struct net_device *soft_iface)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(hard_iface, &batadv_hardif_list, list) {
-		if ((hard_iface->if_status != BATADV_IF_ACTIVE) &&
-		    (hard_iface->if_status != BATADV_IF_TO_BE_ACTIVATED))
+		if (hard_iface->if_status != BATADV_IF_ACTIVE &&
+		    hard_iface->if_status != BATADV_IF_TO_BE_ACTIVATED)
 			continue;
 
 		if (hard_iface->soft_iface != soft_iface)
@@ -654,8 +654,8 @@ batadv_hardif_activate_interface(struct batadv_hard_iface *hard_iface)
 static void
 batadv_hardif_deactivate_interface(struct batadv_hard_iface *hard_iface)
 {
-	if ((hard_iface->if_status != BATADV_IF_ACTIVE) &&
-	    (hard_iface->if_status != BATADV_IF_TO_BE_ACTIVATED))
+	if (hard_iface->if_status != BATADV_IF_ACTIVE &&
+	    hard_iface->if_status != BATADV_IF_TO_BE_ACTIVATED)
 		return;
 
 	hard_iface->if_status = BATADV_IF_INACTIVE;
diff --git a/net/batman-adv/icmp_socket.c b/net/batman-adv/icmp_socket.c
index 8ead292886d1..bded31121d12 100644
--- a/net/batman-adv/icmp_socket.c
+++ b/net/batman-adv/icmp_socket.c
@@ -132,10 +132,10 @@ static ssize_t batadv_socket_read(struct file *file, char __user *buf,
 	size_t packet_len;
 	int error;
 
-	if ((file->f_flags & O_NONBLOCK) && (socket_client->queue_len == 0))
+	if ((file->f_flags & O_NONBLOCK) && socket_client->queue_len == 0)
 		return -EAGAIN;
 
-	if ((!buf) || (count < sizeof(struct batadv_icmp_packet)))
+	if (!buf || count < sizeof(struct batadv_icmp_packet))
 		return -EINVAL;
 
 	if (!access_ok(VERIFY_WRITE, buf, count))
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index fb381fb26a66..033819aefc39 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -544,8 +544,8 @@ batadv_recv_handler_register(u8 packet_type,
 		    struct batadv_hard_iface *);
 	curr = batadv_rx_handler[packet_type];
 
-	if ((curr != batadv_recv_unhandled_packet) &&
-	    (curr != batadv_recv_unhandled_unicast_packet))
+	if (curr != batadv_recv_unhandled_packet &&
+	    curr != batadv_recv_unhandled_unicast_packet)
 		return -EBUSY;
 
 	batadv_rx_handler[packet_type] = recv_handler;
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
index d327670641ac..e553a8770a89 100644
--- a/net/batman-adv/multicast.c
+++ b/net/batman-adv/multicast.c
@@ -1126,7 +1126,7 @@ static void batadv_mcast_tvlv_ogm_handler(struct batadv_priv *bat_priv,
 	bool orig_initialized;
 
 	if (orig_mcast_enabled && tvlv_value &&
-	    (tvlv_value_len >= sizeof(mcast_flags)))
+	    tvlv_value_len >= sizeof(mcast_flags))
 		mcast_flags = *(u8 *)tvlv_value;
 
 	spin_lock_bh(&orig->mcast_handler_lock);
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 8e2a4b205257..2967b86c13da 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -1062,9 +1062,9 @@ batadv_purge_neigh_ifinfo(struct batadv_priv *bat_priv,
 			continue;
 
 		/* don't purge if the interface is not (going) down */
-		if ((if_outgoing->if_status != BATADV_IF_INACTIVE) &&
-		    (if_outgoing->if_status != BATADV_IF_NOT_IN_USE) &&
-		    (if_outgoing->if_status != BATADV_IF_TO_BE_REMOVED))
+		if (if_outgoing->if_status != BATADV_IF_INACTIVE &&
+		    if_outgoing->if_status != BATADV_IF_NOT_IN_USE &&
+		    if_outgoing->if_status != BATADV_IF_TO_BE_REMOVED)
 			continue;
 
 		batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
@@ -1106,9 +1106,9 @@ batadv_purge_orig_ifinfo(struct batadv_priv *bat_priv,
 			continue;
 
 		/* don't purge if the interface is not (going) down */
-		if ((if_outgoing->if_status != BATADV_IF_INACTIVE) &&
-		    (if_outgoing->if_status != BATADV_IF_NOT_IN_USE) &&
-		    (if_outgoing->if_status != BATADV_IF_TO_BE_REMOVED))
+		if (if_outgoing->if_status != BATADV_IF_INACTIVE &&
+		    if_outgoing->if_status != BATADV_IF_NOT_IN_USE &&
+		    if_outgoing->if_status != BATADV_IF_TO_BE_REMOVED)
 			continue;
 
 		batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
@@ -1155,13 +1155,13 @@ batadv_purge_orig_neighbors(struct batadv_priv *bat_priv,
 		last_seen = neigh_node->last_seen;
 		if_incoming = neigh_node->if_incoming;
 
-		if ((batadv_has_timed_out(last_seen, BATADV_PURGE_TIMEOUT)) ||
-		    (if_incoming->if_status == BATADV_IF_INACTIVE) ||
-		    (if_incoming->if_status == BATADV_IF_NOT_IN_USE) ||
-		    (if_incoming->if_status == BATADV_IF_TO_BE_REMOVED)) {
-			if ((if_incoming->if_status == BATADV_IF_INACTIVE) ||
-			    (if_incoming->if_status == BATADV_IF_NOT_IN_USE) ||
-			    (if_incoming->if_status == BATADV_IF_TO_BE_REMOVED))
+		if (batadv_has_timed_out(last_seen, BATADV_PURGE_TIMEOUT) ||
+		    if_incoming->if_status == BATADV_IF_INACTIVE ||
+		    if_incoming->if_status == BATADV_IF_NOT_IN_USE ||
+		    if_incoming->if_status == BATADV_IF_TO_BE_REMOVED) {
+			if (if_incoming->if_status == BATADV_IF_INACTIVE ||
+			    if_incoming->if_status == BATADV_IF_NOT_IN_USE ||
+			    if_incoming->if_status == BATADV_IF_TO_BE_REMOVED)
 				batadv_dbg(BATADV_DBG_BATMAN, bat_priv,
 					   "neighbor purge: originator %pM, neighbor: %pM, iface: %s\n",
 					   orig_node->orig, neigh_node->addr,
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index f10e3ff26f9d..40d9bf3e5bfe 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -93,14 +93,14 @@ static void _batadv_update_route(struct batadv_priv *bat_priv,
 	batadv_orig_ifinfo_put(orig_ifinfo);
 
 	/* route deleted */
-	if ((curr_router) && (!neigh_node)) {
+	if (curr_router && !neigh_node) {
 		batadv_dbg(BATADV_DBG_ROUTES, bat_priv,
 			   "Deleting route towards: %pM\n", orig_node->orig);
 		batadv_tt_global_del_orig(bat_priv, orig_node, -1,
 					  "Deleted route towards originator");
 
 	/* route added */
-	} else if ((!curr_router) && (neigh_node)) {
+	} else if (!curr_router && neigh_node) {
 		batadv_dbg(BATADV_DBG_ROUTES, bat_priv,
 			   "Adding route towards: %pM (via %pM)\n",
 			   orig_node->orig, neigh_node->addr);
@@ -381,7 +381,7 @@ int batadv_recv_icmp_packet(struct sk_buff *skb,
 	/* add record route information if not full */
 	if ((icmph->msg_type == BATADV_ECHO_REPLY ||
 	     icmph->msg_type == BATADV_ECHO_REQUEST) &&
-	    (skb->len >= sizeof(struct batadv_icmp_packet_rr))) {
+	    skb->len >= sizeof(struct batadv_icmp_packet_rr)) {
 		if (skb_linearize(skb) < 0)
 			goto free_skb;
 
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
index 054a65e6eb68..7895323fd2a7 100644
--- a/net/batman-adv/send.c
+++ b/net/batman-adv/send.c
@@ -142,7 +142,7 @@ int batadv_send_unicast_skb(struct sk_buff *skb,
 #ifdef CONFIG_BATMAN_ADV_BATMAN_V
 	hardif_neigh = batadv_hardif_neigh_get(neigh->if_incoming, neigh->addr);
 
-	if ((hardif_neigh) && (ret != NET_XMIT_DROP))
+	if (hardif_neigh && ret != NET_XMIT_DROP)
 		hardif_neigh->bat_v.last_unicast_tx = jiffies;
 
 	if (hardif_neigh)
@@ -615,8 +615,8 @@ batadv_forw_packet_list_steal(struct hlist_head *forw_list,
 		 * we delete only packets belonging to the given interface
 		 */
 		if (hard_iface &&
-		    (forw_packet->if_incoming != hard_iface) &&
-		    (forw_packet->if_outgoing != hard_iface))
+		    forw_packet->if_incoming != hard_iface &&
+		    forw_packet->if_outgoing != hard_iface)
 			continue;
 
 		hlist_del(&forw_packet->list);
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index c2c986746d0b..7c8288245f80 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -160,7 +160,7 @@ static int batadv_interface_set_mac_addr(struct net_device *dev, void *p)
 static int batadv_interface_change_mtu(struct net_device *dev, int new_mtu)
 {
 	/* check ranges */
-	if ((new_mtu < 68) || (new_mtu > batadv_hardif_min_mtu(dev)))
+	if (new_mtu < 68 || new_mtu > batadv_hardif_min_mtu(dev))
 		return -EINVAL;
 
 	dev->mtu = new_mtu;
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
index 0ae8b30e4eaa..aa187fd42475 100644
--- a/net/batman-adv/sysfs.c
+++ b/net/batman-adv/sysfs.c
@@ -925,8 +925,8 @@ static int batadv_store_mesh_iface_finish(struct net_device *net_dev,
 	if (hard_iface->if_status == status_tmp)
 		goto out;
 
-	if ((hard_iface->soft_iface) &&
-	    (strncmp(hard_iface->soft_iface->name, ifname, IFNAMSIZ) == 0))
+	if (hard_iface->soft_iface &&
+	    strncmp(hard_iface->soft_iface->name, ifname, IFNAMSIZ) == 0)
 		goto out;
 
 	if (status_tmp == BATADV_IF_NOT_IN_USE) {
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index bfe8effe9238..4b90033f35a8 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -1206,7 +1206,7 @@ static int batadv_tp_send_ack(struct batadv_priv *bat_priv, const u8 *dst,
 
 	/* send the ack */
 	r = batadv_send_skb_to_orig(skb, orig_node, NULL);
-	if (unlikely(r < 0) || (r == NET_XMIT_DROP)) {
+	if (unlikely(r < 0) || r == NET_XMIT_DROP) {
 		ret = BATADV_TP_REASON_DST_UNREACHABLE;
 		goto out;
 	}
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 1/4] rcu: introduce noref debug
From: Steven Rostedt @ 2017-10-06 14:13 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <bc39fa36c0765e6d2f7dab82e4c6460989669957.1507294365.git.pabeni@redhat.com>

On Fri,  6 Oct 2017 14:57:46 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> We currently lack a debugging infrastructure to ensure that
> long-lived noref dst are properly handled - namely dropped
> or converted to a refcounted version before escaping the current
> RCU section.
> 
> This changeset implements such infra, tracking the noref pointer
> on a per CPU store and checking that all noref tracked at any
> given RCU nesting level are cleared before leaving such RCU
> section.
> 
> Each noref entity is identified using the entity pointer and an
> additional, optional, key/opaque pointer. This is needed to cope
> with usage case scenario where the same noref entity is stored
> in different places (e.g. noref dst on skb_clone()).
> 
> To keep the patch/implementation simple RCU_NOREF_DEBUG depends
> on PREEMPT_RCU=n in Kconfig.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/rcupdate.h | 11 ++++++
>  kernel/rcu/Kconfig.debug | 15 ++++++++
>  kernel/rcu/Makefile      |  1 +
>  kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 116 insertions(+)
>  create mode 100644 kernel/rcu/noref_debug.c
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index de50d8a4cf41..20c1ce08e3eb 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -60,6 +60,15 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
>  void synchronize_sched(void);
>  void rcu_barrier_tasks(void);
>  
> +#ifdef CONFIG_RCU_NOREF_DEBUG
> +void rcu_track_noref(void *key, void *noref, bool track);
> +void __rcu_check_noref(void);
> +
> +#else
> +static inline void rcu_track_noref(void *key, void *noref, bool add) { }
> +static inline void __rcu_check_noref(void) { }
> +#endif
> +
>  #ifdef CONFIG_PREEMPT_RCU
>  
>  void __rcu_read_lock(void);
> @@ -85,6 +94,7 @@ static inline void __rcu_read_lock(void)
>  
>  static inline void __rcu_read_unlock(void)
>  {
> +	__rcu_check_noref();
>  	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
>  		preempt_enable();
>  }
> @@ -723,6 +733,7 @@ static inline void rcu_read_unlock_bh(void)
>  			 "rcu_read_unlock_bh() used illegally while idle");
>  	rcu_lock_release(&rcu_bh_lock_map);
>  	__release(RCU_BH);
> +	__rcu_check_noref();
>  	local_bh_enable();
>  }
>  
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 0ec7d1d33a14..6c7f52a3e809 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -68,6 +68,21 @@ config RCU_TRACE
>  	  Say Y here if you want to enable RCU tracing
>  	  Say N if you are unsure.
>  
> +config RCU_NOREF_DEBUG
> +	bool "Debugging for RCU-protected noref entries"
> +	depends on PREEMPT_RCU=n
> +	select PREEMPT_COUNT
> +	default n
> +	help
> +	  In case of long lasting rcu_read_lock sections this debug
> +	  feature enables one to ensure that no rcu managed dereferenced
> +	  data leaves the locked section. One use case is the tracking
> +	  of dst_entries in struct sk_buff ->dst, which is used to pass
> +	  the dst_entry around in the whole stack while under RCU lock.
> +
> +	  Say Y here if you want to enable noref RCU debugging
> +	  Say N if you are unsure.
> +
>  config RCU_EQS_DEBUG
>  	bool "Provide debugging asserts for adding NO_HZ support to an arch"
>  	depends on DEBUG_KERNEL
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 13c0fc852767..c67d7c65c582 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -11,3 +11,4 @@ obj-$(CONFIG_TREE_RCU) += tree.o
>  obj-$(CONFIG_PREEMPT_RCU) += tree.o
>  obj-$(CONFIG_TINY_RCU) += tiny.o
>  obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
> +obj-$(CONFIG_RCU_NOREF_DEBUG) += noref_debug.o
> \ No newline at end of file
> diff --git a/kernel/rcu/noref_debug.c b/kernel/rcu/noref_debug.c
> new file mode 100644
> index 000000000000..ae2e104b11d6
> --- /dev/null
> +++ b/kernel/rcu/noref_debug.c
> @@ -0,0 +1,89 @@
> +#include <linux/bug.h>
> +#include <linux/percpu.h>
> +#include <linux/skbuff.h>
> +#include <linux/bitops.h>
> +
> +#define NOREF_MAX 7
> +struct noref_entry {
> +	void *noref;
> +	void *key;
> +	int nesting;
> +};
> +
> +struct noref_cache {
> +	struct noref_entry store[NOREF_MAX];
> +};
> +
> +static DEFINE_PER_CPU(struct noref_cache, per_cpu_noref_cache);
> +
> +static int noref_cache_lookup(struct noref_cache *cache, void *noref, void *key)
> +{
> +	int i;
> +
> +	for (i = 0; i < NOREF_MAX; ++i)
> +		if (cache->store[i].noref == noref &&
> +		    cache->store[i].key == key)
> +			return i;
> +
> +	return -1;
> +}
> +

Please add a comment above this function on how to use it.

-- Steve

> +void rcu_track_noref(void *key, void *noref, bool track)
> +{
> +	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
> +	int i;
> +
> +	if (track) {
> +		/* find the first empty slot */
> +		i = noref_cache_lookup(cache, NULL, 0);
> +		if (i < 0) {
> +			WARN_ONCE(1, "can't find empty an slot to track a noref"
> +				  " noref tracking will be inaccurate");
> +			return;
> +		}
> +
> +		cache->store[i].noref = noref;
> +		cache->store[i].key = key;
> +		cache->store[i].nesting = preempt_count();
> +		return;
> +	}
> +
> +	i = noref_cache_lookup(cache, noref, key);
> +	if (i == -1) {
> +		WARN_ONCE(1, "to-be-untracked noref entity %p not found in "
> +			  "cache\n", noref);
> +		return;
> +	}
> +
> +	cache->store[i].noref = NULL;
> +	cache->store[i].key = NULL;
> +	cache->store[i].nesting = 0;
> +}
> +EXPORT_SYMBOL_GPL(rcu_track_noref);
> +
> +void __rcu_check_noref(void)
> +{
> +	struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache);
> +	char *cur, buf[strlen("0xffffffffffffffff") * NOREF_MAX + 1];
> +	int nesting = preempt_count();
> +	int i, ret, cnt = 0;
> +
> +	cur = buf;
> +	for (i = 0; i < NOREF_MAX; ++i) {
> +		if (!cache->store[i].noref ||
> +		    cache->store[i].nesting != nesting)
> +			continue;
> +
> +		cnt++;
> +		ret = sprintf(cur, " %p", cache->store[i].noref);
> +		if (ret >= 0)
> +			cur += ret;
> +		cache->store[i].noref = NULL;
> +		cache->store[i].key = NULL;
> +		cache->store[i].nesting = 0;
> +	}
> +
> +	WARN_ONCE(cnt, "%d noref entities escaped an RCU section, "
> +		  "nesting %d, leaked noref list %s", cnt, nesting, buf);
> +}
> +EXPORT_SYMBOL_GPL(__rcu_check_noref);

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox