* [patch net] net: sched: fix error path in tcf_proto_create() when modules are not configured
From: Jiri Pirko @ 2018-05-11 15:45 UTC
To: netdev; +Cc: davem, jhs, xiyou.wangcong, mlxsw
From: Jiri Pirko <jiri@mellanox.com>
In case modules are not configured, error out when tp->ops is NULL to
prevent a later NULL pointer dereference.
Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
net/sched/cls_api.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b66754f52a9f..963e4bf0aab8 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -152,8 +152,8 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
NL_SET_ERR_MSG(extack, "TC classifier not found");
err = -ENOENT;
}
- goto errout;
#endif
+ goto errout;
}
tp->classify = tp->ops->classify;
tp->protocol = protocol;
--
2.14.3
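A single moved line fixes the crash, and the reason is easiest to see with the preprocessor branch spelled out. The following is a hypothetical user-space model of the error path in tcf_proto_create(), not the kernel code: struct proto, lookup_ops() and the modelled -EAGAIN retry are simplified stand-ins. When it is compiled without CONFIG_MODULES, the whole #ifdef block disappears. Only a goto placed after #endif then still leaves the function before tp->ops is dereferenced.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct tcf_proto_ops / struct tcf_proto;
 * only the fields involved in the bug are modelled. */
struct proto_ops {
	int (*classify)(void);
};

struct proto {
	const struct proto_ops *ops;
	int (*classify)(void);
};

/* Simulates the classifier lookup failing: no such classifier. */
static const struct proto_ops *lookup_ops(const char *kind)
{
	(void)kind;
	return NULL;
}

static struct proto *proto_create(const char *kind, int *errp)
{
	struct proto *tp;
	int err = -ENOENT;

	tp = calloc(1, sizeof(*tp));
	if (!tp) {
		*errp = -ENOBUFS;
		return NULL;
	}

	tp->ops = lookup_ops(kind);
	if (!tp->ops) {
#ifdef CONFIG_MODULES
		/* The kernel releases RTNL, calls request_module() and looks
		 * the classifier up again here; if it now exists, -EAGAIN
		 * tells the caller to replay. Modelled as a second lookup. */
		if (lookup_ops(kind))
			err = -EAGAIN;
#endif
		/* Outside the #ifdef, so it is taken in both configurations.
		 * Inside it, !CONFIG_MODULES builds fell through to the NULL
		 * dereference of tp->ops below. */
		goto errout;
	}
	tp->classify = tp->ops->classify;
	return tp;

errout:
	free(tp);
	*errp = err;
	return NULL;
}
```

Calling proto_create() with an unknown kind returns NULL with *errp set to -ENOENT instead of crashing, whether or not CONFIG_MODULES is defined.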
* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-11 15:43 UTC
To: Randy Dunlap, mst, stephen, davem, netdev, virtualization,
virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
jasowang, loseweigh, jiri, aaron.f.brown
In-Reply-To: <460f3d8f-b2ec-2118-e296-03f4f9655c5a@infradead.org>
On 5/7/2018 3:39 PM, Randy Dunlap wrote:
> Hi,
>
> On 05/07/2018 03:10 PM, Sridhar Samudrala wrote:
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
>> MAINTAINERS | 7 +
>> include/linux/netdevice.h | 16 +
>> include/net/net_failover.h | 52 +++
>> net/Kconfig | 10 +
>> net/core/Makefile | 1 +
>> net/core/net_failover.c | 1044 ++++++++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 1130 insertions(+)
>> create mode 100644 include/net/net_failover.h
>> create mode 100644 net/core/net_failover.c
>
>> diff --git a/net/Kconfig b/net/Kconfig
>> index b62089fb1332..0540856676de 100644
>> --- a/net/Kconfig
>> +++ b/net/Kconfig
>> @@ -429,6 +429,16 @@ config MAY_USE_DEVLINK
>> config PAGE_POOL
>> bool
>>
>> +config NET_FAILOVER
>> + tristate "Failover interface"
>> + default m
> Need some justification for default m (as opposed to n).
default n should be fine. It will get selected automatically when virtio_net or
netvsc are enabled. Will fix in the next revision.
>
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-11 15:40 UTC
To: Stephen Hemminger
Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
aaron.f.brown
In-Reply-To: <20180507164632.4f6c2eef@xeon-e3>
On 5/7/2018 4:46 PM, Stephen Hemminger wrote:
> On Mon, 7 May 2018 15:10:44 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> This provides a generic interface for paravirtual drivers to listen
>> for netdev register/unregister/link change events from pci ethernet
>> devices with the same MAC and takeover their datapath. The notifier and
>> event handling code is based on the existing netvsc implementation.
>>
>> It exposes 2 sets of interfaces to the paravirtual drivers.
>> 1. For paravirtual drivers like virtio_net that use the 3-netdev model,
>> the failover module provides interfaces to create/destroy an additional
>> master netdev, and all the slave events are managed internally.
>> net_failover_create()
>> net_failover_destroy()
>> A failover netdev is created that acts as a master device and controls 2
>> slave devices. The original virtio_net netdev is registered as 'standby'
>> netdev and a passthru/vf device with the same MAC gets registered as
>> 'primary' netdev. Both 'standby' and 'failover' netdevs are associated
>> with the same 'pci' device. The user accesses the network interface via
>> 'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>> default for transmits when it is available with link up and running.
>> 2. For the existing netvsc driver that uses the 2-netdev model, no master netdev
>> is created. The paravirtual driver registers each instance of netvsc
>> as a 'failover' netdev along with a set of ops to manage the slave
>> events. There is no 'standby' netdev in this model. A passthru/vf device
>> with the same MAC gets registered as 'primary' netdev.
>> net_failover_register()
>> net_failover_unregister()
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> You are conflating the net_failover device (3 device model) with
> the generic network failover infrastructure into one file. There should be two
> separate files, net/core/failover.c and drivers/net/failover.c, which split
> the work into two parts (and act as a check for the API).
OK. I started splitting net_failover.c into 2 files.
net/core/failover.c (CONFIG_FAILOVER)
- implements the generic failover infrastructure that exports failover_register(),
failover_unregister() and failover_slave_unregister() as the API that will be
used by netvsc and the net_failover drivers (3 netdev model)
drivers/net/net_failover.c (CONFIG_NET_FAILOVER)
- implements the net_failover netdev as the upper dev for the 3-netdev model and
exports net_failover_create() and net_failover_destroy() as the API that is
used by virtio_net.
HYPERV_NET and NET_FAILOVER select FAILOVER
VIRTIO_NET selects NET_FAILOVER
Does this look good? Any better suggestion for the prefix to be used for generic
network failover api rather than 'failover'?
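The transmit rule in the quoted patch description can be sketched in plain C. The failover netdev uses the 'primary' (passthru/VF) netdev when it is present, running and has link, and otherwise uses the 'standby' (virtio_net) netdev. This is a hypothetical user-space model, not the net_failover code. struct slave_state and its fields are illustrative, and returning NULL (i.e. dropping the packet) when neither slave is usable is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a slave netdev's state; these fields are
 * illustrative, not struct net_device members. */
struct slave_state {
	const char *name;
	bool present;	/* registered as a slave of the failover netdev */
	bool running;	/* interface is up and running */
	bool carrier;	/* link is up */
};

static bool slave_usable(const struct slave_state *s)
{
	return s && s->present && s->running && s->carrier;
}

/* Choose the transmit slave: prefer 'primary' whenever it is usable,
 * fall back to 'standby', else NULL (assumed: caller drops the skb).
 * In the 2-netdev netvsc model there is no standby; pass NULL. */
static const struct slave_state *
failover_select_tx(const struct slave_state *primary,
		   const struct slave_state *standby)
{
	if (slave_usable(primary))
		return primary;
	if (slave_usable(standby))
		return standby;
	return NULL;
}
```

With both slaves up, the VF is chosen. As soon as the VF loses carrier or is unplugged, traffic moves to the virtio_net standby without the user-visible 'failover' interface changing.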
* Re: [RFC] net: Add new LoRaWAN subsystem
From: Marcel Holtmann @ 2018-05-11 15:39 UTC
To: Jian-Hong Pan
Cc: David S. Miller, Alexander Aring, Stefan Schmidt, linux-wpan - ML,
netdev, linux-kernel
In-Reply-To: <CAC=mGzijVyqEG=DXH4v9WkD0kXR2WOJC4KBPzoT7g5wqVrPGXA@mail.gmail.com>
Hi Jian-Hong,
> A Low-Power Wide-Area Network (LPWAN) is a type of wireless
> telecommunication wide area network designed to allow long range
> communications at a low bit rate among things (connected objects), such
> as sensors operated on a battery. It can be used widely in IoT area.
> LoRaWAN, which is one kind of implementation of LPWAN, is a medium
> access control (MAC) layer protocol for managing communication between
> LPWAN gateways and end-node devices, maintained by the LoRa Alliance.
> LoRaWAN™ Specification could be downloaded at:
> https://lora-alliance.org/lorawan-for-developers
>
> However, LoRaWAN is not implemented in Linux kernel right now, so I am
> trying to develop it. Here is my repository:
> https://github.com/starnight/LoRa/tree/lorawan-ndo/LoRaWAN
>
> Because it is a kind of network, the ideal usage in a user space
> program should be "socket(PF_LORAWAN, SOCK_DGRAM, 0)" together with the
> other socket APIs. Therefore, definitions like AF_LORAWAN,
> PF_LORAWAN ..., must be listed in the header files of glibc.
> For the driver in kernel space, the definitions also must be listed in
> the corresponding Linux socket header files.
> In particular, both are needed by the testing programs.
>
> Back to the point that "LoRaWAN is not implemented in Linux kernel now":
> could or should we add the definitions into the corresponding kernel header
> files now, if LoRaWAN will be accepted as a subsystem in Linux?
When you submit your LoRaWAN subsystem to netdev for review, include a patch that adds these new address family definitions. Just pick the next one available. There will be no pre-allocation of numbers until your work has been accepted upstream. This means that the number might change if other address families get merged before yours, so you have to keep updating it. glibc will eventually follow the number assigned by the kernel.
Regards
Marcel
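Because the address family number is only provisional until the subsystem is merged, a test program along the lines Jian-Hong describes should not bake in a guess. The following is a minimal sketch, assuming the number is supplied at build time. AF_LORAWAN/PF_LORAWAN are hypothetical here, since no released header defines them. The fallback of -1 is a deliberate placeholder: socket() then fails cleanly with EAFNOSUPPORT instead of silently opening a socket of some other, real family.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical: no released kernel or glibc header defines AF_LORAWAN.
 * Pass the number from the kernel patch under review at build time,
 * e.g. -DAF_LORAWAN=<n>; it may change until the work is merged. */
#ifndef AF_LORAWAN
#define AF_LORAWAN (-1)	/* placeholder: never a valid family */
#endif
#ifndef PF_LORAWAN
#define PF_LORAWAN AF_LORAWAN
#endif

/* Open a datagram socket for 'family'. Returns the fd, or -1 with errno
 * set; EAFNOSUPPORT means the running kernel lacks this family. */
static int open_family_socket(int family)
{
	int fd = socket(family, SOCK_DGRAM, 0);

	if (fd < 0) {
		int saved = errno;	/* stdio may clobber errno */

		if (saved == EAFNOSUPPORT)
			fprintf(stderr, "address family %d not supported by this kernel\n",
				family);
		else
			perror("socket");
		errno = saved;
	}
	return fd;
}
```

A caller would use `int fd = open_family_socket(PF_LORAWAN);` and close(fd) when done. Rebuilding with the renumbered value is all that is needed when the kernel's assignment changes.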
* Re: [RFC bpf-next 04/10] bpf: cfg: detect loop use domination information
From: Jiong Wang @ 2018-05-11 15:11 UTC
To: John Fastabend, alexei.starovoitov, daniel; +Cc: netdev, oss-drivers
In-Reply-To: <06c6f498-2e85-e762-09b6-766612c0bc46@gmail.com>
On 10/05/2018 19:17, John Fastabend wrote:
> On 05/07/2018 03:22 AM, Jiong Wang wrote:
>> If one bb dominates its predecessor, then there is a loop.
>>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>> kernel/bpf/cfg.c | 22 ++++++++++++++++++++++
>> kernel/bpf/cfg.h | 1 +
>> kernel/bpf/verifier.c | 8 ++++++++
>> 3 files changed, 31 insertions(+)
>>
>> diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
>> index b50937a..90692e4 100644
>> --- a/kernel/bpf/cfg.c
>> +++ b/kernel/bpf/cfg.c
>> @@ -568,6 +568,28 @@ int subprog_build_dom_info(struct bpf_subprog_info *subprog)
>> return ret;
>> }
>>
>> +bool subprog_has_loop(struct bpf_subprog_info *subprog)
>> +{
>> + int lane_len = BITS_TO_LONGS(subprog->bb_num - 2);
>> + struct list_head *bb_list = &subprog->bbs;
>> + struct bb_node *bb, *entry_bb;
>> + struct edge_node *e;
>> +
>> + entry_bb = entry_bb(bb_list);
>> + bb = bb_next(entry_bb);
>> + list_for_each_entry_from(bb, &exit_bb(bb_list)->l, l)
>> + list_for_each_entry(e, &bb->e_prevs, l) {
>> + struct bb_node *latch = e->src;
>> +
>> + if (latch != entry_bb &&
>> + test_bit(bb->idx,
>> + subprog->dtree + latch->idx * lane_len))
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
> Because we are using this to guard against loops we need to detect
> all loops not just reducible loops. And because (assuming my understanding
> is correct) Tarjan's algorithm will only detect all loops when the
> graph is reducible we need additional tests.
Hi John,
Yes, the current DOM-based loop detection can't detect irreducible loops.
And I feel both the first and second approaches you listed below are good;
it might be worth implementing both and measuring which one has less overhead.
I could give approach 2 a try. The algorithm given by Eric Stoltz might be a
good choice (https://compilers.iecc.com/comparch/article/94-01-053). We
have dom info now, so we could do another DFS and check whether the head of a
back-edge fails to dominate its tail, which means the loop has multiple entries.
I guess the first approach has less overhead, as there is no existing
DFS pass to reuse after the dom build, so approach 2 needs a new pass.
Regards,
Jiong
>
> There are a couple of options to fix this, with varying levels of complexity.
> Because I'm using this to build loop info structures to find induction
> variables and show termination, after the loop structures are built we
> could search for any back-edges not in valid loops. This would be similar
> to the existing back-edge detection code but with an extra check to
> allow edges that have been validated. I would need to check that this
> doesn't have any escapes before actually proposing it though.
>
> The other method would be to properly test for reducibility using one of
> the algorithms for this. I think the most intuitive is to remove back-edges
> and test that the graph is acyclic. This would be run before the dom tree is
> built. This is IMO what we should do, it seems the most "correct" way to
> do this.
>
> The most complex would be to handle irreducible programs using some of the
> more complex methods. I really don't think this is necessary but in theory
> at least we could use something like Havlak-Tarjan algorithm and allow
> some programs with irreducible loops. This is likely overkill especially
> in a first iteration.
>
> Here is a sample that fails without this series, using the original back-edge
> detection, but is allowed with this patch:
>
> SEC("classifier_tc_mark")
> int _tc_mark(struct __sk_buff *ctx)
> {
> void *data = (void *)(unsigned long)ctx->data;
> void *data_end = (void *)(unsigned long)ctx->data_end;
> void *data_meta = (void *)(unsigned long)ctx->data_meta;
> struct meta_info *meta = data_meta;
> volatile int mark = ctx->mark;
>
> mark += 1;
>
> if (meta + 1 > data) {
> B:
> mark += 2;
>
> if (mark < ctx->mark * 3)
> goto C;
> } else if (meta < data) {
> C:
> mark += 1;
> if (mark < 1000)
> goto B;
> }
>
> return TC_ACT_OK;
> }
>
> A more concise example could be made but I just hacked on one of the
> sample programs. This generates the CFG as follows (I have a patch
> on top of your stack to print the CFG and DOM tables)
>
> CFG: 65535[-1,-1] -> 0[0,9] 0[0,9] -> 3[20,20] 0[0,9] -> 1[10,18] 1[10,18] -> 4[21,28] 1[10,18] -> 2[19,19] 2[19,19] -> 5[29,30] 3[20,20] -> 5[29,30] 3[20,20] -> 4[21,28] 4[21,28] -> 1[10,18] 4[21,28] -> 5[29,30] 5[29,30] -> 65534[31,65534]
> DOM:
> 1 0 0 0 0 0
> 1 1 0 0 0 0
> 1 1 1 0 0 0
> 1 0 0 1 0 0
> 1 0 0 0 1 0
> 1 0 0 0 0 1
>
>
> Here we have the loop 1[10,18]->4[21,28] and the back-edge 4[21,28]->1[10,18].
> The notation is #idx[head_insn,tail_insn]. The above can then be imported
> into dot notation and graphed if needed.
>
> Jiong, please verify this analysis is correct.
>
> Thanks,
> John
>
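John's example can be checked mechanically. The sketch below is user-space C, not verifier code. It hard-codes the printed CFG with the pseudo entry/exit blocks dropped; the succ[] table layout is hypothetical. It computes dominators with a plain iterative dataflow pass, which is not necessarily what cfg.c does. It then applies the same dominance test as subprog_has_loop(), followed by a DFS check along the lines Jiong describes. An edge into a block that is still on the DFS stack closes a cycle. If that edge's head does not dominate its tail, the loop has more than one entry and the CFG is irreducible.

```c
#include <stdbool.h>

#define NBB 6	/* basic blocks 0..5 of the example CFG */

/* Successor lists of the example CFG, each terminated by -1. */
static const int succ[NBB][3] = {
	{ 3, 1, -1 },	/* 0[0,9]   */
	{ 4, 2, -1 },	/* 1[10,18] */
	{ 5, -1 },	/* 2[19,19] */
	{ 5, 4, -1 },	/* 3[20,20] */
	{ 1, 5, -1 },	/* 4[21,28] */
	{ -1 },		/* 5[29,30] */
};

/* dom[i] has bit j set iff block j dominates block i. */
static unsigned int dom[NBB];

/* Iterative dataflow: dom(n) = {n} | intersection of dom(p) over all
 * predecessors p, starting from "everything" until nothing changes. */
static void build_dom(void)
{
	const unsigned int all = (1u << NBB) - 1;
	bool changed = true;

	dom[0] = 1u;
	for (int i = 1; i < NBB; i++)
		dom[i] = all;

	while (changed) {
		changed = false;
		for (int n = 1; n < NBB; n++) {
			unsigned int d = all;

			for (int p = 0; p < NBB; p++)
				for (int k = 0; succ[p][k] >= 0; k++)
					if (succ[p][k] == n)
						d &= dom[p];
			d |= 1u << n;
			if (d != dom[n]) {
				dom[n] = d;
				changed = true;
			}
		}
	}
}

/* The dominance test of subprog_has_loop(): an edge latch -> bb is a
 * loop back-edge when bb dominates latch. */
static bool has_dom_loop(void)
{
	for (int latch = 0; latch < NBB; latch++)
		for (int k = 0; succ[latch][k] >= 0; k++)
			if (dom[latch] & (1u << succ[latch][k]))
				return true;
	return false;
}

enum { WHITE, GRAY, BLACK };
static int color[NBB];

/* DFS from block n: an edge into a GRAY block closes a cycle, reducible
 * or not; *irreducible is set if that edge's head does not dominate
 * its tail. */
static bool dfs_has_cycle(int n, bool *irreducible)
{
	bool found = false;

	color[n] = GRAY;
	for (int k = 0; succ[n][k] >= 0; k++) {
		int s = succ[n][k];

		if (color[s] == GRAY) {
			found = true;
			if (!(dom[n] & (1u << s)))
				*irreducible = true;
		} else if (color[s] == WHITE && dfs_has_cycle(s, irreducible)) {
			found = true;
		}
	}
	color[n] = BLACK;
	return found;
}
```

On this CFG, build_dom() reproduces the DOM table above. has_dom_loop() finds nothing, because neither 1 nor 4 dominates the other. dfs_has_cycle(0, ...) reports the 1 <-> 4 cycle and flags it as irreducible, which is exactly the gap John points out.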
* Re: Re: Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Willem de Bruijn @ 2018-05-11 14:56 UTC
To: Gao Feng
Cc: davem@davemloft.net, daniel@iogearbox.net,
jakub.kicinski@netronome.com, David Ahern, netdev@vger.kernel.org
In-Reply-To: <ac2b5b7.97a1.1634fa8f60d.Coremail.gfree.wind@vip.163.com>
On Fri, May 11, 2018 at 10:44 AM, Gao Feng <gfree.wind@vip.163.com> wrote:
> At 2018-05-11 21:23:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>>On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.wind@vip.163.com> wrote:
>>> At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>>>>On Thu, May 10, 2018 at 4:28 AM, <gfree.wind@vip.163.com> wrote:
>>>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>>>
>>>>> The skb flow limit is implemented for each CPU independently. In the
>>>>> current code, the function skb_flow_limit gets the softnet_data by
>>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog may not be
>>>>> the current cpu when RPS is enabled. As a result, skb_flow_limit checks
>>>>> the stats of the current CPU, while the skb is going to be appended to
>>>>> the queue of another CPU. That isn't the expected behavior.
>>>>>
>>>>> Now pass the softnet_data as a param to skb_flow_limit to make it consistent.
>>>>
>>>>The local cpu softnet_data is used on purpose. The operations in
>>>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>>>
>>> I think the race doesn't exist because of the rps_lock.
>>> The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.
>>
>>Indeed, I overlooked that. There still is the matter of cache contention.
>
> Is cache contention really important in this case?
> I don't think so, because enqueue_to_backlog has already touched and modified the softnet_stat
> of the target cpu.
>
>>
>>>>Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
>>>>These would always hit the same RPS cpu, so that cpu being backlogged
>>>
>>> They may hit different target CPUs when RFS is enabled, because the app could be scheduled
>>> to another CPU, and RFS then tries to deliver the skb to the latest core, which has a hot cache.
>>
>>This suggests even more strongly using the initial (or IRQ) cpu to track state, instead
>>of the destination (RPS/RFS) cpu.
>
> I don't understand why it is better to track state on the initial cpu rather than the target cpu.
> The latter could give a more accurate result.
For a single DoS flow with normal cpu pinned IRQs, the results will be equally
good when tracked on the initial IRQ cpu..
>
>>
>>>>may be an indication that such a flow is active. But the flow will also always
>>>>arrive on the same initial cpu courtesy of RSS. So storing the lookup table
>>>
>>> RSS can't ensure that the irq is handled by the same cpu. It may be balanced between
>>> the cpus.
>>
>>IRQs are usually pinned to cores. Unless using something like irqbalance,
>>but that operates at too coarse a timescale to do anything useful at Mpps
>>packet rates.
>
> Some motherboards can't guarantee that the irq is pinned.
> Then flow_limit wouldn't work as well as expected.
.. this seems to be the crux of the argument. I am not aware of any network
interrupts that do not adhere to the cpu pinning configuration in
/proc/irq/$IRQ/smp_affinity(_list)
What kind of hardware ignores this setting and sprays interrupts? I agree
that in that case flow_limit as is may be ineffective (if migration happens
at rates comparable to packet rates). But this should not happen?
>
>>
>>>>on the initial CPU is also fine. There may be false positives on other CPUs
>>>>with the same RPS destination, but that is unlikely with a highly concurrent
>>>>traffic server mix ("mice").
>>>
>>> If my comment is right, the flow doesn't always arrive on the same initial cpu, although
>>> it may be sent to the same target cpu.
>>>
>>>>
>>>>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>>>>for the cpus on which traffic initially lands, not the RPS destination cpus.
>>>>See also Documentation/networking/scaling.txt
>>>>
>>>>That said, I had to reread the code, as it does seem sensible that the
>>>>same softnet_data is intended to be used both when testing qlen and
>>>>flow_limit.
>>>
>>> In most cases, users configure flow_limit with the same CPU map as RPS, like 0xff,
>>> because they can't predict which core the evil flow will arrive on.
>>>
>>> For example, there are 2 cores, cpu0 and cpu1.
>>> One flow is an evil flow, and its irq is sent to cpu0. After RPS/RFS, the target cpu is cpu1.
>>> Now cpu0 invokes enqueue_to_backlog, and skb_flow_limit checks the queue length
>>> of cpu0. It will certainly pass the skb_flow_limit check because there is no evil flow on cpu0.
>>
>>No, enqueue_to_backlog passes qlen to skb_flow_limit, so that does
>>check the queue length of the RPS cpu.
>
> Sorry, I overlooked that qlen is the queue length of the rps cpu.
> Then it's OK, except that the stats may be inaccurate when the irq isn't pinned.
>
> But I still doubt whether it is really important to track state on the initial cpu rather than the target cpu,
> because enqueue_to_backlog has already touched the softnet_data of the target cpu.
I think the merit of both IRQ and RPS cpu can be argued for attaching the
flow_limit state.
Either way, the current behavior is not a bug, so I don't think that this is a
candidate for net.
The cost of moving from IRQ to RPS cpu will be the cacheline contention
on a system with multiple IRQ cpus that all try to update the sd->flow_data
of the same RPS cpus, which is particularly likely with RFS. I suspect that
this cost is non-trivial and not worth the benefit of handling hardware with
unpinned IRQs.
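The thread turns on two inputs to the flow-limit heuristic: the queue length, which comes from the RPS cpu, and the per-cpu history table, which comes from the local (IRQ) cpu. The following is a simplified user-space model that keeps them as separate parameters. It assumes the history-ring scheme of skb_flow_limit() in net/core/dev.c: a 128-entry history of recent packets' hash buckets, with a packet dropped once its bucket holds more than half of that history while the backlog is at least half full. struct flow_limit, flow_limit_check() and the bucket count are hypothetical names and values for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define HISTORY  128	/* power of two, like FLOW_LIMIT_HISTORY */
#define NBUCKETS 64	/* power of two; hypothetical table size */

/* Hypothetical per-cpu flow-limit table; the kernel hangs a similar
 * structure off each cpu's softnet_data. */
struct flow_limit {
	unsigned int history_head;
	uint16_t history[HISTORY];	/* bucket of each recent packet */
	unsigned int buckets[NBUCKETS];	/* packets per bucket in history */
	unsigned int count;		/* packets flagged for drop */
};

/* Returns true if a packet with flow hash 'hash' should be dropped.
 * 'qlen' is the backlog length of the destination (RPS) cpu; 'fl' is
 * whichever cpu's table the caller passes in. Which cpu that should be
 * (IRQ cpu or RPS cpu) is what this thread is debating. */
static bool flow_limit_check(struct flow_limit *fl, uint32_t hash,
			     unsigned int qlen, unsigned int max_backlog)
{
	unsigned int old_flow, new_flow;

	/* Only police flows once the backlog is at least half full. */
	if (qlen < max_backlog / 2)
		return false;

	/* Overwrite the oldest history slot with this packet's bucket. */
	new_flow = hash & (NBUCKETS - 1);
	old_flow = fl->history[fl->history_head];
	fl->history[fl->history_head] = (uint16_t)new_flow;
	fl->history_head = (fl->history_head + 1) & (HISTORY - 1);

	if (fl->buckets[old_flow])
		fl->buckets[old_flow]--;

	/* A flow holding more than half the recent history is an elephant. */
	if (++fl->buckets[new_flow] > HISTORY / 2) {
		fl->count++;
		return true;
	}
	return false;
}
```

A single elephant flow saturates its bucket after HISTORY / 2 packets and is then dropped. A mix of many small flows never gets close to the threshold. Willem's cache-contention point is visible in the signature: if 'fl' belongs to the RPS cpu, every IRQ cpu steering traffic there writes to the same table.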
* [PATCH v2 net 1/1] net sched actions: fix invalid pointer dereferencing if skbedit flags missing
From: Roman Mashak @ 2018-05-11 14:55 UTC
To: davem
Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, alexander.duyck,
Roman Mashak
When an application fails to pass flags in the netlink TLV for a new skbedit action,
the kernel hits the following oops:
[ 8.307732] BUG: unable to handle kernel paging request at 0000000000021130
[ 8.309167] PGD 80000000193d1067 P4D 80000000193d1067 PUD 180e0067 PMD 0
[ 8.310595] Oops: 0000 [#1] SMP PTI
[ 8.311334] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw
[ 8.314190] CPU: 1 PID: 397 Comm: tc Not tainted 4.17.0-rc3+ #357
[ 8.315252] RIP: 0010:__tcf_idr_release+0x33/0x140
[ 8.316203] RSP: 0018:ffffa0718038f840 EFLAGS: 00010246
[ 8.317123] RAX: 0000000000000001 RBX: 0000000000021100 RCX: 0000000000000000
[ 8.319831] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000021100
[ 8.321181] RBP: 0000000000000000 R08: 000000000004adf8 R09: 0000000000000122
[ 8.322645] R10: 0000000000000000 R11: ffffffff9e5b01ed R12: 0000000000000000
[ 8.324157] R13: ffffffff9e0d3cc0 R14: 0000000000000000 R15: 0000000000000000
[ 8.325590] FS: 00007f591292e700(0000) GS:ffff8fcf5bc40000(0000) knlGS:0000000000000000
[ 8.327001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.327987] CR2: 0000000000021130 CR3: 00000000180e6004 CR4: 00000000001606a0
[ 8.329289] Call Trace:
[ 8.329735] tcf_skbedit_init+0xa7/0xb0
[ 8.330423] tcf_action_init_1+0x362/0x410
[ 8.331139] ? try_to_wake_up+0x44/0x430
[ 8.331817] tcf_action_init+0x103/0x190
[ 8.332511] tc_ctl_action+0x11a/0x220
[ 8.333174] rtnetlink_rcv_msg+0x23d/0x2e0
[ 8.333902] ? _cond_resched+0x16/0x40
[ 8.334569] ? __kmalloc_node_track_caller+0x5b/0x2c0
[ 8.335440] ? rtnl_calcit.isra.31+0xf0/0xf0
[ 8.336178] netlink_rcv_skb+0xdb/0x110
[ 8.336855] netlink_unicast+0x167/0x220
[ 8.337550] netlink_sendmsg+0x2a7/0x390
[ 8.338258] sock_sendmsg+0x30/0x40
[ 8.338865] ___sys_sendmsg+0x2c5/0x2e0
[ 8.339531] ? pagecache_get_page+0x27/0x210
[ 8.340271] ? filemap_fault+0xa2/0x630
[ 8.340943] ? page_add_file_rmap+0x108/0x200
[ 8.341732] ? alloc_set_pte+0x2aa/0x530
[ 8.342573] ? finish_fault+0x4e/0x70
[ 8.343332] ? __handle_mm_fault+0xbc1/0x10d0
[ 8.344337] ? __sys_sendmsg+0x53/0x80
[ 8.345040] __sys_sendmsg+0x53/0x80
[ 8.345678] do_syscall_64+0x4f/0x100
[ 8.346339] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.347206] RIP: 0033:0x7f591191da67
[ 8.347831] RSP: 002b:00007fff745abd48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 8.349179] RAX: ffffffffffffffda RBX: 00007fff745abe70 RCX: 00007f591191da67
[ 8.350431] RDX: 0000000000000000 RSI: 00007fff745abdc0 RDI: 0000000000000003
[ 8.351659] RBP: 000000005af35251 R08: 0000000000000001 R09: 0000000000000000
[ 8.352922] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
[ 8.354183] R13: 00007fff745afed0 R14: 0000000000000001 R15: 00000000006767c0
[ 8.355400] Code: 41 89 d4 53 89 f5 48 89 fb e8 aa 20 fd ff 85 c0 0f 84 ed 00
00 00 48 85 db 0f 84 cf 00 00 00 40 84 ed 0f 85 cd 00 00 00 45 84 e4 <8b> 53 30
74 0d 85 d2 b8 ff ff ff ff 0f 8f b3 00 00 00 8b 43 2c
[ 8.358699] RIP: __tcf_idr_release+0x33/0x140 RSP: ffffa0718038f840
[ 8.359770] CR2: 0000000000021130
[ 8.360438] ---[ end trace 60c66be45dfc14f0 ]---
The caller calls the action's ->init() and passes a pointer to "struct tc_action *a",
which may later be initialized to point at the existing action; otherwise
"struct tc_action *a" is still invalid, and therefore dereferencing it is an
error, as happens in tcf_idr_release, where refcnt is decremented.
So in the case of missing flags, tcf_idr_release must be called only for
existing actions.
v2:
- prepare patch for net tree
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
net/sched/act_skbedit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index ddf69fc01bdf..6138d1d71900 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -121,7 +121,8 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
return 0;
if (!flags) {
- tcf_idr_release(*a, bind);
+ if (exists)
+ tcf_idr_release(*a, bind);
return -EINVAL;
}
--
2.7.4
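The ownership rule behind this fix can be modelled in user space. In the following hypothetical sketch, idr_check() stands in for the index lookup that tcf_skbedit_init() performs. It fills in *a and (as modelled here) takes a reference only when the action already exists. Otherwise *a is left untouched, and in the kernel it is an uninitialized caller local. struct action, idr_check(), idr_release() and skbedit_init() are illustrative names, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tc_action: only the refcount. */
struct action {
	int refcnt;
};

/* Index lookup: on a hit, store the action in *a and take a reference;
 * on a miss, leave *a untouched (possibly garbage in the caller). */
static bool idr_check(struct action *table[], int index, struct action **a)
{
	if (table[index]) {
		table[index]->refcnt++;
		*a = table[index];
		return true;
	}
	return false;
}

static void idr_release(struct action *a)
{
	a->refcnt--;
}

/* Models the fixed error path of tcf_skbedit_init(). */
static int skbedit_init(struct action *table[], int index,
			unsigned int flags, struct action **a)
{
	bool exists = idr_check(table, index, a);

	if (!flags) {
		/* *a is only valid, and only holds a reference, if the
		 * lookup succeeded; releasing it otherwise is the bug. */
		if (exists)
			idr_release(*a);
		return -EINVAL;
	}
	/* Creating or updating the action is omitted from this model. */
	return 0;
}
```

If *a starts out NULL in the model, the unfixed version would crash on the missing-action path, much like the oops above. The fixed version returns -EINVAL and leaves an existing action's refcount balanced.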
* Re:Re: Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Gao Feng @ 2018-05-11 14:44 UTC
To: Willem de Bruijn
Cc: davem@davemloft.net, daniel@iogearbox.net,
jakub.kicinski@netronome.com, David Ahern, netdev@vger.kernel.org
In-Reply-To: <CAF=yD-JuAivOyrimVSxB_TtaiyiT498Z6fpe+XpF_oNqCMdUPA@mail.gmail.com>
At 2018-05-11 21:23:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.wind@vip.163.com> wrote:
>> At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>>>On Thu, May 10, 2018 at 4:28 AM, <gfree.wind@vip.163.com> wrote:
>>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>>
>>>> The skb flow limit is implemented for each CPU independently. In the
>>>> current code, the function skb_flow_limit gets the softnet_data by
>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog may not be
>>>> the current cpu when RPS is enabled. As a result, skb_flow_limit checks
>>>> the stats of the current CPU, while the skb is going to be appended to
>>>> the queue of another CPU. That isn't the expected behavior.
>>>>
>>>> Now pass the softnet_data as a param to skb_flow_limit to make it consistent.
>>>
>>>The local cpu softnet_data is used on purpose. The operations in
>>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>>
>> I think the race doesn't exist because of the rps_lock.
>> The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.
>
>Indeed, I overlooked that. There still is the matter of cache contention.
Is cache contention really important in this case?
I don't think so, because enqueue_to_backlog has already touched and modified the softnet_stat
of the target cpu.
>
>>>Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
>>>These would always hit the same RPS cpu, so that cpu being backlogged
>>
>> They may hit different target CPUs when RFS is enabled, because the app could be scheduled
>> to another CPU, and RFS then tries to deliver the skb to the latest core, which has a hot cache.
>
>This suggests even more strongly using the initial (or IRQ) cpu to track state, instead
>of the destination (RPS/RFS) cpu.
I don't understand why it is better to track state on the initial cpu rather than the target cpu.
The latter could give a more accurate result.
>
>>>may be an indication that such a flow is active. But the flow will also always
>>>arrive on the same initial cpu courtesy of RSS. So storing the lookup table
>>
>> RSS can't ensure that the irq is handled by the same cpu. It may be balanced between
>> the cpus.
>
>IRQs are usually pinned to cores. Unless using something like irqbalance,
>but that operates at too coarse a timescale to do anything useful at Mpps
>packet rates.
Some motherboards can't guarantee that the irq is pinned.
Then flow_limit wouldn't work as well as expected.
>
>>>on the initial CPU is also fine. There may be false positives on other CPUs
>>>with the same RPS destination, but that is unlikely with a highly concurrent
>>>traffic server mix ("mice").
>>
>> If my comment is right, the flow doesn't always arrive on the same initial cpu, although
>> it may be sent to the same target cpu.
>>
>>>
>>>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>>>for the cpus on which traffic initially lands, not the RPS destination cpus.
>>>See also Documentation/networking/scaling.txt
>>>
>>>That said, I had to reread the code, as it does seem sensible that the
>>>same softnet_data is intended to be used both when testing qlen and
>>>flow_limit.
>>
>> In most cases, users configure flow_limit with the same CPU map as RPS, like 0xff,
>> because they can't predict which core the evil flow will arrive on.
>>
>> For example, there are 2 cores, cpu0 and cpu1.
>> One flow is an evil flow, and its irq is sent to cpu0. After RPS/RFS, the target cpu is cpu1.
>> Now cpu0 invokes enqueue_to_backlog, and skb_flow_limit checks the queue length
>> of cpu0. It will certainly pass the skb_flow_limit check because there is no evil flow on cpu0.
>
>No, enqueue_to_backlog passes qlen to skb_flow_limit, so that does
>check the queue length of the RPS cpu.
Sorry, I overlooked that qlen is the queue length of the rps cpu.
Then it's OK, except that the stats may be inaccurate when the irq isn't pinned.
But I still doubt whether it is really important to track state on the initial cpu rather than the target cpu,
because enqueue_to_backlog has already touched the softnet_data of the target cpu.
Best Regards
Feng
>
>> Then the cpu0 inserts the skb into the queue of cpu1.
>> As a result, the skb_flow_limit doesn't work as expected.
>>
>> BTW, I have already sent the v2 patch which only adds the "Fixes: tag".
>
>The change also makes the code inconsistent with
>Documentation/networking/scaling.txt
>
>"In such environments, enable the feature on all CPUs that handle
>network rx interrupts (as set in /proc/irq/N/smp_affinity)."
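The scaling.txt advice quoted at the end of this message, to enable flow limit on every CPU that handles rx interrupts, can be computed from the IRQ affinity masks. The following is a minimal sketch, assuming at most 64 CPUs and that the caller has already read each rx IRQ's /proc/irq/N/smp_affinity file into a string. parse_cpumask() and rx_irq_cpus() are hypothetical helpers. The resulting mask, printed in hex, is what one would then write to net.core.flow_limit_cpu_bitmap, assuming that sysctl accepts the same hex bitmap format as smp_affinity.

```c
#include <ctype.h>
#include <stdint.h>

/* Parse a CPU mask as printed in /proc/irq/N/smp_affinity, e.g. "f" or
 * "00000000,0000000f": hex digits, most significant first, commas
 * between 32-bit groups, optional trailing newline. Assumes at most 64
 * CPUs. Returns 0 on success, -1 on bad input or overflow. */
static int parse_cpumask(const char *s, uint64_t *mask)
{
	uint64_t m = 0;
	int digits = 0;

	for (; *s && *s != '\n'; s++) {
		unsigned char c = (unsigned char)*s;
		unsigned int v;

		if (c == ',')
			continue;
		if (!isxdigit(c))
			return -1;
		v = isdigit(c) ? (unsigned int)(c - '0')
			       : (unsigned int)(tolower(c) - 'a' + 10);
		if (m >> 60)
			return -1;	/* one more digit would overflow */
		m = (m << 4) | v;
		digits++;
	}
	if (digits == 0)
		return -1;
	*mask = m;
	return 0;
}

/* OR together the smp_affinity masks of the rx IRQs: the set of CPUs
 * on which, per the quoted documentation, flow limit should be on. */
static int rx_irq_cpus(const char *const affinities[], int n, uint64_t *out)
{
	uint64_t all = 0, m;

	for (int i = 0; i < n; i++) {
		if (parse_cpumask(affinities[i], &m) != 0)
			return -1;
		all |= m;
	}
	*out = all;
	return 0;
}
```

With two rx IRQs pinned to cpu0 ("1") and cpu2 ("00000000,00000004"), the combined mask is 0x5. Building the mask from the IRQ side, not the RPS map, matches the documentation's intent that flow limit runs on the CPUs where traffic first lands.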
* Re: WARNING in compat_copy_entries (2)
From: Dmitry Vyukov @ 2018-05-11 14:13 UTC
To: syzbot
Cc: bridge, coreteam, David Miller, Florian Westphal,
Jozsef Kadlecsik, LKML, netdev, netfilter-devel,
Pablo Neira Ayuso, stephen hemminger, syzkaller-bugs
In-Reply-To: <001a113dea0ef7d2620566c15bac@google.com>
On Tue, Mar 6, 2018 at 5:59 PM, syzbot
<syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> ce380619fab99036f5e745c7a865b21c59f005f6 (Tue Mar 6 04:31:14 2018 +0000)
> Merge tag 'please-pull-ia64_misc' of
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
>
> So far this crash happened 11 times on upstream.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> user-space arch: i386
This was bisected to, well:
commit 9dea5dc921b5f4045a18c63eb92e84dc274d17eb
Author: Andy Lutomirski <luto@kernel.org>
Date: Tue Jul 14 15:24:24 2015 -0700
x86/entry/syscalls: Wire up 32-bit direct socket calls
https://gist.githubusercontent.com/dvyukov/acbdace00cbb2fd31fb599a009d471e6/raw/bfe6bd8f2423df7b3493c7df2e466b37a8968688/gistfile1.txt
Which kinda makes sense, but shows another limitation of automatic bisection.
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> audit: type=1400 audit(1520347101.934:8): avc: denied { map } for
> pid=4183 comm="syz-execprog" path="/root/syzkaller-shm886349459" dev="sda1"
> ino=16482 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:file_t:s0 tclass=file permissive=1
> IPVS: ftp: loaded support on port[0] = 21
> WARNING: CPU: 1 PID: 4192 at net/bridge/netfilter/ebtables.c:2063
> ebt_size_mwt net/bridge/netfilter/ebtables.c:2063 [inline]
> WARNING: CPU: 1 PID: 4192 at net/bridge/netfilter/ebtables.c:2063
> size_entry_mwt net/bridge/netfilter/ebtables.c:2140 [inline]
> WARNING: CPU: 1 PID: 4192 at net/bridge/netfilter/ebtables.c:2063
> compat_copy_entries+0xd92/0x1150 net/bridge/netfilter/ebtables.c:2179
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 4192 Comm: syz-executor0 Not tainted 4.16.0-rc4+ #253
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x194/0x24d lib/dump_stack.c:53
> panic+0x1e4/0x41c kernel/panic.c:183
> __warn+0x1dc/0x200 kernel/panic.c:547
> report_bug+0x211/0x2d0 lib/bug.c:184
> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> fixup_bug arch/x86/kernel/traps.c:247 [inline]
> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:ebt_size_mwt net/bridge/netfilter/ebtables.c:2063 [inline]
> RIP: 0010:size_entry_mwt net/bridge/netfilter/ebtables.c:2140 [inline]
> RIP: 0010:compat_copy_entries+0xd92/0x1150
> net/bridge/netfilter/ebtables.c:2179
> RSP: 0018:ffff8801bb5077e8 EFLAGS: 00010293
> RAX: ffff8801bc82a580 RBX: 0000000000000000 RCX: ffffffff851ad5c2
> RDX: 0000000000000000 RSI: ffffc90001853180 RDI: 0000000000000000
> RBP: ffff8801bb507968 R08: 0000000000000004 R09: 0000000000000000
> R10: ffffffff88613380 R11: 0000000000000001 R12: 0000000000000007
> R13: dffffc0000000000 R14: ffff8801bb5079c8 R15: 000000000000000c
> compat_do_replace+0x398/0x7d0 net/bridge/netfilter/ebtables.c:2268
> compat_do_ebt_set_ctl+0x22a/0x2d0 net/bridge/netfilter/ebtables.c:2350
> compat_nf_sockopt net/netfilter/nf_sockopt.c:144 [inline]
> compat_nf_setsockopt+0x88/0x130 net/netfilter/nf_sockopt.c:156
> compat_ip_setsockopt+0x8b/0xd0 net/ipv4/ip_sockglue.c:1285
> inet_csk_compat_setsockopt+0x95/0x120 net/ipv4/inet_connection_sock.c:1041
> compat_dccp_setsockopt+0x40/0x70 net/dccp/proto.c:589
> compat_sock_common_setsockopt+0xb2/0x140 net/core/sock.c:2986
> C_SYSC_setsockopt net/compat.c:403 [inline]
> compat_SyS_setsockopt+0x17c/0x410 net/compat.c:386
> do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
> do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
> entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> RIP: 0023:0xf7f35c99
> RSP: 002b:00000000ff9c02cc EFLAGS: 00000282 ORIG_RAX: 000000000000016e
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000000
> RDX: 0000000000000080 RSI: 0000000020000240 RDI: 0000000000000240
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
^ permalink raw reply
* Re: [rds-devel] KASAN: null-ptr-deref Read in rds_ib_get_mr
From: Yanjun Zhu @ 2018-05-11 14:07 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: DaeRyong Jeong, santosh.shilimkar, davem, rds-devel, kt0755,
linux-rdma, netdev, linux-kernel, byoungyoung
In-Reply-To: <20180511104630.GD14952@oracle.com>
On 2018/5/11 18:46, Sowmini Varadhan wrote:
> On (05/11/18 15:48), Yanjun Zhu wrote:
>> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
>> index e678699..2228b50 100644
>> --- a/net/rds/ib_rdma.c
>> +++ b/net/rds/ib_rdma.c
>> @@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void)
>> void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
>> struct rds_sock *rs, u32 *key_ret)
>> {
>> - struct rds_ib_device *rds_ibdev;
>> + struct rds_ib_device *rds_ibdev = NULL;
>> struct rds_ib_mr *ibmr = NULL;
>> - struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
>> + struct rds_ib_connection *ic = NULL;
>> int ret;
>>
>> + if (rs->rs_bound_addr == 0) {
>> + ret = -EPERM;
>> + goto out;
>> + }
>> +
>> + ic = rs->rs_conn->c_transport_data;
>> rds_ibdev = rds_ib_get_device(rs->rs_bound_addr);
>> if (!rds_ibdev) {
>> ret = -ENODEV;
>>
>> I made this raw patch. If you can reproduce this bug, please test
>> with it.
> I don't think this solves the problem; I think it
> just changes the timing under which it can still happen.
>
> What if the rds_remove_bound() in rds_bind() happens after the check
> for if (rs->rs_bound_addr == 0) added above by the patch?
>
> I believe you need some type of synchronization (either
> through mutex, or some atomic flag in the rs or similar) to make
> sure rds_bind() and rds_ib_get_mr() are mutually exclusive.
Sure. I agree with you. Maybe mutex is a good choice.
Zhu Yanjun
>
> --Sowmini
>
>
>
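Sowmini's point is that the bound-address check and the later use of rs->rs_conn must sit under the same lock that rds_bind() takes; otherwise the check only narrows the race window. A minimal userspace sketch of that pattern follows, assuming a pthread mutex stands in for whatever kernel lock a real fix would use. struct model_sock, model_bind() and model_get_mr() are hypothetical names, not RDS code.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an RDS socket; the real struct rds_sock
 * has many more fields and would use kernel locking primitives. */
struct model_sock {
	pthread_mutex_t lock;   /* assumed lock serializing bind vs. get_mr */
	uint32_t bound_addr;    /* 0 means "not bound" */
	void *conn;             /* only valid while bound_addr != 0 */
};

/* Rebind: tear down the old binding (like rds_remove_bound()) and install
 * the new one, all while holding the lock. */
int model_bind(struct model_sock *s, uint32_t addr, void *conn)
{
	pthread_mutex_lock(&s->lock);
	s->bound_addr = 0;
	s->conn = NULL;
	if (addr != 0) {
		s->bound_addr = addr;
		s->conn = conn;
	}
	pthread_mutex_unlock(&s->lock);
	return 0;
}

/* The check and the use of conn happen under the same lock, so a concurrent
 * model_bind() cannot clear conn between them. */
int model_get_mr(struct model_sock *s, void **conn_out)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->bound_addr == 0) {
		ret = -EPERM;
		goto out;
	}
	*conn_out = s->conn;
out:
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Because both paths take s->lock, the window between "is it bound?" and "use the connection" is closed rather than merely shortened.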
^ permalink raw reply
* Re: [RFC PATCH 1/3] arcnet: com20020: Add memory map of com20020
From: Rob Herring @ 2018-05-11 13:35 UTC (permalink / raw)
To: Andrea Greco
Cc: Michael Grzeschik, Andrea Greco, Mark Rutland, netdev, devicetree,
linux-kernel@vger.kernel.org
In-Reply-To: <CAPoXtQK2s-G3sPHOQQcEQe+O9LGBWBv3Jrsy_NTVNo_nAtd1bg@mail.gmail.com>
On Fri, May 11, 2018 at 5:50 AM, Andrea Greco
<andrea.greco.gapmilano@gmail.com> wrote:
> On 05/08/2018 06:16 PM, Rob Herring wrote:
>> On Sat, May 05, 2018 at 11:34:45PM +0200, Andrea Greco wrote:
>>> From: Andrea Greco <a.greco@4sigma.it>
>>>
>>> Add support for com20022I/com20020, memory mapped chip version.
>>> Support bus: Intel 80xx and Motorola 68xx.
>>> Bus size: Only 8 bit bus size is supported.
>>> Added related device tree bindings
>>>
>>> Signed-off-by: Andrea Greco <a.greco@4sigma.it>
>>> ---
>>> .../devicetree/bindings/net/smsc-com20020.txt | 23 +++
>>
>> Please split bindings to separate patch.
>
> Ok
>>
>>> drivers/net/arcnet/Kconfig | 12 +-
>>> drivers/net/arcnet/Makefile | 1 +
>>> drivers/net/arcnet/arcdevice.h | 27 ++-
>>> drivers/net/arcnet/com20020-membus.c | 191 +++++++++++++++++++++
>>> drivers/net/arcnet/com20020.c | 9 +-
>>> 6 files changed, 253 insertions(+), 10 deletions(-)
>>> create mode 100644 Documentation/devicetree/bindings/net/smsc-com20020.txt
>>> create mode 100644 drivers/net/arcnet/com20020-membus.c
>>>
>>> diff --git a/Documentation/devicetree/bindings/net/smsc-com20020.txt b/Documentation/devicetree/bindings/net/smsc-com20020.txt
>>> new file mode 100644
>>> index 000000000000..39c5b19c55af
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/net/smsc-com20020.txt
>>> @@ -0,0 +1,23 @@
>>> +SMSC com20020, com20022I
>>
>> What does this device do?
>>
>
> Changed in:
> SMSC com20020 Arcnet network controller
>
>>> +
>>> +timeout: Arcnet timeout, checkout datashet
>>> +clockp: Clock Prescaler, checkout datashet
>>
>> s/datashet/datasheet/
>>
>>> +clockm: Clock multiplier, checkout datasheet
>>
>> Would these 3 properties be common for arcnet devices? If not, then they
>> should have a vendor prefix.
>>
>
> Timeout is an arcnet property;
> the others are SMSC parameters, so they become:
> - timeout: Arcnet timeout
Needs unit suffix as defined in property-units.txt.
> - smsc-clockp: Clock Prescaler
> - smsc-clockm: Clock multiplier
> - smsc-backplane: Controller uses backplane mode instead of a transceiver
Vendor properties are <vendor>,<prop-name>.
>
> I forgot the backplane property, but it is required
>
>>> +
>>> +phy-reset-gpios: Chip reset ppin
>>
>> Use 'reset-gpios' as that is standard.
>>
>>> +phy-irq-gpios: Chip irq pin
>>
>> Use 'interrupts'. Interrupt capable gpio controllers are also interrupt
>> controllers.
>>
>
> Ok, change to standard
>
>>> +
>>> +com20020_A@0 {
>>
>> Node names should be generic based on the class of device. I don't think
>> we have one defined, but how about 'arcnet'.
>>
>> Unit addresses must have a corresponding reg property. How is this
>> device accessed?
>>
>
> Then: arcnet@28000000
>
>>> + compatible = "smsc,com20020";
>>
>> Not documented.
>>
> Did I miss something? Where should I add this doc?
> Isn't it this file?
Yes, this file up above with all the other properties. The example is
just an example, not a binding definition.
Rob
^ permalink raw reply
* [PATCH] cfg80211: fix spelling mistake: "uknown" -> "unknown"
From: Colin King @ 2018-05-11 13:25 UTC (permalink / raw)
To: Johannes Berg, David S . Miller, linux-wireless; +Cc: kernel-janitors, netdev
From: Colin Ian King <colin.king@canonical.com>
Trivial fix to spelling mistake in pr_debug message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
net/wireless/reg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 72eb7c6b6e7f..e55099b1785d 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -3400,7 +3400,7 @@ bool reg_supported_dfs_region(enum nl80211_dfs_regions dfs_region)
case NL80211_DFS_JP:
return true;
default:
- pr_debug("Ignoring uknown DFS master region: %d\n", dfs_region);
+ pr_debug("Ignoring unknown DFS master region: %d\n", dfs_region);
return false;
}
}
--
2.17.0
^ permalink raw reply related
* Re: Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Willem de Bruijn @ 2018-05-11 13:23 UTC (permalink / raw)
To: Gao Feng
Cc: davem@davemloft.net, daniel@iogearbox.net,
jakub.kicinski@netronome.com, David Ahern, netdev@vger.kernel.org
In-Reply-To: <7353417.5279.1634ddbc882.Coremail.gfree.wind@vip.163.com>
On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.wind@vip.163.com> wrote:
> At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>>On Thu, May 10, 2018 at 4:28 AM, <gfree.wind@vip.163.com> wrote:
>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>
>>> The skb flow limit is implemented for each CPU independently. In the
>>> current codes, the function skb_flow_limit gets the softnet_data by
>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>> the stats of current CPU, while the skb is going to append the queue of
>>> another CPU. It isn't the expected behavior.
>>>
>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>
>>The local cpu softnet_data is used on purpose. The operations in
>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>
> I think the race doesn't exist because of the rps_lock:
> enqueue_to_backlog already holds the rps_lock before calling skb_flow_limit.
Indeed, I overlooked that. There still is the matter of cache contention.
>>Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
>>These would always hit the same RPS cpu, so that cpu being backlogged
>
> They may hit a different target CPU when RFS is enabled, because the app could be scheduled
> to another CPU, and then RFS tries to deliver the skb to the latest core, which has a hot cache.
This suggests even more strongly using the initial (or IRQ) cpu to track state, instead
of the destination (RPS/RFS) cpu.
>>may be an indication that such a flow is active. But the flow will also always
>>arrive on the same initial cpu courtesy of RSS. So storing the lookup table
>
> RSS cannot guarantee that the irq is handled by the same cpu; it may be balanced between
> the cpus.
IRQs are usually pinned to cores. Unless using something like irqbalance,
but that operates at too coarse a timescale to do anything useful at Mpps
packet rates.
>>on the initial CPU is also fine. There may be false positives on other CPUs
>>with the same RPS destination, but that is unlikely with a highly concurrent
>>traffic server mix ("mice").
>
> If my comment is right, the flow may not always arrive on the same initial cpu, although
> it may be sent to the same target cpu.
>
>>
>>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>>for the cpus on which traffic initially lands, not the RPS destination cpus.
>>See also Documentation/networking/scaling.txt
>>
>>That said, I had to reread the code, as it does seem sensible that the
>>same softnet_data is intended to be used both when testing qlen and
>>flow_limit.
>
> In most cases, users configure the same RPS map and flow_limit bitmap, e.g. 0xff,
> because they cannot predict which core the evil flow will arrive on.
>
> For example, take 2 cores, cpu0 and cpu1.
> One flow is an evil flow, and its irq is sent to cpu0. After RPS/RFS, the target cpu is cpu1.
> Now cpu0 invokes enqueue_to_backlog, and skb_flow_limit checks the queue length
> of cpu0. It would pass the skb_flow_limit check because there is no evil flow on cpu0.
No, enqueue_to_backlog passes qlen to skb_flow_limit, so that does
check the queue length of the RPS cpu.
> Then the cpu0 inserts the skb into the queue of cpu1.
> As a result, the skb_flow_limit doesn't work as expected.
>
> BTW, I have already sent the v2 patch, which only adds the "Fixes:" tag.
The change also makes the code inconsistent with
Documentation/networking/scaling.txt
"In such environments, enable the feature on all CPUs that handle
network rx interrupts (as set in /proc/irq/N/smp_affinity)."
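The design Willem describes splits the state: the queue length comes from the destination (RPS) CPU, while the flow history table lives on the CPU where the packet first lands. A minimal userspace model of that split, written loosely after skb_flow_limit() in net/core/dev.c; the struct, function name, history length and bucket count are assumptions of this sketch, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_LEN 128u   /* assumed history length; must be a power of two */
#define NUM_BUCKETS 64u    /* assumed bucket count; must be a power of two */

/* Per-CPU flow-limit state, kept on the CPU where traffic initially lands
 * (the IRQ CPU), not on the RPS destination CPU. */
struct flow_hist {
	uint16_t history[HISTORY_LEN];     /* ring of recent bucket indices */
	unsigned int head;                 /* next ring slot to overwrite */
	unsigned int buckets[NUM_BUCKETS]; /* hits per bucket inside the ring */
	unsigned long dropped;
};

/* qlen is the backlog length of the destination CPU, passed in by the
 * caller; hist is the local CPU's table. Returns true to drop the packet. */
bool flow_limit_model(struct flow_hist *hist, uint32_t flow_hash,
		      unsigned int qlen, unsigned int max_backlog)
{
	unsigned int new_b, old_b;

	/* Only police flows once the destination queue is half full. */
	if (qlen < (max_backlog >> 1))
		return false;

	new_b = flow_hash & (NUM_BUCKETS - 1);
	old_b = hist->history[hist->head];
	hist->history[hist->head] = (uint16_t)new_b;
	hist->head = (hist->head + 1) & (HISTORY_LEN - 1);

	/* The entry falling out of the ring no longer counts. */
	if (hist->buckets[old_b])
		hist->buckets[old_b]--;

	/* A bucket owning more than half of the recent history is treated as
	 * an "elephant" flow and its packet is dropped. */
	if (++hist->buckets[new_b] > (HISTORY_LEN >> 1)) {
		hist->dropped++;
		return true;
	}
	return false;
}
```

With many concurrent "mice" flows spread over the buckets, no bucket comes close to half the history, which is why a shared destination CPU rarely causes false positives.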
^ permalink raw reply
* Re: [PATCH 14/32] net/tcp: convert to ->poll_mask
From: Eric Dumazet @ 2018-05-11 13:13 UTC (permalink / raw)
To: Christoph Hellwig, viro
Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
linux-kernel
In-Reply-To: <20180511110803.10910-15-hch@lst.de>
On 05/11/2018 04:07 AM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> include/net/tcp.h | 4 ++--
> net/ipv4/af_inet.c | 3 ++-
> net/ipv4/tcp.c | 31 ++++++++++++++-----------------
> net/ipv6/af_inet6.c | 3 ++-
> 4 files changed, 20 insertions(+), 21 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 9c9b3768b350..d4d72ea9128d 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -388,8 +388,8 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst);
> void tcp_close(struct sock *sk, long timeout);
> void tcp_init_sock(struct sock *sk);
> void tcp_init_transfer(struct sock *sk, int bpf_op);
> -__poll_t tcp_poll(struct file *file, struct socket *sock,
> - struct poll_table_struct *wait);
> +struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events);
> +__poll_t tcp_poll_mask(struct socket *sock, __poll_t events);
> int tcp_getsockopt(struct sock *sk, int level, int optname,
> char __user *optval, int __user *optlen);
> int tcp_setsockopt(struct sock *sk, int level, int optname,
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index eaed0367e669..220b51347526 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -986,7 +986,8 @@ const struct proto_ops inet_stream_ops = {
> .socketpair = sock_no_socketpair,
> .accept = inet_accept,
> .getname = inet_getname,
> - .poll = tcp_poll,
> + .get_poll_head = tcp_get_poll_head,
> + .poll_mask = tcp_poll_mask,
> .ioctl = inet_ioctl,
> .listen = inet_listen,
> .shutdown = inet_shutdown,
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9ce1c726185e..6ec0e7a13581 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -493,33 +493,30 @@ static inline bool tcp_stream_is_readable(const struct tcp_sock *tp,
> sk->sk_prot->stream_memory_read(sk) : false);
> }
>
> +struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events)
> +{
> + sock_poll_busy_loop(sock, events);
> + sock_rps_record_flow(sock->sk);
Why are you adding sock_rps_record_flow() ?
> + return sk_sleep(sock->sk);
> +}
> +EXPORT_SYMBOL(tcp_get_poll_head);
> +
> /*
> - * Wait for a TCP event.
> - *
> - * Note that we don't need to lock the socket, as the upper poll layers
> - * take care of normal races (between the test and the event) and we don't
> - * go look at any of the socket buffers directly.
> + * Socket is not locked. We are protected from async events by poll logic and
> + * correct handling of state changes made by other threads is impossible in
> + * any case.
> */
> -__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
> +__poll_t tcp_poll_mask(struct socket *sock, __poll_t events)
> {
> - __poll_t mask;
> struct sock *sk = sock->sk;
> const struct tcp_sock *tp = tcp_sk(sk);
> + __poll_t mask = 0;
> int state;
>
> - sock_poll_wait(file, sk_sleep(sk), wait);
> -
> state = inet_sk_state_load(sk);
> if (state == TCP_LISTEN)
> return inet_csk_listen_poll(sk);
>
> - /* Socket is not locked. We are protected from async events
> - * by poll logic and correct handling of state changes
> - * made by other threads is impossible in any case.
> - */
> -
> - mask = 0;
> -
> /*
> * EPOLLHUP is certainly not done right. But poll() doesn't
> * have a notion of HUP in just one direction, and for a
> @@ -600,7 +597,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
>
> return mask;
> }
> -EXPORT_SYMBOL(tcp_poll);
> +EXPORT_SYMBOL(tcp_poll_mask);
>
> int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
> {
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 8da0b513f188..a43d967eeca5 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -571,7 +571,8 @@ const struct proto_ops inet6_stream_ops = {
> .socketpair = sock_no_socketpair, /* a do nothing */
> .accept = inet_accept, /* ok */
> .getname = inet6_getname,
> - .poll = tcp_poll, /* ok */
> + .get_poll_head = tcp_get_poll_head,
> + .poll_mask = tcp_poll_mask, /* ok */
> .ioctl = inet6_ioctl, /* must change */
> .listen = inet_listen, /* ok */
> .shutdown = inet_shutdown, /* ok */
>
^ permalink raw reply
* [PATCH net-next] cxgb4: Add new T5 device id
From: Ganesh Goudar @ 2018-05-11 13:07 UTC (permalink / raw)
To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar
Add 0x50ad device id for new T5 card.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index 90b5274..adacc63 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -187,6 +187,7 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
CH_PCI_ID_TABLE_FENTRY(0x50aa), /* Custom T580-CR */
CH_PCI_ID_TABLE_FENTRY(0x50ab), /* Custom T520-CR */
CH_PCI_ID_TABLE_FENTRY(0x50ac), /* Custom T540-BT */
+ CH_PCI_ID_TABLE_FENTRY(0x50ad), /* Custom T520-CR */
/* T6 adapters:
*/
--
2.1.0
^ permalink raw reply related
* [PATCH net-next 3/3] cxgb4: avoid schedule while atomic
From: Ganesh Goudar @ 2018-05-11 13:06 UTC (permalink / raw)
To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, arjun, Ganesh Goudar
Do not sleep while adding or deleting a udp tunnel.
Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index aa266e2..27ad69a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3081,7 +3081,7 @@ static void cxgb_del_udp_tunnel(struct net_device *netdev,
match_all_mac, match_all_mac,
adapter->rawf_start +
pi->port_id,
- 1, pi->port_id, true);
+ 1, pi->port_id, false);
if (ret < 0) {
netdev_info(netdev, "Failed to free mac filter entry, for port %d\n",
i);
@@ -3169,7 +3169,7 @@ static void cxgb_add_udp_tunnel(struct net_device *netdev,
match_all_mac,
adapter->rawf_start +
pi->port_id,
- 1, pi->port_id, true);
+ 1, pi->port_id, false);
if (ret < 0) {
netdev_info(netdev, "Failed to allocate a mac filter entry, not adding port %d\n",
be16_to_cpu(ti->port));
--
2.1.0
^ permalink raw reply related
* [PATCH net-next 2/3] cxgb4: enable inner header checksum calculation
From: Ganesh Goudar @ 2018-05-11 13:05 UTC (permalink / raw)
To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, arjun, Ganesh Goudar
Set cntrl bits to indicate whether the inner header checksum
needs to be calculated whenever the packet is an encapsulated
packet, and enable the supported encap features.
Fixes: d0a1299c6bf7 ("cxgb4: add support for vxlan segmentation offload")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 9 ++-
drivers/net/ethernet/chelsio/cxgb4/sge.c | 91 +++++++++++++++++++------
drivers/net/ethernet/chelsio/cxgb4/t4_msg.h | 5 ++
3 files changed, 83 insertions(+), 22 deletions(-)
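The core of the hwcsum() change is choosing which L3/L4 headers to classify: the inner ones for an encapsulated packet on chips newer than T5, the outer ones otherwise. A minimal sketch of just that selection; the enum, structs and pick_csum_type() are hypothetical stand-ins for the driver's TX_CSUM_* values and the sk_buff header accessors.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's TX_CSUM_* values. */
enum tx_csum_model {
	CSUM_NONE,
	CSUM_TCPIP,
	CSUM_UDPIP,
	CSUM_TCPIP6,
	CSUM_UDPIP6,
};

#define PROTO_TCP 6   /* IPPROTO_TCP */
#define PROTO_UDP 17  /* IPPROTO_UDP */

struct l3_info {
	unsigned int version; /* 4 or 6 */
	unsigned int proto;   /* IPv4 protocol or IPv6 nexthdr */
};

struct pkt_model {
	bool encapsulation;   /* like skb->encapsulation */
	struct l3_info outer;
	struct l3_info inner;
};

/* Mirrors the selection in the patched hwcsum(): newer_than_t5 plays the
 * role of CHELSIO_CHIP_VERSION(chip) > CHELSIO_T5. */
enum tx_csum_model pick_csum_type(const struct pkt_model *p, bool newer_than_t5)
{
	const struct l3_info *l3 =
		(p->encapsulation && newer_than_t5) ? &p->inner : &p->outer;

	if (l3->version == 4) {
		if (l3->proto == PROTO_TCP)
			return CSUM_TCPIP;
		if (l3->proto == PROTO_UDP)
			return CSUM_UDPIP;
		return CSUM_NONE;
	}
	/* As in the driver, IPv6 extension headers are not handled. */
	if (l3->proto == PROTO_TCP)
		return CSUM_TCPIP6;
	if (l3->proto == PROTO_UDP)
		return CSUM_UDPIP6;
	return CSUM_NONE;
}
```

For a VXLAN-style packet (outer IPv4/UDP, inner IPv6/TCP) this yields the TCP-over-IPv6 type on newer chips and falls back to the outer UDP type on T5 and older.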
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index a8aa233..aa266e2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5636,8 +5636,15 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_TC;
- if (CHELSIO_CHIP_VERSION(chip) > CHELSIO_T5)
+ if (CHELSIO_CHIP_VERSION(chip) > CHELSIO_T5) {
+ netdev->hw_enc_features |= NETIF_F_IP_CSUM |
+ NETIF_F_IPV6_CSUM |
+ NETIF_F_RXCSUM |
+ NETIF_F_GSO_UDP_TUNNEL |
+ NETIF_F_TSO | NETIF_F_TSO6;
+
netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+ }
if (highdma)
netdev->hw_features |= NETIF_F_HIGHDMA;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1a28df1..0f87e97 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -1072,12 +1072,27 @@ static void *inline_tx_skb_header(const struct sk_buff *skb,
static u64 hwcsum(enum chip_type chip, const struct sk_buff *skb)
{
int csum_type;
- const struct iphdr *iph = ip_hdr(skb);
+ bool inner_hdr_csum = false;
+ u16 proto, ver;
- if (iph->version == 4) {
- if (iph->protocol == IPPROTO_TCP)
+ if (skb->encapsulation &&
+ (CHELSIO_CHIP_VERSION(chip) > CHELSIO_T5))
+ inner_hdr_csum = true;
+
+ if (inner_hdr_csum) {
+ ver = inner_ip_hdr(skb)->version;
+ proto = (ver == 4) ? inner_ip_hdr(skb)->protocol :
+ inner_ipv6_hdr(skb)->nexthdr;
+ } else {
+ ver = ip_hdr(skb)->version;
+ proto = (ver == 4) ? ip_hdr(skb)->protocol :
+ ipv6_hdr(skb)->nexthdr;
+ }
+
+ if (ver == 4) {
+ if (proto == IPPROTO_TCP)
csum_type = TX_CSUM_TCPIP;
- else if (iph->protocol == IPPROTO_UDP)
+ else if (proto == IPPROTO_UDP)
csum_type = TX_CSUM_UDPIP;
else {
nocsum: /*
@@ -1090,19 +1105,29 @@ static u64 hwcsum(enum chip_type chip, const struct sk_buff *skb)
/*
* this doesn't work with extension headers
*/
- const struct ipv6hdr *ip6h = (const struct ipv6hdr *)iph;
-
- if (ip6h->nexthdr == IPPROTO_TCP)
+ if (proto == IPPROTO_TCP)
csum_type = TX_CSUM_TCPIP6;
- else if (ip6h->nexthdr == IPPROTO_UDP)
+ else if (proto == IPPROTO_UDP)
csum_type = TX_CSUM_UDPIP6;
else
goto nocsum;
}
if (likely(csum_type >= TX_CSUM_TCPIP)) {
- u64 hdr_len = TXPKT_IPHDR_LEN_V(skb_network_header_len(skb));
- int eth_hdr_len = skb_network_offset(skb) - ETH_HLEN;
+ int eth_hdr_len, l4_len;
+ u64 hdr_len;
+
+ if (inner_hdr_csum) {
+ /* This allows checksum offload for all encapsulated
+ * packets like GRE etc..
+ */
+ l4_len = skb_inner_network_header_len(skb);
+ eth_hdr_len = skb_inner_network_offset(skb) - ETH_HLEN;
+ } else {
+ l4_len = skb_network_header_len(skb);
+ eth_hdr_len = skb_network_offset(skb) - ETH_HLEN;
+ }
+ hdr_len = TXPKT_IPHDR_LEN_V(l4_len);
if (CHELSIO_CHIP_VERSION(chip) <= CHELSIO_T5)
hdr_len |= TXPKT_ETHHDR_LEN_V(eth_hdr_len);
@@ -1273,7 +1298,7 @@ static inline void t6_fill_tnl_lso(struct sk_buff *skb,
netdev_tx_t t4_eth_xmit(struct sk_buff *skb, struct net_device *dev)
{
u32 wr_mid, ctrl0, op;
- u64 cntrl, *end;
+ u64 cntrl, *end, *sgl;
int qidx, credits;
unsigned int flits, ndesc;
struct adapter *adap;
@@ -1443,6 +1468,19 @@ out_free: dev_kfree_skb_any(skb);
TX_CSUM_TCPIP6 : TX_CSUM_TCPIP) |
TXPKT_IPHDR_LEN_V(l3hdr_len);
}
+ sgl = (u64 *)(cpl + 1); /* sgl start here */
+ if (unlikely((u8 *)sgl >= (u8 *)q->q.stat)) {
+ /* If current position is already at the end of the
+ * txq, reset the current to point to start of the queue
+ * and update the end ptr as well.
+ */
+ if (sgl == (u64 *)q->q.stat) {
+ int left = (u8 *)end - (u8 *)q->q.stat;
+
+ end = (void *)q->q.desc + left;
+ sgl = (void *)q->q.desc;
+ }
+ }
q->tso++;
q->tx_cso += ssi->gso_segs;
} else {
@@ -1454,6 +1492,7 @@ out_free: dev_kfree_skb_any(skb);
wr->op_immdlen = htonl(FW_WR_OP_V(op) |
FW_WR_IMMDLEN_V(len));
cpl = (void *)(wr + 1);
+ sgl = (u64 *)(cpl + 1);
if (skb->ip_summed == CHECKSUM_PARTIAL) {
cntrl = hwcsum(adap->params.chip, skb) |
TXPKT_IPCSUM_DIS_F;
@@ -1487,13 +1526,12 @@ out_free: dev_kfree_skb_any(skb);
cpl->ctrl1 = cpu_to_be64(cntrl);
if (immediate) {
- cxgb4_inline_tx_skb(skb, &q->q, cpl + 1);
+ cxgb4_inline_tx_skb(skb, &q->q, sgl);
dev_consume_skb_any(skb);
} else {
int last_desc;
- cxgb4_write_sgl(skb, &q->q, (struct ulptx_sgl *)(cpl + 1),
- end, 0, addr);
+ cxgb4_write_sgl(skb, &q->q, (void *)sgl, end, 0, addr);
skb_orphan(skb);
last_desc = q->q.pidx + ndesc - 1;
@@ -2259,7 +2297,7 @@ static void cxgb4_sgetim_to_hwtstamp(struct adapter *adap,
}
static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
- const struct cpl_rx_pkt *pkt)
+ const struct cpl_rx_pkt *pkt, unsigned long tnl_hdr_len)
{
struct adapter *adapter = rxq->rspq.adap;
struct sge *s = &adapter->sge;
@@ -2275,6 +2313,8 @@ static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
}
copy_frags(skb, gl, s->pktshift);
+ if (tnl_hdr_len)
+ skb->csum_level = 1;
skb->len = gl->tot_len - s->pktshift;
skb->data_len = skb->len;
skb->truesize += skb->data_len;
@@ -2406,7 +2446,7 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
struct sge *s = &q->adap->sge;
int cpl_trace_pkt = is_t4(q->adap->params.chip) ?
CPL_TRACE_PKT : CPL_TRACE_PKT_T5;
- u16 err_vec;
+ u16 err_vec, tnl_hdr_len = 0;
struct port_info *pi;
int ret = 0;
@@ -2415,16 +2455,19 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
pkt = (const struct cpl_rx_pkt *)rsp;
/* Compressed error vector is enabled for T6 only */
- if (q->adap->params.tp.rx_pkt_encap)
+ if (q->adap->params.tp.rx_pkt_encap) {
err_vec = T6_COMPR_RXERR_VEC_G(be16_to_cpu(pkt->err_vec));
- else
+ tnl_hdr_len = T6_RX_TNLHDR_LEN_G(ntohs(pkt->err_vec));
+ } else {
err_vec = be16_to_cpu(pkt->err_vec);
+ }
csum_ok = pkt->csum_calc && !err_vec &&
(q->netdev->features & NETIF_F_RXCSUM);
- if ((pkt->l2info & htonl(RXF_TCP_F)) &&
+ if (((pkt->l2info & htonl(RXF_TCP_F)) ||
+ tnl_hdr_len) &&
(q->netdev->features & NETIF_F_GRO) && csum_ok && !pkt->ip_frag) {
- do_gro(rxq, si, pkt);
+ do_gro(rxq, si, pkt, tnl_hdr_len);
return 0;
}
@@ -2471,7 +2514,13 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
} else if (pkt->l2info & htonl(RXF_IP_F)) {
__sum16 c = (__force __sum16)pkt->csum;
skb->csum = csum_unfold(c);
- skb->ip_summed = CHECKSUM_COMPLETE;
+
+ if (tnl_hdr_len) {
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+ skb->csum_level = 1;
+ } else {
+ skb->ip_summed = CHECKSUM_COMPLETE;
+ }
rxq->stats.rx_cso++;
}
} else {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index fe2029e9..09e38f0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -1233,6 +1233,11 @@ struct cpl_rx_pkt {
#define T6_COMPR_RXERR_SUM_V(x) ((x) << T6_COMPR_RXERR_SUM_S)
#define T6_COMPR_RXERR_SUM_F T6_COMPR_RXERR_SUM_V(1U)
+#define T6_RX_TNLHDR_LEN_S 8
+#define T6_RX_TNLHDR_LEN_M 0xFF
+#define T6_RX_TNLHDR_LEN_V(x) ((x) << T6_RX_TNLHDR_LEN_S)
+#define T6_RX_TNLHDR_LEN_G(x) (((x) >> T6_RX_TNLHDR_LEN_S) & T6_RX_TNLHDR_LEN_M)
+
struct cpl_trace_pkt {
u8 opcode;
u8 intf;
--
2.1.0
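The T6_RX_TNLHDR_LEN_* macros added to t4_msg.h follow the driver's _S (shift), _M (mask), _V (value) and _G (get) convention for an 8-bit field starting at bit 8. A small sketch of packing and extracting that field on a host-order 16-bit word (the driver converts pkt->err_vec with ntohs() first); tnlhdr_len_set() and tnlhdr_len_get() are hypothetical helpers.

```c
#include <stdint.h>

/* Same layout as the macros in the patch. */
#define T6_RX_TNLHDR_LEN_S 8
#define T6_RX_TNLHDR_LEN_M 0xFF
#define T6_RX_TNLHDR_LEN_V(x) ((x) << T6_RX_TNLHDR_LEN_S)
#define T6_RX_TNLHDR_LEN_G(x) (((x) >> T6_RX_TNLHDR_LEN_S) & T6_RX_TNLHDR_LEN_M)

/* Replace the tunnel-header-length field, leaving the other bits intact. */
uint16_t tnlhdr_len_set(uint16_t word, unsigned int len)
{
	word &= (uint16_t)~T6_RX_TNLHDR_LEN_V(T6_RX_TNLHDR_LEN_M);
	return (uint16_t)(word | T6_RX_TNLHDR_LEN_V(len & T6_RX_TNLHDR_LEN_M));
}

/* Extract the field, as t4_ethrx_handler() does with T6_RX_TNLHDR_LEN_G(). */
unsigned int tnlhdr_len_get(uint16_t word)
{
	return T6_RX_TNLHDR_LEN_G(word);
}
```

A non-zero result is what the receive path uses to treat the packet as tunneled (GRO eligibility and csum_level = 1).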
^ permalink raw reply related
* [PATCH net-next 1/3] cxgb4: Fix {vxlan/geneve}_port initialization
From: Ganesh Goudar @ 2018-05-11 13:04 UTC (permalink / raw)
To: netdev, davem
Cc: nirranjan, indranil, venkatesh, Arjun Vynipadath, Ganesh Goudar
From: Arjun Vynipadath <arjun@chelsio.com>
adapter->rawf_cnt was not initialized, so
ndo_udp_tunnel_{add/del} returned immediately
without initializing {vxlan/geneve}_port.
Also initialize the mps_encap_entry refcnt.
Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks")
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 21 +++++++++++++++++++++
drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 2 ++
2 files changed, 23 insertions(+)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 5e33780..a8aa233 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4276,6 +4276,20 @@ static int adap_init0(struct adapter *adap)
adap->tids.nftids = val[4] - val[3] + 1;
adap->sge.ingr_start = val[5];
+ if (CHELSIO_CHIP_VERSION(adap->params.chip) > CHELSIO_T5) {
+ /* Read the raw mps entries. In T6, the last 2 tcam entries
+ * are reserved for raw mac addresses (rawf = 2, one per port).
+ */
+ params[0] = FW_PARAM_PFVF(RAWF_START);
+ params[1] = FW_PARAM_PFVF(RAWF_END);
+ ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 2,
+ params, val);
+ if (ret == 0) {
+ adap->rawf_start = val[0];
+ adap->rawf_cnt = val[1] - val[0] + 1;
+ }
+ }
+
/* qids (ingress/egress) returned from firmware can be anywhere
* in the range from EQ(IQFLINT)_START to EQ(IQFLINT)_END.
* Hence driver needs to allocate memory for this range to
@@ -5181,6 +5195,7 @@ static void free_some_resources(struct adapter *adapter)
{
unsigned int i;
+ kvfree(adapter->mps_encap);
kvfree(adapter->smt);
kvfree(adapter->l2t);
kvfree(adapter->srq);
@@ -5687,6 +5702,12 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
adapter->params.offload = 0;
}
+ adapter->mps_encap = kvzalloc(sizeof(struct mps_encap_entry) *
+ adapter->params.arch.mps_tcam_size,
+ GFP_KERNEL);
+ if (!adapter->mps_encap)
+ dev_warn(&pdev->dev, "could not allocate MPS Encap entries, continuing\n");
+
#if IS_ENABLED(CONFIG_IPV6)
if ((CHELSIO_CHIP_VERSION(adapter->params.chip) <= CHELSIO_T5) &&
(!(t4_read_reg(adapter, LE_DB_CONFIG_A) & ASLIPCOMPEN_F))) {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index e3d4751..0e007ee 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -1305,6 +1305,8 @@ enum fw_params_param_pfvf {
FW_PARAMS_PARAM_PFVF_HPFILTER_END = 0x33,
FW_PARAMS_PARAM_PFVF_TLS_START = 0x34,
FW_PARAMS_PARAM_PFVF_TLS_END = 0x35,
+ FW_PARAMS_PARAM_PFVF_RAWF_START = 0x36,
+ FW_PARAMS_PARAM_PFVF_RAWF_END = 0x37,
FW_PARAMS_PARAM_PFVF_NCRYPTO_LOOKASIDE = 0x39,
FW_PARAMS_PARAM_PFVF_PORT_CAPS32 = 0x3A,
};
--
2.1.0
^ permalink raw reply related
* [PATCH net-next] erspan: auto detect truncated ipv6 packets.
From: William Tu @ 2018-05-11 12:49 UTC (permalink / raw)
To: netdev
Currently the truncated bit is set only when 1) the mirrored packet
is larger than mtu or 2) the ipv4 packet tot_len is larger than
the actual skb->len. This patch adds another case for detecting
whether an ipv6 packet is truncated, by comparing the ipv6 header
payload_len with skb->len.
Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
net/ipv4/ip_gre.c | 6 ++++++
net/ipv6/ip6_gre.c | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index dfe5b22f6ed4..2409e648454d 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -579,6 +579,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
int version;
__be16 df;
int nhoff;
+ int thoff;
tun_info = skb_tunnel_info(skb);
if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
@@ -611,6 +612,11 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
(ntohs(ip_hdr(skb)->tot_len) > skb->len - nhoff))
truncate = true;
+ thoff = skb_transport_header(skb) - skb_mac_header(skb);
+ if (skb->protocol == htons(ETH_P_IPV6) &&
+ (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff))
+ truncate = true;
+
if (version == 1) {
erspan_build_header(skb, ntohl(tunnel_id_to_key32(key->tun_id)),
ntohl(md->u.index), truncate, true);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index b511818b268c..bede77f24784 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -897,6 +897,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
int err = -EINVAL;
__u32 mtu;
int nhoff;
+ int thoff;
if (!ip6_tnl_xmit_ctl(t, &t->parms.laddr, &t->parms.raddr))
goto tx_err;
@@ -914,6 +915,11 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
(ntohs(ip_hdr(skb)->tot_len) > skb->len - nhoff))
truncate = true;
+ thoff = skb_transport_header(skb) - skb_mac_header(skb);
+ if (skb->protocol == htons(ETH_P_IPV6) &&
+ (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff))
+ truncate = true;
+
if (skb_cow_head(skb, dev->needed_headroom))
goto tx_err;
--
2.7.4
* Re: [PATCH][next] net: aquantia: fix unsigned numvecs comparison with less than zero
From: Igor Russkikh @ 2018-05-11 12:40 UTC (permalink / raw)
To: David Miller, colin.king
Cc: pavel.belous, weiyongjun1, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180510.175358.1296757435086922873.davem@davemloft.net>
>> Fixes: a09bd81b5413 ("net: aquantia: Limit number of vectors to actually allocated irqs")
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
>
> This doesn't apply to net-next.
>
Colin, I believe that's because you should target net, not net-next.
BR, Igor
* [PATCH iproute2] ip: do not drop capabilities if net_admin=i is set
From: Luca Boccassi @ 2018-05-11 12:39 UTC (permalink / raw)
To: netdev; +Cc: dsahern, luto, stephen, Luca Boccassi
Users have reported a regression due to ip now dropping capabilities
unconditionally.
zerotier-one VPN and VirtualBox use ambient capabilities in their
binary and then fork out to ip to set routes and links, and this
does not work anymore.
As a workaround, do not drop caps if CAP_NET_ADMIN (the most common
capability used by ip) is set with the INHERITABLE flag.
Users who want ip vrf exec to work do not need to set INHERITABLE,
which will then only be set when the calling program had the privileges
to give itself the ambient capability.
Fixes: ba2fc55b99f8 ("Drop capabilities if not running ip exec vrf with libcap")
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
Reported on Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898015
The reporter tested this patch and verified it solves the issue.
lib/utils.c | 15 ++++++++++++---
man/man8/ip-vrf.8 | 4 ++++
2 files changed, 16 insertions(+), 3 deletions(-)
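The decision the patched drop_cap() makes can be summarized as a small pure function. This is a sketch under stated assumptions. should_drop_caps() is a hypothetical name. The libcap queries (cap_get_proc(), cap_get_flag()) are replaced by plain arguments so the policy can be exercised without libcap. The vrf exec exception described in the man page is left out.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical model of the patched drop_cap() policy. The real code
 * reads the CAP_NET_ADMIN INHERITABLE flag via libcap; here it is a
 * plain boolean argument.
 */
static bool should_drop_caps(uid_t uid, uid_t euid, bool net_admin_inheritable)
{
	/* Root or sudo: capabilities are never touched. */
	if (uid == 0 || euid == 0)
		return false;

	/* CAP_NET_ADMIN is INHERITABLE: assume a caller with ambient
	 * capabilities forked ip, so keep the capability set intact. */
	if (net_admin_inheritable)
		return false;

	/* Otherwise clear every capability, as before the patch. */
	return true;
}
```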
diff --git a/lib/utils.c b/lib/utils.c
index 8a0bff0b..7b2c6dd1 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -1612,14 +1612,23 @@ void drop_cap(void)
/* don't harmstring root/sudo */
if (getuid() != 0 && geteuid() != 0) {
cap_t capabilities;
+ cap_value_t net_admin = CAP_NET_ADMIN;
+ cap_flag_t inheritable = CAP_INHERITABLE;
+ cap_flag_value_t is_set;
capabilities = cap_get_proc();
if (!capabilities)
exit(EXIT_FAILURE);
- if (cap_clear(capabilities) != 0)
- exit(EXIT_FAILURE);
- if (cap_set_proc(capabilities) != 0)
+ if (cap_get_flag(capabilities, net_admin, inheritable,
+ &is_set) != 0)
exit(EXIT_FAILURE);
+ /* apps with ambient caps can fork and call ip */
+ if (is_set == CAP_CLEAR) {
+ if (cap_clear(capabilities) != 0)
+ exit(EXIT_FAILURE);
+ if (cap_set_proc(capabilities) != 0)
+ exit(EXIT_FAILURE);
+ }
cap_free(capabilities);
}
#endif
diff --git a/man/man8/ip-vrf.8 b/man/man8/ip-vrf.8
index 1a42cebe..c1c9b958 100644
--- a/man/man8/ip-vrf.8
+++ b/man/man8/ip-vrf.8
@@ -70,6 +70,10 @@ This command also requires to be ran as root or with the CAP_SYS_ADMIN,
CAP_NET_ADMIN and CAP_DAC_OVERRIDE capabilities. If built with libcap and if
capabilities are added to the ip binary program via setcap, the program will
drop them as the first thing when invoked, unless the command is vrf exec.
+.br
+NOTE: capabilities will NOT be dropped if CAP_NET_ADMIN is set to INHERITABLE
+to avoid breaking programs with ambient capabilities that call ip.
+Do not set the INHERITABLE flag on the ip binary itself.
.TP
.B ip vrf identify [PID] - Report VRF association for process
--
2.14.2
* iproute2 - modifying routes in place
From: Ryan Whelan @ 2018-05-11 11:42 UTC (permalink / raw)
To: netdev
`ip route` has two subcommands that don't seem to work as expected, and I'm
not sure if it's a bug or if I'm misunderstanding the semantics.
I am unable to modify a route 'in place', which, from what I'm reading
online, I should be able to do with `ip route change` and/or `ip route
replace`.
After a route is created with either `ip route add` or `ip route replace`,
I am unable to use `change`, regardless of how I attempt to identify the
route I'm trying to alter.
# ip -6 route show
...
fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 1000 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s8 proto kernel metric 256 pref medium
fe80::/64 dev internal0 proto kernel metric 256 pref medium
If I try to change the metric of the route that already exists via `ip route
change`, I get "No such file or directory".
# ip -6 route change fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 100
RTNETLINK answers: No such file or directory
If I use `replace`, the command does not error, but it creates another route
instead of replacing the current one.
# ip -6 route replace fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 100
# ip -6 route show
...
fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 100 pref medium
fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 1000 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s8 proto kernel metric 256 pref medium
fe80::/64 dev internal0 proto kernel metric 256 pref medium
What am I missing or doing wrong? Forgive me if I'm being dense, but I
could not find an answer online that explains this behavior.
If this is the wrong forum for this question, I apologize; could you please
point me in the right direction?
Linux: 4.16.6
iproute2: 4.15.0
Thank you!
* Re: net: hang in unregister_netdevice: waiting for lo to become free
From: Dan Streetman @ 2018-05-11 11:40 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Tommi Rantala, Neil Horman, Xin Long, David Ahern,
Daniel Borkmann, Cong Wang, David Miller, Eric Dumazet,
Willem de Bruijn, Jakub Kicinski, Rasmus Villemoes, netdev, LKML,
Alexey Kuznetsov, Hideaki YOSHIFUJI, syzkaller, Dan Streetman,
Eric W. Biederman, Alexey Kodanev
In-Reply-To: <CACT4Y+bexDms2oLbQFdvLjsd6whj2w2ioRN+PWeg2ZHjPK=jaQ@mail.gmail.com>
On Fri, May 11, 2018 at 5:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, May 10, 2018 at 12:23 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>>>>>> <tommi.t.rantala@nokia.com> wrote:
>>>>>>>>> On 20.02.2018 18:26, Neil Horman wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala
>>>>>>>>>>> <tommi.t.rantala@nokia.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 19.02.2018 20:59, Dmitry Vyukov wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this meant to be fixed already? I am still seeing this on the
>>>>>>>>>>>>> latest upstream tree.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> These two commits are in v4.16-rc1:
>>>>>>>>>>>>
>>>>>>>>>>>> commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
>>>>>>>>>>>> Author: Tommi Rantala <tommi.t.rantala@nokia.com>
>>>>>>>>>>>> Date: Mon Feb 5 21:48:14 2018 +0200
>>>>>>>>>>>>
>>>>>>>>>>>> sctp: fix dst refcnt leak in sctp_v4_get_dst
>>>>>>>>>>>> ...
>>>>>>>>>>>> Fixes: 410f03831 ("sctp: add routing output fallback")
>>>>>>>>>>>> Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses")
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
>>>>>>>>>>>> Author: Alexey Kodanev <alexey.kodanev@oracle.com>
>>>>>>>>>>>> Date: Mon Feb 5 15:10:35 2018 +0300
>>>>>>>>>>>>
>>>>>>>>>>>> sctp: fix dst refcnt leak in sctp_v6_get_dst()
>>>>>>>>>>>> ...
>>>>>>>>>>>> Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6")
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I guess we missed something if it's still reproducible.
>>>>>>>>>>>>
>>>>>>>>>>>> I can check it later this week, unless someone else beat me to it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Tommi,
>>>>>>>>>>>
>>>>>>>>>>> Hmmm, I can't claim that it's exactly the same bug. Perhaps it's
>>>>>>>>>>> another one then. But I am still seeing these:
>>>>>>>>>>>
>>>>>>>>>>> [ 58.799130] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>>>>>>>>> [ 60.847138] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>>>>>>>>> [ 62.895093] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>>>>>>>>> [ 64.943103] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>>>>>>>>>
>>>>>>>>>>> on upstream tree pulled ~12 hours ago.
>>>>>>>>>>>
>>>>>>>>>> Can you write a systemtap script to probe dev_hold, and dev_put, printing
>>>>>>>>>> out a
>>>>>>>>>> backtrace if the device name matches "lo". That should tell us
>>>>>>>>>> definitively if
>>>>>>>>>> the problem is in the same location or not
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Dmitry, I tested with the reproducer and the kernel .config file that you
>>>>>>>>> sent in the first email in this thread:
>>>>>>>>>
>>>>>>>>> With 4.16-rc2 unable to reproduce.
>>>>>>>>>
>>>>>>>>> With 4.15-rc9 bug reproducible, and I get "unregister_netdevice: waiting for
>>>>>>>>> lo to become free. Usage count = 3"
>>>>>>>>>
>>>>>>>>> With 4.15-rc9 and Alexey's "sctp: fix dst refcnt leak in sctp_v6_get_dst()"
>>>>>>>>> cherry-picked on top, unable to reproduce.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is syzkaller doing something else now to trigger the bug...?
>>>>>>>>> Can you still trigger the bug with the same reproducer?
>>>>>>>>
>>>>>>>> Hi Neil, Tommi,
>>>>>>>>
>>>>>>>> Reviving this old thread about "unregister_netdevice: waiting for lo
>>>>>>>> to become free. Usage count = 3" hangs.
>>>>>>>> I still did not have time to deep dive into what happens there (too
>>>>>>>> many bugs coming from syzbot). But this still actively happens and I
>>>>>>>> suspect accounts to a significant portion of various hang reports,
>>>>>>>> which are quite unpleasant.
>>>>>>>>
>>>>>>>> One idea that could make it all simpler:
>>>>>>>>
>>>>>>>> Is this wait loop in netdev_wait_allrefs() supposed to wait for any
>>>>>>>> prolonged periods of time under any non-buggy conditions? E.g. more
>>>>>>>> than 1-2 minutes?
>>>>>>>> If it only supposed to wait briefly for things that already supposed
>>>>>>>> to be shutting down, and we add a WARNING there after some timeout,
>>>>>>>> then syzbot will report all info how/when it happens, hopefully
>>>>>>>> extracting reproducers, and all the nice things.
>>>>>>>> But this WARNING should not have any false positives under any
>>>>>>>> realistic conditions (e.g. waiting for arrival of remote packets with
>>>>>>>> large timeouts).
>>>>>>>>
>>>>>>>> Looking at some task hung reports, it seems that this code holds some
>>>>>>>> mutexes, takes workqueue thread and prevents any progress with
>>>>>>>> destruction of other devices (and net namespace creation/destruction),
>>>>>>>> so I guess it should not wait for any indefinite periods of time?
>>>>>>>
>>>>>>> I'm working on this currently:
>>>>>>> https://bugs.launchpad.net/ubuntu/zesty/+source/linux/+bug/1711407
>>>>>>>
>>>>>>> I added a summary of what I've found to be the cause (or at least, one
>>>>>>> possible cause) of this:
>>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/comments/72
>>>>>>>
>>>>>>> I'm working on a patch to work around the main side-effect of this,
>>>>>>> which is hanging while holding the global net mutex. Hangs will still
>>>>>>> happen (e.g. if a dst leaks) but should not affect anything else,
>>>>>>> other than a leak of the dst and its net namespace.
>>>>>>>
>>>>>>> Fixing the dst leaks is important too, of course, but a dst leak (or
>>>>>>> other cause) shouldn't break the entire system.
>>>>>>
>>>>>> Leaking some memory is definitely better than hanging the system.
>>>>>>
>>>>>> So I've made syzkaller to recognize "unregister_netdevice: waiting for
>>>>>> (.*) to become free" as a kernel bug:
>>>>>> https://github.com/google/syzkaller/commit/7a67784ca8bdc3b26cce2f0ec9a40d2dd9ec9396
>>>>>> Unfortunately it does not make it catch these bugs because creating a
>>>>>> net namespace per test is too damn slow, so namespaces are reused for
>>>>>> lots of tests and when/if it's eventually destroyed it's already too
>>>>>> late to find root cause.
>>>>>>
>>>>>> But I've run a one-off experiment with prompt net namespace
>>>>>> destruction and syzkaller was able to easily extract a C reproducer:
>>>>>> https://gist.githubusercontent.com/dvyukov/d571e8fff24e127ca48a8c4790d42bfa/raw/52050e93ba9afbb5126b9d7bb39b7e71a82af016/gistfile1.txt
>>>>>>
>>>>>> On upstream 16e205cf42da1f497b10a4a24f563e6c0d574eec with this config:
>>>>>> https://gist.githubusercontent.com/dvyukov/9663c57443adb21f2795b92ef0829d62/raw/bbea0652e23746096dd56855a28f6c681aebcdee/gistfile1.txt
>>>>>>
>>>>>> this gives me:
>>>>>>
>>>>>> [ 83.183198] unregister_netdevice: waiting for lo to become free. Usage count = 9
>>>>>> [ 85.231202] unregister_netdevice: waiting for lo to become free. Usage count = 9
>>>>>> ...
>>>>>> [ 523.511205] unregister_netdevice: waiting for lo to become free. Usage count = 9
>>>>>> ...
>>>>>>
>>>>>> This is generated from this syzkaller program:
>>>>>>
>>>>>> r0 = socket$inet6(0xa, 0x1, 0x84)
>>>>>> setsockopt$inet6_IPV6_XFRM_POLICY(r0, 0x29, 0x23,
>>>>>> &(0x7f0000000380)={{{@in6=@remote={0xfe, 0x80, [], 0xbb},
>>>>>> @in=@dev={0xac, 0x14, 0x14}, 0x0, 0x0, 0x0, 0x0, 0xa}, {}, {}, 0x0,
>>>>>> 0x0, 0x1}, {{@in=@local={0xac, 0x14, 0x14, 0xaa}, 0x0, 0x32}, 0x0,
>>>>>> @in=@local={0xac, 0x14, 0x14, 0xaa}, 0x3504}}, 0xe8)
>>>>>> bind$inet6(r0, &(0x7f0000000000)={0xa, 0x4e20}, 0x1c)
>>>>>> connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @dev={0xac, 0x14,
>>>>>> 0x14, 0xd}}, 0x10)
>>>>>> syz_emit_ethernet(0x3e, &(0x7f00000001c0)={@local={[0xaa, 0xaa, 0xaa,
>>>>>> 0xaa, 0xaa], 0xaa}, @dev={[0xaa, 0xaa, 0xaa, 0xaa, 0xaa]}, [],
>>>>>> {@ipv6={0x86dd, {0x0, 0x6, "50a09c", 0x8, 0xffffff11, 0x0,
>>>>>> @remote={0xfe, 0x80, [], 0xbb}, @local={0xfe, 0x80, [], 0xaa}, {[],
>>>>>> @udp={0x0, 0x4e20, 0x8}}}}}}, &(0x7f0000000040))
>>>>>>
>>>>>> So this seems to be related to IPv6 and/or xfrm and is potentially
>>>>>> caused by external packets (that syz_emit_ethernet call).
>>>>>
>>>>>
>>>>>
>>>>> Here is another repro which seems to be a different bug (note that it
>>>>> requires fault injection):
>>>>>
>>>>> https://gist.githubusercontent.com/dvyukov/1c56623016cc4c24a69d433c5114ad5b/raw/530478f571b195193101b912aa646948528baa8e/gistfile1.txt
>>>>>
>>>>> Dan, do you mind taking a look at them? Fixing these should eliminate
>>>>> root causes of these hangs/leaks.
>>>>
>>>> Yep I will look at them, thanks for the reproducers.
>>>
>>> Hi Dan,
>>>
>>> Any updates on this? syzbot is hitting this all the time.
>>
>> Sorry, the recent changes from net_mutex -> net_rwsem/pernet_ops_rwsem
>> have complicated what I had done to workaround this, but I'm still
>> working on it. Apologies for the delay.
>
> Are you looking at the mitigation? Or the bugs that trigger it? Or both?
Both. The workaround comes first, since the most important (and relatively
easiest) part is allowing the system to continue creating and destroying
net namespaces once this happens, instead of hanging all further netns
init/cleanup until a system reboot.
Then I want to add some debug code to make it easier to track down a
leaked dst (or other cause).
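Dmitry's suggestion in this thread is to keep the periodic "waiting for ... to become free" message but escalate to a one-time WARNING once the wait has clearly gone on too long. That can be modeled in userspace as below. Everything here is hypothetical: struct fake_dev, wait_allrefs(), and the tick-based clock are illustrative stand-ins, not the kernel's netdev_wait_allrefs(), and the thresholds are arbitrary.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device model; time is simulated in ticks so the loop
 * can be exercised without sleeping. */
struct fake_dev {
	const char *name;
	int refcnt;          /* current usage count */
	int release_at_tick; /* tick at which the refs are dropped; -1 = leaked */
};

/*
 * Wait for dev->refcnt to reach zero. Logs every log_every ticks and
 * emits a single warning once warn_after ticks have passed. Returns the
 * tick at which the device became free, or -1 if max_ticks elapsed.
 */
static int wait_allrefs(struct fake_dev *dev, int log_every, int warn_after,
			int max_ticks, bool *warned)
{
	*warned = false;
	for (int tick = 0; tick < max_ticks; tick++) {
		if (dev->release_at_tick >= 0 && tick >= dev->release_at_tick)
			dev->refcnt = 0;
		if (dev->refcnt == 0)
			return tick;

		if (tick > 0 && tick % log_every == 0)
			printf("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
			       dev->name, dev->refcnt);

		if (!*warned && tick >= warn_after) {
			/* Stand-in for a kernel WARN(): report once, keep waiting. */
			fprintf(stderr, "WARNING: %s still has %d references after %d ticks\n",
				dev->name, dev->refcnt, tick);
			*warned = true;
		}
	}
	return -1;
}
```

The design point is the one raised in the thread: the warning only helps a fuzzer if legitimate waits never reach warn_after, so the timeout has to sit well above any realistic shutdown delay.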
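In the same thread, Dmitry mentions teaching syzkaller to treat the pattern "unregister_netdevice: waiting for (.*) to become free" as a kernel bug. syzkaller itself is written in Go. The C sketch below only illustrates matching that same pattern against a console line with POSIX regcomp()/regexec() and extracting the device name. The function name match_netdev_hang() is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Pattern quoted in the thread; (.*) captures the device name. */
#define NETDEV_HANG_RE "unregister_netdevice: waiting for (.*) to become free"

/* Returns true on a match and copies the device name into name. */
static bool match_netdev_hang(const char *line, char *name, size_t namesz)
{
	regex_t re;
	regmatch_t m[2];
	bool ok = false;

	if (namesz == 0 || regcomp(&re, NETDEV_HANG_RE, REG_EXTENDED) != 0)
		return false;

	if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so >= 0) {
		size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

		if (len >= namesz)
			len = namesz - 1; /* truncate rather than overflow */
		memcpy(name, line + m[1].rm_so, len);
		name[len] = '\0';
		ok = true;
	}
	regfree(&re);
	return ok;
}
```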
* [PATCH bpf] tools: bpf: fix NULL return handling in bpf__prepare_load
From: YueHaibing @ 2018-05-11 11:21 UTC (permalink / raw)
To: alexander.shishkin, mingo, peterz; +Cc: netdev, namhyung, YueHaibing
bpf_object__open() and bpf_object__open_buffer() can return either an
error pointer or NULL, so check the return values with IS_ERR_OR_NULL()
in bpf__prepare_load() and bpf__prepare_load_buffer().
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
tools/perf/util/bpf-loader.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
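IS_ERR() alone is not enough here because the error-pointer convention encodes a negative errno in the top MAX_ERRNO addresses. NULL is not in that range, so IS_ERR(NULL) is false and a NULL return slips through the old check. The userspace re-creation below mirrors the semantics of the kernel's include/linux/err.h helpers. check_open_result() is a hypothetical caller that applies the patched check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The top MAX_ERRNO addresses are reserved for encoded errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return ptr == NULL || IS_ERR(ptr);
}

/* Hypothetical caller: treat both failure encodings the same way. */
static long check_open_result(const void *obj)
{
	if (IS_ERR_OR_NULL(obj))
		return -EINVAL;
	return 0;
}
```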
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index af7ad81..cee6587 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -66,7 +66,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
}
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
- if (IS_ERR(obj)) {
+ if (IS_ERR_OR_NULL(obj)) {
pr_debug("bpf: failed to load buffer\n");
return ERR_PTR(-EINVAL);
}
@@ -102,14 +102,14 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
pr_debug("bpf: successfull builtin compilation\n");
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
- if (!IS_ERR(obj) && llvm_param.dump_obj)
+ if (!IS_ERR_OR_NULL(obj) && llvm_param.dump_obj)
llvm__dump_obj(filename, obj_buf, obj_buf_sz);
free(obj_buf);
} else
obj = bpf_object__open(filename);
- if (IS_ERR(obj)) {
+ if (IS_ERR_OR_NULL(obj)) {
pr_debug("bpf: failed to load %s\n", filename);
return obj;
}
--
2.7.0
* [PATCH 16/32] net: convert datagram_poll users to ->poll_mask
From: Christoph Hellwig @ 2018-05-11 11:07 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
linux-kernel
In-Reply-To: <20180511110803.10910-1-hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/isdn/mISDN/socket.c | 2 +-
drivers/net/ppp/pppoe.c | 2 +-
drivers/staging/ipx/af_ipx.c | 2 +-
include/linux/skbuff.h | 3 +--
include/net/udp.h | 2 +-
net/appletalk/ddp.c | 2 +-
net/ax25/af_ax25.c | 2 +-
net/bluetooth/hci_sock.c | 2 +-
net/can/bcm.c | 2 +-
net/can/raw.c | 2 +-
net/core/datagram.c | 13 ++++---------
net/decnet/af_decnet.c | 6 +++---
net/ieee802154/socket.c | 4 ++--
net/ipv4/af_inet.c | 6 +++---
net/ipv4/udp.c | 10 +++++-----
net/ipv6/af_inet6.c | 2 +-
net/ipv6/raw.c | 4 ++--
net/kcm/kcmsock.c | 10 +++++-----
net/key/af_key.c | 2 +-
net/l2tp/l2tp_ip.c | 2 +-
net/l2tp/l2tp_ip6.c | 2 +-
net/l2tp/l2tp_ppp.c | 2 +-
net/llc/af_llc.c | 2 +-
net/netlink/af_netlink.c | 2 +-
net/netrom/af_netrom.c | 2 +-
net/nfc/rawsock.c | 4 ++--
net/packet/af_packet.c | 9 ++++-----
net/phonet/socket.c | 2 +-
net/qrtr/qrtr.c | 2 +-
net/rose/af_rose.c | 2 +-
net/x25/af_x25.c | 2 +-
31 files changed, 52 insertions(+), 59 deletions(-)
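This patch swaps each `.poll = datagram_poll` for `.poll_mask = datagram_poll_mask`. The new signature drops both the file and poll_table arguments, so the sock_poll_wait() call disappears from net/core/datagram.c. The userspace model below sketches that split: protocol code only computes a readiness mask, and one generic caller handles wait-queue registration. All names (struct model_sock, generic_poll(), the EV_* bits) are hypothetical. Where the registration actually moves is an assumption, since this patch alone does not show the caller.

```c
#include <stdbool.h>

/* Hypothetical readiness bits and socket state. */
#define EV_IN  0x1
#define EV_OUT 0x4
#define EV_ERR 0x8

struct model_sock {
	bool has_error;
	int  rx_queued;  /* packets waiting in the receive queue */
	bool writeable;
};

/* New-style hook: a pure function of socket state, no wait-queue argument. */
typedef unsigned (*poll_mask_fn)(struct model_sock *sk, unsigned events);

static unsigned model_datagram_poll_mask(struct model_sock *sk, unsigned events)
{
	unsigned mask = 0;

	(void)events; /* this model reports every ready bit */
	if (sk->has_error)
		mask |= EV_ERR;
	if (sk->rx_queued > 0)
		mask |= EV_IN;
	if (sk->writeable)
		mask |= EV_OUT;
	return mask;
}

/* Generic caller: wait-queue registration happens here, once, instead
 * of inside every protocol's poll hook. */
static unsigned generic_poll(struct model_sock *sk, poll_mask_fn fn,
			     unsigned events, int *waiters)
{
	(*waiters)++; /* stand-in for sock_poll_wait() */
	return fn(sk, events);
}
```

One visible consequence in the real diff: because the file argument is gone, udp_poll_mask() has to read the O_NONBLOCK flag through sock->file instead.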
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 18c0a1281914..98f90aadd141 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -588,7 +588,7 @@ static const struct proto_ops data_sock_ops = {
.getname = data_sock_getname,
.sendmsg = mISDN_sock_sendmsg,
.recvmsg = mISDN_sock_recvmsg,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
.setsockopt = data_sock_setsockopt,
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 7df07337d69c..40d0c80fa6ef 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1122,7 +1122,7 @@ static const struct proto_ops pppoe_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = pppoe_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c
index 5703dd176787..208b5c161631 100644
--- a/drivers/staging/ipx/af_ipx.c
+++ b/drivers/staging/ipx/af_ipx.c
@@ -1965,7 +1965,7 @@ static const struct proto_ops ipx_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = ipx_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = ipx_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ipx_compat_ioctl,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9065477ed255..89198379b39d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3250,8 +3250,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
int *peeked, int *off, int *err);
struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
int *err);
-__poll_t datagram_poll(struct file *file, struct socket *sock,
- struct poll_table_struct *wait);
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events);
int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
struct iov_iter *to, int size);
static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
diff --git a/include/net/udp.h b/include/net/udp.h
index 0676b272f6ac..61389a29334b 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -276,7 +276,7 @@ int udp_init_sock(struct sock *sk);
int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
int __udp_disconnect(struct sock *sk, int flags);
int udp_disconnect(struct sock *sk, int flags);
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events);
struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
netdev_features_t features,
bool is_ipv6);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 9b6bc5abe946..55fdba05d7d9 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1869,7 +1869,7 @@ static const struct proto_ops atalk_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = atalk_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = atalk_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = atalk_compat_ioctl,
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 2b41366fcad2..b7cd97325c66 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1954,7 +1954,7 @@ static const struct proto_ops ax25_proto_ops = {
.socketpair = sock_no_socketpair,
.accept = ax25_accept,
.getname = ax25_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = ax25_ioctl,
.listen = ax25_listen,
.shutdown = ax25_shutdown,
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 1506e1632394..d6c099861538 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1975,7 +1975,7 @@ static const struct proto_ops hci_sock_ops = {
.sendmsg = hci_sock_sendmsg,
.recvmsg = hci_sock_recvmsg,
.ioctl = hci_sock_ioctl,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
.setsockopt = hci_sock_setsockopt,
diff --git a/net/can/bcm.c b/net/can/bcm.c
index ac5e5e34fee3..30c51e0ce294 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1669,7 +1669,7 @@ static const struct proto_ops bcm_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = can_ioctl, /* use can_ioctl() from af_can.c */
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/can/raw.c b/net/can/raw.c
index 1051eee82581..fd7e2f49ea6a 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -843,7 +843,7 @@ static const struct proto_ops raw_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = raw_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = can_ioctl, /* use can_ioctl() from af_can.c */
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9938952c5c78..f19bf3dc2bd6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -819,9 +819,8 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
/**
* datagram_poll - generic datagram poll
- * @file: file struct
* @sock: socket
- * @wait: poll table
+ * @events to wait for
*
* Datagram poll: Again totally generic. This also handles
* sequenced packet sockets providing the socket receive queue
@@ -831,14 +830,10 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
* and you use a different write policy from sock_writeable()
* then please supply your own write_space callback.
*/
-__poll_t datagram_poll(struct file *file, struct socket *sock,
- poll_table *wait)
+__poll_t datagram_poll_mask(struct socket *sock, __poll_t events)
{
struct sock *sk = sock->sk;
- __poll_t mask;
-
- sock_poll_wait(file, sk_sleep(sk), wait);
- mask = 0;
+ __poll_t mask = 0;
/* exceptional events? */
if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -871,4 +866,4 @@ __poll_t datagram_poll(struct file *file, struct socket *sock,
return mask;
}
-EXPORT_SYMBOL(datagram_poll);
+EXPORT_SYMBOL(datagram_poll_mask);
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 32751602767f..2af6470d73ce 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1207,11 +1207,11 @@ static int dn_getname(struct socket *sock, struct sockaddr *uaddr,int peer)
}
-static __poll_t dn_poll(struct file *file, struct socket *sock, poll_table *wait)
+static __poll_t dn_poll_mask(struct socket *sock, __poll_t events)
{
struct sock *sk = sock->sk;
struct dn_scp *scp = DN_SK(sk);
- __poll_t mask = datagram_poll(file, sock, wait);
+ __poll_t mask = datagram_poll_mask(sock, events);
if (!skb_queue_empty(&scp->other_receive_queue))
mask |= EPOLLRDBAND;
@@ -2344,7 +2344,7 @@ static const struct proto_ops dn_proto_ops = {
.socketpair = sock_no_socketpair,
.accept = dn_accept,
.getname = dn_getname,
- .poll = dn_poll,
+ .poll_mask = dn_poll_mask,
.ioctl = dn_ioctl,
.listen = dn_listen,
.shutdown = dn_shutdown,
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index a60658c85a9a..a0768d2759b8 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -423,7 +423,7 @@ static const struct proto_ops ieee802154_raw_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = ieee802154_sock_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
@@ -969,7 +969,7 @@ static const struct proto_ops ieee802154_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = ieee802154_sock_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 220b51347526..030a0fcffdbf 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1019,7 +1019,7 @@ const struct proto_ops inet_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = inet_getname,
- .poll = udp_poll,
+ .poll_mask = udp_poll_mask,
.ioctl = inet_ioctl,
.listen = sock_no_listen,
.shutdown = inet_shutdown,
@@ -1040,7 +1040,7 @@ EXPORT_SYMBOL(inet_dgram_ops);
/*
* For SOCK_RAW sockets; should be the same as inet_dgram_ops but without
- * udp_poll
+ * udp_poll_mask
*/
static const struct proto_ops inet_sockraw_ops = {
.family = PF_INET,
@@ -1051,7 +1051,7 @@ static const struct proto_ops inet_sockraw_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = inet_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = inet_ioctl,
.listen = sock_no_listen,
.shutdown = inet_shutdown,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 24b5c59b1c53..34a2cd7290dc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2498,7 +2498,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
* udp_poll - wait for a UDP event.
* @file - file struct
* @sock - socket
- * @wait - poll table
+ * @events - events to wait for
*
* This is same as datagram poll, except for the special case of
* blocking sockets. If application is using a blocking fd
@@ -2507,23 +2507,23 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
* but then block when reading it. Add special case code
* to work around these arguably broken applications.
*/
-__poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t udp_poll_mask(struct socket *sock, __poll_t events)
{
- __poll_t mask = datagram_poll(file, sock, wait);
+ __poll_t mask = datagram_poll_mask(sock, events);
struct sock *sk = sock->sk;
if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
mask |= EPOLLIN | EPOLLRDNORM;
/* Check for false positives due to checksum errors */
- if ((mask & EPOLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
+ if ((mask & EPOLLRDNORM) && !(sock->file->f_flags & O_NONBLOCK) &&
!(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
mask &= ~(EPOLLIN | EPOLLRDNORM);
return mask;
}
-EXPORT_SYMBOL(udp_poll);
+EXPORT_SYMBOL(udp_poll_mask);
int udp_abort(struct sock *sk, int err)
{
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a43d967eeca5..a64c4f070aae 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -602,7 +602,7 @@ const struct proto_ops inet6_dgram_ops = {
.socketpair = sock_no_socketpair, /* a do nothing */
.accept = sock_no_accept, /* a do nothing */
.getname = inet6_getname,
- .poll = udp_poll, /* ok */
+ .poll_mask = udp_poll_mask, /* ok */
.ioctl = inet6_ioctl, /* must change */
.listen = sock_no_listen, /* ok */
.shutdown = inet_shutdown, /* ok */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 5eb9b08947ed..4a73ea1ddd51 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1345,7 +1345,7 @@ void raw6_proc_exit(void)
}
#endif /* CONFIG_PROC_FS */
-/* Same as inet6_dgram_ops, sans udp_poll. */
+/* Same as inet6_dgram_ops, sans udp_poll_mask. */
const struct proto_ops inet6_sockraw_ops = {
.family = PF_INET6,
.owner = THIS_MODULE,
@@ -1355,7 +1355,7 @@ const struct proto_ops inet6_sockraw_ops = {
.socketpair = sock_no_socketpair, /* a do nothing */
.accept = sock_no_accept, /* a do nothing */
.getname = inet6_getname,
- .poll = datagram_poll, /* ok */
+ .poll_mask = datagram_poll_mask, /* ok */
.ioctl = inet6_ioctl, /* must change */
.listen = sock_no_listen, /* ok */
.shutdown = inet_shutdown, /* ok */
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index dc76bc346829..d67734c99027 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1336,9 +1336,9 @@ static void init_kcm_sock(struct kcm_sock *kcm, struct kcm_mux *mux)
struct list_head *head;
int index = 0;
- /* For SOCK_SEQPACKET sock type, datagram_poll checks the sk_state, so
- * we set sk_state, otherwise epoll_wait always returns right away with
- * EPOLLHUP
+ /* For SOCK_SEQPACKET sock type, datagram_poll_mask checks the sk_state,
+ * so we set sk_state, otherwise epoll_wait always returns right away
+ * with EPOLLHUP
*/
kcm->sk.sk_state = TCP_ESTABLISHED;
@@ -1903,7 +1903,7 @@ static const struct proto_ops kcm_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = kcm_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
@@ -1924,7 +1924,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = kcm_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 7e2e7188e7f4..7654607e728b 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3726,7 +3726,7 @@ static const struct proto_ops pfkey_ops = {
/* Now the operations that really occur. */
.release = pfkey_release,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.sendmsg = pfkey_sendmsg,
.recvmsg = pfkey_recvmsg,
};
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index a9c05b2bc1b0..181073bf6925 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -613,7 +613,7 @@ static const struct proto_ops l2tp_ip_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = l2tp_ip_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = inet_ioctl,
.listen = sock_no_listen,
.shutdown = inet_shutdown,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 957369192ca1..336e4c00abbc 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -754,7 +754,7 @@ static const struct proto_ops l2tp_ip6_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = l2tp_ip6_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = inet6_ioctl,
.listen = sock_no_listen,
.shutdown = inet_shutdown,
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 1fd9e145076a..ef1f46aa6414 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1806,7 +1806,7 @@ static const struct proto_ops pppol2tp_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = pppol2tp_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
.setsockopt = pppol2tp_setsockopt,
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index cb80ebb38311..c75ec214415d 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -1189,7 +1189,7 @@ static const struct proto_ops llc_ui_ops = {
.socketpair = sock_no_socketpair,
.accept = llc_ui_accept,
.getname = llc_ui_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = llc_ui_ioctl,
.listen = llc_ui_listen,
.shutdown = llc_ui_shutdown,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 55342c4d5cec..22b30278903b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2673,7 +2673,7 @@ static const struct proto_ops netlink_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = netlink_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = netlink_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 4221d98a314b..96b45549f4ac 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1367,7 +1367,7 @@ static const struct proto_ops nr_proto_ops = {
.socketpair = sock_no_socketpair,
.accept = nr_accept,
.getname = nr_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = nr_ioctl,
.listen = nr_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index e2188deb08dc..60c322531c49 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -284,7 +284,7 @@ static const struct proto_ops rawsock_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
@@ -304,7 +304,7 @@ static const struct proto_ops rawsock_raw_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 01f3515cada0..f1d6a351a111 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4108,12 +4108,11 @@ static int packet_ioctl(struct socket *sock, unsigned int cmd,
return 0;
}
-static __poll_t packet_poll(struct file *file, struct socket *sock,
- poll_table *wait)
+static __poll_t packet_poll_mask(struct socket *sock, __poll_t events)
{
struct sock *sk = sock->sk;
struct packet_sock *po = pkt_sk(sk);
- __poll_t mask = datagram_poll(file, sock, wait);
+ __poll_t mask = datagram_poll_mask(sock, events);
spin_lock_bh(&sk->sk_receive_queue.lock);
if (po->rx_ring.pg_vec) {
@@ -4455,7 +4454,7 @@ static const struct proto_ops packet_ops_spkt = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = packet_getname_spkt,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = packet_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
@@ -4476,7 +4475,7 @@ static const struct proto_ops packet_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = packet_getname,
- .poll = packet_poll,
+ .poll_mask = packet_poll_mask,
.ioctl = packet_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index f9b40e6a18a5..9ecf02def928 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -448,7 +448,7 @@ const struct proto_ops phonet_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = pn_socket_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = pn_socket_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 2aa07b547b16..1b5025ea5b04 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -1023,7 +1023,7 @@ static const struct proto_ops qrtr_proto_ops = {
.recvmsg = qrtr_recvmsg,
.getname = qrtr_getname,
.ioctl = qrtr_ioctl,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.shutdown = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
.getsockopt = sock_no_getsockopt,
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 9ff5e0a76593..ecd8bc5f5f7e 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1482,7 +1482,7 @@ static const struct proto_ops rose_proto_ops = {
.socketpair = sock_no_socketpair,
.accept = rose_accept,
.getname = rose_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = rose_ioctl,
.listen = rose_listen,
.shutdown = sock_no_shutdown,
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index d49aa79b7997..f93365ae0fdd 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1750,7 +1750,7 @@ static const struct proto_ops x25_proto_ops = {
.socketpair = sock_no_socketpair,
.accept = x25_accept,
.getname = x25_getname,
- .poll = datagram_poll,
+ .poll_mask = datagram_poll_mask,
.ioctl = x25_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = compat_x25_ioctl,
--
2.17.0