* Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
From: Ben Pfaff @ 2016-12-06 0:59 UTC (permalink / raw)
To: Michał Mirosław
Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <20161205225247.e3dd6dcw3ryjjlp2-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>
On Mon, Dec 05, 2016 at 11:52:47PM +0100, Michał Mirosław wrote:
> On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> > On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > > intact through linux networking stack.
> > > > This appears to change the established Open vSwitch userspace API. You
> > > > can see that simply from the way that it changes the documentation for
> > > > the userspace API. If I'm right about that, then this change will break
> > > > all userspace programs that use the Open vSwitch kernel module,
> > > > including Open vSwitch itself.
> > >
> > > If I understood the code correctly, it does change expected meaning for
> > > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > > be impossible to differentiate this case from the VLAN TCI == 0.
> > >
> > > I guess this is a problem with OVS API, because it doesn't directly show
> > > the "missing" state of elements, but relies on an "invalid" value.
> >
> > That particular corner case should not be a huge problem in any case.
> >
> > The real problem is that this appears to break the common case use of
> > VLANs in Open vSwitch. After this patch, parse_vlan() in
> > net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> > accelerated version of them or the version in the skb data) into
> > sw_flow_key members. OK, that's fine on it's own. However, I don't see
> > any corresponding change to the code in flow_netlink.c to compensate for
> > the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> > was always required to be set to 1 in flow matches inside Netlink
> > messages sent from userspace, and the kernel always set it to 1 in
> > corresponding messages sent to userspace.
> >
> > In other words, if I'm reading this change correctly:
> >
> > * With a kernel before this change, userspace always had to set
> > VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
> > reject the flow match.
> >
> > * With a kernel after this change, userspace must not set
> > VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
> > match but nothing will ever match because packets do not actually
> > have the CFI bit set.
> >
> > Take a look at this code that the patch deletes from
> > validate_vlan_from_nlattrs(), for example, and see how it insisted that
> > VLAN_TAG_PRESENT was set:
> >
> > if (!(tci & htons(VLAN_TAG_PRESENT))) {
> > if (tci) {
> > OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> > (inner) ? "C-VLAN" : "VLAN");
> > return -EINVAL;
> > } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> > /* Corner case for truncated VLAN header. */
> > OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> > (inner) ? "C-VLAN" : "VLAN");
> > return -EINVAL;
> > }
> > }
> >
> > Please let me know if I'm overlooking something.
>
> Hmm. So the easiest change without disrupting current userspace, would be
> to flip the CFI bit on the way to/from OVS userspace. Does this seem
> correct?
That sounds correct. (The bit should not be flipped in the mask.)
Thanks,
Ben.
_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
^ permalink raw reply
* Re: [patch net v3] net: fec: fix compile with CONFIG_M5272
From: David Miller @ 2016-12-06 0:56 UTC (permalink / raw)
To: nikita.yoush
Cc: fugang.duan, troy.kisky, andrew, eric, tremyfr, johannes, netdev,
cphealy, fabio.estevam, linux-kernel
In-Reply-To: <1480959661-29834-1-git-send-email-nikita.yoush@cogentembedded.com>
From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date: Mon, 5 Dec 2016 20:41:01 +0300
> Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
> introduced unconditional statistics-related actions.
I do not see this commit in any of my trees:
[davem@localhost net-next]$ git describe 4dfb80d18d05
fatal: Not a valid object name 4dfb80d18d05
[davem@localhost net-next]$ cd ../net
[davem@localhost net]$ git describe 4dfb80d18d05
fatal: Not a valid object name 4dfb80d18d05
[davem@localhost net]$
^ permalink raw reply
* Re: [PATCH net-next 4/7] liquidio CN23XX: VF scatter gather lists
From: David Miller @ 2016-12-06 0:48 UTC (permalink / raw)
To: rvatsavayi
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
felix.manlunas
In-Reply-To: <1480929318-8511-5-git-send-email-rvatsavayi@caviumnetworks.com>
From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Mon, 5 Dec 2016 01:15:15 -0800
> + kfree((void *)lio->glist);
> + kfree((void *)lio->glist_lock);
> +}
...
> + if (!lio->glist) {
> + kfree((void *)lio->glist_lock);
> + return 1;
> + }
These void casts are unnecessary, please remove them.
^ permalink raw reply
* Re: [v3 PATCH] netlink: Do not schedule work from sk_destruct
From: David Miller @ 2016-12-06 0:44 UTC (permalink / raw)
To: herbert
Cc: xiyou.wangcong, andreyknvl, johannes.berg, fw, edumazet, me, tom,
decot, netdev, linux-kernel
In-Reply-To: <20161205072820.GB10204@gondor.apana.org.au>
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 5 Dec 2016 15:28:21 +0800
> It is wrong to schedule a work from sk_destruct using the socket
> as the memory reserve because the socket will be freed immediately
> after the return from sk_destruct.
>
> Instead we should do the deferral prior to sk_free.
>
> This patch does just that.
>
> Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Applied, thanks Herbert.
^ permalink raw reply
* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: Andy Lutomirski @ 2016-12-06 0:36 UTC (permalink / raw)
To: John Stultz
Cc: Alexei Starovoitov, Andy Lutomirski, Mickaël Salaün,
Daniel Mack, David S. Miller, kafai-b10kYP2dOMg, Florian Westphal,
Harald Hoyer, Network Development, Sargun Dhillon,
Pablo Neira Ayuso, lkml, Tejun Heo, Li Zefan, Jonathan Corbet,
open list:CONTROL GROUP (CGROUP), Android Kernel Team,
Rom Lemarchand, Colin Cross, Dmitry Shmidt
In-Reply-To: <CALAqxLW-mq4Lnudtn5KMBPdzBTg5bhTXV9QmUC2vfabVru+fUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Mon, Dec 5, 2016 at 4:28 PM, John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>>> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>>
>>>>> I hate to say it, but I think I may see a problem. Current
>>>>> developments are afoot to make cgroups do more than resource control.
>>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>>> filter thing. Current cgroup controllers can mostly just DoS their
>>>>> controlled processes. These new controllers (or controller-like
>>>>> things) can exfiltrate data and change semantics.
>>>>>
>>>>> Does anyone have a security model in mind for these controllers and
>>>>> the cgroups that they're attached to? I'm reasonably confident that
>>>>> CAP_SYS_RESOURCE is not the answer...
>>>>
>>>> and specifically the answer is... ?
>>>> Also would be great if you start with specifying the question first
>>>> and the problem you're trying to solve.
>>>>
>>>
>>> I don't have a good answer right now. Here are some constraints, though:
>>>
>>> 1. An insufficiently privileged process should not be able to move a
>>> victim into a dangerous cgroup.
>>>
>>> 2. An insufficiently privileged process should not be able to move
>>> itself into a dangerous cgroup and then use execve to gain privilege
>>> such that the execve'd program can be compromised.
>>>
>>> 3. An insufficiently privileged process should not be able to make an
>>> existing cgroup dangerous in a way that could compromise a victim in
>>> that cgroup.
>>>
>>> 4. An insufficiently privileged process should not be able to make a
>>> cgroup dangerous in a way that bypasses protections that would
>>> otherwise protect execve() as used by itself or some other process in
>>> that cgroup.
>>>
>>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>>> addition to the cgroup being controlled.
>>
>> Sorry for taking awhile to get back to you here. I'm a little
>> befuddled as to what next steps I should consider (and honestly, I'm
>> not totally sure I really grok your concern here, particularly what
>> you mean with "dangrous cgroups").
>>
>> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
>> separate "sufficiently" from "insufficiently privileged") better?
>>
>> Or something closer to the original method Android used of each cgroup
>> having an allow_attach() check which could determine what is
>> sufficiently privledged for the respective level of danger the cgroup
>> might poise?
>>
>> Or just stepping back, what method would you imagine to be reasonable
>> to allow a specified task to migrate other tasks between cgroups
>> without it having to be root/suid?
>
> Any suggested feedback here?
I really don't know. The cgroupfs interface is a bit unfortunate in
that it doesn't really express the constraints. To safely migrate a
task, ISTM you ought to have some form of privilege over the task
*and* some form of privilege over the cgroup. cgroupfs only handles
the latter.
CAP_CGROUP_MIGRATE ought to be okay. Or maybe cgroupfs needs to gain
a concept of "dangerous" cgroups and further restrict them and
CAP_SYS_RESOURCE should be fine for non-dangerous cgroups? I think I
favor the latter, but it might be nice to hear from Tejun first.
--Andy
^ permalink raw reply
* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: John Stultz @ 2016-12-06 0:28 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Alexei Starovoitov, Andy Lutomirski, Mickaël Salaün,
Daniel Mack, David S. Miller, kafai, Florian Westphal,
Harald Hoyer, Network Development, Sargun Dhillon,
Pablo Neira Ayuso, lkml, Tejun Heo, Li Zefan, Jonathan Corbet,
open list:CONTROL GROUP (CGROUP), Android Kernel Team,
Rom Lemarchand, Colin Cross, Dmitry Shmidt
In-Reply-To: <CALAqxLXk+A9=A0oXGEduphXOQdbMg-wuGxmLkEk9cPHyn44X5Q@mail.gmail.com>
On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>
>>>> I hate to say it, but I think I may see a problem. Current
>>>> developments are afoot to make cgroups do more than resource control.
>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>> filter thing. Current cgroup controllers can mostly just DoS their
>>>> controlled processes. These new controllers (or controller-like
>>>> things) can exfiltrate data and change semantics.
>>>>
>>>> Does anyone have a security model in mind for these controllers and
>>>> the cgroups that they're attached to? I'm reasonably confident that
>>>> CAP_SYS_RESOURCE is not the answer...
>>>
>>> and specifically the answer is... ?
>>> Also would be great if you start with specifying the question first
>>> and the problem you're trying to solve.
>>>
>>
>> I don't have a good answer right now. Here are some constraints, though:
>>
>> 1. An insufficiently privileged process should not be able to move a
>> victim into a dangerous cgroup.
>>
>> 2. An insufficiently privileged process should not be able to move
>> itself into a dangerous cgroup and then use execve to gain privilege
>> such that the execve'd program can be compromised.
>>
>> 3. An insufficiently privileged process should not be able to make an
>> existing cgroup dangerous in a way that could compromise a victim in
>> that cgroup.
>>
>> 4. An insufficiently privileged process should not be able to make a
>> cgroup dangerous in a way that bypasses protections that would
>> otherwise protect execve() as used by itself or some other process in
>> that cgroup.
>>
>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>> addition to the cgroup being controlled.
>
> Sorry for taking awhile to get back to you here. I'm a little
> befuddled as to what next steps I should consider (and honestly, I'm
> not totally sure I really grok your concern here, particularly what
> you mean with "dangrous cgroups").
>
> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> separate "sufficiently" from "insufficiently privileged") better?
>
> Or something closer to the original method Android used of each cgroup
> having an allow_attach() check which could determine what is
> sufficiently privledged for the respective level of danger the cgroup
> might poise?
>
> Or just stepping back, what method would you imagine to be reasonable
> to allow a specified task to migrate other tasks between cgroups
> without it having to be root/suid?
Any suggested feedback here?
thanks
-john
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Florian Westphal @ 2016-12-06 0:20 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Willem de Bruijn, Pablo Neira Ayuso, netfilter-devel,
Network Development, Daniel Borkmann, Florian Westphal,
Eric Dumazet
In-Reply-To: <CAF=yD-J-vcQQ_-48xG8qhMuYvLvucCr7SG99rsfJA5-okvhHzA@mail.gmail.com>
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> While we're discussing the patch, another question, about revisions: I
> tested both modified and original iptables binaries on both standard
> and modified kernels. It all works as expected, except for the case
> where both binaries are used on a single kernel. For instance:
>
> iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port
> 8000'`" -j LOG
> ./iptables.new -L
>
> Here the new binary will interpret the object as xt_bpf_match_v1, but
> iptables has inserted xt_bpf_match. The same problem happens the other
> way around. A new binary can be made robust to detect old structs, but
> not the other way around. Specific to bpf, the existing xt_bpf code
> has an unfortunate bug that it always prints at least one line of
> code, even if ->bpf_program_num_elems == 0.
>
> I notice that other extensions also do not necessarily only extend
> struct vN in vN+1. Is the above a known issue?
Yes, I guess noone ever bothered to fix this.
The kernel blob should contain the match/target revision number,
so userspace can in fact see that 'this is bpf v42', but iirc
the netfilter userspace just loads the highest userspace revision
supported by the kernel (which is then different for the 2 iptables
binaries).
But we *could* display message like 'kernel uses revision 2 but I can
only find 0 and 1' or fall back to the lower supported revision without
guess-the-struct-by-size games.
^ permalink raw reply
* Oops with CONFIG_VMAP_STCK and bond device + virtio-net
From: Laura Abbott @ 2016-12-05 23:53 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang
Cc: zbyszek, virtualization, netdev, Linux Kernel Mailing List
Hi,
Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1401612
In qemu with two virtio-net interfaces:
$ ip l
...
5: ens14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:e9:64:41 brd ff:ff:ff:ff:ff:ff
6: ens15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:e9:64:42 brd ff:ff:ff:ff:ff:ff
$ sudo ip link add bond1 type bond
$ sudo ip link set ens14 master bond1
Segmentation fault
------------[ cut here ]------------
kernel BUG at ./include/linux/scatterlist.h:140!
invalid opcode: 0000 [#1] SMP
Modules linked in: bonding ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip
ata_generic crc32c_intel qxl drm_kms_helper virtio_pci serio_raw ttm drm pata_acpi
CPU: 5 PID: 1983 Comm: ip Not tainted 4.9.0-0.rc6.git2.1.fc26.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
task: ffff9d50a3583240 task.stack: ffffb06e41040000
RIP: 0010:[<ffffffffbc4896fc>] [<ffffffffbc4896fc>] sg_init_one+0x8c/0xa0
RSP: 0018:ffffb06e41043698 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffb06e41043774 RCX: 0000000000000028
RDX: 0000131ec1043774 RSI: 0000000000000013 RDI: ffffb06ec1043774
RBP: ffffb06e410436b0 R08: 00000000001ddbe0 R09: ffffb06e410436c8
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000006
R13: ffffb06e410436c8 R14: ffff9d50b2dc1800 R15: ffff9d50b3db9600
FS: 00007f15347e5700(0000) GS:ffff9d50bb000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc09bc4000 CR3: 0000000135797000 CR4: 00000000000406e0
Stack:
ffff9d50b229d000 0000000000000000 ffffb06e41043772 ffffb06e41043720
ffffffffc0051123 ffff9d50a3583240 0000000087654321 0000000000000002
0000000000000000 0000000000000000 0000000000000000 000000007b8f5301
Call Trace:
[<ffffffffc0051123>] virtnet_set_mac_address+0xb3/0x140 [virtio_net]
[<ffffffffbc7ae305>] dev_set_mac_address+0x55/0xc0
[<ffffffffc03f319e>] bond_enslave+0x34e/0x1180 [bonding]
[<ffffffffbc7ca22f>] do_setlink+0x6cf/0xd10
[<ffffffffbc20dd6a>] ? get_page_from_freelist+0x6ba/0xca0
[<ffffffffbc037de9>] ? sched_clock+0x9/0x10
[<ffffffffbc068475>] ? kvm_sched_clock_read+0x25/0x40
[<ffffffffbc111ed6>] ? __lock_acquire+0x346/0x1290
[<ffffffffbc4aa436>] ? nla_parse+0xa6/0x120
[<ffffffffbc7ce9e8>] rtnl_newlink+0x5c8/0x870
[<ffffffffbc3ecb32>] ? avc_has_perm_noaudit+0x32/0x210
[<ffffffffbc0bbfca>] ? ns_capable_common+0x7a/0x90
[<ffffffffbc0bbff3>] ? ns_capable+0x13/0x20
[<ffffffffbc7ced76>] rtnetlink_rcv_msg+0xe6/0x210
[<ffffffffbc7c951b>] ? rtnetlink_rcv+0x1b/0x40
[<ffffffffbc7c951b>] ? rtnetlink_rcv+0x1b/0x40
[<ffffffffbc7cec90>] ? rtnl_newlink+0x870/0x870
[<ffffffffbc7f7394>] netlink_rcv_skb+0xa4/0xc0
[<ffffffffbc7c952a>] rtnetlink_rcv+0x2a/0x40
[<ffffffffbc7f6d07>] netlink_unicast+0x1f7/0x2f0
[<ffffffffbc7f6c7f>] ? netlink_unicast+0x16f/0x2f0
[<ffffffffbc7f7102>] netlink_sendmsg+0x302/0x3c0
[<ffffffffbc790c28>] sock_sendmsg+0x38/0x50
[<ffffffffbc791773>] ___sys_sendmsg+0x2e3/0x2f0
[<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
[<ffffffffbc068475>] ? kvm_sched_clock_read+0x25/0x40
[<ffffffffbc037de9>] ? sched_clock+0x9/0x10
[<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
[<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
[<ffffffffbc111775>] ? trace_hardirqs_on_caller+0xf5/0x1b0
[<ffffffffbc7924b4>] __sys_sendmsg+0x54/0x90
[<ffffffffbc792502>] SyS_sendmsg+0x12/0x20
[<ffffffffbc003eec>] do_syscall_64+0x6c/0x1f0
[<ffffffffbc917589>] entry_SYSCALL64_slow_path+0x25/0x25
Code: ca 75 2c 49 8b 55 08 f6 c2 01 75 25 83 e2 03 81 e3 ff 0f 00 00 45 89 65 14 48
RIP [<ffffffffbc4896fc>] sg_init_one+0x8c/0xa0
RSP <ffffb06e41043698>
---[ end trace 9076d2284efbf735 ]---
This looks like an issue with CONFIG_VMAP_STACK since bond_enslave uses
struct sockaddr from the stack and virtnet_set_mac_address calls
sg_init_one which triggers BUG_ON(!virt_addr_valid(buf));
I know there have been a lot of CONFIG_VMAP_STACK fixes around but I
didn't find this one reported yet.
Thanks,
Laura
^ permalink raw reply
* Re: wl1251 & mac address & calibration data
From: Tony Lindgren @ 2016-12-05 23:51 UTC (permalink / raw)
To: Pali Rohár
Cc: Aaro Koskinen, Sebastian Reichel, Pavel Machek, Michal Kazior,
Kalle Valo, Ivaylo Dimitrov, linux-wireless, Network Development,
linux-kernel
In-Reply-To: <201611261820.47995@pali>
* Pali Rohár <pali.rohar@gmail.com> [161126 09:21]:
> On Thursday 24 November 2016 19:46:01 Aaro Koskinen wrote:
> > Hi,
> >
> > On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > > Proprietary, signed and closed bootloader NOLO does not support DT.
> > > So for booting you need to append DTS file to kernel image.
> > >
> > > U-Boot is optional and can be used as intermediate bootloader
> > > between NOLO and kernel. But still it has problems with reading
> > > from nand, so cannot read NVS data nor MAC address.
> >
> > You could use kexec to pass the fixed DT.
> >
> > A.
>
> IIRC it was broken for N900/omap3, no idea if somebody fixed it.
FYI, at least in next-20161205 kexec works on omap3 for me.
Regards,
Tony
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:51 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Pablo Neira Ayuso, netfilter-devel, Network Development,
Daniel Borkmann, Florian Westphal, Eric Dumazet
In-Reply-To: <CA+FuTSfHryyqBhRyfQCChG2VPwpj1KYBk2Z7DZ-dcYL-FbbSeA@mail.gmail.com>
On Mon, Dec 5, 2016 at 6:29 PM, Willem de Bruijn <willemb@google.com> wrote:
> On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
>> [...]
>>> Eric also suggests a private variable to avoid being subject to
>>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>>> length than current PATH_MAX.
>>
>> Good.
>>
>>> FWIW, there is a workaround for users with deeply nested paths: the
>>> path passed does not have to be absolute. It is literally what is
>>> passed on the command line to iptables right now, including relative
>>> addresses.
>>
>> If iptables userspace always expects to have the bpf file repository
>> in some given location (suggesting to have a directory that we specify
>> at ./configure time, similar to what we do with connlabel.conf), then
>> I think we can rely on relative paths. Would this be flexible enough
>> for your usecase?
>
> As long as it accepts relative paths, I think it will always work.
> Worst case, a user has to cd. No need for hardcoding the bpf mount
> point at compile time.
>
> I have the matching iptables patch for pinned objects, btw. Not for
> elf objects, which requires linking to libelf and parsing the object,
> which is more work (and perhaps best punted on by expanding libbpf in
> bcc to include this functionality. it already exists under samples/bpf
> and iproute2).
While we're discussing the patch, another question, about revisions: I
tested both modified and original iptables binaries on both standard
and modified kernels. It all works as expected, except for the case
where both binaries are used on a single kernel. For instance:
iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port
8000'`" -j LOG
./iptables.new -L
Here the new binary will interpret the object as xt_bpf_match_v1, but
iptables has inserted xt_bpf_match. The same problem happens the other
way around. A new binary can be made robust to detect old structs, but
not the other way around. Specific to bpf, the existing xt_bpf code
has an unfortunate bug that it always prints at least one line of
code, even if ->bpf_program_num_elems == 0.
I notice that other extensions also do not necessarily only extend
struct vN in vN+1. Is the above a known issue?
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:29 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Willem de Bruijn, netfilter-devel, Network Development,
Daniel Borkmann, Florian Westphal, Eric Dumazet
In-Reply-To: <20161205232214.GA15825@salvia>
On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
> [...]
>> Eric also suggests a private variable to avoid being subject to
>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>> length than current PATH_MAX.
>
> Good.
>
>> FWIW, there is a workaround for users with deeply nested paths: the
>> path passed does not have to be absolute. It is literally what is
>> passed on the command line to iptables right now, including relative
>> addresses.
>
> If iptables userspace always expects to have the bpf file repository
> in some given location (suggesting to have a directory that we specify
> at ./configure time, similar to what we do with connlabel.conf), then
> I think we can rely on relative paths. Would this be flexible enough
> for your usecase?
As long as it accepts relative paths, I think it will always work.
Worst case, a user has to cd. No need for hardcoding the bpf mount
point at compile time.
I have the matching iptables patch for pinned objects, btw. Not for
elf objects, which requires linking to libelf and parsing the object,
which is more work (and perhaps best punted on by expanding libbpf in
bcc to include this functionality. it already exists under samples/bpf
and iproute2).
^ permalink raw reply
* [PATCH next] Revert "dctcp: update cwnd on congestion event"
From: Florian Westphal @ 2016-12-05 23:23 UTC (permalink / raw)
To: netdev; +Cc: Florian Westphal, Neal Cardwell
Neal Cardwell says:
If I am reading the code correctly, then I would have two concerns:
1) Has that been tested? That seems like an extremely dramatic
decrease in cwnd. For example, if the cwnd is 80, and there are 40
ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
calculations seem to suggest that after just 11 ACKs the cwnd would be
down to a minimal value of 2 [..]
2) That seems to contradict another passage in the draft [..] where it
sazs:
Just as specified in [RFC3168], DCTCP does not react to congestion
indications more than once for every window of data.
Neal is right. Fortunately we don't have to complicate this by testing
vs. current rtt estimate, we can just revert the patch.
Normal stack already handles this for us: receiving ACKs with ECE
set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
adjusted and prr will take care of cwnd adjustment.
Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv4/tcp_dctcp.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index bde22ebb92a8..5f5e5936760e 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -188,8 +188,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
static void dctcp_update_alpha(struct sock *sk, u32 flags)
{
+ const struct tcp_sock *tp = tcp_sk(sk);
struct dctcp *ca = inet_csk_ca(sk);
- struct tcp_sock *tp = tcp_sk(sk);
u32 acked_bytes = tp->snd_una - ca->prior_snd_una;
/* If ack did not advance snd_una, count dupack as MSS size.
@@ -229,13 +229,6 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
WRITE_ONCE(ca->dctcp_alpha, alpha);
dctcp_reset(tp, ca);
}
-
- if (flags & CA_ACK_ECE) {
- unsigned int cwnd = dctcp_ssthresh(sk);
-
- if (cwnd != tp->snd_cwnd)
- tp->snd_cwnd = cwnd;
- }
}
static void dctcp_state(struct sock *sk, u8 new_state)
--
2.7.3
^ permalink raw reply related
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 23:22 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netfilter-devel, Network Development, Daniel Borkmann,
Willem de Bruijn, Florian Westphal, Eric Dumazet
In-Reply-To: <CAF=yD-Jk4W+zU=tdixWe3LPGcFp-sANxXLLoQTRvrkN_3=a9uw@mail.gmail.com>
On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
[...]
> Eric also suggests a private variable to avoid being subject to
> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
> length than current PATH_MAX.
Good.
> FWIW, there is a workaround for users with deeply nested paths: the
> path passed does not have to be absolute. It is literally what is
> passed on the command line to iptables right now, including relative
> addresses.
If iptables userspace always expects to have the bpf file repository
in some given location (suggesting to have a directory that we specify
at ./configure time, similar to what we do with connlabel.conf), then
I think we can rely on relative paths. Would this be flexible enough
for your usecase?
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 23:11 UTC (permalink / raw)
To: Pavel Machek
Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
linux-kernel
In-Reply-To: <98fb3577-5f31-bf17-3e02-96c150854108@gmx.de>
>
> You mean stmmac_xmit()? Thats also softirq AFAICT, its the TX softirq....
>
> Regards,
> Lino
>
>
Hmm. netdevices.txt says:
ndo_start_xmit:
...
Context: Process with BHs disabled or BH (timer),
will be called with interrupts disabled by netconsole.
...
If this is correct it can indeed be process context, too. However BHs are already
disabled.
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:06 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, Network Development, Daniel Borkmann,
Willem de Bruijn, Florian Westphal, Eric Dumazet
In-Reply-To: <20161205230055.GA15379@salvia>
On Mon, Dec 5, 2016 at 6:00 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
>> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
>> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
>> > > > From: Willem de Bruijn <willemb@google.com>
>> > > >
>> > > > Add support for attaching an eBPF object by file descriptor.
>> > > >
>> > > > The iptables binary can be called with a path to an elf object or a
>> > > > pinned bpf object. Also pass the mode and path to the kernel to be
>> > > > able to return it later for iptables dump and save.
>> > > >
>> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
>> > > > ---
>> > >
>> > > Assuming there is no simple way to get variable matchsize in iptables,
>> > > this looks good to me, thanks.
>> >
>> > It should be possible by setting kernel .matchsize to ~0 which
>> > suppresses strict size enforcement.
>> >
>> > Its currently only used by ebt_among, but this should work for any xtables
>> > module.
>>
>> This is likely going to trigger a large rewrite of the core userspace
>> iptables codebase, and likely going to pull part of the mess we have
>> in ebtables into iptables. So I'd prefer not to follow this path.
>
> So this variable path is there to annotate what userspace claims that
> is the file that contains the bpf blob that was loaded, actually this
> is irrelevant to the kernel, so this is just there to dump it back
> when iptables-save it is called. Just a side note, one could set
> anything there from userspace, point somewhere else actually...
>
> Well anyway, going back to the path problem to keep it simple: Why
> don't just trim this down to something smaller, are you really
> expecting to reach PATH_MAX in your usecase?
Not often. Module-specific limitations that differ from global
definitions are just a pain when they bite. This module also has an
arbitrary low limit on the length of the cBPF program passed, for
instance.
Eric also suggests a private variable to avoid being subject to
changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
length than current PATH_MAX.
FWIW, there is a workaround for users with deeply nested paths: the
path passed does not have to be absolute. It is literally what is
passed on the command line to iptables right now, including relative
addresses.
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 23:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: Florian Westphal, Willem de Bruijn, netfilter-devel, netdev,
daniel, Willem de Bruijn
In-Reply-To: <1480978749.18162.561.camel@edumazet-glaptop3.roam.corp.google.com>
On Mon, Dec 05, 2016 at 02:59:09PM -0800, Eric Dumazet wrote:
> On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:
>
> > Fair enough, I have no objections to the patch.
>
> An additional question is about PATH_MAX :
>
> Is it guaranteed to stay at 4096 forever ?
>
> To be safe, maybe we should use a constant of our own.
Right, this reminds me we have to fix something else.
So constant of our own plus something smaller, if possible, would be
good to go. Thanks.
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:01 UTC (permalink / raw)
To: Daniel Borkmann
Cc: netfilter-devel, pablo, Network Development, Willem de Bruijn
In-Reply-To: <5845F060.2050700@iogearbox.net>
On Mon, Dec 5, 2016 at 5:55 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Hi Willem,
>
> On 12/05/2016 09:28 PM, Willem de Bruijn wrote:
>>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Add support for attaching an eBPF object by file descriptor.
>>
>> The iptables binary can be called with a path to an elf object or a
>> pinned bpf object. Also pass the mode and path to the kernel to be
>> able to return it later for iptables dump and save.
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>
>
> just out of pure curiosity, use case is for android guys wrt
> accounting, or anything specific that cls_bpf on tc ingress +
> egress cannot do already?
That is the immediate motivation, yes.
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 23:00 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netfilter-devel, netdev, daniel, Willem de Bruijn,
Florian Westphal, Eric Dumazet
In-Reply-To: <20161205223415.GA14689@salvia>
On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Add support for attaching an eBPF object by file descriptor.
> > > >
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > >
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > >
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> >
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> >
> > Its currently only used by ebt_among, but this should work for any xtables
> > module.
>
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.
So this variable path is there to annotate what userspace claims that
is the file that contains the bpf blob that was loaded, actually this
is irrelevant to the kernel, so this is just there to dump it back
when iptables-save it is called. Just a side note, one could set
anything there from userspace, point somewhere else actually...
Well anyway, going back to the path problem to keep it simple: Why
don't just trim this down to something smaller, are you really
expecting to reach PATH_MAX in your usecase?
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Eric Dumazet @ 2016-12-05 22:59 UTC (permalink / raw)
To: Florian Westphal
Cc: Pablo Neira Ayuso, Willem de Bruijn, netfilter-devel, netdev,
daniel, Willem de Bruijn
In-Reply-To: <20161205224051.GB16819@breakpoint.cc>
On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:
> Fair enough, I have no objections to the patch.
An additional question is about PATH_MAX :
Is it guaranteed to stay at 4096 forever ?
To be safe, maybe we should use a constant of our own.
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Daniel Borkmann @ 2016-12-05 22:55 UTC (permalink / raw)
To: Willem de Bruijn, netfilter-devel; +Cc: pablo, netdev, Willem de Bruijn
In-Reply-To: <1480969684-74414-1-git-send-email-willemdebruijn.kernel@gmail.com>
Hi Willem,
On 12/05/2016 09:28 PM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Add support for attaching an eBPF object by file descriptor.
>
> The iptables binary can be called with a path to an elf object or a
> pinned bpf object. Also pass the mode and path to the kernel to be
> able to return it later for iptables dump and save.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
just out of pure curiosity, use case is for android guys wrt
accounting, or anything specific that cls_bpf on tc ingress +
egress cannot do already?
Thanks,
Daniel
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 22:54 UTC (permalink / raw)
To: Pavel Machek
Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
linux-kernel
In-Reply-To: <20161205224057.GB19135@amd>
On 05.12.2016 23:40, Pavel Machek wrote:
> On Mon 2016-12-05 23:37:09, Lino Sanfilippo wrote:
>> Hi Pavel,
>>
>> On 05.12.2016 23:02, Pavel Machek wrote:
>> >
>> > we need spin_lock_bh at minimum, as we are locking user context
>> > against timer.
>> >
>> > Best regards,
>> > Pavel
>> >
>>
>> I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
>> (one time in the timer handler and one time in napi poll handler) so a spin_lock() should
>> be sufficient. I cant see how this is called from userspace. If it were, a spin_lock_bh() had
>> to be used, of course.
>
> stmmac_tx_clean() shares lock with stmmac_tx() -- and that's process
> context as far as I can tell. So... spin_lock_bh() at
> minimum... right?
>
> Pavel
>
You mean stmmac_xmit()? Thats also softirq AFAICT, its the TX softirq....
Regards,
Lino
^ permalink raw reply
* Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
From: Michał Mirosław @ 2016-12-05 22:52 UTC (permalink / raw)
To: Ben Pfaff
Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <20161205185545.GB3129-LZ6Gd1LRuIk@public.gmane.org>
On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > intact through linux networking stack.
> > > This appears to change the established Open vSwitch userspace API. You
> > > can see that simply from the way that it changes the documentation for
> > > the userspace API. If I'm right about that, then this change will break
> > > all userspace programs that use the Open vSwitch kernel module,
> > > including Open vSwitch itself.
> >
> > If I understood the code correctly, it does change expected meaning for
> > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > be impossible to differentiate this case from the VLAN TCI == 0.
> >
> > I guess this is a problem with OVS API, because it doesn't directly show
> > the "missing" state of elements, but relies on an "invalid" value.
>
> That particular corner case should not be a huge problem in any case.
>
> The real problem is that this appears to break the common case use of
> VLANs in Open vSwitch. After this patch, parse_vlan() in
> net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> accelerated version of them or the version in the skb data) into
> sw_flow_key members. OK, that's fine on it's own. However, I don't see
> any corresponding change to the code in flow_netlink.c to compensate for
> the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> was always required to be set to 1 in flow matches inside Netlink
> messages sent from userspace, and the kernel always set it to 1 in
> corresponding messages sent to userspace.
>
> In other words, if I'm reading this change correctly:
>
> * With a kernel before this change, userspace always had to set
> VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
> reject the flow match.
>
> * With a kernel after this change, userspace must not set
> VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
> match but nothing will ever match because packets do not actually
> have the CFI bit set.
>
> Take a look at this code that the patch deletes from
> validate_vlan_from_nlattrs(), for example, and see how it insisted that
> VLAN_TAG_PRESENT was set:
>
> if (!(tci & htons(VLAN_TAG_PRESENT))) {
> if (tci) {
> OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> (inner) ? "C-VLAN" : "VLAN");
> return -EINVAL;
> } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> /* Corner case for truncated VLAN header. */
> OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> (inner) ? "C-VLAN" : "VLAN");
> return -EINVAL;
> }
> }
>
> Please let me know if I'm overlooking something.
Hmm. So the easiest change without disrupting current userspace, would be
to flip the CFI bit on the way to/from OVS userspace. Does this seem
correct?
Best Regards,
Michał Mirosław
^ permalink raw reply
* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Florian Westphal @ 2016-12-05 22:40 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Florian Westphal, Eric Dumazet, Willem de Bruijn, netfilter-devel,
netdev, daniel, Willem de Bruijn
In-Reply-To: <20161205223415.GA14689@salvia>
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Add support for attaching an eBPF object by file descriptor.
> > > >
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > >
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > >
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> >
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> >
> > Its currently only used by ebt_among, but this should work for any xtables
> > module.
>
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.
Fair enough, I have no objections to the patch.
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-05 22:40 UTC (permalink / raw)
To: Lino Sanfilippo
Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
linux-kernel
In-Reply-To: <9a85da24-85cf-f94e-908f-a10eecac2369@gmx.de>
[-- Attachment #1: Type: text/plain, Size: 874 bytes --]
On Mon 2016-12-05 23:37:09, Lino Sanfilippo wrote:
> Hi Pavel,
>
> On 05.12.2016 23:02, Pavel Machek wrote:
> >
> > we need spin_lock_bh at minimum, as we are locking user context
> > against timer.
> >
> > Best regards,
> > Pavel
> >
>
> I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
> (one time in the timer handler and one time in napi poll handler) so a spin_lock() should
> be sufficient. I cant see how this is called from userspace. If it were, a spin_lock_bh() had
> to be used, of course.
stmmac_tx_clean() shares lock with stmmac_tx() -- and that's process
context as far as I can tell. So... spin_lock_bh() at
minimum... right?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 22:37 UTC (permalink / raw)
To: Pavel Machek
Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
linux-kernel
In-Reply-To: <20161205220221.GA19135@amd>
Hi Pavel,
On 05.12.2016 23:02, Pavel Machek wrote:
>
> we need spin_lock_bh at minimum, as we are locking user context
> against timer.
>
> Best regards,
> Pavel
>
I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
(one time in the timer handler and one time in napi poll handler) so a spin_lock() should
be sufficient. I cant see how this is called from userspace. If it were, a spin_lock_bh() had
to be used, of course.
Regards,
Lino
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox