* Re: [PATCH RFC 0/2] tun: lockless xmit
From: Eric Dumazet @ 2016-04-13 12:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <20160413132816-mutt-send-email-mst@redhat.com>
On Wed, 2016-04-13 at 14:08 +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 11:04:45AM +0200, Paolo Abeni wrote:
> > This patch series try to remove the need for any lock in the tun device
> > xmit path, significantly improving the forwarding performance when multiple
> > processes are accessing the tun device (i.e. in a nic->bridge->tun->vm scenario).
> >
> > The lockless xmit is obtained explicitly setting the NETIF_F_LLTX feature bit
> > and removing the default qdisc.
> >
> > Unlikely most virtual devices, the tun driver has featured a default qdisc
> > for a long period, but it already lost such feature in linux 4.3.
>
> Thanks - I think it's a good idea to reduce the
> lock contention there.
>
> But I think it's unfortunate that it requires
> bypassing the qdisc completely: this means
> that anyone trying to do traffic shaping will
> get back the contention.
>
> Can we solve the lock contention for qdisc?
> E.g. add a small lockless queue in front of it,
> whoever has the qdisc lock would be
> responsible for moving things from there to qdisc
> proper.
>
> Thoughts? Is there a chance this might work reasonably well?
Adding any new queue in front of qdisc is problematic :
- Adds a new buffer, with extra latencies.
- If you want to implement priorities properly for X COS, you need X
queues.
- Who is going to service this extra buffer and feed the qdisc ?
- If the innocent guy is RT thread, maybe the extra latency will hurt.
- Adding another set of atomic ops.
We have such a schem here at Google (called holdq), but it was a
nightmare to tune.
We never tried to upstream this beast, it is kind of ugly, and were
expecting something better. Problem is : If you use HTB on a bonding
device, you want still to properly use MQ on the slaves.
HTB queue. 20 netperf generating UDP packets
lpaa23:~# ./super_netperf 20 -H lpaa24 -t UDP_STREAM -l 3000 -- -m 100 &
[1] 181993
With the holdq feature turned on : about 1 Mpps
lpaa23:~# sar -n DEV 1 10|grep eth0|grep Average
Average: eth0 28.50 999071.60 3.07 138542.64 0.00
0.00 0.60
holdq turned off : about 620 Kpps
lpaa23:~# sar -n DEV 1 10|grep eth0|grep Average
Average: eth0 39.00 617765.40 4.73 85667.42 0.00
0.00 0.90
^ permalink raw reply
* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Eric Dumazet @ 2016-04-13 12:52 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, David S. Miller, Michael S. Tsirkin, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <425e8f69ffd8fe315ef9d3cc678519c7060fb2e0.1460393493.git.pabeni@redhat.com>
On Wed, 2016-04-13 at 11:04 +0200, Paolo Abeni wrote:
> The current tun_net_xmit() implementation don't need any external
> lock since it relay on rcu protection for the tun data structure
> and on socket queue lock for skb queuing.
>
> This patch set the NETIF_F_LLTX feature bit in the tun device, so
> that on xmit, in absence of qdisc, no serialization lock is acquired
> by the caller.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> drivers/net/tun.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index faf9297..42992dc 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1796,7 +1796,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
> NETIF_F_HW_VLAN_STAG_TX;
> - dev->features = dev->hw_features;
> + dev->features = dev->hw_features | NETIF_F_LLTX;
> dev->vlan_features = dev->features &
> ~(NETIF_F_HW_VLAN_CTAG_TX |
> NETIF_F_HW_VLAN_STAG_TX);
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH RFC 0/2] tun: lockless xmit
From: Michael S. Tsirkin @ 2016-04-13 12:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <1460551817.10638.7.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, Apr 13, 2016 at 05:50:17AM -0700, Eric Dumazet wrote:
> On Wed, 2016-04-13 at 14:08 +0300, Michael S. Tsirkin wrote:
> > On Wed, Apr 13, 2016 at 11:04:45AM +0200, Paolo Abeni wrote:
> > > This patch series try to remove the need for any lock in the tun device
> > > xmit path, significantly improving the forwarding performance when multiple
> > > processes are accessing the tun device (i.e. in a nic->bridge->tun->vm scenario).
> > >
> > > The lockless xmit is obtained explicitly setting the NETIF_F_LLTX feature bit
> > > and removing the default qdisc.
> > >
> > > Unlikely most virtual devices, the tun driver has featured a default qdisc
> > > for a long period, but it already lost such feature in linux 4.3.
> >
> > Thanks - I think it's a good idea to reduce the
> > lock contention there.
> >
> > But I think it's unfortunate that it requires
> > bypassing the qdisc completely: this means
> > that anyone trying to do traffic shaping will
> > get back the contention.
> >
> > Can we solve the lock contention for qdisc?
> > E.g. add a small lockless queue in front of it,
> > whoever has the qdisc lock would be
> > responsible for moving things from there to qdisc
> > proper.
> >
> > Thoughts? Is there a chance this might work reasonably well?
>
> Adding any new queue in front of qdisc is problematic :
> - Adds a new buffer, with extra latencies.
Only where lock contention would previously occur, right?
> - If you want to implement priorities properly for X COS, you need X
> queues.
This definitely needs thought.
> - Who is going to service this extra buffer and feed the qdisc ?
The way I see it - whoever has the lock, at unlock time.
> - If the innocent guy is RT thread, maybe the extra latency will hurt.
Again - more than a lock?
> - Adding another set of atomic ops.
That's likely true. Use some per-cpu trick instead?
> We have such a schem here at Google (called holdq), but it was a
> nightmare to tune.
>
> We never tried to upstream this beast, it is kind of ugly, and were
> expecting something better. Problem is : If you use HTB on a bonding
> device, you want still to properly use MQ on the slaves.
>
> HTB queue. 20 netperf generating UDP packets
> lpaa23:~# ./super_netperf 20 -H lpaa24 -t UDP_STREAM -l 3000 -- -m 100 &
> [1] 181993
>
>
> With the holdq feature turned on : about 1 Mpps
>
> lpaa23:~# sar -n DEV 1 10|grep eth0|grep Average
> Average: eth0 28.50 999071.60 3.07 138542.64 0.00
> 0.00 0.60
>
> holdq turned off : about 620 Kpps
>
> lpaa23:~# sar -n DEV 1 10|grep eth0|grep Average
> Average: eth0 39.00 617765.40 4.73 85667.42 0.00
> 0.00 0.90
>
^ permalink raw reply
* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Michael S. Tsirkin @ 2016-04-13 12:57 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Paolo Abeni, netdev, David S. Miller, Eric W. Biederman,
Greg Kurz, Jason Wang, Christian Borntraeger, Herbert Xu
In-Reply-To: <570E15D9.4000308@stressinduktion.org>
On Wed, Apr 13, 2016 at 11:48:09AM +0200, Hannes Frederic Sowa wrote:
> On 13.04.2016 11:41, Michael S. Tsirkin wrote:
> >On Wed, Apr 13, 2016 at 11:04:46AM +0200, Paolo Abeni wrote:
> >>The current tun_net_xmit() implementation don't need any external
> >>lock since it relay on rcu protection for the tun data structure
> >>and on socket queue lock for skb queuing.
> >>
> >>This patch set the NETIF_F_LLTX feature bit in the tun device, so
> >>that on xmit, in absence of qdisc, no serialization lock is acquired
> >>by the caller.
> >>
> >>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>---
> >> drivers/net/tun.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>index faf9297..42992dc 100644
> >>--- a/drivers/net/tun.c
> >>+++ b/drivers/net/tun.c
> >>@@ -1796,7 +1796,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >> dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> >> TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
> >> NETIF_F_HW_VLAN_STAG_TX;
> >>- dev->features = dev->hw_features;
> >>+ dev->features = dev->hw_features | NETIF_F_LLTX;
> >
> >the documentation says:
> > NETIF_F_LLTX_BIT, /* LockLess TX - deprecated. Please */
> > /* do not use LLTX in new drivers */
>
> In networking/netdev-features.txt
>
> * LLTX driver (deprecated for hardware drivers)
>
> NETIF_F_LLTX should be set in drivers that implement their own locking in
> transmit path or don't need locking at all (e.g. software tunnels).
> In ndo_start_xmit, it is recommended to use a try_lock and return
> NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly
> protect against other callbacks (the rules you need to find out).
>
> Don't use it for new drivers.
>
> I think this is documentation is correct and it is only deprecated for
> hardware drivers.
>
> Bye,
> Hannes
Fine, but what's the AF_PACKET duplication that Herbert Xu
reported with NETIF_F_LLTX? Does anyone remember?
--
MST
^ permalink raw reply
* Re: [PATCH RFC 0/2] tun: lockless xmit
From: Eric Dumazet @ 2016-04-13 13:09 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <20160413155146-mutt-send-email-mst@redhat.com>
On Wed, 2016-04-13 at 15:56 +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 05:50:17AM -0700, Eric Dumazet wrote:
> > On Wed, 2016-04-13 at 14:08 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Apr 13, 2016 at 11:04:45AM +0200, Paolo Abeni wrote:
> > > > This patch series try to remove the need for any lock in the tun device
> > > > xmit path, significantly improving the forwarding performance when multiple
> > > > processes are accessing the tun device (i.e. in a nic->bridge->tun->vm scenario).
> > > >
> > > > The lockless xmit is obtained explicitly setting the NETIF_F_LLTX feature bit
> > > > and removing the default qdisc.
> > > >
> > > > Unlikely most virtual devices, the tun driver has featured a default qdisc
> > > > for a long period, but it already lost such feature in linux 4.3.
> > >
> > > Thanks - I think it's a good idea to reduce the
> > > lock contention there.
> > >
> > > But I think it's unfortunate that it requires
> > > bypassing the qdisc completely: this means
> > > that anyone trying to do traffic shaping will
> > > get back the contention.
> > >
> > > Can we solve the lock contention for qdisc?
> > > E.g. add a small lockless queue in front of it,
> > > whoever has the qdisc lock would be
> > > responsible for moving things from there to qdisc
> > > proper.
> > >
> > > Thoughts? Is there a chance this might work reasonably well?
> >
> > Adding any new queue in front of qdisc is problematic :
> > - Adds a new buffer, with extra latencies.
>
> Only where lock contention would previously occur, right?
>
> > - If you want to implement priorities properly for X COS, you need X
> > queues.
>
> This definitely needs thought.
>
> > - Who is going to service this extra buffer and feed the qdisc ?
>
> The way I see it - whoever has the lock, at unlock time.
>
> > - If the innocent guy is RT thread, maybe the extra latency will hurt.
>
> Again - more than a lock?
Way more. HTB is slow as hell.
Remember the qdisc dequeue is already a big problem in itself.
Adding another layer can practically double the latencies.
>
> > - Adding another set of atomic ops.
>
> That's likely true. Use some per-cpu trick instead?
We tried that, and we got miserable production incidents...
You really need to convince John Fastabend to work full time on the real
thing, not on another queue in front of the existing qdisc.
^ permalink raw reply
* Re: [PATCH RFC 0/2] tun: lockless xmit
From: Michael S. Tsirkin @ 2016-04-13 13:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <1460552966.10638.12.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, Apr 13, 2016 at 06:09:26AM -0700, Eric Dumazet wrote:
> You really need to convince John Fastabend to work full time on the real
> thing
Meaning making all qdiscs themselves lockless? With complex policies
like codel I can see how that might be challenging ...
--
MST
^ permalink raw reply
* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Eric Dumazet @ 2016-04-13 13:27 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Hannes Frederic Sowa, Paolo Abeni, netdev, David S. Miller,
Eric W. Biederman, Greg Kurz, Jason Wang, Christian Borntraeger,
Herbert Xu
In-Reply-To: <20160413155654-mutt-send-email-mst@redhat.com>
I. On Wed, 2016-04-13 at 15:57 +0300, Michael S. Tsirkin wrote:
> Fine, but what's the AF_PACKET duplication that Herbert Xu
> reported with NETIF_F_LLTX? Does anyone remember?
Really a lot of virtual drivers use NETIF_F_LLTX these days.
Duplication is more likely to happen with a qdisc, when a packet is
requeued if a driver returns NETDEV_TX_BUSY
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
From: Peter Zijlstra @ 2016-04-13 13:28 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Ingo Molnar, Mike Galbraith, Jason Wang, davem, netdev,
linux-kernel, Ingo Molnar
In-Reply-To: <20160411182111-mutt-send-email-mst@redhat.com>
On Mon, Apr 11, 2016 at 07:31:57PM +0300, Michael S. Tsirkin wrote:
> +static bool expected_to_run_fair(struct cfs_rq *cfs_rq, s64 t)
> +{
> + struct sched_entity *left;
> + struct sched_entity *curr = cfs_rq->curr;
> +
> + if (!curr || !curr->on_rq)
> + return false;
> +
> + left = __pick_first_entity(cfs_rq);
> + if (!left)
> + return true;
> +
> + return (s64)(curr->vruntime + calc_delta_fair(t, curr) -
> + left->vruntime) < 0;
> +}
>
> The reason it seems easier is because that way we can reuse
> calc_delta_fair and don't have to do the reverse translation
> from vruntime to nsec.
>
> And I guess if we do this with interrupts disabled, and only poke
> at the current CPU's rq, we know first entity
> won't go away so we don't need locks?
Nope, not true. Current isn't actually in the tree, and any other task
is subject to being moved at any time.
Even if current was in the tree, there is no guarantee it is
->rb_leftmost; imagine a task being migrated in that has a smaller
vruntime.
So this really cannot work without locks :/
I've not thought about the actual problem you're trying to solve; but I
figured I'd let you know this before you continue down this path.
^ permalink raw reply
* Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
From: Stefan Hajnoczi @ 2016-04-13 13:38 UTC (permalink / raw)
To: Ian Campbell
Cc: marius vlad, kvm, Michael S. Tsirkin, ijc, virtualization,
Claudio Imbrenda, Matt Benjamin, netdev, Greg Kurz,
Christoffer Dall
In-Reply-To: <CAOc2ZU1Kaes-UVfFrKrE9QyLMWnAQ77+hkSAGJgD0gOuM8o_UQ@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 513 bytes --]
On Tue, Apr 12, 2016 at 05:37:54PM +0100, Ian Campbell wrote:
> Perhaps the guest end is turning shutdown(foo) directly into a vsock
> message without or-ing in the current state?
Yes, you are right:
lock_sock(sk);
sk->sk_shutdown |= mode;
sk->sk_state_change(sk);
release_sock(sk);
if (sk->sk_type == SOCK_STREAM) {
sock_reset_flag(sk, SOCK_DONE);
vsock_send_shutdown(sk, mode);
Although sk_shutdown is ORed correctly, vsock_send_shutdown() is called with
just the shutdown() argument.
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: FWD: [PATCH v2] Marvell phy: add fiber status check for some components
From: Charles-Antoine Couret @ 2016-04-13 9:27 UTC (permalink / raw)
To: Florian Fainelli, Andrew Lunn; +Cc: netdev
In-Reply-To: <570BFF50.6060909@gmail.com>
Le 11/04/2016 21:47, Florian Fainelli a écrit :
> On 04/04/16 06:25, Andrew Lunn wrote:
>>> >From 564b767163d19355a3b5efaad195e93796570c71 Mon Sep 17 00:00:00 2001
>>> From: Charles-Antoine Couret <charles-antoine.couret@nexvision.fr>
>>> Date: Fri, 1 Apr 2016 16:16:35 +0200
>>> Subject: [PATCH] Marvell phy: add fiber status check for some components
>>>
>>> Marvell's phy could have two modes: fiber and copper. Currently, the driver
>>> checks only the copper mode registers to get the status link which could be
>>> wrong.
>>>
>>> This commit add a handler to check fiber then copper status link.
>>> If the fiber link is activated, the driver would use this information.
>>> Else, it would use the copper status.
>>
>> Hi Florian
>>
>> What do you think about this?
>>
>> This works for basic status information. But what about other ethtool
>> options? Setting the speed and duplex, turning pause on/off, etc.
>
> Agreed, it seems like a PHY configured for fiber will need to provide
> these informations differently, or does the standard BMSR register
> provide accurate information already?
The BSMR is similar between fiber and copper mode (but, it's the same values for duplex, speed...) in register 17.
To force speed, duplex, etc. it's the same way too (register 0).
The register's numbers are the same, only the page change.
>> Do we actually need to stay on page 1 if fibre is in use? How do we
>> initially change to page 1 when the fibre link is still down?
>
> I also do not feel very comfortable with reading the fiber status first,
> and then copper and then combine these two. At the very best, could we
> do something like:
>
> - identify if the PHY is configured for fiber in drv->probe or
> drv->config_init, retain that information
> - have two paths in drv->read_status which take care of reading one
> status or the other?
It should be a great idea.
Because, we could select the preferred link (copper or fiber). This option could select that during the init process.
I could save the state into marvell_priv structure, but how configure this choice ? Set copper by default and the user change that
by ethtool, driver loading option or the driver code ? I don't know what is the correct way to configure that properly.
> Are there other side effects for other register accesses (say,
> statistics, or auto-negotiation) that need to be fiber vs. copper aware?
Auto-negociation are separated. But the statistics are globally common, there are some specific registers about that too. But I think these registers are not relevant.
I will improve my patch.
Thanks.
Regards.
Charles-Antoine Couret
^ permalink raw reply
* Re: [PATCH RFC 0/2] tun: lockless xmit
From: Eric Dumazet @ 2016-04-13 13:43 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <20160413161558-mutt-send-email-mst@redhat.com>
On Wed, 2016-04-13 at 16:17 +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 06:09:26AM -0700, Eric Dumazet wrote:
> > You really need to convince John Fastabend to work full time on the real
> > thing
>
> Meaning making all qdiscs themselves lockless? With complex policies
> like codel I can see how that might be challenging ...
Codel is a fifo, plus some droping capabilities at dequeue time.
It totally can be made lockless.
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: exit busy loop when another process is runnable
From: Michael S. Tsirkin @ 2016-04-13 13:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Mike Galbraith, Jason Wang, davem, netdev,
linux-kernel, Ingo Molnar
In-Reply-To: <20160413132803.GN2906@worktop>
On Wed, Apr 13, 2016 at 03:28:03PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 11, 2016 at 07:31:57PM +0300, Michael S. Tsirkin wrote:
> > +static bool expected_to_run_fair(struct cfs_rq *cfs_rq, s64 t)
> > +{
> > + struct sched_entity *left;
> > + struct sched_entity *curr = cfs_rq->curr;
> > +
> > + if (!curr || !curr->on_rq)
> > + return false;
> > +
> > + left = __pick_first_entity(cfs_rq);
> > + if (!left)
> > + return true;
> > +
> > + return (s64)(curr->vruntime + calc_delta_fair(t, curr) -
> > + left->vruntime) < 0;
> > +}
> >
> > The reason it seems easier is because that way we can reuse
> > calc_delta_fair and don't have to do the reverse translation
> > from vruntime to nsec.
> >
> > And I guess if we do this with interrupts disabled, and only poke
> > at the current CPU's rq, we know first entity
> > won't go away so we don't need locks?
>
> Nope, not true. Current isn't actually in the tree, and any other task
> is subject to being moved at any time.
> Even if current was in the tree, there is no guarantee it is
> ->rb_leftmost; imagine a task being migrated in that has a smaller
> vruntime.
>
> So this really cannot work without locks :/
>
> I've not thought about the actual problem you're trying to solve; but I
> figured I'd let you know this before you continue down this path.
Hmm. This is in fact called in the context of a given task,
so maybe I should just change the API and use
&task->se
instead of
cfs_rq->current
:
static bool expected_to_run_fair(struct cfs_rq *cfs_rq, struct task_struct *task, s64 t)
{
struct sched_entity *left;
struct sched_entity *curr = &task->se;
if (!curr || !curr->on_rq)
return false;
left = __pick_first_entity(cfs_rq);
if (!left)
return true;
return (s64)(curr->vruntime + calc_delta_fair(t, curr) -
left->vruntime) < 0;
}
This way it is caller's respinsibility to make sure task
is not going away.
Left here is on tree so it's not going away, right?
--
MST
^ permalink raw reply
* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Michael S. Tsirkin @ 2016-04-13 13:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: Hannes Frederic Sowa, Paolo Abeni, netdev, David S. Miller,
Eric W. Biederman, Greg Kurz, Jason Wang, Christian Borntraeger,
Herbert Xu
In-Reply-To: <1460554028.10638.18.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, Apr 13, 2016 at 06:27:08AM -0700, Eric Dumazet wrote:
> I. On Wed, 2016-04-13 at 15:57 +0300, Michael S. Tsirkin wrote:
>
> > Fine, but what's the AF_PACKET duplication that Herbert Xu
> > reported with NETIF_F_LLTX? Does anyone remember?
>
> Really a lot of virtual drivers use NETIF_F_LLTX these days.
>
> Duplication is more likely to happen with a qdisc, when a packet is
> requeued if a driver returns NETDEV_TX_BUSY
OK, now I understand what the duplication is about.
What about NETDEV_TX_LOCKED? Looks like it might have
the same effect?
This might be worth documenting in include/linux/netdevice.h,
might it not?
--
MST
^ permalink raw reply
* Re: [RFC] Is it a bug for nfs on udp6 mode or kernel?
From: Benjamin Coddington @ 2016-04-13 14:11 UTC (permalink / raw)
To: Ding Tianhong
Cc: linux-nfs-approval, tom, Netdev, linux-kernel@vger.kernel.org,
dhowells, viro, chuck.lever
In-Reply-To: <570E2D69.7080102@huawei.com>
[-- Attachment #1: Type: text/plain, Size: 1909 bytes --]
On Wed, 13 Apr 2016, Ding Tianhong wrote:
> Hi everyone:
>
> I have met this problem when I try to test udp6 for nfs connection, my environment is:
>
> Server:
> kernel: 4.1.15
> IP:xxxx::36/64
> MTU:1500
> Setting: /etc/exports:/home/nfs *(rw,sync,no_subtree_check,no_root_squash)
>
> Client:
> kernel: 4.1.18
> IP:xxxx::90/64
> MTU:1500
> command: mount -t nfs -o vers=3,proto=udp6,rsize=4096,wsize=4096 [xxxx::36]:/home/nfs /home/tmp
>
> I check the nfs parameter configuration, it looks fine and could work well for proto=tcp6.
>
> Then I have mount correctly and try to run the command "ls", it hang.
>
> When I use the rsize=1024 and wsize=1024 to mount, the problem disappeared, so I guess it is the problem for GSO or GRO for UDP。
>
> Then I try to debug the problem, first I tcpdump the package from cline to server, and found that
> the client have send readdirplus message to server correctly, and then the Server send a 4k package
> to client(the big package will frag to 4 package by GSO), till now it looks fine, and the Client Nic could
> receive the 4 skb then send to upper stack to ipv6 and udp, I found the incoming 4 package has been merged
> to one and send to upper stack just like sunrpc, but I try to open the rpc_debug, it looks that the rpc could
> not receive message.
>
>
> I built a simple demo to test the udp stack, use the client socket to send big package to server socket, it work well,
> so I think the udp is fine, maybe the bug is in sunrpc.
>
> The test is very simple, does any body met the same problem like me, thanks for any suggestion.
>
> Ding
I had a similar problem. Do you have upstream
405c92f ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
682b1a9 ("ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
Turning off GSO and GRO caused sunrpc to work properly before this.
https://bugzilla.redhat.com/show_bug.cgi?id=1271759
Ben
^ permalink raw reply
* Re: [RESEND PATCH net-next] phy: keep the BCMR_LOOPBACK bit while setup forced mode
From: Sergei Shtylyov @ 2016-04-13 14:19 UTC (permalink / raw)
To: Weidong Wang, David Miller, f.fainelli; +Cc: netdev, linux-kernel
In-Reply-To: <570E3488.9060403@huawei.com>
Hello.
On 4/13/2016 2:59 PM, Weidong Wang wrote:
> When tested the PHY SGMII Loopback,:
> 1.set the LOOPBACK bit,
> 2.set the autoneg to AUTONEG_DISABLE, it calls the
> genphy_setup_forced which will clear the bit.
>
> So just keep the LOOPBACK bit while setup forced mode.
>
> Signed-off-by: Weidong Wang <wangweidong1@huawei.com>
> ---
> drivers/net/phy/phy_device.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index e551f3a..8da4b80 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1124,7 +1124,9 @@ static int genphy_config_advert(struct phy_device *phydev)
> int genphy_setup_forced(struct phy_device *phydev)
> {
> int ctl = 0;
> + int val = phy_read(phydev, MII_BMCR);
Please place this declaration first, DaveM prefers declarations to be
sorted from longest to shortest.
>
> + ctl |= val & BMCR_LOOPBACK;
Just =, removing the 'ctl' initializer, please.
[...]
MBR, Sergei
^ permalink raw reply
* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Sergei Shtylyov @ 2016-04-13 14:26 UTC (permalink / raw)
To: Paolo Abeni, netdev
Cc: David S. Miller, Michael S. Tsirkin, Hannes Frederic Sowa,
Eric W. Biederman, Greg Kurz, Jason Wang
In-Reply-To: <425e8f69ffd8fe315ef9d3cc678519c7060fb2e0.1460393493.git.pabeni@redhat.com>
Hello.
On 4/13/2016 12:04 PM, Paolo Abeni wrote:
> The current tun_net_xmit() implementation don't need any external
> lock since it relay on rcu protection for the tun data structure
s/relay/relies/?
> and on socket queue lock for skb queuing.
>
> This patch set the NETIF_F_LLTX feature bit in the tun device, so
> that on xmit, in absence of qdisc, no serialization lock is acquired
> by the caller.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[...]
MBR, Sergei
^ permalink raw reply
* Symantec users list
From: Judy Adkins @ 2016-04-13 14:19 UTC (permalink / raw)
To: netdev
Hi,
I just wanted to drop you a quick note to see if you would be interested in a discussion about "Symantec users" and the benefits it can bring your organization for your Marketing Initiatives like Email Marketing, Tele Marketing, Direct Mailings etc.
Every contact will include: Company Name, Web Address, Contact Name, Verified Email, Job Title, Complete Mailing Address, Phone Number, FAX Number, Total Employees, SIC Code, and Industry details.
We guarantee 100% on that list type that means every individual on that list will be as per your criteria for sure, any irrelevant contact will be replaced at no cost.
Few Technology Specific Lists:-
(1) Microsoft Dynamic CRM Users (15) Microsoft SQL Users
(2) Salesforce Cloud Users (16) Microsoft Windows Users
(3) ERP Users (17) Microsoft AX,GP,Al,NAV USers
(4) Pivotal CRM (18) Microsoft Forefront Users
(5) Sage SalesLogix CRM (19) Microsoft.Net Users
(6) SAP CRM (20) IBM Users
(7) Oracle CRM (21) Web Developers
(8) Security users
(9) Virtualization Users
(10) VMware Cloud Users
(11) Citrix
(12) Sage Users
(13) Microsoft SharePoint Users
(14) Dell Backup Users and more....
Let me know your target criteria / market like:
Target Title:
Target Industry:
Target Geography:
Appreciate your enquiry and valuable time.
Thanks and Regards,
Judy Adkins
Research Analyst
==============================================================================
If you do not wish to receive further emails kindly reply with "Leave Out"
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
^ permalink raw reply
* [PATCH] Fix NULL pointer dereference in cpsw with two slave PHYs
From: Andrew Goodbody @ 2016-04-13 14:36 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, Andrew Goodbody
This is a fix for a NULL pointer dereference from cpsw which is triggered
by having two slave PHYs attached to a cpsw network device. The problem is
due to only maintaining a single reference to a PHY node in the prive data
which gets overwritten by the second PHY probe. So move the PHY node
reference to the individual slave data so that there is now one per slave.
Andrew Goodbody (1):
Prevent NUll pointer dereference with two PHYs on cpsw
drivers/net/ethernet/ti/cpsw.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
--
2.5.0
^ permalink raw reply
* [PATCH] Prevent NUll pointer dereference with two PHYs on cpsw
From: Andrew Goodbody @ 2016-04-13 14:36 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, Andrew Goodbody
In-Reply-To: <1460558208-1374-1-git-send-email-andrew.goodbody@cambrionix.com>
Adding a 2nd PHY to cpsw results in a NULL pointer dereference
as below. Fix by maintaining a reference to each PHY node in slave
struct instead of a single reference in the priv struct which was
overwritten by the 2nd PHY.
[ 17.870933] Unable to handle kernel NULL pointer dereference at virtual address 00000180
[ 17.879557] pgd = dc8bc000
[ 17.882514] [00000180] *pgd=9c882831, *pte=00000000, *ppte=00000000
[ 17.889213] Internal error: Oops: 17 [#1] ARM
[ 17.893838] Modules linked in:
[ 17.897102] CPU: 0 PID: 1657 Comm: connmand Not tainted 4.5.0-ge463dfb-dirty #11
[ 17.904947] Hardware name: Cambrionix whippet
[ 17.909576] task: dc859240 ti: dc968000 task.ti: dc968000
[ 17.915339] PC is at phy_attached_print+0x18/0x8c
[ 17.920339] LR is at phy_attached_info+0x14/0x18
[ 17.925247] pc : [<c042baec>] lr : [<c042bb74>] psr: 600f0113
[ 17.925247] sp : dc969cf8 ip : dc969d28 fp : dc969d18
[ 17.937425] r10: dda7a400 r9 : 00000000 r8 : 00000000
[ 17.942971] r7 : 00000001 r6 : ddb00480 r5 : ddb8cb34 r4 : 00000000
[ 17.949898] r3 : c0954cc0 r2 : c09562b0 r1 : 00000000 r0 : 00000000
[ 17.956829] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 17.964401] Control: 10c5387d Table: 9c8bc019 DAC: 00000051
[ 17.970500] Process connmand (pid: 1657, stack limit = 0xdc968210)
[ 17.977059] Stack: (0xdc969cf8 to 0xdc96a000)
[ 17.981692] 9ce0: dc969d28 dc969d08
[ 17.990386] 9d00: c038f9bc c038f6b4 ddb00480 dc969d34 dc969d28 c042bb74 c042bae4 00000000
[ 17.999080] 9d20: c09562b0 c0954cc0 dc969d5c dc969d38 c043ebfc c042bb6c 00000007 00000003
[ 18.007773] 9d40: ddb00000 ddb8cb58 ddb00480 00000001 dc969dec dc969d60 c0441614 c043ea68
[ 18.016465] 9d60: 00000000 00000003 00000000 fffffff4 dc969df4 0000000d 00000000 00000000
[ 18.025159] 9d80: dc969db4 dc969d90 c005dc08 c05839e0 dc969df4 0000000d ddb00000 00001002
[ 18.033851] 9da0: 00000000 00000000 dc969dcc dc969db8 c005ddf4 c005dbc8 00000000 00000118
[ 18.042544] 9dc0: dc969dec dc969dd0 ddb00000 c06db27c ffff9003 00001002 00000000 00000000
[ 18.051237] 9de0: dc969e0c dc969df0 c057c88c c04410dc dc969e0c ddb00000 ddb00000 00000001
[ 18.059930] 9e00: dc969e34 dc969e10 c057cb44 c057c7d8 ddb00000 ddb00138 00001002 beaeda20
[ 18.068622] 9e20: 00000000 00000000 dc969e5c dc969e38 c057cc28 c057cac0 00000000 dc969e80
[ 18.077315] 9e40: dda7a40c beaeda20 00000000 00000000 dc969ecc dc969e60 c05e36d0 c057cc14
[ 18.086007] 9e60: dc969e84 00000051 beaeda20 00000000 dda7a40c 00000014 ddb00000 00008914
[ 18.094699] 9e80: 30687465 00000000 00000000 00000000 00009003 00000000 00000000 00000000
[ 18.103391] 9ea0: 00001002 00008914 dd257ae0 beaeda20 c098a428 beaeda20 00000011 00000000
[ 18.112084] 9ec0: dc969edc dc969ed0 c05e4e54 c05e3030 dc969efc dc969ee0 c055f5ac c05e4cc4
[ 18.120777] 9ee0: beaeda20 dd257ae0 dc8ab4c0 00008914 dc969f7c dc969f00 c010b388 c055f45c
[ 18.129471] 9f00: c071ca40 dd257ac0 c00165e8 dc968000 dc969f3c dc969f20 dc969f64 dc969f28
[ 18.138164] 9f20: c0115708 c0683ec8 dd257ac0 dd257ac0 dc969f74 dc969f40 c055f350 c00fc66c
[ 18.146857] 9f40: dd82e4d0 00000011 00000000 00080000 dd257ac0 00000000 dc8ab4c0 dc8ab4c0
[ 18.155550] 9f60: 00008914 beaeda20 00000011 00000000 dc969fa4 dc969f80 c010bc34 c010b2fc
[ 18.164242] 9f80: 00000000 00000011 00000002 00000036 c00165e8 dc968000 00000000 dc969fa8
[ 18.172935] 9fa0: c00163e0 c010bbcc 00000000 00000011 00000011 00008914 beaeda20 00009003
[ 18.181628] 9fc0: 00000000 00000011 00000002 00000036 00081018 00000001 00000000 beaedc10
[ 18.190320] 9fe0: 00083188 beaeda1c 00043a5d b6d29c0c 600b0010 00000011 00000000 00000000
[ 18.198989] Backtrace:
[ 18.201621] [<c042bad8>] (phy_attached_print) from [<c042bb74>] (phy_attached_info+0x14/0x18)
[ 18.210664] r3:c0954cc0 r2:c09562b0 r1:00000000
[ 18.215588] r4:ddb00480
[ 18.218322] [<c042bb60>] (phy_attached_info) from [<c043ebfc>] (cpsw_slave_open+0x1a0/0x280)
[ 18.227293] [<c043ea5c>] (cpsw_slave_open) from [<c0441614>] (cpsw_ndo_open+0x544/0x674)
[ 18.235874] r7:00000001 r6:ddb00480 r5:ddb8cb58 r4:ddb00000
[ 18.241944] [<c04410d0>] (cpsw_ndo_open) from [<c057c88c>] (__dev_open+0xc0/0x128)
[ 18.249972] r9:00000000 r8:00000000 r7:00001002 r6:ffff9003 r5:c06db27c r4:ddb00000
[ 18.258255] [<c057c7cc>] (__dev_open) from [<c057cb44>] (__dev_change_flags+0x90/0x154)
[ 18.266745] r5:00000001 r4:ddb00000
[ 18.270575] [<c057cab4>] (__dev_change_flags) from [<c057cc28>] (dev_change_flags+0x20/0x50)
[ 18.279523] r9:00000000 r8:00000000 r7:beaeda20 r6:00001002 r5:ddb00138 r4:ddb00000
[ 18.287811] [<c057cc08>] (dev_change_flags) from [<c05e36d0>] (devinet_ioctl+0x6ac/0x76c)
[ 18.296483] r9:00000000 r8:00000000 r7:beaeda20 r6:dda7a40c r5:dc969e80 r4:00000000
[ 18.304762] [<c05e3024>] (devinet_ioctl) from [<c05e4e54>] (inet_ioctl+0x19c/0x1c8)
[ 18.312882] r10:00000000 r9:00000011 r8:beaeda20 r7:c098a428 r6:beaeda20 r5:dd257ae0
[ 18.321235] r4:00008914
[ 18.323956] [<c05e4cb8>] (inet_ioctl) from [<c055f5ac>] (sock_ioctl+0x15c/0x2d8)
[ 18.331829] [<c055f450>] (sock_ioctl) from [<c010b388>] (do_vfs_ioctl+0x98/0x8d0)
[ 18.339765] r7:00008914 r6:dc8ab4c0 r5:dd257ae0 r4:beaeda20
[ 18.345822] [<c010b2f0>] (do_vfs_ioctl) from [<c010bc34>] (SyS_ioctl+0x74/0x84)
[ 18.353573] r10:00000000 r9:00000011 r8:beaeda20 r7:00008914 r6:dc8ab4c0 r5:dc8ab4c0
[ 18.361924] r4:00000000
[ 18.364653] [<c010bbc0>] (SyS_ioctl) from [<c00163e0>] (ret_fast_syscall+0x0/0x3c)
[ 18.372682] r9:dc968000 r8:c00165e8 r7:00000036 r6:00000002 r5:00000011 r4:00000000
[ 18.380960] Code: e92dd810 e24cb010 e24dd010 e59b4004 (e5902180)
[ 18.387580] ---[ end trace c80529466223f3f3 ]---
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
---
drivers/net/ethernet/ti/cpsw.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 42fdfd4..5f5bb44 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -349,6 +349,7 @@ struct cpsw_slave {
struct cpsw_slave_data *data;
struct phy_device *phy;
struct net_device *ndev;
+ struct device_node *phy_node;
u32 port_vlan;
u32 open_stat;
};
@@ -367,7 +368,6 @@ struct cpsw_priv {
spinlock_t lock;
struct platform_device *pdev;
struct net_device *ndev;
- struct device_node *phy_node;
struct napi_struct napi_rx;
struct napi_struct napi_tx;
struct device *dev;
@@ -1148,8 +1148,8 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
- if (priv->phy_node)
- slave->phy = of_phy_connect(priv->ndev, priv->phy_node,
+ if (slave->phy_node)
+ slave->phy = of_phy_connect(priv->ndev, slave->phy_node,
&cpsw_adjust_link, 0, slave->data->phy_if);
else
slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
@@ -2033,7 +2033,8 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
if (strcmp(slave_node->name, "slave"))
continue;
- priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
+ priv->slaves[i].phy_node =
+ of_parse_phandle(slave_node, "phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (of_phy_is_fixed_link(slave_node)) {
struct device_node *phy_node;
@@ -2275,12 +2276,22 @@ static int cpsw_probe(struct platform_device *pdev)
/* Select default pin state */
pinctrl_pm_select_default_state(&pdev->dev);
+ data = &priv->data;
+ priv->slaves = devm_kzalloc(&pdev->dev,
+ sizeof(struct cpsw_slave) * data->slaves,
+ GFP_KERNEL);
+ if (!priv->slaves) {
+ ret = -ENOMEM;
+ goto clean_runtime_disable_ret;
+ }
+ for (i = 0; i < data->slaves; i++)
+ priv->slaves[i].slave_num = i;
+
if (cpsw_probe_dt(priv, pdev)) {
dev_err(&pdev->dev, "cpsw: platform data missing\n");
ret = -ENODEV;
goto clean_runtime_disable_ret;
}
- data = &priv->data;
if (is_valid_ether_addr(data->slave_data[0].mac_addr)) {
memcpy(priv->mac_addr, data->slave_data[0].mac_addr, ETH_ALEN);
@@ -2292,16 +2303,6 @@ static int cpsw_probe(struct platform_device *pdev)
memcpy(ndev->dev_addr, priv->mac_addr, ETH_ALEN);
- priv->slaves = devm_kzalloc(&pdev->dev,
- sizeof(struct cpsw_slave) * data->slaves,
- GFP_KERNEL);
- if (!priv->slaves) {
- ret = -ENOMEM;
- goto clean_runtime_disable_ret;
- }
- for (i = 0; i < data->slaves; i++)
- priv->slaves[i].slave_num = i;
-
priv->slaves[0].ndev = ndev;
priv->emac_port = 0;
--
2.5.0
^ permalink raw reply related
* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Eric Dumazet @ 2016-04-13 14:39 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Hannes Frederic Sowa, Paolo Abeni, netdev, David S. Miller,
Eric W. Biederman, Greg Kurz, Jason Wang, Christian Borntraeger,
Herbert Xu
In-Reply-To: <20160413165240-mutt-send-email-mst@redhat.com>
On Wed, 2016-04-13 at 16:54 +0300, Michael S. Tsirkin wrote:
> OK, now I understand what the duplication is about.
> What about NETDEV_TX_LOCKED? Looks like it might have
> the same effect?
>
> This might be worth documenting in include/linux/netdevice.h,
> might it not?
I believe the intent is to get rid of NETDEV_TX_LOCKED at some point.
Some drivers seem to abuse it.
For example, drivers/infiniband/hw/nes/nes_nic.c is clearly completely
buggy :(
^ permalink raw reply
* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Casey Schaufler @ 2016-04-13 15:06 UTC (permalink / raw)
To: Paolo Abeni
Cc: Paul Moore, Florian Westphal, linux-security-module,
David S. Miller, James Morris, Andreas Gruenbacher,
Stephen Smalley, netdev, selinux, Casey Schaufler
In-Reply-To: <1460548628.9170.12.camel@redhat.com>
On 4/13/2016 4:57 AM, Paolo Abeni wrote:
> On Tue, 2016-04-12 at 06:57 -0700, Casey Schaufler wrote:
>> On 4/12/2016 1:52 AM, Paolo Abeni wrote:
>>> On Thu, 2016-04-07 at 14:55 -0400, Paul Moore wrote:
>>>> On Thursday, April 07, 2016 01:45:32 AM Florian Westphal wrote:
>>>>> Paul Moore <paul@paul-moore.com> wrote:
>>>>>> On Wed, Apr 6, 2016 at 6:14 PM, Florian Westphal <fw@strlen.de> wrote:
>>>>>>> netfilter hooks are per namespace -- so there is hook unregister when
>>>>>>> netns is destroyed.
>>>>>> Looking around, I see the global and per-namespace registration
>>>>>> functions (nf_register_hook and nf_register_net_hook, respectively),
>>>>>> but I'm looking to see if/how newly created namespace inherit
>>>>>> netfilter hooks from the init network namespace ... if you can create
>>>>>> a network namespace and dodge the SELinux hooks, that isn't a good
>>>>>> thing from a SELinux point of view, although it might be a plus
>>>>>> depending on where you view Paolo's original patches ;)
>>>>> Heh :-)
>>>>>
>>>>> If you use nf_register_net_hook, the hook is only registered in the
>>>>> namespace.
>>>>>
>>>>> If you use nf_register_hook, the hook is put on a global list and
>>>>> registed in all existing namespaces.
>>>>>
>>>>> New namespaces will have the hook added as well (see
>>>>> netfilter_net_init -> nf_register_hook_list in netfilter/core.c )
>>>>>
>>>>> Since nf_register_hook is used it should be impossible to get a netns
>>>>> that doesn't call these hooks.
>>>> Great, thanks.
>>>>
>>>>>>> Do you think it makes sense to rework the patch to delay registering
>>>>>>> of the netfiler hooks until the system is in a state where they're
>>>>>>> needed, without the 'unregister' aspect?
>>>>>> I would need to see the patch to say for certain, but in principle
>>>>>> that seems perfectly reasonable and I think would satisfy both the
>>>>>> netdev and SELinux camps - good suggestion. My main goal is to drop
>>>>>> the selinux_nf_ip_init() entirely so it can't be used as a ROP gadget.
>>>>>>
>>>>>> We might even be able to trim the secmark_active and peerlbl_active
>>>>>> checks in the SELinux netfilter hooks (an earlier attempt at
>>>>>> optimization; contrary to popular belief, I do care about SELinux
>>>>>> performance), although that would mean that enabling the network
>>>>>> access controls would be one way ... I guess you can disregard that
>>>>>> last bit, I'm thinking aloud again.
>>>>> One way is fine I think.
>>>> Yes, just disregard my second paragraph above.
>>>>
>>>>>>> Ideally this would even be per netns -- in perfect world we would
>>>>>>> be able to make it so that a new netns are created with an empty
>>>>>>> hook list.
>>>>>> In general SELinux doesn't care about namespaces, for reasons that are
>>>>>> sorta beyond the scope of this conversation, so I would like to stick
>>>>>> to a all or nothing approach to enabling the SELinux netfilter hooks
>>>>>> across namespaces. Perhaps we can revisit this at a later time, but
>>>>>> let's keep it simple right now.
>>>>> Okay, I'd prefer to stick to your recommendation anyway wrt. to selinux
>>>>> (Casey, I read your comment regarding smack. Noted, we don't want to
>>>>> break smack either...)
>>>>>
>>>>> I think that in this case the entire question is:
>>>>>
>>>>> In your experience, how likely is a config where selinux is enabled BUT the
>>>>> hooks are not needed (i.e., where we hit the
>>>>>
>>>>> if (!selinux_policycap_netpeer)
>>>>> return NF_ACCEPT;
>>>>>
>>>>> if (!secmark_active && !peerlbl_active)
>>>>> return NF_ACCEPT;
>>>>>
>>>>> tests inside the hooks)? If such setups are uncommon we should just
>>>>> drop this idea or at least put it on the back burner until the more
>>>>> expensive netfilter hooks (conntrack, cough) are out of the way.
>>>> A few years ago I would have said that it is relatively uncommon for admins to
>>>> enable the SELinux network access controls; it was typically just
>>>> government/intelligence agencies who had very strict access control
>>>> requirements and represented a small portion of SELinux users. However, over
>>>> the past few years I've been fielding more and more questions from admins/devs
>>>> in the virtualization space who are interested in some of these capabilities;
>>>> it isn't clear to me how many of these people are switching it on, but there
>>>> is definitely more interest than I have seen in the past and the interested is
>>>> centered around some rather common use cases.
>>>>
>>>> So, to summarize, I don't know ;)
>>>>
>>>> If you've got bigger sources of overhead, my opinion would be to go tackle
>>>> those first. Perhaps I can even find the time to work on the
>>>> SELinux/netfilter stuff while you are off slaying the bigger dragons, no
>>>> promises at the moment.
>>> Double checking if I got the above correctly.
>>>
>>> Will be ok if we post a v2 version of this series, removing the hooks
>>> de-registration bits, but preserving the selinux nf-hooks and
>>> socket_sock_rcv_skb() on-demand/delayed registration ?
>> Imagine that I have two security modules that control sockets.
>> The work I'm knee deep in will allow this. If adding hooks after
>> the init phase is allowed you have to face the possibility that
>> blob sizes (in this case sock->sk_security) may change. That
>> requires checking on every hook that uses blobs to determine
>> whether the blob has data for all the modules using it. I know
>> that that is a simple matter of arithmetic, but it's additional
>> overhead on every hook call. It also makes creating kmem caches
>> for security blobs much more difficult. Another performance
>> optimization that becomes unavailable.
> I got your point.
>
> Without seeing the code, I wonder if the above scenario could be covered
> always allocating a blob large enough for all concurrent security
> modules, i.e. each security module always declares/requests/allocates
> space into the blobs regardless of it does not have registered (yet)
> some security hooks, trading memory usage for performance.
Yes, that would be possible. Unfortunately, many distros ship
kernels with multiple security modules compiled in, but only one
in use. That would make the common case the worst case. For the
near (1-2 years) future the common case will be one security
module using blobs in use with multiple modules compiled.
The LSM infrastructure was built on the assumption that
most systems would not use additional security modules.
That was true then. It isn't now.
> Paolo
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Or Gerlitz @ 2016-04-13 15:10 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Leon Romanovsky, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
David S. Miller, Doug Ledford, Linus Torvalds, Or Gerlitz,
Leon Romanovsky, Tal Alon
In-Reply-To: <CAJ3xEMiu3o=yMABh1N7w8mDV4yddE1zk5FYoHHHB02tDhOqzHw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
We talked here, and the three MLNX mlx5 maintainers prefer things to
be such they submit a patch
to both trees (rdma and net-next) that contains the changes they plan
to the FW IFC file for that
-next release. They agreed that only fields that will be actually in
use by the driver will be exposed
in the IFC. So... let it be and good luck to us, maybe with even zero
-next conflicts on mlx5 between
to two subsystems, happy upstreaming.
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: linux-next: build failure after merge of the net-next tree
From: David Miller @ 2016-04-13 15:15 UTC (permalink / raw)
To: sfr
Cc: netdev, linux-next, linux-kernel, mark.d.rustad,
jeffrey.t.kirsher, andrewx.bowers, broonie, kernel-build-reports,
linaro-kernel
In-Reply-To: <20160413175028.5d3a4cf2@canb.auug.org.au>
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 13 Apr 2016 17:50:28 +1000
> After merging the net-next tree, today's linux-next build (arm
> allmodconfig) failed like thisi (this has actually been failing for a
> few days, now):
>
> ERROR: "__bad_udelay" [drivers/net/ethernet/intel/ixgbe/ixgbe.ko] undefined!
>
> Caused by commit
>
> 49425dfc7451 ("ixgbe: Add support for x550em_a 10G MAC type")
>
> arm only allows udelay()s up to 2 milliseconds. This commit
> adds a 5 ms udelay in ixgbe_acquire_swfw_sync_x550em_a() in
> drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c.
Jeff, please have your folks look into this. Probably just a simple
conversion to mdelay().
Thanks!
^ permalink raw reply
* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: David Miller @ 2016-04-13 15:22 UTC (permalink / raw)
To: mst; +Cc: pabeni, netdev, hannes, ebiederm, gkurz, jasowang
In-Reply-To: <20160413132411-mutt-send-email-mst@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 13 Apr 2016 13:26:51 +0300
> On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
>> This patch disables the default qdisc by explicitly setting the
>> IFF_NO_QUEUE private flag so that now the tun xmit path do not
>> require any lock by default.
>>
>> The default qdisc was first removed as a side effect of commit
>> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
>> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>
> I wonder about this back and forth.
> Jason, do you see a workload where the default qdisc
> is preferable?
I don't think you should change this, putting the default back was a
bug fix.
I think this series is taking way too many liberties in order to
achieve locklessness, in my opinion.
If you have proper multi-queue, the qdisc lock and the netdev transmit
lock are almost completely free.
^ permalink raw reply
* Re: [PATCH (net.git) 2/3] Revert "stmmac: Fix 'eth0: No PHY found' regression"
From: Marc Haber @ 2016-04-13 15:44 UTC (permalink / raw)
To: Giuseppe Cavallaro
Cc: netdev, gabriel.fernandez, afaerber, fschaefer.oss, dinh.linux,
davem, preid, rhgadsdon, linux-kernel, stable
In-Reply-To: <1459494436-27386-3-git-send-email-peppe.cavallaro@st.com>
On Fri, Apr 01, 2016 at 09:07:15AM +0200, Giuseppe Cavallaro wrote:
> This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493.
> due to problems on GeekBox and Banana Pi M1 board when
> connected to a real transceiver instead of a switch via
> fixed-link.
This reversal is still needed in Linux 4.5.1 on Banana Pi.
Please consider including it in Linux 4.5.2.
Greetings
Marc
>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Gabriel Fernandez <gabriel.fernandez@linaro.org>
> Cc: Andreas Färber <afaerber@suse.de>
> Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
> Cc: Dinh Nguyen <dinh.linux@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 11 ++++++++++-
> .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 9 +--------
> include/linux/stmmac.h | 1 -
> 3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index ea76129..af09ced 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
> struct stmmac_priv *priv = netdev_priv(ndev);
> struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
> int addr, found;
> - struct device_node *mdio_node = priv->plat->mdio_node;
> + struct device_node *mdio_node = NULL;
> + struct device_node *child_node = NULL;
>
> if (!mdio_bus_data)
> return 0;
>
> if (IS_ENABLED(CONFIG_OF)) {
> + for_each_child_of_node(priv->device->of_node, child_node) {
> + if (of_device_is_compatible(child_node,
> + "snps,dwmac-mdio")) {
> + mdio_node = child_node;
> + break;
> + }
> + }
> +
> if (mdio_node) {
> netdev_dbg(ndev, "FOUND MDIO subnode\n");
> } else {
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index dcbd2a1..9cf181f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -146,7 +146,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> struct device_node *np = pdev->dev.of_node;
> struct plat_stmmacenet_data *plat;
> struct stmmac_dma_cfg *dma_cfg;
> - struct device_node *child_node = NULL;
>
> plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
> if (!plat)
> @@ -177,19 +176,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
> plat->phy_node = of_node_get(np);
> }
>
> - for_each_child_of_node(np, child_node)
> - if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
> - plat->mdio_node = child_node;
> - break;
> - }
> -
> /* "snps,phy-addr" is not a standard property. Mark it as deprecated
> * and warn of its use. Remove this when phy node support is added.
> */
> if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
> dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
>
> - if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
> + if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
> plat->mdio_bus_data = NULL;
> else
> plat->mdio_bus_data =
> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> index 4bcf5a6..6e53fa8 100644
> --- a/include/linux/stmmac.h
> +++ b/include/linux/stmmac.h
> @@ -114,7 +114,6 @@ struct plat_stmmacenet_data {
> int interface;
> struct stmmac_mdio_bus_data *mdio_bus_data;
> struct device_node *phy_node;
> - struct device_node *mdio_node;
> struct stmmac_dma_cfg *dma_cfg;
> int clk_csr;
> int has_gmac;
> --
> 1.7.4.4
>
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox