* Re: [PATCH v3 07/14] net: smc911x: remove the dmaengine compat need
From: Robert Jarzmik @ 2018-06-20 17:17 UTC (permalink / raw)
To: David S. Miller; +Cc: Daniel Mack, linux-kernel, netdev
In-Reply-To: <20180617170217.24177-8-robert.jarzmik@free.fr>
Hi David,
I have converted all the pxa related drivers to DMA slave maps, the only 2 that
remain in my queue are touching your tree, ie. smc911x and smc91x (see [1]).
Once these 2 are done, I have my last dependency on the dma filter function
removed, and can queue the remaining cleanup patches.
Could you (or somebody from netdev) review it and either ack it (and I'll take
it through the pxa tree), or take it for v4.19 please ?
Cheers.
--
Robert
[1] Submission
Robert Jarzmik <robert.jarzmik@free.fr> writes:
> As the pxa architecture switched towards the dmaengine slave map, the
> old compatibility mechanism to acquire the dma requestor line number and
> priority are not needed anymore.
>
> This patch simplifies the dma resource acquisition, using the more
> generic function dma_request_slave_channel().
>
> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
> ---
> Since v2: converted to NULL filter function and NULL dma parameter call
> ---
> drivers/net/ethernet/smsc/smc911x.c | 13 ++-----------
> 1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/smsc/smc911x.c b/drivers/net/ethernet/smsc/smc911x.c
> index 05157442a980..b1b53f6c452f 100644
> --- a/drivers/net/ethernet/smsc/smc911x.c
> +++ b/drivers/net/ethernet/smsc/smc911x.c
> @@ -74,7 +74,6 @@ static const char version[] =
> #include <linux/skbuff.h>
>
> #include <linux/dmaengine.h>
> -#include <linux/dma/pxa-dma.h>
>
> #include <asm/io.h>
>
> @@ -1795,7 +1794,6 @@ static int smc911x_probe(struct net_device *dev)
> #ifdef SMC_USE_DMA
> struct dma_slave_config config;
> dma_cap_mask_t mask;
> - struct pxad_param param;
> #endif
>
> DBG(SMC_DEBUG_FUNC, dev, "--> %s\n", __func__);
> @@ -1971,15 +1969,8 @@ static int smc911x_probe(struct net_device *dev)
>
> dma_cap_zero(mask);
> dma_cap_set(DMA_SLAVE, mask);
> - param.prio = PXAD_PRIO_LOWEST;
> - param.drcmr = -1UL;
> -
> - lp->rxdma =
> - dma_request_slave_channel_compat(mask, pxad_filter_fn,
> - ¶m, &dev->dev, "rx");
> - lp->txdma =
> - dma_request_slave_channel_compat(mask, pxad_filter_fn,
> - ¶m, &dev->dev, "tx");
> + lp->rxdma = dma_request_channel(mask, NULL, NULL);
> + lp->txdma = dma_request_channel(mask, NULL, NULL);
> lp->rxdma_active = 0;
> lp->txdma_active = 0;
--
Robert
^ permalink raw reply
* Re: [PATCH] net: Fix device name resolving crash in default_device_exit()
From: David Ahern @ 2018-06-20 17:15 UTC (permalink / raw)
To: Kirill Tkhai, netdev
Cc: davem, daniel, jakub.kicinski, ast, linux, john.fastabend, brouer
In-Reply-To: <88c1a635-4404-253b-6144-d7f3f9779531@virtuozzo.com>
On 6/20/18 2:57 AM, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>
> The following script makes kernel to crash since it can't obtain
> a name for a device, when the name is occupied by another device:
>
> #!/bin/bash
> ifconfig eth0 down
> ifconfig eth1 down
> index=`cat /sys/class/net/eth1/ifindex`
> ip link set eth1 name dev$index
> unshare -n sleep 1h &
> pid=$!
> while [[ "`readlink /proc/self/ns/net`" == "`readlink /proc/$pid/ns/net`" ]]; do continue; done
> ip link set dev$index netns $pid
> ip link set eth0 name dev$index
> kill -9 $pid
>
> Kernel messages:
>
> virtio_net virtio1 dev3: renamed from eth1
> virtio_net virtio0 dev3: renamed from eth0
> default_device_exit: failed to move dev3 to init_net: -17
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:8978!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 1 PID: 276 Comm: kworker/u8:3 Not tainted 4.17.0+ #292
> Workqueue: netns cleanup_net
> RIP: 0010:default_device_exit+0x9c/0xb0
> [stack trace snipped]
>
> This patch gives more variability during choosing new name
> of device and fixes the problem.
>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>
> Since there is no suggestions how to fix this in another way, I'm resending the patch.
This patch does not remove the BUG, so does not really solve the
problem. ie., it is fairly trivial to write a script (32k dev%d named
devices in init_net) that triggers it again, so your commit subject and
commit log are not correct with the references to 'fixing the problem'.
The change does provide more variability in naming and reduces the
likelihood of not being able to push a device back to init_net.
^ permalink raw reply
* Re: [PATCH net] sctp: fix erroneous inc of snmp SctpFragUsrMsgs
From: Neil Horman @ 2018-06-20 17:08 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: netdev, linux-sctp, Vlad Yasevich
In-Reply-To: <d89c1e422158d21710ce938aa093a20960bd55e9.1529509634.git.marcelo.leitner@gmail.com>
On Wed, Jun 20, 2018 at 12:47:52PM -0300, Marcelo Ricardo Leitner wrote:
> Currently it is incrementing SctpFragUsrMsgs when the user message size
> is of the exactly same size as the maximum fragment size, which is wrong.
>
> The fix is to increment it only when user message is bigger than the
> maximum fragment size.
>
> Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user")
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
> net/sctp/chunk.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 79daa98208c391c780440144d69bc7be875c3476..bfb9f812e2ef9fa605b08dc1f534781573c3abf8 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -237,7 +237,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
> /* Account for a different sized first fragment */
> if (msg_len >= first_len) {
> msg->can_delay = 0;
> - SCTP_INC_STATS(sock_net(asoc->base.sk), SCTP_MIB_FRAGUSRMSGS);
> + if (msg_len > first_len)
> + SCTP_INC_STATS(sock_net(asoc->base.sk),
> + SCTP_MIB_FRAGUSRMSGS);
> } else {
> /* Which may be the only one... */
> first_len = msg_len;
> --
> 2.14.4
>
>
Acked-by: Neil Horman <nhorman@redhat.com>
^ permalink raw reply
* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Davide Caratti @ 2018-06-20 17:02 UTC (permalink / raw)
To: Michel Machado, Fu, Qiaobin, davem@davemloft.net
Cc: netdev@vger.kernel.org, jhs@mojatatu.com, Marcelo Ricardo Leitner,
xiyou.wangcong@gmail.com
In-Reply-To: <9e0d0a90-4ad5-722f-1033-40b72c3841ed@digirati.com.br>
On Wed, 2018-06-20 at 12:47 -0400, Michel Machado wrote:
> On 06/20/2018 12:08 PM, Davide Caratti wrote:
> > On Tue, 2018-06-12 at 15:42 +0000, Fu, Qiaobin wrote:
> > > The new action inheritdsfield copies the field DS of
> > > IPv4 and IPv6 packets into skb->priority. This enables
> > > later classification of packets based on the DS field.
> > >
> > > v4:
> > > *Not allow setting flags other than the expected ones.
> > >
> > > *Allow dumping the pure flags.
> > >
> > > Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> > >
> > > Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> > > Reviewed-by: Michel Machado <michel@digirati.com.br>
> > > ---
> > >
> >
> > [...]
> >
> > > @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
> > >
> > > if (d->flags & SKBEDIT_F_PRIORITY)
> > > skb->priority = d->priority;
> > > + if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
> > > + int wlen = skb_network_offset(skb);
> > > +
> > > + switch (tc_skb_protocol(skb)) {
> > > + case htons(ETH_P_IP):
> > > + wlen += sizeof(struct iphdr);
> > > + if (!pskb_may_pull(skb, wlen))
> > > + goto err;
> > > + skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> > > + break;
> > > +
> > > + case htons(ETH_P_IPV6):
> > > + wlen += sizeof(struct ipv6hdr);
> > > + if (!pskb_may_pull(skb, wlen))
> > > + goto err;
> > > + skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> > > + break;
> > > + }
> > > + }
> > > if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
> > > skb->dev->real_num_tx_queues > d->queue_mapping)
> > > skb_set_queue_mapping(skb, d->queue_mapping);
> > > @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
> > >
> > > spin_unlock(&d->tcf_lock);
> > > return d->tcf_action;
> > > +
> > > +err:
> > > + spin_unlock(&d->tcf_lock);
> > > + return TC_ACT_SHOT;
> > > }
> > >
> >
> > sorry for asking this when the patch is a v4...
> >
> > I spotted this, as I'm rebasing a small series that removes the tcf_lock
> > from the data plane of skbedit to gain some speed, and it converts the
> > stats to be per-cpu counters.
> >
> > in the code above, you are catching failures of pskb_may_pull(skb) and
> > then you return TC_ACT_SHOT. That's OK, but I think you should update the
> > drop counter, like other TC actions do.
> >
> > If you (author / reviewers) think this is a minor issue, like I do think,
> > then I can add the missing update in my series and post it when net-next
> > reopens.
> >
> > WDYT?
> >
> > thank you in advance!
> > regards,
> >
>
> Hi Davide,
>
> I agree that we should update the drop counter, but given that
> you're already converting the stats to be per-cpu counters, whatever we
> add now will be just symbolic since you're going to change it anyway.
that's ok for me also, as I can use the current v4 code for the rebase
(and not wait for another respin) _ but let's hear what reviewers think.
> If
> reviewers think that Qiaobin's patch must add the update line, could you
> provide the exact line and location so we avoid going to v6 of this patch?
In case, I was thinking of something like:
https://elixir.bootlin.com/linux/v4.18-rc1/source/net/sched/act_ipt.c#L249
so, between 'err:' and 'spin_unlock(&d->tcf_lock)', insert a line like:
d->tcf_qstats.drop++;
regards,
--
davide
^ permalink raw reply
* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Michel Machado @ 2018-06-20 16:47 UTC (permalink / raw)
To: Davide Caratti, Fu, Qiaobin, davem@davemloft.net
Cc: netdev@vger.kernel.org, jhs@mojatatu.com, Marcelo Ricardo Leitner,
xiyou.wangcong@gmail.com
In-Reply-To: <264a03455daf28d33809ea77e4badc6a5ec5c9e6.camel@redhat.com>
On 06/20/2018 12:08 PM, Davide Caratti wrote:
> On Tue, 2018-06-12 at 15:42 +0000, Fu, Qiaobin wrote:
>> The new action inheritdsfield copies the field DS of
>> IPv4 and IPv6 packets into skb->priority. This enables
>> later classification of packets based on the DS field.
>>
>> v4:
>> *Not allow setting flags other than the expected ones.
>>
>> *Allow dumping the pure flags.
>>
>> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
>>
>> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>> Reviewed-by: Michel Machado <michel@digirati.com.br>
>> ---
>>
>
> [...]
>
>> @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>>
>> if (d->flags & SKBEDIT_F_PRIORITY)
>> skb->priority = d->priority;
>> + if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
>> + int wlen = skb_network_offset(skb);
>> +
>> + switch (tc_skb_protocol(skb)) {
>> + case htons(ETH_P_IP):
>> + wlen += sizeof(struct iphdr);
>> + if (!pskb_may_pull(skb, wlen))
>> + goto err;
>> + skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
>> + break;
>> +
>> + case htons(ETH_P_IPV6):
>> + wlen += sizeof(struct ipv6hdr);
>> + if (!pskb_may_pull(skb, wlen))
>> + goto err;
>> + skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
>> + break;
>> + }
>> + }
>> if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
>> skb->dev->real_num_tx_queues > d->queue_mapping)
>> skb_set_queue_mapping(skb, d->queue_mapping);
>> @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>>
>> spin_unlock(&d->tcf_lock);
>> return d->tcf_action;
>> +
>> +err:
>> + spin_unlock(&d->tcf_lock);
>> + return TC_ACT_SHOT;
>> }
>>
>
> sorry for asking this when the patch is a v4...
>
> I spotted this, as I'm rebasing a small series that removes the tcf_lock
> from the data plane of skbedit to gain some speed, and it converts the
> stats to be per-cpu counters.
>
> in the code above, you are catching failures of pskb_may_pull(skb) and
> then you return TC_ACT_SHOT. That's OK, but I think you should update the
> drop counter, like other TC actions do.
>
> If you (author / reviewers) think this is a minor issue, like I do think,
> then I can add the missing update in my series and post it when net-next
> reopens.
>
> WDYT?
>
> thank you in advance!
> regards,
>
Hi Davide,
I agree that we should update the drop counter, but given that
you're already converting the stats to be per-cpu counters, whatever we
add now will be just symbolic since you're going to change it anyway. If
reviewers think that Qiaobin's patch must add the update line, could you
provide the exact line and location so we avoid going to v6 of this patch?
[ ]'s
Michel Machado
^ permalink raw reply
* Re: [PATCH] tc, bpf: add option to dump bpf verifier as C program fragment
From: Stephen Hemminger @ 2018-06-20 16:40 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David Ahern, Jakub Kicinski, Ophir Munk, netdev, Thomas Monjalon,
Olga Shern, ast
In-Reply-To: <e454ef12-0ae7-7122-3801-321a62c5d0ce@iogearbox.net>
On Wed, 20 Jun 2018 00:13:52 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 06/18/2018 11:44 PM, David Ahern wrote:
> > On 6/18/18 2:18 PM, Jakub Kicinski wrote:
> >> On Sun, 17 Jun 2018 08:48:41 +0000, Ophir Munk wrote:
> >>> Similar to cbpf used within tcpdump utility with a "-d" option to dump
> >>> the compiled packet-matching code in a human readable form - tc has the
> >>> "verbose" option to dump ebpf verifier output.
> >>> Another useful option of cbpf using tcpdump "-dd" option is to dump
> >>> packet-matching code a C program fragment. Similar to this - this commit
> >>> adds a new tc ebpf option named "code" to dump ebpf verifier as C program
> >>> fragment.
> >>>
> >>> Existing "verbose" option sample output:
> >>>
> >>> Verifier analysis:
> >>> 0: (61) r2 = *(u32 *)(r1 +52)
> >>> 1: (18) r3 = 0xdeadbeef
> >>> 3: (63) *(u32 *)(r10 -4) = r3
> >>> .
> >>> .
> >>> 11: (63) *(u32 *)(r1 +52) = r2
> >>> 12: (18) r0 = 0xffffffff
> >>> 14: (95) exit
> >>>
> >>> New "code" option sample output:
> >>>
> >>> /* struct bpf_insn cls_q_code[] = { */
> >>> {0x61, 2, 1, 52, 0x00000000},
> >>> {0x18, 3, 0, 0, 0xdeadbeef},
> >>> {0x00, 0, 0, 0, 0x00000000},
> >>> .
> >>> .
> >>> {0x63, 1, 2, 52, 0x00000000},
> >>> {0x18, 0, 0, 0, 0xffffffff},
> >>> {0x00, 0, 0, 0, 0x00000000},
> >>> {0x95, 0, 0, 0, 0x00000000},
> >>>
> >>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> >>
> >> Hmm... printing C arrays looks like hacky integration with some C
> >> code... Would you not be better served by simply using libbpf in
> >> whatever is consuming this output?
> >
> > I was thinking the same. bpftool would provide options too -- print the
> > above, print in macro encodings and verifier. I gave an example of this
> > side by side dump at netconf 2.1. Does not look like the slides made it
> > online; see attached.
>
> +1, I would also doubt that this adds a lot in terms of debuggability
> when you're trying to load an object file with thousands of insns. Better
> way would be to use llvm-objdump on the obj file to get to the annotated
> disassembly, see also example in [0]. A .o to .c converter is wip for
> libbpf/bpftool as presented in [1], which should provide the flexibility
> to embedd an obj file.
>
> Cheers,
> Daniel
>
> [0] http://cilium.readthedocs.io/en/latest/bpf/#llvm
> [1] http://vger.kernel.org/netconf2018_files/AlexeiStarovoitov_netconf2018.pdf page 22
I am going to not accept this for now. Please respin for iproute2 next if
you think bpftool won't be able to handle this.
^ permalink raw reply
* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
From: Andy C @ 2018-06-20 16:40 UTC (permalink / raw)
To: Joe Perches, 'Trond Myklebust'
Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
'J. Bruce Fields', 'Neil Brown',
'Chuck Lever', 'Benny Halevy',
'Alexandros Batsakis', Andy Chittenden
In-Reply-To: <0dead6e0deabafe0abdd4b30f97bcdf5c12cd824.camel@perches.com>
On 2018-06-19 22:56, Joe Perches wrote:
> On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:
>
> You really need to check your clock.
> Mail sent in the year 2010?
>
I didn't send it just now. I sent it on 2010-08-09 at 10:27. And I did
receive a copy at that time. If I'm reading the headers correctly,
something happened here:
Received: from mchehab by bombadil.infradead.org with local (Exim 4.90_1 #2 (Red Hat Linux))
id 1fVNCq-0005da-3g; Tue, 19 Jun 2018 20:26:24 +0000
Received: from vger.kernel.org ([209.132.180.67])
by bombadil.infradead.org with esmtp (Exim 4.72 #1 (Red Hat Linux))
id 1OiOdm-0004za-Ks; Mon, 09 Aug 2010 09:27:30 +0000
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
id S1755981Ab0HIJ1U (ORCPT <rfc822;tgraf@infradead.org> + 8 others);
Mon, 9 Aug 2010 05:27:20 -0400
--
Andy
^ permalink raw reply
* [PATCH net] netfilter: nf_log: fix uninit read in nf_log_proc_dostring
From: Jann Horn @ 2018-06-20 16:33 UTC (permalink / raw)
To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
netfilter-devel, coreteam, jannh
Cc: David S. Miller, netdev, linux-kernel
When proc_dostring() is called with a non-zero offset in strict mode, it
doesn't just write to the ->data buffer, it also reads. Make sure it
doesn't read uninitialized data.
Fixes: c6ac37d8d884 ("netfilter: nf_log: fix error on write NONE to [...]")
Signed-off-by: Jann Horn <jannh@google.com>
---
net/netfilter/nf_log.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 426457047578..2c47f9ec3511 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -424,6 +424,10 @@ static int nf_log_proc_dostring(struct ctl_table *table, int write,
if (write) {
struct ctl_table tmp = *table;
+ /* proc_dostring() can append to existing strings, so we need to
+ * initialize it as an empty string.
+ */
+ buf[0] = '\0';
tmp.data = buf;
r = proc_dostring(&tmp, write, buffer, lenp, ppos);
if (r)
--
2.18.0.rc1.244.gcf134e6275-goog
^ permalink raw reply related
* Re: [PATCH] tc: fix batch force option
From: Stephen Hemminger @ 2018-06-20 16:33 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, chrism
In-Reply-To: <1529479461-32048-1-git-send-email-vladbu@mellanox.com>
On Wed, 20 Jun 2018 10:24:21 +0300
Vlad Buslov <vladbu@mellanox.com> wrote:
> When sending accumulated compound command results an error, check 'force'
> option before exiting. Move return code check after putting batch bufs and
> freeing iovs to prevent memory leak. Break from loop, instead of returning
> error code to allow cleanup at the end of batch function. Don't reset ret
> code on each iteration.
>
> Fixes: 485d0c6001c4 ("tc: Add batchsize feature for filter and actions")
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> Reviewed-by: Chris Mi <chrism@mellanox.com>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Applied
^ permalink raw reply
* [PATCH v2] ucc_geth: Add BQL support
From: Joakim Tjernlund @ 2018-06-20 16:29 UTC (permalink / raw)
To: David Miller, leoyang.li, netdev; +Cc: Joakim Tjernlund
In-Reply-To: <20180619163036.20578-1-joakim.tjernlund@infinera.com>
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---
v2 - Reoder varibles according to Dave
Add call to netdev_reset_queue(dev) open/close
drivers/net/ethernet/freescale/ucc_geth.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index f77ba9fa257b..e8debbde0a34 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -3096,6 +3096,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
ugeth_vdbg("%s: IN", __func__);
+ netdev_sent_queue(dev, skb->len);
spin_lock_irqsave(&ugeth->lock, flags);
dev->stats.tx_bytes += skb->len;
@@ -3240,6 +3241,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
{
/* Start from the next BD that should be filled */
struct ucc_geth_private *ugeth = netdev_priv(dev);
+ unsigned int bytes_sent = 0;
+ int howmany = 0;
u8 __iomem *bd; /* BD pointer */
u32 bd_status;
@@ -3257,7 +3260,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
skb = ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]];
if (!skb)
break;
-
+ howmany++;
+ bytes_sent += skb->len;
dev->stats.tx_packets++;
dev_consume_skb_any(skb);
@@ -3279,6 +3283,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
bd_status = in_be32((u32 __iomem *)bd);
}
ugeth->confBd[txQ] = bd;
+ netdev_completed_queue(dev, howmany, bytes_sent);
return 0;
}
@@ -3479,6 +3484,7 @@ static int ucc_geth_open(struct net_device *dev)
phy_start(ugeth->phydev);
napi_enable(&ugeth->napi);
+ netdev_reset_queue(dev);
netif_start_queue(dev);
device_set_wakeup_capable(&dev->dev,
@@ -3509,6 +3515,7 @@ static int ucc_geth_close(struct net_device *dev)
free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
netif_stop_queue(dev);
+ netdev_reset_queue(dev);
return 0;
}
--
2.13.6
^ permalink raw reply related
* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Davide Caratti @ 2018-06-20 16:08 UTC (permalink / raw)
To: Fu, Qiaobin, davem@davemloft.net
Cc: netdev@vger.kernel.org, jhs@mojatatu.com, Michel Machado,
Marcelo Ricardo Leitner, xiyou.wangcong@gmail.com
In-Reply-To: <E5616B71-959E-43FE-9096-BF523948ABB2@bu.edu>
On Tue, 2018-06-12 at 15:42 +0000, Fu, Qiaobin wrote:
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
>
> v4:
> *Not allow setting flags other than the expected ones.
>
> *Allow dumping the pure flags.
>
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
>
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
>
[...]
> @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>
> if (d->flags & SKBEDIT_F_PRIORITY)
> skb->priority = d->priority;
> + if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
> + int wlen = skb_network_offset(skb);
> +
> + switch (tc_skb_protocol(skb)) {
> + case htons(ETH_P_IP):
> + wlen += sizeof(struct iphdr);
> + if (!pskb_may_pull(skb, wlen))
> + goto err;
> + skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> + break;
> +
> + case htons(ETH_P_IPV6):
> + wlen += sizeof(struct ipv6hdr);
> + if (!pskb_may_pull(skb, wlen))
> + goto err;
> + skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> + break;
> + }
> + }
> if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
> skb->dev->real_num_tx_queues > d->queue_mapping)
> skb_set_queue_mapping(skb, d->queue_mapping);
> @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>
> spin_unlock(&d->tcf_lock);
> return d->tcf_action;
> +
> +err:
> + spin_unlock(&d->tcf_lock);
> + return TC_ACT_SHOT;
> }
>
sorry for asking this when the patch is a v4...
I spotted this, as I'm rebasing a small series that removes the tcf_lock
from the data plane of skbedit to gain some speed, and it converts the
stats to be per-cpu counters.
in the code above, you are catching failures of pskb_may_pull(skb) and
then you return TC_ACT_SHOT. That's OK, but I think you should update the
drop counter, like other TC actions do.
If you (author / reviewers) think this is a minor issue, like I do think,
then I can add the missing update in my series and post it when net-next
reopens.
WDYT?
thank you in advance!
regards,
--
davide
^ permalink raw reply
* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-20 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
Jakub Kicinski, Samudrala, Sridhar, qemu-devel, virtualization,
Siwei Liu, Venu Busireddy, Netdev, boris.ostrovsky, aaron.f.brown,
Joao Martins
In-Reply-To: <20180620170904-mutt-send-email-mst@kernel.org>
On Wed, 20 Jun 2018 17:11:59 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jun 20, 2018 at 11:53:59AM +0200, Cornelia Huck wrote:
> > On Tue, 19 Jun 2018 23:32:06 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:
> > > > Sorry about dragging mainframes into this, but this will only work for
> > > > homogenous device coupling, not for heterogenous. Consider my vfio-pci
> > > > + virtio-net-ccw example again: The guest cannot find out that the two
> > > > belong together by checking some group ID, it has to either use the MAC
> > > > or some needs-to-be-architectured property.
> > > >
> > > > Alternatively, we could propose that mechanism as pci-only, which means
> > > > we can rely on mechanisms that won't necessarily work on non-pci
> > > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> > > > through a network card anytime in the near future, due to the nature of
> > > > network cards currently in use on s390.)
> > >
> > > That's what it boils down to, yes. If there's need to have this for
> > > non-pci devices, then we should put it in config space.
> > > Cornelia, what do you think?
> > >
> >
> > I think the only really useful config on s390 is the vfio-pci network
> > card coupled with a virtio-net-ccw device: Using an s390 network card
> > via vfio-ccw is out due to the nature of the s390 network cards, and
> > virtio-ccw is the default transport (virtio-pci is not supported on any
> > enterprise distro AFAIK).
> >
> > For this, having a uuid in the config space could work (vfio-pci
> > devices have a config space by virtue of being pci devices, and
> > virtio-net-ccw devices have a config space by virtue of being virtio
> > devices -- ccw devices usually don't have that concept).
>
> OK so this calls for adding such a field generally (it's
> device agnostic right now).
>
> How would you suggest doing that?
I hope that I'm not thoroughly confused at this point in time, so I'll
summarize my current understanding (also keep in mind that I haven't
looked at Venu's patches yet):
- The Linux guest initiates coupling from the virtio-net driver.
Matching the other device is done via the MAC, and only pci devices
are allowed for the failover device. (There does not seem to be any
restriction on the transport of the virtio-net device.)
- The Linux guest virtio-net driver does not allow changing the MAC if
standby has been negotiated (implying that the hypervisor needs to
configure the correct MAC).
- In QEMU, we need to know which two devices (vfio-pci and virtio-net)
go together, so that the virtio-net device gets the correct MAC. We
also need the pairing so that we can make the vfio-pci device
available once the guest has negotiated the standby feature.
We can tack the two devices together in QEMU by introducing new,
optional properties pointing from the virtio-net device to the vfio-pci
device (only offer standby if this is set) and the other way around
(don't make the device visible at the start if this is set). Problems:
- The admin needs to figure out the MAC by themselves and set it
correctly. If this is incorrect, the vfio-pci device cannot be found
in the guest. (Not sure how much of a problem this is in practice --
and QEMU cannot figure out the MAC without poking at the vfio-pci
device, and we probably want to avoid that.)
- This two-way pointing makes for interesting handing of the command
line and when both devices are plugged later.
In any case, I'm not sure anymore why we'd want the extra uuid. Is
there any way QEMU (or libvirt) can figure it out without actually
looking at the vfio-pci device?
^ permalink raw reply
* [PATCH 4/5] ceph: use timespec64 for r_mtime
From: Arnd Bergmann @ 2018-06-20 15:50 UTC (permalink / raw)
To: Yan Zheng, Sage Weil, Ilya Dryomov, Alex Elder
Cc: Jens Axboe, Jan Kara, Martin K. Petersen, Arnd Bergmann, y2038,
netdev, linux-kernel, Daniel Jordan, linux-block, ceph-devel,
Jason Dillaman, David S. Miller
In-Reply-To: <20180620155101.57685-1-arnd@arndb.de>
The request mtime field is used all over ceph, and is currently
represented as a 'timespec' structure in Linux. This changes it to
timespec64 to allow times beyond 2038, modifying all users at the
same time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/block/rbd.c | 2 +-
fs/ceph/addr.c | 12 ++++++------
fs/ceph/file.c | 11 +++++------
include/linux/ceph/osd_client.h | 6 +++---
net/ceph/osd_client.c | 8 ++++----
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fa0729c1e776..356936333cd9 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1452,7 +1452,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
struct ceph_osd_request *osd_req = obj_request->osd_req;
osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
- ktime_get_real_ts(&osd_req->r_mtime);
+ ktime_get_real_ts64(&osd_req->r_mtime);
osd_req->r_data_offset = obj_request->ex.oe_off;
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 292b3d72d725..d44d51e69e76 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -574,7 +574,7 @@ static u64 get_writepages_data_length(struct inode *inode,
*/
static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
{
- struct timespec ts;
+ struct timespec64 ts;
struct inode *inode;
struct ceph_inode_info *ci;
struct ceph_fs_client *fsc;
@@ -625,7 +625,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
set_page_writeback(page);
- ts = timespec64_to_timespec(inode->i_mtime);
+ ts = inode->i_mtime;
err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
&ci->i_layout, snapc, page_off, len,
ceph_wbc.truncate_seq,
@@ -1134,7 +1134,7 @@ static int ceph_writepages_start(struct address_space *mapping,
pages = NULL;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
rc = ceph_osdc_start_request(&fsc->client->osdc, req, true);
BUG_ON(rc);
req = NULL;
@@ -1734,7 +1734,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
goto out;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!err)
err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1776,7 +1776,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
goto out_put;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!err)
err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1937,7 +1937,7 @@ static int __ceph_pool_perm_get(struct ceph_inode_info *ci,
0, false, true);
err = ceph_osdc_start_request(&fsc->client->osdc, rd_req, false);
- wr_req->r_mtime = timespec64_to_timespec(ci->vfs_inode.i_mtime);
+ wr_req->r_mtime = ci->vfs_inode.i_mtime;
err2 = ceph_osdc_start_request(&fsc->client->osdc, wr_req, false);
if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index ad0bed99b1d5..1795a8dc9a1e 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -721,7 +721,7 @@ struct ceph_aio_request {
struct list_head osd_reqs;
unsigned num_reqs;
atomic_t pending_reqs;
- struct timespec mtime;
+ struct timespec64 mtime;
struct ceph_cap_flush *prealloc_cf;
};
@@ -923,7 +923,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
int num_pages = 0;
int flags;
int ret;
- struct timespec mtime = timespec64_to_timespec(current_time(inode));
+ struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(iter);
loff_t pos = iocb->ki_pos;
bool write = iov_iter_rw(iter) == WRITE;
@@ -1013,7 +1013,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
truncate_inode_pages_range(inode->i_mapping, pos,
(pos+len) | (PAGE_SIZE - 1));
- req->r_mtime = mtime;
+ req->r_mtime = current_time(inode);
}
osd_req_op_extent_osd_data_bvecs(req, 0, bvecs, num_pages, len);
@@ -1131,7 +1131,6 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
int flags;
int ret;
bool check_caps = false;
- struct timespec mtime = timespec64_to_timespec(current_time(inode));
size_t count = iov_iter_count(from);
if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
@@ -1201,7 +1200,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
false, true);
- req->r_mtime = mtime;
+ req->r_mtime = current_time(inode);
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!ret)
ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1663,7 +1662,7 @@ static int ceph_zero_partial_object(struct inode *inode,
goto out;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!ret) {
ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 0d6ee04b4c41..2e6611c1e9a0 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -199,7 +199,7 @@ struct ceph_osd_request {
/* set by submitter */
u64 r_snapid; /* for reads, CEPH_NOSNAP o/w */
struct ceph_snap_context *r_snapc; /* for writes */
- struct timespec r_mtime; /* ditto */
+ struct timespec64 r_mtime; /* ditto */
u64 r_data_offset; /* ditto */
bool r_linger; /* don't resend on failure */
@@ -253,7 +253,7 @@ struct ceph_osd_linger_request {
struct ceph_osd_request_target t;
u32 map_dne_bound;
- struct timespec mtime;
+ struct timespec64 mtime;
struct kref kref;
struct mutex lock;
@@ -508,7 +508,7 @@ extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
struct ceph_snap_context *sc,
u64 off, u64 len,
u32 truncate_seq, u64 truncate_size,
- struct timespec *mtime,
+ struct timespec64 *mtime,
struct page **pages, int nr_pages);
/* watch/notify */
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index a00c74f1154e..a87a021ca9d0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1978,7 +1978,7 @@ static void encode_request_partial(struct ceph_osd_request *req,
p += sizeof(struct ceph_blkin_trace_info);
ceph_encode_32(&p, 0); /* client_inc, always 0 */
- ceph_encode_timespec(p, &req->r_mtime);
+ ceph_encode_timespec64(p, &req->r_mtime);
p += sizeof(struct ceph_timespec);
encode_oloc(&p, end, &req->r_t.target_oloc);
@@ -4512,7 +4512,7 @@ ceph_osdc_watch(struct ceph_osd_client *osdc,
ceph_oid_copy(&lreq->t.base_oid, oid);
ceph_oloc_copy(&lreq->t.base_oloc, oloc);
lreq->t.flags = CEPH_OSD_FLAG_WRITE;
- ktime_get_real_ts(&lreq->mtime);
+ ktime_get_real_ts64(&lreq->mtime);
lreq->reg_req = alloc_linger_request(lreq);
if (!lreq->reg_req) {
@@ -4570,7 +4570,7 @@ int ceph_osdc_unwatch(struct ceph_osd_client *osdc,
ceph_oid_copy(&req->r_base_oid, &lreq->t.base_oid);
ceph_oloc_copy(&req->r_base_oloc, &lreq->t.base_oloc);
req->r_flags = CEPH_OSD_FLAG_WRITE;
- ktime_get_real_ts(&req->r_mtime);
+ ktime_get_real_ts64(&req->r_mtime);
osd_req_op_watch_init(req, 0, lreq->linger_id,
CEPH_OSD_WATCH_OP_UNWATCH);
@@ -5136,7 +5136,7 @@ int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
struct ceph_snap_context *snapc,
u64 off, u64 len,
u32 truncate_seq, u64 truncate_size,
- struct timespec *mtime,
+ struct timespec64 *mtime,
struct page **pages, int num_pages)
{
struct ceph_osd_request *req;
--
2.9.0
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038
^ permalink raw reply related
* [PATCH 1/5] ceph: use timespec64 in for keepalive
From: Arnd Bergmann @ 2018-06-20 15:50 UTC (permalink / raw)
To: Yan Zheng, Sage Weil, Ilya Dryomov, David S. Miller
Cc: y2038, ceph-devel, Arnd Bergmann, linux-kernel, netdev
ceph_con_keepalive_expired() is the last user of timespec_add() and some
of the last uses of ktime_get_real_ts(). Replacing this with timespec64
based interfaces lets us remove that deprecated API.
I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64()
here that take timespec64 structures and convert to/from ceph_timespec,
which is defined to have an unsigned 32-bit tv_sec member. This extends
the range of valid times to year 2106, avoiding the year 2038 overflow.
The ceph file system portion still uses the old functions for inode
timestamps, this will be done separately after the VFS layer is converted.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
include/linux/ceph/decode.h | 20 +++++++++++++++++++-
include/linux/ceph/messenger.h | 2 +-
net/ceph/auth_x.c | 14 +++++++-------
net/ceph/auth_x.h | 2 +-
net/ceph/cls_lock_client.c | 4 ++--
net/ceph/messenger.c | 20 ++++++++++----------
6 files changed, 40 insertions(+), 22 deletions(-)
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
index d143ac8879c6..45efe09fb7b4 100644
--- a/include/linux/ceph/decode.h
+++ b/include/linux/ceph/decode.h
@@ -194,8 +194,26 @@ ceph_decode_skip_n(p, end, sizeof(u8), bad)
} while (0)
/*
- * struct ceph_timespec <-> struct timespec
+ * struct ceph_timespec <-> struct timespec64
*/
+static inline void ceph_decode_timespec64(struct timespec64 *ts,
+ const struct ceph_timespec *tv)
+{
+ /*
+ * this will still overflow in year 2106. We could extend
+ * the protocol to steal two more bits from tv_nsec to
+ * add three more 136 year epochs after that the way ext4
+ * does if necessary.
+ */
+ ts->tv_sec = (time64_t)le32_to_cpu(tv->tv_sec);
+ ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
+}
+static inline void ceph_encode_timespec64(struct ceph_timespec *tv,
+ const struct timespec64 *ts)
+{
+ tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
+ tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
+}
static inline void ceph_decode_timespec(struct timespec *ts,
const struct ceph_timespec *tv)
{
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index c7dfcb8a1fb2..a718b877c597 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -330,7 +330,7 @@ struct ceph_connection {
int in_base_pos; /* bytes read */
__le64 in_temp_ack; /* for reading an ack */
- struct timespec last_keepalive_ack; /* keepalive2 ack stamp */
+ struct timespec64 last_keepalive_ack; /* keepalive2 ack stamp */
struct delayed_work work; /* send|recv work */
unsigned long delay; /* current delay interval */
diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
index 2f4a1baf5f52..b05c3a540a5a 100644
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -149,12 +149,12 @@ static int process_one_ticket(struct ceph_auth_client *ac,
void *dp, *dend;
int dlen;
char is_enc;
- struct timespec validity;
+ struct timespec64 validity;
void *tp, *tpend;
void **ptp;
struct ceph_crypto_key new_session_key = { 0 };
struct ceph_buffer *new_ticket_blob;
- unsigned long new_expires, new_renew_after;
+ time64_t new_expires, new_renew_after;
u64 new_secret_id;
int ret;
@@ -189,11 +189,11 @@ static int process_one_ticket(struct ceph_auth_client *ac,
if (ret)
goto out;
- ceph_decode_timespec(&validity, dp);
+ ceph_decode_timespec64(&validity, dp);
dp += sizeof(struct ceph_timespec);
- new_expires = get_seconds() + validity.tv_sec;
+ new_expires = ktime_get_real_seconds() + validity.tv_sec;
new_renew_after = new_expires - (validity.tv_sec / 4);
- dout(" expires=%lu renew_after=%lu\n", new_expires,
+ dout(" expires=%llu renew_after=%llu\n", new_expires,
new_renew_after);
/* ticket blob for service */
@@ -385,13 +385,13 @@ static bool need_key(struct ceph_x_ticket_handler *th)
if (!th->have_key)
return true;
- return get_seconds() >= th->renew_after;
+ return ktime_get_real_seconds() >= th->renew_after;
}
static bool have_key(struct ceph_x_ticket_handler *th)
{
if (th->have_key) {
- if (get_seconds() >= th->expires)
+ if (ktime_get_real_seconds() >= th->expires)
th->have_key = false;
}
diff --git a/net/ceph/auth_x.h b/net/ceph/auth_x.h
index 454cb54568af..57ba99f7736f 100644
--- a/net/ceph/auth_x.h
+++ b/net/ceph/auth_x.h
@@ -22,7 +22,7 @@ struct ceph_x_ticket_handler {
u64 secret_id;
struct ceph_buffer *ticket_blob;
- unsigned long renew_after, expires;
+ time64_t renew_after, expires;
};
#define CEPHX_AU_ENC_BUF_LEN 128 /* big enough for encrypted blob */
diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c
index 8d2032b2f225..2105a6eaa66c 100644
--- a/net/ceph/cls_lock_client.c
+++ b/net/ceph/cls_lock_client.c
@@ -32,7 +32,7 @@ int ceph_cls_lock(struct ceph_osd_client *osdc,
int desc_len = strlen(desc);
void *p, *end;
struct page *lock_op_page;
- struct timespec mtime;
+ struct timespec64 mtime;
int ret;
lock_op_buf_size = name_len + sizeof(__le32) +
@@ -63,7 +63,7 @@ int ceph_cls_lock(struct ceph_osd_client *osdc,
ceph_encode_string(&p, end, desc, desc_len);
/* only support infinite duration */
memset(&mtime, 0, sizeof(mtime));
- ceph_encode_timespec(p, &mtime);
+ ceph_encode_timespec64(p, &mtime);
p += sizeof(struct ceph_timespec);
ceph_encode_8(&p, flags);
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index c6413c360771..3f6336248509 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1417,11 +1417,11 @@ static void prepare_write_keepalive(struct ceph_connection *con)
dout("prepare_write_keepalive %p\n", con);
con_out_kvec_reset(con);
if (con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2) {
- struct timespec now;
+ struct timespec64 now;
- ktime_get_real_ts(&now);
+ ktime_get_real_ts64(&now);
con_out_kvec_add(con, sizeof(tag_keepalive2), &tag_keepalive2);
- ceph_encode_timespec(&con->out_temp_keepalive2, &now);
+ ceph_encode_timespec64(&con->out_temp_keepalive2, &now);
con_out_kvec_add(con, sizeof(con->out_temp_keepalive2),
&con->out_temp_keepalive2);
} else {
@@ -2555,7 +2555,7 @@ static int read_keepalive_ack(struct ceph_connection *con)
int ret = read_partial(con, size, size, &ceph_ts);
if (ret <= 0)
return ret;
- ceph_decode_timespec(&con->last_keepalive_ack, &ceph_ts);
+ ceph_decode_timespec64(&con->last_keepalive_ack, &ceph_ts);
prepare_read_tag(con);
return 1;
}
@@ -3223,12 +3223,12 @@ bool ceph_con_keepalive_expired(struct ceph_connection *con,
{
if (interval > 0 &&
(con->peer_features & CEPH_FEATURE_MSGR_KEEPALIVE2)) {
- struct timespec now;
- struct timespec ts;
- ktime_get_real_ts(&now);
- jiffies_to_timespec(interval, &ts);
- ts = timespec_add(con->last_keepalive_ack, ts);
- return timespec_compare(&now, &ts) >= 0;
+ struct timespec64 now;
+ struct timespec64 ts;
+ ktime_get_real_ts64(&now);
+ jiffies_to_timespec64(interval, &ts);
+ ts = timespec64_add(con->last_keepalive_ack, ts);
+ return timespec64_compare(&now, &ts) >= 0;
}
return false;
}
--
2.9.0
^ permalink raw reply related
* [PATCH net] sctp: fix erroneous inc of snmp SctpFragUsrMsgs
From: Marcelo Ricardo Leitner @ 2018-06-20 15:47 UTC (permalink / raw)
To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich
Currently it is incrementing SctpFragUsrMsgs when the user message size
is of the exactly same size as the maximum fragment size, which is wrong.
The fix is to increment it only when user message is bigger than the
maximum fragment size.
Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
net/sctp/chunk.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 79daa98208c391c780440144d69bc7be875c3476..bfb9f812e2ef9fa605b08dc1f534781573c3abf8 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -237,7 +237,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
/* Account for a different sized first fragment */
if (msg_len >= first_len) {
msg->can_delay = 0;
- SCTP_INC_STATS(sock_net(asoc->base.sk), SCTP_MIB_FRAGUSRMSGS);
+ if (msg_len > first_len)
+ SCTP_INC_STATS(sock_net(asoc->base.sk),
+ SCTP_MIB_FRAGUSRMSGS);
} else {
/* Which may be the only one... */
first_len = msg_len;
--
2.14.4
^ permalink raw reply related
* Re: [Y2038] [PATCH] ceph: use ktime_get_real_seconds()
From: Arnd Bergmann @ 2018-06-20 15:47 UTC (permalink / raw)
To: Allen Pais
Cc: Networking, y2038 Mailman List, ceph-devel, David Miller,
Linux Kernel Mailing List
In-Reply-To: <1529488801-22093-1-git-send-email-allen.lkml@gmail.com>
On Wed, Jun 20, 2018 at 12:00 PM, Allen Pais <allen.lkml@gmail.com> wrote:
> Use ktime_get_real_seconds() as get_seconds() is deprecated.
>
> Signed-off-by: Allen Pais <allen.lkml@gmail.com>
I have done a similar patch and will post it soon along with the rest of the
ceph y2038 series. Please have a look at "ceph: use timespec64 in for
keepalive" and comment if you see something that I missed.
Arnd
^ permalink raw reply
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-20 15:38 UTC (permalink / raw)
To: netdev; +Cc: lartc
In-Reply-To: <e19de6ae-536c-7dc5-22e7-62ee7d077c67@spamtrap.tnetconsulting.net>
[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]
On 06/20/2018 09:18 AM, Grant Taylor wrote:
> Where can I find more information on ignore_routes_with_linkdown? I
> don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt.
> (I do see fib_multipath_use_neigh documented there in.)
I'm specifically interested in if ignore_routes_with_linkdown and / or
fib_multipath_use_neigh will cause Linux to fall back to an alternate
(higher metric) route if the link is still up but the neighbor is not
accessible across it.
+-------+
| Linux |
+---+---+
|
+-----+ +---+----+ +-----+
| R 1 +---+ Switch +---+ R 2 |
+-----+ +--------+ +-----+
A typical scenario is where Linux is connected to a DSL or Cable modem
where the physical link stays up even if the neighbor R {1,2} goes
offline. It's not possible to rely on the local link (MII) status to
determine that a neighbor is not reachable. I.e. R 1 going away like below.
+-------+
| Linux |
+---+---+
|
+-----+ +---+----+ +-----+
| R 1 X X Switch +---+ R 2 |
+-----+ +--------+ +-----+
--
Grant. . . .
unix || die
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]
^ permalink raw reply
* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Thomas Walker @ 2018-06-20 15:26 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Greg KH, devel, Stephen Hemminger, netdev
In-Reply-To: <20180619161440.2bc69506@xeon-e3>
On Tue, Jun 19, 2018 at 04:14:40PM -0700, Stephen Hemminger wrote:
> On Wed, 20 Jun 2018 08:01:50 +0900
> Greg KH <greg@kroah.com> wrote:
>
> > On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> > > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check." Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> > >
> > > commit be9c798d0d13ae609a91177323ac816545c39d28
> > > Author: Stephen Hemminger <stephen@networkplumber.org>
> > > Date: Mon May 14 15:32:18 2018 -0700
> > >
> > > hv_netvsc: common detach logic
> > >
> > > [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
> > >
> > > Make common function for detaching internals of device
> > > during changes to MTU and RSS. Make sure no more packets
> > > are transmitted and all packets have been received before
> > > doing device teardown.
> > >
> > > Change the wait logic to be common and use usleep_range().
> > >
> > > Changes transmit enabling logic so that transmit queues are disabled
> > > during the period when lower device is being changed. And enabled
> > > only after sub channels are setup. This avoids issue where it could
> > > be that a packet was being sent while subchannel was not initialized.
> > >
> > > Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > >
> > >
> > > The removal of which does indeed fix the problem. That led me to:
> > >
> > > commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> > > Author: Dexuan Cui <decui@microsoft.com>
> > > Date: Wed Jun 6 21:32:51 2018 +0000
> > >
> > > hv_netvsc: Fix a network regression after ifdown/ifup
> > >
> > > Recently people reported the NIC stops working after
> > > "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
> > > enabled, after the refactoring of the common detach logic: when the NIC
> > > has sub-channels, usually we enable all the TX queues after all
> > > sub-channels are set up: see rndis_set_subchannel() ->
> > > netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
> > > the number of channels doesn't change, we also must make sure the TX queues
> > > are enabled. The patch fixes the regression.
> > >
> > > Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> > > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > > Cc: K. Y. Srinivasan <kys@microsoft.com>
> > > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > >
> > > Which sounded very promising, but does not seem to fully fix the issue. Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads. If I delay the mtu change until well after the driver loads, everything works fine. If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs. So something is racy here, and the above doesn't entirely address it.
> > >
> > > I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> > >
> > > (oh, and I did also try 4.18-rc1 and the problem still persists)
> >
> > Always cc: the authors of the patch you are having problems with, along
> > with the mailing list for the networking subsystem, so that the right
> > people know to look at this.
>
> How are you changing the MTU? and starting the device.
> When MTU changes the device has to reconnect with the host. This takes a small amount of time
> and no changes to device state are possible then.
>
> If MTU change happens and at the same time some other script tries to bring up the connection,
> then that script will get an error.
Simply setting 'mtu 1300' in /etc/network/interfaces (Debian userland) which results in ifup executing:
ip addr add 192.168.1.100/255.255.255.0 broadcast 192.168.1.255 dev eth0 label eth0
ip link set dev eth0 mtu 1300 up
ip route add default via 192.168.1.1 dev eth0
Running those commands standalone (outside of ifup) reproduces the problem as well. Trying lots of different permutations of loading the driver and different commands with different delays, my initial suspicion that this was somehow related to proximity to the driver load appears incorrect. The only thing that seems to matter is setting the mtu and link state in one go. If I separate the 'ip link' command into two, 'ip link set dev eth0 up' and 'ip link set dev eth0 mtu 1300', in either order (or if either link is already up or mtu is already 1300), then there is no error.
(I should note that the above testing is with 4.14.49 with 52acf73)
^ permalink raw reply
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-20 15:18 UTC (permalink / raw)
To: netdev; +Cc: lartc
In-Reply-To: <7e001d45-f692-7d93-0390-f0e6077bf724@cumulusnetworks.com>
[-- Attachment #1: Type: text/plain, Size: 441 bytes --]
On 06/20/2018 07:48 AM, David Ahern wrote:
> See the ignore_routes_with_linkdown and fib_multipath_use_neigh sysctl
> settings.
Where can I find more information on ignore_routes_with_linkdown? I
don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt.
(I do see fib_multipath_use_neigh documented there in.)
> A feature is in the works to have fallback nexthops.
O.o?
--
Grant. . . .
unix || die
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]
^ permalink raw reply
* KASAN: use-after-free Read in __dev_map_entry_free
From: syzbot @ 2018-06-20 15:19 UTC (permalink / raw)
To: ast, daniel, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15ad7d10400000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
dashboard link: https://syzkaller.appspot.com/bug?extid=457d3e2ffbcf31aee5c0
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1195225f800000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=171a7ce4400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+457d3e2ffbcf31aee5c0@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: use-after-free in dev_map_flush_old kernel/bpf/devmap.c:365
[inline]
BUG: KASAN: use-after-free in __dev_map_entry_free+0x2a8/0x300
kernel/bpf/devmap.c:379
Read of size 8 at addr ffff8801b8da38c8 by task ksoftirqd/1/18
CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
__dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
__rcu_reclaim kernel/rcu/rcu.h:178 [inline]
rcu_do_batch kernel/rcu/tree.c:2558 [inline]
invoke_rcu_callbacks kernel/rcu/tree.c:2818 [inline]
__rcu_process_callbacks kernel/rcu/tree.c:2785 [inline]
rcu_process_callbacks+0xe9d/0x1760 kernel/rcu/tree.c:2802
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Allocated by task 6675:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
kmalloc include/linux/slab.h:513 [inline]
kzalloc include/linux/slab.h:706 [inline]
dev_map_alloc+0x208/0x7f0 kernel/bpf/devmap.c:102
find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
map_create+0x393/0x1010 kernel/bpf/syscall.c:453
__do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
__se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
__x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 26:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xd9/0x260 mm/slab.c:3813
dev_map_free+0x4fa/0x670 kernel/bpf/devmap.c:191
bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the object at ffff8801b8da37c0
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 264 bytes inside of
512-byte region [ffff8801b8da37c0, ffff8801b8da39c0)
The buggy address belongs to the page:
page:ffffea0006e368c0 count:1 mapcount:0 mapping:ffff8801da800940
index:0xffff8801b8da3540
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0007217b88 ffffea0006e30cc8 ffff8801da800940
raw: ffff8801b8da3540 ffff8801b8da3040 0000000100000004 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8801b8da3780: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801b8da3800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801b8da3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801b8da3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801b8da3980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply
* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Marcelo Ricardo Leitner @ 2018-06-20 14:45 UTC (permalink / raw)
To: Fu, Qiaobin
Cc: davem@davemloft.net, netdev@vger.kernel.org, jhs@mojatatu.com,
Michel Machado, xiyou.wangcong@gmail.com
In-Reply-To: <E5616B71-959E-43FE-9096-BF523948ABB2@bu.edu>
On Tue, Jun 12, 2018 at 03:42:55PM +0000, Fu, Qiaobin wrote:
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
>
> v4:
> *Not allow setting flags other than the expected ones.
>
> *Allow dumping the pure flags.
>
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
>
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>
> Note that the motivation for this patch is found in the following discussion:
> https://www.spinics.net/lists/netdev/msg501061.html
> ---
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index fbcfe27a4e6c..6de6071ebed6 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -30,6 +30,7 @@
> #define SKBEDIT_F_MARK 0x4
> #define SKBEDIT_F_PTYPE 0x8
> #define SKBEDIT_F_MASK 0x10
> +#define SKBEDIT_F_INHERITDSFIELD 0x20
>
> struct tc_skbedit {
> tc_gen;
> @@ -45,6 +46,7 @@ enum {
> TCA_SKBEDIT_PAD,
> TCA_SKBEDIT_PTYPE,
> TCA_SKBEDIT_MASK,
> + TCA_SKBEDIT_FLAGS,
> __TCA_SKBEDIT_MAX
> };
> #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 6138d1d71900..9adbcfa3f5fe 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -23,6 +23,9 @@
> #include <linux/rtnetlink.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> +#include <net/ip.h>
> +#include <net/ipv6.h>
> +#include <net/dsfield.h>
>
> #include <linux/tc_act/tc_skbedit.h>
> #include <net/tc_act/tc_skbedit.h>
> @@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>
> if (d->flags & SKBEDIT_F_PRIORITY)
> skb->priority = d->priority;
> + if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
> + int wlen = skb_network_offset(skb);
> +
> + switch (tc_skb_protocol(skb)) {
> + case htons(ETH_P_IP):
> + wlen += sizeof(struct iphdr);
> + if (!pskb_may_pull(skb, wlen))
> + goto err;
> + skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> + break;
> +
> + case htons(ETH_P_IPV6):
> + wlen += sizeof(struct ipv6hdr);
> + if (!pskb_may_pull(skb, wlen))
> + goto err;
> + skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> + break;
> + }
> + }
> if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
> skb->dev->real_num_tx_queues > d->queue_mapping)
> skb_set_queue_mapping(skb, d->queue_mapping);
> @@ -53,6 +75,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
>
> spin_unlock(&d->tcf_lock);
> return d->tcf_action;
> +
> +err:
> + spin_unlock(&d->tcf_lock);
> + return TC_ACT_SHOT;
> }
>
> static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
> @@ -62,6 +88,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
> [TCA_SKBEDIT_MARK] = { .len = sizeof(u32) },
> [TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) },
> [TCA_SKBEDIT_MASK] = { .len = sizeof(u32) },
> + [TCA_SKBEDIT_FLAGS] = { .len = sizeof(u64) },
> };
>
> static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -73,6 +100,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> struct tc_skbedit *parm;
> struct tcf_skbedit *d;
> u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
> + u64 *pure_flags = NULL;
> u16 *queue_mapping = NULL, *ptype = NULL;
> bool exists = false;
> int ret = 0, err;
> @@ -114,6 +142,12 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> mask = nla_data(tb[TCA_SKBEDIT_MASK]);
> }
>
> + if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
> + pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> + if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
> + flags |= SKBEDIT_F_INHERITDSFIELD;
> + }
> +
> parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
>
> exists = tcf_idr_check(tn, parm->index, a, bind);
> @@ -178,6 +212,7 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
> .action = d->tcf_action,
> };
> struct tcf_t t;
> + u64 pure_flags = 0;
>
> if (nla_put(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt))
> goto nla_put_failure;
> @@ -196,6 +231,11 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
> if ((d->flags & SKBEDIT_F_MASK) &&
> nla_put_u32(skb, TCA_SKBEDIT_MASK, d->mask))
> goto nla_put_failure;
> + if (d->flags & SKBEDIT_F_INHERITDSFIELD)
> + pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> + if (pure_flags != 0 &&
> + nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
> + goto nla_put_failure;
>
> tcf_tm_dump(&t, &d->tcf_tm);
> if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))
^ permalink raw reply
* Re: [PATCH] ceph: use ktime_get_real_seconds()
From: Allen @ 2018-06-20 14:42 UTC (permalink / raw)
To: idryomov; +Cc: y2038, netdev, ceph-devel, David Miller, linux-kernel
In-Reply-To: <CAOi1vP9Aa9xb2RD7ahE=jOj8nvOX936oVyp6rkKHV8tknF_1Kg@mail.gmail.com>
> >
> > Use ktime_get_real_seconds() as get_seconds() is deprecated.
> >
> > Signed-off-by: Allen Pais <allen.lkml@gmail.com>
> > ---
> > net/ceph/auth_x.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
> > index 2f4a1ba..99561c2 100644
> > --- a/net/ceph/auth_x.c
> > +++ b/net/ceph/auth_x.c
> > @@ -154,7 +154,7 @@ static int process_one_ticket(struct ceph_auth_client *ac,
> > void **ptp;
> > struct ceph_crypto_key new_session_key = { 0 };
> > struct ceph_buffer *new_ticket_blob;
> > - unsigned long new_expires, new_renew_after;
> > + u32 new_expires, new_renew_after;
> > u64 new_secret_id;
> > int ret;
> >
> > @@ -191,9 +191,9 @@ static int process_one_ticket(struct ceph_auth_client *ac,
> >
> > ceph_decode_timespec(&validity, dp);
> > dp += sizeof(struct ceph_timespec);
> > - new_expires = get_seconds() + validity.tv_sec;
> > + new_expires = (u32)ktime_get_real_seconds() + validity.tv_sec;
>
> Why the change to u32 and this cast? If the type has to change,
> wouldn't time64_t make more sense? ktime_get_real_seconds() returns
> time64_t, after all.
>
Right. I'll drop the cast and send out V2.
- Allen
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038
^ permalink raw reply
* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-20 14:34 UTC (permalink / raw)
To: Siwei Liu
Cc: Samudrala, Sridhar, Alexander Duyck, virtio-dev, aaron.f.brown,
Jiri Pirko, Michael S. Tsirkin, Jakub Kicinski, Netdev,
qemu-devel, virtualization, konrad.wilk, boris.ostrovsky,
Joao Martins, Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ20wrF418-WVev6OjaeShZWE488uxRwgaZmsjOHZr8jyOw@mail.gmail.com>
On Tue, 19 Jun 2018 13:09:14 -0700
Siwei Liu <loseweigh@gmail.com> wrote:
> On Tue, Jun 19, 2018 at 3:54 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > On Fri, 15 Jun 2018 10:06:07 -0700
> > Siwei Liu <loseweigh@gmail.com> wrote:
> >
> >> On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> >> > On Thu, 14 Jun 2018 18:57:11 -0700
> >> > Siwei Liu <loseweigh@gmail.com> wrote:
> >> > I'm a bit confused here. What, exactly, ties the two devices together?
> >>
> >> The group UUID. Since QEMU VFIO dvice does not have insight of MAC
> >> address (which it doesn't have to), the association between VFIO
> >> passthrough and standby must be specificed for QEMU to understand the
> >> relationship with this model. Note, standby feature is no longer
> >> required to be exposed under this model.
> >
> > Isn't that a bit limiting, though?
> >
> > With this model, you can probably tie a vfio-pci device and a
> > virtio-net-pci device together. But this will fail if you have
> > different transports: Consider tying together a vfio-pci device and a
> > virtio-net-ccw device on s390, for example. The standby feature bit is
> > on the virtio-net level and should not have any dependency on the
> > transport used.
>
> Probably we'd limit the support for grouping to virtio-net-pci device
> and vfio-pci device only. For virtio-net-pci, as you might see with
> Venu's patch, we store the group UUID on the config space of
> virtio-pci, which is only applicable to PCI transport.
>
> If virtio-net-ccw needs to support the same, I think similar grouping
> interface should be defined on the VirtIO CCW transport. I think the
> current implementation of the Linux failover driver assumes that it's
> SR-IOV VF with same MAC address which the virtio-net-pci needs to pair
> with, and that the PV path is on same PF without needing to update
> network of the port-MAC association change. If we need to extend the
> grouping mechanism to virtio-net-ccw, it has to pass such failover
> mode to virtio driver specifically through some other option I guess.
Hm, I've just spent some time reading the Linux failover code and I did
not really find much pci-related magic in there (other than checking
for a pci device in net_failover_slave_pre_register). We also seem to
look for a matching device by MAC only. What magic am I missing?
Is the look-for-uuid handling supposed to happen in the host only?
> >> > If libvirt already has the knowledge that it should manage the two as a
> >> > couple, why do we need the group id (or something else for other
> >> > architectures)? (Maybe I'm simply missing something because I'm not
> >> > that familiar with pci.)
> >>
> >> The idea is to have QEMU control the visibility and enumeration order
> >> of the passthrough VFIO for the failover scenario. Hotplug can be one
> >> way to achieve it, and perhaps there's other way around also. The
> >> group ID is not just for QEMU to couple devices, it's also helpful to
> >> guest too as grouping using MAC address is just not safe.
> >
> > Sorry about dragging mainframes into this, but this will only work for
> > homogenous device coupling, not for heterogenous. Consider my vfio-pci
> > + virtio-net-ccw example again: The guest cannot find out that the two
> > belong together by checking some group ID, it has to either use the MAC
> > or some needs-to-be-architectured property.
> >
> > Alternatively, we could propose that mechanism as pci-only, which means
> > we can rely on mechanisms that won't necessarily work on non-pci
> > transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> > through a network card anytime in the near future, due to the nature of
> > network cards currently in use on s390.)
>
> Yes, let's do this just for PCI transport (homogenous) for now.
But why? Using pci for passthrough to make things easier (and because
there's not really a use case), sure. But I really don't want to
restrict this to virtio-pci only.
> >> In the model of (b), I think it essentially turns hotplug to one of
> >> mechanisms for QEMU to control the visibility. The libvirt can still
> >> manage the hotplug of individual devices during live migration or in
> >> normal situation to hot add/remove devices. Though the visibility of
> >> the VFIO is under the controll of QEMU, and it's possible that the hot
> >> add/remove request does not involve actual hot plug activity in guest
> >> at all.
> >
> > That depends on how you model visibility, I guess. You'll probably want
> > to stop traffic flowing through one or the other of the cards; would
> > link down or similar be enough for the virtio device?
>
> I'm not sure if it is a good idea. The guest user will see two devices
> with same MAC but one of them is down. Do you expect user to use it or
> not? And since the guest is going to be migrated, we need to unplug a
> broken VF from guest before migrating, why do we bother plugging in
> this useless VF at the first place?
I was thinking about using hotunplugging only over migration and doing
the link up only after feature negotiation has finished, but that is
probably too complicated. Let's stick to hotplug for simplicity's sake.
^ permalink raw reply
* Re: [PATCH] bpfilter: fix user mode helper cross compilation
From: Stefano Brivio @ 2018-06-20 14:19 UTC (permalink / raw)
To: Matteo Croce; +Cc: netdev
In-Reply-To: <20180620140434.18139-1-mcroce@redhat.com>
On Wed, 20 Jun 2018 16:04:34 +0200
Matteo Croce <mcroce@redhat.com> wrote:
> Use $(OBJDUMP) instead of literal 'objdump' to avoid
> using host toolchain when cross compiling.
>
> Fixes: 421780fd4983 ("bpfilter: fix build error")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reported-by: Stefano Brivio <sbrivio@redhat.com>
--
Stefano
^ permalink raw reply
* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-20 14:11 UTC (permalink / raw)
To: Cornelia Huck
Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
Jakub Kicinski, Samudrala, Sridhar, qemu-devel, virtualization,
Siwei Liu, Venu Busireddy, Netdev, boris.ostrovsky, aaron.f.brown,
Joao Martins
In-Reply-To: <20180620115359.1a3bf6fb.cohuck@redhat.com>
On Wed, Jun 20, 2018 at 11:53:59AM +0200, Cornelia Huck wrote:
> On Tue, 19 Jun 2018 23:32:06 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:
> > > Sorry about dragging mainframes into this, but this will only work for
> > > homogenous device coupling, not for heterogenous. Consider my vfio-pci
> > > + virtio-net-ccw example again: The guest cannot find out that the two
> > > belong together by checking some group ID, it has to either use the MAC
> > > or some needs-to-be-architectured property.
> > >
> > > Alternatively, we could propose that mechanism as pci-only, which means
> > > we can rely on mechanisms that won't necessarily work on non-pci
> > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> > > through a network card anytime in the near future, due to the nature of
> > > network cards currently in use on s390.)
> >
> > That's what it boils down to, yes. If there's need to have this for
> > non-pci devices, then we should put it in config space.
> > Cornelia, what do you think?
> >
>
> I think the only really useful config on s390 is the vfio-pci network
> card coupled with a virtio-net-ccw device: Using an s390 network card
> via vfio-ccw is out due to the nature of the s390 network cards, and
> virtio-ccw is the default transport (virtio-pci is not supported on any
> enterprise distro AFAIK).
>
> For this, having a uuid in the config space could work (vfio-pci
> devices have a config space by virtue of being pci devices, and
> virtio-net-ccw devices have a config space by virtue of being virtio
> devices -- ccw devices usually don't have that concept).
OK so this calls for adding such a field generally (it's
device agnostic right now).
How would you suggest doing that?
--
MST
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox