* Re: [PATCH net-next V3 3/3] tun: rx batching
From: Jason Wang @ 2017-01-03 3:12 UTC (permalink / raw)
To: David Miller; +Cc: netdev, virtualization, wexu, kvm, mst
In-Reply-To: <20161231.123147.1116351209865273335.davem@davemloft.net>
On 2017年01月01日 01:31, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Fri, 30 Dec 2016 13:20:51 +0800
>
>> @@ -1283,10 +1314,15 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>> skb_probe_transport_header(skb, 0);
>>
>> rxhash = skb_get_hash(skb);
>> +
>> #ifndef CONFIG_4KSTACKS
>> - local_bh_disable();
>> - netif_receive_skb(skb);
>> - local_bh_enable();
>> + if (!rx_batched) {
>> + local_bh_disable();
>> + netif_receive_skb(skb);
>> + local_bh_enable();
>> + } else {
>> + tun_rx_batched(tfile, skb, more);
>> + }
>> #else
>> netif_rx_ni(skb);
>> #endif
> If rx_batched has been set, and we are talking to clients not using
> this new MSG_MORE facility (or such clients don't have multiple TX
> packets to send to you, thus MSG_MORE is often clear), you are doing a
> lot more work per-packet than the existing code.
>
> You take the queue lock, you test state, you splice into a local queue
> on the stack, then you walk that local stack queue to submit just one
> SKB to netif_receive_skb().
>
> I think you want to streamline this sequence in such cases so that the
> cost before and after is similar if not equivalent.
Yes, so I will do a skb_queue_empty() check if !MSG_MORE and call
netif_receive_skb() immediately in this case. This can save the wasted
efforts.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* linux-next: manual merge of the rdma-leon tree with the net-next tree
From: Stephen Rothwell @ 2017-01-03 1:27 UTC (permalink / raw)
To: Leon Romanovsky, David Miller, Networking; +Cc: linux-next, linux-kernel
Hi Leon,
Today's linux-next merge of the rdma-leon tree got conflicts in:
drivers/infiniband/hw/mlx5/main.c
drivers/infiniband/hw/mlx5/mlx5_ib.h
drivers/infiniband/hw/mlx5/mr.c
drivers/infiniband/hw/mlx5/qp.c
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
drivers/net/ethernet/mellanox/mlx5/core/eq.c
drivers/net/ethernet/mellanox/mlx5/core/main.c
include/linux/mlx5/device.h
include/linux/mlx5/driver.h
include/linux/mlx5/mlx5_ifc.h
between commits in the rdma-leon tree and commits (several of whitch
are the same, or similar, patches) in the net-next tree.
I just dropped the rdma-leon tree for today. Please clean it up.
--
Cheers,
Stephen Rothwell
^ permalink raw reply
* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: Jamal Hadi Salim @ 2017-01-03 1:22 UTC (permalink / raw)
To: John Fastabend, Paul Blakey, David S. Miller, netdev
Cc: Jiri Pirko, Hadar Hen Zion, Or Gerlitz, Roi Dayan, Roman Mashak,
Simon Horman
In-Reply-To: <586ADAFE.1010105@gmail.com>
On 17-01-02 05:58 PM, John Fastabend wrote:
> On 17-01-02 02:21 PM, Jamal Hadi Salim wrote:
> Well having the length value avoids ending up with cookie1, cookie2, ...
> values as folks push more and more data into the cookie.
>
Unless there is good reason I dont see why it shouldnt be a fixed length
value. u64/128 should be plenty.
> I don't see any use in the kernel interpreting it. It has no use
> for it as far as I can see. It doesn't appear to be metadata which
> we use skb->mark for at the moment.
>
Like all cookie semantics it is for storing state. The receiver (kernel)
is not just store it and not intepret it. The user when reading it back
simplifies what they have to do for their processing.
>
> The tuple <ifindex:qdisc:prio:handle> really should be unique why
> not use this for system wide mappings?
>
I think on a single machine should be enough, however:
typically the user wants to define the value in a manner that
in a distributed system it is unique. It would be trickier to
do so with well defined values such as above.
> The only thing I can think to do with this that I can't do with
> the above tuple and a simple userspace lookup is stick hardware specific
> "hints" in the cookie for the firmware to consume. Which would be
> very helpful for what its worth.
>
Ok, very different from our use case with actions.
We just use those values to map to stats info without needing to
know what flow or action (graph) it is associated with.
> Its a bit strange to push it as an action when its not really an
> action in the traditional datapath.
>
The action is part of a graph pointed to by a filter.
> I suspect the OVS usage is a simple 1:1 lookup from OVS id to TC id to
> avoid a userspace map lookup.
Not that i care about OVS but it sounds like a good use case (even for
tc), no?
cheers,
jamal
^ permalink raw reply
* RE: [PATCH v2] scm: fix possible control message header alignment issue
From: YUAN Linyu @ 2017-01-03 0:52 UTC (permalink / raw)
To: David Miller; +Cc: netdev@vger.kernel.org, cugyly@163.com
In-Reply-To: <20161230.152057.1192922867281184266.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Saturday, December 31, 2016 4:21 AM
> To: cugyly@163.com
> Cc: netdev@vger.kernel.org; YUAN Linyu
> Subject: Re: [PATCH v2] scm: fix possible control message header alignment
> issue
> If you can come up with a case where this does happen in
> practice, I will continue to consider this patch.
>
Yes, before send patch I also check two archs(arm-v7 and powerpc e6500), they are aligned.
No one report issue, I think cmsghdr aligned on all archs.
> Otherwise, we should make the assumptions that exist explicit
> and get rid of all of the code that does that funny alignment
> upon the cmsghdr structure.
>
Do you accept that I remove all CMSG{_COMPAT}_ALIGN of header ?
> Thanks.
^ permalink raw reply
* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Ansis Atteka @ 2017-01-03 0:40 UTC (permalink / raw)
To: Hayes Wang
Cc: David Miller, mlord@pobox.com, greg@kroah.com,
romieu@fr.zoreil.com, netdev@vger.kernel.org, nic_swsd,
linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
Ansis Atteka
In-Reply-To: <CAA=3Oqn5_5w+vrg9_7UtZ-5BEbZ9Pm=qdX3WoVZKaAkxcMXYXA@mail.gmail.com>
On Sat, Dec 31, 2016 at 4:07 PM, Ansis Atteka <aatteka@nicira.com> wrote:
> On Wed, Nov 30, 2016 at 3:58 AM, Hayes Wang <hayeswang@realtek.com> wrote:
>> Mark Lord <mlord@pobox.com>
>> [...]
>>> > Not sure why, because there really is no other way for the data to
>>> > appear where it does at the beginning of that URB buffer.
>>> >
>>> > This does seem a rather unexpected burden to place upon someone
>>> > reporting a regression in a USB network driver that corrupts user data.
>>>
>>> If you are the only person who can actively reproduce this, which
>>> seems to be the case right now, this is unfortunately the only way to
>>> reach a proper analysis and fix.
>>
>> I have tested it with iperf more than five days without any error.
>> I would think if there is any other way to reproduce it.
>>
I think that I am getting closer to the root cause of this bug. Also,
I have a workaround that at least makes r8152 functionally stable in
my Dell TB15 dock. Mark, would you mind giving a chance to the patch
that I have in the bottom of this email to see if it helps your issue
too (you might have to tweak those settings slightly differently if
you use something else than USB 3.0)
Long story short - what I observed in Wireshark is that if there are
more than ~10 Ethernet frames *close together to each other* then the
data corruption bug starts to express itself. If there are ~15 or more
Ethernet frames close together to each other then the XHCI starts to
emit the "ERROR Transfer event TRB DMA ptr not part of current TD
ep_index 2 comp_code 13" error message and r8152 driver gets toasted.
Hayes, in your iperf reproduction environment did you
1) connect sender and receiver directly with an Ethernet cable?
2) use iperf's TCP mode instead of UDP mode, because I believe that
with UDP mode packets are more likely to be sparsely distributed?
Also, this bug is way easier to reproduce when IP fragmentation kicks
in because IP fragments are typically sent out very close to each
other.
3) were you plugging your USB Ethernet dongle in USB 3.0 port or
whatever Mark was using? It seems that each USB mode has different
coalesce parameters and yours might have work "out of box"?
While I would not call this a proper fix, because it simply reduces
coalescing timeouts by order of 10X and most likely does not eliminate
security aspects of the bug, it at least made my system functionally
stable and I don't see either of those two bugs in my setup anymore:
git diff
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index c254248..4979690 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -365,9 +365,9 @@
#define PCUT_STATUS 0x0001
/* USB_RX_EARLY_TIMEOUT */
-#define COALESCE_SUPER 85000U
-#define COALESCE_HIGH 250000U
-#define COALESCE_SLOW 524280U
+#define COALESCE_SUPER 8500U
+#define COALESCE_HIGH 25000U
+#define COALESCE_SLOW 52428U
/* USB_WDT11_CTRL */
#define TIMER11_EN 0x0001
^ permalink raw reply related
* [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: Reiter Wolfgang @ 2017-01-03 0:39 UTC (permalink / raw)
To: nhorman; +Cc: davem, netdev, linux-kernel, Reiter Wolfgang
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.
This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
---
net/core/drop_monitor.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index f465bad..fb55327 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -102,7 +102,6 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
}
msg = nla_data(nla);
memset(msg, 0, al);
- genlmsg_end(skb, msg_header);
goto out;
err:
@@ -112,6 +111,13 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
swap(data->skb, skb);
spin_unlock_irqrestore(&data->lock, flags);
+ if (skb) {
+ struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
+ struct genlmsghdr *gnlh = (struct genlmsghdr *)nlmsg_data(nlh);
+
+ genlmsg_end(skb, genlmsg_data(gnlh));
+ }
+
return skb;
}
--
2.9.3
^ permalink raw reply related
* Re: [PATCH] net: fealnx: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-kernel
In-Reply-To: <1483386447-19012-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Mon, 2 Jan 2017 20:47:27 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: faraday: ftmac100: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: joel, gwshan, andrew, andrew, netdev, linux-kernel
In-Reply-To: <1483383191-30477-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Mon, 2 Jan 2017 19:53:11 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: emulex: benet: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr
Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
netdev, linux-kernel
In-Reply-To: <1483375334-28056-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Mon, 2 Jan 2017 17:42:14 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: dlink: sundance: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: kda, netdev, linux-kernel
In-Reply-To: <1483300332-22834-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 20:52:12 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: dlink: dl2k: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-kernel
In-Reply-To: <1483300166-22297-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 20:49:26 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> The previous implementation of set_settings was modifying
> the value of speed and duplex, but with the new API, it's not
> possible. The structure ethtool_link_ksettings is defined
> as const.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: dec: winbond-840: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-parisc, linux-kernel
In-Reply-To: <1483300021-21783-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 20:47:01 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: dec: uli526x: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-parisc, linux-kernel
In-Reply-To: <1483294266-6222-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:11:06 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: dec: de2104x: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: jarod, netdev, linux-parisc, linux-kernel
In-Reply-To: <1483293938-5227-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:05:38 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 2/2] net: sfc: falcon: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:30 UTC (permalink / raw)
To: tremyfr
Cc: linux-net-drivers, ecree, bkenward, andrew, f.fainelli, netdev,
linux-kernel
In-Reply-To: <1483293766-4581-2-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:02:46 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
> ---
> Changelog:
> v2:
> - simplify the code of ef4_ethtool_get_link_ksettings
> (feedback from Bert Kenward)
Applied.
^ permalink raw reply
* Re: [PATCH v2 1/2] net: mdio: add mdio45_ethtool_ksettings_get
From: David Miller @ 2017-01-03 0:30 UTC (permalink / raw)
To: tremyfr
Cc: linux-net-drivers, ecree, bkenward, andrew, f.fainelli, netdev,
linux-kernel
In-Reply-To: <1483293766-4581-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:02:45 +0100
> There is a function in mdio for the old ethtool api gset.
> We add a new function mdio45_ethtool_ksettings_get for the
> new ethtool api glinksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
> ---
> Changelog:
> v2:
> - simplify the code of ef4_ethtool_get_link_ksettings
> (feedback from Bert Kenward)
Applied.
^ permalink raw reply
* Re: [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: David Miller @ 2017-01-03 0:30 UTC (permalink / raw)
To: wr0112358; +Cc: nhorman, netdev, linux-kernel
In-Reply-To: <20170102233410.4070-1-wr0112358@gmail.com>
From: Reiter Wolfgang <wr0112358@gmail.com>
Date: Tue, 3 Jan 2017 00:34:10 +0100
> Final nlmsg_len field update must reflect inserted net_dm_drop_point
> data.
>
> This patch depends on previous patch:
> "drop_monitor: add missing call to genlmsg_end"
>
> Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Several coding style errors:
> @@ -112,6 +111,12 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
> swap(data->skb, skb);
> spin_unlock_irqrestore(&data->lock, flags);
>
> + if(skb) {
There must be a space between "if" and "(skb)"
> + struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
> + struct genlmsghdr *gnlh = (struct genlmsghdr *)nlmsg_data(nlh);
> + genlmsg_end(skb, genlmsg_data(gnlh));
> + }
There should be an empty line between the local variable declarations
and actual code.
^ permalink raw reply
* [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: Reiter Wolfgang @ 2017-01-02 23:34 UTC (permalink / raw)
To: nhorman; +Cc: davem, netdev, linux-kernel, Reiter Wolfgang
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.
This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
---
net/core/drop_monitor.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index f465bad..ccaaf3e 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -102,7 +102,6 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
}
msg = nla_data(nla);
memset(msg, 0, al);
- genlmsg_end(skb, msg_header);
goto out;
err:
@@ -112,6 +111,12 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
swap(data->skb, skb);
spin_unlock_irqrestore(&data->lock, flags);
+ if(skb) {
+ struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
+ struct genlmsghdr *gnlh = (struct genlmsghdr *)nlmsg_data(nlh);
+ genlmsg_end(skb, genlmsg_data(gnlh));
+ }
+
return skb;
}
--
2.9.3
^ permalink raw reply related
* Re: [PATCH for-next V2 00/11] Mellanox mlx5 core and ODP updates 2017-01-01
From: Saeed Mahameed @ 2017-01-02 23:30 UTC (permalink / raw)
To: David Miller
Cc: Saeed Mahameed, Doug Ledford, Linux Netdev List, linux-rdma,
Leon Romanovsky, ilyal, artemyko
In-Reply-To: <20170102.155341.1496970446718890403.davem@davemloft.net>
On Mon, Jan 2, 2017 at 10:53 PM, David Miller <davem@davemloft.net> wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Mon, 2 Jan 2017 11:37:37 +0200
>
>> The following eleven patches mainly come from Artemy Kovalyov
>> who expanded mlx5 on-demand-paging (ODP) support. In addition
>> there are three cleanup patches which don't change any functionality,
>> but are needed to align codebase prior accepting other patches.
>
> Series applied to net-next, thanks.
Whoops,
This series was meant as a pull request, you can blame it on me I
kinda messed up the V2 title.
Doug will have to pull same patches later, will this produce a
conflict on merge window ?
Sorry for the confusion.
^ permalink raw reply
* Re: tcp_bbr: Forcing set of BBR congestion control as default
From: Neal Cardwell @ 2017-01-02 23:16 UTC (permalink / raw)
To: sedat.dilek; +Cc: Netdev
In-Reply-To: <CA+icZUVkui7PaeD7JAfdqJZQY2c9goANwi9foFej3=CsZX+SVA@mail.gmail.com>
On Mon, Jan 2, 2017 at 2:30 PM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> OK, this looks now good.
Great. Glad to hear it!
> Does BBR only work with fq-qdisc best?
Yes. BBR is designed to work with pacing. And so far the "fq" qdisc is
the only qdisc that offers pacing. So BBR currently needs the "fq"
qdisc. In the future, other qdiscs (or even other layers of the stack)
may offer pacing, in which case BBR could use those as well.
> What about fq_codel?
The "fq_codel" qdisc does not implement pacing, so it would not be sufficient.
cheers,
neal
^ permalink raw reply
* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
From: Willem de Bruijn @ 2017-01-02 23:15 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <20170102230242.GC31716@oracle.com>
On Mon, Jan 2, 2017 at 6:02 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/02/17 17:31), Willem de Bruijn wrote:
>>
>> Thanks for adding this.
>>
>> walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
>> just be extended to add a v3 case where it already multiplexes between
>> v1 and v2.
>
> I looked at that, but the sticky point is that v1/v2 sets up the
> ring->rd* related variables based on frames (e.g., rd_num is tp_frame_nr)
> whereas V3 sets these up based on blocks (e.g, rd_num is tp_block_nr)
> so this impacts the core sending loop a bit.
Good point. Yes, deduplicating the function will help make it crystal
clear where v3 differs from v2.
The patch already has __v3_tx_kernel_ready and __v3_tx_user_ready,
which can be plugged into the existing multiplexer functions
__v1_v2_tx_kernel_ready and __v2_v2_tx_user_ready multiplexer
(along with changing their names).
We'll indeed need a similar multiplexer function for calculating the next
frame to work around this rd_num issue, then.
^ permalink raw reply
* Re: [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
From: Sowmini Varadhan @ 2017-01-02 23:07 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <CAF=yD-J2Sw0=iTuvPe-sa01Ld0D1U6f=R-99WLRKggap_WKdug@mail.gmail.com>
On (01/02/17 17:57), Willem de Bruijn wrote:
> One more point. We should validate the tpacket_req3 input and fail if
> unsupported options are passed. Specifically, fail if any of
> {tp_retire_blk_tov, tp_sizeof_priv, tp_feature_req_word} is non-zero.
>
> Otherwise looks good to me.
Ok, I'll send out v3 tomorrow, with the test case also updated
to share code with walk_v1_v2_tx as cleanly as possible.
Thanks for the review feedback!
--Sowmini
^ permalink raw reply
* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
From: Sowmini Varadhan @ 2017-01-02 23:02 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <CAF=yD-JLDM9cekGTz7m3VYXmf4d+HEMX4dYZx=sB1X7+5drqsA@mail.gmail.com>
On (01/02/17 17:31), Willem de Bruijn wrote:
>
> Thanks for adding this.
>
> walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
> just be extended to add a v3 case where it already multiplexes between
> v1 and v2.
I looked at that, but the sticky point is that v1/v2 sets up the
ring->rd* related variables based on frames (e.g., rd_num is tp_frame_nr)
whereas V3 sets these up based on blocks (e.g, rd_num is tp_block_nr)
so this impacts the core sending loop a bit.
I suppose we could change the walk_v2_v2_tx to be something like
while (total_packets > 0) {
if (ring->version) {
/* V3 send, that takes above difference into account */
} else {
/* existing code */
}
/* status_bar_update(), user_ready update frame_num */
}
I can change it as above, if you think this would help.
--Sowmini
^ permalink raw reply
* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: John Fastabend @ 2017-01-02 22:58 UTC (permalink / raw)
To: Jamal Hadi Salim, Paul Blakey, David S. Miller, netdev
Cc: Jiri Pirko, Hadar Hen Zion, Or Gerlitz, Roi Dayan, Roman Mashak,
Simon Horman
In-Reply-To: <6b671aaf-d35d-70a5-65e0-40308baeb471@mojatatu.com>
On 17-01-02 02:21 PM, Jamal Hadi Salim wrote:
> On 17-01-02 01:23 PM, John Fastabend wrote:
>
>>
>> Additionally I would like to point out this is an arbitrary length binary
>> blob (for undefined use, without even a specified encoding) that gets pushed
>> between user space and hardware ;) This seemed to get folks fairly excited in
>> the past.
>>
>
> The binary blob size is a little strange - but i think there is value
> in storing some "cookie" field. The challenge is whether the kernel
> gets to intepret it; in which case encoding must be specified. Or
> whether we should leave it up to user space - in which something
> like tc could standardize its own encodings.
>
Well having the length value avoids ending up with cookie1, cookie2, ...
values as folks push more and more data into the cookie.
I don't see any use in the kernel interpreting it. It has no use
for it as far as I can see. It doesn't appear to be metadata which
we use skb->mark for at the moment.
>> Some questions, exactly what do you mean by "port mappings" above? In
>> general the 'tc' API uses the netdev the netlink msg is processed on as
>> the port mapping. If you mean OVS port to netdev port I think this is
>> a OVS problem and nothing to do with 'tc'. For what its worth there is an
>> existing problem with 'tc' where rules only apply to a single ingress or
>> egress port which is limiting on hardware.
>>
>
> In our case the desire is to be able to correlate for a system wide
> mostly identity/key mapping.
>
The tuple <ifindex:qdisc:prio:handle> really should be unique why
not use this for system wide mappings?
The only thing I can think to do with this that I can't do with
the above tuple and a simple userspace lookup is stick hardware specific
"hints" in the cookie for the firmware to consume. Which would be
very helpful for what its worth.
>> The UFID in my ovs code base is defined as best I can tell here,
>>
>> [OVS_FLOW_ATTR_UFID] = { .type = NL_A_UNSPEC, .optional = true,
>> .min_len = sizeof(ovs_u128) },
>>
>> So you need 128 bits if you want a 1:1 mapping onto 'tc'. So rather
>> than an arbitrary blob why not make the case that 'tc' ids need to be
>> 128 bits long? Even if its just initially done in flower call it
>> flower_flow_id and define it so its not opaque and at least at the code
>> level it isn't an arbitrary blob of data.
>>
>
> I dont know what this UFID is, but do note:
> The idea is not new - the FIB for example has some such cookie
> (albeit a tiny one) which will typically be populated to tell
> you who/what installed the entry.
> I could see f.e use for this cookie to simplify and pretty print in
> a human language for the u32 classifier (i.e user space tc sets
> some fields in the cookie when updating kernel and when user space
> invokes get/dump it uses the cookie to intepret how to pretty print).
>
> I have attached a compile tested version of the cookies on actions
> (flat 64 bit; now that we have experienced the use when we have a
> large number of counters - I would not mind a 128 bit field).
>
Its a bit strange to push it as an action when its not really an
action in the traditional datapath.
I suspect the OVS usage is a simple 1:1 lookup from OVS id to TC id to
avoid a userspace map lookup.
>
> cheers,
> jamal
>
>> And what are the "next" uses of this besides OVS. It would be really
>> valuable to see how this generalizes to other usage models. To avoid
>> embedding OVS syntax into 'tc'.
>>
>> Finally if you want to see an example of binary data encodings look at
>> how drivers/hardware/users are currently using the user defined bits in
>> ethtools ntuple API. Also track down out of tree drivers to see other
>> interesting uses. And that was capped at 64bits :/
>>
>> Thanks,
>> John
>>
>>
>>
>>
>>
>
^ permalink raw reply
* Re: [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
From: Willem de Bruijn @ 2017-01-02 22:57 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <bc66616d08758cfe225a4b02316b72de16e00302.1483309545.git.sowmini.varadhan@oracle.com>
On Sun, Jan 1, 2017 at 5:45 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
> does not currently have TX_RING support. As a result an application
> that wants the best perf for Tx and Rx (e.g. to handle request/response
> transacations) ends up needing 2 sockets, one with *_v2 for Tx and
> another with *_v3 for Rx.
>
> This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
> so that an application can use a single descriptor to get the benefits
> of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
> first filling up multiple frames in the Tx ring and then triggering a
> transmit. This patch only support fixed size Tx frames for TPACKET_V3,
> and requires that tp_next_offset must be zero.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
> v2: sanity checks on tp_next_offset and corresponding Doc updates
> as suggested by Willem de Bruijn
>
> Documentation/networking/packet_mmap.txt | 9 +++++++--
> net/packet/af_packet.c | 27 +++++++++++++++++++--------
> 2 files changed, 26 insertions(+), 10 deletions(-)
> @@ -4180,9 +4193,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
> goto out;
> switch (po->tp_version) {
> case TPACKET_V3:
> - /* Transmit path is not supported. We checked
> - * it above but just being paranoid
> - */
> + /* Block transmit is not supported yet */
> if (!tx_ring)
> init_prb_bdqc(po, rb, pg_vec, req_u);
One more point. We should validate the tpacket_req3 input and fail if
unsupported options are passed. Specifically, fail if any of
{tp_retire_blk_tov, tp_sizeof_priv, tp_feature_req_word} is non-zero.
Otherwise looks good to me.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox