* Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Alexei Starovoitov @ 2016-09-13 1:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: John Fastabend, bblanco, jeffrey.t.kirsher, brouer, davem,
xiyou.wangcong, intel-wired-lan, u9012063, netdev
In-Reply-To: <1473723968.18970.111.camel@edumazet-glaptop3.roam.corp.google.com>
On Mon, Sep 12, 2016 at 04:46:08PM -0700, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>
> > yep. there are various ways to shoot yourself in the foot with xdp.
> > The simplest program that drops all the packets will make the box unpingable.
>
> Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> scooter on 101 highway ;)
>
> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.
that's fair critique. there is no documentation for xdp in general.
> BTW, I am not sure mlx4 implementation even works, vs BQL :
it doesn't and it shouldn't. xdp and stack use different tx queues.
> mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> but tx completion will call netdev_tx_completed_queue() -> crash
not quite. netdev_tx_completed_queue() is not called for xdp queues.
> Do we have one test to validate that a XDP_TX implementation is actually
> correct ?
it's correct in the scope that it was defined for.
There is no objective to share the same tx queue with the stack in xdp.
The queues must be absolutely separate otherwise performance will tank.
This e1k patch is really special, because the e1k has one tx queue,
but this is for debugging of the programs, so it's ok to cut corners.
The e1k xdp support doesn't need to interact nicely with stack.
It's impossible in the first place. xdp is the layer before the stack.
^ permalink raw reply
* Re: [PATCH 3/3] net-next: dsa: add new driver for qca8xxx family
From: Andrew Lunn @ 2016-09-13 1:23 UTC (permalink / raw)
To: John Crispin
Cc: David S. Miller, Florian Fainelli, netdev, linux-kernel,
qsdk-review
In-Reply-To: <1473669337-21221-4-git-send-email-john@phrozen.org>
> +static int
> +qca8k_setup(struct dsa_switch *ds)
> +{
> + struct qca8k_priv *priv = qca8k_to_priv(ds);
> + int ret, i, phy_mode = -1;
> +
> + /* Keep track of which port is the connected to the cpu. This can be
> + * phy 11 / port 0 or phy 5 / port 6.
> + */
> + switch (dsa_upstream_port(ds)) {
> + case 11:
> + priv->cpu_port = 0;
> + break;
> + case 5:
> + priv->cpu_port = 6;
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
Hi John
Can you explain this funkiness with CPU port numbers. Why use 11
instead of 0? I ideally i would like to use 0 here, but if that it not
possible, it needs documenting in the device tree binding that 5 and
11 are special, representing ports 0 and 6.
> +
> + /* Start by setting up the register mapping */
> + priv->regmap = devm_regmap_init(ds->dev, NULL, priv,
> + &qca8k_regmap_config);
> +
> + if (IS_ERR(priv->regmap))
> + pr_warn("regmap initialization failed");
> +
> + /* Initialize CPU port pad mode (xMII type, delays...) */
> + phy_mode = of_get_phy_mode(ds->ports[ds->dst->cpu_port].dn);
> + if (phy_mode < 0) {
> + pr_err("Can't find phy-mode for master device\n");
> + return phy_mode;
> + }
> + ret = qca8k_set_pad_ctrl(priv, priv->cpu_port, phy_mode);
> + if (ret < 0)
> + return ret;
> +
> + /* Enable CPU Port */
> + qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL0,
> + QCA8K_GLOBAL_FW_CTRL0_CPU_PORT_EN);
Does it matter which port is the CPU port here? 11 or 5
> +
> + /* Enable MIB counters */
> + qca8k_reg_set(priv, QCA8K_REG_MIB, QCA8K_MIB_CPU_KEEP);
> + qca8k_write(priv, QCA8K_REG_MODULE_EN, QCA8K_MODULE_EN_MIB);
> +
> + /* Enable QCA header mode on Port 0 */
> + qca8k_write(priv, QCA8K_REG_PORT_HDR_CTRL(priv->cpu_port),
> + QCA8K_PORT_HDR_CTRL_ALL << QCA8K_PORT_HDR_CTRL_TX_S |
> + QCA8K_PORT_HDR_CTRL_ALL << QCA8K_PORT_HDR_CTRL_RX_S);
Does the header mode only work on port 0?
> +static int
> +qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
> +{
> + struct qca8k_priv *priv = qca8k_to_priv(ds);
> +
> + return mdiobus_read(priv->bus, phy, regnum);
> +}
> +
> +static int
> +qca8k_phy_write(struct dsa_switch *ds, int phy, int regnum, u16 val)
> +{
> + struct qca8k_priv *priv = qca8k_to_priv(ds);
> +
> + return mdiobus_write(priv->bus, phy, regnum, val);
> +}
snip
> +static int
> +qca8k_sw_probe(struct mdio_device *mdiodev)
> +{
> + struct qca8k_priv *priv;
> + u32 phy_id;
> +
> + /* sw_addr is irrelevant as the switch occupies the MDIO bus from
> + * addresses 0 to 4 (PHYs) and 16-23 (for MDIO 32bits protocol). So
> + * we'll probe address 0 to see if we see the right switch family.
> + */
So lets see if i have this right.
Port 0 has no internal phy.
Port 1 has an internal PHY at MDIO address 0.
Port 2 has an internal PHY at MDIO address 1.
...
Port 5 has an internal PHY ad MDIO address 4.
Port 6 has no internal PHY.
This is why you have funky port numbers, and phy_to_port.
I think it would be a lot cleaner to handle this in qca8k_phy_read()
and qca8k_phy_write().
Also, the comment it a bit misleading. You are probing the PHY ID, not
the switch ID. At least for the Marvell switches, different switches
can have the same embedded PHY. It would be annoying to find there is
another incompatible switch with the same PHY ID.
Is the embedded PHY compatible with the at803x driver?
Thanks
Andrew
^ permalink raw reply
* Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Alexei Starovoitov @ 2016-09-13 1:28 UTC (permalink / raw)
To: Tom Herbert
Cc: Eric Dumazet, John Fastabend, Brenden Blanco, Jeff Kirsher,
Jesper Dangaard Brouer, David S. Miller, Cong Wang,
intel-wired-lan, William Tu, Linux Kernel Network Developers
In-Reply-To: <CALx6S35CE+ZYbGm96Y5YKnTqU0zfhg9_h0VSpD3ot8xB-scdFQ@mail.gmail.com>
On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> >
> >> yep. there are various ways to shoot yourself in the foot with xdp.
> >> The simplest program that drops all the packets will make the box unpingable.
> >
> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> > scooter on 101 highway ;)
> >
> > This XDP_TX thing was one of the XDP marketing stuff, but there is
> > absolutely no documentation on it, warning users about possible
> > limitations/outcomes.
> >
> > BTW, I am not sure mlx4 implementation even works, vs BQL :
> >
> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> > but tx completion will call netdev_tx_completed_queue() -> crash
> >
> > Do we have one test to validate that a XDP_TX implementation is actually
> > correct ?
> >
> Obviously not for e1000 :-(. We really need some real test and
> performance results and analysis on the interaction between the stack
> data path and XDP data path.
no. we don't need it for e1k and we cannot really do it.
<broken record mode on> this patch is for debugging of xdp programs only.
> The fact that these changes are being
> passed of as something only needed for KCM is irrelevant, e1000 is a
> well deployed a NIC and there's no restriction that I see that would
> prevent any users from enabling this feature on real devices.
e1k is not even manufactured any more. Probably the only place
where it can be found is computer history museum.
e1000e fairs slightly better, but it's a different nic and this
patch is not about it.
^ permalink raw reply
* Re: icmpv6: issue with routing table entries from link local addresses
From: Sowmini Varadhan @ 2016-09-13 2:03 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Andreas Hübner, netdev, d. caratti, Sowmini Varadhan
In-Reply-To: <b6804e09-403f-64ff-257b-6c5fda7f8047@stressinduktion.org>
On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hello,
>
> On 12.09.2016 16:27, Andreas Hübner wrote:
>>
>> I have the following setup:
>> - 2 directly connected hosts (A+B), both have only link local addresses
>> configured (interface on both hosts is eth0)
:
>> fe80::/64 dev eth1 proto kernel metric 256
>> fe80::/64 dev eth0 proto kernel metric 256
:
>> The issue I currently have is, that the echo reply that host B should
>> generate is never sent back to host A. If I change the order of the
>> routing table entries on host B, everything works fine.
>> (host A is connected on eth0)
:
> This shouldn't be the case. We certainly carry over the ifindex of the
> received packet into the routing lookup of the outgoing packet, thus the
> appropriate rule, with outgoing ifindex should be selected.
Like Hannes, I too would first check "is B sending out the echo-resp? on
which interface?".
But a couple of unexpected things I noticed in linux: the link-local
prefix should have a prefixlen of /10 according to
http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
but "ip -6 route show" lists this as a /64..
moreover, even though I cannot use "ip [-6] route add.." to add the
same prefix multiple times (with different nexthop and/or interface)
unless I explicitly mark them as ECMP with /sbin/ip, it seems like you
can create the same onlink prefix on multiple interfaces, but the
kernel will not treat this as an ECMP group (and sometimes this
can produce inconsistent results depending on the order of
route addition, e.g., for ipv4 rp_filter checks). I dont know if some
variant of this (latter observation) may be the reason for the behavior
that Andreas reports.
--Sowmini
^ permalink raw reply
* Re: icmpv6: issue with routing table entries from link local addresses
From: Hannes Frederic Sowa @ 2016-09-13 2:42 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Andreas Hübner, netdev, d. caratti, Sowmini Varadhan
In-Reply-To: <CACP96tTFX7nyGz6uARYxemvE7-WDxvX7Wj+Jf95PC0Advy7ynw@mail.gmail.com>
On 13.09.2016 04:03, Sowmini Varadhan wrote:
> On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello,
>>
>> On 12.09.2016 16:27, Andreas Hübner wrote:
>
>>>
>>> I have the following setup:
>>> - 2 directly connected hosts (A+B), both have only link local addresses
>>> configured (interface on both hosts is eth0)
> :
>>> fe80::/64 dev eth1 proto kernel metric 256
>>> fe80::/64 dev eth0 proto kernel metric 256
> :
>>> The issue I currently have is, that the echo reply that host B should
>>> generate is never sent back to host A. If I change the order of the
>>> routing table entries on host B, everything works fine.
>>> (host A is connected on eth0)
> :
>> This shouldn't be the case. We certainly carry over the ifindex of the
>> received packet into the routing lookup of the outgoing packet, thus the
>> appropriate rule, with outgoing ifindex should be selected.
>
> Like Hannes, I too would first check "is B sending out the echo-resp? on
> which interface?".
>
> But a couple of unexpected things I noticed in linux: the link-local
> prefix should have a prefixlen of /10 according to
> http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> but "ip -6 route show" lists this as a /64..
The link local subnet is still specified to be a /64 as the other parts
of the address must be 0. Legally we probably could blackhole them.
https://tools.ietf.org/html/rfc4291#section-2.5.6
> moreover, even though I cannot use "ip [-6] route add.." to add the
> same prefix multiple times (with different nexthop and/or interface)
> unless I explicitly mark them as ECMP with /sbin/ip, it seems like you
> can create the same onlink prefix on multiple interfaces, but the
> kernel will not treat this as an ECMP group (and sometimes this
> can produce inconsistent results depending on the order of
> route addition, e.g., for ipv4 rp_filter checks). I dont know if some
> variant of this (latter observation) may be the reason for the behavior
> that Andreas reports.
iproute sets the NLM_F_EXCL flag. Use ip route prepend ...
We don't have urpf checks for ipv6, those are implemented in netfilter
only. This could very well be a firewall issue or something like that.
Bye,
Hannes
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 1/3] e1000: track BQL bytes regardless of skb or not
From: Alexander Duyck @ 2016-09-13 3:00 UTC (permalink / raw)
To: John Fastabend
Cc: Brenden Blanco, Alexei Starovoitov, Jeff Kirsher,
Jesper Dangaard Brouer, David Miller, Cong Wang, intel-wired-lan,
u9012063, Netdev
In-Reply-To: <20160912221327.5610.74333.stgit@john-Precision-Tower-5810>
On Mon, Sep 12, 2016 at 3:13 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> The BQL API does not reference the sk_buff nor does the driver need to
> reference the sk_buff to calculate the length of a transmitted frame.
> This patch removes an sk_buff reference from the xmit irq path and
> also allows packets sent from XDP to use BQL.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> drivers/net/ethernet/intel/e1000/e1000_main.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index f42129d..62a7f8d 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3882,11 +3882,8 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
> if (cleaned) {
> total_tx_packets += buffer_info->segs;
> total_tx_bytes += buffer_info->bytecount;
> - if (buffer_info->skb) {
> - bytes_compl += buffer_info->skb->len;
> - pkts_compl++;
> - }
> -
> + bytes_compl += buffer_info->length;
> + pkts_compl++;
> }
> e1000_unmap_and_free_tx_resource(adapter, buffer_info);
> tx_desc->upper.data = 0;
Actually it might be worth looking into why we have two different
stats for tracking bytecount and segs. From what I can tell the
pkts_compl value is never actually used. The function doesn't even
use the value so it is just wasted cycles. And as far as the bytes go
the accounting would be more accurate if you were to use bytecount
instead of buffer_info->skb->len. You would just need to update the
xmit function to use that on the other side so that they match.
- Alex
^ permalink raw reply
* Re: icmpv6: issue with routing table entries from link local addresses
From: YOSHIFUJI Hideaki @ 2016-09-13 3:01 UTC (permalink / raw)
To: Sowmini Varadhan, Hannes Frederic Sowa
Cc: hideaki.yoshifuji, Andreas Hübner, netdev, d. caratti,
Sowmini Varadhan
In-Reply-To: <CACP96tTFX7nyGz6uARYxemvE7-WDxvX7Wj+Jf95PC0Advy7ynw@mail.gmail.com>
Sowmini Varadhan wrote:
> On Mon, Sep 12, 2016 at 1:26 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello,
>>
>> On 12.09.2016 16:27, Andreas Hübner wrote:
>
>>>
>>> I have the following setup:
>>> - 2 directly connected hosts (A+B), both have only link local addresses
>>> configured (interface on both hosts is eth0)
> :
>>> fe80::/64 dev eth1 proto kernel metric 256
>>> fe80::/64 dev eth0 proto kernel metric 256
> :
>>> The issue I currently have is, that the echo reply that host B should
>>> generate is never sent back to host A. If I change the order of the
>>> routing table entries on host B, everything works fine.
>>> (host A is connected on eth0)
> :
>> This shouldn't be the case. We certainly carry over the ifindex of the
>> received packet into the routing lookup of the outgoing packet, thus the
>> appropriate rule, with outgoing ifindex should be selected.
>
> Like Hannes, I too would first check "is B sending out the echo-resp? on
> which interface?".
>
> But a couple of unexpected things I noticed in linux: the link-local
> prefix should have a prefixlen of /10 according to
> http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> but "ip -6 route show" lists this as a /64..
Do not be confused; link-local address for ethernet is described by
IPv6 over FOO document (e.g., RFC2464 for Ethernet). The address
(fe80::/64 for Ethernet, for example) is defined inside the link-local
scope unicast address space (/10).
--
Hideaki Yoshifuji <hideaki.yoshifuji@miraclelinux.com>
Technical Division, MIRACLE LINUX CORPORATION
^ permalink raw reply
* Re: icmpv6: issue with routing table entries from link local addresses
From: Sowmini Varadhan @ 2016-09-13 3:05 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Sowmini Varadhan, Andreas Hübner, netdev, d. caratti
In-Reply-To: <e2eddc8d-ccd6-682b-265f-55242eef1762@stressinduktion.org>
On (09/13/16 04:42), Hannes Frederic Sowa wrote:
> > But a couple of unexpected things I noticed in linux: the link-local
> > prefix should have a prefixlen of /10 according to
> > http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xhtml
> > but "ip -6 route show" lists this as a /64..
>
> The link local subnet is still specified to be a /64 as the other parts
> of the address must be 0. Legally we probably could blackhole them.
> https://tools.ietf.org/html/rfc4291#section-2.5.6
A bit of a gray area. 4291 does not specify this as MBZ, and IANA
registration is a /10. Both Solaris and BSD use /10. And while fec0
is deprecated, I suppose some similar thing could come up in the
future. ymmv.
> We don't have urpf checks for ipv6, those are implemented in netfilter
> only. This could very well be a firewall issue or something like that.
yes, I know that (no rp_filter check for ipv6), and thats why I said it
may be some similar variant. What tripped me up is that onlink prefixes
(which are multipath routes in that they have the same dst, mask, metric)
are not treated as part of the typical IP_ROUTE_MULTIPATH in many places
in the code because the fib_nhs data-structures do not get set up.
(thus, e.g., one ipoib config I was looking at recently, which
had multiple ports connected to the same IB switch, and had the same
onlink prefix on these ports, would not load-spread across all ports
until I explicitly did the 'ip route change' to tell the kernel to
ecmp that prefix).
Lets see what Andreas reports..
--Sowmini
^ permalink raw reply
* Re: [PATCH 00/15] drivers: net: use IS_ENABLED() instead of checking for built-in or module
From: David Miller @ 2016-09-13 3:28 UTC (permalink / raw)
To: javier
Cc: linux-kernel, arnd, sony.chacko, peppe.cavallaro,
linux-net-drivers, fw, bkenward, geert, ecree, kvalo, mw, a,
intel-wired-lan, f.fainelli, felipe.balbi, gregory.clement,
Dept-HSGLinuxNICDev, jeffrey.t.kirsher, mugunthanvnm, andrew,
klassert, sgruszka, kda, qiang.zhao, netdev, gerlando.falauto,
linux-wireless, ionut, venza, alexandre.torgue
In-Reply-To: <1473689026-6983-1-git-send-email-javier@osg.samsung.com>
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 12 Sep 2016 10:03:31 -0400
> This trivial series is similar to [0] for net/ that you already merged, but
> for drivers/net. The patches replaces the open coding to check for a Kconfig
> symbol being built-in or module, with IS_ENABLED() macro that does the same.
Series applied, thanks.
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Alexander Duyck @ 2016-09-13 3:42 UTC (permalink / raw)
To: John Fastabend
Cc: Brenden Blanco, Alexei Starovoitov, Jeff Kirsher,
Jesper Dangaard Brouer, David Miller, Cong Wang, intel-wired-lan,
u9012063, Netdev
In-Reply-To: <20160912221351.5610.29043.stgit@john-Precision-Tower-5810>
On Mon, Sep 12, 2016 at 3:13 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> From: Alexei Starovoitov <ast@fb.com>
>
> This patch adds initial support for XDP on e1000 driver. Note e1000
> driver does not support page recycling in general which could be
> added as a further improvement. However XDP_DROP case will recycle.
> XDP_TX and XDP_PASS do not support recycling.
>
> e1000 only supports a single tx queue at this time so the queue
> is shared between xdp program and Linux stack. It is possible for
> an XDP program to starve the stack in this model.
>
> The XDP program will drop packets on XDP_TX errors. This can occur
> when the tx descriptors are exhausted. This behavior is the same
> for both shared queue models like e1000 and dedicated tx queue
> models used in multiqueue devices. However if both the stack and
> XDP are transmitting packets it is perhaps more likely to occur in
> the shared queue model. Further refinement to the XDP model may be
> possible in the future.
>
> I tested this patch running e1000 in a VM using KVM over a tap
> device.
>
> CC: William Tu <u9012063@gmail.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> drivers/net/ethernet/intel/e1000/e1000.h | 2
> drivers/net/ethernet/intel/e1000/e1000_main.c | 176 +++++++++++++++++++++++++
> 2 files changed, 175 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index d7bdea7..5cf8a0a 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -150,6 +150,7 @@ struct e1000_adapter;
> */
> struct e1000_tx_buffer {
> struct sk_buff *skb;
> + struct page *page;
> dma_addr_t dma;
> unsigned long time_stamp;
> u16 length;
I'm not really a huge fan of adding yet another member to this
structure. Each e1000_tx_buffer is already pretty big at 40 bytes,
pushing it to 48 just means we lose that much more memory. If nothing
else we may wan to look at doing something like creating a union
between the skb, page, and an unsigned long. Then you could use the
lowest bit of the address as a flag indicating if this is a skb or a
page.
> @@ -279,6 +280,7 @@ struct e1000_adapter {
> struct e1000_rx_ring *rx_ring,
> int cleaned_count);
> struct e1000_rx_ring *rx_ring; /* One per active queue */
> + struct bpf_prog *prog;
> struct napi_struct napi;
>
> int num_tx_queues;
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 62a7f8d..232b927 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -32,6 +32,7 @@
> #include <linux/prefetch.h>
> #include <linux/bitops.h>
> #include <linux/if_vlan.h>
> +#include <linux/bpf.h>
>
> char e1000_driver_name[] = "e1000";
> static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
> @@ -842,6 +843,44 @@ static int e1000_set_features(struct net_device *netdev,
> return 0;
> }
>
> +static int e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
> +{
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + struct bpf_prog *old_prog;
> +
> + old_prog = xchg(&adapter->prog, prog);
> + if (old_prog) {
> + synchronize_net();
> + bpf_prog_put(old_prog);
> + }
> +
> + if (netif_running(netdev))
> + e1000_reinit_locked(adapter);
> + else
> + e1000_reset(adapter);
What is the point of the reset? If the interface isn't running is
there anything in the hardware you actually need to cleanup?
> + return 0;
> +}
> +
> +static bool e1000_xdp_attached(struct net_device *dev)
> +{
> + struct e1000_adapter *priv = netdev_priv(dev);
> +
> + return !!priv->prog;
> +}
> +
> +static int e1000_xdp(struct net_device *dev, struct netdev_xdp *xdp)
> +{
> + switch (xdp->command) {
> + case XDP_SETUP_PROG:
> + return e1000_xdp_set(dev, xdp->prog);
> + case XDP_QUERY_PROG:
> + xdp->prog_attached = e1000_xdp_attached(dev);
> + return 0;
> + default:
> + return -EINVAL;
> + }
> +}
> +
> static const struct net_device_ops e1000_netdev_ops = {
> .ndo_open = e1000_open,
> .ndo_stop = e1000_close,
> @@ -860,6 +899,7 @@ static const struct net_device_ops e1000_netdev_ops = {
> #endif
> .ndo_fix_features = e1000_fix_features,
> .ndo_set_features = e1000_set_features,
> + .ndo_xdp = e1000_xdp,
> };
>
> /**
> @@ -1276,6 +1316,9 @@ static void e1000_remove(struct pci_dev *pdev)
> e1000_down_and_stop(adapter);
> e1000_release_manageability(adapter);
>
> + if (adapter->prog)
> + bpf_prog_put(adapter->prog);
> +
> unregister_netdev(netdev);
>
> e1000_phy_hw_reset(hw);
> @@ -1859,7 +1902,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
> struct e1000_hw *hw = &adapter->hw;
> u32 rdlen, rctl, rxcsum;
>
> - if (adapter->netdev->mtu > ETH_DATA_LEN) {
> + if (adapter->netdev->mtu > ETH_DATA_LEN || adapter->prog) {
> rdlen = adapter->rx_ring[0].count *
> sizeof(struct e1000_rx_desc);
> adapter->clean_rx = e1000_clean_jumbo_rx_irq;
If you are really serious about using the page based Rx path we should
probably fix the fact that you take a pretty significant hit on
performance penalty for turning this mode on.
> @@ -1973,6 +2016,11 @@ e1000_unmap_and_free_tx_resource(struct e1000_adapter *adapter,
> dev_kfree_skb_any(buffer_info->skb);
> buffer_info->skb = NULL;
> }
> + if (buffer_info->page) {
> + put_page(buffer_info->page);
> + buffer_info->page = NULL;
> + }
> +
> buffer_info->time_stamp = 0;
> /* buffer_info must be completely set up in the transmit path */
> }
> @@ -3298,6 +3346,69 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
> return NETDEV_TX_OK;
> }
>
> +static void e1000_tx_map_rxpage(struct e1000_tx_ring *tx_ring,
> + struct e1000_rx_buffer *rx_buffer_info,
> + unsigned int len)
> +{
> + struct e1000_tx_buffer *buffer_info;
> + unsigned int i = tx_ring->next_to_use;
> +
> + buffer_info = &tx_ring->buffer_info[i];
> +
> + buffer_info->length = len;
> + buffer_info->time_stamp = jiffies;
> + buffer_info->mapped_as_page = false;
> + buffer_info->dma = rx_buffer_info->dma;
> + buffer_info->next_to_watch = i;
> + buffer_info->page = rx_buffer_info->rxbuf.page;
> +
> + tx_ring->buffer_info[i].skb = NULL;
> + tx_ring->buffer_info[i].segs = 1;
> + tx_ring->buffer_info[i].bytecount = len;
> + tx_ring->buffer_info[i].next_to_watch = i;
> +
> + rx_buffer_info->rxbuf.page = NULL;
> +}
> +
> +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
> + u32 len,
> + struct net_device *netdev,
> + struct e1000_adapter *adapter)
> +{
> + struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
> + struct e1000_hw *hw = &adapter->hw;
> + struct e1000_tx_ring *tx_ring;
> +
> + if (len > E1000_MAX_DATA_PER_TXD)
> + return;
> +
> + /* e1000 only support a single txq at the moment so the queue is being
> + * shared with stack. To support this requires locking to ensure the
> + * stack and XDP are not running at the same time. Devices with
> + * multiple queues should allocate a separate queue space.
> + */
> + HARD_TX_LOCK(netdev, txq, smp_processor_id());
> +
> + tx_ring = adapter->tx_ring;
> +
> + if (E1000_DESC_UNUSED(tx_ring) < 2) {
> + HARD_TX_UNLOCK(netdev, txq);
> + return;
> + }
> +
> + if (netif_xmit_frozen_or_stopped(txq))
> + return;
> +
> + e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
> + netdev_sent_queue(netdev, len);
> + e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
> +
> + writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
> + mmiowb();
> +
> + HARD_TX_UNLOCK(netdev, txq);
> +}
> +
> #define NUM_REGS 38 /* 1 based count */
> static void e1000_regdump(struct e1000_adapter *adapter)
> {
> @@ -4139,6 +4250,19 @@ static struct sk_buff *e1000_alloc_rx_skb(struct e1000_adapter *adapter,
> return skb;
> }
>
> +static inline int e1000_call_bpf(struct bpf_prog *prog, void *data,
> + unsigned int length)
> +{
> + struct xdp_buff xdp;
> + int ret;
> +
> + xdp.data = data;
> + xdp.data_end = data + length;
> + ret = BPF_PROG_RUN(prog, (void *)&xdp);
> +
> + return ret;
> +}
> +
> /**
> * e1000_clean_jumbo_rx_irq - Send received data up the network stack; legacy
> * @adapter: board private structure
> @@ -4157,12 +4281,15 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
> struct pci_dev *pdev = adapter->pdev;
> struct e1000_rx_desc *rx_desc, *next_rxd;
> struct e1000_rx_buffer *buffer_info, *next_buffer;
> + struct bpf_prog *prog;
> u32 length;
> unsigned int i;
> int cleaned_count = 0;
> bool cleaned = false;
> unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>
> + rcu_read_lock(); /* rcu lock needed here to protect xdp programs */
> + prog = READ_ONCE(adapter->prog);
> i = rx_ring->next_to_clean;
> rx_desc = E1000_RX_DESC(*rx_ring, i);
> buffer_info = &rx_ring->buffer_info[i];
> @@ -4188,12 +4315,54 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
>
> cleaned = true;
> cleaned_count++;
> + length = le16_to_cpu(rx_desc->length);
> +
> + if (prog) {
> + struct page *p = buffer_info->rxbuf.page;
> + dma_addr_t dma = buffer_info->dma;
> + int act;
> +
> + if (unlikely(!(status & E1000_RXD_STAT_EOP))) {
> + /* attached bpf disallows larger than page
> + * packets, so this is hw error or corruption
> + */
> + pr_info_once("%s buggy !eop\n", netdev->name);
> + break;
> + }
> + if (unlikely(rx_ring->rx_skb_top)) {
> + pr_info_once("%s ring resizing bug\n",
> + netdev->name);
> + break;
> + }
> + dma_sync_single_for_cpu(&pdev->dev, dma,
> + length, DMA_FROM_DEVICE);
> + act = e1000_call_bpf(prog, page_address(p), length);
> + switch (act) {
> + case XDP_PASS:
> + break;
> + case XDP_TX:
> + dma_sync_single_for_device(&pdev->dev,
> + dma,
> + length,
> + DMA_TO_DEVICE);
> + e1000_xmit_raw_frame(buffer_info, length,
> + netdev, adapter);
Implementing a new xmit path and clean-up routines for just this is
going to be a pain. I'd say if we are going to do something like this
then maybe we should look at coming up with a new ndo for the xmit and
maybe push more of this into some sort of inline hook. Duplicating
this code in every driver is going to be really expensive.
Also I just noticed there is no break statement from the xmit code
above to the drop that below. I'd think you could overwrite the frame
data in a case where the Rx exceeds the Tx due to things like flow
control generating back pressure.
> + case XDP_DROP:
> + default:
> + /* re-use mapped page. keep buffer_info->dma
> + * as-is, so that e1000_alloc_jumbo_rx_buffers
> + * only needs to put it back into rx ring
> + */
> + total_rx_bytes += length;
> + total_rx_packets++;
> + goto next_desc;
> + }
> + }
> +
> dma_unmap_page(&pdev->dev, buffer_info->dma,
> adapter->rx_buffer_len, DMA_FROM_DEVICE);
> buffer_info->dma = 0;
>
> - length = le16_to_cpu(rx_desc->length);
> -
> /* errors is only valid for DD + EOP descriptors */
> if (unlikely((status & E1000_RXD_STAT_EOP) &&
> (rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK))) {
> @@ -4327,6 +4496,7 @@ next_desc:
> rx_desc = next_rxd;
> buffer_info = next_buffer;
> }
> + rcu_read_unlock();
> rx_ring->next_to_clean = i;
>
> cleaned_count = E1000_DESC_UNUSED(rx_ring);
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 1/3] e1000: track BQL bytes regardless of skb or not
From: Tom Herbert @ 2016-09-13 4:25 UTC (permalink / raw)
To: Alexander Duyck
Cc: John Fastabend, Brenden Blanco, Alexei Starovoitov, Jeff Kirsher,
Jesper Dangaard Brouer, David Miller, Cong Wang, intel-wired-lan,
William Tu, Netdev
In-Reply-To: <CAKgT0UdShvcviYP=VVp8c0OOESDnqsppArVeUhjxfM6XqX6HwQ@mail.gmail.com>
On Mon, Sep 12, 2016 at 8:00 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, Sep 12, 2016 at 3:13 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> The BQL API does not reference the sk_buff nor does the driver need to
>> reference the sk_buff to calculate the length of a transmitted frame.
>> This patch removes an sk_buff reference from the xmit irq path and
>> also allows packets sent from XDP to use BQL.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>> drivers/net/ethernet/intel/e1000/e1000_main.c | 7 ++-----
>> 1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
>> index f42129d..62a7f8d 100644
>> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>> @@ -3882,11 +3882,8 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
>> if (cleaned) {
>> total_tx_packets += buffer_info->segs;
>> total_tx_bytes += buffer_info->bytecount;
>> - if (buffer_info->skb) {
>> - bytes_compl += buffer_info->skb->len;
>> - pkts_compl++;
>> - }
>> -
>> + bytes_compl += buffer_info->length;
>> + pkts_compl++;
>> }
>> e1000_unmap_and_free_tx_resource(adapter, buffer_info);
>> tx_desc->upper.data = 0;
>
> Actually it might be worth looking into why we have two different
> stats for tracking bytecount and segs. From what I can tell the
> pkts_compl value is never actually used. The function doesn't even
> use the value so it is just wasted cycles. And as far as the bytes go
Transmit flow steering which I posted and is pending on some testing
uses the pkt count BQL to track inflight packets.
> the accounting would be more accurate if you were to use bytecount
> instead of buffer_info->skb->len. You would just need to update the
> xmit function to use that on the other side so that they match.
>
> - Alex
^ permalink raw reply
* [PATCH V2] net: dsa: add FIB support
From: John Crispin @ 2016-09-13 6:00 UTC (permalink / raw)
To: David S. Miller, Andrew Lunn, Florian Fainelli
Cc: netdev, linux-kernel, John Crispin
Add SWITCHDEV_OBJ_ID_IPV4_FIB support to the DSA layer.
Signed-off-by: John Crispin <john@phrozen.org>
---
Changes in V2
* rebase on latest net-next to fix compile errors
Documentation/networking/dsa/dsa.txt | 18 +++++++++++++++
include/net/dsa.h | 13 +++++++++++
net/dsa/slave.c | 41 ++++++++++++++++++++++++++++++++++
3 files changed, 72 insertions(+)
diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 6d6c07c..6cd5831 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -607,6 +607,24 @@ of DSA, would be the its port-based VLAN, used by the associated bridge device.
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and MDB info.
+Route offloading
+----------------
+
+- ipv4_fib_prepare: routing layer function invoked to prepare the installation
+ of an ipv4 route into the switches routing database. If the operation is not
+ supported this function should return -EOPNOTSUPP. No hardware setup must be
+ done in this function. See ipv4_fib_add for this and details.
+
+- ipv4_fib_add: routing layer function invoked when a new ipv4 route should be
+ installed into the routing database of the switch, it should be programmed
+ with the route for the specified VLAN ID of the device that the route applies
+ to.
+
+- ipv4_fib_del: routing layer function invoked when an ipv4 route should be
+ removed from the routing database, the switch hardware should be programmed
+ to delete the specified MAC address of the nexthop from the specified VLAN ID
+ if it was mapped into the forwarding database.
+
TODO
====
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7556646..7fdd63e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -237,6 +237,7 @@ struct switchdev_obj;
struct switchdev_obj_port_fdb;
struct switchdev_obj_port_mdb;
struct switchdev_obj_port_vlan;
+struct switchdev_obj_ipv4_fib;
struct dsa_switch_ops {
struct list_head list;
@@ -386,6 +387,18 @@ struct dsa_switch_ops {
int (*port_mdb_dump)(struct dsa_switch *ds, int port,
struct switchdev_obj_port_mdb *mdb,
int (*cb)(struct switchdev_obj *obj));
+
+ /*
+ * IPV4 routing
+ */
+ int (*ipv4_fib_prepare)(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_ipv4_fib *fib4,
+ struct switchdev_trans *trans);
+ int (*ipv4_fib_add)(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_ipv4_fib *fib4,
+ struct switchdev_trans *trans);
+ int (*ipv4_fib_del)(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_ipv4_fib *fib4);
};
void register_switch_driver(struct dsa_switch_ops *type);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9ecbe78..c974ac0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,6 +334,38 @@ static int dsa_slave_port_mdb_dump(struct net_device *dev,
return -EOPNOTSUPP;
}
+static int dsa_slave_ipv4_fib_add(struct net_device *dev,
+ const struct switchdev_obj_ipv4_fib *fib4,
+ struct switchdev_trans *trans)
+{
+ struct dsa_slave_priv *p = netdev_priv(dev);
+ struct dsa_switch *ds = p->parent;
+ int ret;
+
+ if (!ds->ops->ipv4_fib_prepare || !ds->ops->ipv4_fib_add)
+ return -EOPNOTSUPP;
+
+ if (switchdev_trans_ph_prepare(trans))
+ ret = ds->ops->ipv4_fib_prepare(ds, p->port, fib4, trans);
+ else
+ ret = ds->ops->ipv4_fib_add(ds, p->port, fib4, trans);
+
+ return ret;
+}
+
+static int dsa_slave_ipv4_fib_del(struct net_device *dev,
+ const struct switchdev_obj_ipv4_fib *fib4)
+{
+ struct dsa_slave_priv *p = netdev_priv(dev);
+ struct dsa_switch *ds = p->parent;
+ int ret = -EOPNOTSUPP;
+
+ if (ds->ops->ipv4_fib_del)
+ ret = ds->ops->ipv4_fib_del(ds, p->port, fib4);
+
+ return ret;
+}
+
static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
struct dsa_slave_priv *p = netdev_priv(dev);
@@ -465,6 +497,11 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
SWITCHDEV_OBJ_PORT_VLAN(obj),
trans);
break;
+ case SWITCHDEV_OBJ_ID_IPV4_FIB:
+ err = dsa_slave_ipv4_fib_add(dev,
+ SWITCHDEV_OBJ_IPV4_FIB(obj),
+ trans);
+ break;
default:
err = -EOPNOTSUPP;
break;
@@ -490,6 +527,10 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
err = dsa_slave_port_vlan_del(dev,
SWITCHDEV_OBJ_PORT_VLAN(obj));
break;
+ case SWITCHDEV_OBJ_ID_IPV4_FIB:
+ err = dsa_slave_ipv4_fib_del(dev,
+ SWITCHDEV_OBJ_IPV4_FIB(obj));
+ break;
default:
err = -EOPNOTSUPP;
break;
--
1.7.10.4
^ permalink raw reply related
* [PATCH V3] net-next: dsa: add FIB support
From: John Crispin @ 2016-09-13 6:02 UTC (permalink / raw)
To: David S. Miller, Andrew Lunn, Florian Fainelli
Cc: netdev, linux-kernel, John Crispin
Add SWITCHDEV_OBJ_ID_IPV4_FIB support to the DSA layer.
Signed-off-by: John Crispin <john@phrozen.org>
---
Changes in V2
* rebase on latest net-next to fix compile errors
Changes in V3
* fix subject prefix. this needs to go into the next tree
Documentation/networking/dsa/dsa.txt | 18 +++++++++++++++
include/net/dsa.h | 13 +++++++++++
net/dsa/slave.c | 41 ++++++++++++++++++++++++++++++++++
3 files changed, 72 insertions(+)
diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 6d6c07c..6cd5831 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -607,6 +607,24 @@ of DSA, would be the its port-based VLAN, used by the associated bridge device.
function that the driver has to call for each MAC address known to be behind
the given port. A switchdev object is used to carry the VID and MDB info.
+Route offloading
+----------------
+
+- ipv4_fib_prepare: routing layer function invoked to prepare the installation
+ of an ipv4 route into the switches routing database. If the operation is not
+ supported this function should return -EOPNOTSUPP. No hardware setup must be
+ done in this function. See ipv4_fib_add for this and details.
+
+- ipv4_fib_add: routing layer function invoked when a new ipv4 route should be
+ installed into the routing database of the switch, it should be programmed
+ with the route for the specified VLAN ID of the device that the route applies
+ to.
+
+- ipv4_fib_del: routing layer function invoked when an ipv4 route should be
+ removed from the routing database, the switch hardware should be programmed
+ to delete the specified MAC address of the nexthop from the specified VLAN ID
+ if it was mapped into the forwarding database.
+
TODO
====
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7556646..7fdd63e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -237,6 +237,7 @@ struct switchdev_obj;
struct switchdev_obj_port_fdb;
struct switchdev_obj_port_mdb;
struct switchdev_obj_port_vlan;
+struct switchdev_obj_ipv4_fib;
struct dsa_switch_ops {
struct list_head list;
@@ -386,6 +387,18 @@ struct dsa_switch_ops {
int (*port_mdb_dump)(struct dsa_switch *ds, int port,
struct switchdev_obj_port_mdb *mdb,
int (*cb)(struct switchdev_obj *obj));
+
+ /*
+ * IPV4 routing
+ */
+ int (*ipv4_fib_prepare)(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_ipv4_fib *fib4,
+ struct switchdev_trans *trans);
+ int (*ipv4_fib_add)(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_ipv4_fib *fib4,
+ struct switchdev_trans *trans);
+ int (*ipv4_fib_del)(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_ipv4_fib *fib4);
};
void register_switch_driver(struct dsa_switch_ops *type);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9ecbe78..c974ac0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,6 +334,38 @@ static int dsa_slave_port_mdb_dump(struct net_device *dev,
return -EOPNOTSUPP;
}
+static int dsa_slave_ipv4_fib_add(struct net_device *dev,
+ const struct switchdev_obj_ipv4_fib *fib4,
+ struct switchdev_trans *trans)
+{
+ struct dsa_slave_priv *p = netdev_priv(dev);
+ struct dsa_switch *ds = p->parent;
+ int ret;
+
+ if (!ds->ops->ipv4_fib_prepare || !ds->ops->ipv4_fib_add)
+ return -EOPNOTSUPP;
+
+ if (switchdev_trans_ph_prepare(trans))
+ ret = ds->ops->ipv4_fib_prepare(ds, p->port, fib4, trans);
+ else
+ ret = ds->ops->ipv4_fib_add(ds, p->port, fib4, trans);
+
+ return ret;
+}
+
+static int dsa_slave_ipv4_fib_del(struct net_device *dev,
+ const struct switchdev_obj_ipv4_fib *fib4)
+{
+ struct dsa_slave_priv *p = netdev_priv(dev);
+ struct dsa_switch *ds = p->parent;
+ int ret = -EOPNOTSUPP;
+
+ if (ds->ops->ipv4_fib_del)
+ ret = ds->ops->ipv4_fib_del(ds, p->port, fib4);
+
+ return ret;
+}
+
static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
struct dsa_slave_priv *p = netdev_priv(dev);
@@ -465,6 +497,11 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
SWITCHDEV_OBJ_PORT_VLAN(obj),
trans);
break;
+ case SWITCHDEV_OBJ_ID_IPV4_FIB:
+ err = dsa_slave_ipv4_fib_add(dev,
+ SWITCHDEV_OBJ_IPV4_FIB(obj),
+ trans);
+ break;
default:
err = -EOPNOTSUPP;
break;
@@ -490,6 +527,10 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
err = dsa_slave_port_vlan_del(dev,
SWITCHDEV_OBJ_PORT_VLAN(obj));
break;
+ case SWITCHDEV_OBJ_ID_IPV4_FIB:
+ err = dsa_slave_ipv4_fib_del(dev,
+ SWITCHDEV_OBJ_IPV4_FIB(obj));
+ break;
default:
err = -EOPNOTSUPP;
break;
--
1.7.10.4
^ permalink raw reply related
* Re: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Leon Romanovsky @ 2016-09-13 6:05 UTC (permalink / raw)
To: Yuval Mintz
Cc: Parav Pandit, Ram Amrani, Doug Ledford, David Miller, Ariel Elior,
Michal Kalderon, Rajesh Borundia,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <CY4PR11MB17202A542AC07CBA4A65E93697FF0-JNf6+SjKdlG0ooKL/ADlEpPPoyLQLiKMvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 2418 bytes --]
On Mon, Sep 12, 2016 at 05:39:35PM +0000, Yuval Mintz wrote:
> >>> include/linux/qed/common_hsi.h | 1 +
> >>> include/linux/qed/qed_if.h | 9 +-
> >>> include/linux/qed/qed_ll2_if.h | 140 +
> >>> include/linux/qed/qed_roce_if.h | 604 ++++
> >>> include/linux/qed/qede_roce.h | 88 +
> >> > include/linux/qed/rdma_common.h | 1 +
> >>
> >> Something not directly related to your patches, but they brought my
> >> attention to the fact that all these new (and old) rdma<->net devices
> >> are polluting include/linux
> >>
> > ocrdma driver includes be_roce.h located in net/ethernet/emulex/benet
> > location instead of include/linux/.
> > This file helps to bind rdma to net device or underlying hw device.
>
> > May be similar change can be done for rest of the drivers for
> > rdma<-->net devices?
>
> By adding explicit inclusion paths in the Makefile, a la
> ccflags-y := -Idrivers/net/ethernet/emulex/benet ?
>
> While this might work, I personally dislike it as I find it
> counter-intuitive when going over the code -
> I don't expect driver to locally modify the inclusion path.
> Besides, we're going to [eventually] a whole suite of drivers based
> on the qed module, some of which would reside under drivers/scsi;
> Not sure it's best to have 3 or 4 different drivers privately include the
> same directory under a different subsystem.
I agree with you that orcdma's way can be valuable for small drivers.
Orcmda has small shared headers set and doesn't need to change them rapidly
to support different devices.
I thought to place them in similar directory to include/soc/* and remove
from include/linux/. We have include/rdma/ and it looks like a good
candidate.
>
> >> Filtered output:
> >> ➜ linux-rdma git:(topic/fixes-for-4.8-2) ls -dl include/linux/*/
> >> drwxrwxr-x 2 leonro leonro 4096 Aug 30 16:27 include/linux/hsi/
> >> drwxrwxr-x 2 leonro leonro 4096 Sep 12 19:08 include/linux/mlx4/
> >> drwxrwxr-x 2 leonro leonro 4096 Sep 7 15:31 include/linux/mlx5/
> >> drwxrwxr-x 2 leonro leonro 4096 Sep 8 17:46 include/linux/qed/
> >>
> >> Is this the right place for them?
> >
> > Thanks
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: icmpv6: issue with routing table entries from link local addresses
From: Andreas Hübner @ 2016-09-13 6:35 UTC (permalink / raw)
To: Hannes Frederic Sowa; +Cc: netdev, d. caratti
In-Reply-To: <b6804e09-403f-64ff-257b-6c5fda7f8047@stressinduktion.org>
On Mon, Sep 12, 2016 at 07:26:23PM +0200, Hannes Frederic Sowa wrote:
> > Actually, I'm convinced I must be doing something wrong here. The setup
> > for the issue is quite trivial, someone would have tripped over it
> > already. The only condition is that one host has multiple interfaces
> > with ipv6 enabled.
> >
> > Any help in shedding some light onto this issue would be appreciated.
>
> This shouldn't be the case. We certainly carry over the ifindex of the
> received packet into the routing lookup of the outgoing packet, thus the
> appropriate rule, with outgoing ifindex should be selected.
I saw this in the code and that's the reason why I wrote the initial
mail. Was trying to trace with ftrace, but got stuck somewhere around
the find_rr_leaf function. (If there is any good documentation on the
internal fib data structure, please point me to it.)
> I also couldn't reproduce your problem here with my system. Can you
> verify with tcpdump that the packet is leaving on another interface?
It is not leaving on another interface but simply discarded on the host.
The Ip6OutNoRoutes stat in /proc/net/snmp6 is increased.
>From my understanding the routing subsystem finds the first matching entry
in the main routing table, checks the interface and bails out because it
does not match.
I did omit a crucial information in the last mail, I'm currently stuck
on an older distribution kernel (3.16).
I'll try to check if there have been any relevant changes to IPv6
route lookup in the last two years.
(Maybe I should try to reproduce it with the current kernel, sorry that
I didn't think of this before.)
Andreas
^ permalink raw reply
* Re: [RFC 02/11] Add RoCE driver framework
From: Leon Romanovsky @ 2016-09-13 6:38 UTC (permalink / raw)
To: Yuval Mintz
Cc: Mark Bloch, Ram Amrani,
dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, David Miller,
Ariel Elior, Michal Kalderon, Rajesh Borundia,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <CY4PR11MB1720F9B74BBF82D0779FAC8F97FF0-JNf6+SjKdlG0ooKL/ADlEpPPoyLQLiKMvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 897 bytes --]
On Mon, Sep 12, 2016 at 07:17:35PM +0000, Yuval Mintz wrote:
> >> +uint debug;
> >> +module_param(debug, uint, 0);
> >> +MODULE_PARM_DESC(debug, "Default debug msglevel");
>
> >Why are you adding this as a module parameter?
>
> I believe this is mostly to follow same line as qede which also defines
> 'debug' module parameter for allowing easy user control of debug
> prints [& specifically for probe prints, which can't be controlled
> otherwise].
Can you give us an example where dynamic debug and tracing infrastructures
are not enough?
AFAIK, most of these debug module parameters are legacy copy/paste
code which is useless in real life scenarios.
Thanks
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [RFC 02/11] Add RoCE driver framework
From: Leon Romanovsky @ 2016-09-13 6:46 UTC (permalink / raw)
To: Ram Amrani
Cc: dledford, davem, Yuval.Mintz, Ariel.Elior, Michal.Kalderon,
rajesh.borundia, linux-rdma, netdev
In-Reply-To: <1473696465-27986-3-git-send-email-Ram.Amrani@qlogic.com>
[-- Attachment #1: Type: text/plain, Size: 607 bytes --]
On Mon, Sep 12, 2016 at 07:07:36PM +0300, Ram Amrani wrote:
> Adds a skeletal implementation of the qed* RoCE driver -
> basically the ability to communicate with the qede driver and
> receive notifications from it regarding various init/exit events.
>
> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
> Signed-off-by: Ram Amrani <Ram.Amrani@qlogic.com>
> ---
<...>
> +
> +#define QEDR_MODULE_VERSION "8.10.10.0"
I am strongly against module versions. You should rely on official
kernel version.
> +#define QEDR_NODE_DESC "QLogic 579xx RoCE HCA"
> +#define DP_NAME(dev) ((dev)->ibdev.name)
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Mintz, Yuval @ 2016-09-13 6:48 UTC (permalink / raw)
To: Leon Romanovsky, Yuval Mintz
Cc: Parav Pandit, Ram Amrani, Doug Ledford, David Miller, Ariel Elior,
Michal Kalderon, Rajesh Borundia,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <20160913060545.GN8812-2ukJVAZIZ/Y@public.gmane.org>
>> While this might work, I personally dislike it as I find it
>> counter-intuitive when going over the code -
>> I don't expect driver to locally modify the inclusion path.
>> Besides, we're going to [eventually] a whole suite of drivers based
>> on the qed module, some of which would reside under drivers/scsi;
>> Not sure it's best to have 3 or 4 different drivers privately include the
>> same directory under a different subsystem.
> I agree with you that orcdma's way can be valuable for small drivers.
> Orcmda has small shared headers set and doesn't need to change them rapidly
> to support different devices.
> I thought to place them in similar directory to include/soc/* and remove
> from include/linux/. We have include/rdma/ and it looks like a good
> candidate.
I'm perfectly fine with relocating those to a different directory under include/,
although using 'rdma' doesn't sound like a good fit [as the headers would be
included by ethernet, scsi and rdma drivers].
Are there good existing alternatives?
Regardless, I don't believe this should be part of the initial submission,
as it would involve in relocating existing networking headers as well.
I think we can move those at leisure later on.
[We're in the middle of transitioning our e-mails from qlogic -> cavium,
so sorry if things become corrupted]--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH net-next 0/2] mvneta xmit_more and bql support
From: Marcin Wojtas @ 2016-09-13 7:00 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, netdev
Cc: davem, linux, sebastian.hesselbarth, andrew, jason,
thomas.petazzoni, gregory.clement, nadavh, alior, simon.guinot,
nitroshift, mw, jaz
Hi,
This short patchset introduces two enhancements to mvneta driver
TX packets concatenation support using xmit_more mechanism and also
byte queue limit in order to decrease latency on saturated links.
Any comments or feedback would be welcome.
Best regards,
Marcin
Marcin Wojtas (1):
net: mvneta: add BQL support
Simon Guinot (1):
net: mvneta: add xmit_more support
drivers/net/ethernet/marvell/mvneta.c | 33 +++++++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next 1/2] net: mvneta: add xmit_more support
From: Marcin Wojtas @ 2016-09-13 7:00 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, netdev
Cc: davem, linux, sebastian.hesselbarth, andrew, jason,
thomas.petazzoni, gregory.clement, nadavh, alior, simon.guinot,
nitroshift, mw, jaz
In-Reply-To: <1473750006-21199-1-git-send-email-mw@semihalf.com>
From: Simon Guinot <simon.guinot@sequanux.org>
Basing on xmit_more flag of the skb, TX descriptors can be concatenated
before flushing. This commit delay Tx descriptor flush if the queue is
running and if there is more skb's to send.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
---
drivers/net/ethernet/marvell/mvneta.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index d41c28d..b9dccea 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -512,6 +512,7 @@ struct mvneta_tx_queue {
* descriptor ring
*/
int count;
+ int pending;
int tx_stop_threshold;
int tx_wake_threshold;
@@ -802,8 +803,9 @@ static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
/* Only 255 descriptors can be added at once ; Assume caller
* process TX desriptors in quanta less than 256
*/
- val = pend_desc;
+ val = pend_desc + txq->pending;
mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
+ txq->pending = 0;
}
/* Get pointer to next TX descriptor to be processed (send) by HW */
@@ -2357,11 +2359,14 @@ out:
struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
txq->count += frags;
- mvneta_txq_pend_desc_add(pp, txq, frags);
-
if (txq->count >= txq->tx_stop_threshold)
netif_tx_stop_queue(nq);
+ if (!skb->xmit_more || netif_xmit_stopped(nq))
+ mvneta_txq_pend_desc_add(pp, txq, frags);
+ else
+ txq->pending += frags;
+
u64_stats_update_begin(&stats->syncp);
stats->tx_packets++;
stats->tx_bytes += len;
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 2/2] net: mvneta: add BQL support
From: Marcin Wojtas @ 2016-09-13 7:00 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, netdev
Cc: davem, linux, sebastian.hesselbarth, andrew, jason,
thomas.petazzoni, gregory.clement, nadavh, alior, simon.guinot,
nitroshift, mw, jaz
In-Reply-To: <1473750006-21199-1-git-send-email-mw@semihalf.com>
Tests showed that when whole bandwidth is consumed, the latency for
various kind of traffic can reach high values. With saturated
link (e.g. with iperf from target to host) simple ping could take
significant amount of time. BQL proved to improve this situation
when implemented in mvneta driver. Measurements of ping latency
for 3 link speeds:
Speed | Latency w/o BQL | Latency with BQL
10 | 7-14 ms | 3.5 ms
100 | 2-12 ms | 0.6 ms
1000 | often timeout | up to 2ms
Decreasing latency as above result in sligt performance cost - 4kpps
(-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x.
For 1500B packets in the same setup, the mpstat tool showed +8% of
CPU occupation (default affinity, second CPU idle). Even though this
cost seems reasonable to take, considering other improvements.
This commit adds byte queue limit mechanism for the mvneta driver.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
---
drivers/net/ethernet/marvell/mvneta.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b9dccea..bb5df35 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1719,8 +1719,10 @@ static struct mvneta_tx_queue *mvneta_tx_done_policy(struct mvneta_port *pp,
/* Free tx queue skbuffs */
static void mvneta_txq_bufs_free(struct mvneta_port *pp,
- struct mvneta_tx_queue *txq, int num)
+ struct mvneta_tx_queue *txq, int num,
+ struct netdev_queue *nq)
{
+ unsigned int bytes_compl = 0, pkts_compl = 0;
int i;
for (i = 0; i < num; i++) {
@@ -1728,6 +1730,11 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
txq->txq_get_index;
struct sk_buff *skb = txq->tx_skb[txq->txq_get_index];
+ if (skb) {
+ bytes_compl += skb->len;
+ pkts_compl++;
+ }
+
mvneta_txq_inc_get(txq);
if (!IS_TSO_HEADER(txq, tx_desc->buf_phys_addr))
@@ -1738,6 +1745,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
continue;
dev_kfree_skb_any(skb);
}
+
+ netdev_tx_completed_queue(nq, pkts_compl, bytes_compl);
}
/* Handle end of transmission */
@@ -1751,7 +1760,7 @@ static void mvneta_txq_done(struct mvneta_port *pp,
if (!tx_done)
return;
- mvneta_txq_bufs_free(pp, txq, tx_done);
+ mvneta_txq_bufs_free(pp, txq, tx_done, nq);
txq->count -= tx_done;
@@ -2358,6 +2367,8 @@ out:
struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
+ netdev_tx_sent_queue(nq, len);
+
txq->count += frags;
if (txq->count >= txq->tx_stop_threshold)
netif_tx_stop_queue(nq);
@@ -2385,9 +2396,10 @@ static void mvneta_txq_done_force(struct mvneta_port *pp,
struct mvneta_tx_queue *txq)
{
+ struct netdev_queue *nq = netdev_get_tx_queue(pp->dev, txq->id);
int tx_done = txq->count;
- mvneta_txq_bufs_free(pp, txq, tx_done);
+ mvneta_txq_bufs_free(pp, txq, tx_done, nq);
/* reset txq */
txq->count = 0;
@@ -2884,6 +2896,8 @@ static int mvneta_txq_init(struct mvneta_port *pp,
static void mvneta_txq_deinit(struct mvneta_port *pp,
struct mvneta_tx_queue *txq)
{
+ struct netdev_queue *nq = netdev_get_tx_queue(pp->dev, txq->id);
+
kfree(txq->tx_skb);
if (txq->tso_hdrs)
@@ -2895,6 +2909,8 @@ static void mvneta_txq_deinit(struct mvneta_port *pp,
txq->size * MVNETA_DESC_ALIGNED_SIZE,
txq->descs, txq->descs_phys);
+ netdev_tx_reset_queue(nq);
+
txq->descs = NULL;
txq->last_desc = 0;
txq->next_desc_to_proc = 0;
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH net-next] ipv4: accept u8 in IP_TOS ancillary data
From: Jesper Dangaard Brouer @ 2016-09-13 7:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, netdev, Francesco Fusco, Maciej Żenczykowski,
brouer
In-Reply-To: <1473341474.15733.36.camel@edumazet-glaptop3.roam.corp.google.com>
On Thu, 08 Sep 2016 06:31:14 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-09-08 at 11:15 +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 07 Sep 2016 21:52:56 -0700
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > In commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as
> > > ancillary data") Francesco added IP_TOS values specified as integer.
> > >
> > > However, kernel sends to userspace (at recvmsg() time) an IP_TOS value
> > > in a single byte, when IP_RECVTOS is set on the socket.
> > >
> > > It can be very useful to reflect all ancillary options as given by the
> > > kernel in a subsequent sendmsg(), instead of aborting the sendmsg() with
> > > EINVAL after Francesco patch.
> > >
> > > So this patch extends IP_TOS ancillary to accept an u8, so that an UDP
> > > server can simply reuse same ancillary block without having to mangle
> > > it.
> > >
> > > Jesper can then augment
> > > https://github.com/netoptimizer/network-testing/blob/master/src/udp_example02.c
> > > to add TOS reflection ;)
> >
> > This is actually your old program ;-)
> > Do I need to change anything, as I'm just bouncing the packet back with sendmsg() ?
>
> I guess you want to add an option and if this option is requested by the
> user, add :
>
> setsockopt(fd, SOL_IP, IP_PKTINFO, &on, sizeof(on));
> + if (tos_reflect)
> + setsockopt(fd, SOL_IP, IP_RECVTOS, &on, sizeof(on));
>
> before the loop doing the recvmsg()/sendmsg() calls.
Hi Eric,
I've implemented what you suggested:
https://github.com/netoptimizer/network-testing/commit/0758ad77a96ecb1
Now QA can use this tool to verify the kernel commit ;-)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* [PATCH net-next] cxgb4vf: don't offload Rx checksums for IPv6 fragments
From: Hariprasad Shenai @ 2016-09-13 7:14 UTC (permalink / raw)
To: netdev; +Cc: davem, leedom, nirranjan, Hariprasad Shenai
The checksum provided by the device doesn't include the L3 headers,
as IPv6 expects
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index c8fd4f8fe1fa..4d4b94a8969a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -1648,14 +1648,15 @@ int t4vf_ethrx_handler(struct sge_rspq *rspq, const __be64 *rsp,
if (csum_ok && !pkt->err_vec &&
(be32_to_cpu(pkt->l2info) & (RXF_UDP_F | RXF_TCP_F))) {
- if (!pkt->ip_frag)
+ if (!pkt->ip_frag) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
- else {
+ rxq->stats.rx_cso++;
+ } else if (pkt->l2info & htonl(F_RXF_IP)) {
__sum16 c = (__force __sum16)pkt->csum;
skb->csum = csum_unfold(c);
skb->ip_summed = CHECKSUM_COMPLETE;
+ rxq->stats.rx_cso++;
}
- rxq->stats.rx_cso++;
} else
skb_checksum_none_assert(skb);
--
2.3.4
^ permalink raw reply related
* RE: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
From: Y.B. Lu @ 2016-09-13 7:23 UTC (permalink / raw)
To: Scott Wood, linux-mmc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ulf.hansson-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
Arnd Bergmann
Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Russell King, Bhupesh Sharma,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Santosh Shilimkar,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jochen Friedrich, X.B. Xie,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Rob Herring, linux-i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Claudiu Manoil, Kumar Gala, Leo Li,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
linux-clk-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <1473722714.30217.196.camel-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
> -----Original Message-----
> From: linux-mmc-owner@vger.kernel.org [mailto:linux-mmc-
> owner@vger.kernel.org] On Behalf Of Scott Wood
> Sent: Tuesday, September 13, 2016 7:25 AM
> To: Y.B. Lu; linux-mmc@vger.kernel.org; ulf.hansson@linaro.org; Arnd
> Bergmann
> Cc: linuxppc-dev@lists.ozlabs.org; devicetree@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> clk@vger.kernel.org; linux-i2c@vger.kernel.org; iommu@lists.linux-
> foundation.org; netdev@vger.kernel.org; Mark Rutland; Rob Herring;
> Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie
> Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
>
> On Mon, 2016-09-12 at 06:39 +0000, Y.B. Lu wrote:
> > Hi Scott,
> >
> > Thanks for your review :)
> > See my comment inline.
> >
> > >
> > > -----Original Message-----
> > > From: Scott Wood [mailto:oss@buserror.net]
> > > Sent: Friday, September 09, 2016 11:47 AM
> > > To: Y.B. Lu; linux-mmc@vger.kernel.org; ulf.hansson@linaro.org; Arnd
> > > Bergmann
> > > Cc: linuxppc-dev@lists.ozlabs.org; devicetree@vger.kernel.org;
> > > linux-arm- kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> > > linux- clk@vger.kernel.org; linux-i2c@vger.kernel.org;
> > > iommu@lists.linux- foundation.org; netdev@vger.kernel.org; Mark
> > > Rutland; Rob Herring; Russell King; Jochen Friedrich; Joerg Roedel;
> > > Claudiu Manoil; Bhupesh Sharma; Qiang Zhao; Kumar Gala; Santosh
> > > Shilimkar; Leo Li; X.B. Xie
> > > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ
> > > platforms
> > >
> > > On Tue, 2016-09-06 at 16:28 +0800, Yangbo Lu wrote:
> > > >
> > > > The global utilities block controls power management, I/O device
> > > > enabling, power-onreset(POR) configuration monitoring, alternate
> > > > function selection for multiplexed signals,and clock control.
> > > >
> > > > This patch adds a driver to manage and access global utilities
> block.
> > > > Initially only reading SVR and registering soc device are supported.
> > > > Other guts accesses, such as reading RCW, should eventually be
> > > > moved into this driver as well.
> > > >
> > > > Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
> > > > Signed-off-by: Scott Wood <oss@buserror.net>
> > > Don't put my signoff on patches that I didn't put it on myself.
> > > Definitely don't put mine *after* yours on patches that were last
> > > modified by you.
> > >
> > > If you want to mention that the soc_id encoding was my suggestion,
> > > then do so explicitly.
> > >
> > [Lu Yangbo-B47093] I found your 'signoff' on this patch at below link.
> > http://patchwork.ozlabs.org/patch/649211/
> >
> > So, let me just change the order in next version ?
> > Signed-off-by: Scott Wood <oss@buserror.net>
> > Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
>
> No. This isn't my patch so my signoff shouldn't be on it.
[Lu Yangbo-B47093] Ok, will remove it.
>
> > [Lu Yangbo-B47093] It's a good idea to move die into .family I think.
> > In my opinion, it's better to keep svr and name in soc_id just like
> > your suggestion above.
> > >
> > > {
> > > .soc_id = "svr:0x85490010,name:T1023E,",
> > > .family = "QorIQ T1024",
> > > }
> > The user probably don’t like to learn the svr value. What they want is
> > just to match the soc they use.
> > It's convenient to use name+rev for them to match a soc.
>
> What the user should want 99% of the time is to match the die (plus
> revision), not the soc.
>
> > Regarding shrinking the table, I think it's hard to use svr+mask.
> > Because I find many platforms use different masks.
> > We couldn’t know the mask according svr value.
>
> The mask would be part of the table:
>
> {
> {
> .die = "T1024",
> .svr = 0x85400000,
> .mask = 0xfff00000,
> },
> {
> .die = "T1040",
> .svr = 0x85200000,
> .mask = 0xfff00000,
> },
> {
> .die = "LS1088A",
> .svr = 0x87030000,
> .mask = 0xffff0000,
> },
> ...
> }
>
> There's a small risk that we get the mask wrong and a different die is
> created that matches an existing table, but it doesn't seem too likely,
> and can easily be fixed with a kernel update if it happens.
>
[Lu Yangbo-B47093] You mean we will not define soc device attribute for each soc and we will define attribute for each die instead, right?
If so, when we want to match a specific soc we need to use its svr value in code. If it's acceptable, I can try in next version.
> BTW, aren't ls2080a and ls2085a the same die? And is there no non-E
> version of LS2080A/LS2040A?
[Lu Yangbo-B47093] I checked all the svr values in chip errata doc "Revision level to part marking cross-reference" table.
I found ls2080a and ls2085a were in two separate doc. And I didn’t find non-E version of LS2080A/LS2040A in chip errata doc.
Do you know is there any other doc we can confirm this?
>
> > > > + do {
> > > > + if (!matches->soc_id)
> > > > + return NULL;
> > > > + if (glob_match(svr_match, matches->soc_id))
> > > > + break;
> > > > + } while (matches++);
> > > Are you expecting "matches++" to ever evaluate as false?
> > [Lu Yangbo-B47093] Yes, this is used to match the soc we use in
> > qoriq_soc array until getting true.
> > We need to get the name and die information defined in array.
>
> I'm not asking whether the glob_match will ever return true. I'm saying
> that "matches++" will never become NULL.
[Lu Yangbo-B47093] The matches++ will never become NULL while it will return NULL after matching for all the members in array.
>
> > > > + /* Register soc device */
> > > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> > > > + if (!soc_dev_attr) {
> > > > + ret = -ENOMEM;
> > > > + goto out_unmap;
> > > > + }
> > > Couldn't this be statically allocated?
> > [Lu Yangbo-B47093] Do you mean we define this struct statically ?
> >
> > static struct soc_device_attribute soc_dev_attr;
>
> Yes.
>
[Lu Yangbo-B47093] It's ok to define it statically. Is there any need to do that?
> > > > +
> > > > + soc_dev = soc_device_register(soc_dev_attr);
> > > > + if (IS_ERR(soc_dev)) {
> > > > + ret = -ENODEV;
> > > Why are you changing the error code?
> > [Lu Yangbo-B47093] What error code should we use ? :)
>
> ret = PTR_ERR(soc_dev);
[Lu Yangbo-B47093] Ok.. will do that.
>
> + }
> > > > + return 0;
> > > > +out:
> > > > + kfree(soc_dev_attr->machine);
> > > > + kfree(soc_dev_attr->family);
> > > > + kfree(soc_dev_attr->soc_id);
> > > > + kfree(soc_dev_attr->revision);
> > > > + kfree(soc_dev_attr);
> > > > +out_unmap:
> > > > + iounmap(guts->regs);
> > > > +out_free:
> > > > + kfree(guts);
> > > devm
> > [Lu Yangbo-B47093] What's the devm meaning here :)
>
> If you allocate these with devm_kzalloc(), devm_kasprintf(),
> devm_kstrdup(), etc. then they will be freed automatically when the
> device is unbound.
>
> >
> > >
> > >
> > > >
> > > > +static int fsl_guts_remove(struct platform_device *dev) {
> > > > + kfree(soc_dev_attr->machine);
> > > > + kfree(soc_dev_attr->family);
> > > > + kfree(soc_dev_attr->soc_id);
> > > > + kfree(soc_dev_attr->revision);
> > > > + kfree(soc_dev_attr);
> > > > + soc_device_unregister(soc_dev);
> > > > + iounmap(guts->regs);
> > > > + kfree(guts);
> > > > + return 0;
> > > > +}
> > > Don't free the memory before you unregister the device that uses it
> > > (moot if you use devm).
> > [Lu Yangbo-B47093] The soc.c driver mentions that.
> > Ensure soc_dev->attr is freed prior to calling soc_device_unregister.
>
> That comment is wrong. Freeing the memory first creates a race condition
> that could result in accessing freed memory, if something accesses the
> soc device in parallel with unbinding.
>
[Lu Yangbo-B47093] Ok, will unregister the device first. Thanks.
> -Scott
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply
* United Nations Compensation Fund
From: Jan Eliasson @ 2016-09-13 7:32 UTC (permalink / raw)
Attention,
This is to official inform you that we have been having a meeting for
the past two months in the meeting we discussed on issue concerning
Scam victims this compensation exercise is also extended to victims of
War displaced also including some government retired workers and You
are among those that was listed all over the world that will benefit
from these compensation which have been concluded.
Therefore I am writing to notify you that United Nation have agreed to
compensate you with the sum of US $ 4.5m (four million five hundred
thousand united state dollars) We have arranged your payment of USD
$4.5m through ATM CARD which is the instruction from Ban Ki-moon UN
Secretary General
The ATM Card with Security Pin Numbers shall be delivered to you using
DHL Express Mail Service (DHL) .Note: that you will pay for the
Delivery fee of $205 to DHL COURIER COMPANY BENIN REPUBLIC for your
immediate Delivery of your ATM VISA Card.
Contact our representative Barrister. David Fisher and forward the
following details to him.
1. Your Full Name:
2. Address Where You want us to Send Your ATM Card
3. Mobile Number:
Contact Barrister. David Fisher with below email and forward your
details to him.
Email: (fisher.dav@yandex.com)
Thanks
Jan Eliasson
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox