* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: David Miller @ 2016-04-08 16:53 UTC (permalink / raw)
To: eric.dumazet; +Cc: yangyingliang, netdev, dingtianhong
In-Reply-To: <1460126665.6473.437.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Apr 2016 07:44:25 -0700
> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>
>> I expand tcp_adv_win_scale and tcp_rmem. It has no effect.
>
> Try :
>
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>
> And restart your flows.
I'm honestly beginning to suspect a bug in their driver and how they
handle skb->truesize.
Yang, until you show us the driver you are using and how it handles
receive packets, we are largely in the dark about a major component
of this issue and that is entirely unfair to us.
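
For readers following the tcp_adv_win_scale suggestion quoted above, here is a minimal userspace sketch of how the receive buffer space is turned into an advertised window. It is modeled on tcp_win_from_space() in include/net/tcp.h as it looked around this time; the name win_from_space and the explicit adv_win_scale parameter are illustrative assumptions, since the real helper reads the sysctl directly.

```c
/*
 * Userspace model of the window calculation, not kernel code.
 * adv_win_scale stands in for the tcp_adv_win_scale sysctl.
 */
static int win_from_space(int space, int adv_win_scale)
{
	/* Zero or negative scale: advertise space / 2^(-scale). */
	if (adv_win_scale <= 0)
		return space >> (-adv_win_scale);

	/* Positive scale: reserve space / 2^scale for overhead. */
	return space - (space >> adv_win_scale);
}
```

Under this model, a scale of -2 advertises only a quarter of the buffer (16384 bytes out of 65536), which leaves more room for skb->truesize overhead when a driver reports large truesize values.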
^ permalink raw reply
* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Brenden Blanco @ 2016-04-08 16:48 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Jesper Dangaard Brouer, davem, netdev, tom, alexei.starovoitov,
ogerlitz, eric.dumazet, ecree, john.fastabend, tgraf, johannes,
eranlinuxmellanox, lorenzo
In-Reply-To: <5707916A.2030305@iogearbox.net>
On Fri, Apr 08, 2016 at 01:09:30PM +0200, Daniel Borkmann wrote:
> On 04/08/2016 12:36 PM, Jesper Dangaard Brouer wrote:
> >On Thu, 7 Apr 2016 21:48:46 -0700
> >Brenden Blanco <bblanco@plumgrid.com> wrote:
> >
> >>Add a new bpf prog type that is intended to run in early stages of the
> >>packet rx path. Only minimal packet metadata will be available, hence a
> >>new context type, struct bpf_phys_dev_md, is exposed to userspace. So
> >>far only expose the readable packet length, and only in read mode.
> >>
> >>The PHYS_DEV name is chosen to represent that the program is meant only
> >>for physical adapters, rather than all netdevs.
> >>
> >>While the user visible struct is new, the underlying context must be
> >>implemented as a minimal skb in order for the packet load_* instructions
> >>to work. The skb filled in by the driver must have skb->len, skb->head,
> >>and skb->data set, and skb->data_len == 0.
> >>
> >>Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> >>---
> >> include/uapi/linux/bpf.h | 14 ++++++++++
> >> kernel/bpf/verifier.c | 1 +
> >> net/core/filter.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 83 insertions(+)
> >>
> >>diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>index 70eda5a..3018d83 100644
> >>--- a/include/uapi/linux/bpf.h
> >>+++ b/include/uapi/linux/bpf.h
> >>@@ -93,6 +93,7 @@ enum bpf_prog_type {
> >> BPF_PROG_TYPE_SCHED_CLS,
> >> BPF_PROG_TYPE_SCHED_ACT,
> >> BPF_PROG_TYPE_TRACEPOINT,
> >>+ BPF_PROG_TYPE_PHYS_DEV,
> >> };
> >>
> >> #define BPF_PSEUDO_MAP_FD 1
> >>@@ -368,6 +369,19 @@ struct __sk_buff {
> >> __u32 tc_classid;
> >> };
> >>
> >>+/* user return codes for PHYS_DEV prog type */
> >>+enum bpf_phys_dev_action {
> >>+ BPF_PHYS_DEV_DROP,
> >>+ BPF_PHYS_DEV_OK,
> >>+};
> >
> >I can imagine these extra return codes:
> >
> > BPF_PHYS_DEV_MODIFIED, /* Packet page/payload modified */
> > BPF_PHYS_DEV_STOLEN, /* E.g. forward use-case */
> > BPF_PHYS_DEV_SHARED, /* Queue for async processing, e.g. tcpdump use-case */
> >
> >The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> >which we can look at when we get that far...
>
> I'd probably let the tcpdump case be handled by the rest of the stack.
> Forwarding could be to some txqueue or perhaps directly to a virtual net
> device e.g. the veth end sitting in a container (where fake skb would
> need to be promoted to a real one). Or, perhaps instead of veth end,
> directly demuxed into a target socket queue in that container ...
> Alternatively for tcpdump use-case, you could also do sampling on the
> bpf_phy_dev with eBPF maps.
+1, there is plenty of infrastructure to deal with this already.
>
> >For the "MODIFIED" case, people caring about checksumming, might want
> >to voice their concern if we want additional info or return codes,
> >indicating if software need to recalc CSUM (or e.g. if only IP-pseudo
> >hdr was modified).
> >
> >>+/* user accessible metadata for PHYS_DEV packet hook
> >>+ * new fields must be added to the end of this structure
> >>+ */
> >>+struct bpf_phys_dev_md {
> >>+ __u32 len;
> >>+};
> >
> >This is userspace visible. I can easily imagine this structure will get
> >extended. How does a userspace eBPF program know that the struct got
> >extended? (bet you got some scheme, normally I would add a "version" as
> >first elem).
Since fields are only ever added to the end, programs that access beyond
the struct as understood by the running kernel will be rejected. The
struct ordering is a hard requirement. We've gone through quite a few
upgrades of this style of struct and not had any issues that I am aware of.
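
The append-only rule can be illustrated with a small userspace sketch. Everything here is hypothetical: struct md_v2, its rx_queue field, and ctx_access_ok() are stand-ins for illustration and are not part of the patch or of the verifier.

```c
#include <stddef.h>
#include <stdint.h>

/* Context layout as known to the running (older) kernel. */
struct md_v1 {
	uint32_t len;
};

/* Hypothetical newer layout: fields are only ever appended. */
struct md_v2 {
	uint32_t len;
	uint32_t rx_queue;	/* illustrative field, not in the patch */
};

/*
 * Return 1 if a load of `size` bytes at offset `off` fits inside the
 * struct size known to the running kernel, 0 otherwise.
 */
static int ctx_access_ok(size_t off, size_t size, size_t known_size)
{
	return size > 0 && off <= known_size && size <= known_size - off;
}
```

A program built against md_v2 that only reads len still passes the check on a kernel that knows md_v1, while a read of rx_queue is rejected at load time instead of silently reading garbage. That is why no explicit version field is needed under this scheme.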
> >
> >BTW, how and where is this "struct bpf_phys_dev_md" allocated?
>
> It isn't, see bpf_phys_dev_convert_ctx_access(). At load time the verifier
> will convert/rewrite this based on offsetof() to a real access of the per
> cpu sk_buff, that's the only purpose.
>
> Cheers,
> Daniel
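
The offsetof()-based rewrite described above can be sketched in userspace as follows. This is only a model of the idea: struct user_md and struct toy_skb are stand-ins for struct bpf_phys_dev_md and the per-cpu sk_buff, and convert_ctx_off() is a hypothetical helper, not the real bpf_phys_dev_convert_ctx_access().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the user-visible struct; it is never allocated. */
struct user_md {
	uint32_t len;
};

/* Stand-in for the internal buffer the driver actually fills in. */
struct toy_skb {
	void *head;
	void *data;
	uint32_t len;
};

/* Map a user-visible field offset to the internal one, or -1 if unknown. */
static long convert_ctx_off(size_t user_off)
{
	switch (user_off) {
	case offsetof(struct user_md, len):
		return (long)offsetof(struct toy_skb, len);
	default:
		return -1;
	}
}

/* Perform the rewritten load against the internal struct. */
static uint32_t load_ctx_u32(const struct toy_skb *skb, size_t user_off)
{
	long off = convert_ctx_off(user_off);
	uint32_t val = 0;

	if (off < 0)
		return 0;
	memcpy(&val, (const char *)skb + off, sizeof(val));
	return val;
}
```

The program only ever names offsets in the user-visible struct; the translation step decides which internal field each offset really refers to.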
^ permalink raw reply
* Re: [PATCH net] mpls: find_outdev: check for err ptr in addition to NULL check
From: David Miller @ 2016-04-08 16:43 UTC (permalink / raw)
To: roopa; +Cc: netdev
In-Reply-To: <1460089718-25788-1-git-send-email-roopa@cumulusnetworks.com>
From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Thu, 7 Apr 2016 21:28:38 -0700
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to
> find the output device. In case of an error, inet{,6}_fib_lookup_dev()
> returns error pointer and dev_get_by_index() returns NULL. But the function
> only checks for NULL and thus can end up calling dev_put on an ERR_PTR.
> This patch adds an additional check for err ptr after the NULL check.
>
> Before: Trying to add an mpls route with no oif from user, no available
> path to 10.1.1.8 and no default route:
> $ip -f mpls route add 100 as 200 via inet 10.1.1.8
> [ 822.337195] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000003a3
...
> After patch:
> $ip -f mpls route add 100 as 200 via inet 10.1.1.8
> RTNETLINK answers: Network is unreachable
>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Reported-by: David Miller <davem@davemloft.net>
Applied, thanks.
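
For context on why a NULL check alone is not enough, here is a userspace sketch of the error-pointer convention. The toy_* helpers mirror the idea behind ERR_PTR(), PTR_ERR() and IS_ERR() from include/linux/err.h but are simplified stand-ins, and toy_dev_usable() is a hypothetical name for the combined check the patch adds.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Error pointers occupy the top TOY_MAX_ERRNO addresses. */
static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* A device pointer is usable only if it is neither NULL nor an error. */
static int toy_dev_usable(const void *dev)
{
	return dev != NULL && !toy_is_err(dev);
}
```

An error pointer is non-NULL, so code that tests only for NULL treats it as a valid device and dereferences it, which matches the oops in the report.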
^ permalink raw reply
* Re: [PATCH] net: thunderx: Fix broken of_node_put() code.
From: David Daney @ 2016-04-08 16:41 UTC (permalink / raw)
To: David Daney
Cc: David S. Miller, netdev, linux-kernel, linux-arm-kernel,
Robert Richter, Sunil Goutham, David Daney
In-Reply-To: <1459472517-5696-1-git-send-email-ddaney.cavm@gmail.com>
Due to mail server malfunction, this patch was sent twice. Please
ignore this duplicate.
Thanks,
David Daney
On 03/31/2016 06:01 PM, David Daney wrote:
> From: David Daney <david.daney@cavium.com>
>
> commit b7d3e3d3d21a ("net: thunderx: Don't leak phy device references
> on -EPROBE_DEFER condition.") incorrectly moved the call to
> of_node_put() outside of the loop. Under normal loop exit, the node
> has already had of_node_put() called, so the extra call results in:
>
> [ 8.228020] ERROR: Bad of_node_put() on /soc@0/pci@848000000000/mrml-bridge0@1,0/bgx0/xlaui00
> [ 8.239433] CPU: 16 PID: 608 Comm: systemd-udevd Not tainted 4.6.0-rc1-numa+ #157
> [ 8.247380] Hardware name: www.cavium.com EBB8800/EBB8800, BIOS 0.3 Mar 2 2016
> [ 8.273541] Call trace:
> [ 8.273550] [<fffffc0008097364>] dump_backtrace+0x0/0x210
> [ 8.273557] [<fffffc0008097598>] show_stack+0x24/0x2c
> [ 8.273560] [<fffffc0008399ed0>] dump_stack+0x8c/0xb4
> [ 8.273566] [<fffffc00085aa828>] of_node_release+0xa8/0xac
> [ 8.273570] [<fffffc000839cad8>] kobject_cleanup+0x8c/0x194
> [ 8.273573] [<fffffc000839c97c>] kobject_put+0x44/0x6c
> [ 8.273576] [<fffffc00085a9ab0>] of_node_put+0x24/0x30
> [ 8.273587] [<fffffc0000bd0f74>] bgx_probe+0x17c/0xcd8 [thunder_bgx]
> [ 8.273591] [<fffffc00083ed220>] pci_device_probe+0xa0/0x114
> [ 8.273596] [<fffffc0008473fbc>] driver_probe_device+0x178/0x418
> [ 8.273599] [<fffffc000847435c>] __driver_attach+0x100/0x118
> [ 8.273602] [<fffffc0008471b58>] bus_for_each_dev+0x6c/0xac
> [ 8.273605] [<fffffc0008473884>] driver_attach+0x30/0x38
> [ 8.273608] [<fffffc00084732f4>] bus_add_driver+0x1f8/0x29c
> [ 8.273611] [<fffffc0008475028>] driver_register+0x70/0x110
> [ 8.273617] [<fffffc00083ebf08>] __pci_register_driver+0x60/0x6c
> [ 8.273623] [<fffffc0000bf0040>] bgx_init_module+0x40/0x48 [thunder_bgx]
> [ 8.273626] [<fffffc0008090d04>] do_one_initcall+0xcc/0x1c0
> [ 8.273631] [<fffffc0008198abc>] do_init_module+0x68/0x1c8
> [ 8.273635] [<fffffc0008125668>] load_module+0xf44/0x11f4
> [ 8.273638] [<fffffc0008125b64>] SyS_finit_module+0xb8/0xe0
> [ 8.273641] [<fffffc0008093b30>] el0_svc_naked+0x24/0x28
>
> Go back to the previous (correct) code that only did the extra
> of_node_put() call on early exit from the loop.
>
> Signed-off-by: David Daney <david.daney@cavium.com>
> ---
> drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> index 9679515..d20539a 100644
> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> @@ -1011,10 +1011,11 @@ static int bgx_init_of_phy(struct bgx *bgx)
> }
>
> lmac++;
> - if (lmac == MAX_LMAC_PER_BGX)
> + if (lmac == MAX_LMAC_PER_BGX) {
> + of_node_put(node);
> break;
> + }
> }
> - of_node_put(node);
> return 0;
>
> defer:
>
^ permalink raw reply
* Re: [RFC PATCH v2 2/5] net: add ndo to set bpf prog in adapter rx
From: Brenden Blanco @ 2016-04-08 16:39 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel,
eric.dumazet, ecree, john.fastabend, tgraf, johannes,
eranlinuxmellanox, lorenzo
In-Reply-To: <20160408113858.4d39b274@redhat.com>
On Fri, Apr 08, 2016 at 11:38:58AM +0200, Jesper Dangaard Brouer wrote:
>
> On Thu, 7 Apr 2016 21:48:47 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
>
> > Add two new set/get netdev ops for drivers implementing the
> > BPF_PROG_TYPE_PHYS_DEV filter.
> >
> > Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> [...]
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index cb4e508..3acf732 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> [...]
> > @@ -1102,6 +1103,14 @@ struct tc_to_netdev {
> > * appropriate rx headroom value allows avoiding skb head copy on
> > * forward. Setting a negative value resets the rx headroom to the
> > * default value.
> > + * int (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog);
> > + * This function is used to set or clear a bpf program used in the
> > + * earliest stages of packet rx. The prog will have been loaded as
> > + * BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling
> > + * bpf_prog_put on any old progs that are stored, but not on the passed
> > + * in prog.
> > + * bool (*ndo_bpf_get)(struct net_device *dev);
> > + * This function is used to check if a bpf program is set on the device.
> > *
>
> This interface is for the entire device, right? I can imagine that users
> want to attach a eBPF program per RX queue. Can we adapt the interface
> to support this? (could default to adding it all queues)
>
Currently yes, for the entire device. I don't see rx queue exposed in
common setlink APIs. Wouldn't this be available only through ethtool,
generally? That would be a significant change to the api, but not a lot
of code. I would defer to others on which is cleaner. An alternative
could be to always run the program, but expose the queue number in
struct bpf_phys_dev_md. That is not as flexible since the program is
still shared, but maybe still useful.
>
> I'm also wondering if we should add a "flags" parameter. Or maybe we
> can extend 'struct bpf_prog' with what I have in mind.
>
> When the eBPF program is attached to a RX queue, I want to know if the
> program want to modify packet-data, up-front.
>
> The problem is that most drivers use dma_sync, which means that data is
> considered 'read-only' (the "considered" part depend on DMA engine, and
> we might find a DMA loop-hole for some configs).
> If the program want to write, the driver have the option of
> reconfiguring the driver routine to use dma_unmap, before handing over
> the page. Or driver can also choose to realloc the specific RX ring
> queue pages as single pages (using dma_map/unmap consistently).
> This also allow us to give a return code indicating given driver does
> not support writable packet-pages (rejecting program).
When write-mode is enabled for this prog type, we'll add the flag. I
don't want to add unused flags prematurely. When we add such support, it
should be available in the bpf_prog struct, similar to the cb_access or
dst_needed bool fields.
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [net-next 00/18][pull request] 10GbE Intel Wired LAN Driver Updates 2016-04-07
From: David Miller @ 2016-04-08 16:32 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene, john.ronciak
In-Reply-To: <1460085673-87056-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 7 Apr 2016 20:20:55 -0700
> This series contains updates to ixgbe and ixgbevf.
Pulled, thanks Jeff.
^ permalink raw reply
* [PATCH] net: thunderx: Fix broken of_node_put() code.
From: David Daney @ 2016-04-01 1:01 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: linux-kernel, linux-arm-kernel, Robert Richter, Sunil Goutham,
David Daney
From: David Daney <david.daney@cavium.com>
commit b7d3e3d3d21a ("net: thunderx: Don't leak phy device references
on -EPROBE_DEFER condition.") incorrectly moved the call to
of_node_put() outside of the loop. Under normal loop exit, the node
has already had of_node_put() called, so the extra call results in:
[ 8.228020] ERROR: Bad of_node_put() on /soc@0/pci@848000000000/mrml-bridge0@1,0/bgx0/xlaui00
[ 8.239433] CPU: 16 PID: 608 Comm: systemd-udevd Not tainted 4.6.0-rc1-numa+ #157
[ 8.247380] Hardware name: www.cavium.com EBB8800/EBB8800, BIOS 0.3 Mar 2 2016
[ 8.273541] Call trace:
[ 8.273550] [<fffffc0008097364>] dump_backtrace+0x0/0x210
[ 8.273557] [<fffffc0008097598>] show_stack+0x24/0x2c
[ 8.273560] [<fffffc0008399ed0>] dump_stack+0x8c/0xb4
[ 8.273566] [<fffffc00085aa828>] of_node_release+0xa8/0xac
[ 8.273570] [<fffffc000839cad8>] kobject_cleanup+0x8c/0x194
[ 8.273573] [<fffffc000839c97c>] kobject_put+0x44/0x6c
[ 8.273576] [<fffffc00085a9ab0>] of_node_put+0x24/0x30
[ 8.273587] [<fffffc0000bd0f74>] bgx_probe+0x17c/0xcd8 [thunder_bgx]
[ 8.273591] [<fffffc00083ed220>] pci_device_probe+0xa0/0x114
[ 8.273596] [<fffffc0008473fbc>] driver_probe_device+0x178/0x418
[ 8.273599] [<fffffc000847435c>] __driver_attach+0x100/0x118
[ 8.273602] [<fffffc0008471b58>] bus_for_each_dev+0x6c/0xac
[ 8.273605] [<fffffc0008473884>] driver_attach+0x30/0x38
[ 8.273608] [<fffffc00084732f4>] bus_add_driver+0x1f8/0x29c
[ 8.273611] [<fffffc0008475028>] driver_register+0x70/0x110
[ 8.273617] [<fffffc00083ebf08>] __pci_register_driver+0x60/0x6c
[ 8.273623] [<fffffc0000bf0040>] bgx_init_module+0x40/0x48 [thunder_bgx]
[ 8.273626] [<fffffc0008090d04>] do_one_initcall+0xcc/0x1c0
[ 8.273631] [<fffffc0008198abc>] do_init_module+0x68/0x1c8
[ 8.273635] [<fffffc0008125668>] load_module+0xf44/0x11f4
[ 8.273638] [<fffffc0008125b64>] SyS_finit_module+0xb8/0xe0
[ 8.273641] [<fffffc0008093b30>] el0_svc_naked+0x24/0x28
Go back to the previous (correct) code that only did the extra
of_node_put() call on early exit from the loop.
Signed-off-by: David Daney <david.daney@cavium.com>
---
drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index 9679515..d20539a 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -1011,10 +1011,11 @@ static int bgx_init_of_phy(struct bgx *bgx)
}
lmac++;
- if (lmac == MAX_LMAC_PER_BGX)
+ if (lmac == MAX_LMAC_PER_BGX) {
+ of_node_put(node);
break;
+ }
}
- of_node_put(node);
return 0;
defer:
--
1.8.3.1
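
The reference counting behind this fix can be modeled in userspace. This is a toy sketch under stated assumptions: toy_next_child() imitates an iterator that drops the reference on the previous node and takes one on the next, and all toy_* names are hypothetical rather than the driver's code.

```c
#include <stddef.h>

#define TOY_NUM_NODES 3

struct toy_node {
	int refcount;
};

static struct toy_node toy_nodes[TOY_NUM_NODES];

static void toy_node_put(struct toy_node *n)
{
	n->refcount--;
}

/*
 * Iterator step: drop the reference on prev (if any) and take one on
 * the next node. Returns NULL once the list is exhausted.
 */
static struct toy_node *toy_next_child(struct toy_node *prev)
{
	size_t idx = prev ? (size_t)(prev - toy_nodes) + 1 : 0;

	if (prev)
		toy_node_put(prev);
	if (idx >= TOY_NUM_NODES)
		return NULL;
	toy_nodes[idx].refcount++;
	return &toy_nodes[idx];
}

/*
 * Walk the children, stopping after max_visits nodes. Returns the sum
 * of all refcounts afterwards: 0 is balanced, negative is an extra put.
 */
static int toy_walk(int max_visits, int put_after_loop)
{
	struct toy_node *n, *last = NULL;
	int visited = 0, sum = 0;
	size_t i;

	for (i = 0; i < TOY_NUM_NODES; i++)
		toy_nodes[i].refcount = 0;

	for (n = toy_next_child(NULL); n; n = toy_next_child(n)) {
		last = n;
		if (++visited == max_visits) {
			if (!put_after_loop)
				toy_node_put(n);	/* fixed: put on early exit only */
			break;
		}
	}
	if (put_after_loop && last)
		toy_node_put(last);		/* broken: unconditional put */

	for (i = 0; i < TOY_NUM_NODES; i++)
		sum += toy_nodes[i].refcount;
	return sum;
}
```

In this model the unconditional put is balanced only when the loop breaks early. When the iterator runs to completion it has already released the last node, so the extra put underflows the count, which corresponds to the "Bad of_node_put()" report.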
^ permalink raw reply related
* Re: [PATCH v2] packet: uses kfree_skb() for drops or errors.
From: Weongyo Jeong @ 2016-04-08 16:27 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: Network Development, David S. Miller, Willem de Bruijn
In-Reply-To: <CAF=yD-K9f_kPB5itNW_xdg8LOuSGhNsmb0B1Ur8fR12LN=_Syw@mail.gmail.com>
On Thu, Apr 07, 2016 at 12:06:12PM -0400, Willem de Bruijn wrote:
> On Wed, Apr 6, 2016 at 5:14 PM, Weongyo Jeong <weongyo.linux@gmail.com> wrote:
> > consume_skb() isn't for drop or error cases
>
> for drop or error -> for error
>
> > that kfree_skb() is more proper
> > one. At this patch, it fixed tpacket_rcv() and packet_rcv() to be
> > consistent for error or non-error cases letting perf trace its event
> > properly.
> >
> > Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
>
> Don't forget to add the target to your subject line: PATCH net-next v3.
>
> > ---
> > net/packet/af_packet.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 1ecfa71..cd100cf 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -2040,7 +2040,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
> > struct sockaddr_ll *sll;
> > struct packet_sock *po;
> > u8 *skb_head = skb->data;
> > - int skb_len = skb->len;
> > + int err = 0, skb_len = skb->len;
>
> bool
>
> Otherwise looks good.
Thank you for the review, Willem. I have just submitted v3.
Regards,
Weongyo Jeong
^ permalink raw reply
* [PATCH net-next v3] packet: uses kfree_skb() for errors.
From: Weongyo Jeong @ 2016-04-08 16:25 UTC (permalink / raw)
To: netdev; +Cc: Weongyo Jeong, David S. Miller, Willem de Bruijn
consume_skb() isn't meant for error cases; kfree_skb() is the proper
call there. This patch fixes tpacket_rcv() and packet_rcv() to use the
right call consistently for error and non-error cases, letting perf
trace these events properly.
Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
---
net/packet/af_packet.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1ecfa71..4e054bb 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2042,6 +2042,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
u8 *skb_head = skb->data;
int skb_len = skb->len;
unsigned int snaplen, res;
+ bool err = false;
if (skb->pkt_type == PACKET_LOOPBACK)
goto drop;
@@ -2130,6 +2131,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
return 0;
drop_n_acct:
+ err = true;
spin_lock(&sk->sk_receive_queue.lock);
po->stats.stats1.tp_drops++;
atomic_inc(&sk->sk_drops);
@@ -2141,7 +2143,10 @@ drop_n_restore:
skb->len = skb_len;
}
drop:
- consume_skb(skb);
+ if (!err)
+ consume_skb(skb);
+ else
+ kfree_skb(skb);
return 0;
}
@@ -2160,6 +2165,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
struct sk_buff *copy_skb = NULL;
struct timespec ts;
__u32 ts_status;
+ bool err = false;
/* struct tpacket{2,3}_hdr is aligned to a multiple of TPACKET_ALIGNMENT.
* We may add members to them until current aligned size without forcing
@@ -2367,10 +2373,14 @@ drop_n_restore:
skb->len = skb_len;
}
drop:
- kfree_skb(skb);
+ if (!err)
+ consume_skb(skb);
+ else
+ kfree_skb(skb);
return 0;
drop_n_account:
+ err = true;
po->stats.stats1.tp_drops++;
spin_unlock(&sk->sk_receive_queue.lock);
--
2.1.3
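
The control flow of the patch can be sketched in userspace. This is a toy model, assuming that the two free paths are distinguished only by which counter they bump (standing in for the separate kfree_skb and consume_skb trace events); the toy_* names and verdict values are hypothetical.

```c
#include <stdbool.h>

struct toy_stats {
	int consumed;	/* stands in for consume_skb() events */
	int dropped;	/* stands in for kfree_skb() events */
};

enum toy_verdict {
	TOY_NOT_FOR_US,	/* e.g. loopback packet, no error */
	TOY_QUEUE_FULL,	/* receive queue overflow, a real drop */
};

static void toy_consume(struct toy_stats *st)
{
	st->consumed++;
}

static void toy_kfree(struct toy_stats *st)
{
	st->dropped++;
}

/* Mirrors the label structure of packet_rcv() after the patch. */
static int toy_rcv(struct toy_stats *st, enum toy_verdict v)
{
	bool err = false;

	if (v == TOY_QUEUE_FULL)
		goto drop_n_acct;
	goto drop;

drop_n_acct:
	err = true;	/* only the accounting path marks an error */
drop:
	if (!err)
		toy_consume(st);
	else
		toy_kfree(st);
	return 0;
}
```

Only packets that pass through the accounting label are reported as drops, so a tool watching drop events no longer sees packets that were merely not of interest to this socket.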
^ permalink raw reply related
* Re: [Lsf] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Alexander Duyck @ 2016-04-08 16:12 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Waskiewicz, PJ, lsf@lists.linux-foundation.org,
linux-mm@kvack.org, netdev@vger.kernel.org, bblanco@plumgrid.com,
alexei.starovoitov@gmail.com,
James.Bottomley@HansenPartnership.com, tom@herbertland.com,
lsf-pc@lists.linux-foundation.org
In-Reply-To: <20160407223853.6f4c7dbd@redhat.com>
On Thu, Apr 7, 2016 at 1:38 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Thu, 7 Apr 2016 19:48:50 +0000
> "Waskiewicz, PJ" <PJ.Waskiewicz@netapp.com> wrote:
>
>> On Thu, 2016-04-07 at 16:17 +0200, Jesper Dangaard Brouer wrote:
>> > (Topic proposal for MM-summit)
>> >
>> > Network Interface Cards (NIC) drivers, and increasing speeds stress
>> > the page-allocator (and DMA APIs). A number of driver specific
>> > open-coded approaches exists that work-around these bottlenecks in
>> > the
>> > page allocator and DMA APIs. E.g. open-coded recycle mechanisms, and
>> > allocating larger pages and handing-out page "fragments".
>> >
>> > I'm proposing a generic page-pool recycle facility, that can cover
>> > the
>> > driver use-cases, increase performance and open up for zero-copy RX.
>>
>> Is this based on the page recycle stuff from ixgbe that used to be in
>> the driver? If so I'd really like to be part of the discussion.
>
> Okay, so it is not part of the driver any-longer? I've studied the
> current ixgbe driver (and other NIC drivers) closely. Do you have some
> code pointers, to this older code?
No, it is still in the driver. I think when PJ said "used to" he was
referring to the fact that the code was present in the driver back
when he was working on it at Intel.
You have to realize that the page reuse code has been in the Intel
drivers for a long time. I think I introduced it originally on igb in
July of 2008 as page recycling, commit bf36c1a0040c ("igb: add page
recycling support"), and it was copied over to ixgbe in September,
commit 762f4c571058 ("ixgbe: recycle pages in packet split mode").
> The likely-fastest recycle code I've see is in the bnx2x driver. If
> you are interested see: bnx2x_reuse_rx_data(). Again is it a bit
> open-coded produce/consumer ring queue (which would be nice to also
> cleanup).
Yeah, that is essentially the same kind of code we have in
ixgbe_reuse_rx_page(). From what I can tell though the bnx2x doesn't
actually reuse the buffers in the common case. That function is only
called in the copy-break and error cases to recycle the buffer so that
it doesn't have to be freed.
> To amortize the cost of allocating a single page, most other drivers
> use the trick of allocating a larger (compound) page, and partition
> this page into smaller "fragments". Which also amortize the cost of
> dma_map/unmap (important on non-x86).
Right. The only reason why I went the reuse route instead of the
compound page route is that I had speculated that you could still
bottleneck yourself since the issue I was trying to avoid was the
dma_map call hitting a global lock in IOMMU enabled systems. With the
larger page route I could at best reduce the number of map calls to
1/16 or 1/32 of what it was. By doing the page reuse I actually bring
it down to something approaching 0 as long as the buffers are being
freed in a reasonable timeframe. This way the code would scale so I
wouldn't have to worry about how many rings were active at the same
time.
As PJ can attest we even saw bugs where the page reuse actually was
too effective in some cases leading to us carrying memory from one
node to another when the interrupt was migrated. That was why we had
to add the code to force us to free the page if it came from another
node.
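
The map-call arithmetic above can be written out as a rough model. These functions are illustrative assumptions only: they count dma_map calls under idealized conditions and ignore everything else (sync cost, NUMA placement, buffers held by the stack beyond the not_recyclable count).

```c
/* One mapping per received buffer: the baseline. */
static unsigned long maps_per_buffer(unsigned long packets)
{
	return packets;
}

/* One mapping per compound page that is split into fragments. */
static unsigned long maps_compound(unsigned long packets,
				   unsigned long frags_per_page)
{
	return (packets + frags_per_page - 1) / frags_per_page;
}

/*
 * Page reuse: the ring is mapped once, and a new mapping is needed
 * only for buffers that could not be recycled.
 */
static unsigned long maps_reuse(unsigned long packets,
				unsigned long ring_size,
				unsigned long not_recyclable)
{
	unsigned long initial = packets < ring_size ? packets : ring_size;

	return initial + not_recyclable;
}
```

For a million packets this model gives 62500 map calls with 16 fragments per page and 31250 with 32, versus 512 for a 512-entry ring with perfect reuse, which is the "approaching 0" case.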
> This is actually problematic performance wise, because packet-data
> (in these page fragments) only get DMA_sync'ed, and is thus considered
> "read-only". As netstack need to write packet headers, yet-another
> (writable) memory area is allocated per packet (plus the SKB meta-data
> struct).
Have you done any actual testing with build_skb recently that shows
how much of a gain there is to be had? I'm just curious as I know I
saw a gain back in the day, but back when I ran that test we didn't
have things like napi_alloc_skb running around which should be a
pretty big win. It might be useful to hack a driver such as ixgbe to
use build_skb and see if it is even worth the trouble to do it
properly.
Here is a patch I had generated back in 2013 to convert ixgbe over to
using build_skb, https://patchwork.ozlabs.org/patch/236044/. You
might be able to update it to make it work against current ixgbe and
then could come back to us with data on what the actual gain is. My
thought is the gain should have significantly decreased since back in
the day as we optimized napi_alloc_skb to the point where I think the
only real difference is probably the memcpy to pull the headers from
the page.
- Alex
^ permalink raw reply
* [PATCHv3 net-next 6/6] bridge: a netlink notification should be sent when those attributes are changed by ioctl
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: davem, Stephen Hemminger, nikolay
In-Reply-To: <cover.1460131308.git.lucien.xin@gmail.com>
Now when we change the attributes of bridge or br_port by netlink,
a relevant netlink notification will be sent, but if we change them
by ioctl or sysfs, no notification will be sent.
We should ensure that whenever those attributes change internally or from
sysfs/ioctl, that a netlink notification is sent out to listeners.
Also, NetworkManager will use this in the future to listen for out-of-band
bridge master attribute updates and incorporate them into the runtime
configuration.
This patch is used for ioctl.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/bridge/br_ioctl.c | 40 ++++++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 16 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index 263b4de..f8fc624 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -112,7 +112,9 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
{
struct net_bridge *br = netdev_priv(dev);
+ struct net_bridge_port *p = NULL;
unsigned long args[4];
+ int ret = -EOPNOTSUPP;
if (copy_from_user(args, rq->ifr_data, sizeof(args)))
return -EFAULT;
@@ -182,25 +184,29 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- return br_set_forward_delay(br, args[1]);
+ ret = br_set_forward_delay(br, args[1]);
+ break;
case BRCTL_SET_BRIDGE_HELLO_TIME:
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- return br_set_hello_time(br, args[1]);
+ ret = br_set_hello_time(br, args[1]);
+ break;
case BRCTL_SET_BRIDGE_MAX_AGE:
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- return br_set_max_age(br, args[1]);
+ ret = br_set_max_age(br, args[1]);
+ break;
case BRCTL_SET_AGEING_TIME:
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- return br_set_ageing_time(br, args[1]);
+ ret = br_set_ageing_time(br, args[1]);
+ break;
case BRCTL_GET_PORT_INFO:
{
@@ -240,20 +246,19 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
return -EPERM;
br_stp_set_enabled(br, args[1]);
- return 0;
+ ret = 0;
+ break;
case BRCTL_SET_BRIDGE_PRIORITY:
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
br_stp_set_bridge_priority(br, args[1]);
- return 0;
+ ret = 0;
+ break;
case BRCTL_SET_PORT_PRIORITY:
{
- struct net_bridge_port *p;
- int ret;
-
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
@@ -263,14 +268,11 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
else
ret = br_stp_set_port_priority(p, args[2]);
spin_unlock_bh(&br->lock);
- return ret;
+ break;
}
case BRCTL_SET_PATH_COST:
{
- struct net_bridge_port *p;
- int ret;
-
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
@@ -280,8 +282,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
else
ret = br_stp_set_path_cost(p, args[2]);
spin_unlock_bh(&br->lock);
-
- return ret;
+ break;
}
case BRCTL_GET_FDB_ENTRIES:
@@ -289,7 +290,14 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
args[2], args[3]);
}
- return -EOPNOTSUPP;
+ if (!ret) {
+ if (p)
+ br_ifinfo_notify(RTM_NEWLINK, p);
+ else
+ netdev_state_change(br->dev);
+ }
+
+ return ret;
}
static int old_deviceless(struct net *net, void __user *uarg)
--
2.1.0
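
The shape of this refactor, replacing early returns with a shared tail that notifies on success, can be sketched in userspace. All names here (toy_ioctl, the toy_cmd values, the counters) are hypothetical stand-ins; the counters take the place of br_ifinfo_notify() and netdev_state_change().

```c
#include <errno.h>
#include <stddef.h>

enum toy_cmd {
	TOY_SET_HELLO_TIME,
	TOY_SET_PORT_PRIORITY,
	TOY_GET_INFO,
	TOY_UNKNOWN,
};

struct toy_bridge {
	int hello_time;
	int port_priority;
	int bridge_notifications;	/* stands in for netdev_state_change() */
	int port_notifications;		/* stands in for br_ifinfo_notify() */
};

static int toy_ioctl(struct toy_bridge *br, enum toy_cmd cmd, int value)
{
	int *port = NULL;	/* plays the role of the port pointer p */
	int ret = -EOPNOTSUPP;

	switch (cmd) {
	case TOY_SET_HELLO_TIME:
		if (value <= 0) {
			ret = -EINVAL;
			break;
		}
		br->hello_time = value;
		ret = 0;
		break;
	case TOY_SET_PORT_PRIORITY:
		port = &br->port_priority;
		*port = value;
		ret = 0;
		break;
	case TOY_GET_INFO:
		return 0;	/* read-only commands still return directly */
	default:
		break;
	}

	/* Shared tail: notify once, and only if the setter succeeded. */
	if (!ret) {
		if (port)
			br->port_notifications++;
		else
			br->bridge_notifications++;
	}
	return ret;
}
```

Keeping the notification in one place means a new setter case only has to set ret and break; it cannot forget to notify.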
^ permalink raw reply related
* [PATCHv3 net-next 5/6] bridge: a netlink notification should be sent when those attributes are changed by br_sysfs_if
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: davem, Stephen Hemminger, nikolay
In-Reply-To: <cover.1460131308.git.lucien.xin@gmail.com>
Now when we change the attributes of bridge or br_port by netlink,
a relevant netlink notification will be sent, but if we change them
by ioctl or sysfs, no notification will be sent.
We should ensure that whenever those attributes change internally or from
sysfs/ioctl, that a netlink notification is sent out to listeners.
Also, NetworkManager will use this in the future to listen for out-of-band
bridge master attribute updates and incorporate them into the runtime
configuration.
This patch is used for br_sysfs_if, and we also move br_ifinfo_notify out
of store_flag.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/bridge/br_sysfs_if.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index efe415a..1e04d4d 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -61,7 +61,6 @@ static int store_flag(struct net_bridge_port *p, unsigned long v,
if (flags != p->flags) {
p->flags = flags;
br_port_flags_change(p, mask);
- br_ifinfo_notify(RTM_NEWLINK, p);
}
return 0;
}
@@ -253,8 +252,10 @@ static ssize_t brport_store(struct kobject *kobj,
spin_lock_bh(&p->br->lock);
ret = brport_attr->store(p, val);
spin_unlock_bh(&p->br->lock);
- if (ret == 0)
+ if (!ret) {
+ br_ifinfo_notify(RTM_NEWLINK, p);
ret = count;
+ }
}
rtnl_unlock();
}
--
2.1.0
^ permalink raw reply related
* [PATCHv3 net-next 3/6] bridge: simplify the stp_state_store by calling store_bridge_parm
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: davem, Stephen Hemminger, nikolay
In-Reply-To: <cover.1460131308.git.lucien.xin@gmail.com>
stp_state_store contains some repetitive code; we can remove it by
calling store_bridge_parm.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/bridge/br_sysfs_br.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 137cd3b..f9d484e 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -128,27 +128,21 @@ static ssize_t stp_state_show(struct device *d,
}
-static ssize_t stp_state_store(struct device *d,
- struct device_attribute *attr, const char *buf,
- size_t len)
+static int set_stp_state(struct net_bridge *br, unsigned long val)
{
- struct net_bridge *br = to_bridge(d);
- char *endp;
- unsigned long val;
-
- if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
- return -EPERM;
-
- val = simple_strtoul(buf, &endp, 0);
- if (endp == buf)
- return -EINVAL;
-
if (!rtnl_trylock())
return restart_syscall();
br_stp_set_enabled(br, val);
rtnl_unlock();
- return len;
+ return 0;
+}
+
+static ssize_t stp_state_store(struct device *d,
+ struct device_attribute *attr, const char *buf,
+ size_t len)
+{
+ return store_bridge_parm(d, buf, len, set_stp_state);
}
static DEVICE_ATTR_RW(stp_state);
--
2.1.0
^ permalink raw reply related
* [PATCHv3 net-next 4/6] bridge: a netlink notification should be sent when those attributes are changed by br_sysfs_br
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: davem, Stephen Hemminger, nikolay
In-Reply-To: <cover.1460131308.git.lucien.xin@gmail.com>
Currently, when we change the attributes of a bridge or br_port through
netlink, a corresponding netlink notification is sent, but when we change
them through ioctl or sysfs, no notification is sent.
We should ensure that whenever those attributes change, internally or from
sysfs/ioctl, a netlink notification is sent out to listeners.
Also, NetworkManager will use this in the future to listen for out-of-band
bridge master attribute updates and incorporate them into the runtime
configuration.
This patch covers br_sysfs_br. We also need to remove the rtnl_trylock
calls from the old setter functions so that the lock can be taken once in
the common store_bridge_parm.
group_addr_store cannot use store_bridge_parm because it does not do a
string-to-long conversion, so we add the notification to it
individually.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
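For readers following the thread, here is a minimal userspace sketch of the
pattern described above: one common store helper parses the value, takes the
lock, calls a lock-free setter callback, and notifies only on success. All
names (struct bridge, store_parm, bridge_notify) and the zero-value check are
hypothetical stand-ins, not the kernel code; the lock is modelled as a flag.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct net_bridge. */
struct bridge {
	unsigned long ageing_time;
	int locked;                 /* models holding the rtnl lock */
	unsigned int notifications; /* counts notifications sent */
};

typedef int (*bridge_setter)(struct bridge *br, unsigned long val);

/* Models netdev_state_change(): just count the event. */
static void bridge_notify(struct bridge *br)
{
	br->notifications++;
}

/* A setter does no locking and no notification of its own.
 * The zero check is illustrative only.
 */
static int set_ageing_time(struct bridge *br, unsigned long val)
{
	if (val == 0)
		return -EINVAL;
	br->ageing_time = val;
	return 0;
}

/* Common helper: parse, lock, set, notify on success, unlock. */
static ssize_t store_parm(struct bridge *br, const char *buf, size_t len,
			  bridge_setter set)
{
	char *endp;
	unsigned long val;
	int err;

	val = strtoul(buf, &endp, 0);
	if (endp == buf)
		return -EINVAL;

	br->locked = 1;
	err = set(br, val);
	if (!err)
		bridge_notify(br);
	br->locked = 0;

	return err ? err : (ssize_t)len;
}
```

With this shape, every attribute routed through the common helper gets the
notification for free, which is why the per-setter locking has to go.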
net/bridge/br_sysfs_br.c | 21 +++++++++------------
net/bridge/br_vlan.c | 30 +++++-------------------------
2 files changed, 14 insertions(+), 37 deletions(-)
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index f9d484e..70bddfd 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -43,7 +43,14 @@ static ssize_t store_bridge_parm(struct device *d,
if (endp == buf)
return -EINVAL;
+ if (!rtnl_trylock())
+ return restart_syscall();
+
err = (*set)(br, val);
+ if (!err)
+ netdev_state_change(br->dev);
+ rtnl_unlock();
+
return err ? err : len;
}
@@ -101,15 +108,7 @@ static ssize_t ageing_time_show(struct device *d,
static int set_ageing_time(struct net_bridge *br, unsigned long val)
{
- int ret;
-
- if (!rtnl_trylock())
- return restart_syscall();
-
- ret = br_set_ageing_time(br, val);
- rtnl_unlock();
-
- return ret;
+ return br_set_ageing_time(br, val);
}
static ssize_t ageing_time_store(struct device *d,
@@ -130,10 +129,7 @@ static ssize_t stp_state_show(struct device *d,
static int set_stp_state(struct net_bridge *br, unsigned long val)
{
- if (!rtnl_trylock())
- return restart_syscall();
br_stp_set_enabled(br, val);
- rtnl_unlock();
return 0;
}
@@ -315,6 +311,7 @@ static ssize_t group_addr_store(struct device *d,
br->group_addr_set = true;
br_recalculate_fwd_mask(br);
+ netdev_state_change(br->dev);
rtnl_unlock();
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 9309bb4..e001152 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -651,15 +651,7 @@ int __br_vlan_filter_toggle(struct net_bridge *br, unsigned long val)
int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val)
{
- int err;
-
- if (!rtnl_trylock())
- return restart_syscall();
-
- err = __br_vlan_filter_toggle(br, val);
- rtnl_unlock();
-
- return err;
+ return __br_vlan_filter_toggle(br, val);
}
int __br_vlan_set_proto(struct net_bridge *br, __be16 proto)
@@ -713,18 +705,10 @@ err_filt:
int br_vlan_set_proto(struct net_bridge *br, unsigned long val)
{
- int err;
-
if (val != ETH_P_8021Q && val != ETH_P_8021AD)
return -EPROTONOSUPPORT;
- if (!rtnl_trylock())
- return restart_syscall();
-
- err = __br_vlan_set_proto(br, htons(val));
- rtnl_unlock();
-
- return err;
+ return __br_vlan_set_proto(br, htons(val));
}
static bool vlan_default_pvid(struct net_bridge_vlan_group *vg, u16 vid)
@@ -855,21 +839,17 @@ int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val)
if (val >= VLAN_VID_MASK)
return -EINVAL;
- if (!rtnl_trylock())
- return restart_syscall();
-
if (pvid == br->default_pvid)
- goto unlock;
+ goto out;
/* Only allow default pvid change when filtering is disabled */
if (br->vlan_enabled) {
pr_info_once("Please disable vlan filtering to change default_pvid\n");
err = -EPERM;
- goto unlock;
+ goto out;
}
err = __br_vlan_set_default_pvid(br, pvid);
-unlock:
- rtnl_unlock();
+out:
return err;
}
--
2.1.0
^ permalink raw reply related
* [PATCHv3 net-next 2/6] bridge: simplify the forward_delay_store by calling store_bridge_parm
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: davem, Stephen Hemminger, nikolay
In-Reply-To: <cover.1460131308.git.lucien.xin@gmail.com>
forward_delay_store contains repetitive code that we can remove by
calling store_bridge_parm.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/bridge/br_sysfs_br.c | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index c48f6b0..137cd3b 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -160,29 +160,22 @@ static ssize_t group_fwd_mask_show(struct device *d,
return sprintf(buf, "%#x\n", br->group_fwd_mask);
}
-
-static ssize_t group_fwd_mask_store(struct device *d,
- struct device_attribute *attr,
- const char *buf,
- size_t len)
+static int set_group_fwd_mask(struct net_bridge *br, unsigned long val)
{
- struct net_bridge *br = to_bridge(d);
- char *endp;
- unsigned long val;
-
- if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
- return -EPERM;
-
- val = simple_strtoul(buf, &endp, 0);
- if (endp == buf)
- return -EINVAL;
-
if (val & BR_GROUPFWD_RESTRICTED)
return -EINVAL;
br->group_fwd_mask = val;
- return len;
+ return 0;
+}
+
+static ssize_t group_fwd_mask_store(struct device *d,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t len)
+{
+ return store_bridge_parm(d, buf, len, set_group_fwd_mask);
}
static DEVICE_ATTR_RW(group_fwd_mask);
--
2.1.0
^ permalink raw reply related
* [PATCHv3 net-next 1/6] bridge: simplify the flush_store by calling store_bridge_parm
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: nikolay, davem
In-Reply-To: <cover.1460131308.git.lucien.xin@gmail.com>
flush_store contains repetitive code that we can remove by calling
store_bridge_parm. It will also send an rtnl notification once that is
added to store_bridge_parm in the following patches.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/bridge/br_sysfs_br.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 6b80914..c48f6b0 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -336,17 +336,17 @@ static ssize_t group_addr_store(struct device *d,
static DEVICE_ATTR_RW(group_addr);
+static int set_flush(struct net_bridge *br, unsigned long val)
+{
+ br_fdb_flush(br);
+ return 0;
+}
+
static ssize_t flush_store(struct device *d,
struct device_attribute *attr,
const char *buf, size_t len)
{
- struct net_bridge *br = to_bridge(d);
-
- if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
- return -EPERM;
-
- br_fdb_flush(br);
- return len;
+ return store_bridge_parm(d, buf, len, set_flush);
}
static DEVICE_ATTR_WO(flush);
--
2.1.0
^ permalink raw reply related
* [PATCHv3 net-next 0/6] bridge: support sending rntl info when we set attributes through sysfs/ioctl
From: Xin Long @ 2016-04-08 16:03 UTC (permalink / raw)
To: network dev, bridge; +Cc: nikolay, davem
This patchset adds support for sending rtnl info to userspace in some places,
and ensures that whenever those attributes change, internally or from sysfs,
a netlink notification is sent out to listeners.
It also makes some adjustments in bridge sysfs so that we can implement this
easily.
I've run some tests on this patchset, for example:
[br_sysfs]
1. change all the attribute values of br or brif:
$ echo $value > /sys/class/net/br0/bridge/{*}
$ echo $value > /sys/class/net/br0/brif/eth1/{*}
2. meanwhile, in another terminal, observe the messages:
$ bridge monitor
[br_ioctl]
1. in the bridge-utils package, change br_set so that the brctl command
always uses ioctl to set attributes:
if ((ret = set_sysfs(path, value)) < 0) { -->
if (1) {
$ brctl set*
2. meanwhile, in another terminal, observe the messages:
$ bridge monitor
These tests cover all the attributes that brctl and sysfs can set.
Xin Long (6):
bridge: simplify the flush_store by calling store_bridge_parm
bridge: simplify the forward_delay_store by calling store_bridge_parm
bridge: simplify the stp_state_store by calling store_bridge_parm
bridge: a netlink notification should be sent when those attributes
are changed by br_sysfs_br
bridge: a netlink notification should be sent when those attributes
are changed by br_sysfs_if
bridge: a netlink notification should be sent when those attributes
are changed by ioctl
net/bridge/br_ioctl.c | 40 ++++++++++++++---------
net/bridge/br_sysfs_br.c | 84 ++++++++++++++++++++----------------------------
net/bridge/br_sysfs_if.c | 5 +--
net/bridge/br_vlan.c | 30 +++--------------
4 files changed, 66 insertions(+), 93 deletions(-)
--
2.1.0
^ permalink raw reply
* Re: FWD: [PATCH v2] Marvell phy: add fiber status check for some components
From: Charles-Antoine Couret @ 2016-04-08 15:45 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli; +Cc: netdev
In-Reply-To: <20160404132552.GH21828@lunn.ch>
On 04/04/2016 15:25, Andrew Lunn wrote:
> Should we be using the old mechanism to swap between TP, BNC and AUI
> to swap between copper and fibre?
>
> Andrew
What is this method?
A specific ioctl?
Regards,
Charles-Antoine
^ permalink raw reply
* Re: [patch net-next 0/5] mlxsw: small driver update
From: Jiri Pirko @ 2016-04-08 15:51 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460130325-14931-1-git-send-email-jiri@resnulli.us>
Fri, Apr 08, 2016 at 05:45:20PM CEST, jiri@resnulli.us wrote:
>From: Jiri Pirko <jiri@mellanox.com>
>
>Cosmetics, in preparation to sharedbuffer patchset.
Dave, I just realized there is a dependency on
"devlink: remove implicit type set in port register", which I sent a couple
of minutes after this patchset. I can either resend in bulk, or if you
could apply them in order, that would be great.
Thanks and sorry, I owe you another beer :)
>
>Jiri Pirko (5):
> mlxsw: Move devlink port registration into common core code
> mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*
> mlxsw: Do not pass around driver_priv directly
> mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM
> mlxsw: reg: Fix SBPM register name
>
> drivers/net/ethernet/mellanox/mlxsw/core.c | 56 ++++++++++++++--------
> drivers/net/ethernet/mellanox/mlxsw/core.h | 26 +++++++---
> drivers/net/ethernet/mellanox/mlxsw/reg.h | 27 ++++-------
> drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 52 +++++++++-----------
> drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 3 +-
> .../net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 20 ++++----
> drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 42 +++++++---------
> 7 files changed, 114 insertions(+), 112 deletions(-)
>
>--
>2.5.5
>
^ permalink raw reply
* [PATCH] net: force inlining of netif_tx_start/stop_queue, sock_hold, __sock_put
From: Denys Vlasenko @ 2016-04-08 15:51 UTC (permalink / raw)
To: David S. Miller; +Cc: Denys Vlasenko, linux-kernel, netdev, netfilter-devel
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Arguably, gcc should do better, but gcc people aren't willing
to invest time into it, asking to use __always_inline instead.
With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.
netif_tx_stop_queue: 207 copies, 590 calls:
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 80 8f e0 01 00 00 01 lock orb $0x1,0x1e0(%rdi)
5d pop %rbp
c3 retq
netif_tx_start_queue: 47 copies, 111 calls
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 80 a7 e0 01 00 00 fe lock andb $0xfe,0x1e0(%rdi)
5d pop %rbp
c3 retq
sock_hold: 39 copies, 124 calls
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 ff 87 80 00 00 00 lock incl 0x80(%rdi)
5d pop %rbp
c3 retq
__sock_put: 6 copies, 13 calls
55 push %rbp
48 89 e5 mov %rsp,%rbp
f0 ff 8f 80 00 00 00 lock decl 0x80(%rdi)
5d pop %rbp
c3 retq
This patch fixes this via s/inline/__always_inline/.
Code size decrease after the patch is ~2.5k:
text data bss dec hex filename
56719876 56364551 36196352 149280779 8e5d80b vmlinux_before
56717440 56364551 36196352 149278343 8e5ce87 vmlinux
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: netfilter-devel@vger.kernel.org
---
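As an illustration of the change, the following userspace sketch forces
inlining of tiny atomic bit helpers with the GCC/Clang always_inline
attribute. The ALWAYS_INLINE macro, struct tx_queue, and the bit number are
hypothetical stand-ins for the kernel's __always_inline and netdev_queue, and
C11 atomics stand in for set_bit/clear_bit.

```c
#include <stdatomic.h>

/* Assumed userspace equivalent of the kernel's __always_inline. */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Hypothetical bit number, standing in for __QUEUE_STATE_DRV_XOFF. */
#define QUEUE_STATE_DRV_XOFF 0

struct tx_queue {
	atomic_ulong state;
};

/* One atomic OR; with plain "inline" the compiler may still emit an
 * out-of-line copy per translation unit, which is what the patch avoids.
 */
static ALWAYS_INLINE void tx_stop_queue(struct tx_queue *q)
{
	atomic_fetch_or(&q->state, 1UL << QUEUE_STATE_DRV_XOFF);
}

/* One atomic AND with the inverted mask. */
static ALWAYS_INLINE void tx_start_queue(struct tx_queue *q)
{
	atomic_fetch_and(&q->state, ~(1UL << QUEUE_STATE_DRV_XOFF));
}

static ALWAYS_INLINE int tx_queue_stopped(struct tx_queue *q)
{
	return (int)((atomic_load(&q->state) >> QUEUE_STATE_DRV_XOFF) & 1UL);
}
```

Whether a given compiler would have inlined these anyway depends on its
version and flags; the attribute only removes that uncertainty.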
include/linux/netdevice.h | 4 ++--
include/net/sock.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb0d5d0..f924ddc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2801,7 +2801,7 @@ static inline void netif_tx_schedule_all(struct net_device *dev)
netif_schedule_queue(netdev_get_tx_queue(dev, i));
}
-static inline void netif_tx_start_queue(struct netdev_queue *dev_queue)
+static __always_inline void netif_tx_start_queue(struct netdev_queue *dev_queue)
{
clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
}
@@ -2851,7 +2851,7 @@ static inline void netif_tx_wake_all_queues(struct net_device *dev)
}
}
-static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
+static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
{
set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
}
diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e0..fd15eb1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -564,7 +564,7 @@ static inline bool __sk_del_node_init(struct sock *sk)
modifications.
*/
-static inline void sock_hold(struct sock *sk)
+static __always_inline void sock_hold(struct sock *sk)
{
atomic_inc(&sk->sk_refcnt);
}
@@ -572,7 +572,7 @@ static inline void sock_hold(struct sock *sk)
/* Ungrab socket in the context, which assumes that socket refcnt
cannot hit zero, f.e. it is true in context of any socketcall.
*/
-static inline void __sock_put(struct sock *sk)
+static __always_inline void __sock_put(struct sock *sk)
{
atomic_dec(&sk->sk_refcnt);
}
--
1.8.1.4
^ permalink raw reply related
* [patch net-next 2/2] devlink: share user_ptr pointer for both devlink and devlink_port
From: Jiri Pirko @ 2016-04-08 15:47 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460130437-15015-1-git-send-email-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
A pointer to the devlink structure can easily be obtained from
devlink_port->devlink. So share user_ptr[0] for both and leave
user_ptr[1] free for other users.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
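A simplified userspace sketch of the idea, for illustration only: the
pre-doit hook stores either the devlink or the port in slot 0, and port
handlers recover the devlink through the port's back-pointer. struct
request_info, pre_doit, and the flag names are hypothetical stand-ins for
genl_info and the devlink code, and the lookup step is reduced to a
consistency check.

```c
#include <errno.h>
#include <stddef.h>

#define NL_FLAG_NEED_DEVLINK (1u << 0)
#define NL_FLAG_NEED_PORT    (1u << 1)

struct devlink {
	int id;
};

struct devlink_port {
	struct devlink *devlink; /* back-pointer to the owning devlink */
	unsigned int index;
};

/* Hypothetical stand-in for struct genl_info. */
struct request_info {
	void *user_ptr[2];
};

/* Store exactly one object in slot 0, depending on what the op needs. */
static int pre_doit(unsigned int internal_flags, struct devlink *devlink,
		    struct devlink_port *port, struct request_info *info)
{
	if (internal_flags & NL_FLAG_NEED_DEVLINK) {
		info->user_ptr[0] = devlink;
	} else if (internal_flags & NL_FLAG_NEED_PORT) {
		if (!port || port->devlink != devlink)
			return -ENODEV;
		info->user_ptr[0] = port;
	}
	return 0;
}

/* A port handler derives the devlink from the port; slot 1 is unused. */
static int port_get_doit(struct request_info *info)
{
	struct devlink_port *port = info->user_ptr[0];
	struct devlink *devlink = port->devlink;

	return devlink->id;
}
```

The cost of this layout is that each op has to declare which object it
needs, which is why the patch adds internal_flags to the ops that only use
the devlink.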
net/core/devlink.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 44f880d..b84cf0d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -119,7 +119,8 @@ static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink,
return devlink_port_get_from_attrs(devlink, info->attrs);
}
-#define DEVLINK_NL_FLAG_NEED_PORT BIT(0)
+#define DEVLINK_NL_FLAG_NEED_DEVLINK BIT(0)
+#define DEVLINK_NL_FLAG_NEED_PORT BIT(1)
static int devlink_nl_pre_doit(const struct genl_ops *ops,
struct sk_buff *skb, struct genl_info *info)
@@ -132,8 +133,9 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
mutex_unlock(&devlink_mutex);
return PTR_ERR(devlink);
}
- info->user_ptr[0] = devlink;
- if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
+ if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
+ info->user_ptr[0] = devlink;
+ } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
struct devlink_port *devlink_port;
mutex_lock(&devlink_port_mutex);
@@ -143,7 +145,7 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
mutex_unlock(&devlink_mutex);
return PTR_ERR(devlink_port);
}
- info->user_ptr[1] = devlink_port;
+ info->user_ptr[0] = devlink_port;
}
return 0;
}
@@ -356,8 +358,8 @@ out:
static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
struct genl_info *info)
{
- struct devlink *devlink = info->user_ptr[0];
- struct devlink_port *devlink_port = info->user_ptr[1];
+ struct devlink_port *devlink_port = info->user_ptr[0];
+ struct devlink *devlink = devlink_port->devlink;
struct sk_buff *msg;
int err;
@@ -436,8 +438,8 @@ static int devlink_port_type_set(struct devlink *devlink,
static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb,
struct genl_info *info)
{
- struct devlink *devlink = info->user_ptr[0];
- struct devlink_port *devlink_port = info->user_ptr[1];
+ struct devlink_port *devlink_port = info->user_ptr[0];
+ struct devlink *devlink = devlink_port->devlink;
int err;
if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) {
@@ -511,6 +513,7 @@ static const struct genl_ops devlink_nl_ops[] = {
.doit = devlink_nl_cmd_get_doit,
.dumpit = devlink_nl_cmd_get_dumpit,
.policy = devlink_nl_policy,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
{
@@ -533,12 +536,14 @@ static const struct genl_ops devlink_nl_ops[] = {
.doit = devlink_nl_cmd_port_split_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_PORT_UNSPLIT,
.doit = devlink_nl_cmd_port_unsplit_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
};
--
2.5.5
^ permalink raw reply related
* [patch net-next 1/2] devlink: remove implicit type set in port register
From: Jiri Pirko @ 2016-04-08 15:47 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
From: Jiri Pirko <jiri@mellanox.com>
As we rely on the caller zeroing or correctly setting the struct before the
call, this implicit type set is either a no-op (DEVLINK_PORT_TYPE_NOTSET is 0)
or it overwrites the wanted value. So remove it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
net/core/devlink.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 590fa56..44f880d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -630,7 +630,6 @@ int devlink_port_register(struct devlink *devlink,
}
devlink_port->devlink = devlink;
devlink_port->index = port_index;
- devlink_port->type = DEVLINK_PORT_TYPE_NOTSET;
devlink_port->registered = true;
list_add_tail(&devlink_port->list, &devlink->port_list);
mutex_unlock(&devlink_port_mutex);
--
2.5.5
^ permalink raw reply related
* [patch net-next 4/5] mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM
From: Jiri Pirko @ 2016-04-08 15:45 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460130325-14931-1-git-send-email-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
Same field, same values, so share the same enum.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/reg.h | 23 +++++++---------------
.../net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 20 +++++++++----------
2 files changed, 17 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 28f5b99..19bdc82 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3476,9 +3476,10 @@ static const struct mlxsw_reg_info mlxsw_reg_sbpr = {
.len = MLXSW_REG_SBPR_LEN,
};
-enum mlxsw_reg_sbpr_dir {
- MLXSW_REG_SBPR_DIR_INGRESS,
- MLXSW_REG_SBPR_DIR_EGRESS,
+/* shared direction enum for SBPR, SBCM, SBPM */
+enum mlxsw_reg_sbxx_dir {
+ MLXSW_REG_SBXX_DIR_INGRESS,
+ MLXSW_REG_SBXX_DIR_EGRESS,
};
/* reg_sbpr_dir
@@ -3511,7 +3512,7 @@ enum mlxsw_reg_sbpr_mode {
MLXSW_ITEM32(reg, sbpr, mode, 0x08, 0, 4);
static inline void mlxsw_reg_sbpr_pack(char *payload, u8 pool,
- enum mlxsw_reg_sbpr_dir dir,
+ enum mlxsw_reg_sbxx_dir dir,
enum mlxsw_reg_sbpr_mode mode, u32 size)
{
MLXSW_REG_ZERO(sbpr, payload);
@@ -3553,11 +3554,6 @@ MLXSW_ITEM32(reg, sbcm, local_port, 0x00, 16, 8);
*/
MLXSW_ITEM32(reg, sbcm, pg_buff, 0x00, 8, 6);
-enum mlxsw_reg_sbcm_dir {
- MLXSW_REG_SBCM_DIR_INGRESS,
- MLXSW_REG_SBCM_DIR_EGRESS,
-};
-
/* reg_sbcm_dir
* Direction.
* Access: Index
@@ -3590,7 +3586,7 @@ MLXSW_ITEM32(reg, sbcm, max_buff, 0x1C, 0, 24);
MLXSW_ITEM32(reg, sbcm, pool, 0x24, 0, 4);
static inline void mlxsw_reg_sbcm_pack(char *payload, u8 local_port, u8 pg_buff,
- enum mlxsw_reg_sbcm_dir dir,
+ enum mlxsw_reg_sbxx_dir dir,
u32 min_buff, u32 max_buff, u8 pool)
{
MLXSW_REG_ZERO(sbcm, payload);
@@ -3630,11 +3626,6 @@ MLXSW_ITEM32(reg, sbpm, local_port, 0x00, 16, 8);
*/
MLXSW_ITEM32(reg, sbpm, pool, 0x00, 8, 4);
-enum mlxsw_reg_sbpm_dir {
- MLXSW_REG_SBPM_DIR_INGRESS,
- MLXSW_REG_SBPM_DIR_EGRESS,
-};
-
/* reg_sbpm_dir
* Direction.
* Access: Index
@@ -3661,7 +3652,7 @@ MLXSW_ITEM32(reg, sbpm, min_buff, 0x18, 0, 24);
MLXSW_ITEM32(reg, sbpm, max_buff, 0x1C, 0, 24);
static inline void mlxsw_reg_sbpm_pack(char *payload, u8 local_port, u8 pool,
- enum mlxsw_reg_sbpm_dir dir,
+ enum mlxsw_reg_sbxx_dir dir,
u32 min_buff, u32 max_buff)
{
MLXSW_REG_ZERO(sbpm, payload);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index 97c8d53..f58b1d3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -110,7 +110,7 @@ static int mlxsw_sp_port_headroom_init(struct mlxsw_sp_port *mlxsw_sp_port)
struct mlxsw_sp_sb_pool {
u8 pool;
- enum mlxsw_reg_sbpr_dir dir;
+ enum mlxsw_reg_sbxx_dir dir;
enum mlxsw_reg_sbpr_mode mode;
u32 size;
};
@@ -129,11 +129,11 @@ struct mlxsw_sp_sb_pool {
}
#define MLXSW_SP_SB_POOL_INGRESS(_pool, _size) \
- MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBPR_DIR_INGRESS, \
+ MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBXX_DIR_INGRESS, \
MLXSW_REG_SBPR_MODE_DYNAMIC, _size)
#define MLXSW_SP_SB_POOL_EGRESS(_pool, _size) \
- MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBPR_DIR_EGRESS, \
+ MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBXX_DIR_EGRESS, \
MLXSW_REG_SBPR_MODE_DYNAMIC, _size)
static const struct mlxsw_sp_sb_pool mlxsw_sp_sb_pools[] = {
@@ -173,7 +173,7 @@ struct mlxsw_sp_sb_cm {
u8 pg;
u8 tc;
} u;
- enum mlxsw_reg_sbcm_dir dir;
+ enum mlxsw_reg_sbxx_dir dir;
u32 min_buff;
u32 max_buff;
u8 pool;
@@ -189,15 +189,15 @@ struct mlxsw_sp_sb_cm {
}
#define MLXSW_SP_SB_CM_INGRESS(_pg, _min_buff, _max_buff) \
- MLXSW_SP_SB_CM(_pg, MLXSW_REG_SBCM_DIR_INGRESS, \
+ MLXSW_SP_SB_CM(_pg, MLXSW_REG_SBXX_DIR_INGRESS, \
_min_buff, _max_buff, 0)
#define MLXSW_SP_SB_CM_EGRESS(_tc, _min_buff, _max_buff) \
- MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBCM_DIR_EGRESS, \
+ MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBXX_DIR_EGRESS, \
_min_buff, _max_buff, 0)
#define MLXSW_SP_CPU_PORT_SB_CM_EGRESS(_tc) \
- MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBCM_DIR_EGRESS, 104, 2, 3)
+ MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBXX_DIR_EGRESS, 104, 2, 3)
static const struct mlxsw_sp_sb_cm mlxsw_sp_sb_cms[] = {
MLXSW_SP_SB_CM_INGRESS(0, MLXSW_SP_BYTES_TO_CELLS(10000), 8),
@@ -304,7 +304,7 @@ static int mlxsw_sp_cpu_port_sb_cms_init(struct mlxsw_sp *mlxsw_sp)
struct mlxsw_sp_sb_pm {
u8 pool;
- enum mlxsw_reg_sbpm_dir dir;
+ enum mlxsw_reg_sbxx_dir dir;
u32 min_buff;
u32 max_buff;
};
@@ -318,11 +318,11 @@ struct mlxsw_sp_sb_pm {
}
#define MLXSW_SP_SB_PM_INGRESS(_pool, _min_buff, _max_buff) \
- MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBPM_DIR_INGRESS, \
+ MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBXX_DIR_INGRESS, \
_min_buff, _max_buff)
#define MLXSW_SP_SB_PM_EGRESS(_pool, _min_buff, _max_buff) \
- MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBPM_DIR_EGRESS, \
+ MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBXX_DIR_EGRESS, \
_min_buff, _max_buff)
static const struct mlxsw_sp_sb_pm mlxsw_sp_sb_pms[] = {
--
2.5.5
^ permalink raw reply related
* [patch net-next 5/5] mlxsw: reg: Fix SBPM register name
From: Jiri Pirko @ 2016-04-08 15:45 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460130325-14931-1-git-send-email-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
Fix copy&paste error and state the name of SBPM register correctly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/reg.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 19bdc82..57e4a63 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3598,8 +3598,8 @@ static inline void mlxsw_reg_sbcm_pack(char *payload, u8 local_port, u8 pg_buff,
mlxsw_reg_sbcm_pool_set(payload, pool);
}
-/* SBPM - Shared Buffer Class Management Register
- * ----------------------------------------------
+/* SBPM - Shared Buffer Port Management Register
+ * ---------------------------------------------
* The SBPM register configures and retrieves the shared buffer allocation
* and configuration according to Port-Pool, including the definition
* of the associated quota.
--
2.5.5
^ permalink raw reply related
* [patch net-next 3/5] mlxsw: Do not pass around driver_priv directly
From: Jiri Pirko @ 2016-04-08 15:45 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460130325-14931-1-git-send-email-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
Instead of that, pass mlxsw_core and use a helper to get driver priv
from driver code. Looks much cleaner that way.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
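The accessor approach can be sketched in userspace as follows. This is an
illustration under assumptions, not the mlxsw code: struct core, core_alloc,
and struct sp_driver are hypothetical names, and the private area is modelled
as a trailing flexible array allocated together with the core.

```c
#include <stdlib.h>

/* Hypothetical core object with a trailing driver-private area. */
struct core {
	const char *bus_name;
	/* driver_priv must remain the last member */
	unsigned long driver_priv[];
};

/* Allocate the core plus priv_size zeroed bytes for the driver. */
static struct core *core_alloc(size_t priv_size)
{
	return calloc(1, sizeof(struct core) + priv_size);
}

/* The only way driver code reaches its private data. */
static void *core_driver_priv(struct core *core)
{
	return core->driver_priv;
}

/* Hypothetical driver state living in the private area. */
struct sp_driver {
	struct core *core;
	int ports;
};

/* Callbacks take the core and look up their own private data. */
static int sp_init(struct core *core)
{
	struct sp_driver *sp = core_driver_priv(core);

	sp->core = core;
	sp->ports = 4;
	return 0;
}
```

Passing only the core keeps the callback signatures uniform and lets the
core change how the private area is stored without touching driver
prototypes.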
drivers/net/ethernet/mellanox/mlxsw/core.c | 19 +++++++++++--------
drivers/net/ethernet/mellanox/mlxsw/core.h | 11 +++++++----
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 17 +++++++++--------
drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 8 ++++----
4 files changed, 31 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 39161fb..3958195 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -114,6 +114,12 @@ struct mlxsw_core {
/* driver_priv has to be always the last item */
};
+void *mlxsw_core_driver_priv(struct mlxsw_core *mlxsw_core)
+{
+ return mlxsw_core->driver_priv;
+}
+EXPORT_SYMBOL(mlxsw_core_driver_priv);
+
struct mlxsw_rx_listener_item {
struct list_head list;
struct mlxsw_rx_listener rxl;
@@ -795,8 +801,7 @@ static int mlxsw_devlink_port_split(struct devlink *devlink,
return -EINVAL;
if (!mlxsw_core->driver->port_split)
return -EOPNOTSUPP;
- return mlxsw_core->driver->port_split(mlxsw_core->driver_priv,
- port_index, count);
+ return mlxsw_core->driver->port_split(mlxsw_core, port_index, count);
}
static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
@@ -808,8 +813,7 @@ static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
return -EINVAL;
if (!mlxsw_core->driver->port_unsplit)
return -EOPNOTSUPP;
- return mlxsw_core->driver->port_unsplit(mlxsw_core->driver_priv,
- port_index);
+ return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index);
}
static const struct devlink_ops mlxsw_devlink_ops = {
@@ -880,8 +884,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_devlink_register;
- err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
- mlxsw_bus_info);
+ err = mlxsw_driver->init(mlxsw_core, mlxsw_bus_info);
if (err)
goto err_driver_init;
@@ -892,7 +895,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
return 0;
err_debugfs_init:
- mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+ mlxsw_core->driver->fini(mlxsw_core);
err_driver_init:
devlink_unregister(devlink);
err_devlink_register:
@@ -918,7 +921,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
struct devlink *devlink = priv_to_devlink(mlxsw_core);
mlxsw_core_debugfs_fini(mlxsw_core);
- mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+ mlxsw_core->driver->fini(mlxsw_core);
devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
mlxsw_core->bus->fini(mlxsw_core->bus_priv);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 0454212..f3cebef 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -62,6 +62,8 @@ struct mlxsw_driver;
struct mlxsw_bus;
struct mlxsw_bus_info;
+void *mlxsw_core_driver_priv(struct mlxsw_core *mlxsw_core);
+
int mlxsw_core_driver_register(struct mlxsw_driver *mlxsw_driver);
void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver);
@@ -192,11 +194,12 @@ struct mlxsw_driver {
const char *kind;
struct module *owner;
size_t priv_size;
- int (*init)(void *driver_priv, struct mlxsw_core *mlxsw_core,
+ int (*init)(struct mlxsw_core *mlxsw_core,
const struct mlxsw_bus_info *mlxsw_bus_info);
- void (*fini)(void *driver_priv);
- int (*port_split)(void *driver_priv, u8 local_port, unsigned int count);
- int (*port_unsplit)(void *driver_priv, u8 local_port);
+ void (*fini)(struct mlxsw_core *mlxsw_core);
+ int (*port_split)(struct mlxsw_core *mlxsw_core, u8 local_port,
+ unsigned int count);
+ int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port);
void (*txhdr_construct)(struct sk_buff *skb,
const struct mlxsw_tx_info *tx_info);
u8 txhdr_len;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 8abe1a6..19b3c14 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1948,9 +1948,10 @@ static u8 mlxsw_sp_cluster_base_port_get(u8 local_port)
return local_port - offset;
}
-static int mlxsw_sp_port_split(void *priv, u8 local_port, unsigned int count)
+static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
+ unsigned int count)
{
- struct mlxsw_sp *mlxsw_sp = priv;
+ struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
struct mlxsw_sp_port *mlxsw_sp_port;
u8 width = MLXSW_PORT_MODULE_MAX_WIDTH / count;
u8 module, cur_width, base_port;
@@ -2022,9 +2023,9 @@ err_port_create:
return err;
}
-static int mlxsw_sp_port_unsplit(void *priv, u8 local_port)
+static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port)
{
- struct mlxsw_sp *mlxsw_sp = priv;
+ struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
struct mlxsw_sp_port *mlxsw_sp_port;
u8 module, cur_width, base_port;
unsigned int count;
@@ -2369,10 +2370,10 @@ static int mlxsw_sp_lag_init(struct mlxsw_sp *mlxsw_sp)
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(slcr), slcr_pl);
}
-static int mlxsw_sp_init(void *priv, struct mlxsw_core *mlxsw_core,
+static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
const struct mlxsw_bus_info *mlxsw_bus_info)
{
- struct mlxsw_sp *mlxsw_sp = priv;
+ struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
int err;
mlxsw_sp->core = mlxsw_core;
@@ -2443,9 +2444,9 @@ err_event_register:
return err;
}
-static void mlxsw_sp_fini(void *priv)
+static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
{
- struct mlxsw_sp *mlxsw_sp = priv;
+ struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
mlxsw_sp_switchdev_fini(mlxsw_sp);
mlxsw_sp_traps_fini(mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index 2518c84..3842eab 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -1447,10 +1447,10 @@ static int mlxsw_sx_flood_init(struct mlxsw_sx *mlxsw_sx)
return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sgcr), sgcr_pl);
}
-static int mlxsw_sx_init(void *priv, struct mlxsw_core *mlxsw_core,
+static int mlxsw_sx_init(struct mlxsw_core *mlxsw_core,
const struct mlxsw_bus_info *mlxsw_bus_info)
{
- struct mlxsw_sx *mlxsw_sx = priv;
+ struct mlxsw_sx *mlxsw_sx = mlxsw_core_driver_priv(mlxsw_core);
int err;
mlxsw_sx->core = mlxsw_core;
@@ -1497,9 +1497,9 @@ err_event_register:
return err;
}
-static void mlxsw_sx_fini(void *priv)
+static void mlxsw_sx_fini(struct mlxsw_core *mlxsw_core)
{
- struct mlxsw_sx *mlxsw_sx = priv;
+ struct mlxsw_sx *mlxsw_sx = mlxsw_core_driver_priv(mlxsw_core);
mlxsw_sx_traps_fini(mlxsw_sx);
mlxsw_sx_event_unregister(mlxsw_sx, MLXSW_TRAP_ID_PUDE);
--
2.5.5
^ permalink raw reply related