* Inter-VRF routing on a single machine
From: Darwin Dingel @ 2016-04-12 10:09 UTC (permalink / raw)
To: netdev
Hi All,
Has anyone tried the following setup on a single machine with 2 TCP
sockets on different VRFs and succeeded?
- client_socket on VRF1
- server_socket on VRF2
- ip rules and iproutes for inter-VRF set up
- client_socket sends TCP connect to server_socket. skb was sent using
VRF1 interface
- skb received in loopback interface
- TCP code got SYN but cannot route back to VRF1 to send ACK.
I was wondering whether this is a known limitation of VRF at the moment,
or whether it could work with the proper ip rules/routes.
Thanks!
Darwin
^ permalink raw reply
* Re: [PATCH net-next WIP] ethtool: generic netlink policy
From: Johannes Berg @ 2016-04-12 10:01 UTC (permalink / raw)
To: Roopa Prabhu, netdev; +Cc: davem, jiri, eladr, idosch
In-Reply-To: <1460344545-45501-1-git-send-email-roopa@cumulusnetworks.com>
Hi,
> + [ETHTOOL_ATTR_FLAGS] = { .type = NLA_U32 },
I suppose this comes from the current API, but perhaps it'd be
worthwhile to make provision for more flags? Perhaps even using
NLA_BINARY and have "infinitely extensible" flags.
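To illustrate the "infinitely extensible" flags idea, here is a minimal user-space sketch in Python. It assumes the flags travel as a variable-length little-endian bitmap. The helper names `set_flag` and `has_flag` are hypothetical and are not part of any netlink API.

```python
def set_flag(bitmap: bytes, bit: int) -> bytes:
    """Return a copy of `bitmap` with `bit` set, growing the payload if needed."""
    buf = bytearray(bitmap)
    byte_index, bit_index = divmod(bit, 8)
    if byte_index >= len(buf):
        # Pad with zero bytes so the new bit has somewhere to live.
        buf.extend(bytes(byte_index + 1 - len(buf)))
    buf[byte_index] |= 1 << bit_index
    return bytes(buf)


def has_flag(bitmap: bytes, bit: int) -> bool:
    """Bits beyond the end of the payload read as unset."""
    byte_index, bit_index = divmod(bit, 8)
    return byte_index < len(bitmap) and bool(bitmap[byte_index] & (1 << bit_index))


legacy = (0x5).to_bytes(4, "little")   # what a fixed 32-bit attribute could carry
extended = set_flag(legacy, 40)        # bit 40 does not fit in a u32
```

A reader that only knows the first 32 bits still sees the same leading four bytes, while a newer reader can test bits past that boundary.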
> + [ETHTOOL_ATTR_SSET_COUNT] = { .type = NLA_U32
What do you need that for? Wouldn't it be sufficient to count the SSET
values returned? I can see how this would be useful for ioctl, but not
really for netlink messages?
> +static struct genl_ops ethtool_ops[] = {
> + {
> + .cmd = ETHTOOL_CMD_GET_SETTINGS,
> + .policy = ethtool_policy,
> + .doit = ethtool_get_settings,
> + },
[...]
> + {
> + .cmd = ETHTOOL_CMD_SET_MODULE_INFO,
> + .policy = ethtool_policy,
> + .doit = ethtool_set_module_info,
> + },
> +};
Shouldn't the ops have GENL_ADMIN_PERM as flags?
> +int ethtool_get_settings(struct net_device *dev, struct ethtool_cmd
> *cmd)
> +{
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(ethtool_get_settings);
I don't understand what kind of placeholder this was meant to be - but
why would it be exported? This part is called by the genl ops, so
doesn't really make sense?
It seems that instead these functions should be static, declared above
the ops, and call into the existing ethtool driver ops, based on
IFINDEX demultiplexing.
> +void ethtool_get_ethtool_stats(struct net_device *dev,
> + struct ethtool_stats *stats,
> + u64 *data)
> +{
> +
> + /* example the driver handler would do the below
> + *
> + err = nla_put_u32(msg, PORT_ATTR_IFINDEX, ifindex);
> + if (err < 0)
> + goto err_out;
> +
> + err = nla_put_u32(msg, PORT_ATTR_FLAGS, flags);
> + if (err < 0)
> + goto err_out;
> +
> + err = nla_put_u32(msg, PORT_ATTR_SSET_COUNT,
> + count);
> + if (err < 0)
> + goto err_out;
> +
> + nest = nla_nest_start(msg, PORT_ATTR_STATS);
> + for (i = 0; i < count; i++)
> + nla_put_u64(msg, PORT_ATTR_STAT, data[i]);
> + nla_nest_end(msg, nest);
> +
> + */
> +}
> +EXPORT_SYMBOL_GPL(ethtool_get_ethtool_stats);
It seems possible that you could have a lot of ports, or a lot of
strings, or similar, so I think this should be a dumpit instead of a
doit handler.
Similar, perhaps, for the EEPROM thing, unless you provide an API to query
the size first so the application can size its recvmsg() buffer
appropriately - however doing so also requires a big message allocation
and more code in userspace, so I think having an "offset/length" type
of API combined with a dumpit rather than doit would be good for all of
the things that could get bigger or that might be extended in the
future.
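As a rough illustration of the "offset/length" idea, the following Python sketch models a reader that pulls a large blob in bounded chunks instead of one big message. The names `read_chunk` and `dump` are hypothetical stand-ins; this is not the ethtool or genetlink API.

```python
from typing import Iterator, Tuple


def read_chunk(blob: bytes, offset: int, length: int) -> bytes:
    """Stand-in for one request: return at most `length` bytes from `offset`."""
    return blob[offset:offset + length]


def dump(blob: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs until the source is exhausted."""
    offset = 0
    while True:
        chunk = read_chunk(blob, offset, chunk_size)
        if not chunk:
            return
        yield offset, chunk
        offset += len(chunk)


eeprom = bytes(range(256)) * 2          # made-up 512-byte EEPROM image
chunks = list(dump(eeprom, 128))
```

The caller never needs to know the total size up front, and no single message has to be larger than `chunk_size`.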
johannes
^ permalink raw reply
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Nicolas Ferre @ 2016-04-12 9:23 UTC (permalink / raw)
To: Andrew Lunn, Sergei Shtylyov
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160411185115.GA30623-g2DYL2Zd6BY@public.gmane.org>
On 11/04/2016 20:51, Andrew Lunn wrote:
> On Mon, Apr 11, 2016 at 09:39:02PM +0300, Sergei Shtylyov wrote:
>> Hello.
>>
>> On 04/11/2016 09:19 PM, Andrew Lunn wrote:
>>
>>>>> The code you are deleting would have ignored the flags in the gpio
>>>>> property, i.e. active low.
>>>>
>>>> Hm, you're right -- I forgot about that... :-/
>>>>
>>>>> The new code in the previous patch does
>>>>> however take the flags into account. Did you check if there are any
>>>>> device trees which have flags, which were never used, but are now
>>>>> going to be used and thus break...
>>>>
>>>> Checked this now and found arch/arm/boot/dts/at91-vinco.dts.
>>>> Looks like it needs to be fixed indeed...
>>>>
>>> And this is where it gets tricky. You are breaking backwards
>>> compatibility by now respecting the flag. An old DT blob is not going
>>> to work.
>>
>> Do we care that much about the DT blobs that are just *wrong*?
>
> Wrong, but currently works.
>
>>> You potentially need to add a new property and deprecate the old one.
>>
>> I would like to avoid that...
>
> You will need the agreement from the at91-vinco maintainer.
If the at91-vinco has to be modified, you have my agreement that it can
be modified.
Bye,
--
Nicolas Ferre
^ permalink raw reply
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Nicolas Ferre @ 2016-04-12 9:22 UTC (permalink / raw)
To: Andrew Lunn, Sergei Shtylyov; +Cc: netdev, linux-kernel
In-Reply-To: <20160411022802.GB4307@lunn.ch>
On 11/04/2016 04:28, Andrew Lunn wrote:
> On Sat, Apr 09, 2016 at 01:25:03AM +0300, Sergei Shtylyov wrote:
>> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
>> there should be no need to frob the PHY reset in this driver anymore...
>>
>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>
>> ---
>> drivers/net/ethernet/cadence/macb.c | 17 -----------------
>> drivers/net/ethernet/cadence/macb.h | 1 -
>> 2 files changed, 18 deletions(-)
>>
>> Index: net-next/drivers/net/ethernet/cadence/macb.c
>> ===================================================================
>> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
>> +++ net-next/drivers/net/ethernet/cadence/macb.c
>> @@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
>> = macb_clk_init;
>> int (*init)(struct platform_device *) = macb_init;
>> struct device_node *np = pdev->dev.of_node;
>> - struct device_node *phy_node;
>> const struct macb_config *macb_config = NULL;
>> struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
>> unsigned int queue_mask, num_queues;
>> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
>> else
>> macb_get_hwaddr(bp);
>>
>> - /* Power up the PHY if there is a GPIO reset */
>> - phy_node = of_get_next_available_child(np, NULL);
>> - if (phy_node) {
>> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
>> -
>> - if (gpio_is_valid(gpio)) {
>> - bp->reset_gpio = gpio_to_desc(gpio);
>> - gpiod_direction_output(bp->reset_gpio, 1);
>
> Hi Sergei
>
> The code you are deleting would have ignored the flags in the gpio
I don't parse this.
The code deleted does take the flag into account. And the DT property
associated to it seems correct to me (I mean, with proper flag
specification).
> property, i.e. active low. The new code in the previous patch does
> however take the flags into account. Did you check if there are any
> device trees which have flags, which were never used, but are now
> going to be used and thus break...
Flag was used and you are saying that it's taken into account in new
code... So, what's the issue?
I see a difference in the way the "value" of gpiod_* functions is used.
There may be a misunderstanding here...
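The disagreement turns on whether the value passed to the gpiod_* call is a logical value or a raw line level. A minimal Python model of that distinction follows, assuming an active-low reset line. Whether the removed code actually applied the flag is the point under dispute in this thread, so the "old" case below only encodes Andrew's reading. The function name `physical_level` is hypothetical.

```python
def physical_level(logical_value: int, active_low: bool) -> int:
    """Line level driven for a logical value, given the active-low flag."""
    return logical_value ^ 1 if active_low else logical_value


# Assumption: the DT marks the reset line as active low.
DT_ACTIVE_LOW = True

# Andrew's reading of the removed code: the flag is never applied,
# so the value 1 is driven on the line as-is.
old_level = physical_level(1, active_low=False)

# Flag-aware code: logical 1 means "asserted", which is a low line here.
new_level = physical_level(1, active_low=DT_ACTIVE_LOW)
```

Under that reading the same call with value 1 drives opposite line levels before and after the change, which is how a device tree could work today and break once the flag is honoured.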
Bye,
--
Nicolas Ferre
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Or Gerlitz @ 2016-04-12 9:13 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Leon Romanovsky, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma@vger.kernel.org, David S. Miller, Doug Ledford,
Linus Torvalds, Or Gerlitz, Leon Romanovsky, Tal Alon
In-Reply-To: <CALzJLG8fXPLQ7tj2Nc7NbGpu4EuNoXTE+w8tDkwQ2F_AqgDtLA@mail.gmail.com>
On Tue, Apr 12, 2016 at 10:40 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> Why would you break down this patch to many when no matter what you
> do, at the end it would look the same ?
> As Leon mentioned we MLNX maintainers prefer to update this file at
> once when possible.
See my response to Leon. It has happened to me many times in code review
that people gave me patches that open X fields in the IFC file while their
code used Y << X fields. I don't want the IFC file to have even one unused
field, and I think the correct way to do that is to have both the IFC file
and the driver changes in the same series. I understand the trend toward
zero conflicts, let's try that. Did you make sure all exposed IFC fields
are used?
Or.
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Or Gerlitz @ 2016-04-12 9:09 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Saeed Mahameed, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
David S. Miller, Doug Ledford, Linus Torvalds, Or Gerlitz,
Leon Romanovsky, Tal Alon
In-Reply-To: <20160412060117.GA24649-2ukJVAZIZ/Y@public.gmane.org>
On Tue, Apr 12, 2016 at 9:01 AM, Leon Romanovsky <leon-2ukJVAZIZ/Y@public.gmane.org> wrote:
> On Tue, Apr 12, 2016 at 08:36:21AM +0300, Or Gerlitz wrote:
>> Conflicts happens @ all times, life.
>> I understand your desire to get it down to zero, but it's not gonna
>> work, pick another target.
> Maybe you are right and the time will show, but now we (Saeed, Matan and me)
> are trying hard to achieve this goal.
Going with your approach, next you are going to define a goal of no RC-time
conflicts between rdma and net, and this will slow down the already terribly
slow (no 4.6 net mlx5 rc patches so far) process even further.
>> For example, the networking community has a fairly large rc activity
>> (I would say 10-20x
>> vs rdma), so when Dave does his "merge-rebases" for net-next over net
>> and linus tree
>> (4-5 times in a release), he has to this way or another solve
>> conflicts, yes! ditto for
>> Linus during merge windows and to some extent in rc times too.
> I don't see any harm in our desire to decrease work overhead from these
> busy people.
Desire is one thing, but you took it too far for my taste.
>> > It won't help to anyone to split this commit to more than one patch.
>> The commit change-log should make it clear what this is about, and it doesn't.
>> If you believe in something, state that clear, be precise.
> I agree.
good, so please do that in the respin
>> As Saeed admitted the shared code in the commit spans maybe 2% of it.
>> The 1st commit deals with a field which is not used in the driver,
>> this is a cleanup
>> that you can do in rc (net) patch (remove the field all together) and
>> overall, w.o seeing
> I don't agree with your point that cleanup should go to RC.
>> the down-stream patches that depend on the newly introduced fields,
>> how do you know there aren't such (unused) bits in the 2nd commit?
> No, I don't know in advance, but the truth is that it doesn't bother
> anyone, because we are exposing our internal HW to kernel clients and
> doing it with minimal impact on the maintainers.
this is not the internal HW, it's the FW API and the FW API is stabilized
together with implementing the feature in the driver. We must not expose
unused fields since they might change/move or be eliminated as part of
the driver implementation and this creates extra noise and burden for other
developers and maintainers. The 1st patch should change to be elimination
of these two fields (cqe_zip_xxx and early VF enable) b/c they are not
used in the code.
Or.
^ permalink raw reply
* Re: Configuring ethernet link fails with No such device
From: Bob Ham @ 2016-04-12 8:58 UTC (permalink / raw)
To: Stefan Agner, systemd-devel, netdev, davem
Cc: fabio.estevam, bryan.wu, u.kleine-koenig, l.stach
In-Reply-To: <67b662523c63277fa95a59472cbf018f@agner.ch>
On Mon, 2016-04-11 at 15:46 -0700, Stefan Agner wrote:
> The FEC driver (fec_main.c) does not initialize phy_dev until the device
> has been opened, and therefore the callback fec_enet_(get|set)_settings
> returns -19.
I saw the same problem with the FEC driver. From what I recall, it
became clear that there was a problem with the driver returning from
the eth device initialisation before the PHY was initialised, which
apparently is Bad and Wrong.
> Or in other words: Is this a Kernel or systemd issue?
From what I recall, both; an issue with the FEC driver, and issues in
systemd/udevd's handling of link-level settings.
--
Bob Ham <bob.ham@collabora.com>
Software Engineer
^ permalink raw reply
* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Paolo Abeni @ 2016-04-12 8:52 UTC (permalink / raw)
To: Paul Moore
Cc: Florian Westphal, linux-security-module, David S. Miller,
James Morris, Andreas Gruenbacher, Stephen Smalley, netdev,
selinux
In-Reply-To: <2239567.jkCk1gtQAE@sifl>
On Thu, 2016-04-07 at 14:55 -0400, Paul Moore wrote:
> On Thursday, April 07, 2016 01:45:32 AM Florian Westphal wrote:
> > Paul Moore <paul@paul-moore.com> wrote:
> > > On Wed, Apr 6, 2016 at 6:14 PM, Florian Westphal <fw@strlen.de> wrote:
> > > > netfilter hooks are per namespace -- so there is hook unregister when
> > > > netns is destroyed.
> > >
> > > Looking around, I see the global and per-namespace registration
> > > functions (nf_register_hook and nf_register_net_hook, respectively),
> > > but I'm looking to see if/how newly created namespaces inherit
> > > netfilter hooks from the init network namespace ... if you can create
> > > a network namespace and dodge the SELinux hooks, that isn't a good
> > > thing from a SELinux point of view, although it might be a plus
> > > depending on where you view Paolo's original patches ;)
> >
> > Heh :-)
> >
> > If you use nf_register_net_hook, the hook is only registered in the
> > namespace.
> >
> > If you use nf_register_hook, the hook is put on a global list and
> > registered in all existing namespaces.
> >
> > New namespaces will have the hook added as well (see
> > netfilter_net_init -> nf_register_hook_list in netfilter/core.c )
> >
> > Since nf_register_hook is used it should be impossible to get a netns
> > that doesn't call these hooks.
>
> Great, thanks.
>
> > > > Do you think it makes sense to rework the patch to delay registering
> > > > of the netfilter hooks until the system is in a state where they're
> > > > needed, without the 'unregister' aspect?
> > >
> > > I would need to see the patch to say for certain, but in principle
> > > that seems perfectly reasonable and I think would satisfy both the
> > > netdev and SELinux camps - good suggestion. My main goal is to drop
> > > the selinux_nf_ip_init() entirely so it can't be used as a ROP gadget.
> > >
> > > We might even be able to trim the secmark_active and peerlbl_active
> > > checks in the SELinux netfilter hooks (an earlier attempt at
> > > optimization; contrary to popular belief, I do care about SELinux
> > > performance), although that would mean that enabling the network
> > > access controls would be one way ... I guess you can disregard that
> > > last bit, I'm thinking aloud again.
> >
> > One way is fine I think.
>
> Yes, just disregard my second paragraph above.
>
> > > > Ideally this would even be per netns -- in perfect world we would
> > > > be able to make it so that a new netns are created with an empty
> > > > hook list.
> > >
> > > In general SELinux doesn't care about namespaces, for reasons that are
> > > sorta beyond the scope of this conversation, so I would like to stick
> > > to a all or nothing approach to enabling the SELinux netfilter hooks
> > > across namespaces. Perhaps we can revisit this at a later time, but
> > > let's keep it simple right now.
> >
> > Okay, I'd prefer to stick to your recommendation anyway wrt. to selinux
> > (Casey, I read your comment regarding smack. Noted, we don't want to
> > break smack either...)
> >
> > I think that in this case the entire question is:
> >
> > In your experience, how likely is a config where selinux is enabled BUT the
> > hooks are not needed (i.e., where we hit the
> >
> > if (!selinux_policycap_netpeer)
> > return NF_ACCEPT;
> >
> > if (!secmark_active && !peerlbl_active)
> > return NF_ACCEPT;
> >
> > tests inside the hooks)? If such setups are uncommon we should just
> > drop this idea or at least put it on the back burner until the more
> > expensive netfilter hooks (conntrack, cough) are out of the way.
>
> A few years ago I would have said that it is relatively uncommon for admins to
> enable the SELinux network access controls; it was typically just
> government/intelligence agencies who had very strict access control
> requirements and represented a small portion of SELinux users. However, over
> the past few years I've been fielding more and more questions from admins/devs
> in the virtualization space who are interested in some of these capabilities;
> it isn't clear to me how many of these people are switching it on, but there
> is definitely more interest than I have seen in the past and the interest is
> centered around some rather common use cases.
>
> So, to summarize, I don't know ;)
>
> If you've got bigger sources of overhead, my opinion would be to go tackle
> those first. Perhaps I can even find the time to work on the
> SELinux/netfilter stuff while you are off slaying the bigger dragons, no
> promises at the moment.
Double-checking that I understood the above correctly:
would it be OK if we post a v2 of this series that removes the hook
de-registration bits but preserves the on-demand/delayed registration of
the selinux nf-hooks and socket_sock_rcv_skb()? Would that fit with the
post-init read-only memory usage that you are planning?
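For reference, the registration behaviour Florian describes and the one-way delayed registration asked about here can be modelled in user space. This Python sketch is a simplified assumption of how the pieces fit together, not kernel code. The class names `HookRegistry` and `LazySelinuxHooks` are hypothetical.

```python
class HookRegistry:
    """Toy model of global vs. per-namespace netfilter hook registration."""

    def __init__(self):
        self.global_hooks = []
        self.namespaces = {}

    def create_netns(self, name):
        # New namespaces start with every globally registered hook.
        self.namespaces[name] = list(self.global_hooks)

    def register_net_hook(self, netns, hook):
        # Only the given namespace sees this hook.
        self.namespaces[netns].append(hook)

    def register_hook(self, hook):
        # Global registration: existing namespaces now, new ones later.
        self.global_hooks.append(hook)
        for hooks in self.namespaces.values():
            hooks.append(hook)


class LazySelinuxHooks:
    """Registers the hooks on first use and never unregisters them."""

    def __init__(self, registry):
        self.registry = registry
        self.registered = False

    def enable_network_controls(self):
        if not self.registered:
            self.registry.register_hook("selinux")
            self.registered = True


registry = HookRegistry()
registry.create_netns("init")
lazy = LazySelinuxHooks(registry)
hooks_before = list(registry.namespaces["init"])
lazy.enable_network_controls()
lazy.enable_network_controls()          # second call is a no-op
registry.create_netns("container")
```

In this model no namespace can dodge the hook once it is enabled, and until then no namespace pays for it.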
Regards,
Paolo
^ permalink raw reply
* RE: [PATCH net-next 00/11] FUJITSU Extended Socket driver version 1.1
From: Izumi, Taku @ 2016-04-12 8:35 UTC (permalink / raw)
To: David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <20160411.115617.2124648532876219571.davem@davemloft.net>
Dear David and Jiri,
Thank you for reviewing!
> This submission is of an extremely low quality.
>
> All of your ioctl additions are completely inappropriate, as are your
> debugfs facilities. You must remove all of them completely.
OK, I'll remove the ioctl part. But I'd like to keep some debugfs facilities
for status information and some specific stats other than net_stats.
Are you okay with this?
Sincerely,
Taku Izumi
^ permalink raw reply
* pull request: bluetooth-next 2016-04-12
From: Johan Hedberg @ 2016-04-12 8:08 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-bluetooth
[-- Attachment #1: Type: text/plain, Size: 2181 bytes --]
Hi Dave,
Here's a set of Bluetooth & 802.15.4 patches intended for the 4.7 kernel:
- Fix for race condition in vhci driver
- Memory leak fix for ieee802154/adf7242 driver
- Improvements to deal with single-mode (LE-only) Bluetooth controllers
- Fix for allowing the BT_SECURITY_FIPS security level
- New BCM2E71 ACPI ID
- NULL pointer dereference fix for the hci_ldisc driver
Let me know if there are any issues pulling. Thanks.
Johan
----
The following changes since commit 9ef280c6c28f0c01aa9d909263ad47c796713a8e:
irda: sh_irda: remove driver (2016-04-04 16:24:13 -0400)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream
for you to fetch changes up to 8805eea2494a2837983bc4aaaf6842c89666ec25:
Bluetooth: hci_bcsp: fix code style (2016-04-08 23:01:36 +0200)
----------------------------------------------------------------
Alexander Aring (1):
6lowpan: iphc: fix handling of link-local compression
Jiri Slaby (2):
Bluetooth: vhci: fix open_timeout vs. hdev race
Bluetooth: vhci: purge unhandled skbs
Johan Hedberg (2):
Bluetooth: Fix setting NO_BREDR advertising flag
Bluetooth: Ignore unknown advertising packet types
Loic Poulain (2):
Bluetooth: hci_bcm: Add BCM2E71 ACPI ID
Bluetooth: hci_ldisc: Fix null pointer derefence in case of early data
Maxim Zhukov (1):
Bluetooth: hci_bcsp: fix code style
Patrik Flykt (1):
Bluetooth: Allow setting BT_SECURITY_FIPS with setsockopt
Sudip Mukherjee (1):
ieee802154/adf7242: fix memory leak of firmware
drivers/bluetooth/hci_bcm.c | 1 +
drivers/bluetooth/hci_bcsp.c | 57 +++++++++++++++++++++++++++++++--------------------------
drivers/bluetooth/hci_ldisc.c | 11 +++++++----
drivers/bluetooth/hci_uart.h | 1 +
drivers/bluetooth/hci_vhci.c | 9 ++++++---
drivers/net/ieee802154/adf7242.c | 2 ++
net/6lowpan/iphc.c | 11 +++++++++--
net/bluetooth/hci_event.c | 13 +++++++++++++
net/bluetooth/hci_request.c | 6 +++---
net/bluetooth/l2cap_sock.c | 2 +-
10 files changed, 74 insertions(+), 39 deletions(-)
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Saeed Mahameed @ 2016-04-12 7:40 UTC (permalink / raw)
To: leon
Cc: Or Gerlitz, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma@vger.kernel.org, David S. Miller, Doug Ledford,
Linus Torvalds, Or Gerlitz, Leon Romanovsky, Tal Alon
In-Reply-To: <20160412060117.GA24649@leon.nu>
On Tue, Apr 12, 2016 at 9:01 AM, Leon Romanovsky <leon@leon.nu> wrote:
> On Tue, Apr 12, 2016 at 08:36:21AM +0300, Or Gerlitz wrote:
>>
>> I understand your desire to get it down to zero, but it's not gonna
>> work, pick another target.
>
> Maybe you are right and the time will show, but now we (Saeed, Matan and me)
> are trying hard to achieve this goal.
>
>>
>> For example, the networking community has a fairly large rc activity
>> (I would say 10-20x
>> vs rdma), so when Dave does his "merge-rebases" for net-next over net
>> and linus tree
>> (4-5 times in a release), he has to this way or another solve
>> conflicts, yes! ditto for
>> Linus during merge windows and to some extent in rc times too.
>
> I don't see any harm in our desire to decrease work overhead from these
> busy people.
>
>>
>> > It won't help to anyone to split this commit to more than one patch.
>>
>> The commit change-log should make it clear what this is about, and it doesn't.
>> If you believe in something, state that clear, be precise.
>
> I agree.
>
>>
>> As Saeed admitted the shared code in the commit spans maybe 2% of it.
>>
>> The 1st commit deals with a field which is not used in the driver,
>> this is a cleanup
>> that you can do in rc (net) patch (remove the field all together) and
>> overall, w.o seeing
Or, I guess everybody here agrees that mlx5_ifc is our ConnectX-4 pure
HW spec, written in C. Isn't that cool?
I see no harm in updating our HW spec once per kernel cycle, revealing
new cool HW bits and interfaces for anyone to use (mlx5e/mlx5_core/mlx5_ib,
you name it).
Why would you break this patch down into many when, no matter what you
do, it would look the same in the end?
As Leon mentioned, we MLNX maintainers prefer to update this file all at
once when possible.
>
> I don't agree with your point that cleanup should go to RC.
I am with Leon on this one: the cleanup code is just cleanup for new
features to come; it has nothing to do with RC (net).
>
>> the down-stream patches that depend on the newly introduced fields,
>> how do you know there aren't such (unused) bits in the 2nd commit?
>
> No, I don't know in advance, but the truth is that it doesn't bother
> anyone, because we are exposing our internal HW to kernel clients and
> doing it with minimal impact on the maintainers.
Yep, this is exactly what I am trying to say: there are no two ways to
describe/write (mlx5_ifc) code. If it is a HW spec, why shouldn't it
appear from day one?
^ permalink raw reply
* Re: Configuring ethernet link fails with No such device
From: Stefan Agner @ 2016-04-12 7:25 UTC (permalink / raw)
To: David Miller
Cc: systemd-devel, netdev, fabio.estevam, l.stach, u.kleine-koenig,
bryan.wu, bob.ham
In-Reply-To: <20160411.212900.1164760179761034053.davem@davemloft.net>
On 2016-04-11 18:29, David Miller wrote:
> From: Stefan Agner <stefan@agner.ch>
> Date: Mon, 11 Apr 2016 15:46:08 -0700
>
>> What is the expectation/definition when link configuration should be
>> possible? Only after the network device got opened or before?
>
> Only after it is open. Drivers almost always have the entire chip in
> powerdown state when it is not open, so we wouldn't be able to
> properly do link settings even if we wanted to when the device is
> closed.
I see. AFAICT it is a udev rule which triggers the built-in link setup
code:
https://github.com/systemd/systemd/blob/09541e49ebd17b41482e447dd8194942f39788c0/rules/80-net-setup-link.rules
The udev rule is triggering on action add (=> probe on driver level). At
least on the device I tested, it seems that there is no event on open...
Any other ideas what could be used as trigger to configure the link?
^ permalink raw reply
* [PATCH net v3] net: sched: do not requeue a NULL skb
From: Lars Persson @ 2016-04-12 6:45 UTC (permalink / raw)
To: netdev; +Cc: jhs, linux-kernel, xiyou.wangcong, eric.dumazet, Lars Persson
A failure in validate_xmit_skb_list() triggered an unconditional call
to dev_requeue_skb with skb=NULL. This slowly grows the queue
discipline's qlen count until all traffic through the queue stops.
We take the optimistic approach and continue running the queue after a
failure since it is unknown if later packets also will fail in the
validate path.
Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
Signed-off-by: Lars Persson <larper@axis.com>
---
v3: After a discussion with Eric and Cong I went back to v1 and added the
likely() for the common path.
---
net/sched/sch_generic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f18c350..80742ed 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -159,12 +159,15 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
if (validate)
skb = validate_xmit_skb_list(skb, dev);
- if (skb) {
+ if (likely(skb)) {
HARD_TX_LOCK(dev, txq, smp_processor_id());
if (!netif_xmit_frozen_or_stopped(txq))
skb = dev_hard_start_xmit(skb, dev, txq, &ret);
HARD_TX_UNLOCK(dev, txq);
+ } else {
+ spin_lock(root_lock);
+ return qdisc_qlen(q);
}
spin_lock(root_lock);
--
2.1.4
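The accounting problem described in the commit message can be reproduced with a small user-space model. The Python sketch below is a deliberately simplified assumption of the control flow: every packet fails validation and locking is not modelled. The names `Qdisc`, `direct_xmit` and `drain` are illustrative only, not the kernel functions.

```python
class Qdisc:
    """Toy queue that tracks qlen separately from the packets it holds."""

    def __init__(self, packets):
        self.packets = list(packets)
        self.qlen = len(self.packets)

    def dequeue(self):
        self.qlen -= 1
        return self.packets.pop(0)

    def requeue(self, skb):
        # Mirrors the reported problem: the counter is bumped even for None.
        self.qlen += 1
        if skb is not None:
            self.packets.insert(0, skb)


def direct_xmit(q, skb, validate, fixed):
    skb = validate(skb)
    if skb is not None:
        return q.qlen          # assume the driver accepted the packet
    if fixed:
        return q.qlen          # nothing to requeue, keep running the queue
    q.requeue(skb)             # pre-patch path: requeues a missing skb
    return 0


def drain(packets, fixed):
    """Push every packet through a validator that always fails."""
    q = Qdisc(packets)
    while q.packets:
        direct_xmit(q, q.dequeue(), validate=lambda skb: None, fixed=fixed)
    return q


buggy = drain(["a", "b", "c"], fixed=False)
patched = drain(["a", "b", "c"], fixed=True)
```

In the unpatched model the queue ends up empty while `qlen` still counts one phantom packet per validation failure; with the early return the counter goes back to zero.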
^ permalink raw reply related
* [PATCH V3] net: mediatek: update the IRQ part of the binding document
From: John Crispin @ 2016-04-12 6:35 UTC (permalink / raw)
To: David S. Miller
Cc: Felix Fietkau, Matthias Brugger, netdev, linux-mediatek,
linux-kernel, John Crispin, devicetree
The current binding document only describes a single interrupt. Update the
document by adding the 2 other interrupts.
The driver currently only uses a single interrupt. The HW is however able
to use IRQ grouping to split TX and RX onto separate GIC irqs.
Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: devicetree@vger.kernel.org
---
This binding doc was merged in 4.6-rc1 and there are no users yet. The
current driver only uses 1 irq but will work fine with all 3 listed in
the devicetree. This patch should be merged before v4.6 is final such
that listing all 3 irqs becomes part of the ABI. I have already posted
a patch that utilizes all 3 irqs for net-next for v4.7 inclusion.
Changes in V3:
* be verbose about the 3 irqs and their ordering
Changes in V2:
* split this patch out of the series that fixes tx stalls in the driver
Documentation/devicetree/bindings/net/mediatek-net.txt | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
index 5ca7929..32eaaca 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -9,7 +9,8 @@ have dual GMAC each represented by a child node..
Required properties:
- compatible: Should be "mediatek,mt7623-eth"
- reg: Address and length of the register set for the device
-- interrupts: Should contain the frame engines interrupt
+- interrupts: Should contain the three frame engines interrupts in numeric
+ order. These are fe_int0, fe_int1 and fe_int2.
- clocks: the clock used by the core
- clock-names: the names of the clock listed in the clocks property. These are
"ethif", "esw", "gp2", "gp1"
@@ -42,7 +43,9 @@ eth: ethernet@1b100000 {
<&ethsys CLK_ETHSYS_GP2>,
<&ethsys CLK_ETHSYS_GP1>;
clock-names = "ethif", "esw", "gp2", "gp1";
- interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW>;
+ interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW
+ GIC_SPI 199 IRQ_TYPE_LEVEL_LOW
+ GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>;
power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
resets = <&ethsys MT2701_ETHSYS_ETH_RST>;
reset-names = "eth";
--
1.7.10.4
^ permalink raw reply related
* Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-12 6:28 UTC (permalink / raw)
To: Alexander Duyck
Cc: lsf@lists.linux-foundation.org, James Bottomley, Sagi Grimberg,
Tom Herbert, Brenden Blanco, Christoph Hellwig, linux-mm,
netdev@vger.kernel.org, Bart Van Assche,
lsf-pc@lists.linux-foundation.org, Alexei Starovoitov, brouer
In-Reply-To: <CAKgT0UdbO00-Pe3xdrCC2T8L=XVZasWSQQVzTTs9r521RDes+Q@mail.gmail.com>
On Mon, 11 Apr 2016 15:02:51 -0700 Alexander Duyck <alexander.duyck@gmail.com> wrote:
> Have you taken a look at possibly trying to optimize the DMA pool API
> to work with pages? It sounds like it is supposed to do something
> similar to what you are wanting to do.
Yes, I have looked at the mm/dmapool.c API. AFAIK this is for DMA
coherent memory (see use of dma_alloc_coherent/dma_free_coherent).
What we are doing is "streaming" DMA memory, when processing the RX
ring.
(NICs only use DMA coherent memory for the descriptors, which are
allocated at driver init.)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-12 6:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: lsf@lists.linux-foundation.org, James Bottomley, Sagi Grimberg,
Tom Herbert, Brenden Blanco, Christoph Hellwig, linux-mm,
netdev@vger.kernel.org, Bart Van Assche,
lsf-pc@lists.linux-foundation.org, brouer
In-Reply-To: <20160411222124.GA80595@ast-mbp.thefacebook.com>
On Mon, 11 Apr 2016 15:21:26 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Mon, Apr 11, 2016 at 11:41:57PM +0200, Jesper Dangaard Brouer wrote:
> >
> > On Sun, 10 Apr 2016 21:45:47 +0300 Sagi Grimberg <sagi@grimberg.me> wrote:
> >
[...]
> > >
> > > If we go down this road how about also attaching some driver opaques
> > > to the page sets?
> >
> > That was the ultimate plan... to leave some opaques bytes left in the
> > page struct that drivers could use.
> >
> > In struct page I would need a pointer back to my page_pool struct and a
> > page flag. Then, I would need room to store the dma_unmap address.
> > (And then some of the usual fields are still needed, like the refcnt,
> > and reusing some of the list constructs). And a zero-copy cross-domain
> > id.
>
> I don't think we need to add anything to struct page.
> This is supposed to be small cache of dma_mapped pages with lockless access.
> It can be implemented as an array or link list where every element
> is dma_addr and pointer to page. If it is full, dma_unmap_page+put_page to
> send it to back to page allocator.
It sounds like the Intel drivers' recycle facility, where they split the
page into two parts and keep the page in the RX-ring by swapping to the
other half of the page if page_count(page) is <= 2. Thus, they use the
atomic page refcount to synchronize on.
Thus, we end up with two atomic operations per RX packet on the page
refcnt, whereas DPDK has zero...
By fully taking over the page as an allocator, almost like slab, I can
optimize the common case (of the packet-page getting allocated and
freed on the same CPU), and remove these atomic operations.
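To make the array variant from the quoted text concrete, here is a minimal userspace sketch of a small cache whose elements pair a DMA address with a page pointer. It assumes a single owner (for example one CPU or one RX queue), which is why it needs no atomic operations. All names (recycle_cache, recycle_cache_put, recycle_cache_get, RECYCLE_CACHE_SIZE) are hypothetical, and plain integers and void pointers stand in for dma_addr_t and struct page. It is an illustration of the idea, not the proposed implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RECYCLE_CACHE_SIZE 4	/* hypothetical; a real cache would be larger */

/* One cached element: a page that is still DMA-mapped. */
struct recycle_entry {
	uint64_t dma_addr;	/* stands in for dma_addr_t */
	void *page;		/* stands in for struct page * */
};

/* Array used as a LIFO stack, touched by a single owner only. */
struct recycle_cache {
	struct recycle_entry slot[RECYCLE_CACHE_SIZE];
	size_t count;
};

static void recycle_cache_init(struct recycle_cache *c)
{
	c->count = 0;
}

/*
 * Try to keep a still-mapped page for reuse.  Returns false when the
 * cache is full; the caller would then unmap the page and hand it back
 * to the page allocator.
 */
static bool recycle_cache_put(struct recycle_cache *c, void *page,
			      uint64_t dma_addr)
{
	assert(page != NULL);
	if (c->count == RECYCLE_CACHE_SIZE)
		return false;
	c->slot[c->count].page = page;
	c->slot[c->count].dma_addr = dma_addr;
	c->count++;
	return true;
}

/*
 * Take the most recently cached page.  Returns false when the cache is
 * empty; the caller would then allocate and map a fresh page.
 */
static bool recycle_cache_get(struct recycle_cache *c,
			      struct recycle_entry *out)
{
	if (c->count == 0)
		return false;
	*out = c->slot[--c->count];
	return true;
}
```

The LIFO order is a design choice in this sketch: the page returned last is the one most likely to still be cache-hot.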
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Leon Romanovsky @ 2016-04-12 6:01 UTC (permalink / raw)
To: Or Gerlitz
Cc: Saeed Mahameed, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
David S. Miller, Doug Ledford, Linus Torvalds, Or Gerlitz,
Leon Romanovsky, Tal Alon
In-Reply-To: <CAJ3xEMijYrLvSmbNZp0B6AXqKGozz1rdZ58SCUP_09zOB6-2gQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]
On Tue, Apr 12, 2016 at 08:36:21AM +0300, Or Gerlitz wrote:
> Conflicts happens @ all times, life.
>
...
>
> I understand your desire to get it down to zero, but it's not gonna
> work, pick another target.
Maybe you are right, and time will tell, but for now we (Saeed, Matan and I)
are trying hard to achieve this goal.
>
> For example, the networking community has a fairly large rc activity
> (I would say 10-20x
> vs rdma), so when Dave does his "merge-rebases" for net-next over net
> and linus tree
> (4-5 times in a release), he has to this way or another solve
> conflicts, yes! ditto for
> Linus during merge windows and to some extent in rc times too.
I don't see any harm in our desire to reduce the workload of these
busy people.
>
> > It won't help to anyone to split this commit to more than one patch.
>
> The commit change-log should make it clear what this is about, and it doesn't.
> If you believe in something, state that clear, be precise.
I agree.
>
> As Saeed admitted the shared code in the commit spans maybe 2% of it.
>
> The 1st commit deals with a field which is not used in the driver,
> this is a cleanup
> that you can do in rc (net) patch (remove the field all together) and
> overall, w.o seeing
I don't agree with your point that cleanup should go to RC.
> the down-stream patches that depend on the newly introduced fields,
> how do you know there aren't such (unused) bits in the 2nd commit?
No, I don't know in advance, but the truth is that it doesn't bother
anyone, because we are exposing our internal HW to kernel clients and
doing it with minimal impact on the maintainers.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Or Gerlitz @ 2016-04-12 5:36 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Saeed Mahameed, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
David S. Miller, Doug Ledford, Linus Torvalds, Or Gerlitz,
Leon Romanovsky, Tal Alon
In-Reply-To: <20160412051537.GD25242-2ukJVAZIZ/Y@public.gmane.org>
On Tue, Apr 12, 2016 at 8:15 AM, Leon Romanovsky <leon-2ukJVAZIZ/Y@public.gmane.org> wrote:
> On Tue, Apr 12, 2016 at 12:37:34AM +0300, Or Gerlitz wrote:
>> On Tue, Apr 12, 2016 at 12:24 AM, Saeed Mahameed
>> <saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>> > On Tue, Apr 12, 2016 at 12:17 AM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>> >> feature --> features
>>
>> > Correct, will fix.
>>
>> >>> * Add vport to steering commands for SRIOV ACL support
>> >>> * Add mlcr, pcmr and mcia registers for dump module EEPROM
>> >>> * Add support for FCS, baeacon led and disable_link bits to hca caps
>> >>> * Add CQE period mode bit in CQ context for CQE based CQ
>> >>> moderation support
>> >>> * Add umr SQ bit for fragmented memory registration
>> >>> * Add needed bits and caps for Striding RQ support
>>
>> >> AFAIK, all the above are features will go through net-next, what made
>> >> you anticipate conflicts with linux-rdma?
>>
>> > FCS bit is needed also for rdma, so we took the liberty of updating
>> > all the needed HW structs, bits, caps, etc ..
>> > at once for all mlx5 features planned for 4.7 regardless of rdma/net conflicts.
>>
>> The cover letter states that this series deals with shared code.
>>
>> I guess you might also could extend it a bit to deal also with code
>> that you suspect could lead to conflicts, but I don't see why it
>> evolved to that extent.
>
> Or,
> All these micro-optimizations on this shared file can potentially lead
> to undesired merge conflicts. Subsystem maintainers and Linus don't need to
> deal with these conflicts at all.
Leon,
Conflicts happen all the time; that's life.
We (MLNX) didn't do a good job of minimizing them in the 4.5 cycle
(an understatement) and made a vast improvement in the 4.6 cycle (one or
two conflicts AFAIK, and communicated to Linus).
It's correct that Linus got really angry about these two conflicts, but I
have communicated to him the fact that we made that improvement and that
we're talking about two commits for a fairly large volume of patches. The
response was: if you do things right, we will happily keep working with
you; people, go look.
I understand your desire to get it down to zero, but it's not gonna
work, pick another target.
For example, the networking community has fairly large rc activity
(I would say 10-20x vs rdma), so when Dave does his "merge-rebases" for
net-next over net and Linus' tree (4-5 times in a release), he has to
solve conflicts one way or another, yes! Ditto for Linus during merge
windows and to some extent in rc times too.
> It won't help to anyone to split this commit to more than one patch.
The commit change-log should make it clear what this is about, and it doesn't.
If you believe in something, state it clearly, be precise.
As Saeed admitted, the shared code in the commit spans maybe 2% of it.
The 1st commit deals with a field which is not used in the driver; this
is a cleanup that you can do in an rc (net) patch (remove the field
altogether). And overall, without seeing the downstream patches that
depend on the newly introduced fields, how do you know there aren't such
(unused) bits in the 2nd commit?
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
From: Hariprasad Shenai @ 2016-04-12 5:29 UTC (permalink / raw)
To: bhelgaas, linux-pci, linux-kernel, davem, netdev
Cc: swise, leedom, santosh, kumaras, nirranjan
Hi All,
The following patch introduced a regression, causing the cxgb4 driver to fail
PCIe probe.
commit 104daa71b39614343929e1982170d5fcb0569bb5
Author: Hannes Reinecke <hare@suse.de>
Date: Mon Feb 15 09:42:01 2016 +0100
PCI: Determine actual VPD size on first access
PCI-2.2 VPD entries have a maximum size of 32k, but might actually be
smaller than that. To figure out the actual size one has to read the VPD
area until the 'end marker' is reached.
Per spec, reading outside of the VPD space is "not allowed." In practice,
it may cause simple read errors or even crash the card. To make matters
worse not every PCI card implements this properly, leaving us with no 'end'
marker or even completely invalid data.
Try to determine the size of the VPD data when it's first accessed. If no
valid data can be read an I/O error will be returned when reading or
writing the sysfs attribute.
As the amount of VPD data is unknown initially the size of the sysfs
attribute will always be set to '0'.
[bhelgaas: changelog, use 0/1 (not false/true) for bitfield, tweak
pci_vpd_pci22_read() error checking]
Tested-by: Shane Seymour <shane.seymour@hpe.com>
Tested-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
The problem stems from the fact that the Chelsio adapters actually have
two VPD structures stored in the VPD: an abbreviated one at Offset 0x0 and the
complete VPD at Offset 0x400. The abbreviated one only contains the PN, SN and
EC Keywords, while the complete VPD contains those plus various adapter
constants contained in V0, V1, etc. It also contains the Base Ethernet MAC
Address in the "NA" Keyword, which the cxgb4 driver needs when it can't contact
the adapter firmware. (We don't have the "NA" Keyword in the VPD Structure at
Offset 0x0 because that's not an allowed VPD Keyword in the PCI-E 3.0
specification.)
With the new code, the computed size of the VPD is 0x200, and so our efforts
to read the VPD at Offset 0x400 silently fail. We check the result of the
read looking for a signature 0x82 byte, but we're checking against random stack
garbage.
The end result is that the cxgb4 driver now fails the PCI-E Probe.
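To illustrate why a size derived from the first 'end marker' hides a second structure at a higher offset, here is a simplified userspace sketch of such a scan. It is not the kernel's code: the function name vpd_scan_size is hypothetical, and it only assumes the standard VPD resource layout (large resource tags have bit 7 set and carry a 2-byte little-endian length, small resource tags carry their length in the low 3 bits, and the End tag is 0x78).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_RESOURCE	0x80	/* bit 7 set: large resource tag */
#define VPD_SMALL_NAME_MASK	0x78	/* bits 6:3: small resource item name */
#define VPD_SMALL_NAME_END	0x78	/* item name 0xF: End tag */
#define VPD_SMALL_LEN_MASK	0x07	/* bits 2:0: small resource length */

/*
 * Walk the resource tags starting at offset 0 and return the number of
 * bytes up to and including the first End tag.  Returns 0 if the data
 * runs out before an End tag is seen.
 */
static size_t vpd_scan_size(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off < len) {
		uint8_t tag = buf[off];

		if (tag & VPD_LARGE_RESOURCE) {
			/* 1 tag byte + 2 length bytes + payload */
			if (off + 3 > len)
				return 0;
			off += 3 + (size_t)(buf[off + 1] | (buf[off + 2] << 8));
		} else {
			if ((tag & VPD_SMALL_NAME_MASK) == VPD_SMALL_NAME_END)
				return off + 1;
			/* 1 tag byte + payload */
			off += 1 + (size_t)(tag & VPD_SMALL_LEN_MASK);
		}
	}
	return 0;
}
```

Any read that is clamped to the value returned by such a scan can never reach data stored after the first End tag, which is the situation described above for the structure at Offset 0x400.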
Thanks,
Hari
^ permalink raw reply
* Re: [PATCH for-next 2/2] net/mlx5: Update mlx5_ifc hardware features
From: Leon Romanovsky @ 2016-04-12 5:15 UTC (permalink / raw)
To: Or Gerlitz
Cc: Saeed Mahameed, Saeed Mahameed, Matan Barak, Linux Netdev List,
linux-rdma@vger.kernel.org, David S. Miller, Doug Ledford,
Linus Torvalds, Or Gerlitz, Leon Romanovsky, Tal Alon
In-Reply-To: <CAJ3xEMhorMRsxtawjim9zoBn-zzoDXcMAJfHUqCZBNUbOKrMJA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1554 bytes --]
On Tue, Apr 12, 2016 at 12:37:34AM +0300, Or Gerlitz wrote:
> On Tue, Apr 12, 2016 at 12:24 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
> > On Tue, Apr 12, 2016 at 12:17 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> >> feature --> features
>
> > Correct, will fix.
>
> >>> * Add vport to steering commands for SRIOV ACL support
> >>> * Add mlcr, pcmr and mcia registers for dump module EEPROM
> >>> * Add support for FCS, baeacon led and disable_link bits to hca caps
> >>> * Add CQE period mode bit in CQ context for CQE based CQ
> >>> moderation support
> >>> * Add umr SQ bit for fragmented memory registration
> >>> * Add needed bits and caps for Striding RQ support
>
> >> AFAIK, all the above are features will go through net-next, what made
> >> you anticipate conflicts with linux-rdma?
>
> > FCS bit is needed also for rdma, so we took the liberty of updating
> > all the needed HW structs, bits, caps, etc ..
> > at once for all mlx5 features planned for 4.7 regardless of rdma/net conflicts.
>
> The cover letter states that this series deals with shared code.
>
> I guess you might also could extend it a bit to deal also with code
> that you suspect could lead to conflicts, but I don't see why it
> evolved to that extent.
Or,
All these micro-optimizations on this shared file can potentially lead
to undesired merge conflicts. Subsystem maintainers and Linus don't need to
deal with these conflicts at all.
It won't help to anyone to split this commit to more than one patch.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: roopa @ 2016-04-12 3:53 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jhs, davem, Nikolay Aleksandrov
In-Reply-To: <570A9B4D.80104@cumulusnetworks.com>
On 4/10/16, 11:28 AM, roopa wrote:
> On 4/10/16, 1:16 AM, Thomas Graf wrote:
[snip]
>>
>> This currently ties everything to a net_device with a selector to
>> include certain bits of that net_device. How about we take it half a
>> step further and allow for non net_device stats such as IP, TCP,
>> routing or ipsec stats to be retrieved as well?
> yes, absolutely. and that is also the goal.
>> A simple array of nested attributes replacing IFLA_STATS_* would
>> allow for that, e.g.
>>
>> 1. {.type = ST_IPSTATS, value = { ...} }
>>
>> 2. {.type = ST_LINK, .value = {
>> {.type = ST_LINK_NAME, .value = "eth0"},
>> {.type = ST_LINK_Q, .value = 10}
>> }}
>>
>> 3. ...
> One thing though, Its unclear to me if we absolutely need the additional nest.
> Every stats netlink msg has an ifindex in the header (if_stats_msg) if the scope
> of the stats is a netdev. If the msg does not have an ifindex in the if_stats_msg,
> it represents a global stat. ie Cant a dump, include other stats netlink msgs after
> all the netdev msgs are done when the filter has global stat filters ?.
> same will apply to RTM_GETSTATS (without NLM_F_DUMP).
>
> Since the msg may potentially have more nest levels
> in the IFLA_EXT_STATS categories, just trying to see if i can avoid adding another
> top-level nest. We can sure add it if there is no other way to include global
> stats in the same dump.
>
Just wanted to elaborate on what I was trying to say:
Top-level stats attributes can be netdev or global attributes. We can include the string "LINK" in
the names of all stats belonging to a netdev to make the netdev stats easier to recognize (example):
IFLA_STATS_LINK64, (netdev)
IFLA_STATS_LINK_INET6, (netdev)
IFLA_STATS_TCP, (non-netdev, global tcp stats)
RTM_GETSTATS (NLM_F_DUMP) with user given filter_mask {
If filter-mask contains any link stats, start with per netdev stats messages:
{ if_stats_msg.ifindex = 1,
if_stats_msg.filter_mask = mask of included link stats,
<stats attributes> }
{ if_stats_msg.ifindex = 2,
if_stats_msg.filter_mask = mask of included link stats,
<stats attributes> }
global stats (if user given filter mask contains global filters.):
{ if_stats_msg.ifindex = 0,
if_stats_msg.filter_mask = mask of included global stats,
<stats attributes> }
}
We will need a field in netlink_callback to indicate global or netdev stats when the stats
cross skb boundaries. A single nlmsg cannot have both netdev and global stats.
Non-dump RTM_GETSTATS examples:
RTM_GETSTATS with valid ifindex and filter_mask {
filter_mask cannot have global stats (return -EINVAL)
{ if_stats_msg.ifindex = <user_given_ifindex>,
if_stats_msg.filter_mask = mask of included link stats,
<stats attributes> }
}
RTM_GETSTATS with ifindex = 0 and filter_mask {
filter_mask cannot have link stats (return -EINVAL)
{ if_stats_msg.ifindex = 0,
if_stats_msg.filter_mask = mask of included global stats,
<stats attributes> }
}
Will this not work ?
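The non-dump rules above could be checked along these lines. This is only a sketch: the filter bit names and the function stats_check_request are hypothetical and do not come from the patch; only the two -EINVAL rules are taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical filter bits, one per top-level stats attribute. */
#define STATS_FILTER_LINK64	(1u << 0)	/* netdev */
#define STATS_FILTER_LINK_INET6	(1u << 1)	/* netdev */
#define STATS_FILTER_TCP	(1u << 2)	/* global */

#define STATS_LINK_MASK		(STATS_FILTER_LINK64 | STATS_FILTER_LINK_INET6)
#define STATS_GLOBAL_MASK	(STATS_FILTER_TCP)

/*
 * Validate a non-dump request:
 *  - ifindex > 0: only link stats may be requested
 *  - ifindex == 0: only global stats may be requested
 * Returns 0 if the combination is allowed, -EINVAL otherwise.
 */
static int stats_check_request(int ifindex, uint32_t filter_mask)
{
	assert(ifindex >= 0);
	if (ifindex > 0 && (filter_mask & STATS_GLOBAL_MASK))
		return -EINVAL;
	if (ifindex == 0 && (filter_mask & STATS_LINK_MASK))
		return -EINVAL;
	return 0;
}
```

Keeping link and global bits in disjoint masks is what lets a single nlmsg be classified as either netdev or global, never both.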
Thanks,
Roopa
^ permalink raw reply
* Re: [PATCH net-next 0/7] DSA refactoring: set 1
From: Vivien Didelot @ 2016-04-12 3:16 UTC (permalink / raw)
To: Andrew Lunn, David Miller; +Cc: Florian Fainelli, netdev, Andrew Lunn
In-Reply-To: <1460404209-32083-1-git-send-email-andrew@lunn.ch>
Andrew Lunn <andrew@lunn.ch> writes:
> There has been a long running effort to refractor DSA probing to make
> the switches true linux devices. Here are a small collection of
> patches moving in this direction. Most have been seen before.
>
> We take a little step forward by passing the dsa device point to the
> driver, thus allowing it to perform resource allocations using the
> normal mechanisms. This device structure will later be replaced by the
> devices own device structure.
>
> Future patches will add a true driver probe function, so we rename the
> current probe function, cleaning up the namespace.
>
> phys_port_mask continually confuses me, thinking it is about PHYs. But
> it is actually about ports to the outside world, user ports. So rename
> it.
>
> Lots more patches yet to follow, this is just doing some ground work.
>
> Andrew Lunn (7):
> net: dsa: Pass the dsa device to the switch drivers
> net: dsa: Have the switch driver allocate there own private memory
> net: dsa: Remove allocation of driver private memory
> net: dsa: Keep the mii bus and address in the private structure
> net: dsa: Rename DSA probe function.
> dsa: Rename phys_port_mask to user_port_mask
> dsa: mv88e6xxx: Use bus in mv88e6xxx_lookup_name()
>
> drivers/net/dsa/bcm_sf2.c | 24 +++++++++++++-------
> drivers/net/dsa/mv88e6060.c | 47 +++++++++++++++++++++++---------------
> drivers/net/dsa/mv88e6060.h | 11 +++++++++
> drivers/net/dsa/mv88e6123.c | 14 +++++++-----
> drivers/net/dsa/mv88e6131.c | 14 +++++++-----
> drivers/net/dsa/mv88e6171.c | 14 +++++++-----
> drivers/net/dsa/mv88e6352.c | 14 +++++++-----
> drivers/net/dsa/mv88e6xxx.c | 55 +++++++++++++++++++++++++++++++--------------
> drivers/net/dsa/mv88e6xxx.h | 17 +++++++++++---
> include/net/dsa.h | 16 ++++++++-----
> net/dsa/dsa.c | 19 +++++++++-------
> 11 files changed, 166 insertions(+), 79 deletions(-)
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
^ permalink raw reply
* Re: [PATCH net-next 7/7] dsa: mv88e6xxx: Use bus in mv88e6xxx_lookup_name()
From: Vivien Didelot @ 2016-04-12 3:15 UTC (permalink / raw)
To: Andrew Lunn, David Miller; +Cc: Florian Fainelli, netdev, Andrew Lunn
In-Reply-To: <1460404209-32083-8-git-send-email-andrew@lunn.ch>
Andrew Lunn <andrew@lunn.ch> writes:
> mv88e6xxx_lookup_name() returns the model name of a switch at a given
> address on an MII bus. Using mii_bus to identify the bus rather than
> the host device is more logical, so change the parameter.
>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
^ permalink raw reply
* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-12 2:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <1460376828.6473.538.camel@edumazet-glaptop3.roam.corp.google.com>
On 2016/4/11 20:13, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
>>
>> On 2016/4/8 22:44, Eric Dumazet wrote:
>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>
>>>> I expand tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>
>>> Try :
>>>
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> And restart your flows.
>>>
>> cat /proc/sys/net/ipv4/tcp_rmem
>> 10240 2097152 10485760
>
> What about leaving the default values ?
I tried, it did not work.
>
> $ cat /proc/sys/net/ipv4/tcp_rmem
> 4096 87380 6291456
>
>>
>> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>
>> It seems has not effect.
>>
>
> I have no idea what you did on the sender side to allow it to send more
> than 1.5 MB then.
We are doing a performance test. The sender sends 256KB blocks with 128
threads to one socket, and the receiver uses a 10Gb NIC to handle the
data on ARM64. The data flow is driver->ip layer->tcp layer->iscsi.
I added some debug messages and found that handling backlog packets in
__release_sock() costs about 11ms at most. This can cause backlog queue
overflow. sk_data_ready is re-assigned, so it may be costing time in our
program. I will check it out.
^ permalink raw reply