* [PATCH net-next 1/2] PCI: hv: support reporting serial number as slot information
From: Stephen Hemminger @ 2018-08-29 16:24 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev, linux-pci
In-Reply-To: <20180829162452.25805-1-sthemmin@microsoft.com>
The Hyper-V host API for PCI provides a unique "serial number" which
can be used as the basis for the sysfs PCI slot table. This is useful
for cases where userspace wants to find the PCI device based on its
serial number.
When an SR-IOV NIC is added, the host sends an attach message
with a serial number. The kernel doesn't use the serial number, but
it is useful when doing the same thing in a userspace driver such
as DPDK. Having /sys/bus/pci/slots/N provides a direct
way to find the matching PCI device.
There may be some cases where the serial number is not unique, such
as when using GPUs. But the PCI slot infrastructure will handle
that by adding a suffix ("2-1" etc.).
This has a side effect which may also be useful. The common udev
network device naming policy uses the slot information (rather
than PCI address). This causes udev to give shorter network device
names for VF devices. It does not break applications or startup
because the VF device must never be configured directly.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/pci/controller/pci-hyperv.c | 30 +++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index c00f82cc54aa..e6a6c1146a41 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -89,6 +89,8 @@ static enum pci_protocol_version_t pci_protocol_version;

 #define STATUS_REVISION_MISMATCH 0xC0000059

+#define SLOT_NAME_SIZE 21
+
 /*
  * Message Types
  */
@@ -494,6 +496,7 @@ struct hv_pci_dev {
 	struct list_head list_entry;
 	refcount_t refs;
 	enum hv_pcichild_state state;
+	struct pci_slot *pci_slot;
 	struct pci_function_description desc;
 	bool reported_missing;
 	struct hv_pcibus_device *hbus;
@@ -1457,6 +1460,28 @@ static void prepopulate_bars(struct hv_pcibus_device *hbus)
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 }

+static void hv_pci_assign_slots(struct hv_pcibus_device *hbus)
+{
+	struct hv_pci_dev *hpdev;
+	char name[SLOT_NAME_SIZE];
+	unsigned long flags;
+	int slot_nr;
+
+	spin_lock_irqsave(&hbus->device_list_lock, flags);
+	list_for_each_entry(hpdev, &hbus->children, list_entry) {
+		if (hpdev->pci_slot)
+			continue;
+
+		slot_nr = PCI_SLOT(wslot_to_devfn(hpdev->desc.win_slot.slot));
+		snprintf(name, SLOT_NAME_SIZE, "%u", hpdev->desc.ser);
+		hpdev->pci_slot = pci_create_slot(hbus->pci_bus, slot_nr,
+					  name, NULL);
+		if (!hpdev->pci_slot)
+			pr_warn("pci_create slot %s failed\n", name);
+	}
+	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
+}
+
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus:	Root PCI bus, as understood by this driver
@@ -1480,6 +1505,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus)
 	pci_lock_rescan_remove();
 	pci_scan_child_bus(hbus->pci_bus);
 	pci_bus_assign_resources(hbus->pci_bus);
+	hv_pci_assign_slots(hbus);
 	pci_bus_add_devices(hbus->pci_bus);
 	pci_unlock_rescan_remove();
 	hbus->state = hv_pcibus_installed;
@@ -1742,6 +1768,7 @@ static void pci_devices_present_work(struct work_struct *work)
 		 */
 		pci_lock_rescan_remove();
 		pci_scan_child_bus(hbus->pci_bus);
+		hv_pci_assign_slots(hbus);
 		pci_unlock_rescan_remove();
 		break;

@@ -1858,6 +1885,9 @@ static void hv_eject_device_work(struct work_struct *work)
 	list_del(&hpdev->list_entry);
 	spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);

+	if (hpdev->pci_slot)
+		pci_destroy_slot(hpdev->pci_slot);
+
 	memset(&ctxt, 0, sizeof(ctxt));
 	ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
 	ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
--
2.18.0
^ permalink raw reply related
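The userspace lookup that the patch above describes could be sketched roughly as follows. This is only an illustration, not code from the patch: the helper names slot_address_path() and read_slot_address() are made up here, and the sketch assumes the slot directory is named after the decimal serial number (as the snprintf "%u" in the patch suggests) and that the PCI core's per-slot "address" attribute is present. A colliding serial would show up under a suffixed name instead, which this sketch does not handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path of the "address" attribute for the slot named
 * after a Hyper-V serial number. Hypothetical helper, for illustration.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int slot_address_path(uint32_t serial, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/bus/pci/slots/%u/address",
			 (unsigned int)serial);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the PCI address string recorded for that slot into addr.
 * Returns 0 on success, -1 if the slot does not exist or cannot be read.
 */
int read_slot_address(uint32_t serial, char *addr, size_t len)
{
	char path[64];
	FILE *f;

	if (len == 0 || slot_address_path(serial, path, sizeof(path)) < 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(addr, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Strip the trailing newline that sysfs attributes carry. */
	addr[strcspn(addr, "\n")] = '\0';
	return 0;
}
```

A userspace driver would call read_slot_address() with the serial number it received from the host and then match the returned address against the entries under /sys/bus/pci/devices.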
* [PATCH net-next 0/2] hv_netvsc: associate VF and PV device by serial number
From: Stephen Hemminger @ 2018-08-29 16:24 UTC (permalink / raw)
To: kys, haiyangz, sthemmin; +Cc: devel, netdev, linux-pci
The Hyper-V implementation of the PCI controller has the concept of a 32-bit serial
number (not to be confused with the PCI-E serial number). This value is sent in the
protocol from the host to indicate that an SR-IOV VF device is attached to a synthetic NIC.
Using the serial number (instead of MAC address) to associate the two devices
avoids lots of potential problems when there are duplicate MAC addresses from
tunnels or layered devices.
The patch set is broken into two parts, one is for the PCI controller
and the other is for the netvsc device. Normally, these go through different
trees, but I am sending them together here for better review. The PCI changes
were submitted previously, but the main review comment was "why do you
need this?". This is why.
Stephen Hemminger (2):
PCI: hv: support reporting serial number as slot information
hv_netvsc: pair VF based on serial number
drivers/net/hyperv/netvsc.c | 3 ++
drivers/net/hyperv/netvsc_drv.c | 58 ++++++++++++++++-------------
drivers/pci/controller/pci-hyperv.c | 30 +++++++++++++++
3 files changed, 66 insertions(+), 25 deletions(-)
--
2.18.0
^ permalink raw reply
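To illustrate why the cover letter above prefers the serial number over the MAC address as the pairing key, here is a small sketch of a lookup by serial. It is not taken from the hv_netvsc patch: struct synth_nic and find_by_serial() are hypothetical names, and treating serial 0 as "no VF announced" is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for a synthetic NIC, for illustration only. */
struct synth_nic {
	const char *name;
	uint32_t vf_serial;	/* serial announced by the host; 0 assumed to mean none */
};

/*
 * Find the synthetic NIC that a newly attached VF belongs to by
 * comparing serial numbers. Unlike a MAC address comparison, the
 * result does not depend on tunnels or layered devices that may
 * carry the same MAC address.
 */
const struct synth_nic *find_by_serial(const struct synth_nic *nics,
				       size_t count, uint32_t serial)
{
	size_t i;

	if (serial == 0)
		return NULL;

	for (i = 0; i < count; i++)
		if (nics[i].vf_serial == serial)
			return &nics[i];

	return NULL;
}
```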
* Re: [PATCH bpf-next 00/11] AF_XDP zero-copy support for i40e
From: Daniel Borkmann @ 2018-08-29 16:12 UTC (permalink / raw)
To: Björn Töpel, magnus.karlsson, magnus.karlsson,
alexander.h.duyck, alexander.duyck, ast, brouer, netdev,
jesse.brandeburg, anjali.singhai, peter.waskiewicz.jr
Cc: Björn Töpel, michael.lundkvist, willemdebruijn.kernel,
john.fastabend, jakub.kicinski, neerav.parikh, mykyta.iziumtsev,
francois.ozog, ilias.apalodimas, brian.brooks, u9012063, pavel,
qi.z.zhang
In-Reply-To: <20180828124435.30578-1-bjorn.topel@gmail.com>
On 08/28/2018 02:44 PM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch set introduces zero-copy AF_XDP support for Intel's i40e
> driver. In the first preparatory patch we also add support for
> XDP_REDIRECT for zero-copy allocated frames so that XDP programs can
> redirect them. This was a ToDo from the first AF_XDP zero-copy patch
> set from early June. Special thanks to Alex Duyck and Jesper Dangaard
> Brouer for reviewing earlier versions of this patch set.
>
> The i40e zero-copy code is located in its own file i40e_xsk.[ch]. Note
> that in the interest of time, to get an AF_XDP zero-copy implementation
> out there for people to try, some code paths have been copied from the
> XDP path to the zero-copy path. It is our goal to merge the two paths
> in later patch sets.
>
> In contrast to the implementation from beginning of June, this patch
> set does not require any extra HW queues for AF_XDP zero-copy
> TX. Instead, the XDP TX HW queue is used for both XDP_REDIRECT and
> AF_XDP zero-copy TX.
>
> Jeff, given that most of changes are in i40e, it is up to you how you
> would like to route these patches. The set is tagged bpf-next, but
> if taking it via the Intel driver tree is easier, let us know.
>
> We have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for TX/RX and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
> NIC is Intel I40E 40Gbit/s using the i40e driver.
>
> Below are the results in Mpps of the I40E NIC benchmark runs for 64
> and 1500 byte packets, generated by a commercial packet generator HW
> outputting packets at full 40 Gbit/s line rate. The results are with
> retpoline and all other spectre and meltdown fixes, so these results
> are not comparable to the ones from the zero-copy patch set in June.
>
> AF_XDP performance 64 byte packets:
> Benchmark   XDP_SKB   XDP_DRV   XDP_DRV with zerocopy
> rxdrop      2.6       8.2       15.0
> txpush      2.2       -         21.9
> l2fwd       1.7       2.3       11.3
>
> AF_XDP performance 1500 byte packets:
> Benchmark   XDP_SKB   XDP_DRV   XDP_DRV with zerocopy
> rxdrop      2.0       3.3       3.3
> l2fwd       1.3       1.7       3.1
>
> XDP performance on our system as a base line:
>
> 64 byte packets:
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      16      18.4M       0
>
> 1500 byte packets:
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      16      3.3M        0
>
> The structure of the patch set is as follows:
>
> Patch 1: Add support for XDP_REDIRECT of zero-copy allocated frames
> Patches 2-4: Preparatory patches to common xsk and net code
> Patches 5-7: Preparatory patches to i40e driver code for RX
> Patch 8: i40e zero-copy support for RX
> Patch 9: Preparatory patch to i40e driver code for TX
> Patch 10: i40e zero-copy support for TX
> Patch 11: Add flags to sample application to force zero-copy/copy mode
>
> We based this patch set on bpf-next commit 050cdc6c9501 ("Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
>
>
> Magnus & Björn
>
> Björn Töpel (8):
> xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPY
> xdp: export xdp_rxq_info_unreg_mem_model
> xsk: expose xdp_umem_get_{data,dma} to drivers
> i40e: added queue pair disable/enable functions
> i40e: refactor Rx path for re-use
> i40e: move common Rx functions to i40e_txrx_common.h
> i40e: add AF_XDP zero-copy Rx support
> samples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock
>
> Magnus Karlsson (3):
> net: add napi_if_scheduled_mark_missed
> i40e: move common Tx functions to i40e_txrx_common.h
> i40e: add AF_XDP zero-copy Tx support
Thanks for working on this, LGTM! Are you also planning to get ixgbe
out after that?
For the series:
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Thanks,
Daniel
^ permalink raw reply
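For context on the Mpps figures quoted above, the theoretical packet rate of the link can be worked out with a few lines of arithmetic. The sketch below assumes standard Ethernet framing overhead of 20 bytes per packet on the wire (preamble, start-of-frame delimiter and inter-frame gap) and assumes the quoted packet sizes already include the frame check sequence; line_rate_mpps() is a name invented for this example.

```c
/*
 * Theoretical maximum packet rate in Mpps for an Ethernet link.
 * Every frame occupies an extra 20 bytes on the wire:
 * 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes
 * inter-frame gap.
 */
double line_rate_mpps(double link_gbps, unsigned int frame_bytes)
{
	const unsigned int wire_overhead = 20;
	double bits_per_frame = (frame_bytes + wire_overhead) * 8.0;

	return link_gbps * 1e9 / bits_per_frame / 1e6;
}
```

Under those assumptions a 40 Gbit/s link carries roughly 59.5 Mpps of 64-byte frames and roughly 3.29 Mpps of 1500-byte frames, which would put the 3.3 Mpps rxdrop results for 1500-byte packets at about line rate.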
* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
From: Alex Vesker @ 2018-08-29 15:42 UTC (permalink / raw)
To: Alexander Duyck, Saeed Mahameed
Cc: Saeed Mahameed, David S. Miller, Netdev, Jiri Pirko,
Jakub Kicinski, Bjorn Helgaas
In-Reply-To: <2d84340e-0703-0bc7-4917-3b18979b2aa5@mellanox.com>
> On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed <saeedm@mellanox.com>
>>> wrote:
>>>> Hi Dave,
>>>>
>>>> This series provides devlink parameters updates to both devlink API and
>>>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
>>>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
>>>> Devlink" to address review comments [1].
>>>>
>>>> Changes from the original series:
>>>>     - According to the discussion outcome, we are keeping the congestion control
>>>>       setting as mlx5 device specific for the current HW generation.
>>>>     - Changed the congestion_mode and congestion action param type to string
>>>>     - Added patches to fix devlink handling of param type string
>>>>     - Added a patch which adds extack messages support for param set.
>>>>     - At the end of this series, I've added yet another mlx5 devlink related
>>>>       feature, firmware snapshot support.
>>>>
>>>> For more information please see tag log below.
>>>>
>>>> Please pull and let me know if there's any problem.
>>>>
>>>> [1] https://patchwork.ozlabs.org/patch/945996/
>>>>
>>>> Thanks,
>>>> Saeed.
>>>>
>>>> ---
>>>>
>>>> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>>>>
>>>>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01
>>>>
>>>> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>>>>
>>>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)
>>>>
>>>> ----------------------------------------------------------------
>>>> mlx5-updates-2018-08-01
>>>>
>>>> This series provides devlink parameters updates to both devlink API and
>>>> mlx5 driver,
>>>>
>>>> 1) Devlink changes: (Moshe Shemesh)
>>>> The first two patches fix devlink param infrastructure for string type
>>>> params.
>>>> The third patch adds a devlink helper function to safely copy string from
>>>> driver to devlink.
>>>> The fourth patch adds extack support for param set.
>>>>
>>>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
>>>> Next three patches add new devlink driver specific params for controlling
>>>> congestion action and mode, using string type params and extack
>>>> messages support.
>>>>
>>>> This congestion mode enables hw workaround in specific devices which is
>>>> controlled by devlink driver-specific params. The workaround is device
>>>> specific for this NIC generation, the next NIC will not need it.
>>>>
>>>> Congestion parameters:
>>>>  - Congestion action
>>>>       HW W/A mechanism in the PCIe buffer which monitors the amount of
>>>>       consumed PCIe buffer per host. This mechanism supports the
>>>>       following actions in case of threshold overflow:
>>>>         - Disabled - NOP (Default)
>>>>         - Drop
>>>>         - Mark - Mark CE bit in the CQE of received packet
>>>>  - Congestion mode
>>>>         - Aggressive - Aggressive static trigger threshold (Default)
>>>>         - Dynamic - Dynamically change the trigger threshold
>>>>
>>>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
>>>> Last three patches, add the support for capturing region snapshot of the
>>>> firmware crspace during critical errors, using devlink region_snapshot
>>>> parameter.
>>>>
>>>> -Saeed.
>>>>
>>>> ----------------------------------------------------------------
>>>> Alex Vesker (3):
>>>> net/mlx5: Add Vendor Specific Capability access gateway
>>>> net/mlx5: Add Crdump FW snapshot support
>>>> net/mlx5: Use devlink region_snapshot parameter
>>>>
>>>> Eran Ben Elisha (3):
>>>> net/mlx5: Move all devlink related functions calls to devlink.c
>>>> net/mlx5: Add MPEGC register configuration functionality
>>>> net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
>>>>
>>>> Moshe Shemesh (4):
>>>> devlink: Fix param set handling for string type
>>>> devlink: Fix param cmode driverinit for string type
>>>> devlink: Add helper function for safely copy string param
>>>> devlink: Add extack messages support to param set
>>>>
>>>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>>>>  drivers/net/ethernet/mellanox/mlx4/main.c          |   6 +-
>>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +++++++++++++++++++++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>>>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 ++++++++++++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>>>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +++++++++++++++++
>>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |  10 +-
>>>>  include/linux/mlx5/driver.h                        |   5 +
>>>>  include/net/devlink.h                              |  15 +-
>>>>  net/core/devlink.c                                 |  44 ++-
>>>>  14 files changed, 1076 insertions(+), 17 deletions(-)
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>>>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h
>>>
>>> So after looking over the patch set the one thing I would ask for in
>>> this is some sort of documentation at a minimum. As a user I don't see
>>> how you can expect someone to be able to use this when the naming of
>>> things are pretty cryptic and there is no real explanation anywhere if
>>> you don't go through and read the patch description itself. When you
>>> start adding driver specific interfaces, you should at least start
>>> adding vendor specific documentation.
>>>
>>
>> Sure, sounds like a great idea, something like:
>> Documentation/networking/mlx5.txt and have a devlink section ?
>> or have a generic devlink doc and a mlx5 section in it ?
>
> Either would work for me.
>
For which patches are you missing documentation?
>>> Also I don't see how using a vendor specific configuration space
>>> section can be done without adding some tie-ins to the PCI core files
>>> because it should be possible to race with someone poking at the
>>> register space via something like setpci/lspci. Also one of the things
>>> that came up was that drivers are not supposed to be banging on the
>>> PCI configuration space at will, and it seems like this patch set is
>>> doing exactly that through the VSC block.
>>>
>>
>> this is a whole different feature than the device specific parameters.
>> The whole vendor specific configuration space access is needed only for
>> diagnostic/dump purposes when something really bad happens and the command
>> interface with FW is down, and when the FW is un-responsive, we want to
>> dump the crspace into the already existing devlink crdump buffer, how do
>> you expect us to read it if we are not allowed to access it ?
>>
>> What do you mean by tie-ins to the PCI core files ? can you please
>> elaborate ?
>
> You have added a vendor specific config section and you are using it
> to access several of the pieces of metadata. The setup isn't too
> different than the VPD setup and approach. However I don't see many of
> the protections that exist for VPD in place for this vendor specific
> configuration. As such I have concerns. For example what is to keep
> requests to the various devlink interfaces from racing with each other
> when they both end up operating through the VSC?
>
> - Alex
Hi, I would like to resubmit the devlink region crdump support for mlx5,
which is part of this patch-set.
AlexD, regarding the protection: the various devlink interfaces cannot race
with each other, since devlink_mutex is used. The VSC access is also
protected by mlx5_vsc_gw_lock/unlock, so only one caller can acquire the
lock at a time.
Having explained this, I want to clarify one more thing: access to the VSC
is not user driven. It happens automatically in the driver when an error is
detected, in order to collect a crdump.
^ permalink raw reply
* Re: [Patch iproute2] ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets
From: Stephen Hemminger @ 2018-08-29 15:41 UTC (permalink / raw)
To: Cong Wang; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAM_iQpWod288CWHmDzzVkrdivjQtTBY+ES4hErMRkqY4=r6TfQ@mail.gmail.com>
On Tue, 28 Aug 2018 16:16:49 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Aug 27, 2018 at 3:27 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 27 Aug 2018 14:46:52 -0700
> > Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > > UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss,
> > > make them available in ss -e output.
> > >
> > > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > > ---
> > > misc/ss.c | 25 +++++++++++++++++++++++++
> > > 1 file changed, 25 insertions(+)
> > >
> > > diff --git a/misc/ss.c b/misc/ss.c
> > > index 41e7762b..d28bc1ec 100644
> > > --- a/misc/ss.c
> > > +++ b/misc/ss.c
> > > @@ -16,6 +16,7 @@
> > > #include <sys/ioctl.h>
> > > #include <sys/socket.h>
> > > #include <sys/uio.h>
> > > +#include <sys/sysmacros.h>
> >
> > Why is this included, it isn't on my system.
>
> It is for major() and minor().
Ok on Debian, these are in the architecture include, so this will work fine.
^ permalink raw reply
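As background to the include discussed above, a minimal sketch of how major() and minor() from <sys/sysmacros.h> split a device number on glibc systems might look like this. It is not the code from the ss patch, and format_dev() is a name invented for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Format a device number as "major:minor" using the glibc macros.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_dev(dev_t dev, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%u:%u",
			 (unsigned int)major(dev), (unsigned int)minor(dev));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

On glibc the macros are declared in <sys/sysmacros.h>; older releases also exposed them through <sys/types.h>, which may explain why the explicit include is not needed everywhere.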
* Re: [bpf-next PATCH 0/2] bpf: test_sockmap updates
From: Daniel Borkmann @ 2018-08-29 15:38 UTC (permalink / raw)
To: John Fastabend, alexei.starovoitov; +Cc: netdev
In-Reply-To: <20180828160921.24004.71893.stgit@john-Precision-Tower-5810>
On 08/28/2018 06:10 PM, John Fastabend wrote:
> Two small test sockmap updates for bpf-next. These help me run some
> additional tests with test_sockmap.
Applied to bpf-next, thanks!
^ permalink raw reply
* Re: [PATCH bpf-next] bpf: remove duplicated include from syscall.c
From: Daniel Borkmann @ 2018-08-29 15:37 UTC (permalink / raw)
To: YueHaibing, Alexei Starovoitov; +Cc: netdev, kernel-janitors
In-Reply-To: <1535442152-5021-1-git-send-email-yuehaibing@huawei.com>
On 08/28/2018 09:42 AM, YueHaibing wrote:
> Remove duplicated include.
>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Applied to bpf-next, thanks!
^ permalink raw reply
* Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute
From: Marcel Holtmann @ 2018-08-29 15:31 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Jiri Pirko, netdev, David S. Miller
In-Reply-To: <20180829082428.0da2d748@xeon-e3>
Hi Stephen,
>>> The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
>>> file as DEVTYPE= information. To avoid any kind of race conditions
>>> between netlink messages and reading from sysfs, it is useful to add the
>>> same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
>>> messages.
>>>
>>> For network managing daemons that have to classify ARPHRD_ETHER network
>>> devices into different types (like Wireless LAN, Bluetooth etc.), this
>>> avoids the extra round trip to sysfs and parsing of the uevent file.
>>>
>>> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
>>> ---
>>> include/uapi/linux/if_link.h | 2 ++
>>> net/core/rtnetlink.c | 12 ++++++++++++
>>> 2 files changed, 14 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 43391e2d1153..781294972bb4 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -166,6 +166,8 @@ enum {
>>> IFLA_NEW_IFINDEX,
>>> IFLA_MIN_MTU,
>>> IFLA_MAX_MTU,
>>> + IFLA_DEVTYPE, /* Name value from SET_NETDEV_DEVTYPE */
>>
>> This is not something netdev-related. dev->dev.type is struct device_type.
>> This is a generic "device" thing. Incorrect to expose over
>> netdev-specific API. Please use "device" API for this.
>
> There is no device API in netlink. The whole point of this patch is to
> do it in one message. It might be a performance optimization, but I can't
> see how it could be a race condition. Devices set type before registering.
The only way right now to pick up the DEVTYPE= value is from the /sys/class/net/*/uevent file. That is based on the ifname and not the index. When udev + systemd start renaming things behind your back, your daemon does not have a clean one-shot way of getting that information. As stated, the information in DEVTYPE= is a sub-classification of ARPHRD_ETHER and allows differentiating a wired Ethernet card from a WiFi interface, from a Bluetooth interface, from WiMAX and so on. It just happens to be in the dev.dev_type data structure at the moment and I didn't want to duplicate that information.
I am actually fine doing this via IFLA_INFO_KIND, since NetworkManager seems to be fixed now, or doing this via a different method, or maybe just a different attribute name. I really just want to get the sub-classification of ARPHRD_ETHER that we need in userspace networking daemons from the kernel without having to go poke left and right over sysfs or interact with udev or systemd.
Regards
Marcel
^ permalink raw reply
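The sysfs round trip that the message above wants to avoid amounts to reading /sys/class/net/<ifname>/uevent and picking out the DEVTYPE= line. A rough sketch of that parsing step follows. The function name uevent_devtype() is invented here, and the sample contents used to exercise it are an assumed example of what such a file may contain, not output captured from a real system.

```c
#include <stddef.h>
#include <string.h>

/*
 * Extract the value of the DEVTYPE= line from the text of a uevent
 * file. Returns 0 and fills out on success, -1 if there is no DEVTYPE
 * line (as would be expected for a plain wired Ethernet device) or if
 * the value does not fit into out.
 */
int uevent_devtype(const char *uevent, char *out, size_t len)
{
	static const char key[] = "DEVTYPE=";
	const size_t klen = sizeof(key) - 1;
	const char *line = uevent;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t n = eol ? (size_t)(eol - line) : strlen(line);

		if (n >= klen && strncmp(line, key, klen) == 0) {
			size_t vlen = n - klen;

			if (vlen >= len)
				return -1;
			memcpy(out, line + klen, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

Because the file is looked up by interface name, a rename between receiving the netlink message and opening the file can make this lookup hit the wrong device or fail, which is the concern the message raises.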
* Re: Kernel Panic on high bandwidth transfer over wifi
From: Eric Dumazet @ 2018-08-29 15:29 UTC (permalink / raw)
To: Nathaniel Munk, netdev@vger.kernel.org
In-Reply-To: <porA6Oc9uVOmzI_nV2MhBU5RzBhBCq82gffuA1TBkrQxwTToIy03qc44DkN8BmdVKgQAGU6qDyrkCx7sDQI5SbSt9b9-HHf9ufHcIQlT-kg=@munk.com.au>
On 08/29/2018 04:42 AM, Nathaniel Munk wrote:
> Hi all,
> I'm running Arch Linux on kernel 4.18.5 (same issue on both arch-provided kernel and mainline built-from-source). There is an issue whereby the kernel crashes when transferring at high bandwidths (approx 6mB/s) over a specific wifi connection. I can only reproduce the issue when using the Personal Hotspot on my iPhone 6S+, but can reproduce it very consistently on that connection.
>
> More often than not, any download reaching this speed will cause a panic, but if the download is immediately terminated at the first error the system can recover (and doing this I have obtained the attached logs). Unfortunately, I have not had access to a second machine to obtain the netconsole printout of the panic.
>
> As above, high-bandwidth transfers on other wifi networks do not cause the issue (nor on ethernet connections).
>
> As you can see from the attached log, the issue appears at tcp_recvmsg+0x579 and net_tx_action+0x1fe. At both these positions (net/ipv4/tcp.c:2000 and net/core/dev.c:4279 in mainline 4.18.5), a member of the skb struct is called.
>
> Thank you for your time (and I apologize if this is spurious or badly worded, this is my first bug report), and please don't hesitate to let me know if there's anything else I can do to help work this out.
>
> Regards,
> -------------------
> Nathaniel Munk
> nathaniel@munk.com.au
>
Unfortunately there is no attached log ;)
^ permalink raw reply
* Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute
From: Stephen Hemminger @ 2018-08-29 15:24 UTC (permalink / raw)
To: Jiri Pirko; +Cc: Marcel Holtmann, netdev, davem
In-Reply-To: <20180829071855.GB2181@nanopsycho>
On Wed, 29 Aug 2018 09:18:55 +0200
Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Aug 28, 2018 at 10:58:11PM CEST, marcel@holtmann.org wrote:
> >The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
> >file as DEVTYPE= information. To avoid any kind of race conditions
> >between netlink messages and reading from sysfs, it is useful to add the
> >same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
> >messages.
> >
> >For network managing daemons that have to classify ARPHRD_ETHER network
> >devices into different types (like Wireless LAN, Bluetooth etc.), this
> >avoids the extra round trip to sysfs and parsing of the uevent file.
> >
> >Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
> >---
> > include/uapi/linux/if_link.h | 2 ++
> > net/core/rtnetlink.c | 12 ++++++++++++
> > 2 files changed, 14 insertions(+)
> >
> >diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> >index 43391e2d1153..781294972bb4 100644
> >--- a/include/uapi/linux/if_link.h
> >+++ b/include/uapi/linux/if_link.h
> >@@ -166,6 +166,8 @@ enum {
> > IFLA_NEW_IFINDEX,
> > IFLA_MIN_MTU,
> > IFLA_MAX_MTU,
> >+ IFLA_DEVTYPE, /* Name value from SET_NETDEV_DEVTYPE */
>
> This is not something netdev-related. dev->dev.type is struct device_type.
> This is a generic "device" thing. Incorrect to expose over
> netdev-specific API. Please use "device" API for this.
There is no device API in netlink. The whole point of this patch is to
do it in one message. It might be a performance optimization, but I can't
see how it could be a race condition. Devices set type before registering.
^ permalink raw reply
* Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute
From: Marcel Holtmann @ 2018-08-29 15:23 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, David S. Miller
In-Reply-To: <20180829071855.GB2181@nanopsycho>
Hi Jiri,
>> The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
>> file as DEVTYPE= information. To avoid any kind of race conditions
>> between netlink messages and reading from sysfs, it is useful to add the
>> same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
>> messages.
>>
>> For network managing daemons that have to classify ARPHRD_ETHER network
>> devices into different types (like Wireless LAN, Bluetooth etc.), this
>> avoids the extra round trip to sysfs and parsing of the uevent file.
>>
>> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
>> ---
>> include/uapi/linux/if_link.h | 2 ++
>> net/core/rtnetlink.c | 12 ++++++++++++
>> 2 files changed, 14 insertions(+)
>>
>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> index 43391e2d1153..781294972bb4 100644
>> --- a/include/uapi/linux/if_link.h
>> +++ b/include/uapi/linux/if_link.h
>> @@ -166,6 +166,8 @@ enum {
>> IFLA_NEW_IFINDEX,
>> IFLA_MIN_MTU,
>> IFLA_MAX_MTU,
>> + IFLA_DEVTYPE, /* Name value from SET_NETDEV_DEVTYPE */
>
> This is not something netdev-related. dev->dev.type is struct device_type.
> This is a generic "device" thing. Incorrect to expose over
> netdev-specific API. Please use "device" API for this.
It is not just "device" related, since this is a sub-classification of the ARPHRD_ETHER type, as I wrote above. Don't get hung up on the fact that this information is part of struct device. The dev->dev.type contains strings like "wlan", "bluetooth", "wimax", "gadget" etc. so that you can tell what kind of Ethernet card you have.
We can revive the patches for adding this information as IFLA_INFO_KIND, but last time they were reverted since NetworkManager made all the wrong decisions based on that. That part is fixed now and we can put that back and declare the sub-type classification of Ethernet devices in two places. Or we just take the information that we added a long time ago for exactly this sub-classification and provide it to userspace via RTNL.
Regards
Marcel
^ permalink raw reply
* Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL
From: Cong Wang @ 2018-08-29 19:07 UTC (permalink / raw)
To: Al Viro
Cc: Jamal Hadi Salim, Kees Cook, LKML, Jiri Pirko, David Miller,
Linux Kernel Network Developers
In-Reply-To: <20180828000310.GE6515@ZenIV.linux.org.uk>
On Mon, Aug 27, 2018 at 5:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > > I cant think of any challenges. Cong/Jiri? Would it require development
> > > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > > dont want them in include/net).
> > > BTW, the idea of improving grep-ability of the code by prefixing the
> > > ops appropriately makes sense. i.e we should have ops->cls_init,
> > > ops->act_init etc.
> >
> > Hmm? Isn't struct tcf_proto_ops used and must be provided
> > by each tc filter module? How does it work if you move it into
> > net/sched/* for out-of-tree modules? Are they supposed to
> > include "..../net/sched/tcf_proto.h"?? Or something else?
>
> If you care about out-of-tree modules, that could easily live in
> include/net/tcf_proto.h, provided that it's not pulled by indirect
> includes into hell knows how many places. Try
> make allmodconfig
> make >/dev/null 2>&1
> find -name '.*.cmd'|xargs grep sch_generic.h
>
> That finds 2977 files here, most of them having nothing to do with
> net/sched.
Moving it to include/net/tcf_proto.h is fine, as out-of-tree modules
can still compile by modifying the included header path.
include/net/pkt_cls.h might be a choice here too.
* mlx5 driver loading failing on v4.19 / net-next / bpf-next
From: Jesper Dangaard Brouer @ 2018-08-29 15:05 UTC (permalink / raw)
To: Saeed Mahameed, netdev@vger.kernel.org
Cc: brouer, Tariq Toukan, Eran Ben Elisha
Hi Saeed,
I'm having issues loading the mlx5 driver on v4.19 kernels (tested both
net-next and bpf-next), while kernel v4.18 seems to work. It happens
with a Mellanox ConnectX-5 NIC (and also a CX4-Lx, but I removed that
from the system now).
One pain point is the very long boot time, caused by some timeout code in
the driver. The kernel console log (dmesg) says:
[ 5.763330] mlx5_core 0000:03:00.0: firmware version: 16.22.1002
[ 5.769367] mlx5_core 0000:03:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8 GT/s x16 link at 0000:00:02.0 (capable of 252.048 Gb/s with 16 GT/s x16 link)
(...) other drivers loading
[ 66.816635] mlx5_core 0000:03:00.0: wait_func:964:(pid 112): ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
[ 66.828123] mlx5_core 0000:03:00.0: enable hca failed
[ 66.845516] mlx5_core 0000:03:00.0: mlx5_load_one failed with error code -110
[ 66.852802] mlx5_core: probe of 0000:03:00.0 failed with error -110
[ 66.859347] mlx5_core 0000:03:00.1: firmware version: 16.22.1002
[ 66.865388] mlx5_core 0000:03:00.1: 126.016 Gb/s available PCIe bandwidth, limited by 8 GT/s x16 link at 0000:00:02.0 (capable of 252.048 Gb/s with 16 GT/s x16 link)
[ 125.787395] XFS (sda3): Mounting V5 Filesystem
[ 125.848509] XFS (sda3): Ending clean mount
[ 127.984784] mlx5_core 0000:03:00.1: wait_func:964:(pid 5): ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
[ 127.996090] mlx5_core 0000:03:00.1: enable hca failed
[ 128.013819] mlx5_core 0000:03:00.1: mlx5_load_one failed with error code -110
[ 128.021076] mlx5_core: probe of 0000:03:00.1 failed with error -110
Do you have any idea what could be causing this?
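Note on the log above: the -110 is a negated errno value. On Linux architectures that use the generic errno numbering, 110 is ETIMEDOUT, which is consistent with the ENABLE_HCA timeout message. A minimal sketch for decoding such codes in userspace follows; kernel_err_str is a hypothetical helper name used only for illustration, not part of the mlx5 driver.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): kernel log messages report
 * failures as negated errno values, e.g. "failed with error -110".
 * Flip the sign and let the C library describe the code.
 */
const char *kernel_err_str(int ret)
{
	return strerror(ret < 0 ? -ret : ret);
}
```

Calling kernel_err_str(-110) returns the C library's description of ETIMEDOUT (on glibc this typically reads "Connection timed out"), assuming the generic Linux errno numbering.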
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH v2 iproute2-next 2/5] bridge: colorize output and use JSON print library
From: Stephen Hemminger @ 2018-08-29 15:04 UTC (permalink / raw)
To: Roopa Prabhu; +Cc: netdev, Stephen Hemminger, Julien Fortin, David Ahern
In-Reply-To: <CAJieiUi3BZPmaEJP6AK14zUwo20ssTNMz6eUfjFS0ZExizK9ng@mail.gmail.com>
On Sat, 14 Jul 2018 18:41:03 -0700
Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On Tue, Feb 20, 2018 at 11:24 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > From: Stephen Hemminger <sthemmin@microsoft.com>
> >
> > Use new functions from json_print to simplify code.
> > Provide standard flag for colorizing output.
> >
> > The shortened -c flag is ambiguous it could mean color or
> > compressvlan; it is now changed to mean color for consistency
> > with other iproute2 commands.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > bridge/br_common.h | 2 +-
> > bridge/bridge.c | 10 +-
> > bridge/fdb.c | 281 +++++++++++++++--------------------------
> > bridge/mdb.c | 362 ++++++++++++++++++++++-------------------------------
> > bridge/vlan.c | 276 +++++++++++++++-------------------------
> > 5 files changed, 363 insertions(+), 568 deletions(-)
> >
> > diff --git a/bridge/br_common.h b/bridge/br_common.h
> > index b25f61e50e05..2f1cb8fd9f3d 100644
> > --- a/bridge/br_common.h
> > +++ b/bridge/br_common.h
> > @@ -6,7 +6,7 @@
> > #define MDB_RTR_RTA(r) \
> > ((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32))))
> >
> > -extern void print_vlan_info(FILE *fp, struct rtattr *tb, int ifindex);
> > +extern void print_vlan_info(FILE *fp, struct rtattr *tb);
> > extern int print_linkinfo(const struct sockaddr_nl *who,
> > struct nlmsghdr *n,
> > void *arg);
> > diff --git a/bridge/bridge.c b/bridge/bridge.c
> > index 4b112e3b8da9..e5b4c3c2198f 100644
> > --- a/bridge/bridge.c
> > +++ b/bridge/bridge.c
> > @@ -16,12 +16,15 @@
> > #include "utils.h"
> > #include "br_common.h"
> > #include "namespace.h"
> > +#include "color.h"
> >
> > struct rtnl_handle rth = { .fd = -1 };
> > int preferred_family = AF_UNSPEC;
> > int oneline;
> > int show_stats;
> > int show_details;
> > +int show_pretty;
> > +int color;
> > int compress_vlans;
> > int json;
> > int timestamp;
> > @@ -39,7 +42,7 @@ static void usage(void)
> > "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
> > " OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
> > " -o[neline] | -t[imestamp] | -n[etns] name |\n"
> > -" -c[ompressvlans] -p[retty] -j{son} }\n");
> > +" -c[ompressvlans] -color -p[retty] -j{son} }\n");
> > exit(-1);
> > }
> >
> > @@ -170,6 +173,8 @@ main(int argc, char **argv)
> > NEXT_ARG();
> > if (netns_switch(argv[1]))
> > exit(-1);
> > + } else if (matches(opt, "-color") == 0) {
> > + enable_color();
> > } else if (matches(opt, "-compressvlans") == 0) {
> > ++compress_vlans;
> > } else if (matches(opt, "-force") == 0) {
> > @@ -195,6 +200,9 @@ main(int argc, char **argv)
> >
> > _SL_ = oneline ? "\\" : "\n";
> >
> > + if (json)
> > + check_if_color_enabled();
> > +
> > if (batch_file)
> > return batch(batch_file);
> >
> > diff --git a/bridge/fdb.c b/bridge/fdb.c
> > index 93b5b2e694e3..b4f6e8b3a01b 100644
> > --- a/bridge/fdb.c
> > +++ b/bridge/fdb.c
> > @@ -22,9 +22,9 @@
> > #include <linux/neighbour.h>
> > #include <string.h>
> > #include <limits.h>
> > -#include <json_writer.h>
> > #include <stdbool.h>
> >
> > +#include "json_print.h"
> > #include "libnetlink.h"
> > #include "br_common.h"
> > #include "rt_names.h"
> > @@ -32,8 +32,6 @@
> >
> > static unsigned int filter_index, filter_vlan, filter_state;
> >
> > -json_writer_t *jw_global;
> > -
> > static void usage(void)
> > {
> > fprintf(stderr,
> > @@ -83,13 +81,46 @@ static int state_a2n(unsigned int *s, const char *arg)
> > return 0;
> > }
> >
> > -static void start_json_fdb_flags_array(bool *fdb_flags)
> > +static void fdb_print_flags(FILE *fp, unsigned int flags)
> > +{
> > + open_json_array(PRINT_JSON,
> > + is_json_context() ? "flags" : "");
> > +
> > + if (flags & NTF_SELF)
> > + print_string(PRINT_ANY, NULL, "%s ", "self");
> > +
> > + if (flags & NTF_ROUTER)
> > + print_string(PRINT_ANY, NULL, "%s ", "router");
> > +
> > + if (flags & NTF_EXT_LEARNED)
> > + print_string(PRINT_ANY, NULL, "%s ", "extern_learn");
> > +
> > + if (flags & NTF_OFFLOADED)
> > + print_string(PRINT_ANY, NULL, "%s ", "offload");
> > +
> > + if (flags & NTF_MASTER)
> > + print_string(PRINT_ANY, NULL, "%s ", "master");
> > +
> > + close_json_array(PRINT_JSON, NULL);
> > +}
> > +
> > +static void fdb_print_stats(FILE *fp, const struct nda_cacheinfo *ci)
> > {
> > - if (*fdb_flags)
> > - return;
> > - jsonw_name(jw_global, "flags");
> > - jsonw_start_array(jw_global);
> > - *fdb_flags = true;
> > + static int hz;
> > +
> > + if (!hz)
> > + hz = get_user_hz();
> > +
> > + if (is_json_context()) {
> > + print_uint(PRINT_JSON, "used", NULL,
> > + ci->ndm_used / hz);
> > + print_uint(PRINT_JSON, "updated", NULL,
> > + ci->ndm_updated / hz);
> > + } else {
> > + fprintf(fp, "used %d/%d ", ci->ndm_used / hz,
> > + ci->ndm_updated / hz);
> > +
> > + }
> > }
> >
> > int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> > @@ -99,8 +130,6 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> > int len = n->nlmsg_len;
> > struct rtattr *tb[NDA_MAX+1];
> > __u16 vid = 0;
> > - bool fdb_flags = false;
> > - const char *state_s;
> >
> > if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
> > fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
> > @@ -132,189 +161,98 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> > if (filter_vlan && filter_vlan != vid)
> > return 0;
> >
> > - if (jw_global)
> > - jsonw_start_object(jw_global);
> > -
> > - if (n->nlmsg_type == RTM_DELNEIGH) {
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "opCode", "deleted");
> > - else
> > - fprintf(fp, "Deleted ");
> > - }
> > + open_json_object(NULL);
> > + if (n->nlmsg_type == RTM_DELNEIGH)
> > + print_bool(PRINT_ANY, "deleted", "Deleted ", true);
> >
> > if (tb[NDA_LLADDR]) {
> > + const char *lladdr;
> > SPRINT_BUF(b1);
> > - ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
> > - RTA_PAYLOAD(tb[NDA_LLADDR]),
> > - ll_index_to_type(r->ndm_ifindex),
> > - b1, sizeof(b1));
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "mac", b1);
> > - else
> > - fprintf(fp, "%s ", b1);
> > +
> > + lladdr = ll_addr_n2a(RTA_DATA(tb[NDA_LLADDR]),
> > + RTA_PAYLOAD(tb[NDA_LLADDR]),
> > + ll_index_to_type(r->ndm_ifindex),
> > + b1, sizeof(b1));
> > +
> > + print_color_string(PRINT_ANY, COLOR_MAC,
> > + "mac", "%s ", lladdr);
> > }
> >
> > if (!filter_index && r->ndm_ifindex) {
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "dev",
> > - ll_index_to_name(r->ndm_ifindex));
> > - else
> > - fprintf(fp, "dev %s ",
> > - ll_index_to_name(r->ndm_ifindex));
> > + if (!is_json_context())
> > + fprintf(fp, "dev ");
> > + print_color_string(PRINT_ANY, COLOR_IFNAME,
> > + "ifname", "%s ",
> > + ll_index_to_name(r->ndm_ifindex));
> > }
> >
> > if (tb[NDA_DST]) {
> > int family = AF_INET;
> > - const char *abuf_s;
> > + const char *dst;
> >
> > if (RTA_PAYLOAD(tb[NDA_DST]) == sizeof(struct in6_addr))
> > family = AF_INET6;
> >
> > - abuf_s = format_host(family,
> > - RTA_PAYLOAD(tb[NDA_DST]),
> > - RTA_DATA(tb[NDA_DST]));
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "dst", abuf_s);
> > - else
> > - fprintf(fp, "dst %s ", abuf_s);
> > - }
> > + dst = format_host(family,
> > + RTA_PAYLOAD(tb[NDA_DST]),
> > + RTA_DATA(tb[NDA_DST]));
> >
> > - if (vid) {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "vlan", vid);
> > - else
> > - fprintf(fp, "vlan %hu ", vid);
> > + print_color_string(PRINT_ANY,
> > + ifa_family_color(family),
> > + "dst", "%s ", dst);
> > }
> >
> > - if (tb[NDA_PORT]) {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "port",
> > - rta_getattr_be16(tb[NDA_PORT]));
> > - else
> > - fprintf(fp, "port %d ",
> > - rta_getattr_be16(tb[NDA_PORT]));
> > - }
> > + if (vid)
> > + print_uint(PRINT_ANY,
> > + "vlan", "vlan %hu ", vid);
> >
> > - if (tb[NDA_VNI]) {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "vni",
> > - rta_getattr_u32(tb[NDA_VNI]));
> > - else
> > - fprintf(fp, "vni %d ",
> > - rta_getattr_u32(tb[NDA_VNI]));
> > - }
> > + if (tb[NDA_PORT])
> > + print_uint(PRINT_ANY,
> > + "port", "port %u ",
> > + rta_getattr_be16(tb[NDA_PORT]));
> >
> > - if (tb[NDA_SRC_VNI]) {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "src_vni",
> > - rta_getattr_u32(tb[NDA_SRC_VNI]));
> > - else
> > - fprintf(fp, "src_vni %d ",
> > + if (tb[NDA_VNI])
> > + print_uint(PRINT_ANY,
> > + "vni", "vni %u ",
> > + rta_getattr_u32(tb[NDA_VNI]));
> > +
> > + if (tb[NDA_SRC_VNI])
> > + print_uint(PRINT_ANY,
> > + "src_vni", "src_vni %u ",
> > rta_getattr_u32(tb[NDA_SRC_VNI]));
> > - }
> >
> > if (tb[NDA_IFINDEX]) {
> > unsigned int ifindex = rta_getattr_u32(tb[NDA_IFINDEX]);
> >
> > - if (ifindex) {
> > - if (!tb[NDA_LINK_NETNSID]) {
> > - const char *ifname = ll_index_to_name(ifindex);
> > -
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "viaIf",
> > - ifname);
> > - else
> > - fprintf(fp, "via %s ", ifname);
> > - } else {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "viaIfIndex",
> > - ifindex);
> > - else
> > - fprintf(fp, "via ifindex %u ", ifindex);
> > - }
> > - }
> > - }
> > -
> > - if (tb[NDA_LINK_NETNSID]) {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "linkNetNsId",
> > - rta_getattr_u32(tb[NDA_LINK_NETNSID]));
> > + if (tb[NDA_LINK_NETNSID])
> > + print_uint(PRINT_ANY,
> > + "viaIfIndex", "via ifindex %u ",
> > + ifindex);
> > else
> > - fprintf(fp, "link-netnsid %d ",
> > - rta_getattr_u32(tb[NDA_LINK_NETNSID]));
> > + print_string(PRINT_ANY,
> > + "viaIf", "via %s ",
> > + ll_index_to_name(ifindex));
> > }
> >
> > - if (show_stats && tb[NDA_CACHEINFO]) {
> > - struct nda_cacheinfo *ci = RTA_DATA(tb[NDA_CACHEINFO]);
> > - int hz = get_user_hz();
> > + if (tb[NDA_LINK_NETNSID])
> > + print_uint(PRINT_ANY,
> > + "linkNetNsId", "link-netnsid %d ",
> > + rta_getattr_u32(tb[NDA_LINK_NETNSID]));
> >
> > - if (jw_global) {
> > - jsonw_uint_field(jw_global, "used",
> > - ci->ndm_used/hz);
> > - jsonw_uint_field(jw_global, "updated",
> > - ci->ndm_updated/hz);
> > - } else {
> > - fprintf(fp, "used %d/%d ", ci->ndm_used/hz,
> > - ci->ndm_updated/hz);
> > - }
> > - }
> > + if (show_stats && tb[NDA_CACHEINFO])
> > + fdb_print_stats(fp, RTA_DATA(tb[NDA_CACHEINFO]));
> >
> > - if (jw_global) {
> > - if (r->ndm_flags & NTF_SELF) {
> > - start_json_fdb_flags_array(&fdb_flags);
> > - jsonw_string(jw_global, "self");
> > - }
> > - if (r->ndm_flags & NTF_ROUTER) {
> > - start_json_fdb_flags_array(&fdb_flags);
> > - jsonw_string(jw_global, "router");
> > - }
> > - if (r->ndm_flags & NTF_EXT_LEARNED) {
> > - start_json_fdb_flags_array(&fdb_flags);
> > - jsonw_string(jw_global, "extern_learn");
> > - }
> > - if (r->ndm_flags & NTF_OFFLOADED) {
> > - start_json_fdb_flags_array(&fdb_flags);
> > - jsonw_string(jw_global, "offload");
> > - }
> > - if (r->ndm_flags & NTF_MASTER)
> > - jsonw_string(jw_global, "master");
> > - if (fdb_flags)
> > - jsonw_end_array(jw_global);
> > + fdb_print_flags(fp, r->ndm_flags);
> >
> > - if (tb[NDA_MASTER])
> > - jsonw_string_field(jw_global,
> > - "master",
> > - ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
> >
> > - } else {
> > - if (r->ndm_flags & NTF_SELF)
> > - fprintf(fp, "self ");
> > - if (r->ndm_flags & NTF_ROUTER)
> > - fprintf(fp, "router ");
> > - if (r->ndm_flags & NTF_EXT_LEARNED)
> > - fprintf(fp, "extern_learn ");
> > - if (r->ndm_flags & NTF_OFFLOADED)
> > - fprintf(fp, "offload ");
> > - if (tb[NDA_MASTER]) {
> > - fprintf(fp, "master %s ",
> > - ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
> > - } else if (r->ndm_flags & NTF_MASTER) {
> > - fprintf(fp, "master ");
> > - }
> > - }
> > -
> > - state_s = state_n2a(r->ndm_state);
> > - if (jw_global) {
> > - if (state_s[0])
> > - jsonw_string_field(jw_global, "state", state_s);
> > -
> > - jsonw_end_object(jw_global);
> > - } else {
> > - fprintf(fp, "%s\n", state_s);
> > -
> > - fflush(fp);
> > - }
> > + if (tb[NDA_MASTER])
> > + print_string(PRINT_ANY, "master", "%s ",
> > + ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
> >
> > + print_string(PRINT_ANY, "state", "%s\n",
> > + state_n2a(r->ndm_state));
> > + close_json_object();
> > + fflush(fp);
> > return 0;
> > }
> >
> > @@ -386,26 +324,13 @@ static int fdb_show(int argc, char **argv)
> > exit(1);
> > }
> >
> > - if (json) {
> > - jw_global = jsonw_new(stdout);
> > - if (!jw_global) {
> > - fprintf(stderr, "Error allocation json object\n");
> > - exit(1);
> > - }
> > - if (pretty)
> > - jsonw_pretty(jw_global, 1);
> > -
> > - jsonw_start_array(jw_global);
> > - }
> > -
> > + new_json_obj(json);
> > if (rtnl_dump_filter(&rth, print_fdb, stdout) < 0) {
> > fprintf(stderr, "Dump terminated\n");
> > exit(1);
> > }
> > - if (jw_global) {
> > - jsonw_end_array(jw_global);
> > - jsonw_destroy(&jw_global);
> > - }
> > + delete_json_obj();
> > + fflush(stdout);
> >
> > return 0;
> > }
> > diff --git a/bridge/mdb.c b/bridge/mdb.c
> > index da0282fdc91c..8c08baf570ec 100644
> > --- a/bridge/mdb.c
> > +++ b/bridge/mdb.c
> > @@ -14,12 +14,12 @@
> > #include <linux/if_ether.h>
> > #include <string.h>
> > #include <arpa/inet.h>
> > -#include <json_writer.h>
> >
> > #include "libnetlink.h"
> > #include "br_common.h"
> > #include "rt_names.h"
> > #include "utils.h"
> > +#include "json_print.h"
> >
> > #ifndef MDBA_RTA
> > #define MDBA_RTA(r) \
> > @@ -27,9 +27,6 @@
> > #endif
> >
> > static unsigned int filter_index, filter_vlan;
> > -json_writer_t *jw_global;
> > -static bool print_mdb_entries = true;
> > -static bool print_mdb_router = true;
> >
> > static void usage(void)
> > {
> > @@ -43,162 +40,131 @@ static bool is_temp_mcast_rtr(__u8 type)
> > return type == MDB_RTR_TYPE_TEMP_QUERY || type == MDB_RTR_TYPE_TEMP;
> > }
> >
> > +static const char *format_timer(__u32 ticks)
> > +{
> > + struct timeval tv;
> > + static char tbuf[32];
> > +
> > + __jiffies_to_tv(&tv, ticks);
> > + snprintf(tbuf, sizeof(tbuf), "%4lu.%.2lu",
> > + (unsigned long)tv.tv_sec,
> > + (unsigned long)tv.tv_usec / 10000);
> > +
> > + return tbuf;
> > +}
> > +
> > static void __print_router_port_stats(FILE *f, struct rtattr *pattr)
> > {
> > struct rtattr *tb[MDBA_ROUTER_PATTR_MAX + 1];
> > - struct timeval tv;
> > - __u8 type;
> >
> > parse_rtattr(tb, MDBA_ROUTER_PATTR_MAX, MDB_RTR_RTA(RTA_DATA(pattr)),
> > RTA_PAYLOAD(pattr) - RTA_ALIGN(sizeof(uint32_t)));
> > +
> > if (tb[MDBA_ROUTER_PATTR_TIMER]) {
> > - __jiffies_to_tv(&tv,
> > - rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]));
> > - if (jw_global) {
> > - char formatted_time[9];
> > -
> > - snprintf(formatted_time, sizeof(formatted_time),
> > - "%4i.%.2i", (int)tv.tv_sec,
> > - (int)tv.tv_usec/10000);
> > - jsonw_string_field(jw_global, "timer", formatted_time);
> > - } else {
> > - fprintf(f, " %4i.%.2i",
> > - (int)tv.tv_sec, (int)tv.tv_usec/10000);
> > - }
> > + __u32 timer = rta_getattr_u32(tb[MDBA_ROUTER_PATTR_TIMER]);
> > +
> > + print_string(PRINT_ANY, "timer", " %s",
> > + format_timer(timer));
> > }
> > +
> > if (tb[MDBA_ROUTER_PATTR_TYPE]) {
> > - type = rta_getattr_u8(tb[MDBA_ROUTER_PATTR_TYPE]);
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "type",
> > - is_temp_mcast_rtr(type) ? "temp" : "permanent");
> > - else
> > - fprintf(f, " %s",
> > - is_temp_mcast_rtr(type) ? "temp" : "permanent");
> > + __u8 type = rta_getattr_u8(tb[MDBA_ROUTER_PATTR_TYPE]);
> > +
> > + print_string(PRINT_ANY, "type", " %s",
> > + is_temp_mcast_rtr(type) ? "temp" : "permanent");
> > }
> > }
> >
> > -static void br_print_router_ports(FILE *f, struct rtattr *attr, __u32 brifidx)
> > +static void br_print_router_ports(FILE *f, struct rtattr *attr,
> > + const char *brifname)
> > {
> > - uint32_t *port_ifindex;
> > + int rem = RTA_PAYLOAD(attr);
> > struct rtattr *i;
> > - int rem;
> >
> > - rem = RTA_PAYLOAD(attr);
> > - if (jw_global) {
> > - jsonw_name(jw_global, ll_index_to_name(brifidx));
> > - jsonw_start_array(jw_global);
> > - for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
> > - port_ifindex = RTA_DATA(i);
> > - jsonw_start_object(jw_global);
> > - jsonw_string_field(jw_global,
> > - "port",
> > - ll_index_to_name(*port_ifindex));
> > + if (is_json_context())
> > + open_json_array(PRINT_JSON, brifname);
> > + else if (!show_stats)
> > + fprintf(f, "router ports on %s: ", brifname);
> > +
> > + for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
> > + uint32_t *port_ifindex = RTA_DATA(i);
> > + const char *port_ifname = ll_index_to_name(*port_ifindex);
> > +
> > + if (is_json_context()) {
> > + open_json_object(NULL);
> > + print_string(PRINT_JSON, "port", NULL, port_ifname);
> > +
> > if (show_stats)
> > __print_router_port_stats(f, i);
> > - jsonw_end_object(jw_global);
> > - }
> > - jsonw_end_array(jw_global);
> > - } else {
> > - if (!show_stats)
> > - fprintf(f, "router ports on %s: ",
> > - ll_index_to_name(brifidx));
> > - for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
> > - port_ifindex = RTA_DATA(i);
> > - if (show_stats) {
> > - fprintf(f, "router ports on %s: %s",
> > - ll_index_to_name(brifidx),
> > - ll_index_to_name(*port_ifindex));
> > - __print_router_port_stats(f, i);
> > - fprintf(f, "\n");
> > - } else{
> > - fprintf(f, "%s ",
> > - ll_index_to_name(*port_ifindex));
> > - }
> > - }
> > - if (!show_stats)
> > + close_json_object();
> > + } else if (show_stats) {
> > + fprintf(f, "router ports on %s: %s",
> > + brifname, port_ifname);
> > +
> > + __print_router_port_stats(f, i);
> > fprintf(f, "\n");
> > + } else {
> > + fprintf(f, "%s ", port_ifname);
> > + }
> > }
> > + close_json_array(PRINT_JSON, NULL);
> > }
> >
> > -static void start_json_mdb_flags_array(bool *mdb_flags)
> > -{
> > - if (*mdb_flags)
> > - return;
> > - jsonw_name(jw_global, "flags");
> > - jsonw_start_array(jw_global);
> > - *mdb_flags = true;
> > -}
> > -
> > -static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e,
> > +static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
> > struct nlmsghdr *n, struct rtattr **tb)
> > {
> > SPRINT_BUF(abuf);
> > + const char *dev;
> > const void *src;
> > int af;
> > - bool mdb_flags = false;
> >
> > if (filter_vlan && e->vid != filter_vlan)
> > return;
> > +
> > af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
> > src = af == AF_INET ? (const void *)&e->addr.u.ip4 :
> > (const void *)&e->addr.u.ip6;
> > - if (jw_global)
> > - jsonw_start_object(jw_global);
> > - if (n->nlmsg_type == RTM_DELMDB) {
> > - if (jw_global)
> > - jsonw_string_field(jw_global, "opCode", "deleted");
> > - else
> > - fprintf(f, "Deleted ");
> > - }
> > - if (jw_global) {
> > - jsonw_string_field(jw_global, "dev", ll_index_to_name(ifindex));
> > - jsonw_string_field(jw_global,
> > - "port",
> > - ll_index_to_name(e->ifindex));
> > - jsonw_string_field(jw_global, "grp", inet_ntop(af, src,
> > - abuf, sizeof(abuf)));
> > - jsonw_string_field(jw_global, "state",
> > - (e->state & MDB_PERMANENT) ? "permanent" : "temp");
> > - if (e->flags & MDB_FLAGS_OFFLOAD) {
> > - start_json_mdb_flags_array(&mdb_flags);
> > - jsonw_string(jw_global, "offload");
> > - }
> > - if (mdb_flags)
> > - jsonw_end_array(jw_global);
> > - } else{
> > - fprintf(f, "dev %s port %s grp %s %s %s",
> > - ll_index_to_name(ifindex),
> > - ll_index_to_name(e->ifindex),
> > - inet_ntop(af, src, abuf, sizeof(abuf)),
> > - (e->state & MDB_PERMANENT) ? "permanent" : "temp",
> > - (e->flags & MDB_FLAGS_OFFLOAD) ? "offload" : "");
> > - }
> > - if (e->vid) {
> > - if (jw_global)
> > - jsonw_uint_field(jw_global, "vid", e->vid);
> > - else
> > - fprintf(f, " vid %hu", e->vid);
> > + dev = ll_index_to_name(ifindex);
> > +
> > + open_json_object(NULL);
> > +
> > + if (n->nlmsg_type == RTM_DELMDB)
> > + print_bool(PRINT_ANY, "deleted", "Deleted ", true);
> > +
> > +
> > + if (is_json_context()) {
> > + print_int(PRINT_JSON, "index", NULL, ifindex);
> > + print_string(PRINT_JSON, "dev", NULL, dev);
> > + } else {
> > + fprintf(f, "%u: ", ifindex);
> > + color_fprintf(f, COLOR_IFNAME, "%s ", dev);
> > }
> > - if (show_stats && tb && tb[MDBA_MDB_EATTR_TIMER]) {
> > - struct timeval tv;
> >
> > - __jiffies_to_tv(&tv, rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]));
> > - if (jw_global) {
> > - char formatted_time[9];
> > + print_string(PRINT_ANY, "port", " %s ",
> > + ll_index_to_name(e->ifindex));
> >
> > - snprintf(formatted_time, sizeof(formatted_time),
> > - "%4i.%.2i", (int)tv.tv_sec,
> > - (int)tv.tv_usec/10000);
> > - jsonw_string_field(jw_global, "timer", formatted_time);
> > - } else {
> > - fprintf(f, "%4i.%.2i", (int)tv.tv_sec,
> > - (int)tv.tv_usec/10000);
> > - }
> > + print_color_string(PRINT_ANY, ifa_family_color(af),
> > + "grp", " %s ",
> > + inet_ntop(af, src, abuf, sizeof(abuf)));
> > +
> > + print_string(PRINT_ANY, "state", " %s ",
> > + (e->state & MDB_PERMANENT) ? "permanent" : "temp");
> > +
> > + open_json_array(PRINT_JSON, "flags");
> > + if (e->flags & MDB_FLAGS_OFFLOAD)
> > + print_string(PRINT_ANY, NULL, "%s ", "offload");
> > + close_json_array(PRINT_JSON, NULL);
> > +
> > + if (e->vid)
> > + print_uint(PRINT_ANY, "vid", " vid %u", e->vid);
> > +
> > + if (show_stats && tb && tb[MDBA_MDB_EATTR_TIMER]) {
> > + __u32 timer = rta_getattr_u32(tb[MDBA_MDB_EATTR_TIMER]);
> > +
> > + print_string(PRINT_ANY, "timer", " %s",
> > + format_timer(timer));
> > }
> > - if (jw_global)
> > - jsonw_end_object(jw_global);
> > - else
> > - fprintf(f, "\n");
> > + close_json_object();
> > }
> >
> > static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
> > @@ -218,15 +184,60 @@ static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr,
> > }
> > }
> >
> > +static void print_mdb_entries(FILE *fp, struct nlmsghdr *n,
> > + int ifindex, struct rtattr *mdb)
> > +{
> > + int rem = RTA_PAYLOAD(mdb);
> > + struct rtattr *i;
> > +
> > + open_json_array(PRINT_JSON, "mdb");
> > + for (i = RTA_DATA(mdb); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
> > + br_print_mdb_entry(fp, ifindex, i, n);
> > + close_json_array(PRINT_JSON, NULL);
> > +}
> > +
> > +static void print_router_entries(FILE *fp, struct nlmsghdr *n,
> > + int ifindex, struct rtattr *router)
> > +{
> > + const char *brifname = ll_index_to_name(ifindex);
> > +
> > + open_json_array(PRINT_JSON, "router");
> > + if (n->nlmsg_type == RTM_GETMDB) {
> > + if (show_details)
> > + br_print_router_ports(fp, router, brifname);
> > + } else {
> > + struct rtattr *i = RTA_DATA(router);
> > + uint32_t *port_ifindex = RTA_DATA(i);
> > +
> > + if (is_json_context()) {
> > + open_json_array(PRINT_JSON, brifname);
> > + open_json_object(NULL);
> > +
> > + print_string(PRINT_JSON, "port", NULL,
> > + ll_index_to_name(*port_ifindex));
> > + close_json_object();
> > + close_json_array(PRINT_JSON, NULL);
> > + } else {
> > + fprintf(fp, "router port dev %s master %s\n",
> > + ll_index_to_name(*port_ifindex),
> > + brifname);
> > + }
> > + }
> > + close_json_array(PRINT_JSON, NULL);
> > +}
> > +
> > int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> > {
> > FILE *fp = arg;
> > struct br_port_msg *r = NLMSG_DATA(n);
> > int len = n->nlmsg_len;
> > - struct rtattr *tb[MDBA_MAX+1], *i;
> > + struct rtattr *tb[MDBA_MAX+1];
> >
> > - if (n->nlmsg_type != RTM_GETMDB && n->nlmsg_type != RTM_NEWMDB && n->nlmsg_type != RTM_DELMDB) {
> > - fprintf(stderr, "Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
> > + if (n->nlmsg_type != RTM_GETMDB &&
> > + n->nlmsg_type != RTM_NEWMDB &&
> > + n->nlmsg_type != RTM_DELMDB) {
> > + fprintf(stderr,
> > + "Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
> > n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
> >
> > return 0;
> > @@ -243,50 +254,14 @@ int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> >
> > parse_rtattr(tb, MDBA_MAX, MDBA_RTA(r), n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
> >
> > - if (tb[MDBA_MDB] && print_mdb_entries) {
> > - int rem = RTA_PAYLOAD(tb[MDBA_MDB]);
> > + if (n->nlmsg_type == RTM_DELMDB)
> > + print_bool(PRINT_ANY, "deleted", "Deleted ", true);
> >
> > - for (i = RTA_DATA(tb[MDBA_MDB]); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
> > - br_print_mdb_entry(fp, r->ifindex, i, n);
> > - }
> > + if (tb[MDBA_MDB])
> > + print_mdb_entries(fp, n, r->ifindex, tb[MDBA_MDB]);
> >
> > - if (tb[MDBA_ROUTER] && print_mdb_router) {
> > - if (n->nlmsg_type == RTM_GETMDB) {
> > - if (show_details)
> > - br_print_router_ports(fp, tb[MDBA_ROUTER],
> > - r->ifindex);
> > - } else {
> > - uint32_t *port_ifindex;
> > -
> > - i = RTA_DATA(tb[MDBA_ROUTER]);
> > - port_ifindex = RTA_DATA(i);
> > - if (n->nlmsg_type == RTM_DELMDB) {
> > - if (jw_global)
> > - jsonw_string_field(jw_global,
> > - "opCode",
> > - "deleted");
> > - else
> > - fprintf(fp, "Deleted ");
> > - }
> > - if (jw_global) {
> > - jsonw_name(jw_global,
> > - ll_index_to_name(r->ifindex));
> > - jsonw_start_array(jw_global);
> > - jsonw_start_object(jw_global);
> > - jsonw_string_field(jw_global, "port",
> > - ll_index_to_name(*port_ifindex));
> > - jsonw_end_object(jw_global);
> > - jsonw_end_array(jw_global);
> > - } else {
> > - fprintf(fp, "router port dev %s master %s\n",
> > - ll_index_to_name(*port_ifindex),
> > - ll_index_to_name(r->ifindex));
> > - }
> > - }
> > - }
> > -
> > - if (!jw_global)
> > - fflush(fp);
> > + if (tb[MDBA_ROUTER])
> > + print_router_entries(fp, n, r->ifindex, tb[MDBA_ROUTER]);
> >
> > return 0;
> > }
> > @@ -319,62 +294,21 @@ static int mdb_show(int argc, char **argv)
> > }
> > }
> >
> > + new_json_obj(json);
> > +
> > /* get mdb entries*/
> > if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETMDB) < 0) {
> > perror("Cannot send dump request");
> > return -1;
> > }
> >
> > - if (!json) {
> > - /* Normal output */
> > - if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
> > - fprintf(stderr, "Dump terminated\n");
> > - return -1;
> > - }
> > - return 0;
> > - }
> > -
> > - /* Json output */
> > - jw_global = jsonw_new(stdout);
> > - if (!jw_global) {
> > - fprintf(stderr, "Error allocation json object\n");
> > - exit(1);
> > - }
> > -
> > - if (pretty)
> > - jsonw_pretty(jw_global, 1);
> > -
> > - jsonw_start_object(jw_global);
> > - jsonw_name(jw_global, "mdb");
> > - jsonw_start_array(jw_global);
> > -
> > - /* print mdb entries */
> > - print_mdb_entries = true;
> > - print_mdb_router = false;
> > if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
> > fprintf(stderr, "Dump terminated\n");
> > return -1;
> > }
> > - jsonw_end_array(jw_global);
> > -
> > - /* get router ports */
> > - if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETMDB) < 0) {
> > - perror("Cannot send dump request");
> > - return -1;
> > - }
> > - jsonw_name(jw_global, "router");
> > - jsonw_start_object(jw_global);
> >
> > - /* print router ports */
> > - print_mdb_entries = false;
> > - print_mdb_router = true;
> > - if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
> > - fprintf(stderr, "Dump terminated\n");
> > - return -1;
> > - }
> > - jsonw_end_object(jw_global);
> > - jsonw_end_object(jw_global);
> > - jsonw_destroy(&jw_global);
> > + delete_json_obj();
> > + fflush(stdout);
> >
> > return 0;
> > }
> > diff --git a/bridge/vlan.c b/bridge/vlan.c
> > index 7c8b3ad54857..9f4a7a2be55c 100644
> > --- a/bridge/vlan.c
> > +++ b/bridge/vlan.c
> > @@ -8,19 +8,16 @@
> > #include <netinet/in.h>
> > #include <linux/if_bridge.h>
> > #include <linux/if_ether.h>
> > -#include <json_writer.h>
> > #include <string.h>
> >
> > +#include "json_print.h"
> > #include "libnetlink.h"
> > #include "br_common.h"
> > #include "utils.h"
> >
> > static unsigned int filter_index, filter_vlan;
> > -static int last_ifidx = -1;
> > static int show_vlan_tunnel_info = 0;
> >
> > -json_writer_t *jw_global;
> > -
> > static void usage(void)
> > {
> > fprintf(stderr,
> > @@ -257,38 +254,33 @@ static int filter_vlan_check(__u16 vid, __u16 flags)
> >
> > static void print_vlan_port(FILE *fp, int ifi_index)
> > {
> > - if (jw_global) {
> > - jsonw_name(jw_global,
> > - ll_index_to_name(ifi_index));
> > - jsonw_start_array(jw_global);
> > - } else {
> > - fprintf(fp, "%s",
> > - ll_index_to_name(ifi_index));
> > - }
> > + print_string(PRINT_ANY, NULL, "%s",
> > + ll_index_to_name(ifi_index));
> > }
> >
>
> Stephen, this seems to have broken both json and non-json output.
>
> Here is some output before and after the patch (same thing for tunnelshow):
>
> before:
> $bridge vlan show
> port vlan ids
> hostbond4 1000
> 1001 PVID Egress Untagged
> 1002
> 1003
> 1004
>
> hostbond3 1000 PVID Egress Untagged
> 1001
> 1002
> 1003
> 1004
>
> bridge 1 PVID Egress Untagged
> 1000
> 1001
> 1002
> 1003
> 1004
>
> vxlan0 1 PVID Egress Untagged
> 1000
> 1001
> 1002
> 1003
> 1004
>
>
> $ bridge -j -c vlan show
> {
> "hostbond4": [{
> "vlan": 1000
> },{
> "vlan": 1001,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1002,
> "vlanEnd": 1004
> }
> ],
> "hostbond3": [{
> "vlan": 1000,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1001,
> "vlanEnd": 1004
> }
> ],
> "bridge": [{
> "vlan": 1,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1000,
> "vlanEnd": 1004
> }
> ],
> "vxlan0": [{
> "vlan": 1,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1000,
> "vlanEnd": 1004
> }
> ]
> }
>
>
> after:
> ====
>
> $bridge vlan show
> port vlan ids
> hostbond4
> 1000 1001 PVID untagged 1002 1003 1004
> hostbond3
> 1000 PVID untagged 1001 1002 1003 1004
> bridge
> 1 PVID untagged 1000 1001 1002 1003 1004
> vxlan0
> 1 PVID untagged 1000 1001 1002 1003 1004
>
> $bridge -j -c vlan show
> ["hostbond4","vlan":[{"vlan":1000},{"vlan":1001,"pvid":null,"untagged":null},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"hostbond3","vlan":[{"vlan":1000,"pvid":null,"untagged":null},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"bridge","vlan":[{"vlan":1,"pvid":null,"untagged":null},{"vlan":1000},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"vxlan0","vlan":[{"vlan":1,"pvid":null,"untagged":null},{"vlan":1000},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}]]
I can fix it.
* Fw: [Bug 200943] New: Repeating tcp_mark_head_lost in dmesg
From: Stephen Hemminger @ 2018-08-29 15:02 UTC (permalink / raw)
To: netdev
Begin forwarded message:
Date: Sun, 26 Aug 2018 22:24:12 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 200943] New: Repeating tcp_mark_head_lost in dmesg
https://bugzilla.kernel.org/show_bug.cgi?id=200943
Bug ID: 200943
Summary: Repeating tcp_mark_head_lost in dmesg
Product: Networking
Version: 2.5
Kernel Version: 4.14.66
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: IPV4
Assignee: stephen@networkplumber.org
Reporter: rm+bko@romanrm.net
Regression: No
Getting a bunch of these now every hour during continuous ~100 Mbit of network
traffic.
What's up with that? Seems harmless, as in the kernel doesn't crash and the
network connection is not interrupted. (Maybe the particular TCP session is?)
If there are no ill-effects from this condition, is such spammy WARN_ON really
necessary?
[Mon Aug 27 02:16:11 2018] ------------[ cut here ]------------
[Mon Aug 27 02:16:11 2018] WARNING: CPU: 5 PID: 0 at net/ipv4/tcp_input.c:2263
tcp_mark_head_lost+0x247/0x260
[Mon Aug 27 02:16:11 2018] Modules linked in: dm_snapshot loop vhost_net vhost
tap tun ip6t_MASQUERADE nf_nat_masquerade_ipv6 ipt_MASQUERADE
nf_nat_masquerade_ipv4 xt_DSCP xt_mark ip6t_REJECT nf_reject_ipv6 ipt_REJECT
nf_reject_ipv4 xt_owner xt_tcpudp xt_set ip_set_hash_net ip_set nfnetlink
xt_limit xt_length xt_multiport xt_conntrack ip6t_rpfilter ipt_rpfilter
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw
wireguard ip6_udp_tunnel udp_tunnel ip6table_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw
iptable_mangle ip6table_filter ip6_tables matroxfb_base matroxfb_g450
matroxfb_Ti3026 matroxfb_accel matroxfb_DAC1064 g450_pll matroxfb_misc
iptable_filter ip_tables x_tables cpufreq_powersave cpufreq_userspace
cpufreq_conservative 8021q garp mrp
[Mon Aug 27 02:16:11 2018] bridge stp llc bonding tcp_bbr sch_fq tcp_illinois
fuse radeon ttm drm_kms_helper drm i2c_algo_bit it87 hwmon_vid eeepc_wmi
asus_wmi sparse_keymap rfkill video wmi_bmof mxm_wmi edac_mce_amd kvm_amd kvm
snd_pcm snd_timer snd soundcore joydev evdev pcspkr k10temp fam15h_power
sp5100_tco sg shpchp wmi pcc_cpufreq acpi_cpufreq button ext4 crc16 mbcache
jbd2 fscrypto btrfs zstd_decompress zstd_compress xxhash algif_skcipher af_alg
dm_crypt dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_mod
hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0
multipath linear md_mod vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
ohci_pci crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
[Mon Aug 27 02:16:11 2018] pcbc aesni_intel aes_x86_64 crypto_simd glue_helper
cryptd r8169 ahci xhci_pci libahci ohci_hcd ehci_pci mii xhci_hcd ehci_hcd
i2c_piix4 libata usbcore scsi_mod bnx2
[Mon Aug 27 02:16:11 2018] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W
4.14.66-rm1+ #132
[Mon Aug 27 02:16:11 2018] Hardware name: To be filled by O.E.M. To be filled
by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2901 05/04/2016
[Mon Aug 27 02:16:11 2018] task: ffff8ba79c679dc0 task.stack: ffffb4d741928000
[Mon Aug 27 02:16:11 2018] RIP: 0010:tcp_mark_head_lost+0x247/0x260
[Mon Aug 27 02:16:11 2018] RSP: 0018:ffff8ba7aed437d8 EFLAGS: 00010202
[Mon Aug 27 02:16:11 2018] RAX: 0000000000000018 RBX: ffff8ba3901a0800 RCX:
0000000000000000
[Mon Aug 27 02:16:11 2018] RDX: 0000000000000017 RSI: 0000000000000001 RDI:
ffff8ba4d47e9000
[Mon Aug 27 02:16:11 2018] RBP: ffff8ba4d47e9000 R08: 000000000000000d R09:
0000000000000000
[Mon Aug 27 02:16:11 2018] R10: 000000000000100c R11: 0000000000000000 R12:
0000000000000001
[Mon Aug 27 02:16:11 2018] R13: ffff8ba4d47e9158 R14: 0000000000000001 R15:
000000009d0b6708
[Mon Aug 27 02:16:11 2018] FS: 0000000000000000(0000)
GS:ffff8ba7aed40000(0000) knlGS:0000000000000000
[Mon Aug 27 02:16:11 2018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Aug 27 02:16:11 2018] CR2: 0000000001fbdff0 CR3: 000000040c8d6000 CR4:
00000000000406e0
[Mon Aug 27 02:16:11 2018] Call Trace:
[Mon Aug 27 02:16:11 2018] <IRQ>
[Mon Aug 27 02:16:11 2018] tcp_fastretrans_alert+0x5c3/0xa20
[Mon Aug 27 02:16:11 2018] tcp_ack+0x95a/0x1170
[Mon Aug 27 02:16:11 2018] ? __slab_free.isra.70+0x79/0x200
[Mon Aug 27 02:16:11 2018] tcp_rcv_established+0x16a/0x5a0
[Mon Aug 27 02:16:11 2018] ? tcp_v4_inbound_md5_hash+0x76/0x1e0
[Mon Aug 27 02:16:11 2018] tcp_v4_do_rcv+0x130/0x1f0
[Mon Aug 27 02:16:11 2018] tcp_v4_rcv+0x9ac/0xaa0
[Mon Aug 27 02:16:11 2018] ip_local_deliver_finish+0x9a/0x1c0
[Mon Aug 27 02:16:11 2018] ip_local_deliver+0x6b/0xe0
[Mon Aug 27 02:16:11 2018] ? ip_rcv_finish+0x440/0x440
[Mon Aug 27 02:16:11 2018] ip_rcv+0x2b0/0x3c0
[Mon Aug 27 02:16:11 2018] ? inet_del_offload+0x50/0x50
[Mon Aug 27 02:16:11 2018] __netif_receive_skb_core+0x85f/0xb50
[Mon Aug 27 02:16:11 2018] ? br_allowed_egress+0x2d/0x50 [bridge]
[Mon Aug 27 02:16:11 2018] ? br_forward+0x49/0xe0 [bridge]
[Mon Aug 27 02:16:11 2018] ? br_vlan_lookup+0xdd/0x150 [bridge]
[Mon Aug 27 02:16:11 2018] netif_receive_skb_internal+0x34/0xe0
[Mon Aug 27 02:16:11 2018] ? br_handle_vlan+0x4b/0xf0 [bridge]
[Mon Aug 27 02:16:11 2018] br_pass_frame_up+0xd4/0x180 [bridge]
[Mon Aug 27 02:16:11 2018] ? br_allowed_ingress+0x1ea/0x2e0 [bridge]
[Mon Aug 27 02:16:11 2018] br_handle_frame_finish+0x23f/0x530 [bridge]
[Mon Aug 27 02:16:11 2018] ? get_partial_node.isra.69+0x13c/0x1d0
[Mon Aug 27 02:16:11 2018] br_handle_frame+0x1b7/0x320 [bridge]
[Mon Aug 27 02:16:11 2018] __netif_receive_skb_core+0x367/0xb50
[Mon Aug 27 02:16:11 2018] ? inet_gro_receive+0x203/0x2b0
[Mon Aug 27 02:16:11 2018] netif_receive_skb_internal+0x34/0xe0
[Mon Aug 27 02:16:11 2018] napi_gro_receive+0xb8/0xe0
[Mon Aug 27 02:16:11 2018] bnx2_poll_work+0x71a/0x12e0 [bnx2]
[Mon Aug 27 02:16:11 2018] bnx2_poll_msix+0x41/0xf0 [bnx2]
[Mon Aug 27 02:16:11 2018] net_rx_action+0x28c/0x3f0
[Mon Aug 27 02:16:11 2018] __do_softirq+0x10a/0x2a2
[Mon Aug 27 02:16:11 2018] irq_exit+0xbe/0xd0
[Mon Aug 27 02:16:11 2018] do_IRQ+0x66/0x100
[Mon Aug 27 02:16:11 2018] common_interrupt+0x7d/0x7d
[Mon Aug 27 02:16:11 2018] </IRQ>
[Mon Aug 27 02:16:11 2018] RIP: 0010:cpuidle_enter_state+0xa4/0x2d0
[Mon Aug 27 02:16:11 2018] RSP: 0018:ffffb4d74192bea0 EFLAGS: 00000246
ORIG_RAX: ffffffffffffff1a
[Mon Aug 27 02:16:11 2018] RAX: ffff8ba7aed61800 RBX: 0000f8f2a31961cb RCX:
000000000000001f
[Mon Aug 27 02:16:11 2018] RDX: 0000f8f2a31961cb RSI: fffffff1cd4e0887 RDI:
0000000000000000
[Mon Aug 27 02:16:11 2018] RBP: 0000000000000002 R08: 000000000000000a R09:
000000000000000a
[Mon Aug 27 02:16:11 2018] R10: 0000000000000364 R11: 00000000000002a6 R12:
ffff8ba7961b3200
[Mon Aug 27 02:16:11 2018] R13: ffffffff90cb2c58 R14: 0000f8f2a3139a63 R15:
ffffffff90cb2b80
[Mon Aug 27 02:16:11 2018] do_idle+0x19d/0x200
[Mon Aug 27 02:16:11 2018] cpu_startup_entry+0x6f/0x80
[Mon Aug 27 02:16:11 2018] start_secondary+0x1ae/0x200
[Mon Aug 27 02:16:11 2018] secondary_startup_64+0xa5/0xb0
[Mon Aug 27 02:16:11 2018] Code: e8 df aa 00 00 85 c0 78 0c 0f b6 43 39 44 89
e6 e9 16 ff ff ff 8b 95 ec 05 00 00 8b 85 80 06 00 00 03 85 84 06 00 00 39 d0
76 a7 <0f> 0b eb a3 31 f6 e9 12 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00
[Mon Aug 27 02:16:11 2018] ---[ end trace 3d7c0b943ef03b6a ]---
--
You are receiving this mail because:
You are the assignee for the bug.
* Fw: [Bug 200967] New: No network with U.S. Robotics USR997902
From: Stephen Hemminger @ 2018-08-29 15:02 UTC (permalink / raw)
To: netdev
Begin forwarded message:
Date: Wed, 29 Aug 2018 04:36:20 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 200967] New: No network with U.S. Robotics USR997902
https://bugzilla.kernel.org/show_bug.cgi?id=200967
Bug ID: 200967
Summary: No network with U.S. Robotics USR997902
Product: Networking
Version: 2.5
Kernel Version: 4.18.5
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: IPV4
Assignee: stephen@networkplumber.org
Reporter: kyrimis@alumni.princeton.edu
Regression: No
Created attachment 278193
--> https://bugzilla.kernel.org/attachment.cgi?id=278193&action=edit
dmesg and hwinfo output
(I am reporting this upstream as instructed by OpenSUSE, where I had originally
reported this problem.)
After upgrading to kernel 4.18 (originally noticed the problem with kernel
4.18.0, the problem persists with kernel 4.18.5), I could no longer connect to
the network.
A similar machine, which I upgraded at the same time, had no problem, so I
thought that the problem might be due to my using a network card instead of the
motherboard's built-in network controller. Sure enough, configuring the
built-in controller and connecting that to the network worked fine.
According to lspci, the card that doesn't work with the new kernel is:
U.S. Robotics USR997902 10/100/1000 Mbps PCI Network Card (rev 10)
Ifconfig shows that the network controller has been recognized as such, but
that it has not obtained an IP address.
The problem did not occur with kernel 4.17.14.
I have attached the output of hwinfo and dmesg for kernels 4.17.14 and 4.18.5,
with the network cable connected to the U.S. Robotics controller (enp5s0).
--
You are receiving this mail because:
You are the assignee for the bug.
* [PATCH 2/2] net: ethernet: cpsw-phy-sel: prefer phandle for phy sel
From: Tony Lindgren @ 2018-08-29 15:00 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-omap, devicetree, Andrew Lunn, Grygorii Strashko,
Ivan Khoronzhuk, Mark Rutland, Murali Karicheri, Rob Herring
In-Reply-To: <20180829150024.43210-1-tony@atomide.com>
The cpsw-phy-sel device is not a child of the cpsw interconnect target
module. It lives in the system control module.
Let's fix this issue by trying to use cpsw-phy-sel phandle first if it
exists and if not fall back to current usage of trying to find the
cpsw-phy-sel child. That way the phy sel driver can be a child of the
system control module where it belongs in the device tree.
Without this fix, we cannot have a proper interconnect target module
hierarchy in device tree for things like genpd.
Note that deferred probe is mostly not supported by cpsw and this patch
does not attempt to fix that. In case deferred probe support is needed,
this could be added to cpsw_slave_open() and phy_connect() so they start
handling and returning errors.
For documenting it, looks like the cpsw-phy-sel is used for all cpsw device
tree nodes. It's missing the related binding documentation, so let's also
update the binding documentation accordingly.
Cc: devicetree@vger.kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
drivers/net/ethernet/ti/cpsw-phy-sel.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw-phy-sel.c b/drivers/net/ethernet/ti/cpsw-phy-sel.c
--- a/drivers/net/ethernet/ti/cpsw-phy-sel.c
+++ b/drivers/net/ethernet/ti/cpsw-phy-sel.c
@@ -170,10 +170,13 @@ void cpsw_phy_sel(struct device *dev, phy_interface_t phy_mode, int slave)
struct device_node *node;
struct cpsw_phy_sel_priv *priv;
- node = of_get_child_by_name(dev->of_node, "cpsw-phy-sel");
+ node = of_parse_phandle(dev->of_node, "cpsw-phy-sel", 0);
if (!node) {
- dev_err(dev, "Phy mode driver DT not found\n");
- return;
+ node = of_get_child_by_name(dev->of_node, "cpsw-phy-sel");
+ if (!node) {
+ dev_err(dev, "Phy mode driver DT not found\n");
+ return;
+ }
}
dev = bus_find_device(&platform_bus_type, NULL, node, match);
--
2.18.0
* [PATCH 1/2] dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle
From: Tony Lindgren @ 2018-08-29 15:00 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-omap, devicetree, Andrew Lunn, Grygorii Strashko,
Ivan Khoronzhuk, Mark Rutland, Murali Karicheri, Rob Herring
The current cpsw usage for cpsw-phy-sel is undocumented but is used for
all the boards using cpsw. And cpsw-phy-sel is not really a child of
the cpsw device, it lives in the system control module instead.
Let's document the existing usage, and improve it a bit where we prefer
to use a phandle instead of a child device for it. That way we can
properly describe the hardware in dts files for things like genpd.
Cc: devicetree@vger.kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
Documentation/devicetree/bindings/net/cpsw.txt | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -19,6 +19,10 @@ Required properties:
- slaves : Specifies number for slaves
- active_slave : Specifies the slave to use for time stamping,
ethtool and SIOCGMIIPHY
+- cpsw-phy-sel : Specifies the phandle to the CPSW phy mode selection
+ device. See also cpsw-phy-sel.txt for its binding.
+ Note that in legacy cases cpsw-phy-sel may be
+ a child device instead of a phandle.
Optional properties:
- ti,hwmods : Must be "cpgmac0"
@@ -75,6 +79,7 @@ Examples:
cpts_clock_mult = <0x80000000>;
cpts_clock_shift = <29>;
syscon = <&cm>;
+ cpsw-phy-sel = <&phy_sel>;
cpsw_emac0: slave@0 {
phy_id = <&davinci_mdio>, <0>;
phy-mode = "rgmii-txid";
@@ -103,6 +108,7 @@ Examples:
cpts_clock_mult = <0x80000000>;
cpts_clock_shift = <29>;
syscon = <&cm>;
+ cpsw-phy-sel = <&phy_sel>;
cpsw_emac0: slave@0 {
phy_id = <&davinci_mdio>, <0>;
phy-mode = "rgmii-txid";
--
2.18.0
* [PATCH bpf 3/3] bpf: fix sg shift repair start offset in bpf_msg_pull_data
From: Daniel Borkmann @ 2018-08-29 14:50 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
In-Reply-To: <20180829145036.5514-1-daniel@iogearbox.net>
When we perform the sg shift repair for the scatterlist ring, we
currently start out at i = first_sg + 1. However, this is not
correct since the first_sg could point to the sge sitting at slot
MAX_SKB_FRAGS - 1, and a subsequent i = MAX_SKB_FRAGS will access
the scatterlist ring (sg) out of bounds. Add the sk_msg_iter_var()
helper for iterating through the ring, and apply the same rule
for advancing to the next ring element as we do elsewhere. Later
work will use this helper also in other places.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/filter.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 43ba5f8..2c7801f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2282,6 +2282,13 @@ static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
.arg2_type = ARG_ANYTHING,
};
+#define sk_msg_iter_var(var) \
+ do { \
+ var++; \
+ if (var == MAX_SKB_FRAGS) \
+ var = 0; \
+ } while (0)
+
BPF_CALL_4(bpf_msg_pull_data,
struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
{
@@ -2302,9 +2309,7 @@ BPF_CALL_4(bpf_msg_pull_data,
if (start < offset + len)
break;
offset += len;
- i++;
- if (i == MAX_SKB_FRAGS)
- i = 0;
+ sk_msg_iter_var(i);
} while (i != msg->sg_end);
if (unlikely(start >= offset + len))
@@ -2330,9 +2335,7 @@ BPF_CALL_4(bpf_msg_pull_data,
*/
do {
copy += sg[i].length;
- i++;
- if (i == MAX_SKB_FRAGS)
- i = 0;
+ sk_msg_iter_var(i);
if (bytes_sg_total <= copy)
break;
} while (i != msg->sg_end);
@@ -2358,9 +2361,7 @@ BPF_CALL_4(bpf_msg_pull_data,
sg[i].length = 0;
put_page(sg_page(&sg[i]));
- i++;
- if (i == MAX_SKB_FRAGS)
- i = 0;
+ sk_msg_iter_var(i);
} while (i != last_sg);
sg[first_sg].length = copy;
@@ -2377,7 +2378,8 @@ BPF_CALL_4(bpf_msg_pull_data,
if (!shift)
goto out;
- i = first_sg + 1;
+ i = first_sg;
+ sk_msg_iter_var(i);
do {
int move_from;
@@ -2394,9 +2396,7 @@ BPF_CALL_4(bpf_msg_pull_data,
sg[move_from].page_link = 0;
sg[move_from].offset = 0;
- i++;
- if (i == MAX_SKB_FRAGS)
- i = 0;
+ sk_msg_iter_var(i);
} while (1);
msg->sg_end -= shift;
if (msg->sg_end < 0)
--
2.9.5
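The ring-advance rule that the sk_msg_iter_var() helper applies can be sketched in plain userspace C as follows. This is only an illustration: RING_SIZE and ring_next() are hypothetical names, and the size of 16 is an assumption chosen to match a ring whose last slot is 15; the kernel code uses MAX_SKB_FRAGS and a macro that modifies its argument in place.

```c
#include <assert.h>

/* Hypothetical ring size for illustration; the kernel uses MAX_SKB_FRAGS. */
#define RING_SIZE 16

/*
 * Return the slot that follows i in the ring. The last slot wraps
 * back to slot 0, so the result is always a valid index.
 */
static inline unsigned int ring_next(unsigned int i)
{
	assert(i < RING_SIZE);

	i++;
	if (i == RING_SIZE)
		i = 0;
	return i;
}
```

Starting the shift repair at ring_next(first_sg) instead of first_sg + 1 keeps the index in bounds even when first_sg is the last slot of the ring.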
* [PATCH bpf 2/3] bpf: fix shift upon scatterlist ring wrap-around in bpf_msg_pull_data
From: Daniel Borkmann @ 2018-08-29 14:50 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
In-Reply-To: <20180829145036.5514-1-daniel@iogearbox.net>
If first_sg and last_sg wrap around in the scatterlist ring, then we
need to account for that in the shift as well. Crafting msgs where
this is the case leads to a hang, as shift becomes negative. E.g.
consider the following scenario:
first_sg := 14 |=> shift := -12 msg->sg_start := 10
last_sg := 3 | msg->sg_end := 5
round 1: i := 15, move_from := 3, sg[15] := sg[ 3]
round 2: i := 0, move_from := -12, sg[ 0] := sg[-12]
round 3: i := 1, move_from := -11, sg[ 1] := sg[-11]
round 4: i := 2, move_from := -10, sg[ 2] := sg[-10]
[...]
round 13: i := 11, move_from := -1, sg[11] := sg[ -1]
round 14: i := 12, move_from := 0, sg[12] := sg[ 0]
round 15: i := 13, move_from := 1, sg[13] := sg[ 1]
round 16: i := 14, move_from := 2, sg[14] := sg[ 2]
round 17: i := 15, move_from := 3, sg[15] := sg[ 3]
[...]
This means we will loop forever and never hit the msg->sg_end condition
to break out of the loop. When we see that the ring wraps around, then
the shift should be MAX_SKB_FRAGS - first_sg + last_sg - 1. Meaning,
the remainder slots from the tail of the ring and the head until last_sg
combined.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/filter.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index b9225c5..43ba5f8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2370,7 +2370,10 @@ BPF_CALL_4(bpf_msg_pull_data,
* had a single entry though we can just replace it and
* be done. Otherwise walk the ring and shift the entries.
*/
- shift = last_sg - first_sg - 1;
+ WARN_ON_ONCE(last_sg == first_sg);
+ shift = last_sg > first_sg ?
+ last_sg - first_sg - 1 :
+ MAX_SKB_FRAGS - first_sg + last_sg - 1;
if (!shift)
goto out;
--
2.9.5
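The wrap-aware shift computation from this patch can be checked in isolation with a small userspace sketch. RING_SIZE and ring_shift() are hypothetical names used only here, and the ring size of 16 is an assumption that matches the scenario in the commit message, where index 15 wraps to 0; the kernel code uses MAX_SKB_FRAGS.

```c
#include <assert.h>

/* Hypothetical ring size for illustration; the kernel uses MAX_SKB_FRAGS. */
#define RING_SIZE 16

/*
 * Number of slots between first_sg and last_sg (exclusive on both
 * ends), taking a wrap-around of the ring into account.
 */
static int ring_shift(int first_sg, int last_sg)
{
	assert(first_sg >= 0 && first_sg < RING_SIZE);
	assert(last_sg >= 0 && last_sg < RING_SIZE);
	assert(first_sg != last_sg);

	if (last_sg > first_sg)
		return last_sg - first_sg - 1;

	/* Wrapped: slots left at the tail plus slots at the head. */
	return RING_SIZE - first_sg + last_sg - 1;
}
```

Under these assumptions, first_sg = 14 and last_sg = 3 give a shift of 4 (slots 15, 0, 1 and 2) rather than the negative value -12 produced by the unpatched expression.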
* [PATCH bpf 1/3] bpf: fix msg->data/data_end after sg shift repair in bpf_msg_pull_data
From: Daniel Borkmann @ 2018-08-29 14:50 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
In-Reply-To: <20180829145036.5514-1-daniel@iogearbox.net>
In the current code, msg->data is set as sg_virt(&sg[i]) + start - offset
and msg->data_end relative to it as msg->data + bytes. Using iterator i
to point to the updated starting scatterlist element holds true for some
cases, however not for all where we'd end up pointing out of bounds. It
is /correct/ for these ones:
1) When first finding the starting scatterlist element (sge) where we
find that the page is already privately owned by the msg and where
the requested bytes and headroom fit into the sge's length.
However, it's /incorrect/ for the following ones:
2) After we made the requested area private and updated the newly allocated
page into first_sg slot of the scatterlist ring; when we find that no
shift repair of the ring is needed where we bail out updating msg->data
and msg->data_end. At that point i will point to last_sg, which in this
case is the next elem of first_sg in the ring. The sge at that point
might as well be invalid (e.g. i == msg->sg_end), which we use for
setting the range of sg_virt(&sg[i]). The correct one would have been
first_sg.
3) Similar to 2), but when we find that a shift repair of the ring is
needed. In this case we fix up all sges and stop once we've reached the
end. In this case i will point to the new msg->sg_end,
and the sge at that point will be invalid. Again here the requested
range sits in first_sg.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/filter.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index ec4d67c..b9225c5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2310,6 +2310,7 @@ BPF_CALL_4(bpf_msg_pull_data,
if (unlikely(start >= offset + len))
return -EINVAL;
+ first_sg = i;
/* The start may point into the sg element so we need to also
* account for the headroom.
*/
@@ -2317,8 +2318,6 @@ BPF_CALL_4(bpf_msg_pull_data,
if (!msg->sg_copy[i] && bytes_sg_total <= len)
goto out;
- first_sg = i;
-
/* At this point we need to linearize multiple scatterlist
* elements or a single shared page. Either way we need to
* copy into a linear buffer exclusively owned by BPF. Then
@@ -2400,7 +2399,7 @@ BPF_CALL_4(bpf_msg_pull_data,
if (msg->sg_end < 0)
msg->sg_end += MAX_SKB_FRAGS;
out:
- msg->data = sg_virt(&sg[i]) + start - offset;
+ msg->data = sg_virt(&sg[first_sg]) + start - offset;
msg->data_end = msg->data + bytes;
return 0;
--
2.9.5
* [PATCH bpf 0/3] Three fixes for bpf_msg_pull_data
From: Daniel Borkmann @ 2018-08-29 14:50 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
This set contains three more fixes for the bpf_msg_pull_data()
mainly for correcting scatterlist ring wrap-arounds as well as
fixing up data pointers. For details please see individual patches.
Thanks!
Daniel Borkmann (3):
bpf: fix msg->data/data_end after sg shift repair in bpf_msg_pull_data
bpf: fix shift upon scatterlist ring wrap-around in bpf_msg_pull_data
bpf: fix sg shift repair start offset in bpf_msg_pull_data
net/core/filter.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
--
2.9.5
* [PATCH] ieee802154: mcr20a: read out of bounds in mcr20a_set_channel()
From: Dan Carpenter @ 2018-08-29 14:49 UTC (permalink / raw)
To: Xue Liu
Cc: Alexander Aring, Stefan Schmidt, David S. Miller, linux-wpan,
netdev, kernel-janitors
The "channel" variable can be any u8 value. We need to make sure we
don't read outside of the PLL_INT[] or PLL_FRAC[] arrays.
Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
This patch is obviously harmless, but it's from static analysis. I'm
pretty sure this is required, but I can't swear.
diff --git a/drivers/net/ieee802154/mcr20a.c b/drivers/net/ieee802154/mcr20a.c
index e428277781ac..4f41d1d3588e 100644
--- a/drivers/net/ieee802154/mcr20a.c
+++ b/drivers/net/ieee802154/mcr20a.c
@@ -512,6 +512,9 @@ mcr20a_set_channel(struct ieee802154_hw *hw, u8 page, u8 channel)
dev_dbg(printdev(lp), "%s\n", __func__);
+ if (channel < 11 || channel - 11 >= ARRAY_SIZE(PLL_INT))
+ return -EINVAL;
+
/* freqency = ((PLL_INT+64) + (PLL_FRAC/65536)) * 32 MHz */
ret = regmap_write(lp->regmap_dar, DAR_PLL_INT0, PLL_INT[channel - 11]);
if (ret)
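The bounds check added above can be exercised outside the driver with a minimal sketch. The table below is a placeholder: its name, its size of 16 entries (channels 11 through 26) and its zeroed contents are assumptions for illustration and are not taken from mcr20a.c; lookup_pll_int() is likewise a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder table, assumed to cover channels 11..26; values are dummies. */
static const uint8_t pll_int_table[16] = { 0 };

/*
 * Look up the table entry for a channel. Any u8 value may be passed
 * in, so both ends of the valid range are checked before indexing.
 */
static int lookup_pll_int(uint8_t channel, uint8_t *out)
{
	assert(out != NULL);

	if (channel < 11 ||
	    (size_t)(channel - 11) >= ARRAY_SIZE(pll_int_table))
		return -EINVAL;

	*out = pll_int_table[channel - 11];
	return 0;
}
```

Checking channel < 11 first keeps the subtraction from going negative before it is compared against the array size.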
* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-08-29 18:13 UTC (permalink / raw)
To: Kirill Tkhai
Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
nicolas.dichtel
In-Reply-To: <adc2fae5-e22b-3a74-d531-01570e7970ee@virtuozzo.com>
Hi Kirill,
Thanks for the question!
On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> Hi, Christian,
>
> On 29.08.2018 02:18, Christian Brauner wrote:
> > From: Christian Brauner <christian@brauner.io>
> >
> > Hey,
> >
> > A while back we introduced and enabled IFLA_IF_NETNSID in
> > RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> > to significant performance increases since it allows userspace to avoid
> > taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> > interfaces from the netns associated with the netns_fd. Especially when a
> > lot of network namespaces are in use, using setns() becomes increasingly
> > problematic when performance matters.
>
> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> problems with the performance? You should do this only once on application
> startup, and then you have created netlink sockets in any net namespaces you
> need. What is the problem here?
So we have a daemon (LXD) that is often running thousands of containers.
When users issue a lxc list request against the daemon it returns a list
of all containers including all of the interfaces and addresses for each
container. To retrieve those addresses we currently rely on setns() +
getifaddrs() for each of those containers. That has horrible
performance.
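For readers unfamiliar with the pattern, the per-container setns() + getifaddrs() approach might look roughly like the sketch below. It is not LXD's actual code: count_addrs_in_netns() is a hypothetical helper, and the namespace path (for example "/proc/<pid>/ns/net") is assumed to be supplied by the caller. setns() needs sufficient privileges over the target namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <ifaddrs.h>
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Count the interface addresses visible in the network namespace
 * referenced by netns_path. Returns the count, or -1 on error.
 * On success the calling thread stays in the target namespace, so a
 * real daemon would have to switch back or use a helper process.
 */
static int count_addrs_in_netns(const char *netns_path)
{
	struct ifaddrs *list, *cur;
	int fd, count = 0;

	assert(netns_path != NULL);

	fd = open(netns_path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Move this thread into the target network namespace. */
	if (setns(fd, CLONE_NEWNET) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	if (getifaddrs(&list) < 0)
		return -1;

	for (cur = list; cur != NULL; cur = cur->ifa_next)
		if (cur->ifa_addr != NULL)
			count++;

	freeifaddrs(list);
	return count;
}
```

Every container costs an open(), a namespace switch and a full address dump, which is the overhead a netns-id based RTM_GETADDR request is meant to avoid.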
The problem with what you're proposing is that the daemon would need to
cache a socket file descriptor for each container which is something
that we unfortunately cannot do since we can't excessively cache file
descriptors because we can easily hit the open file limit. We also
refrain from caching file descriptors for a long time for security
reasons.
For the case where users just request a list of the interfaces we
can already use RTM_GETLINK + IFLA_IF_NETNSID which has way better
performance. But we can't do the same with RTM_GETADDR requests which
was an oversight on my part when I wrote the original patchset for the
RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
RTM_GETADDR.
Based on this patchset I have written a userspace POC that is basically
a network namespace aware getifaddrs() or - as I like to call it -
netns_getifaddr().
>
> > Usually, RTM_GETLINK requests are followed by RTM_GETADDR requests (cf.
> > getifaddrs() style functions and friends). But currently, RTM_GETADDR
> > requests do not support a similar property like IFLA_IF_NETNSID for
> > RTM_*LINK requests.
> > This is problematic since userspace can retrieve interfaces from another
> > network namespace by sending an IFLA_IF_NETNSID property along with an
> > RTM_GETLINK request but is still forced to use the legacy setns() style of
> > retrieving interfaces in RTM_GETADDR requests.
> >
> > The goal of this series is to make it possible to perform RTM_GETADDR
> > requests on different network namespaces. To this end a new IFA_IF_NETNSID
> > property for RTM_*ADDR requests is introduced. It can be used to send a
> > network namespace identifier along in RTM_*ADDR requests. The network
> > namespace identifier will be used to retrieve the target network namespace
> > in which the request is supposed to be fulfilled. This aligns the behavior
> > of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> >
> > Security:
> > - The caller must have assigned a valid network namespace identifier for
> > the target network namespace.
> > - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> > target network namespace.
> >
> > Thanks!
> > Christian
> >
> > [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> > [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> > [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> > [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> > [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> >
> > Christian Brauner (5):
> > rtnetlink: add rtnl_get_net_ns_capable()
> > if_addr: add IFA_IF_NETNSID
> > ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> > ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> > rtnetlink: move type calculation out of loop
> >
> > include/net/rtnetlink.h | 1 +
> > include/uapi/linux/if_addr.h | 1 +
> > net/core/rtnetlink.c | 15 +++++---
> > net/ipv4/devinet.c | 38 +++++++++++++++-----
> > net/ipv6/addrconf.c | 70 ++++++++++++++++++++++++++++--------
> > 5 files changed, 97 insertions(+), 28 deletions(-)
> >