* Re: [danielwa@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
From: Daniel Walker @ 2018-10-18 17:11 UTC (permalink / raw)
To: Claudiu Manoil; +Cc: Hemant Ramdasi, netdev@vger.kernel.org
In-Reply-To: <HE1PR04MB1145DAA9F214275D6489B06B96F80@HE1PR04MB1145.eurprd04.prod.outlook.com>
On Thu, Oct 18, 2018 at 04:49:26PM +0000, Claudiu Manoil wrote:
> I can only advise you to check whether the MACCFG2 register settings are consistent
> at this point, when ping fails. You should check the I/F Mode bits (22-23) and the
> Full Duplex bit (31), in big-endian format. If these do not match the 100Mbps full
> duplex link mode, then it might be that another thread (probably doing reset_gfar)
> changes MACCFG2 concurrently. I think MACCFG2 may be dumped with ethtool -d.
> I can get my hands on a board no sooner than maybe next week.
A board won't help you .. I'm running on customer hardware which you don't have
access to.
After boot up you have MACCFG2 = 0x7205 which is the same
as the INIT settings.
After the interface is brought up adjust_link() changes to MACCFG2 = 0x7105
which I think is MII.
0x7105 stays after the interface is brought down until gfar_mac_reset sets it to
0x7205 (GMII) .. then adjust link resets it to 0x7105 (MII) ..
That goes on and on each time to interface is brought down/up. It seems like
this is what your expecting to happen, but it doesn't seems to work %100 of the
time.
Daniel
^ permalink raw reply
* Re: [net-next 01/11] igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller support
From: Jakub Kicinski @ 2018-10-18 17:14 UTC (permalink / raw)
To: Jeff Kirsher; +Cc: davem, Sasha Neftin, netdev, nhorman, sassmann
In-Reply-To: <20181017222322.2171-2-jeffrey.t.kirsher@intel.com>
On Wed, 17 Oct 2018 15:23:12 -0700, Jeff Kirsher wrote:
> From: Sasha Neftin <sasha.neftin@intel.com>
>
> This patch adds the beginning framework onto which I am going to add
> the igc driver which supports the Intel(R) I225-LM/I225-V 2.5G
> Ethernet Controller.
>
> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
bunch of minor nit picks
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> new file mode 100644
> index 000000000000..afe595cfcf63
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2018 Intel Corporation */
> +
> +#ifndef _IGC_H_
> +#define _IGC_H_
> +
> +#include <linux/kobject.h>
> +
> +#include <linux/pci.h>
> +#include <linux/netdevice.h>
> +#include <linux/vmalloc.h>
> +
> +#include <linux/ethtool.h>
> +
> +#include <linux/sctp.h>
> +
> +#define IGC_ERR(args...) pr_err("igc: " args)
Looks like you're reimplementing pr_fmt()
> +#define PFX "igc: "
> +
> +#include <linux/timecounter.h>
> +#include <linux/net_tstamp.h>
> +#include <linux/ptp_clock_kernel.h>
Splitting the includes looks fairly weird. Also, you're not using any
of them here.
> +/* main */
> +extern char igc_driver_name[];
> +extern char igc_driver_version[];
> +
> +#endif /* _IGC_H_ */
> diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
> new file mode 100644
> index 000000000000..aa68b4516700
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/igc/igc_hw.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2018 Intel Corporation */
> +
> +#ifndef _IGC_HW_H_
> +#define _IGC_HW_H_
> +
> +#define IGC_DEV_ID_I225_LM 0x15F2
> +#define IGC_DEV_ID_I225_V 0x15F3
> +
> +#endif /* _IGC_HW_H_ */
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> new file mode 100644
> index 000000000000..753749ce5ae0
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Intel Corporation */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +
> +#include "igc.h"
> +#include "igc_hw.h"
> +
> +#define DRV_VERSION "0.0.1-k"
You can just use the kernel version, it works pretty well in presence
of backports.
> +#define DRV_SUMMARY "Intel(R) 2.5G Ethernet Linux Driver"
> +
> +MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
> +MODULE_DESCRIPTION(DRV_SUMMARY);
> +MODULE_LICENSE("GPL v2");
> +MODULE_VERSION(DRV_VERSION);
> +
> +char igc_driver_name[] = "igc";
> +char igc_driver_version[] = DRV_VERSION;
> +static const char igc_driver_string[] = DRV_SUMMARY;
> +static const char igc_copyright[] =
> + "Copyright(c) 2018 Intel Corporation.";
> +
> +static const struct pci_device_id igc_pci_tbl[] = {
> + { PCI_VDEVICE(INTEL, IGC_DEV_ID_I225_LM) },
> + { PCI_VDEVICE(INTEL, IGC_DEV_ID_I225_V) },
> + /* required last entry */
> + {0, }
> +};
> +
> +MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
> +
> +/**
> + * igc_probe - Device Initialization Routine
> + * @pdev: PCI device information struct
> + * @ent: entry in igc_pci_tbl
> + *
> + * Returns 0 on success, negative on failure
> + *
> + * igc_probe initializes an adapter identified by a pci_dev structure.
> + * The OS initialization, configuring the adapter private structure,
> + * and a hardware reset occur.
> + */
> +static int igc_probe(struct pci_dev *pdev,
> + const struct pci_device_id *ent)
> +{
> + int err, pci_using_dac;
> +
> + err = pci_enable_device_mem(pdev);
> + if (err)
> + return err;
> +
> + pci_using_dac = 0;
> + err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
> + if (!err) {
> + err = dma_set_coherent_mask(&pdev->dev,
> + DMA_BIT_MASK(64));
> + if (!err)
> + pci_using_dac = 1;
You never use this pci_using_dac. dma_set_mask_and_coherent() maybe?
> + } else {
> + err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
> + if (err) {
> + err = dma_set_coherent_mask(&pdev->dev,
> + DMA_BIT_MASK(32));
> + if (err) {
> + IGC_ERR("Wrong DMA configuration, aborting\n");
> + goto err_dma;
> + }
> + }
> + }
> +
> + err = pci_request_selected_regions(pdev,
> + pci_select_bars(pdev,
> + IORESOURCE_MEM),
> + igc_driver_name);
> + if (err)
> + goto err_pci_reg;
> +
> + pci_set_master(pdev);
> + err = pci_save_state(pdev);
And you never look at err?
> + return 0;
> +
> +err_pci_reg:
> +err_dma:
The error label should be named after what it points to, not where it's
coming from.
> + pci_disable_device(pdev);
> + return err;
> +}
> +
> +/**
> + * igc_remove - Device Removal Routine
> + * @pdev: PCI device information struct
> + *
> + * igc_remove is called by the PCI subsystem to alert the driver
> + * that it should release a PCI device. This could be caused by a
> + * Hot-Plug event, or because the driver is going to be removed from
> + * memory.
> + */
> +static void igc_remove(struct pci_dev *pdev)
> +{
> + pci_release_selected_regions(pdev,
> + pci_select_bars(pdev, IORESOURCE_MEM));
> +
> + pci_disable_device(pdev);
> +}
> +
> +static struct pci_driver igc_driver = {
> + .name = igc_driver_name,
> + .id_table = igc_pci_tbl,
> + .probe = igc_probe,
> + .remove = igc_remove,
> +};
> +
> +/**
> + * igc_init_module - Driver Registration Routine
> + *
> + * igc_init_module is the first routine called when the driver is
> + * loaded. All it does is register with the PCI subsystem.
> + */
> +static int __init igc_init_module(void)
> +{
> + int ret;
> +
> + pr_info("%s - version %s\n",
> + igc_driver_string, igc_driver_version);
> +
> + pr_info("%s\n", igc_copyright);
> +
> + ret = pci_register_driver(&igc_driver);
> + return ret;
Why the variable?
> +}
> +
> +module_init(igc_init_module);
> +
> +/**
> + * igc_exit_module - Driver Exit Cleanup Routine
> + *
> + * igc_exit_module is called just before the driver is removed
> + * from memory.
> + */
> +static void __exit igc_exit_module(void)
> +{
> + pci_unregister_driver(&igc_driver);
> +}
> +
> +module_exit(igc_exit_module);
> +/* igc_main.c */
I'd argue most editors make it fairly clear which file one is
editing, hence making this sort of comments entirely superfluous :)
^ permalink raw reply
* Re: [net-next 03/11] igc: Add netdev
From: Jakub Kicinski @ 2018-10-18 17:15 UTC (permalink / raw)
To: Jeff Kirsher; +Cc: davem, Sasha Neftin, netdev, nhorman, sassmann
In-Reply-To: <20181017222322.2171-4-jeffrey.t.kirsher@intel.com>
On Wed, 17 Oct 2018 15:23:14 -0700, Jeff Kirsher wrote:
> +/**
> + * igc_ioctl - I/O control method
> + * @netdev: network interface device structure
> + * @ifreq: frequency
Is it? :)
> + * @cmd: command
> + */
> +static int igc_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
> +{
> + switch (cmd) {
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
You don't seem to be adding anything to this function in the series.
Why add the stub?
^ permalink raw reply
* Re: [pull request][net-next 00/14] Mellanox, mlx5 updates
From: David Miller @ 2018-10-18 17:26 UTC (permalink / raw)
To: saeedm; +Cc: netdev
In-Reply-To: <20181018000859.16212-1-saeedm@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Wed, 17 Oct 2018 17:08:45 -0700
> This series from Paul, Or and Mark provides the support for
> e-switch tc offloading of multiple priorities and chains, the series
> is based on a merge commit with mlx5-next branch for patches that are
> already submitted and reviewed through the netdev and rdma mailing
> lists.
>
> For more information please see tag log below.
>
> Please pull and let me know if there is any problem.
Pulled, thanks Saeed.
^ permalink raw reply
* Re: [net-next 00/11][pull request] 1GbE Intel Wired LAN Driver Updates 2018-10-17
From: David Miller @ 2018-10-18 17:27 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, sasha.neftin
In-Reply-To: <20181017222322.2171-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 17 Oct 2018 15:23:11 -0700
> This series adds support for the new igc driver.
>
> The igc driver is the new client driver supporting the Intel I225
> Ethernet Controller, which supports 2.5GbE speeds. The reason for
> creating a new client driver, instead of adding support for the new
> device in e1000e, is that the silicon behaves more like devices
> supported in igb driver. It also did not make sense to add a client
> part, to the igb driver which supports only 1GbE server parts.
>
> This initial set of patches is designed for basic support (i.e. link and
> pass traffic). Follow-on patch series will add more advanced support
> like VLAN, Wake-on-LAN, etc..
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH bpf-next v3 7/7] selftests/bpf: add test cases for queue and stack maps
From: Mauricio Vasquez @ 2018-10-18 17:33 UTC (permalink / raw)
To: Song Liu; +Cc: Alexei Starovoitov, Daniel Borkmann, Networking
In-Reply-To: <CAPhsuW4dPb8goUE46G2P2f=kXdFwgfRQzPuOjtkO8qB2SErOhw@mail.gmail.com>
On 10/18/18 11:36 AM, Song Liu wrote:
> On Thu, Oct 18, 2018 at 6:16 AM Mauricio Vasquez B
> <mauricio.vasquez@polito.it> wrote:
>> test_maps:
>> Tests that queue/stack maps are behaving correctly even in corner cases
>>
>> test_progs:
>> Tests new ebpf helpers
>>
>> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
>> ---
>> tools/lib/bpf/bpf.c | 12 ++
>> tools/lib/bpf/bpf.h | 2
>> tools/testing/selftests/bpf/Makefile | 5 +
>> tools/testing/selftests/bpf/bpf_helpers.h | 7 +
>> tools/testing/selftests/bpf/test_maps.c | 122 ++++++++++++++++++++
>> tools/testing/selftests/bpf/test_progs.c | 99 ++++++++++++++++
>> tools/testing/selftests/bpf/test_queue_map.c | 4 +
>> tools/testing/selftests/bpf/test_queue_stack_map.h | 59 ++++++++++
>> tools/testing/selftests/bpf/test_stack_map.c | 4 +
>> 9 files changed, 313 insertions(+), 1 deletion(-)
>> create mode 100644 tools/testing/selftests/bpf/test_queue_map.c
>> create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h
>> create mode 100644 tools/testing/selftests/bpf/test_stack_map.c
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index d70a255cb05e..03f9bcc4ef50 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -278,6 +278,18 @@ int bpf_map_lookup_elem(int fd, const void *key, void *value)
>> return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
>> }
>>
>> +int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
>> +{
>> + union bpf_attr attr;
>> +
>> + bzero(&attr, sizeof(attr));
>> + attr.map_fd = fd;
>> + attr.key = ptr_to_u64(key);
>> + attr.value = ptr_to_u64(value);
>> +
>> + return sys_bpf(BPF_MAP_LOOKUP_AND_DELETE_ELEM, &attr, sizeof(attr));
>> +}
>> +
>> int bpf_map_delete_elem(int fd, const void *key)
>> {
>> union bpf_attr attr;
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index 258c3c178333..26a51538213c 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -99,6 +99,8 @@ LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
>> __u64 flags);
>>
>> LIBBPF_API int bpf_map_lookup_elem(int fd, const void *key, void *value);
>> +LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
>> + void *value);
>> LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
>> LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
>> LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>> index d99dd6fc3fbe..e39dfb4e7970 100644
>> --- a/tools/testing/selftests/bpf/Makefile
>> +++ b/tools/testing/selftests/bpf/Makefile
>> @@ -37,7 +37,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
>> test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
>> get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
>> test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
>> - test_sk_lookup_kern.o test_xdp_vlan.o
>> + test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
>>
>> # Order correspond to 'make run_tests' order
>> TEST_PROGS := test_kmod.sh \
>> @@ -118,6 +118,9 @@ CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
>> $(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline
>> $(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
>>
>> +$(OUTPUT)/test_queue_map.o: test_queue_stack_map.h
>> +$(OUTPUT)/test_stack_map.o: test_queue_stack_map.h
> This looks weird. You meant the .c files, right?
Queue and stack tests are very similar, in order to avoid a lot of code
duplication I created a single test_queue_stack_map.h file where all the
logic is placed.
There are two .c files (test_queue_map.c and test_stack_map.c) they
define the map type and include test_queue_stack_map.h.
test_queue_map.o and test_stack_map.o create an implicit dependency on
test_queue_map.c and test_stack_map.c, but the dependency on the header
is also needed, so I added those two lines.
>
>> +
>> BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
>> BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
>> BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
>> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
>> index fda8c162d0df..6407a3df0f3b 100644
>> --- a/tools/testing/selftests/bpf/bpf_helpers.h
>> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
>> @@ -16,6 +16,13 @@ static int (*bpf_map_update_elem)(void *map, void *key, void *value,
>> (void *) BPF_FUNC_map_update_elem;
>> static int (*bpf_map_delete_elem)(void *map, void *key) =
>> (void *) BPF_FUNC_map_delete_elem;
>> +static int (*bpf_map_push_elem)(void *map, void *value,
>> + unsigned long long flags) =
>> + (void *) BPF_FUNC_map_push_elem;
>> +static int (*bpf_map_pop_elem)(void *map, void *value) =
>> + (void *) BPF_FUNC_map_pop_elem;
>> +static int (*bpf_map_peek_elem)(void *map, void *value) =
>> + (void *) BPF_FUNC_map_peek_elem;
>> static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) =
>> (void *) BPF_FUNC_probe_read;
>> static unsigned long long (*bpf_ktime_get_ns)(void) =
>> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
>> index 9b552c0fc47d..4db2116e52be 100644
>> --- a/tools/testing/selftests/bpf/test_maps.c
>> +++ b/tools/testing/selftests/bpf/test_maps.c
>> @@ -15,6 +15,7 @@
>> #include <string.h>
>> #include <assert.h>
>> #include <stdlib.h>
>> +#include <time.h>
>>
>> #include <sys/wait.h>
>> #include <sys/socket.h>
>> @@ -471,6 +472,122 @@ static void test_devmap(int task, void *data)
>> close(fd);
>> }
>>
>> +static void test_queuemap(int task, void *data)
>> +{
>> + const int MAP_SIZE = 32;
>> + __u32 vals[MAP_SIZE + MAP_SIZE/2], val;
>> + int fd, i;
>> +
>> + /* Fill test values to be used */
>> + for (i = 0; i < MAP_SIZE + MAP_SIZE/2; i++)
>> + vals[i] = rand();
>> +
>> + /* Invalid key size */
>> + fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 4, sizeof(val), MAP_SIZE,
>> + map_flags);
>> + assert(fd < 0 && errno == EINVAL);
>> +
>> + fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(val), MAP_SIZE,
>> + map_flags);
>> + /* Queue map does not support BPF_F_NO_PREALLOC */
>> + if (map_flags & BPF_F_NO_PREALLOC) {
>> + assert(fd < 0 && errno == EINVAL);
>> + return;
>> + }
>> + if (fd < 0) {
>> + printf("Failed to create queuemap '%s'!\n", strerror(errno));
>> + exit(1);
>> + }
>> +
>> + /* Push MAP_SIZE elements */
>> + for (i = 0; i < MAP_SIZE; i++)
>> + assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
>> +
>> + /* Check that element cannot be pushed due to max_entries limit */
>> + assert(bpf_map_update_elem(fd, NULL, &val, 0) == -1 &&
>> + errno == E2BIG);
>> +
>> + /* Peek element */
>> + assert(bpf_map_lookup_elem(fd, NULL, &val) == 0 && val == vals[0]);
>> +
>> + /* Replace half elements */
>> + for (i = MAP_SIZE; i < MAP_SIZE + MAP_SIZE/2; i++)
>> + assert(bpf_map_update_elem(fd, NULL, &vals[i], BPF_EXIST) == 0);
>> +
>> + /* Pop all elements */
>> + for (i = MAP_SIZE/2; i < MAP_SIZE + MAP_SIZE/2; i++)
>> + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
>> + val == vals[i]);
>> +
>> + /* Check that there are not elements left */
>> + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == -1 &&
>> + errno == ENOENT);
>> +
>> + /* Check that non supported functions set errno to EINVAL */
>> + assert(bpf_map_delete_elem(fd, NULL) == -1 && errno == EINVAL);
>> + assert(bpf_map_get_next_key(fd, NULL, NULL) == -1 && errno == EINVAL);
>> +
>> + close(fd);
>> +}
>> +
>> +static void test_stackmap(int task, void *data)
>> +{
>> + const int MAP_SIZE = 32;
>> + __u32 vals[MAP_SIZE + MAP_SIZE/2], val;
>> + int fd, i;
>> +
>> + /* Fill test values to be used */
>> + for (i = 0; i < MAP_SIZE + MAP_SIZE/2; i++)
>> + vals[i] = rand();
>> +
>> + /* Invalid key size */
>> + fd = bpf_create_map(BPF_MAP_TYPE_STACK, 4, sizeof(val), MAP_SIZE,
>> + map_flags);
>> + assert(fd < 0 && errno == EINVAL);
>> +
>> + fd = bpf_create_map(BPF_MAP_TYPE_STACK, 0, sizeof(val), MAP_SIZE,
>> + map_flags);
>> + /* Stack map does not support BPF_F_NO_PREALLOC */
>> + if (map_flags & BPF_F_NO_PREALLOC) {
>> + assert(fd < 0 && errno == EINVAL);
>> + return;
>> + }
>> + if (fd < 0) {
>> + printf("Failed to create stackmap '%s'!\n", strerror(errno));
>> + exit(1);
>> + }
>> +
>> + /* Push MAP_SIZE elements */
>> + for (i = 0; i < MAP_SIZE; i++)
>> + assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
>> +
>> + /* Check that element cannot be pushed due to max_entries limit */
>> + assert(bpf_map_update_elem(fd, NULL, &val, 0) == -1 &&
>> + errno == E2BIG);
>> +
>> + /* Peek element */
>> + assert(bpf_map_lookup_elem(fd, NULL, &val) == 0 && val == vals[i - 1]);
>> +
>> + /* Replace half elements */
>> + for (i = MAP_SIZE; i < MAP_SIZE + MAP_SIZE/2; i++)
>> + assert(bpf_map_update_elem(fd, NULL, &vals[i], BPF_EXIST) == 0);
>> +
>> + /* Pop all elements */
>> + for (i = MAP_SIZE + MAP_SIZE/2 - 1; i >= MAP_SIZE/2; i--)
>> + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
>> + val == vals[i]);
>> +
>> + /* Check that there are not elements left */
>> + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == -1 &&
>> + errno == ENOENT);
>> +
>> + /* Check that non supported functions set errno to EINVAL */
>> + assert(bpf_map_delete_elem(fd, NULL) == -1 && errno == EINVAL);
>> + assert(bpf_map_get_next_key(fd, NULL, NULL) == -1 && errno == EINVAL);
>> +
>> + close(fd);
>> +}
>> +
>> #include <sys/socket.h>
>> #include <sys/ioctl.h>
>> #include <arpa/inet.h>
>> @@ -1434,10 +1551,15 @@ static void run_all_tests(void)
>> test_map_wronly();
>>
>> test_reuseport_array();
>> +
>> + test_queuemap(0, NULL);
>> + test_stackmap(0, NULL);
>> }
>>
>> int main(void)
>> {
>> + srand(time(NULL));
>> +
>> map_flags = 0;
>> run_all_tests();
>>
>> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
>> index e8becca9c521..2d3c04f45530 100644
>> --- a/tools/testing/selftests/bpf/test_progs.c
>> +++ b/tools/testing/selftests/bpf/test_progs.c
>> @@ -1735,8 +1735,105 @@ static void test_reference_tracking()
>> bpf_object__close(obj);
>> }
>>
>> +enum {
>> + QUEUE,
>> + STACK,
>> +};
>> +
>> +static void test_queue_stack_map(int type)
>> +{
>> + const int MAP_SIZE = 32;
>> + __u32 vals[MAP_SIZE], duration, retval, size, val;
>> + int i, err, prog_fd, map_in_fd, map_out_fd;
>> + char file[32], buf[128];
>> + struct bpf_object *obj;
>> + struct iphdr *iph = (void *)buf + sizeof(struct ethhdr);
>> +
>> + /* Fill test values to be used */
>> + for (i = 0; i < MAP_SIZE; i++)
>> + vals[i] = rand();
>> +
>> + if (type == QUEUE)
>> + strncpy(file, "./test_queue_map.o", sizeof(file));
>> + else if (type == STACK)
>> + strncpy(file, "./test_stack_map.o", sizeof(file));
>> + else
>> + return;
>> +
>> + err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
>> + if (err) {
>> + error_cnt++;
>> + return;
>> + }
>> +
>> + map_in_fd = bpf_find_map(__func__, obj, "map_in");
>> + if (map_in_fd < 0)
>> + goto out;
>> +
>> + map_out_fd = bpf_find_map(__func__, obj, "map_out");
>> + if (map_out_fd < 0)
>> + goto out;
>> +
>> + /* Push 32 elements to the input map */
>> + for (i = 0; i < MAP_SIZE; i++) {
>> + err = bpf_map_update_elem(map_in_fd, NULL, &vals[i], 0);
>> + if (err) {
>> + error_cnt++;
>> + goto out;
>> + }
>> + }
>> +
>> + /* The eBPF program pushes iph.saddr in the output map,
>> + * pops the input map and saves this value in iph.daddr
>> + */
>> + for (i = 0; i < MAP_SIZE; i++) {
>> + if (type == QUEUE) {
>> + val = vals[i];
>> + pkt_v4.iph.saddr = vals[i] * 5;
>> + } else if (type == STACK) {
>> + val = vals[MAP_SIZE - 1 - i];
>> + pkt_v4.iph.saddr = vals[MAP_SIZE - 1 - i] * 5;
>> + }
>> +
>> + err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
>> + buf, &size, &retval, &duration);
>> + if (err || retval || size != sizeof(pkt_v4) ||
>> + iph->daddr != val)
>> + break;
>> + }
>> +
>> + CHECK(err || retval || size != sizeof(pkt_v4) || iph->daddr != val,
>> + "bpf_map_pop_elem",
>> + "err %d errno %d retval %d size %d iph->daddr %u\n",
>> + err, errno, retval, size, iph->daddr);
>> +
>> + /* Queue is empty, program should return TC_ACT_SHOT */
>> + err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
>> + buf, &size, &retval, &duration);
>> + CHECK(err || retval != 2 /* TC_ACT_SHOT */|| size != sizeof(pkt_v4),
>> + "check-queue-stack-map-empty",
>> + "err %d errno %d retval %d size %d\n",
>> + err, errno, retval, size);
>> +
>> + /* Check that the program pushed elements correctly */
>> + for (i = 0; i < MAP_SIZE; i++) {
>> + err = bpf_map_lookup_and_delete_elem(map_out_fd, NULL, &val);
>> + if (err || val != vals[i] * 5)
>> + break;
>> + }
>> +
>> + CHECK(i != MAP_SIZE && (err || val != vals[i] * 5),
>> + "bpf_map_push_elem", "err %d value %u\n", err, val);
>> +
>> +out:
>> + pkt_v4.iph.saddr = 0;
>> + bpf_object__close(obj);
>> +}
>> +
>> int main(void)
>> {
>> + srand(time(NULL));
>> +
>> jit_enabled = is_jit_enabled();
>>
>> test_pkt_access();
>> @@ -1757,6 +1854,8 @@ int main(void)
>> test_task_fd_query_rawtp();
>> test_task_fd_query_tp();
>> test_reference_tracking();
>> + test_queue_stack_map(QUEUE);
>> + test_queue_stack_map(STACK);
>>
>> printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
>> return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
>> diff --git a/tools/testing/selftests/bpf/test_queue_map.c b/tools/testing/selftests/bpf/test_queue_map.c
>> new file mode 100644
>> index 000000000000..87db1f9da33d
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/test_queue_map.c
>> @@ -0,0 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2018 Politecnico di Torino
>> +#define MAP_TYPE BPF_MAP_TYPE_QUEUE
>> +#include "test_queue_stack_map.h"
>> diff --git a/tools/testing/selftests/bpf/test_queue_stack_map.h b/tools/testing/selftests/bpf/test_queue_stack_map.h
>> new file mode 100644
>> index 000000000000..295b9b3bc5c7
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/test_queue_stack_map.h
>> @@ -0,0 +1,59 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +// Copyright (c) 2018 Politecnico di Torino
>> +#include <stddef.h>
>> +#include <string.h>
>> +#include <linux/bpf.h>
>> +#include <linux/if_ether.h>
>> +#include <linux/ip.h>
>> +#include <linux/pkt_cls.h>
>> +#include "bpf_helpers.h"
>> +
>> +int _version SEC("version") = 1;
>> +
>> +struct bpf_map_def __attribute__ ((section("maps"), used)) map_in = {
>> + .type = MAP_TYPE,
>> + .key_size = 0,
>> + .value_size = sizeof(__u32),
>> + .max_entries = 32,
>> + .map_flags = 0,
>> +};
>> +
>> +struct bpf_map_def __attribute__ ((section("maps"), used)) map_out = {
>> + .type = MAP_TYPE,
>> + .key_size = 0,
>> + .value_size = sizeof(__u32),
>> + .max_entries = 32,
>> + .map_flags = 0,
>> +};
>> +
>> +SEC("test")
>> +int _test(struct __sk_buff *skb)
>> +{
>> + void *data_end = (void *)(long)skb->data_end;
>> + void *data = (void *)(long)skb->data;
>> + struct ethhdr *eth = (struct ethhdr *)(data);
>> + __u32 value;
>> + int err;
>> +
>> + if (eth + 1 > data_end)
>> + return TC_ACT_SHOT;
>> +
>> + struct iphdr *iph = (struct iphdr *)(eth + 1);
>> +
>> + if (iph + 1 > data_end)
>> + return TC_ACT_SHOT;
>> +
>> + err = bpf_map_pop_elem(&map_in, &value);
>> + if (err)
>> + return TC_ACT_SHOT;
>> +
>> + iph->daddr = value;
>> +
>> + err = bpf_map_push_elem(&map_out, &iph->saddr, 0);
>> + if (err)
>> + return TC_ACT_SHOT;
>> +
>> + return TC_ACT_OK;
>> +}
>> +
>> +char _license[] SEC("license") = "GPL";
>> diff --git a/tools/testing/selftests/bpf/test_stack_map.c b/tools/testing/selftests/bpf/test_stack_map.c
>> new file mode 100644
>> index 000000000000..31c3880e6da0
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/test_stack_map.c
>> @@ -0,0 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2018 Politecnico di Torino
>> +#define MAP_TYPE BPF_MAP_TYPE_STACK
>> +#include "test_queue_stack_map.h"
>>
^ permalink raw reply
* Re: [bpf-next v2 1/2] bpf: skmsg, fix psock create on existing kcm/tls port
From: Eric Dumazet @ 2018-10-18 17:34 UTC (permalink / raw)
To: John Fastabend, ast, daniel; +Cc: netdev
In-Reply-To: <1539840004-26433-2-git-send-email-john.fastabend@gmail.com>
On 10/17/2018 10:20 PM, John Fastabend wrote:
> Before using the psock returned by sk_psock_get() when adding it to a
> sockmap we need to ensure it is actually a sockmap based psock.
> Previously we were only checking this after incrementing the reference
> counter which was an error. This resulted in a slab-out-of-bounds
> error when the psock was not actually a sockmap type.
>
> This moves the check up so the reference counter is only used
> if it is a sockmap psock.
>
> Eric reported the following KASAN BUG,
>
> BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
> BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
> Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387
>
> CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
> print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
> kasan_report_error mm/kasan/report.c:354 [inline]
> kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
> check_memory_region_inline mm/kasan/kasan.c:260 [inline]
> check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
> kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
> atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
> refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
> sk_psock_get include/linux/skmsg.h:379 [inline]
> sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178
> sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669
> sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738
> map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> ---
> include/linux/skmsg.h | 25 ++++++++++++++++++++-----
> net/core/sock_map.c | 11 ++++++-----
> 2 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 677b673..f44ca6b 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -275,11 +275,6 @@ static inline struct sk_psock *sk_psock(const struct sock *sk)
> return rcu_dereference_sk_user_data(sk);
> }
>
> -static inline bool sk_has_psock(struct sock *sk)
> -{
> - return sk_psock(sk) != NULL && sk->sk_prot->recvmsg == tcp_bpf_recvmsg;
> -}
> -
> static inline void sk_psock_queue_msg(struct sk_psock *psock,
> struct sk_msg *msg)
> {
> @@ -379,6 +374,26 @@ static inline bool sk_psock_test_state(const struct sk_psock *psock,
> return test_bit(bit, &psock->state);
> }
>
> +static inline struct sk_psock *sk_psock_get_checked(struct sock *sk)
> +{
> + struct sk_psock *psock;
> +
> + rcu_read_lock();
> + psock = sk_psock(sk);
> + if (psock) {
> + if (sk->sk_prot->recvmsg != tcp_bpf_recvmsg) {
> + psock = ERR_PTR(-EBUSY);
> + goto out;
> + }
> +
> + if (!refcount_inc_not_zero(&psock->refcnt))
> + psock = NULL;
Caller is using IS_ERR(), so you probably want to :
psock = ERR_PTR(-E<something>);
> + }
> +out:
> + rcu_read_unlock();
> + return psock;
> +}
> +
> static inline struct sk_psock *sk_psock_get(struct sock *sk)
> {
> struct sk_psock *psock;
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 3c0e44c..be6092a 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -175,12 +175,13 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
> }
> }
>
> - psock = sk_psock_get(sk);
> + psock = sk_psock_get_checked(sk);
> + if (IS_ERR(psock)) {
> + ret = PTR_ERR(psock);
> + goto out_progs;
> + }
> +
> if (psock) {
> - if (!sk_has_psock(sk)) {
> - ret = -EBUSY;
> - goto out_progs;
> - }
> if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) ||
> (skb_progs && READ_ONCE(psock->progs.skb_parser))) {
> sk_psock_put(sk, psock);
>
^ permalink raw reply
* Images 34
From: Nancy @ 2018-10-18 10:00 UTC (permalink / raw)
To: netdev
We are an image team who can process 400+ images each day.
If you need any image editing service, please let us know.
Image cut out and clipping path, masking.
Such as for ecommerce photos, jewelry photos retouching, beauty and skin
images
and wedding photos.
We give test editing for your photos if you send us some.
Thanks,
Nancy
^ permalink raw reply
* RE: [PATCH net-next] netpoll: allow cleanup to be synchronous
From: Banerjee, Debabrata @ 2018-10-18 17:47 UTC (permalink / raw)
To: 'Neil Horman'; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <20181018165935.GC2601@hmswarspite.think-freely.org>
> From: Neil Horman <nhorman@tuxdriver.com>
> The team driver still seems
> to be an outlier there though I think , in that it doesn't guarantee the holding
> of rtnl in its port add/delete paths.
Not seeing where this is the case in the team driver today? They were actually
calling __netpoll_cleanup synchronously already.
> It might be worth cleaning that up and simply replacing all the calls to
> __netpoll_free_async with direct calls to __netpoll_cleanup. That would let
> you eliminate the former function alltogether.
I believe v2 accomplishes that, please review.
-Deb
^ permalink raw reply
* [PATCH v2 net] r8169: fix NAPI handling under high load
From: Heiner Kallweit @ 2018-10-18 17:56 UTC (permalink / raw)
To: David Miller, Realtek linux nic maintainers; +Cc: netdev@vger.kernel.org
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.
Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.
Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
v2:
- added "Fixes" tag
---
drivers/net/ethernet/realtek/r8169.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 8c4f49adc..7caf3b7e9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6528,17 +6528,15 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
struct net_device *dev = tp->dev;
u16 enable_mask = RTL_EVENT_NAPI | tp->event_slow;
- int work_done= 0;
+ int work_done;
u16 status;
status = rtl_get_events(tp);
rtl_ack_events(tp, status & ~tp->event_slow);
- if (status & RTL_EVENT_NAPI_RX)
- work_done = rtl_rx(dev, tp, (u32) budget);
+ work_done = rtl_rx(dev, tp, (u32) budget);
- if (status & RTL_EVENT_NAPI_TX)
- rtl_tx(dev, tp);
+ rtl_tx(dev, tp);
if (status & tp->event_slow) {
enable_mask &= ~tp->event_slow;
--
2.19.1
^ permalink raw reply related
* Re: [PATCH v2 net] r8169: fix NAPI handling under high load
From: Eric Dumazet @ 2018-10-18 18:02 UTC (permalink / raw)
To: Heiner Kallweit, David Miller, Realtek linux nic maintainers
Cc: netdev@vger.kernel.org
In-Reply-To: <8d75498e-6f43-a699-3a0a-fb4a2b5dc860@gmail.com>
On 10/18/2018 10:56 AM, Heiner Kallweit wrote:
> rtl_rx() and rtl_tx() are called only if the respective bits are set
> in the interrupt status register. Under high load NAPI may not be
> able to process all data (work_done == budget) and it will schedule
> subsequent calls to the poll callback.
> rtl_ack_events() however resets the bits in the interrupt status
> register, therefore subsequent calls to rtl8169_poll() won't call
> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
>
> Fix this by calling rtl_rx() and rtl_tx() independent of the bits
> set in the interrupt status register. Both functions will detect
> if there's nothing to do for them.
>
> Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
> v2:
> - added "Fixes" tag
> ---
> drivers/net/ethernet/realtek/r8169.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 8c4f49adc..7caf3b7e9 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6528,17 +6528,15 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
> struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
> struct net_device *dev = tp->dev;
> u16 enable_mask = RTL_EVENT_NAPI | tp->event_slow;
> - int work_done= 0;
> + int work_done;
> u16 status;
>
> status = rtl_get_events(tp);
> rtl_ack_events(tp, status & ~tp->event_slow);
>
> - if (status & RTL_EVENT_NAPI_RX)
> - work_done = rtl_rx(dev, tp, (u32) budget);
> + work_done = rtl_rx(dev, tp, (u32) budget);
>
> - if (status & RTL_EVENT_NAPI_TX)
> - rtl_tx(dev, tp);
> + rtl_tx(dev, tp);
>
> if (status & tp->event_slow) {
> enable_mask &= ~tp->event_slow;
>
One has to wonder why rtl8169_poll(), which might be called in a loop under DOS,
has to call rtl_ack_events() ?
Should not this be called one time from hard irq handler instead ?
^ permalink raw reply
* [PATCH v3 net-next 0/2] Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY
From: Yuiko Oshino @ 2018-10-18 19:06 UTC (permalink / raw)
To: davem, robh+dt, devicetree, f.fainelli, andrew
Cc: linux-kernel, mark.rutland, m.felsch, Markus.Niebel, netdev,
UNGLinuxDriver
This is the initial driver for Microchip KSZ9131 10/100/1000 Ethernet PHY
v3:
- KSZ9131 uses picosecond units for values of devicetree properties.
- rewrite micrel.c and micrel-ksz90x1.txt to use the picosecond values.
v2:
- Creating a series from two related patches.
Yuiko Oshino (2):
net: phy: micrel: add Microchip KSZ9131 initial driver
dt-bindings: net: add support for Microchip KSZ9131
.../devicetree/bindings/net/micrel-ksz90x1.txt | 28 ++++-
drivers/net/phy/micrel.c | 130 ++++++++++++++++++++-
include/linux/micrel_phy.h | 1 +
3 files changed, 157 insertions(+), 2 deletions(-)
--
2.7.4
^ permalink raw reply
* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Martin Lau @ 2018-10-18 18:12 UTC (permalink / raw)
To: Edward Cree
Cc: Yonghong Song, Alexei Starovoitov, daniel@iogearbox.net,
netdev@vger.kernel.org, Kernel Team
In-Reply-To: <45d2473b-76f6-4450-785e-bcc0a7e4ab5d@solarflare.com>
On Thu, Oct 18, 2018 at 05:47:11PM +0100, Edward Cree wrote:
> On 17/10/18 18:50, Martin Lau wrote:
> > On Wed, Oct 17, 2018 at 10:25:21AM -0700, Yonghong Song wrote:
> >> What you stated is true, BTF_KIND_FUNC_PROTO corresponds to
> >> dwarf subroutine tag which has no name while BTF_KIND_FUNC
> >> must have a valid name. The original design is to have both
> >> since they are corresponding to different dwarf constructs.
> >>
> >> Martin, what do you think?
> > I prefer to have separate kinds. We need a way to distinguish them.
> > For example, the BTF verifier is checking it. Having two kinds is
> > cleaner instead of resorting to other hints from 'struct btf_type'.
> > We don't lack of bits for kind.
> But my point is that (a) they can be distinguished by how they are
> used, and (b) the _only_ difference between them is how they are
> used. In this C code:
> int a = 5;
> int foo(int x) { return a; }
> int *b = &a;
> int (*bar)(int) = &foo;
> foo and *bar are _the same type_, just as a and *b are. It's just
> that C has a slightly odd way of writing
> int foo(int) = lambda x: a;
> and foo itself is implicitly sorta-const.
> What am I missing?
The BTF verification and bpf_prog_load() has to treat
them differently.
Are you suggesting they can be treated the same for
the kernel verification purpose?
or
The concern is, having two kind, BTF_KIND_FUNC_PROTO and
BTF_KIND_FUNC, is confusing because they have almost the
same BTF metadata?
>From the BTF perspective, they are different because
they are allowed to contain different information.
For example, only "foo" can have func_info in patch 5
(the to-be-added line_info can only belong to "foo" also).
This patch 2 is also doing checks on func_proto.
e.g. in btf_ptr_resolve().
The early idea is to have "foo_a", "foo_b", "foo_c"...
in a separate BTF section called "func" section (on top of
the current "type" and "string" section). By doing this,
only one BTF_KIND_FUNC would be needed as you suggested
because "foo" could be clearly distinguished from "bar" by
observing they belong to different BTF sections. The down
side is the "func" section will have its own id space separated
from the current "type" section id space and it is less ideal
for debugging purpose.
Hence, "foo" is contained in the "type" section also
and added FUNC_PROTO to distinguish them.
^ permalink raw reply
* Re: [PATCH v2 net] r8169: fix NAPI handling under high load
From: Heiner Kallweit @ 2018-10-18 18:17 UTC (permalink / raw)
To: Eric Dumazet, David Miller, Realtek linux nic maintainers
Cc: netdev@vger.kernel.org
In-Reply-To: <05eb4233-bf98-4aeb-73f6-94fec46ca3de@gmail.com>
On 18.10.2018 20:02, Eric Dumazet wrote:
>
>
> On 10/18/2018 10:56 AM, Heiner Kallweit wrote:
>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>> in the interrupt status register. Under high load NAPI may not be
>> able to process all data (work_done == budget) and it will schedule
>> subsequent calls to the poll callback.
>> rtl_ack_events() however resets the bits in the interrupt status
>> register, therefore subsequent calls to rtl8169_poll() won't call
>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
>>
>> Fix this by calling rtl_rx() and rtl_tx() independent of the bits
>> set in the interrupt status register. Both functions will detect
>> if there's nothing to do for them.
>>
>> Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>> v2:
>> - added "Fixes" tag
>> ---
>> drivers/net/ethernet/realtek/r8169.c | 8 +++-----
>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>> index 8c4f49adc..7caf3b7e9 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -6528,17 +6528,15 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
>> struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
>> struct net_device *dev = tp->dev;
>> u16 enable_mask = RTL_EVENT_NAPI | tp->event_slow;
>> - int work_done= 0;
>> + int work_done;
>> u16 status;
>>
>> status = rtl_get_events(tp);
>> rtl_ack_events(tp, status & ~tp->event_slow);
>>
>> - if (status & RTL_EVENT_NAPI_RX)
>> - work_done = rtl_rx(dev, tp, (u32) budget);
>> + work_done = rtl_rx(dev, tp, (u32) budget);
>>
>> - if (status & RTL_EVENT_NAPI_TX)
>> - rtl_tx(dev, tp);
>> + rtl_tx(dev, tp);
>>
>> if (status & tp->event_slow) {
>> enable_mask &= ~tp->event_slow;
>>
>
> One has to wonder why rtl8169_poll(), which might be called in a loop under DOS,
> has to call rtl_ack_events() ?
>
> Should not this be called one time from hard irq handler instead ?
>
Yes, it should. I have a patch in my tree including this, in addition it
moves everything from rtl_slow_event_work() to the hard irq handler.
However, due to recent changes, this patch may not apply cleanly to
older kernel versions.
Calling rtl_ack_events() in the poll callback isn't nice but not directly
a bug, so I'd prefer to submit this to net-next.
>
^ permalink raw reply
* Re: [PATCH] net/ipv4: fix tcp_poll for SMC fallback
From: David Miller @ 2018-10-18 18:22 UTC (permalink / raw)
To: kgraul; +Cc: netdev, ubraun, hch, linux-s390
In-Reply-To: <20181016144554.49660-1-kgraul@linux.ibm.com>
From: Karsten Graul <kgraul@linux.ibm.com>
Date: Tue, 16 Oct 2018 16:45:54 +0200
> Commit dd979b4df817 ("net: simplify sock_poll_wait") breaks tcp_poll for
> SMC fallback: An AF_SMC socket establishes an internal TCP socket for the
> CLC handshake with the remote peer. Whenever the SMC connection can not be
> established this CLC socket is used as a fallback. All socket operations on the
> SMC socket are then forwarded to the CLC socket. In case of poll, the
> file->private_data pointer references the SMC socket because the CLC socket has
> no file assigned. This causes tcp_poll to wait on the wrong socket.
>
> This patch fixes the issue by (re)introducing a sock_poll_wait variant with
> a socket parameter, and let tcp_poll use this variant.
>
> Fixes: dd979b4df817 ("net: simplify sock_poll_wait")
> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Generally speaking, if the invariant of filp->private_data == sock
does not actually hold true universally, I'd rather revert the
simplifications and add a big comment to sock_poll_wait() explaining
this.
Thank you.
^ permalink raw reply
* Re: [PATCH net-next 0/2] sctp: fix sk_wmem_queued and use it to check for writable space
From: David Miller @ 2018-10-18 18:24 UTC (permalink / raw)
To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <cover.1539716812.git.lucien.xin@gmail.com>
From: Xin Long <lucien.xin@gmail.com>
Date: Wed, 17 Oct 2018 03:07:49 +0800
> sctp doesn't count and use asoc sndbuf_used, sk sk_wmem_alloc and
> sk_wmem_queued properly, which also causes some problem.
>
> This patchset is to improve it.
Series applied, thanks.
^ permalink raw reply
* Re: [net-next PATCH] net: sched: cls_flower: Classify packets using port ranges
From: Nambiar, Amritha @ 2018-10-18 18:24 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, davem, jakub.kicinski, sridhar.samudrala, jhs,
xiyou.wangcong
In-Reply-To: <20181018121727.GA4558@nanopsycho.orion>
On 10/18/2018 5:17 AM, Jiri Pirko wrote:
> Fri, Oct 12, 2018 at 03:53:30PM CEST, amritha.nambiar@intel.com wrote:
>> Added support in tc flower for filtering based on port ranges.
>> This is a rework of the RFC patch at:
>> https://patchwork.ozlabs.org/patch/969595/
>>
>> Example:
>> 1. Match on a port range:
>> -------------------------
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>> prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>> action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0
>> filter protocol ip pref 1 flower chain 0 handle 0x1
>> eth_type ipv4
>> ip_proto tcp
>> dst_port_min 20
>> dst_port_max 30
>> skip_hw
>> not_in_hw
>> action order 1: gact action drop
>> random type none pass val 0
>> index 1 ref 1 bind 1 installed 181 sec used 5 sec
>> Action statistics:
>> Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>>
>> 2. Match on IP address and port range:
>> --------------------------------------
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>> prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>> skip_hw action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0 handle 0x2
>> eth_type ipv4
>> ip_proto tcp
>> dst_ip 192.168.1.1
>> dst_port_min 100
>> dst_port_max 200
>> skip_hw
>> not_in_hw
>> action order 1: gact action drop
>> random type none pass val 0
>> index 2 ref 1 bind 1 installed 28 sec used 6 sec
>> Action statistics:
>> Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>> include/uapi/linux/pkt_cls.h | 5 ++
>> net/sched/cls_flower.c | 134 ++++++++++++++++++++++++++++++++++++++++--
>> 2 files changed, 132 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b569308 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>> TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>> TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>
>> + TCA_FLOWER_KEY_PORT_SRC_MIN, /* be16 */
>> + TCA_FLOWER_KEY_PORT_SRC_MAX, /* be16 */
>> + TCA_FLOWER_KEY_PORT_DST_MIN, /* be16 */
>> + TCA_FLOWER_KEY_PORT_DST_MAX, /* be16 */
>> +
>> TCA_FLOWER_FLAGS,
>> TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>> TCA_FLOWER_KEY_VLAN_PRIO, /* u8 */
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 9aada2d..5f135f0 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -55,6 +55,9 @@ struct fl_flow_key {
>> struct flow_dissector_key_ip ip;
>> struct flow_dissector_key_ip enc_ip;
>> struct flow_dissector_key_enc_opts enc_opts;
>> +
>> + struct flow_dissector_key_ports tp_min;
>> + struct flow_dissector_key_ports tp_max;
>> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
>>
>> struct fl_flow_mask_range {
>> @@ -103,6 +106,11 @@ struct cls_fl_filter {
>> struct net_device *hw_dev;
>> };
>>
>> +enum fl_endpoint {
>> + FLOWER_ENDPOINT_DST,
>> + FLOWER_ENDPOINT_SRC
>> +};
>> +
>> static const struct rhashtable_params mask_ht_params = {
>> .key_offset = offsetof(struct fl_flow_mask, key),
>> .key_len = sizeof(struct fl_flow_key),
>> @@ -179,11 +187,86 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
>> memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
>> }
>>
>> +static int fl_range_compare_params(struct cls_fl_filter *filter,
>> + struct fl_flow_key *key,
>> + struct fl_flow_key *mkey,
>> + enum fl_endpoint endpoint)
>> +{
>> + __be16 min_mask, max_mask, min_val, max_val;
>> +
>> + if (endpoint == FLOWER_ENDPOINT_DST) {
>> + min_mask = htons(filter->mask->key.tp_min.dst);
>> + max_mask = htons(filter->mask->key.tp_max.dst);
>> + min_val = htons(filter->key.tp_min.dst);
>> + max_val = htons(filter->key.tp_max.dst);
>> +
>> + if (min_mask && max_mask) {
>> + if (htons(key->tp.dst) < min_val ||
>> + htons(key->tp.dst) > max_val)
>> + return -1;
>> +
>> + /* skb does not have min and max values */
>> + mkey->tp_min.dst = filter->mkey.tp_min.dst;
>> + mkey->tp_max.dst = filter->mkey.tp_max.dst;
>> + }
>> + } else {
>> + min_mask = htons(filter->mask->key.tp_min.src);
>> + max_mask = htons(filter->mask->key.tp_max.src);
>> + min_val = htons(filter->key.tp_min.src);
>> + max_val = htons(filter->key.tp_max.src);
>> +
>> + if (min_mask && max_mask) {
>> + if (htons(key->tp.src) < min_val ||
>> + htons(key->tp.src) > max_val)
>> + return -1;
>> +
>> + /* skb does not have min and max values */
>> + mkey->tp_min.src = filter->mkey.tp_min.src;
>> + mkey->tp_max.src = filter->mkey.tp_max.src;
>> + }
>
> You basically have 2 functions in 1 here. Just have 2 functions:
> fl_port_range_dst_cmp()
> and
> fl_port_range_src_cmp()
>
> And avoid the "endpoint enum.
> Also, as you return -1 or 0, just make it bool.
>
Makes sense. Will do.
>
>> + }
>> + return 0;
>> +}
>> +
>> +static struct cls_fl_filter *fl_lookup_range(struct fl_flow_mask *mask,
>> + struct fl_flow_key *mkey,
>> + struct fl_flow_key *key)
>> +{
>> + struct cls_fl_filter *filter, *f;
>> + int ret;
>> +
>> + list_for_each_entry_rcu(filter, &mask->filters, list) {
>> + ret = fl_range_compare_params(filter, key, mkey,
>> + FLOWER_ENDPOINT_DST);
>> + if (ret < 0)
>> + continue;
>> +
>> + ret = fl_range_compare_params(filter, key, mkey,
>> + FLOWER_ENDPOINT_SRC);
>> + if (ret < 0)
>> + continue;
>> +
>> + f = rhashtable_lookup_fast(&mask->ht,
>> + fl_key_get_start(mkey, mask),
>> + mask->filter_ht_params);
>> + if (f)
>> + return f;
>> + }
>> + return NULL;
>> +}
>> +
>> static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
>> - struct fl_flow_key *mkey)
>> + struct fl_flow_key *mkey,
>> + struct fl_flow_key *key, bool is_skb)
>> {
>> - return rhashtable_lookup_fast(&mask->ht, fl_key_get_start(mkey, mask),
>> - mask->filter_ht_params);
>> + if ((!(mask->key.tp_min.dst && mask->key.tp_max.dst) &&
>> + !(mask->key.tp_min.src && mask->key.tp_max.src)) || !is_skb) {
>
> Would be probably good to have a dedicated bit to check for and decide
> if you do normal/range lookup. This is fast path.
>
Will fix in v2.
>
>> + return rhashtable_lookup_fast(&mask->ht,
>
> Remove double space ^^
>
Will fix in v2.
>
>> + fl_key_get_start(mkey, mask),
>> + mask->filter_ht_params);
>> + }
>> + /* Classify based on range */
>> + return fl_lookup_range(mask, mkey, key);
>> }
>>
>> static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>> @@ -207,8 +290,8 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>> skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
>>
>> fl_set_masked_key(&skb_mkey, &skb_key, mask);
>> + f = fl_lookup(mask, &skb_mkey, &skb_key, true);
>>
>> - f = fl_lookup(mask, &skb_mkey);
>> if (f && !tc_skip_sw(f->flags)) {
>> *res = f->res;
>> return tcf_exts_exec(skb, &f->exts, res);
>> @@ -909,6 +992,23 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
>> sizeof(key->arp.tha));
>> }
>>
>> + if (key->basic.ip_proto == IPPROTO_TCP ||
>> + key->basic.ip_proto == IPPROTO_UDP ||
>> + key->basic.ip_proto == IPPROTO_SCTP) {
>> + fl_set_key_val(tb, &key->tp_min.dst,
>> + TCA_FLOWER_KEY_PORT_DST_MIN, &mask->tp_min.dst,
>> + TCA_FLOWER_UNSPEC, sizeof(key->tp_min.dst));
>> + fl_set_key_val(tb, &key->tp_max.dst,
>> + TCA_FLOWER_KEY_PORT_DST_MAX, &mask->tp_max.dst,
>> + TCA_FLOWER_UNSPEC, sizeof(key->tp_max.dst));
>> + fl_set_key_val(tb, &key->tp_min.src,
>> + TCA_FLOWER_KEY_PORT_SRC_MIN, &mask->tp_min.src,
>> + TCA_FLOWER_UNSPEC, sizeof(key->tp_min.src));
>> + fl_set_key_val(tb, &key->tp_max.src,
>> + TCA_FLOWER_KEY_PORT_SRC_MAX, &mask->tp_max.src,
>> + TCA_FLOWER_UNSPEC, sizeof(key->tp_max.src));
>> + }
>> +
>> if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
>> tb[TCA_FLOWER_KEY_ENC_IPV4_DST]) {
>> key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
>> @@ -1026,8 +1126,7 @@ static void fl_init_dissector(struct flow_dissector *dissector,
>> FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
>> FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>> FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
>> - FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>> - FLOW_DISSECTOR_KEY_PORTS, tp);
>> + FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);
>> FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>> FLOW_DISSECTOR_KEY_IP, ip);
>> FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>> @@ -1227,7 +1326,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>> goto errout_idr;
>>
>> if (!tc_skip_sw(fnew->flags)) {
>> - if (!fold && fl_lookup(fnew->mask, &fnew->mkey)) {
>> + if (!fold && fl_lookup(fnew->mask, &fnew->mkey, NULL, false)) {
>
>
> I don't undestand why do you need the "is_skb" arg here. Could you
> please explain?
>
> Thanks!
>
The reason to keep the 'is_skb' arg is because, fl_lookup is called in
two cases, one for skb classification and another for checking if a
filter exists every-time a new filter is added. In case of skb
classification, we need to go through the range-comparator to decide if
the skb's port-value falls within the range-filter's min and max limits.
In case of filter validation, the range-filter that we are trying to add
will have min and max values, and we are validating it against other
range-filters with min and max values. So, rhashtable lookup will
suffice here and there is no need to go through the range-comparator in
this case. In the above code, we are validating if a range-filter
exists, so 'is_skb' is false.
>
>> err = -EEXIST;
>> goto errout_mask;
>> }
>> @@ -1800,6 +1899,27 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net,
>> sizeof(key->arp.tha))))
>> goto nla_put_failure;
>>
>> + if ((key->basic.ip_proto == IPPROTO_TCP ||
>> + key->basic.ip_proto == IPPROTO_UDP ||
>> + key->basic.ip_proto == IPPROTO_SCTP) &&
>> + (fl_dump_key_val(skb, &key->tp_min.dst,
>> + TCA_FLOWER_KEY_PORT_DST_MIN,
>> + &mask->tp_min.dst, TCA_FLOWER_UNSPEC,
>> + sizeof(key->tp_min.dst)) ||
>> + fl_dump_key_val(skb, &key->tp_max.dst,
>> + TCA_FLOWER_KEY_PORT_DST_MAX,
>> + &mask->tp_max.dst, TCA_FLOWER_UNSPEC,
>> + sizeof(key->tp_max.dst)) ||
>> + fl_dump_key_val(skb, &key->tp_min.src,
>> + TCA_FLOWER_KEY_PORT_SRC_MIN,
>> + &mask->tp_min.src, TCA_FLOWER_UNSPEC,
>> + sizeof(key->tp_min.src)) ||
>> + fl_dump_key_val(skb, &key->tp_max.src,
>> + TCA_FLOWER_KEY_PORT_SRC_MAX,
>> + &mask->tp_max.src, TCA_FLOWER_UNSPEC,
>> + sizeof(key->tp_max.src))))
>> + goto nla_put_failure;
>> +
>> if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
>> (fl_dump_key_val(skb, &key->enc_ipv4.src,
>> TCA_FLOWER_KEY_ENC_IPV4_SRC, &mask->enc_ipv4.src,
>>
^ permalink raw reply
* Re: [iproute2 PATCH] tc: flower: Classify packets based port ranges
From: Nambiar, Amritha @ 2018-10-18 18:25 UTC (permalink / raw)
To: Jiri Pirko
Cc: stephen, netdev, jakub.kicinski, sridhar.samudrala, jhs,
xiyou.wangcong
In-Reply-To: <20181018122136.GB4558@nanopsycho.orion>
On 10/18/2018 5:21 AM, Jiri Pirko wrote:
> Fri, Oct 12, 2018 at 03:54:42PM CEST, amritha.nambiar@intel.com wrote:
>
> [...]
>
>> @@ -1516,6 +1625,22 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
>> if (nl_type >= 0)
>> flower_print_port("src_port", tb[nl_type]);
>>
>> + if (flower_port_range_attr_type(ip_proto, FLOWER_ENDPOINT_DST, &range)
>> + == 0) {
>> + flower_print_port_range("dst_port_min",
>> + tb[range.min_port_type]);
>> + flower_print_port_range("dst_port_max",
>> + tb[range.max_port_type]);
>
> The input and output of iproute2 utils, tc included should be in sync.
> So you need to print "range x-y" here.
>
Agree, will fix in v2. Thanks!
>
>> + }
>> +
>> + if (flower_port_range_attr_type(ip_proto, FLOWER_ENDPOINT_SRC, &range)
>> + == 0) {
>> + flower_print_port_range("src_port_min",
>> + tb[range.min_port_type]);
>> + flower_print_port_range("src_port_max",
>> + tb[range.max_port_type]);
>> + }
>> +
>> flower_print_tcp_flags("tcp_flags", tb[TCA_FLOWER_KEY_TCP_FLAGS],
>> tb[TCA_FLOWER_KEY_TCP_FLAGS_MASK]);
>>
>>
^ permalink raw reply
* Issues in error queue polling
From: Ricardo Biehl Pasquali @ 2018-10-18 18:26 UTC (permalink / raw)
To: netdev; +Cc: jacob.e.keller, vinicius.gomes, davem
The commit 7d4c04fc170087119727 ("net: add option to enable
error queue packets waking select") (2013-03-28) introduced
SO_SELECT_ERR_QUEUE, which masks POLLPRI with POLLERR event
return in some socket poll callbacks.
POLLERR event issued with sock_queue_err_skb() did not wake
up a poll when POLLERR is the only requested event because
sk_data_ready() (sock_def_readable()) was used and it
doesn't mask POLLERR in poll wake up:
wake_up_interruptible_sync_poll(&wq->wait,
EPOLLIN | EPOLLPRI |
EPOLLRDNORM | EPOLLRDBAND);
If POLLIN or POLLPRI are requested, for example, poll does
wake up.
POLLERR wakeup by requesting POLLPRI is possible without
set SO_SELECT_ERR_QUEUE. All the option does is masking
POLLPRI as a returned event before poll returns. poll
would return anyway because of POLLERR.
Also, the sentence "[...] enable software to wait on error
queue packets without waking up for regular data on the
socket." from the above commit is not true.
A POLLIN event issued via sock_def_readable() wakes up
threads waiting for POLLPRI, and vice versa. However,
poll() does not return, sleeping again, as the requested
events do not match events.
The commit 6e5d58fdc9bedd0255a8 ("skbuff: Fix not waking
applications when errors are enqueued") (2018-03-14) make
POLLERR alone wake up poll. It replaces sk_data_ready()
(sock_def_readable()) with sk_error_report()
(sock_def_error_report()). This makes "POLLERR wake up by
requesting POLLPRI" obsolete.
Rationale:
POLLIN-only and POLLERR-only wake up are useful when there
is a receiving thread, a sending thread, and a thread that
get transmit timestamps. The thread polling on POLLERR will
not wake up when regular data arrives (POLLIN). The thread
polling on POLLIN will not wake up when tx timestamps are
ready (POLLERR).
One solution is adding an option that disable POLLERR as
requested event. This is in the Virtual File System
subsystem, not in the network, though.
This solves the problem of waking up other threads that
not interested in error queue. Thus allowing a separate
thread take care of error queue (useful for receiving
transmit timestamps).
However, this may not be a good solution as it depends on
polling behavior. Thoughts?
By the way, as the kernel development shouldn't break user
space, SO_SELECT_ERR_QUEUE can become a compatibility
option.
P.S.: The kernel is slowly getting bigger (and perhaps
messy) with compatibility code. One day, the
compatibility code should be moved to a
compatibility-only place, or completely dropped
(unlikely).
pasquali
^ permalink raw reply
* Re: [PATCH v2 net] r8169: fix NAPI handling under high load
From: David Miller @ 2018-10-18 18:34 UTC (permalink / raw)
To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <8d75498e-6f43-a699-3a0a-fb4a2b5dc860@gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu, 18 Oct 2018 19:56:01 +0200
> rtl_rx() and rtl_tx() are called only if the respective bits are set
> in the interrupt status register. Under high load NAPI may not be
> able to process all data (work_done == budget) and it will schedule
> subsequent calls to the poll callback.
> rtl_ack_events() however resets the bits in the interrupt status
> register, therefore subsequent calls to rtl8169_poll() won't call
> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
>
> Fix this by calling rtl_rx() and rtl_tx() independent of the bits
> set in the interrupt status register. Both functions will detect
> if there's nothing to do for them.
>
> Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
> v2:
> - added "Fixes" tag
Applied and queued up for -stable, thanks!
And I agree that moving the IRQ ACKing to the hw irq handler is
net-next material.
^ permalink raw reply
* [PATCH bpf-next] tools: bpftool: use 4 context mode for the NFP disasm
From: Jakub Kicinski @ 2018-10-18 18:34 UTC (permalink / raw)
To: alexei.starovoitov, daniel; +Cc: netdev, oss-drivers, Jakub Kicinski
The nfp driver is currently always JITing the BPF for 4 context/thread
mode of the NFP flow processors. Tell this to the disassembler,
otherwise some registers may be incorrectly decoded.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
---
tools/bpf/bpftool/common.c | 5 ++++-
tools/bpf/bpftool/jit_disasm.c | 4 +++-
tools/bpf/bpftool/main.h | 6 ++++--
tools/bpf/bpftool/prog.c | 14 +++++++++-----
4 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 3318da8060bd..25af85304ebe 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -554,7 +554,9 @@ static int read_sysfs_netdev_hex_int(char *devname, const char *entry_name)
return read_sysfs_hex_int(full_path);
}
-const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino)
+const char *
+ifindex_to_bfd_params(__u32 ifindex, __u64 ns_dev, __u64 ns_ino,
+ const char **opt)
{
char devname[IF_NAMESIZE];
int vendor_id;
@@ -579,6 +581,7 @@ const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino)
device_id != 0x6000 &&
device_id != 0x6003)
p_info("Unknown NFP device ID, assuming it is NFP-6xxx arch");
+ *opt = "ctx4";
return "NFP-6xxx";
default:
p_err("Can't get bfd arch name for device vendor id 0x%04x",
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 87439320ef70..c75ffd9ce2bb 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -77,7 +77,7 @@ static int fprintf_json(void *out, const char *fmt, ...)
}
void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
- const char *arch)
+ const char *arch, const char *disassembler_options)
{
disassembler_ftype disassemble;
struct disassemble_info info;
@@ -116,6 +116,8 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
info.arch = bfd_get_arch(bfdf);
info.mach = bfd_get_mach(bfdf);
+ if (disassembler_options)
+ info.disassembler_options = disassembler_options;
info.buffer = image;
info.buffer_length = len;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 28ee769bd11b..28322ace2856 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -145,13 +145,15 @@ int map_parse_fd(int *argc, char ***argv);
int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len);
void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
- const char *arch);
+ const char *arch, const char *disassembler_options);
void print_data_json(uint8_t *data, size_t len);
void print_hex_data_json(uint8_t *data, size_t len);
unsigned int get_page_size(void);
unsigned int get_possible_cpus(void);
-const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino);
+const char *
+ifindex_to_bfd_params(__u32 ifindex, __u64 ns_dev, __u64 ns_ino,
+ const char **opt);
struct btf_dumper {
const struct btf *btf;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 335028968dfb..5302ee282409 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -449,6 +449,7 @@ static int do_dump(int argc, char **argv)
unsigned long *func_ksyms = NULL;
struct bpf_prog_info info = {};
unsigned int *func_lens = NULL;
+ const char *disasm_opt = NULL;
unsigned int nr_func_ksyms;
unsigned int nr_func_lens;
struct dump_data dd = {};
@@ -607,9 +608,10 @@ static int do_dump(int argc, char **argv)
const char *name = NULL;
if (info.ifindex) {
- name = ifindex_to_bfd_name_ns(info.ifindex,
- info.netns_dev,
- info.netns_ino);
+ name = ifindex_to_bfd_params(info.ifindex,
+ info.netns_dev,
+ info.netns_ino,
+ &disasm_opt);
if (!name)
goto err_free;
}
@@ -651,7 +653,8 @@ static int do_dump(int argc, char **argv)
printf("%s:\n", sym_name);
}
- disasm_print_insn(img, lens[i], opcodes, name);
+ disasm_print_insn(img, lens[i], opcodes, name,
+ disasm_opt);
img += lens[i];
if (json_output)
@@ -663,7 +666,8 @@ static int do_dump(int argc, char **argv)
if (json_output)
jsonw_end_array(json_wtr);
} else {
- disasm_print_insn(buf, *member_len, opcodes, name);
+ disasm_print_insn(buf, *member_len, opcodes, name,
+ disasm_opt);
}
} else if (visual) {
if (json_output)
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] net: ethernet: fec: Add missing SPEED_
From: LABBE Corentin @ 2018-10-18 18:47 UTC (permalink / raw)
To: Florian Fainelli; +Cc: andrew, davem, fugang.duan, linux-kernel, netdev
In-Reply-To: <2621cbc9-47ed-ce2a-b7ee-262f17dc138f@gmail.com>
On Thu, Oct 18, 2018 at 11:39:24AM -0700, Florian Fainelli wrote:
> On 10/18/2018 08:05 AM, Corentin Labbe wrote:
> > Since commit 58056c1e1b0e ("net: ethernet: Use phy_set_max_speed() to limit advertised speed"), the fec driver is unable to get any link.
> > This is due to missing SPEED_.
>
> But SPEED_1000 is defined in include/uapi/linux/ethtool.h as 1000, so
> surely this would amount to the same code paths being taken or am I
> missing something here?
The bisect session pointed your patch, reverting it fix the issue.
BUT since the fix seemed trivial I sent the patch without more test then compile it.
Sorry, I have just found some minutes ago that it didnt fix the issue.
But your patch is still the cause for sure.
^ permalink raw reply
* [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Nitin Hande @ 2018-10-18 18:50 UTC (permalink / raw)
To: daniel; +Cc: netdev, ast, joe, brouer, john.fastabend
This patch proposes to extend the sk_lookup() BPF API to the
XDP hookpoint. The sk_lookup() helper supports a lookup
on incoming packet to find the corresponding socket that will
receive this packet. Current support for this BPF API is
at the tc hookpoint. This patch will extend this API at XDP
hookpoint. A XDP program can map the incoming packet to the
5-tuple parameter and invoke the API to find the corresponding
socket structure.
Open Issue
* The underlying code relies on presence of an skb to find out the
right sk for the case of REUSEPORT socket option. Since there is
no skb available at XDP hookpoint, the helper function will return
the first available sk based off the 5 tuple hash. If the desire
is to return a particular sk matching reuseport_cb function, please
suggest way to tackle it, which can be addressed in a future commit.
Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
---
net/core/filter.c | 92 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 78 insertions(+), 14 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..85862e41e291 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4694,14 +4694,11 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
#ifdef CONFIG_INET
static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
- struct sk_buff *skb, u8 family, u8 proto)
+ struct sk_buff *skb, int dif, u8 family,
+ u8 proto)
{
bool refcounted = false;
struct sock *sk = NULL;
- int dif = 0;
-
- if (skb->dev)
- dif = skb->dev->ifindex;
if (family == AF_INET) {
__be32 src4 = tuple->ipv4.saddr;
@@ -4751,10 +4748,10 @@ static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
* callers to satisfy BPF_CALL declarations.
*/
static unsigned long
-bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
- u8 proto, u64 netns_id, u64 flags)
+__bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+ struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
+ u64 flags)
{
- struct net *caller_net;
struct sock *sk = NULL;
u8 family = AF_UNSPEC;
struct net *net;
@@ -4763,19 +4760,15 @@ bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags))
goto out;
- if (skb->dev)
- caller_net = dev_net(skb->dev);
- else
- caller_net = sock_net(skb->sk);
if (netns_id) {
net = get_net_ns_by_id(caller_net, netns_id);
if (unlikely(!net))
goto out;
- sk = sk_lookup(net, tuple, skb, family, proto);
+ sk = sk_lookup(net, tuple, skb, ifindex, family, proto);
put_net(net);
} else {
net = caller_net;
- sk = sk_lookup(net, tuple, skb, family, proto);
+ sk = sk_lookup(net, tuple, skb, ifindex, family, proto);
}
if (sk)
@@ -4784,6 +4777,25 @@ bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
return (unsigned long) sk;
}
+static unsigned long
+bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+ u8 proto, u64 netns_id, u64 flags)
+{
+ struct net *caller_net;
+ int ifindex;
+
+ if (skb->dev) {
+ caller_net = dev_net(skb->dev);
+ ifindex = skb->dev->ifindex;
+ } else {
+ caller_net = sock_net(skb->sk);
+ ifindex = 0;
+ }
+
+ return __bpf_sk_lookup(skb, tuple, len, caller_net, ifindex,
+ proto, netns_id, flags);
+}
+
BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
@@ -4833,6 +4845,50 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_SOCKET,
};
+
+BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
+ struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
+{
+ struct net *caller_net = dev_net(ctx->rxq->dev);
+ int ifindex = ctx->rxq->dev->ifindex;
+
+ return __bpf_sk_lookup(NULL, tuple, len, caller_net, ifindex,
+ IPPROTO_UDP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_xdp_sk_lookup_udp_proto = {
+ .func = bpf_xdp_sk_lookup_udp,
+ .gpl_only = false,
+ .pkt_access = true,
+ .ret_type = RET_PTR_TO_SOCKET_OR_NULL,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+ .arg5_type = ARG_ANYTHING,
+};
+
+BPF_CALL_5(bpf_xdp_sk_lookup_tcp, struct xdp_buff *, ctx,
+ struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
+{
+ struct net *caller_net = dev_net(ctx->rxq->dev);
+ int ifindex = ctx->rxq->dev->ifindex;
+
+ return __bpf_sk_lookup(NULL, tuple, len, caller_net, ifindex,
+ IPPROTO_TCP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_xdp_sk_lookup_tcp_proto = {
+ .func = bpf_xdp_sk_lookup_tcp,
+ .gpl_only = false,
+ .pkt_access = true,
+ .ret_type = RET_PTR_TO_SOCKET_OR_NULL,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+ .arg5_type = ARG_ANYTHING,
+};
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func)
@@ -5076,6 +5132,14 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_xdp_adjust_tail_proto;
case BPF_FUNC_fib_lookup:
return &bpf_xdp_fib_lookup_proto;
+#ifdef CONFIG_INET
+ case BPF_FUNC_sk_lookup_udp:
+ return &bpf_xdp_sk_lookup_udp_proto;
+ case BPF_FUNC_sk_lookup_tcp:
+ return &bpf_xdp_sk_lookup_tcp_proto;
+ case BPF_FUNC_sk_release:
+ return &bpf_sk_release_proto;
+#endif
default:
return bpf_base_func_proto(func_id);
}
--
2.19.1
^ permalink raw reply related
* Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
From: Daniel Borkmann @ 2018-10-18 19:00 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Peter Zijlstra, paulmck, will.deacon, acme, yhs, john.fastabend,
netdev
In-Reply-To: <20181018153307.ayvmq6du3gnsyvro@ast-mbp.dhcp.thefacebook.com>
On 10/18/2018 05:33 PM, Alexei Starovoitov wrote:
> On Thu, Oct 18, 2018 at 05:04:34PM +0200, Daniel Borkmann wrote:
>> #endif /* _TOOLS_LINUX_ASM_IA64_BARRIER_H */
>> diff --git a/tools/arch/powerpc/include/asm/barrier.h b/tools/arch/powerpc/include/asm/barrier.h
>> index a634da0..905a2c6 100644
>> --- a/tools/arch/powerpc/include/asm/barrier.h
>> +++ b/tools/arch/powerpc/include/asm/barrier.h
>> @@ -27,4 +27,20 @@
>> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
>>
>> +#if defined(__powerpc64__)
>> +#define smp_lwsync() __asm__ __volatile__ ("lwsync" : : : "memory")
>> +
>> +#define smp_store_release(p, v) \
>> +do { \
>> + smp_lwsync(); \
>> + WRITE_ONCE(*p, v); \
>> +} while (0)
>> +
>> +#define smp_load_acquire(p) \
>> +({ \
>> + typeof(*p) ___p1 = READ_ONCE(*p); \
>> + smp_lwsync(); \
>> + ___p1; \
>
> I don't like this proliferation of asm.
> Why do we think that we can do better job than compiler?
> can we please use gcc builtins instead?
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
> are done specifically for this use case if I'm not mistaken.
> I think it pays to learn what compiler provides.
But are you sure the C11 memory model matches exact same model as kernel?
Seems like last time Will looked into it [0] it wasn't the case ...
The above was pulled in and slightly adapted from kernel side of arch
asm barriers. Hm, it would probably be safest if an arch decides to adapt
C11 barriers first from kernel side and user space could then use the
exact same matching builtin functions for scenarios like these as well.
[0] https://lore.kernel.org/lkml/20170308174300.GL20400@arm.com/
^ permalink raw reply
* Re: [PATCH linux-firmware] linux-firmware: liquidio: fix GPL compliance issue
From: Josh Boyer @ 2018-10-18 19:04 UTC (permalink / raw)
To: John W. Linville
Cc: Felix Manlunas, Linux Firmware, netdev, Florian Weimer,
Manish.Awasthi, Manojkumar.Panicker, Faisal.Masood,
Raghu.Vatsavayi, Derek.Chickles, Satananda.Burla
In-Reply-To: <20181017201149.GC1381@tuxdriver.com>
On Wed, Oct 17, 2018 at 5:09 PM John W. Linville <linville@tuxdriver.com> wrote:
>
> On Wed, Oct 17, 2018 at 07:34:42PM +0000, Manlunas, Felix wrote:
> > On Fri, Sep 28, 2018 at 04:50:51PM -0700, Felix Manlunas wrote:
> > > Part of the code inside the lio_vsw_23xx.bin firmware image is under GPL,
> > > but the LICENCE.cavium file neglects to indicate that. However,
> > > LICENCE.cavium does correctly specify the license that covers the other
> > > Cavium firmware images that do not contain any GPL code.
> > >
> > > Fix the GPL compliance issue by adding a new file, LICENCE.cavium_liquidio,
> > > which correctly shows the GPL boilerplate. This new file specifies the
> > > licenses for all liquidio firmware, including the ones that do not have
> > > GPL code.
> > >
> > > Change the liquidio section of WHENCE to point to LICENCE.cavium_liquidio.
> > >
> > > Reported-by: Florian Weimer <fweimer@redhat.com>
> > > Signed-off-by: Manish Awasthi <manish.awasthi@cavium.com>
> > > Signed-off-by: Manoj Panicker <manojkumar.panicker@cavium.com>
> > > Signed-off-by: Faisal Masood <faisal.masood@cavium.com>
> > > Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
> > > ---
> > > LICENCE.cavium_liquidio | 429 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > WHENCE | 2 +-
> > > 2 files changed, 430 insertions(+), 1 deletion(-)
> > > create mode 100644 LICENCE.cavium_liquidio
> >
> > Hello Maintainers of linux-firmware.git,
> >
> > Any feedback about this patch?
>
> I would prefer to see an offer that included a defined URL for anyone
> to download the source for the kernel in question without having to
> announce themselves. The "send an email to info@cavium.com" offer may
> (or may not) be sufficient for the letter of the law. But it seems
> both fragile and prone to subjective frustrations and delays for
> users to obtain the sources at some future date.
I agree with John here, but I also believe the patch is better than
the current text in the upstream repo. I've committed it and pushed
it out. If there are improvements to be made on source availability,
we can take those in a different patch.
Thank you for taking this seriously and responding quickly.
josh
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox