Netdev List
 help / color / mirror / Atom feed
* [PATCH 0/5] hy: core: rework phy_set_mode to accept phy mode and submode
From: Grygorii Strashko @ 2018-11-08  0:36 UTC (permalink / raw)
  To: David S. Miller, Kishon Vijay Abraham I
  Cc: netdev, Sekhar Nori, linux-kernel, linux-arm-kernel,
	Tony Lindgren, linux-amlogic, linux-mediatek, Alexandre Belloni,
	Antoine Tenart, Quentin Schulz, Vivek Gautam, Maxime Ripard,
	Chen-Yu Tsai, Carlo Caione, Chunfeng Yun, Matthias Brugger,
	Manu Gautam, Grygorii Strashko

Hi Kishon, All,

As was discussed in [1] I'm posting series which introduces rework of
phy_set_mode to accept phy mode and submode. I've dropped TI specific patches as
this change is pretty big by itself.

Patch 1 is cumulative change which refactors PHY framework code to
support dual level PHYs mode configuration - PHY mode and PHY submode. It
extends .set_mode() callback to support additional parameter "int submode"
and converts all corresponding PHY drivers to support new .set_mode()
callback declaration.
The new extended PHY API
 int phy_set_mode_ext(struct phy *phy, enum phy_mode mode, int submode)
is introduced to support dual level PHYs mode configuration and existing
phy_set_mode() API is converted to macros, so PHY framework consumers do
not need to be changed (~21 matches).

Patches 2-4: Add new PHY's mode to be used by Ethernet PHY interface drivers or
multipurpose PHYs like serdes and convert ocelot-serdes and mvebu-cp110-comphy
PHY drivers to use recently introduced PHY_MODE_ETHERNET and phy_set_mode_ext().

Patch 5 - removes unused, ethernet specific phy modes from enum phy_mode.

[1] https://lkml.org/lkml/2018/10/25/366

Grygorii Strashko (5):
  phy: core: rework phy_set_mode to accept phy mode and submode
  phy: core: add PHY_MODE_ETHERNET
  phy: ocelot-serdes: convert to use eth phy mode and submode
  phy: mvebu-cp110-comphy: convert to use eth phy mode and submode
  phy: core: clean up unused ethernet specific phy modes

 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 21 ++-----
 drivers/net/ethernet/mscc/ocelot.c              |  9 +--
 drivers/phy/allwinner/phy-sun4i-usb.c           |  3 +-
 drivers/phy/amlogic/phy-meson-gxl-usb2.c        |  5 +-
 drivers/phy/amlogic/phy-meson-gxl-usb3.c        |  5 +-
 drivers/phy/marvell/phy-mvebu-cp110-comphy.c    | 83 ++++++++++++++-----------
 drivers/phy/mediatek/phy-mtk-tphy.c             |  2 +-
 drivers/phy/mediatek/phy-mtk-xsphy.c            |  2 +-
 drivers/phy/mscc/phy-ocelot-serdes.c            | 16 +++--
 drivers/phy/phy-core.c                          |  6 +-
 drivers/phy/qualcomm/phy-qcom-qmp.c             |  3 +-
 drivers/phy/qualcomm/phy-qcom-qusb2.c           |  3 +-
 drivers/phy/qualcomm/phy-qcom-ufs-qmp-14nm.c    |  3 +-
 drivers/phy/qualcomm/phy-qcom-ufs-qmp-20nm.c    |  3 +-
 drivers/phy/qualcomm/phy-qcom-usb-hs.c          |  3 +-
 drivers/phy/ti/phy-da8xx-usb.c                  |  3 +-
 drivers/phy/ti/phy-tusb1210.c                   |  2 +-
 include/linux/phy/phy.h                         | 18 +++---
 18 files changed, 100 insertions(+), 90 deletions(-)

-- 
2.10.5

^ permalink raw reply

* Re: [RFC PATCH 01/12] dt-bindings: soc: qcom: add IPA bindings
From: Rob Herring @ 2018-11-07 14:59 UTC (permalink / raw)
  To: Alex Elder
  Cc: Rob Herring, Mark Rutland, davem, Arnd Bergmann, Bjorn Andersson,
	ilias.apalodimas, netdev, devicetree, linux-arm-msm, linux-soc,
	linux-arm-kernel, Linux Kernel Mailing List, syadagir, mjavid
In-Reply-To: <20181107003250.5832-2-elder@linaro.org>

On Tue, Nov 6, 2018 at 6:33 PM Alex Elder <elder@linaro.org> wrote:
>
> Add the binding definitions for the "qcom,ipa" and "qcom,rmnet-ipa"
> device tree nodes.
>
> Signed-off-by: Alex Elder <elder@linaro.org>
> ---
>  .../devicetree/bindings/soc/qcom/qcom,ipa.txt | 136 ++++++++++++++++++
>  .../bindings/soc/qcom/qcom,rmnet-ipa.txt      |  15 ++
>  2 files changed, 151 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt
>  create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,rmnet-ipa.txt
>
> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt
> new file mode 100644
> index 000000000000..d4d3d37df029
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt
> @@ -0,0 +1,136 @@
> +Qualcomm IPA (IP Accelerator) Driver

Bindings are for h/w not drivers.

> +
> +This binding describes the Qualcomm IPA.  The IPA is capable of offloading
> +certain network processing tasks (e.g. filtering, routing, and NAT) from
> +the main processor.  The IPA currently serves only as a network interface,
> +providing access to an LTE network available via a modem.
> +
> +The IPA sits between multiple independent "execution environments,"
> +including the AP subsystem (APSS) and the modem.  The IPA presents
> +a Generic Software Interface (GSI) to each execution environment.
> +The GSI is an integral part of the IPA, but it is logically isolated
> +and has a distinct interrupt and a separately-defined address space.
> +
> +    ----------   -------------   ---------
> +    |        |   |G|       |G|   |       |
> +    |  APSS  |===|S|  IPA  |S|===| Modem |
> +    |        |   |I|       |I|   |       |
> +    ----------   -------------   ---------
> +
> +See also:
> +  bindings/interrupt-controller/interrupts.txt
> +  bindings/interconnect/interconnect.txt
> +  bindings/soc/qcom/qcom,smp2p.txt
> +  bindings/reserved-memory/reserved-memory.txt
> +  bindings/clock/clock-bindings.txt
> +
> +All properties defined below are required.
> +
> +- compatible:
> +       Must be one of the following compatible strings:
> +               "qcom,ipa-sdm845-modem_init"
> +               "qcom,ipa-sdm845-tz_init"

Normal order is <vendor>,<soc>-<ipblock>.

Don't use '_'.

What's the difference between these 2? It can't be detected somehow?
This might be better expressed as a property. Then if Trustzone
initializes things, it can just add a property.

> +
> +-reg:
> +       Resources specyfing the physical address spaces of the IPA and GSI.

typo

> +
> +-reg-names:
> +       The names of the address space ranges defined by the "reg" property.
> +       Must be "ipa" and "gsi".
> +
> +- interrupts-extended:

Use 'interrupts' here and describe what they are and the order. What
they are connected to (and the need for interrupts-extended) is
outside the scope of this binding.

> +       Specifies the IRQs used by the IPA.  Four cells are required,
> +       specifying: the IPA IRQ; the GSI IRQ; the clock query interrupt
> +       from the modem; and the "ready for stage 2 initialization"
> +       interrupt from the modem.  The first two are hardware IRQs; the
> +       third and fourth are SMP2P input interrupts.
> +
> +- interrupt-names:
> +       The names of the interrupts defined by the "interrupts-extended"
> +       property.  Must be "ipa", "gsi", "ipa-clock-query", and
> +       "ipa-post-init".

Format as one per line.

> +
> +- clocks:
> +       Resource that defines the IPA core clock.
> +
> +- clock-names:
> +       The name used for the IPA core clock.  Must be "core".
> +
> +- interconnects:
> +       Specifies the interconnects used by the IPA.  Three cells are
> +       required, specifying:  the path from the IPA to memory; from
> +       IPA to internal (SoC resident) memory; and between the AP
> +       subsystem and IPA for register access.
> +
> +- interconnect-names:
> +       The names of the interconnects defined by the "interconnects"
> +       property.  Must be "memory", "imem", and "config".
> +
> +- qcom,smem-states
> +       The state bits used for SMP2P output.  Two cells must be specified.
> +       The first indicates whether the value in the second bit is valid
> +       (1 means valid).  The second, if valid, defines whether the IPA
> +       clock is enabled (1 means enabled).
> +
> +- qcom,smem-state-names
> +       The names of the state bits used for SMP2P output.  These must be
> +       "ipa-clock-enabled-valid" and "ipa-clock-enabled".
> +
> +- memory-region
> +       A phandle for a reserved memory area that holds the firmware passed
> +       to Trust Zone for authentication.  (Note, this is required
> +       only for "qcom,ipa-sdm845-tz_init".)
> +
> += EXAMPLE
> +
> +The following example represents the IPA present in the SDM845 SoC.  It
> +shows portions of the "modem-smp2p" node to indicate its relationship
> +with the interrupts and SMEM states used by the IPA.
> +
> +       modem-smp2p {
> +               compatible = "qcom,smp2p";
> +               . . .
> +               ipa_smp2p_out: ipa-ap-to-modem {
> +                       qcom,entry-name = "ipa";
> +                       #qcom,smem-state-cells = <1>;
> +               };
> +
> +               ipa_smp2p_in: ipa-modem-to-ap {
> +                       qcom,entry-name = "ipa";
> +                       interrupt-controller;
> +                       #interrupt-cells = <2>;
> +               };
> +       };
> +
> +       ipa@1e00000 {

ipa@1e40000

> +               compatible = "qcom,ipa-sdm845-modem_init";
> +
> +               reg = <0x1e40000 0x34000>,
> +                     <0x1e04000 0x2c000>;
> +               reg-names = "ipa",
> +                           "gsi";
> +
> +               interrupts-extended = <&intc 0 311 IRQ_TYPE_LEVEL_HIGH>,
> +                                     <&intc 0 432 IRQ_TYPE_LEVEL_HIGH>,
> +                                     <&ipa_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
> +                                     <&ipa_smp2p_in 1 IRQ_TYPE_EDGE_RISING>;
> +               interrupt-names = "ipa",
> +                                 "gsi",
> +                                 "ipa-clock-query",
> +                                 "ipa-post-init";
> +
> +               clocks = <&rpmhcc RPMH_IPA_CLK>;
> +               clock-names = "core";
> +
> +               interconnects = <&qnoc MASTER_IPA &qnoc SLAVE_EBI1>,
> +                               <&qnoc MASTER_IPA &qnoc SLAVE_IMEM>,
> +                               <&qnoc MASTER_APPSS_PROC &qnoc SLAVE_IPA_CFG>;
> +               interconnect-names = "memory",
> +                                    "imem",
> +                                    "config";
> +
> +               qcom,smem-states = <&ipa_smp2p_out 0>,
> +                                  <&ipa_smp2p_out 1>;
> +               qcom,smem-state-names = "ipa-clock-enabled-valid",
> +                                       "ipa-clock-enabled";
> +       };
> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,rmnet-ipa.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,rmnet-ipa.txt
> new file mode 100644
> index 000000000000..3d0b2aabefc7
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,rmnet-ipa.txt
> @@ -0,0 +1,15 @@
> +Qualcomm IPA RMNet Driver
> +
> +This binding describes the IPA RMNet driver, which is used to
> +represent virtual interfaces available on the modem accessed via
> +the IPA.  Other than the compatible string there are no properties
> +associated with this device.

Only a compatible string is a sure sign this is not a h/w device and
you are just abusing DT to instantiate drivers. Make the IPA driver
instantiate any sub drivers it needs.

Rob

^ permalink raw reply

* Re: [RFC PATCH 04/12] soc: qcom: ipa: immediate commands
From: Arnd Bergmann @ 2018-11-07 14:36 UTC (permalink / raw)
  To: Alex Elder
  Cc: David Miller, Bjorn Andersson, Ilias Apalodimas, Networking, DTML,
	linux-arm-msm, linux-soc, Linux ARM, Linux Kernel Mailing List,
	syadagir, mjavid, Rob Herring, Mark Rutland
In-Reply-To: <20181107003250.5832-5-elder@linaro.org>

On Wed, Nov 7, 2018 at 1:33 AM Alex Elder <elder@linaro.org> wrote:
>
> +/**
> + * struct ipahal_context - HAL global context data
> + * @empty_fltrt_tbl:   Empty table to be used for table initialization
> + */
> +static struct ipahal_context {
> +       struct ipa_dma_mem empty_fltrt_tbl;
> +} ipahal_ctx_struct;
> +static struct ipahal_context *ipahal_ctx = &ipahal_ctx_struct;

Remove the global variables here

> +/* Immediate commands H/W structures */
> +
> +/* struct ipa_imm_cmd_hw_ip_fltrt_init - IP_V*_FILTER_INIT/IP_V*_ROUTING_INIT
> + * command payload in H/W format.
> + * Inits IPv4/v6 routing or filter block.
> + * @hash_rules_addr: Addr in system mem where hashable flt/rt rules starts
> + * @hash_rules_size: Size in bytes of the hashable tbl to cpy to local mem
> + * @hash_local_addr: Addr in shared mem where hashable flt/rt tbl should
> + *  be copied to
> + * @nhash_rules_size: Size in bytes of the non-hashable tbl to cpy to local mem
> + * @nhash_local_addr: Addr in shared mem where non-hashable flt/rt tbl should
> + *  be copied to
> + * @rsvd: reserved
> + * @nhash_rules_addr: Addr in sys mem where non-hashable flt/rt tbl starts
> + */
> +struct ipa_imm_cmd_hw_ip_fltrt_init {
> +       u64 hash_rules_addr;
> +       u64 hash_rules_size     : 12,
> +           hash_local_addr     : 16,
> +           nhash_rules_size    : 12,
> +           nhash_local_addr    : 16,
> +           rsvd                : 8;
> +       u64 nhash_rules_addr;
> +};

In hardware structures, you should not use bit fields, as the ordering
of the bits is not well-defined in C. The only portable way to do this
is to use shifts and masks unfortunately.

> +struct ipa_imm_cmd_hw_hdr_init_local {
> +       u64 hdr_table_addr;
> +       u32 size_hdr_table      : 12,
> +           hdr_addr            : 16,
> +           rsvd                : 4;
> +};

I would also add a 'u32 pad' member at the end to make the padding
explicit here, or mark the first member as '__aligned(4) __packed'
if you want to avoid the padding.

> +void *ipahal_dma_shared_mem_write_pyld(struct ipa_dma_mem *mem, u32 offset)
> +{
> +       struct ipa_imm_cmd_hw_dma_shared_mem *data;
> +
> +       ipa_assert(mem->size < 1 << 16);        /* size is 16 bits wide */
> +       ipa_assert(offset < 1 << 16);           /* local_addr is 16 bits wide */
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data)
> +               return NULL;
> +
> +       data->size = mem->size;
> +       data->local_addr = offset;
> +       data->direction = 0;    /* 0 = write to IPA; 1 = read from IPA */
> +       data->skip_pipeline_clear = 0;
> +       data->pipeline_clear_options = IPAHAL_HPS_CLEAR;
> +       data->system_addr = mem->phys;
> +
> +       return data;
> +}

The 'void *' return looks odd here, and also the dynamic allocation.
It looks to me like all these functions could be better done the
other way round, basically putting the
ipa_imm_cmd_hw_dma_shared_mem etc structures on the stack
of the caller. At least for this one, the dynamic allocation
doesn't help at all because the caller is the same that
frees it again after the command. I suspect the same is
true for a lot of those commands.

       Arnd

^ permalink raw reply

* Re: [RFC PATCH 09/12] soc: qcom: ipa: main IPA source file
From: Arnd Bergmann @ 2018-11-07 14:08 UTC (permalink / raw)
  To: Alex Elder
  Cc: David Miller, Bjorn Andersson, Ilias Apalodimas, Networking, DTML,
	linux-arm-msm, linux-soc, Linux ARM, Linux Kernel Mailing List,
	syadagir, mjavid, Rob Herring, Mark Rutland
In-Reply-To: <20181107003250.5832-10-elder@linaro.org>

On Wed, Nov 7, 2018 at 1:33 AM Alex Elder <elder@linaro.org> wrote:

> +static void ipa_client_remove_deferred(struct work_struct *work);

Try to avoid forward declarations by reordering the code in call order,
it will also make it easier to read.

> +static DECLARE_WORK(ipa_client_remove_work, ipa_client_remove_deferred);
> +
> +static struct ipa_context ipa_ctx_struct;
> +struct ipa_context *ipa_ctx = &ipa_ctx_struct;

Global state variables should generally be removed as well, and
passed around as function arguments.

> +static int hdr_init_local_cmd(u32 offset, u32 size)
> +{
> +       struct ipa_desc desc = { };
> +       struct ipa_dma_mem mem;
> +       void *payload;
> +       int ret;
> +
> +       if (ipa_dma_alloc(&mem, size, GFP_KERNEL))
> +               return -ENOMEM;
> +
> +       offset += ipa_ctx->smem_offset;
> +
> +       payload = ipahal_hdr_init_local_pyld(&mem, offset);
> +       if (!payload) {
> +               ret = -ENOMEM;
> +               goto err_dma_free;
> +       }
> +
> +       desc.type = IPA_IMM_CMD_DESC;
> +       desc.len_opcode = IPA_IMM_CMD_HDR_INIT_LOCAL;
> +       desc.payload = payload;
> +
> +       ret = ipa_send_cmd(&desc);

You have a bunch of dynamic allocations in here, which you
then immediately tear down again after the command is complete.
I can't see at all what you do with the DMA address, since you
seem to not use the virtual address at all but only store
the physical address in some kind of descriptor without ever
writing to it.

Am I missing something here?

> +/* Remoteproc callbacks for SSR events: prepare, start, stop, unprepare */
> +int ipa_ssr_prepare(struct rproc_subdev *subdev)
> +{
> +       printk("======== SSR prepare received ========\n");

I think you mean dev_dbg() here. A plain printk() without a level
is not correct and we probably don't want those messages to arrive
on the console for normal users.

> +static int ipa_firmware_load(struct de
> +
> +err_clear_dev:
> +       ipa_ctx->lan_cons_ep_id = 0;
> +       ipa_ctx->cmd_prod_ep_id = 0;
> +       ipahal_exit();
> +err_dma_exit:
> +       ipa_dma_exit();
> +err_clear_gsi:
> +       ipa_ctx->gsi = NULL;
> +       ipa_ctx->ipa_phys = 0;
> +       ipa_reg_exit();
> +err_clear_ipa_irq:
> +       ipa_ctx->ipa_irq = 0;
> +err_clear_filter_bitmap:
> +       ipa_ctx->filter_bitmap = 0;
> +err_interconnect_exit:
> +       ipa_interconnect_exit();
> +err_clock_exit:
> +       ipa_clock_exit();
> +       ipa_ctx->dev = NULL;
> +out_smp2p_exit:
> +       ipa_smp2p_exit(dev);
> +

No need to initialize members to zero when you are about
to free the structure.

> +static struct platform_driver ipa_plat_drv = {
> +       .probe = ipa_plat_drv_probe,
> +       .remove = ipa_plat_drv_remove,
> +       .driver = {
> +               .name = "ipa",
> +               .owner = THIS_MODULE,
> +               .pm = &ipa_pm_ops,
> +               .of_match_table = ipa_plat_drv_match,
> +       },
> +};
> +
> +builtin_platform_driver(ipa_plat_drv);

This should be module_platform_driver(), and allow unloading
the driver.

        Arnd

^ permalink raw reply

* Re: [PATCH net-next] tun: compute the RFS hash only if needed.
From: Jason Wang @ 2018-11-07 13:33 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller
In-Reply-To: <12347537bd5fd8b1176ca62ddf51ea9080fe1b41.1541436099.git.pabeni@redhat.com>


On 2018/11/7 下午5:34, Paolo Abeni wrote:
> The tun XDP sendmsg code path, unconditionally computes the symmetric
> hash of each packet for RFS's sake, even when we could skip it. e.g.
> when the device has a single queue.
>
> This change adds the check already in-place for the skb sendmsg path
> to avoid unneeded hashing.
>
> The above gives small, but measurable, performance gain for VM xmit
> path when zerocopy is not enabled.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   drivers/net/tun.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 060135ceaf0e..a65779c6d72f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2448,7 +2448,8 @@ static int tun_xdp_one(struct tun_struct *tun,
>   			goto out;
>   	}
>   
> -	if (!rcu_dereference(tun->steering_prog))
> +	if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
> +	    !tfile->detached)
>   		rxhash = __skb_get_hash_symmetric(skb);
>   
>   	netif_receive_skb(skb);


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

^ permalink raw reply

* Re: [PATCH 2/5] VSOCK: support fill data to mergeable rx buffer in host
From: Jason Wang @ 2018-11-07 13:32 UTC (permalink / raw)
  To: jiangyiwen, stefanha, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE2903C.50608@huawei.com>


On 2018/11/7 下午3:11, jiangyiwen wrote:
> On 2018/11/7 14:18, Jason Wang wrote:
>> On 2018/11/6 下午2:30, jiangyiwen wrote:
>>>> Seems duplicated with the one used by vhost-net.
>>>>
>>>> In packed virtqueue implementation, I plan to move this to vhost.c.
>>>>
>>> Yes, this code is full copied from vhost-net, if it can be packed into
>>> vhost.c, it would be great.
>>>
>> If you try to reuse vhost-net, you don't even need to care about this:)
>>
>> Thanks
>>
>>
>> .
>>
> Hi Jason,
>
> Thank your advice, I will consider your idea. But I don't know
> what's stefan's suggestion? It seems that he doesn't care much
> about this community.:(


I think not. He is probably busy these days.


>
> I still hope this community can have some vitality.
>

Let's wait for few more days for the possible comments from Stefan or 
Michael. But I do prefer to unify the virtio networking datapath which 
will be easier to be extended and maintained.

Thanks

^ permalink raw reply

* Re: [PATCH 3/5] VSOCK: support receive mergeable rx buffer in guest
From: Jason Wang @ 2018-11-07 13:29 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE28F37.1030208@huawei.com>


On 2018/11/7 下午3:07, jiangyiwen wrote:
> On 2018/11/7 14:20, Jason Wang wrote:
>> On 2018/11/6 下午2:41, jiangyiwen wrote:
>>> On 2018/11/6 12:00, Jason Wang wrote:
>>>> On 2018/11/5 下午3:47, jiangyiwen wrote:
>>>>> Guest receive mergeable rx buffer, it can merge
>>>>> scatter rx buffer into a big buffer and then copy
>>>>> to user space.
>>>>>
>>>>> Signed-off-by: Yiwen Jiang<jiangyiwen@huawei.com>
>>>>> ---
>>>>>     include/linux/virtio_vsock.h            |  9 ++++
>>>>>     net/vmw_vsock/virtio_transport.c        | 75 +++++++++++++++++++++++++++++----
>>>>>     net/vmw_vsock/virtio_transport_common.c | 59 ++++++++++++++++++++++----
>>>>>     3 files changed, 127 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>>>> index da9e1fe..6be3cd7 100644
>>>>> --- a/include/linux/virtio_vsock.h
>>>>> +++ b/include/linux/virtio_vsock.h
>>>>> @@ -13,6 +13,8 @@
>>>>>     #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE    (1024 * 4)
>>>>>     #define VIRTIO_VSOCK_MAX_BUF_SIZE        0xFFFFFFFFUL
>>>>>     #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE        (1024 * 64)
>>>>> +/* virtio_vsock_pkt + max_pkt_len(default MAX_PKT_BUF_SIZE) */
>>>>> +#define VIRTIO_VSOCK_MAX_MRG_BUF_NUM ((VIRTIO_VSOCK_MAX_PKT_BUF_SIZE / PAGE_SIZE) + 1)
>>>>>
>>>>>     /* Virtio-vsock feature */
>>>>>     #define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
>>>>> @@ -48,6 +50,11 @@ struct virtio_vsock_sock {
>>>>>         struct list_head rx_queue;
>>>>>     };
>>>>>
>>>>> +struct virtio_vsock_mrg_rxbuf {
>>>>> +    void *buf;
>>>>> +    u32 len;
>>>>> +};
>>>>> +
>>>>>     struct virtio_vsock_pkt {
>>>>>         struct virtio_vsock_hdr    hdr;
>>>>>         struct virtio_vsock_mrg_rxbuf_hdr mrg_rxbuf_hdr;
>>>>> @@ -59,6 +66,8 @@ struct virtio_vsock_pkt {
>>>>>         u32 len;
>>>>>         u32 off;
>>>>>         bool reply;
>>>>> +    bool mergeable;
>>>>> +    struct virtio_vsock_mrg_rxbuf mrg_rxbuf[VIRTIO_VSOCK_MAX_MRG_BUF_NUM];
>>>>>     };
>>>> It's better to use iov here I think, and drop buf completely.
>>>>
>>>> And this is better to be done in an independent patch.
>>>>
>>> You're right, I can use kvec instead of customized structure,
>>> in addition, I don't understand about drop buf completely and
>>> an independent patch.
>> I mean there a void *buf in struct virtio_vsock_pkt. You can drop it and switch to use iov(iter) or other data structure that supports sg.
>>
>> Thanks
>>
> Yes, I understand your idea, I don't want to modify tx process method, so I
> keep the buf.
>

I'm afraid this will end of codes that is hard to be maintained. Let's 
try to unify them.

Thanks

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
From: Jesper Dangaard Brouer @ 2018-11-07 12:40 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, oss-drivers, brouer
In-Reply-To: <1541593725-27703-1-git-send-email-quentin.monnet@netronome.com>

On Wed,  7 Nov 2018 12:28:45 +0000
Quentin Monnet <quentin.monnet@netronome.com> wrote:

> libbpf is now able to load successfully test_l4lb_noinline.o and
> samples/bpf/tracex3_kern.o.
> 
> For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
> and remove the associated "TODO".
> 
> For tracex3_kern.o, instead of loading a program from samples/bpf/ that
> might not have been compiled at this stage, try loading a program from
> BPF selftests. Since this test case is about loading a program compiled
> without the "-target bpf" flag, change the Makefile to compile one
> program accordingly (instead of passing the flag for compiling all
> programs).
> 
> Regarding test_xdp_noinline.o: in its current shape the program fails to
> load because it provides no version section, but the loader needs one.
> The test was added to make sure that libbpf could load XDP programs even
> if they do not provide a version number in a dedicated section. But
> libbpf is already capable of doing that: in our case loading fails
> because the loader does not know that this is an XDP program (it does
> not need to, since it does not attach the program). So trying to load
> test_xdp_noinline.o does not bring much here: just delete this subtest.
> 
> For the record, the error message obtained with tracex3_kern.o was
> fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF
> objects containing .eh_frames")
> 
> I have not been abled to reproduce the "libbpf: incorrect bpf_call
> opcode" error for test_l4lb_noinline.o, even with the version of libbpf
> present at the time when test_libbpf.sh and test_libbpf_open.c were
> created.
> 
> RFC -> v1:
> - Compile test_xdp without the "-target bpf" flag, and try to load it
>   instead of ../../samples/bpf/tracex3_kern.o.
> - Delete test_xdp_noinline.o subtest.
> 
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks for following up Quentin :-)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH bpf-next] tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps
From: Quentin Monnet @ 2018-11-07 12:29 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: netdev, oss-drivers, Quentin Monnet

The limit for memory locked in the kernel by a process is usually set to
64 bytes by default. This can be an issue when creating large BPF maps
and/or loading many programs. A workaround is to raise this limit for
the current process before trying to create a new BPF map. Changing the
hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
root user (for non-root users, a call to setrlimit fails (and sets
errno) and the program simply goes on with its rlimit unchanged).

There is no API to get the current amount of memory locked for a user,
therefore we cannot raise the limit only when required. One solution,
used by bcc, is to try to create the map, and on getting a EPERM error,
raising the limit to infinity before giving another try. Another
approach, used in iproute2, is to raise the limit in all cases, before
trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load programs or to create maps with bpftool.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/common.c | 8 ++++++++
 tools/bpf/bpftool/main.h   | 2 ++
 tools/bpf/bpftool/map.c    | 2 ++
 tools/bpf/bpftool/prog.c   | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 25af85304ebe..1149565be4b1 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -46,6 +46,7 @@
 #include <linux/magic.h>
 #include <net/if.h>
 #include <sys/mount.h>
+#include <sys/resource.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <sys/vfs.h>
@@ -99,6 +100,13 @@ static bool is_bpffs(char *path)
 	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
 }
 
+void set_max_rlimit(void)
+{
+	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+	setrlimit(RLIMIT_MEMLOCK, &rinf);
+}
+
 static int mnt_bpffs(const char *target, char *buff, size_t bufflen)
 {
 	bool bind_done = false;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 28322ace2856..14857c273bf6 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -100,6 +100,8 @@ bool is_prefix(const char *pfx, const char *str);
 void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep);
 void usage(void) __noreturn;
 
+void set_max_rlimit(void);
+
 struct pinned_obj_table {
 	DECLARE_HASHTABLE(table, 16);
 };
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7bf38f0e152e..101b8a881225 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1140,6 +1140,8 @@ static int do_create(int argc, char **argv)
 		return -1;
 	}
 
+	set_max_rlimit();
+
 	fd = bpf_create_map_xattr(&attr);
 	if (fd < 0) {
 		p_err("map create failed: %s", strerror(errno));
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 5302ee282409..b9b84553bec4 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -995,6 +995,8 @@ static int do_load(int argc, char **argv)
 		goto err_close_obj;
 	}
 
+	set_max_rlimit();
+
 	err = bpf_object__load(obj);
 	if (err) {
 		p_err("failed to load object file");
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
From: Quentin Monnet @ 2018-11-07 12:28 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, oss-drivers, Quentin Monnet, Jesper Dangaard Brouer

libbpf is now able to load successfully test_l4lb_noinline.o and
samples/bpf/tracex3_kern.o.

For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
and remove the associated "TODO".

For tracex3_kern.o, instead of loading a program from samples/bpf/ that
might not have been compiled at this stage, try loading a program from
BPF selftests. Since this test case is about loading a program compiled
without the "-target bpf" flag, change the Makefile to compile one
program accordingly (instead of passing the flag for compiling all
programs).

Regarding test_xdp_noinline.o: in its current shape the program fails to
load because it provides no version section, but the loader needs one.
The test was added to make sure that libbpf could load XDP programs even
if they do not provide a version number in a dedicated section. But
libbpf is already capable of doing that: in our case loading fails
because the loader does not know that this is an XDP program (it does
not need to, since it does not attach the program). So trying to load
test_xdp_noinline.o does not bring much here: just delete this subtest.

For the record, the error message obtained with tracex3_kern.o was
fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF
objects containing .eh_frames")

I have not been abled to reproduce the "libbpf: incorrect bpf_call
opcode" error for test_l4lb_noinline.o, even with the version of libbpf
present at the time when test_libbpf.sh and test_libbpf_open.c were
created.

RFC -> v1:
- Compile test_xdp without the "-target bpf" flag, and try to load it
  instead of ../../samples/bpf/tracex3_kern.o.
- Delete test_xdp_noinline.o subtest.

Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/testing/selftests/bpf/Makefile       | 10 ++++++++++
 tools/testing/selftests/bpf/test_libbpf.sh | 14 ++++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e39dfb4e7970..ecd79b7fb107 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -135,6 +135,16 @@ endif
 endif
 endif
 
+# Have one program compiled without "-target bpf" to test whether libbpf loads
+# it successfully
+$(OUTPUT)/test_xdp.o: test_xdp.c
+	$(CLANG) $(CLANG_FLAGS) \
+		-O2 -emit-llvm -c $< -o - | \
+	$(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+	$(BTF_PAHOLE) -J $@
+endif
+
 $(OUTPUT)/%.o: %.c
 	$(CLANG) $(CLANG_FLAGS) \
 		 -O2 -target bpf -emit-llvm -c $< -o - |      \
diff --git a/tools/testing/selftests/bpf/test_libbpf.sh b/tools/testing/selftests/bpf/test_libbpf.sh
index 156d89f1edcc..2989b2e2d856 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -33,17 +33,11 @@ trap exit_handler 0 2 3 6 9
 
 libbpf_open_file test_l4lb.o
 
-# TODO: fix libbpf to load noinline functions
-# [warning] libbpf: incorrect bpf_call opcode
-#libbpf_open_file test_l4lb_noinline.o
+# Load a program with BPF-to-BPF calls
+libbpf_open_file test_l4lb_noinline.o
 
-# TODO: fix test_xdp_meta.c to load with libbpf
-# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
-#libbpf_open_file test_xdp_meta.o
-
-# TODO: fix libbpf to handle .eh_frame
-# [warning] libbpf: relocation failed: no section(10)
-#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+# Load a program compiled without the "-target bpf" flag
+libbpf_open_file test_xdp.o
 
 # Success
 exit 0
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] xfrm: Fix bucket count reported to userspace
From: Steffen Klassert @ 2018-11-07 12:12 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Jamal Hadi Salim, Herbert Xu, David S. Miller, netdev
In-Reply-To: <20181105080053.3778-1-bpoirier@suse.com>

On Mon, Nov 05, 2018 at 05:00:53PM +0900, Benjamin Poirier wrote:
> sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the
> hash mask.
> 
> Fixes: 28d8909bc790 ("[XFRM]: Export SAD info.")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Patch applied, thanks!

^ permalink raw reply

* [PATCH 4/4] FDDI: defza: Make the driver version string constant
From: Maciej W. Rozycki @ 2018-11-07 12:07 UTC (permalink / raw)
  To: netdev
In-Reply-To: <alpine.LFD.2.21.1811070323450.20378@eddie.linux-mips.org>

The driver version string is obviously not meant to be changed at run 
time, so mark it `const'.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
 drivers/net/fddi/defza.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

linux-defza-version-static-fix.patch
Index: linux-20181104-4maxp64/drivers/net/fddi/defza.c
===================================================================
--- linux-20181104-4maxp64.orig/drivers/net/fddi/defza.c
+++ linux-20181104-4maxp64/drivers/net/fddi/defza.c
@@ -56,7 +56,7 @@
 #define DRV_VERSION "v.1.1.4"
 #define DRV_RELDATE "Oct  6 2018"
 
-static char version[] =
+static const char version[] =
 	DRV_NAME ": " DRV_VERSION "  " DRV_RELDATE "  Maciej W. Rozycki\n";
 
 MODULE_AUTHOR("Maciej W. Rozycki <macro@linux-mips.org>");

^ permalink raw reply

* [PATCH 2/4] FDDI: defza: Add missing comment closing
From: Maciej W. Rozycki @ 2018-11-07 12:06 UTC (permalink / raw)
  To: netdev
In-Reply-To: <alpine.LFD.2.21.1811070323450.20378@eddie.linux-mips.org>

Fix:

drivers/net/fddi/defza.h:238:1: warning: "/*" within comment [-Wcomment]

by adding a missing comment closing.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
 drivers/net/fddi/defza.h |    1 +
 1 file changed, 1 insertion(+)

linux-defza-cmd-size-comment-fix.diff
Index: linux-20181104-4maxp64/drivers/net/fddi/defza.h
===================================================================
--- linux-20181104-4maxp64.orig/drivers/net/fddi/defza.h
+++ linux-20181104-4maxp64/drivers/net/fddi/defza.h
@@ -235,6 +235,7 @@ struct fza_ring_cmd {
 #define FZA_RING_CMD		0x200400	/* command ring address */
 #define FZA_RING_CMD_SIZE	0x40		/* command descriptor ring
 						 * size
+						 */
 /* Command constants. */
 #define FZA_RING_CMD_MASK	0x7fffffff
 #define FZA_RING_CMD_NOP	0x00000000	/* nop */

^ permalink raw reply

* [PATCH 1/4] FDDI: defza: Fix SPDX annotation
From: Maciej W. Rozycki @ 2018-11-07 12:06 UTC (permalink / raw)
  To: netdev
In-Reply-To: <alpine.LFD.2.21.1811070323450.20378@eddie.linux-mips.org>

The SPDX annotation for this driver does not match the license text, 
which specifies GNU GPL 2 or later.  Make the two match by correcting 
the SPDX tag.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
 drivers/net/fddi/defza.c |    2 +-
 drivers/net/fddi/defza.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

linux-defza-spdx-fix.diff
Index: linux-20181104-4maxp64/drivers/net/fddi/defza.c
===================================================================
--- linux-20181104-4maxp64.orig/drivers/net/fddi/defza.c
+++ linux-20181104-4maxp64/drivers/net/fddi/defza.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier: GPL-2.0
+// SPDX-License-Identifier: GPL-2.0+
 /*	FDDI network adapter driver for DEC FDDIcontroller 700/700-C devices.
  *
  *	Copyright (c) 2018  Maciej W. Rozycki
Index: linux-20181104-4maxp64/drivers/net/fddi/defza.h
===================================================================
--- linux-20181104-4maxp64.orig/drivers/net/fddi/defza.h
+++ linux-20181104-4maxp64/drivers/net/fddi/defza.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0+ */
 /*	FDDI network adapter driver for DEC FDDIcontroller 700/700-C devices.
  *
  *	Copyright (c) 2018  Maciej W. Rozycki

^ permalink raw reply

* [PATCH 0/4] FDDI: defza: Fix a bunch of small issues
From: Maciej W. Rozycki @ 2018-11-07 12:06 UTC (permalink / raw)
  To: netdev

Hi,

 Here is a bunch of small fixes addressing issues that I missed in my 
final round of testing.  None of these affect run-time behaviour.  One was 
actually found by the kbuild bot, which turned out to be more pedantic 
than my compiler.  See individual change descriptions for details.

 Please apply.

  Maciej

^ permalink raw reply

* [PATCH 3/4] FDDI: defza: Move SMT Tx data buffer declaration next to its skb
From: Maciej W. Rozycki @ 2018-11-07 12:07 UTC (permalink / raw)
  To: netdev
In-Reply-To: <alpine.LFD.2.21.1811070323450.20378@eddie.linux-mips.org>

Move the temporary data buffer used when tapping into the SMT Tx queue 
from the outer function level into the conditional block it's actually 
used in and its containing skb is also declared, making the structure of 
code better.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
---
Hi,

 This was also present, though not further complained about in kbuild bot 
output:

drivers/net/fddi/defza.c:787:45: warning: unused variable 'skb_data_ptr' [-Wunused-variable]

because it ran on a tree revision as at commit 61414f5ec983 ("FDDI: defza: 
Add support for DEC FDDIcontroller 700 TURBOchannel adapter") and 
therefore without commit 9f9a742db40f ("FDDI: defza: Support capturing 
outgoing SMT traffic"), indicating that the buffer should have been 
declared in the containing block rather than at the function level, 
especially as the skb it comes from is also declared within that block.

  Maciej
---
 drivers/net/fddi/defza.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

linux-defza-skb-data-ptr-fix.diff
Index: linux-20181104-4maxp64/drivers/net/fddi/defza.c
===================================================================
--- linux-20181104-4maxp64.orig/drivers/net/fddi/defza.c
+++ linux-20181104-4maxp64/drivers/net/fddi/defza.c
@@ -784,7 +784,7 @@ static void fza_rx(struct net_device *de
 static void fza_tx_smt(struct net_device *dev)
 {
 	struct fza_private *fp = netdev_priv(dev);
-	struct fza_buffer_tx __iomem *smt_tx_ptr, *skb_data_ptr;
+	struct fza_buffer_tx __iomem *smt_tx_ptr;
 	int i, len;
 	u32 own;
 
@@ -799,6 +799,7 @@ static void fza_tx_smt(struct net_device
 
 		if (!netif_queue_stopped(dev)) {
 			if (dev_nit_active(dev)) {
+				struct fza_buffer_tx *skb_data_ptr;
 				struct sk_buff *skb;
 
 				/* Length must be a multiple of 4 as only word

^ permalink raw reply

* Re: [RFC PATCH 01/12] dt-bindings: soc: qcom: add IPA bindings
From: Arnd Bergmann @ 2018-11-07 11:50 UTC (permalink / raw)
  To: Alex Elder
  Cc: Rob Herring, Mark Rutland, David Miller, Bjorn Andersson,
	Ilias Apalodimas, Networking, DTML, linux-arm-msm, linux-soc,
	Linux ARM, Linux Kernel Mailing List, syadagir, mjavid
In-Reply-To: <20181107003250.5832-2-elder@linaro.org>

On Wed, Nov 7, 2018 at 1:33 AM Alex Elder <elder@linaro.org> wrote:
>
> Add the binding definitions for the "qcom,ipa" and "qcom,rmnet-ipa"
> device tree nodes.
>
> Signed-off-by: Alex Elder <elder@linaro.org>
> ---
>  .../devicetree/bindings/soc/qcom/qcom,ipa.txt | 136 ++++++++++++++++++
>  .../bindings/soc/qcom/qcom,rmnet-ipa.txt      |  15 ++
>  2 files changed, 151 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt
>  create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,rmnet-ipa.txt

I think this should go into bindings/net instead of bindings/soc, since it's
mostly about networking rather than a specific detail of managing the SoC
itself.

> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt
> new file mode 100644
> index 000000000000..d4d3d37df029
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,ipa.txt
> @@ -0,0 +1,136 @@
> +Qualcomm IPA (IP Accelerator) Driver
> +
> +This binding describes the Qualcomm IPA.  The IPA is capable of offloading
> +certain network processing tasks (e.g. filtering, routing, and NAT) from
> +the main processor.  The IPA currently serves only as a network interface,
> +providing access to an LTE network available via a modem.

That doesn't belong into the binding. Say what the hardware can do here,
not what a specific implementation of the driver does at this moment.
The binding should be written in an OS independent way after all.

> +- interrupts-extended:
> +       Specifies the IRQs used by the IPA.  Four cells are required,
> +       specifying: the IPA IRQ; the GSI IRQ; the clock query interrupt
> +       from the modem; and the "ready for stage 2 initialization"
> +       interrupt from the modem.  The first two are hardware IRQs; the
> +       third and fourth are SMP2P input interrupts.

You mean 'four interrupts', not 'four cells' -- each interrupt specifier
already consists of at least two cells (one for the phandle to the
irqchip, plus one or more cells to describe that interrupt).

> +- interconnects:
> +       Specifies the interconnects used by the IPA.  Three cells are
> +       required, specifying:  the path from the IPA to memory; from
> +       IPA to internal (SoC resident) memory; and between the AP
> +       subsystem and IPA for register access.

Same here and in the rest.

      Arnd

^ permalink raw reply

* [PATCH net-next 10/10] selftests: add functionals test for UDP GRO
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

rfc v3 -> v1:
 - use ip route to attach the xdp helper to the veth

rfc v2 -> rfc v3:
 - add missing test program options documentation
 - fix sporatic test failures (receiver faster than sender)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/udpgro.sh         | 148 ++++++++++++++++++
 tools/testing/selftests/net/udpgro_bench.sh   |   8 +-
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 tools/testing/selftests/net/udpgso_bench_rx.c | 123 +++++++++++++--
 tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
 6 files changed, 282 insertions(+), 23 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index b1bba2f60467..eec359895feb 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,7 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
-TEST_PROGS += udpgro_bench.sh
+TEST_PROGS += udpgro_bench.sh udpgro.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh
new file mode 100755
index 000000000000..e94ef8067173
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro.sh
@@ -0,0 +1,148 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro functional tests.
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -1 ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+cfg_veth() {
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+	ip -n "${PEER_NS}" link set veth1 xdp object ../bpf/xdp_dummy.o section xdp_dummy
+}
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	cfg_veth
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} && \
+		echo "ok" || \
+		echo "failed" &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	wait $(jobs -p)
+}
+
+run_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess $2 rx -G -r $3
+}
+
+run_one_nat() {
+	# use 'rx' as separator between sender args and receiver args
+	local addr1 addr2 pid family="" ipt_cmd=ip6tables
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	if [[ ${tx_args} = *-4* ]]; then
+		ipt_cmd=iptables
+		family=-4
+		addr1=192.168.1.1
+		addr2=192.168.1.3/24
+	else
+		addr1=2001:db8::1
+		addr2="2001:db8::3/64 nodad"
+	fi
+
+	cfg_veth
+	ip -netns "${PEER_NS}" addr add dev veth1 ${addr2}
+
+	# fool the GRO engine changing the destination address ...
+	ip netns exec "${PEER_NS}" $ipt_cmd -t nat -I PREROUTING -d ${addr1} -j DNAT --to-destination ${addr2%/*}
+
+	# ... so that GRO will match the UDP_GRO enabled socket, but packets
+	# will land on the 'plain' one
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -G ${family} -b ${addr1} -n 0 &
+	pid=$!
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${family} -b ${addr2%/*} ${rx_args} && \
+		echo "ok" || \
+		echo "failed"&
+
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	kill -INT $pid
+	wait $(jobs -p)
+}
+
+run_nat_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess_nat $2 rx -r $3
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_test "no GRO" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400"
+
+	# explicitly check we are not receiving UDP_SEGMENT cmsg (-S -1)
+	# when GRO does not take place
+	run_test "no GRO chk cmsg" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400 -S -1"
+
+	# the GSO packets are aggregated because:
+	# * veth schedule napi after each xmit
+	# * segmentation happens in BH context, veth napi poll is delayed after
+	#   the transmission of the last segment
+	run_test "GRO" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720"
+	run_test "GRO chk cmsg" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720 -S 1472"
+	run_test "GRO with custom segment size" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720"
+	run_test "GRO with custom segment size cmsg" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv4_args} -M 1 -s 14720 -S 0" "-n 10 -l 1472"
+
+	echo "ipv6"
+	run_test "no GRO" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400"
+	run_test "no GRO chk cmsg" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400 -S -1"
+	run_test "GRO" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520"
+	run_test "GRO chk cmsg" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520 -S 1452"
+	run_test "GRO with custom segment size" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520"
+	run_test "GRO with custom segment size cmsg" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 10 -l 1452"
+}
+
+if [ ! -f ../bpf/xdp_dummy.o ]; then
+	echo "Missing xdp_dummy helper. Build bpf selftest first"
+	exit -1
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+elif [[ $1 == "__subprocess_nat" ]]; then
+	shift
+	run_one_nat $@
+fi
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
index 9ef4b1ee0f76..820bc50f6b68 100755
--- a/tools/testing/selftests/net/udpgro_bench.sh
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -18,7 +18,9 @@ run_one() {
 	# use 'rx' as separator between sender args and receiver args
 	local -r all="$@"
 	local -r tx_args=${all%rx*}
-	local -r rx_args=${all#*rx}
+	local rx_args=${all#*rx}
+
+	[[ "${tx_args}" == *"-4"* ]] && rx_args="${rx_args} -4"
 
 	ip netns add "${PEER_NS}"
 	ip -netns "${PEER_NS}" link set lo up
@@ -51,10 +53,10 @@ run_udp() {
 	local -r args=$@
 
 	echo "udp gso - over veth touching data"
-	run_in_netns ${args} -S rx
+	run_in_netns ${args} -S 0 rx
 
 	echo "udp gso and gro - over veth touching data"
-	run_in_netns ${args} -S rx -G
+	run_in_netns ${args} -S 0 rx -G
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh
index 99e537ab5ad9..0f0628613f81 100755
--- a/tools/testing/selftests/net/udpgso_bench.sh
+++ b/tools/testing/selftests/net/udpgso_bench.sh
@@ -34,7 +34,7 @@ run_udp() {
 	run_in_netns ${args}
 
 	echo "udp gso"
-	run_in_netns ${args} -S
+	run_in_netns ${args} -S 0
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 8f48d7fb32cf..0c960f673324 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -40,6 +40,12 @@ static bool cfg_tcp;
 static bool cfg_verify;
 static bool cfg_read_all;
 static bool cfg_gro_segment;
+static int  cfg_family		= PF_INET6;
+static int  cfg_alen 		= sizeof(struct sockaddr_in6);
+static int  cfg_expected_pkt_nr;
+static int  cfg_expected_pkt_len;
+static int  cfg_expected_gso_size;
+static struct sockaddr_storage cfg_bind_addr;
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -50,6 +56,29 @@ static void sigint_handler(int signum)
 		interrupted = true;
 }
 
+static void setup_sockaddr(int domain, const char *str_addr, void *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (domain) {
+	case PF_INET:
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(cfg_port);
+		if (inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+			error(1, 0, "ipv4 parse error: %s", str_addr);
+		break;
+	case PF_INET6:
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(cfg_port);
+		if (inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+			error(1, 0, "ipv6 parse error: %s", str_addr);
+		break;
+	default:
+		error(1, 0, "illegal domain");
+	}
+}
+
 static unsigned long gettimeofday_ms(void)
 {
 	struct timeval tv;
@@ -83,10 +112,9 @@ static void do_poll(int fd)
 
 static int do_socket(bool do_tcp)
 {
-	struct sockaddr_in6 addr = {0};
 	int fd, val;
 
-	fd = socket(PF_INET6, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
+	fd = socket(cfg_family, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
 	if (fd == -1)
 		error(1, errno, "socket");
 
@@ -97,10 +125,7 @@ static int do_socket(bool do_tcp)
 	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val)))
 		error(1, errno, "setsockopt reuseport");
 
-	addr.sin6_family =	PF_INET6;
-	addr.sin6_port =	htons(cfg_port);
-	addr.sin6_addr =	in6addr_any;
-	if (bind(fd, (void *) &addr, sizeof(addr)))
+	if (bind(fd, (void *)&cfg_bind_addr, cfg_alen))
 		error(1, errno, "bind");
 
 	if (do_tcp) {
@@ -174,52 +199,117 @@ static void do_verify_udp(const char *data, int len)
 	}
 }
 
+static int recv_msg(int fd, char *buf, int len, int *gso_size)
+{
+	char control[CMSG_SPACE(sizeof(uint16_t))] = {0};
+	struct msghdr msg = {0};
+	struct iovec iov = {0};
+	struct cmsghdr *cmsg;
+	uint16_t *gsosizeptr;
+	int ret;
+
+	iov.iov_base = buf;
+	iov.iov_len = len;
+
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+
+	msg.msg_control = control;
+	msg.msg_controllen = sizeof(control);
+
+	*gso_size = -1;
+	ret = recvmsg(fd, &msg, MSG_TRUNC | MSG_DONTWAIT);
+	if (ret != -1) {
+		for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
+		     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+			if (cmsg->cmsg_level == SOL_UDP
+			    && cmsg->cmsg_type == UDP_GRO) {
+				gsosizeptr = (uint16_t *) CMSG_DATA(cmsg);
+				*gso_size = *gsosizeptr;
+				break;
+			}
+		}
+	}
+	return ret;
+}
+
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
 	static char rbuf[ETH_MAX_MTU];
-	int ret, len, budget = 256;
+	int ret, len, gso_size, budget = 256;
 
 	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
-		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		if (!cfg_expected_gso_size)
+			ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		else
+			ret = recv_msg(fd, rbuf, len, &gso_size);
 		if (ret == -1 && errno == EAGAIN)
-			return;
+			break;
 		if (ret == -1)
 			error(1, errno, "recv");
+		if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
+			error(1, 0, "recv: bad packet len, got %d,"
+			      " expected %d\n", ret, cfg_expected_pkt_len);
 		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
 			do_verify_udp(rbuf, ret);
 		}
+		if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+			error(1, 0, "recv: bad gso size, got %d, expected %d "
+			      "(-1 == no gso cmsg))\n", gso_size,
+			      cfg_expected_gso_size);
 
 		packets++;
 		bytes += ret;
+		if (cfg_expected_pkt_nr && packets >= cfg_expected_pkt_nr)
+			break;
 	}
 }
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-b addr] [-p port] [-l pktlen] [-n packetnr] [-S gsosize]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "Gp:rtv")) != -1) {
+	/* bind to any by default */
+	setup_sockaddr(PF_INET6, "::", &cfg_bind_addr);
+	while ((c = getopt(argc, argv, "4b:Gl:n:p:rS:tv")) != -1) {
 		switch (c) {
+		case '4':
+			cfg_family = PF_INET;
+			cfg_alen = sizeof(struct sockaddr_in);
+			setup_sockaddr(PF_INET, "0.0.0.0", &cfg_bind_addr);
+			break;
+		case 'b':
+			setup_sockaddr(cfg_family, optarg, &cfg_bind_addr);
+			break;
 		case 'G':
 			cfg_gro_segment = true;
 			break;
+		case 'l':
+			cfg_expected_pkt_len = strtoul(optarg, NULL, 0);
+			break;
+		case 'n':
+			cfg_expected_pkt_nr = strtoul(optarg, NULL, 0);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
 		case 'r':
 			cfg_read_all = true;
 			break;
+		case 'S':
+			cfg_expected_gso_size = strtol(optarg, NULL, 0);
+			break;
 		case 't':
 			cfg_tcp = true;
 			break;
@@ -240,7 +330,7 @@ static void parse_opts(int argc, char **argv)
 static void do_recv(void)
 {
 	unsigned long tnow, treport;
-	int fd;
+	int fd, loop = 0;
 
 	fd = do_socket(cfg_tcp);
 
@@ -252,6 +342,11 @@ static void do_recv(void)
 
 	treport = gettimeofday_ms() + 1000;
 	do {
+		/* force termination after the second poll(); this cope both
+		 * with sender slower than receiver and missing packet errors
+		 */
+		if (cfg_expected_pkt_nr && loop++)
+			interrupted = true;
 		do_poll(fd);
 
 		if (cfg_tcp)
@@ -272,6 +367,10 @@ static void do_recv(void)
 
 	} while (!interrupted);
 
+	if (cfg_expected_pkt_nr && (packets != cfg_expected_pkt_nr))
+		error(1, 0, "wrong packet number! got %ld, expected %d\n",
+		      packets, cfg_expected_pkt_nr);
+
 	if (close(fd))
 		error(1, errno, "close");
 }
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
index e821564053cf..4074538b5df5 100644
--- a/tools/testing/selftests/net/udpgso_bench_tx.c
+++ b/tools/testing/selftests/net/udpgso_bench_tx.c
@@ -52,6 +52,8 @@ static bool	cfg_segment;
 static bool	cfg_sendmmsg;
 static bool	cfg_tcp;
 static bool	cfg_zerocopy;
+static int	cfg_msg_nr;
+static uint16_t	cfg_gso_size;
 
 static socklen_t cfg_alen;
 static struct sockaddr_storage cfg_dst_addr;
@@ -205,14 +207,14 @@ static void send_udp_segment_cmsg(struct cmsghdr *cm)
 
 	cm->cmsg_level = SOL_UDP;
 	cm->cmsg_type = UDP_SEGMENT;
-	cm->cmsg_len = CMSG_LEN(sizeof(cfg_mss));
+	cm->cmsg_len = CMSG_LEN(sizeof(cfg_gso_size));
 	valp = (void *)CMSG_DATA(cm);
-	*valp = cfg_mss;
+	*valp = cfg_gso_size;
 }
 
 static int send_udp_segment(int fd, char *data)
 {
-	char control[CMSG_SPACE(sizeof(cfg_mss))] = {0};
+	char control[CMSG_SPACE(sizeof(cfg_gso_size))] = {0};
 	struct msghdr msg = {0};
 	struct iovec iov = {0};
 	int ret;
@@ -241,7 +243,7 @@ static int send_udp_segment(int fd, char *data)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
+	error(1, 0, "Usage: %s [-46cmtuz] [-C cpu] [-D dst ip] [-l secs] [-m messagenr] [-p port] [-s sendsize] [-S gsosize]",
 		    filepath);
 }
 
@@ -250,7 +252,7 @@ static void parse_opts(int argc, char **argv)
 	int max_len, hdrlen;
 	int c;
 
-	while ((c = getopt(argc, argv, "46cC:D:l:mp:s:Stuz")) != -1) {
+	while ((c = getopt(argc, argv, "46cC:D:l:mM:p:s:S:tuz")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -279,6 +281,9 @@ static void parse_opts(int argc, char **argv)
 		case 'm':
 			cfg_sendmmsg = true;
 			break;
+		case 'M':
+			cfg_msg_nr = strtoul(optarg, NULL, 10);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
@@ -286,6 +291,7 @@ static void parse_opts(int argc, char **argv)
 			cfg_payload_len = strtoul(optarg, NULL, 0);
 			break;
 		case 'S':
+			cfg_gso_size = strtoul(optarg, NULL, 0);
 			cfg_segment = true;
 			break;
 		case 't':
@@ -317,6 +323,8 @@ static void parse_opts(int argc, char **argv)
 
 	cfg_mss = ETH_DATA_LEN - hdrlen;
 	max_len = ETH_MAX_MTU - hdrlen;
+	if (!cfg_gso_size)
+		cfg_gso_size = cfg_mss;
 
 	if (cfg_payload_len > max_len)
 		error(1, 0, "payload length %u exceeds max %u",
@@ -392,10 +400,12 @@ int main(int argc, char **argv)
 		else
 			num_sends += send_udp(fd, buf[i]);
 		num_msgs++;
-
 		if (cfg_zerocopy && ((num_msgs & 0xF) == 0))
 			flush_zerocopy(fd);
 
+		if (cfg_msg_nr && num_msgs >= cfg_msg_nr)
+			break;
+
 		tnow = gettimeofday_ms();
 		if (tnow > treport) {
 			fprintf(stderr,
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 09/10] selftests: add some benchmark for UDP GRO
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

Run on top of veth pair, using a dummy XDP program to enable the GRO.

 rfc v3 -> v1:
  - use ip route to attach the xdp helper to the veth

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile        |  1 +
 tools/testing/selftests/net/udpgro_bench.sh | 93 +++++++++++++++++++++
 2 files changed, 94 insertions(+)
 create mode 100755 tools/testing/selftests/net/udpgro_bench.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 256d82d5fa87..b1bba2f60467 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,6 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
+TEST_PROGS += udpgro_bench.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
new file mode 100755
index 000000000000..9ef4b1ee0f76
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -0,0 +1,93 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro benchmarks
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -INT ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+
+	ip -n "${PEER_NS}" link set veth1 xdp object ../bpf/xdp_dummy.o section xdp_dummy
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r &
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -t ${rx_args} -r &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+}
+
+run_in_netns() {
+	local -r args=$@
+
+	./in_netns.sh $0 __subprocess ${args}
+}
+
+run_udp() {
+	local -r args=$@
+
+	echo "udp gso - over veth touching data"
+	run_in_netns ${args} -S rx
+
+	echo "udp gso and gro - over veth touching data"
+	run_in_netns ${args} -S rx -G
+}
+
+run_tcp() {
+	local -r args=$@
+
+	echo "tcp - over veth touching data"
+	run_in_netns ${args} -t rx
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv4_args}"
+
+	echo "ipv6"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv6_args}"
+}
+
+if [ ! -f ../bpf/xdp_dummy.o ]; then
+	echo "Missing xdp_dummy helper. Build bpf selftest first"
+	exit -1
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+else
+	run_in_netns $@
+fi
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 08/10] selftests: add dummy xdp test helper
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

This trivial XDP program does nothing, but will be used by the
next patch to test the GRO path in a net namespace, leveraging
the veth XDP implementation.

It's added here, despite its 'net' usage, to avoid the duplication
of the llc-related makefile boilerplate.

rfc v3 -> v1:
 - move the helper implementation into the bpf directory, don't
   touch udpgso_bench_rx

rfc v2 -> rfc v3:
 - move 'x' option handling here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/bpf/Makefile    |  3 ++-
 tools/testing/selftests/bpf/xdp_dummy.c | 13 +++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/xdp_dummy.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e39dfb4e7970..4a54236475ae 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -37,7 +37,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
 	get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
 	test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-	test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
+	test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
+	xdp_dummy.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/xdp_dummy.c b/tools/testing/selftests/bpf/xdp_dummy.c
new file mode 100644
index 000000000000..43b0ef1001ed
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_dummy.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define KBUILD_MODNAME "xdp_dummy"
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("xdp_dummy")
+int xdp_dummy_prog(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 07/10] selftests: add GRO support to udp bench rx program
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

And fix a couple of buglets (port option processing,
clean termination on SIGINT). This is preparatory work
for GRO tests.

rfc v2 -> rfc v3:
 - use ETH_MAX_MTU

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/udpgso_bench_rx.c | 37 +++++++++++++++----
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 727cf67a3f75..8f48d7fb32cf 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -31,9 +31,15 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#ifndef UDP_GRO
+#define UDP_GRO		104
+#endif
+
 static int  cfg_port		= 8000;
 static bool cfg_tcp;
 static bool cfg_verify;
+static bool cfg_read_all;
+static bool cfg_gro_segment;
 
 static bool interrupted;
 static unsigned long packets, bytes;
@@ -63,6 +69,8 @@ static void do_poll(int fd)
 
 	do {
 		ret = poll(&pfd, 1, 10);
+		if (interrupted)
+			break;
 		if (ret == -1)
 			error(1, errno, "poll");
 		if (ret == 0)
@@ -70,7 +78,7 @@ static void do_poll(int fd)
 		if (pfd.revents != POLLIN)
 			error(1, errno, "poll: 0x%x expected 0x%x\n",
 					pfd.revents, POLLIN);
-	} while (!ret && !interrupted);
+	} while (!ret);
 }
 
 static int do_socket(bool do_tcp)
@@ -102,6 +110,8 @@ static int do_socket(bool do_tcp)
 			error(1, errno, "listen");
 
 		do_poll(accept_fd);
+		if (interrupted)
+			exit(0);
 
 		fd = accept(accept_fd, NULL, NULL);
 		if (fd == -1)
@@ -167,10 +177,10 @@ static void do_verify_udp(const char *data, int len)
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
-	static char rbuf[ETH_DATA_LEN];
+	static char rbuf[ETH_MAX_MTU];
 	int ret, len, budget = 256;
 
-	len = cfg_verify ? sizeof(rbuf) : 0;
+	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
 		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
@@ -178,7 +188,7 @@ static void do_flush_udp(int fd)
 			return;
 		if (ret == -1)
 			error(1, errno, "recv");
-		if (len) {
+		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
@@ -192,23 +202,30 @@ static void do_flush_udp(int fd)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-tv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "ptv")) != -1) {
+	while ((c = getopt(argc, argv, "Gp:rtv")) != -1) {
 		switch (c) {
+		case 'G':
+			cfg_gro_segment = true;
+			break;
 		case 'p':
-			cfg_port = htons(strtoul(optarg, NULL, 0));
+			cfg_port = strtoul(optarg, NULL, 0);
+			break;
+		case 'r':
+			cfg_read_all = true;
 			break;
 		case 't':
 			cfg_tcp = true;
 			break;
 		case 'v':
 			cfg_verify = true;
+			cfg_read_all = true;
 			break;
 		}
 	}
@@ -227,6 +244,12 @@ static void do_recv(void)
 
 	fd = do_socket(cfg_tcp);
 
+	if (cfg_gro_segment && !cfg_tcp) {
+		int val = 1;
+		if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &val, sizeof(val)))
+			error(1, errno, "setsockopt UDP_GRO");
+	}
+
 	treport = gettimeofday_ms() + 1000;
 	do {
 		do_poll(fd);
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 06/10] udp: cope with UDP GRO packet misdirection
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

In some scenarios, the GRO engine can assemble an UDP GRO packet
that ultimately lands on a non GRO-enabled socket.
This patch tries to address the issue explicitly checking for the UDP
socket features before enqueuing the packet, and eventually segmenting
the unexpected GRO packet, as needed.

We must also cope with re-insertion requests: after segmentation the
UDP code calls the helper introduced by the previous patches, as needed.

Segmentation is performed by a common helper, which takes care of
updating socket and protocol stats is case of failure.

rfc v3 -> v1
 - fix compile issues with rxrpc
 - when gso_segment returns NULL, treat is as an error
 - added 'ipv4' argument to udp_rcv_segment()

rfc v2 -> rfc v3
 - moved udp_rcv_segment() into net/udp.h, account errors to socket
   and ns, always return NULL or segs list

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h |  6 ++++++
 include/net/udp.h   | 45 +++++++++++++++++++++++++++++++++++++--------
 net/ipv4/udp.c      | 23 ++++++++++++++++++++++-
 net/ipv6/udp.c      | 24 +++++++++++++++++++++++-
 4 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index e23d5024f42f..0a9c54e76305 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -132,6 +132,12 @@ static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
 	}
 }
 
+static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
+{
+	return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
+	       skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/include/net/udp.h b/include/net/udp.h
index 9e82cb391dea..90a2954962eb 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -406,17 +406,24 @@ static inline int copy_linear_skb(struct sk_buff *skb, int len, int off,
 } while(0)
 
 #if IS_ENABLED(CONFIG_IPV6)
-#define __UDPX_INC_STATS(sk, field)					\
-do {									\
-	if ((sk)->sk_family == AF_INET)					\
-		__UDP_INC_STATS(sock_net(sk), field, 0);		\
-	else								\
-		__UDP6_INC_STATS(sock_net(sk), field, 0);		\
-} while (0)
+#define __UDPX_MIB(sk, ipv4)						\
+({									\
+	ipv4 ? (IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :	\
+				 sock_net(sk)->mib.udp_statistics) :	\
+		(IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_stats_in6 :	\
+				 sock_net(sk)->mib.udp_stats_in6);	\
+})
 #else
-#define __UDPX_INC_STATS(sk, field) __UDP_INC_STATS(sock_net(sk), field, 0)
+#define __UDPX_MIB(sk, ipv4)						\
+({									\
+	IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :		\
+			 sock_net(sk)->mib.udp_statistics;		\
+})
 #endif
 
+#define __UDPX_INC_STATS(sk, field) \
+	__SNMP_INC_STATS(__UDPX_MIB(sk, (sk)->sk_family == AF_INET), field)
+
 #ifdef CONFIG_PROC_FS
 struct udp_seq_afinfo {
 	sa_family_t			family;
@@ -450,4 +457,26 @@ DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void);
 #endif
 
+static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
+					      struct sk_buff *skb, bool ipv4)
+{
+	struct sk_buff *segs;
+
+	/* the GSO CB lays after the UDP one, no need to save and restore any
+	 * CB fragment
+	 */
+	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
+	if (unlikely(IS_ERR_OR_NULL(segs))) {
+		int segs_nr = skb_shinfo(skb)->gso_segs;
+
+		atomic_add(segs_nr, &sk->sk_drops);
+		SNMP_ADD_STATS(__UDPX_MIB(sk, ipv4), UDP_MIB_INERRORS, segs_nr);
+		kfree_skb(skb);
+		return NULL;
+	}
+
+	consume_skb(skb);
+	return segs;
+}
+
 #endif	/* _UDP_H */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 725ece9d78af..73926352960c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1909,7 +1909,7 @@ EXPORT_SYMBOL(udp_encap_enable);
  * Note that in the success and error cases, the skb is assumed to
  * have either been requeued or freed.
  */
-static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
 	int is_udplite = IS_UDPLITE(sk);
@@ -2012,6 +2012,27 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return -1;
 }
 
+static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct sk_buff *next, *segs;
+	int ret;
+
+	if (likely(!udp_unexpected_gso(sk, skb)))
+		return udp_queue_rcv_one_skb(sk, skb);
+
+	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_SGO_CB_OFFSET);
+	__skb_push(skb, -skb_mac_offset(skb));
+	segs = udp_rcv_segment(sk, skb, true);
+	for (skb = segs; skb; skb = next) {
+		next = skb->next;
+		__skb_pull(skb, skb_transport_offset(skb));
+		ret = udp_queue_rcv_one_skb(sk, skb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(skb->dev), skb, -ret);
+	}
+	return 0;
+}
+
 /* For TCP sockets, sk_rx_dst is protected by socket lock
  * For UDP, we use xchg() to guard against concurrent changes.
  */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8e76e719305c..f7666e34fec9 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -558,7 +558,7 @@ void udpv6_encap_enable(void)
 }
 EXPORT_SYMBOL(udpv6_encap_enable);
 
-static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
 	int is_udplite = IS_UDPLITE(sk);
@@ -641,6 +641,28 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return -1;
 }
 
+static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct sk_buff *next, *segs;
+	int ret;
+
+	if (likely(!udp_unexpected_gso(sk, skb)))
+		return udpv6_queue_rcv_one_skb(sk, skb);
+
+	__skb_push(skb, -skb_mac_offset(skb));
+	segs = udp_rcv_segment(sk, skb, false);
+	for (skb = segs; skb; skb = next) {
+		next = skb->next;
+		__skb_pull(skb, skb_transport_offset(skb));
+
+		ret = udpv6_queue_rcv_one_skb(sk, skb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,
+						 true);
+	}
+	return 0;
+}
+
 static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 				   __be16 loc_port, const struct in6_addr *loc_addr,
 				   __be16 rmt_port, const struct in6_addr *rmt_addr,
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 05/10] ipv6: factor out protocol delivery helper
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

So that we can re-use it at the UDP level in the next patch

rfc v3 -> v1:
 - add the helper declaration into the ipv6 header

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/ipv6.h   |  2 ++
 net/ipv6/ip6_input.c | 28 ++++++++++++++++------------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 829650540780..daf80863d3a5 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -975,6 +975,8 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb);
 int ip6_forward(struct sk_buff *skb);
 int ip6_input(struct sk_buff *skb);
 int ip6_mc_input(struct sk_buff *skb);
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
+			      bool have_final);
 
 int __ip6_local_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 int ip6_local_out(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 96577e742afd..3065226bdc57 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -319,28 +319,26 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
 /*
  *	Deliver the packet to the host
  */
-
-
-static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
+			      bool have_final)
 {
 	const struct inet6_protocol *ipprot;
 	struct inet6_dev *idev;
 	unsigned int nhoff;
-	int nexthdr;
 	bool raw;
-	bool have_final = false;
 
 	/*
 	 *	Parse extension headers
 	 */
 
-	rcu_read_lock();
 resubmit:
 	idev = ip6_dst_idev(skb_dst(skb));
-	if (!pskb_pull(skb, skb_transport_offset(skb)))
-		goto discard;
 	nhoff = IP6CB(skb)->nhoff;
-	nexthdr = skb_network_header(skb)[nhoff];
+	if (!have_final) {
+		if (!pskb_pull(skb, skb_transport_offset(skb)))
+			goto discard;
+		nexthdr = skb_network_header(skb)[nhoff];
+	}
 
 resubmit_final:
 	raw = raw6_local_deliver(skb, nexthdr);
@@ -411,13 +409,19 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
 			consume_skb(skb);
 		}
 	}
-	rcu_read_unlock();
-	return 0;
+	return;
 
 discard:
 	__IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
-	rcu_read_unlock();
 	kfree_skb(skb);
+}
+
+static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	rcu_read_lock();
+	ip6_protocol_deliver_rcu(net, skb, 0, false);
+	rcu_read_unlock();
+
 	return 0;
 }
 
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 04/10] ip: factor out protocol delivery helper
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

So that we can re-use it at the UDP level in a later patch

rfc v3 -> v1
 - add the helper declaration into the ip header

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/ip.h    |  1 +
 net/ipv4/ip_input.c | 73 ++++++++++++++++++++++-----------------------
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 72593e171d14..cece69b5c531 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -155,6 +155,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 void ip_list_rcv(struct list_head *head, struct packet_type *pt,
 		 struct net_device *orig_dev);
 int ip_local_deliver(struct sk_buff *skb);
+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
 int ip_mr_input(struct sk_buff *skb);
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb);
 int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 35a786c0aaa0..72250b4e466d 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -188,51 +188,50 @@ bool ip_call_ra_chain(struct sk_buff *skb)
 	return false;
 }
 
-static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int protocol)
 {
-	__skb_pull(skb, skb_network_header_len(skb));
-
-	rcu_read_lock();
-	{
-		int protocol = ip_hdr(skb)->protocol;
-		const struct net_protocol *ipprot;
-		int raw;
+	const struct net_protocol *ipprot;
+	int raw, ret;
 
-	resubmit:
-		raw = raw_local_deliver(skb, protocol);
+resubmit:
+	raw = raw_local_deliver(skb, protocol);
 
-		ipprot = rcu_dereference(inet_protos[protocol]);
-		if (ipprot) {
-			int ret;
-
-			if (!ipprot->no_policy) {
-				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					kfree_skb(skb);
-					goto out;
-				}
-				nf_reset(skb);
+	ipprot = rcu_dereference(inet_protos[protocol]);
+	if (ipprot) {
+		if (!ipprot->no_policy) {
+			if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				kfree_skb(skb);
+				return;
 			}
-			ret = ipprot->handler(skb);
-			if (ret < 0) {
-				protocol = -ret;
-				goto resubmit;
+			nf_reset(skb);
+		}
+		ret = ipprot->handler(skb);
+		if (ret < 0) {
+			protocol = -ret;
+			goto resubmit;
+		}
+		__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+	} else {
+		if (!raw) {
+			if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
+				icmp_send(skb, ICMP_DEST_UNREACH,
+					  ICMP_PROT_UNREACH, 0);
 			}
-			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+			kfree_skb(skb);
 		} else {
-			if (!raw) {
-				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
-					icmp_send(skb, ICMP_DEST_UNREACH,
-						  ICMP_PROT_UNREACH, 0);
-				}
-				kfree_skb(skb);
-			} else {
-				__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
-				consume_skb(skb);
-			}
+			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
+			consume_skb(skb);
 		}
 	}
- out:
+}
+
+static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	__skb_pull(skb, skb_network_header_len(skb));
+
+	rcu_read_lock();
+	ip_protocol_deliver_rcu(net, skb, ip_hdr(skb)->protocol);
 	rcu_read_unlock();
 
 	return 0;
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next 03/10] udp: add support for UDP_GRO cmsg
From: Paolo Abeni @ 2018-11-07 11:38 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
datagram size. User-space can use such info to compute the original
packets layout.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h | 11 +++++++++++
 net/ipv4/udp.c      |  4 ++++
 net/ipv6/udp.c      |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index f613b329852e..e23d5024f42f 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -121,6 +121,17 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
 	return udp_sk(sk)->no_check6_rx;
 }
 
+static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
+				 struct sk_buff *skb)
+{
+	int gso_size;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
+		gso_size = skb_shinfo(skb)->gso_size;
+		put_cmsg(msg, SOL_UDP, UDP_GRO, sizeof(gso_size), &gso_size);
+	}
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0d447fd194c5..725ece9d78af 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1714,6 +1714,10 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 		memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
 		*addr_len = sizeof(*sin);
 	}
+
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (inet->cmsg_flags)
 		ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index fc0ce6c59ebb..8e76e719305c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -421,6 +421,9 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		*addr_len = sizeof(*sin6);
 	}
 
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (np->rxopt.all)
 		ip6_datagram_recv_common_ctl(sk, msg, skb);
 
-- 
2.17.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox