* Re: [PATCH] net: nicvf: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2016-12-10 22:32 UTC (permalink / raw)
To: tremyfr; +Cc: sgoutham, rric, netdev, linux-kernel
In-Reply-To: <1481378448-22278-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sat, 10 Dec 2016 15:00:48 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH 5/5] net: ethernet: ti: cpsw: sync rates for channels in dual emac mode
From: David Miller @ 2016-12-10 22:30 UTC (permalink / raw)
To: ivan.khoronzhuk
Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1481372630-14914-6-git-send-email-ivan.khoronzhuk@linaro.org>
From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Sat, 10 Dec 2016 14:23:50 +0200
> The channels are common for both ndevs in dual emac mode. Hence, keep
> their rates in sync.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Applied.
^ permalink raw reply
* Re: [PATCH 4/5] net: ethernet: ti: cpsw: re-split res only when speed is changed
From: David Miller @ 2016-12-10 22:30 UTC (permalink / raw)
To: ivan.khoronzhuk
Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1481372630-14914-5-git-send-email-ivan.khoronzhuk@linaro.org>
From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Sat, 10 Dec 2016 14:23:49 +0200
> Don't re-split res in the following cases:
> - speed of phys is not changed
> - speed of phys is changed and no rate limited channels
> - speed of phys is changed and all channels are rate limited
> - phy is unlinked while dev is open
> - phy is linked back but speed is not changed
>
> The maximum speed is the sum of the "linked" phys, thus res are split
> taking into account two interfaces, both for dual emac mode and for
> switch mode.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Applied.
^ permalink raw reply
* Re: [PATCH 3/5] net: ethernet: ti: cpsw: combine budget and weight split and check
From: David Miller @ 2016-12-10 22:30 UTC (permalink / raw)
To: ivan.khoronzhuk
Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1481372630-14914-4-git-send-email-ivan.khoronzhuk@linaro.org>
From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Sat, 10 Dec 2016 14:23:48 +0200
> Re-split weight along with budget. It simplifies the code a little
> and updates state after every rate change. It is also necessary
> to move the argument checks to this combined function. Replace
> the maximum rate check for an interface with the maximum possible rate.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Applied.
^ permalink raw reply
* Re: [PATCH 2/5] net: ethernet: ti: cpsw: don't start queue twice
From: David Miller @ 2016-12-10 22:30 UTC (permalink / raw)
To: ivan.khoronzhuk
Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1481372630-14914-3-git-send-email-ivan.khoronzhuk@linaro.org>
From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Sat, 10 Dec 2016 14:23:47 +0200
> No need to start queues after cpsw is started as it will be done
> in cpsw_adjust_link(), after phy connection.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Applied.
^ permalink raw reply
* Re: [PATCH 1/5] net: ethernet: ti: cpsw: improve re-split policy
From: David Miller @ 2016-12-10 22:30 UTC (permalink / raw)
To: ivan.khoronzhuk
Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1481372630-14914-1-git-send-email-ivan.khoronzhuk@linaro.org>
From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Sat, 10 Dec 2016 14:23:45 +0200
> These patches add several simplifications and improvements to set
> maximum rate for channels taking into account switch and dual emac mode.
>
> Don't re-split res in the following cases:
> - speed of phys is not changed
> - speed of phys is changed and no rate limited channels
> - speed of phys is changed and all channels are rate limited
> - phy is unlinked while dev is open
> - phy is linked back but speed is not changed
>
> The maximum speed is the sum of the "linked" phys, thus res are split
> taking into account two interfaces, both for dual emac mode and for
> switch mode.
>
> Tested on am572x
>
> Based on net-next/master
Applied.
^ permalink raw reply
* Re: [PATCH net-next] net: mvneta: select GENERIC_ALLOCATOR
From: David Miller @ 2016-12-10 22:28 UTC (permalink / raw)
To: arnd; +Cc: gregory.clement, mw, f.fainelli, netdev, linux-kernel
In-Reply-To: <20161210103844.1465583-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Sat, 10 Dec 2016 11:38:32 +0100
> We previously relied on GENERIC_ALLOCATOR to be selected by CONFIG_ARM,
> but now we can compile-test the driver on other architectures that
> don't select it:
>
> drivers/net/built-in.o: In function `mvneta_bm_remove':
> mvneta_bm.c:(.text+0x4ee35): undefined reference to `gen_pool_free'
>
> This adds an explicit select for the part of the driver that has
> the dependency.
>
> Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied.
^ permalink raw reply
* Re: [PATCH] net: socket: removed an unnecessary newline
From: David Miller @ 2016-12-10 22:27 UTC (permalink / raw)
To: kushwaha.a; +Cc: sergei.shtylyov, netdev, akkushwaha9896
In-Reply-To: <1481348687-23904-1-git-send-email-kushwaha.a@samsung.com>
From: kushwaha.a@samsung.com
Date: Sat, 10 Dec 2016 11:14:47 +0530
> From: Amit Kushwaha <kushwaha.a@samsung.com>
>
> This patch removes a newline which was added
> in socket.c file in net-next
>
> Signed-off-by: Amit Kushwaha <kushwaha.a@samsung.com>
Applied.
^ permalink raw reply
* Re: [Patch net-next] netlink: use blocking notifier
From: David Miller @ 2016-12-10 22:26 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: netdev
In-Reply-To: <1481346661-25380-1-git-send-email-xiyou.wangcong@gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Fri, 9 Dec 2016 21:10:59 -0800
> netlink_chain is called in ->release(), which is apparently
> a process context, so we don't have to use an atomic notifier
> here.
>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Applied.
^ permalink raw reply
* [Patch net] e1000: use disable_hardirq() for e1000_netpoll()
From: Cong Wang @ 2016-12-10 22:22 UTC (permalink / raw)
To: netdev; +Cc: sd, davej, Cong Wang, Peter Zijlstra (Intel), Jeff Kirsher
In commit 02cea3958664 ("genirq: Provide disable_hardirq()")
Peter introduced disable_hardirq() for netpoll, but e1000 was never
converted to use it.
This patch changes disable_irq() to disable_hardirq() for e1000 and e1000e.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
drivers/net/ethernet/intel/e1000e/netdev.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f42129d..164c3bb 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5257,8 +5257,8 @@ static void e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);

-	disable_irq(adapter->pdev->irq);
-	e1000_intr(adapter->pdev->irq, netdev);
+	if (disable_hardirq(adapter->pdev->irq))
+		e1000_intr(adapter->pdev->irq, netdev);
 	enable_irq(adapter->pdev->irq);
 }
 #endif
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..9a0be77 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6762,13 +6762,13 @@ static void e1000_netpoll(struct net_device *netdev)
 		e1000_intr_msix(adapter->pdev->irq, netdev);
 		break;
 	case E1000E_INT_MODE_MSI:
-		disable_irq(adapter->pdev->irq);
-		e1000_intr_msi(adapter->pdev->irq, netdev);
+		if (disable_hardirq(adapter->pdev->irq))
+			e1000_intr_msi(adapter->pdev->irq, netdev);
 		enable_irq(adapter->pdev->irq);
 		break;
 	default: /* E1000E_INT_MODE_LEGACY */
-		disable_irq(adapter->pdev->irq);
-		e1000_intr(adapter->pdev->irq, netdev);
+		if (disable_hardirq(adapter->pdev->irq))
+			e1000_intr(adapter->pdev->irq, netdev);
 		enable_irq(adapter->pdev->irq);
 		break;
 	}
--
2.5.5
^ permalink raw reply related
* Re: Misalignment, MIPS, and ip_hdr(skb)->version
From: Dan Lüdtke @ 2016-12-10 22:18 UTC (permalink / raw)
To: Daniel Kahn Gillmor
Cc: linux-mips, Netdev, LKML, Hannes Frederic Sowa,
WireGuard mailing list
In-Reply-To: <87vauvhwdu.fsf@alice.fifthhorseman.net>
> On 8 Dec 2016, at 05:34, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
>
> On Wed 2016-12-07 19:30:34 -0500, Hannes Frederic Sowa wrote:
>> Your custom protocol should be designed in a way you get an aligned ip
>> header. Most protocols of the IETF follow this mantra and it is always
>> possible to e.g. pad options so you end up on aligned boundaries for the
>> next header.
>
> fwiw, i'm not convinced that "most protocols of the IETF follow this
> mantra". we've had multiple discussions in different protocol groups
> about shaving or bloating by a few bytes here or there in different
> protocols, and i don't think anyone has brought up memory alignment as
> an argument in any of the discussions i've followed.
>
If the trade-off is between 1 padding byte with 2-byte alignment and 3 padding bytes with 4-byte alignment, I would definitely opt for the 3 padding bytes. I know what that waste feels like to a protocol designer, but I think it is worth it. Maybe the padding/reserved bytes will be useful some day for an additional feature.
I remember alignment being discussed and taken very seriously in 6man a couple of times. Often, though, protocol designers did align without much discussion. Implementing unaligned protocols is a pain I've experienced first hand.
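To make the trade-off concrete, here is a minimal sketch in C of the padding arithmetic. It is not taken from any of the protocols discussed in this thread; the helper name pad_to_alignment() is hypothetical, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of padding bytes to append after a header
 * of hdr_len bytes so that whatever follows starts on an align-byte
 * boundary. align must be a non-zero power of two. */
static size_t pad_to_alignment(size_t hdr_len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (align - (hdr_len & (align - 1))) & (align - 1);
}
```

For a 13-byte header this yields 1 padding byte for 2-byte alignment and 3 padding bytes for 4-byte alignment, which is the choice described above.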
^ permalink raw reply
* Re: [iproute2 net-next 1/8] lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
From: David Ahern @ 2016-12-10 22:15 UTC (permalink / raw)
To: Daniel Borkmann, netdev, stephen
In-Reply-To: <584C71F0.3000203@iogearbox.net>
On 12/10/16 2:21 PM, Daniel Borkmann wrote:
>>
>> Please name it bpf_prog_create() then, it would be consistent to
>> bpf_map_create() and shorter as well.
>
> Sorry, lack of coffee, scratch that.
>
> Can't the current bpf_prog_attach() stay as is, and you name the above new
> functions bpf_prog_attach_fd() and bpf_prog_detach_fd()? I think that would
> be better.
ok. no concerns about consistency with libbpf in the kernel repo?
Seems like making iproute2 and the kernel version the same will allow samples and code to move between them much easier.
^ permalink raw reply
* Re: [iproute2 net-next 3/8] Add libbpf.h header with BPF_ macros
From: Daniel Borkmann @ 2016-12-10 21:27 UTC (permalink / raw)
To: David Ahern, netdev, stephen
In-Reply-To: <1481401934-4026-4-git-send-email-dsa@cumulusnetworks.com>
On 12/10/2016 09:32 PM, David Ahern wrote:
> Based on version in kernel repo, samples/bpf/libbpf.h
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> include/libbpf.h | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 184 insertions(+)
> create mode 100644 include/libbpf.h
>
> diff --git a/include/libbpf.h b/include/libbpf.h
> new file mode 100644
> index 000000000000..37951f509a10
> --- /dev/null
> +++ b/include/libbpf.h
> @@ -0,0 +1,184 @@
> +/* eBPF mini library */
> +#ifndef __LIBBPF_H
> +#define __LIBBPF_H
Creating include/libbpf.h is a bit confusing, since all the function
declarations of the current bpf lib code are located in include/bpf_util.h.
Please add all this there as well instead of creating a new file.
^ permalink raw reply
* Re: [PATCH] sh_eth: add wake-on-lan support via magic packet
From: Sergei Shtylyov @ 2016-12-10 21:25 UTC (permalink / raw)
To: Niklas Söderlund; +Cc: Simon Horman, netdev, linux-renesas-soc
In-Reply-To: <20161208145635.GH21834@bigcity.dyn.berto.se>
Hello!
On 12/08/2016 05:56 PM, Niklas Söderlund wrote:
>> You only enable the WOL support for the R-Car gen2 chips but never say that
>> explicitly, neither in the subject nor here.
>>
>>> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
>>> ---
>>> drivers/net/ethernet/renesas/sh_eth.c | 120 +++++++++++++++++++++++++++++++---
>>> drivers/net/ethernet/renesas/sh_eth.h | 4 ++
>>> 2 files changed, 116 insertions(+), 8 deletions(-)
>>
>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
>>> index 05b0dc5..3974046 100644
>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
[...]
>>> @@ -1657,6 +1658,10 @@ static irqreturn_t sh_eth_interrupt(int irq, void *netdev)
>>> goto out;
>>>
>>> if (!likely(mdp->irq_enabled)) {
>>
>> Oops, I guess unlikely(!mdp->irq_enabled) was meant here...
>
> I can correct this in a separate patch if you wish.
I'll look into this myself, I think.
>> + /* Handle MagicPacket interrupt */
>> + if (sh_eth_read(ndev, ECSR) & ECSR_MPD)
What if it wasn't enabled ATM?
[...]
>>> @@ -3111,6 +3150,10 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>>> if (ret)
>>> goto out_napi_del;
>>>
>>> + mdp->wol_enabled = false;
>>
>> No need, the '*mdp' was kzalloc'ed.
>
> OK, I prefer to set it explicitly for easier reading of the code. But if
> you wish I will remove this in v2.
Yes, remove it please.
>>> @@ -3150,15 +3193,71 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
>>>
>>> #ifdef CONFIG_PM
>>> #ifdef CONFIG_PM_SLEEP
>>> +static int sh_eth_wol_setup(struct net_device *ndev)
>>> +{
>>> + struct sh_eth_private *mdp = netdev_priv(ndev);
>>> +
>>> + /* Only allow ECI interrupts */
>>> + mdp->irq_enabled = false;
>>
>> Why 'false' if you enable IRQs below?
>
> I mask all interrupts except MagicPacket (ECSIPR_MPDIP) interrupts from
> the ECI (DMAC_M_ECI) and by setting irq_enabled to false the interrupt
> handler will only ack any residual interrupt.
I don't see where it ack's anything, it just clears EESIPR and returns in
this case.
> This is how it's done in
> other parts of the driver when disabling interrupts.
Not in all parts of the driver that disable EESIPR interrupts... I must
confess that I never liked that 'mdp->irq_enabled' flag and still suspect we
can get things done without it... I need to look at this code again, sigh...
> This is also why I only check for MagicPacket interrupts if irq_enabled
> is false.
I would have preferred that this was done with the other EMAC interrupts,
in sh_eth_error().
>>> + synchronize_irq(ndev->irq);
>>> + napi_disable(&mdp->napi);
>>> + sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
>>> +
>>> + /* Enable ECI MagicPacket interrupt */
>>> + sh_eth_write(ndev, ECSIPR_MPDIP, ECSIPR);
I'd prefer if it was always enabled via 'ecsipr_value'.
>>> +
>>> + /* Enable MagicPacket */
>>> + sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
>>> +
>>> + /* Increased clock usage so device won't be suspended */
>>> + clk_enable(mdp->clk);
>>
>> Hum, intermixing runtime PM with clock API doesn't look good...
>
> I agree it looks weird but I need a way to increment the usage count for
> the clock otherwise the PM code will disable the module clock and WoL
> will not work.
How will it do it if you don't call sh_eth_close() in this case?
> Note that this call will not enable the clock, just
> increase the usage count so it won't be disabled when the PM code
> decreases it after the sh_eth suspend function is run.
You mean that the PM code calls RPM or clk API on its own? That's strange...
> If you know of a different way of ensuring that the clock is not turned
> off I'd be happy to look at it. I did some investigation into this and
> calling clk_enable() directly is for example what happens in the
> enable_irq_wake() call path to ensure the clock for the irq_chip is not
> turned off if it is a wakeup source, see for example
> gpio_rcar_irq_set_wake() in drivers/gpio/gpio-rcar.c.
Thanks, will look into it...
[...]
MBR, Sergei
^ permalink raw reply
* Re: [iproute2 net-next 2/8] bpf: export bpf_prog_load
From: Daniel Borkmann @ 2016-12-10 21:24 UTC (permalink / raw)
To: David Ahern, netdev, stephen
In-Reply-To: <1481401934-4026-3-git-send-email-dsa@cumulusnetworks.com>
On 12/10/2016 09:32 PM, David Ahern wrote:
> Code move only; no functional change intended.
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> include/bpf_util.h | 3 +++
> lib/bpf.c | 40 ++++++++++++++++++++--------------------
> 2 files changed, 23 insertions(+), 20 deletions(-)
>
> diff --git a/include/bpf_util.h b/include/bpf_util.h
> index 49b96bbc208f..dcbdca6978d6 100644
> --- a/include/bpf_util.h
> +++ b/include/bpf_util.h
> @@ -75,6 +75,9 @@ int bpf_trace_pipe(void);
>
> void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
>
> +int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
> + size_t size_insns, const char *license, char *log,
> + size_t size_log);
Just a really minor nit: please add a newline here.
> int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
> int bpf_prog_detach(int target_fd, enum bpf_attach_type type);
>
^ permalink raw reply
* Re: [iproute2 net-next 1/8] lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
From: Daniel Borkmann @ 2016-12-10 21:21 UTC (permalink / raw)
To: David Ahern, netdev, stephen
In-Reply-To: <584C70C0.8040506@iogearbox.net>
On 12/10/2016 10:16 PM, Daniel Borkmann wrote:
> On 12/10/2016 09:32 PM, David Ahern wrote:
>> For consistency with other bpf commands, the functions are named
>> bpf_prog_attach and bpf_prog_detach. The existing bpf_prog_attach is
>> renamed to bpf_prog_load_and_report since it calls bpf_prog_load and
>> bpf_prog_report.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>> ---
>> include/bpf_util.h | 3 +++
>> lib/bpf.c | 31 ++++++++++++++++++++++++++-----
>> 2 files changed, 29 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/bpf_util.h b/include/bpf_util.h
>> index 05baeecda57f..49b96bbc208f 100644
>> --- a/include/bpf_util.h
>> +++ b/include/bpf_util.h
>> @@ -75,6 +75,9 @@ int bpf_trace_pipe(void);
>>
>> void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
>>
>> +int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
>> +int bpf_prog_detach(int target_fd, enum bpf_attach_type type);
>> +
>> #ifdef HAVE_ELF
>> int bpf_send_map_fds(const char *path, const char *obj);
>> int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
>> diff --git a/lib/bpf.c b/lib/bpf.c
>> index 2a8cd51d4dae..103fc1ef0593 100644
>> --- a/lib/bpf.c
>> +++ b/lib/bpf.c
>> @@ -850,6 +850,27 @@ int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
>> return ret;
>> }
>>
>> +int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
>> +{
>> + union bpf_attr attr = {
>> + .target_fd = target_fd,
>> + .attach_bpf_fd = prog_fd,
>> + .attach_type = type,
>> + };
>
> Please make this consistent with the other bpf(2) cmds we
> have in the current lib code. There were some gcc issues in
> the past, see:
>
> https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=67584e3ab289a22eb9a2e51f90d23e2ced2e76b0
>
> F.e. bpf_map_create() currently looks like:
>
> union bpf_attr attr = {};
>
> attr.map_type = type;
> attr.key_size = size_key;
> attr.value_size = size_value;
> attr.max_entries = max_elem;
> attr.map_flags = flags;
>
>> + return bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
>> +}
>> +
>> +int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
>> +{
>> + union bpf_attr attr = {
>> + .target_fd = target_fd,
>> + .attach_type = type,
>> + };
>
> Ditto.
>
>> + return bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
>> +}
>> +
>> #ifdef HAVE_ELF
>> struct bpf_elf_prog {
>> enum bpf_prog_type type;
>> @@ -1262,9 +1283,9 @@ static void bpf_prog_report(int fd, const char *section,
>> bpf_dump_error(ctx, "Verifier analysis:\n\n");
>> }
>>
>> -static int bpf_prog_attach(const char *section,
>> - const struct bpf_elf_prog *prog,
>> - struct bpf_elf_ctx *ctx)
>> +static int bpf_prog_load_and_report(const char *section,
>> + const struct bpf_elf_prog *prog,
>> + struct bpf_elf_ctx *ctx)
>> {
>
> Please name it bpf_prog_create() then, it would be consistent to
> bpf_map_create() and shorter as well.
Sorry, lack of coffee, scratch that.
Can't the current bpf_prog_attach() stay as is, and you name the above new
functions bpf_prog_attach_fd() and bpf_prog_detach_fd()? I think that would
be better.
>> int tries = 0, fd;
>> retry:
>> @@ -1656,7 +1677,7 @@ static int bpf_fetch_prog(struct bpf_elf_ctx *ctx, const char *section,
>> prog.size = data.sec_data->d_size;
>> prog.license = ctx->license;
>>
>> - fd = bpf_prog_attach(section, &prog, ctx);
>> + fd = bpf_prog_load_and_report(section, &prog, ctx);
>> if (fd < 0)
>> return fd;
>>
>> @@ -1755,7 +1776,7 @@ static int bpf_fetch_prog_relo(struct bpf_elf_ctx *ctx, const char *section,
>> prog.size = data_insn.sec_data->d_size;
>> prog.license = ctx->license;
>>
>> - fd = bpf_prog_attach(section, &prog, ctx);
>> + fd = bpf_prog_load_and_report(section, &prog, ctx);
>> if (fd < 0) {
>> *lderr = true;
>> return fd;
>>
>
^ permalink raw reply
* Re: [iproute2 net-next 1/8] lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
From: Daniel Borkmann @ 2016-12-10 21:16 UTC (permalink / raw)
To: David Ahern, netdev, stephen
In-Reply-To: <1481401934-4026-2-git-send-email-dsa@cumulusnetworks.com>
On 12/10/2016 09:32 PM, David Ahern wrote:
> For consistency with other bpf commands, the functions are named
> bpf_prog_attach and bpf_prog_detach. The existing bpf_prog_attach is
> renamed to bpf_prog_load_and_report since it calls bpf_prog_load and
> bpf_prog_report.
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> include/bpf_util.h | 3 +++
> lib/bpf.c | 31 ++++++++++++++++++++++++++-----
> 2 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/include/bpf_util.h b/include/bpf_util.h
> index 05baeecda57f..49b96bbc208f 100644
> --- a/include/bpf_util.h
> +++ b/include/bpf_util.h
> @@ -75,6 +75,9 @@ int bpf_trace_pipe(void);
>
> void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
>
> +int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
> +int bpf_prog_detach(int target_fd, enum bpf_attach_type type);
> +
> #ifdef HAVE_ELF
> int bpf_send_map_fds(const char *path, const char *obj);
> int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
> diff --git a/lib/bpf.c b/lib/bpf.c
> index 2a8cd51d4dae..103fc1ef0593 100644
> --- a/lib/bpf.c
> +++ b/lib/bpf.c
> @@ -850,6 +850,27 @@ int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
> return ret;
> }
>
> +int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
> +{
> + union bpf_attr attr = {
> + .target_fd = target_fd,
> + .attach_bpf_fd = prog_fd,
> + .attach_type = type,
> + };
Please make this consistent with the other bpf(2) cmds we
have in the current lib code. There were some gcc issues in
the past, see:
https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=67584e3ab289a22eb9a2e51f90d23e2ced2e76b0
F.e. bpf_map_create() currently looks like:
union bpf_attr attr = {};
attr.map_type = type;
attr.key_size = size_key;
attr.value_size = size_value;
attr.max_entries = max_elem;
attr.map_flags = flags;
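Applied to the attach helper, the same pattern might look like the sketch below. To stay self-contained it uses a stand-in union, demo_attr, rather than the real union bpf_attr from <linux/bpf.h>; the three field names mirror those in the patch, everything else (the union layout, the helper name demo_fill_attach()) is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for union bpf_attr (assumption: only the fields used by
 * the attach command are modelled, padded out by a raw byte array). */
union demo_attr {
	struct {
		unsigned int target_fd;
		unsigned int attach_bpf_fd;
		unsigned int attach_type;
	};
	unsigned char raw[64];
};

/* Zero the whole union first, then assign the fields one by one,
 * instead of relying on a designated initializer. */
static void demo_fill_attach(union demo_attr *attr, int prog_fd,
			     int target_fd, unsigned int type)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->target_fd = target_fd;
	attr->attach_bpf_fd = prog_fd;
	attr->attach_type = type;
}
```

The real helper would then hand the filled union to bpf(BPF_PROG_ATTACH, &attr, sizeof(attr)) as in the patch; that call is left out here so the sketch builds without kernel headers.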
> + return bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
> +}
> +
> +int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
> +{
> + union bpf_attr attr = {
> + .target_fd = target_fd,
> + .attach_type = type,
> + };
Ditto.
> + return bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
> +}
> +
> #ifdef HAVE_ELF
> struct bpf_elf_prog {
> enum bpf_prog_type type;
> @@ -1262,9 +1283,9 @@ static void bpf_prog_report(int fd, const char *section,
> bpf_dump_error(ctx, "Verifier analysis:\n\n");
> }
>
> -static int bpf_prog_attach(const char *section,
> - const struct bpf_elf_prog *prog,
> - struct bpf_elf_ctx *ctx)
> +static int bpf_prog_load_and_report(const char *section,
> + const struct bpf_elf_prog *prog,
> + struct bpf_elf_ctx *ctx)
> {
Please name it bpf_prog_create() then, it would be consistent to
bpf_map_create() and shorter as well.
> int tries = 0, fd;
> retry:
> @@ -1656,7 +1677,7 @@ static int bpf_fetch_prog(struct bpf_elf_ctx *ctx, const char *section,
> prog.size = data.sec_data->d_size;
> prog.license = ctx->license;
>
> - fd = bpf_prog_attach(section, &prog, ctx);
> + fd = bpf_prog_load_and_report(section, &prog, ctx);
> if (fd < 0)
> return fd;
>
> @@ -1755,7 +1776,7 @@ static int bpf_fetch_prog_relo(struct bpf_elf_ctx *ctx, const char *section,
> prog.size = data_insn.sec_data->d_size;
> prog.license = ctx->license;
>
> - fd = bpf_prog_attach(section, &prog, ctx);
> + fd = bpf_prog_load_and_report(section, &prog, ctx);
> if (fd < 0) {
> *lderr = true;
> return fd;
>
^ permalink raw reply
* Re: [PATCH net v3] ibmveth: set correct gso_size and gso_type
From: David Miller @ 2016-12-10 20:56 UTC (permalink / raw)
To: tlfalcon
Cc: netdev, brking, marcelo.leitner, pradeeps, jmaxwell37, zdai,
eric.dumazet
In-Reply-To: <1481395188-9137-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Sat, 10 Dec 2016 12:39:48 -0600
> v3: include a check for non-zero mss when calculating gso_segs
>
> v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
> and make sure everyone is included on CC
I already applied v1 which made it all the way even to Linus's
tree. So you'll have to send me relative fixups if there are
things to fix or change since v1.
You must always generate patches against the current 'net' tree.
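For readers following the v3 note about checking for a non-zero mss, a guarded segment count can be sketched as below. This is an illustration under stated assumptions, not the ibmveth code: demo_gso_segs(), payload_len, and mss are hypothetical names, and the round-up division plays the role that DIV_ROUND_UP() would in kernel code.

```c
#include <assert.h>

/* Hypothetical helper: number of segments needed to carry payload_len
 * bytes at mss bytes per segment. Returns 0 when mss is zero so the
 * caller never divides by zero. */
static unsigned int demo_gso_segs(unsigned int payload_len, unsigned int mss)
{
	unsigned int segs;

	if (mss == 0)
		return 0;

	/* Round up without risking overflow in payload_len + mss - 1. */
	segs = payload_len / mss + (payload_len % mss != 0);
	assert((unsigned long long)segs * mss >= payload_len);
	return segs;
}
```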
^ permalink raw reply
* Re: Misalignment, MIPS, and ip_hdr(skb)->version
From: Felix Fietkau @ 2016-12-10 20:36 UTC (permalink / raw)
To: Måns Rullgård
Cc: linux-mips, Netdev, LKML, David Miller, WireGuard mailing list
In-Reply-To: <yw1xy3znwmmr.fsf@unicorn.mansr.com>
On 2016-12-10 21:32, Måns Rullgård wrote:
> Felix Fietkau <nbd@nbd.name> writes:
>
>> On 2016-12-10 14:25, Måns Rullgård wrote:
>>> Felix Fietkau <nbd@nbd.name> writes:
>>>
>>>> On 2016-12-07 19:54, Jason A. Donenfeld wrote:
>>>>> On Wed, Dec 7, 2016 at 7:51 PM, David Miller <davem@davemloft.net> wrote:
>>>>>> It's so much better to analyze properly where the misalignment comes from
>>>>>> and address it at the source, as we have for various cases that trip up
>>>>>> Sparc too.
>>>>>
>>>>> That's sort of my attitude too, hence starting this thread. Any
>>>>> pointers you have about this would be most welcome, so as not to
>>>>> perpetuate what already seems like an issue in other parts of the
>>>>> stack.
>>>> Hi Jason,
>>>>
>>>> I'm the author of that hackish LEDE/OpenWrt patch that works around the
>>>> misalignment issues. Here's some context regarding that patch:
>>>>
>>>> I intentionally put it in the target specific patches for only one of
>>>> our MIPS targets. There are a few ar71xx devices where the misalignment
>>>> cannot be fixed, because the Ethernet MAC has a 4-byte DMA alignment
>>>> requirement, and does not support inserting 2 bytes of padding to
>>>> correct the IP header misalignment.
>>>>
>>>> With these limitations the choice was between this ugly network stack
>>>> patch or inserting a very expensive memmove in the data path (which is
>>>> better than taking the mis-alignment traps, but still hurts routing
>>>> performance significantly).
>>>
>>> I solved this problem in an Ethernet driver by copying the initial part
>>> of the packet to an aligned skb and appending the remainder using
>>> skb_add_rx_frag(). The kernel network stack only cares about the
>>> headers, so the alignment of the packet payload doesn't matter.
>>
>> I considered that as well, but it's bad for routing performance if the
>> ethernet MAC does not support scatter/gather for xmit.
>> Unfortunately that limitation is quite common on embedded hardware.
>
> Yes, I can see that being an issue. However, if you're doing zero-copy
> routing, the header part of the original buffer should still be there,
> unused, so you could presumably copy the header of the outgoing packet
> there and then do dma as usual. Maybe there's something in the network
> stack that makes this impossible though.
That still puts more pressure on the ridiculously small dcache sizes
that are typical for embedded MIPS routers.
- Felix
^ permalink raw reply
* Re: Misalignment, MIPS, and ip_hdr(skb)->version
From: Måns Rullgård @ 2016-12-10 20:32 UTC (permalink / raw)
To: Felix Fietkau
Cc: linux-mips, Netdev, LKML, David Miller, WireGuard mailing list
In-Reply-To: <7f8ba817-73ef-e1e1-4fdf-b9178e922008@nbd.name>
Felix Fietkau <nbd@nbd.name> writes:
> On 2016-12-10 14:25, Måns Rullgård wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>>
>>> On 2016-12-07 19:54, Jason A. Donenfeld wrote:
>>>> On Wed, Dec 7, 2016 at 7:51 PM, David Miller <davem@davemloft.net> wrote:
>>>>> It's so much better to analyze properly where the misalignment comes from
>>>>> and address it at the source, as we have for various cases that trip up
>>>>> Sparc too.
>>>>
>>>> That's sort of my attitude too, hence starting this thread. Any
>>>> pointers you have about this would be most welcome, so as not to
>>>> perpetuate what already seems like an issue in other parts of the
>>>> stack.
>>> Hi Jason,
>>>
>>> I'm the author of that hackish LEDE/OpenWrt patch that works around the
>>> misalignment issues. Here's some context regarding that patch:
>>>
>>> I intentionally put it in the target specific patches for only one of
>>> our MIPS targets. There are a few ar71xx devices where the misalignment
>>> cannot be fixed, because the Ethernet MAC has a 4-byte DMA alignment
>>> requirement, and does not support inserting 2 bytes of padding to
>>> correct the IP header misalignment.
>>>
>>> With these limitations the choice was between this ugly network stack
>>> patch or inserting a very expensive memmove in the data path (which is
>>> better than taking the mis-alignment traps, but still hurts routing
>>> performance significantly).
>>
>> I solved this problem in an Ethernet driver by copying the initial part
>> of the packet to an aligned skb and appending the remainder using
>> skb_add_rx_frag(). The kernel network stack only cares about the
>> headers, so the alignment of the packet payload doesn't matter.
>
> I considered that as well, but it's bad for routing performance if the
> ethernet MAC does not support scatter/gather for xmit.
> Unfortunately that limitation is quite common on embedded hardware.
Yes, I can see that being an issue. However, if you're doing zero-copy
routing, the header part of the original buffer should still be there,
unused, so you could presumably copy the header of the outgoing packet
there and then do dma as usual. Maybe there's something in the network
stack that makes this impossible though.
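The receive-side split quoted above (copy the headers into an aligned buffer, leave the payload where the hardware put it) can be sketched in plain userspace C. This is only an illustration: struct demo_pkt and demo_split_headers() are hypothetical, and a real driver would work on an skb and attach the remainder with skb_add_rx_frag() rather than keeping a bare pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDR_MAX 128

/* Hypothetical packet view: headers copied into an aligned buffer,
 * payload left in place and referenced by pointer. */
struct demo_pkt {
	uint32_t hdr[DEMO_HDR_MAX / 4];	/* uint32_t storage gives 4-byte alignment */
	size_t hdr_len;
	const uint8_t *payload;		/* points into the original frame */
	size_t payload_len;
};

static void demo_split_headers(struct demo_pkt *pkt, const uint8_t *frame,
			       size_t frame_len, size_t hdr_len)
{
	assert(hdr_len <= DEMO_HDR_MAX);
	if (hdr_len > frame_len)
		hdr_len = frame_len;

	memcpy(pkt->hdr, frame, hdr_len);	/* small copy, headers only */
	pkt->hdr_len = hdr_len;
	pkt->payload = frame + hdr_len;		/* payload is not copied */
	pkt->payload_len = frame_len - hdr_len;
}
```

Only the header bytes are touched by the CPU; as discussed above, the cost moves to transmit if the MAC cannot do scatter/gather.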
--
Måns Rullgård
^ permalink raw reply
* [iproute2 net-next 8/8] Introduce ip vrf command
From: David Ahern @ 2016-12-10 20:32 UTC (permalink / raw)
To: netdev, stephen; +Cc: David Ahern
In-Reply-To: <1481401934-4026-1-git-send-email-dsa@cumulusnetworks.com>
'ip vrf' follows the user semantics established by 'ip netns'.
The 'ip vrf' subcommand supports 3 usages:
1. Run a command against a given vrf:
ip vrf exec NAME CMD
Uses the recently committed cgroup/sock BPF option. vrf directory
is added to cgroup2 mount. Individual vrfs are created under it. BPF
filter attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the VRF
device index. From there the current process (ip's pid) is added to
the cgroup.procs file and the given command is executed. In doing so
all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically bound to
the VRF domain.
The association is inherited parent to child allowing the command to
be a shell from which other commands are run relative to the VRF.
2. Show the VRF a process is bound to:
ip vrf id
This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
entry with the VRF name following.
3. Show process ids bound to a VRF
ip vrf pids NAME
This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
shows the process ids in the particular vrf cgroup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
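The lookup described in usage 2 above can be sketched in isolation as
follows. This is only an illustration that assumes the cgroup2 entry in
/proc/PID/cgroup looks like "0::/vrf/NAME"; vrf_name_from_cgroup() and
VRF_CGROUP_TAG are hypothetical names and are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VRF_CGROUP_TAG "::/vrf/"

/* Copy the VRF name found in the text of /proc/PID/cgroup into out.
 * Returns 0 on success, -1 if there is no vrf entry or out is too small. */
static int vrf_name_from_cgroup(const char *text, char *out, size_t outlen)
{
	const char *vrf;
	size_t len;

	assert(text && out);

	vrf = strstr(text, VRF_CGROUP_TAG);
	if (!vrf)
		return -1;

	vrf += strlen(VRF_CGROUP_TAG);	/* skip past "::/vrf/" */
	len = strcspn(vrf, "\n");	/* name ends at newline or end of text */
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, vrf, len);
	out[len] = '\0';
	return 0;
}
```

Unlike the in-place version in ipvrf.c, this variant leaves the input
buffer untouched and reports a missing entry to the caller.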
ip/Makefile | 3 +-
ip/ip.c | 4 +-
ip/ip_common.h | 2 +
ip/ipvrf.c | 289 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
man/man8/ip-vrf.8 | 88 +++++++++++++++++
5 files changed, 384 insertions(+), 2 deletions(-)
create mode 100644 ip/ipvrf.c
create mode 100644 man/man8/ip-vrf.8
diff --git a/ip/Makefile b/ip/Makefile
index c8e6c6172741..1928489e7f90 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -7,7 +7,8 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \
- iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o ipila.o
+ iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o ipila.o \
+ ipvrf.o
RTMONOBJ=rtmon.o
diff --git a/ip/ip.c b/ip/ip.c
index cb3adcb3f57d..07050b07592a 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -51,7 +51,8 @@ static void usage(void)
" ip [ -force ] -batch filename\n"
"where OBJECT := { link | address | addrlabel | route | rule | neigh | ntable |\n"
" tunnel | tuntap | maddress | mroute | mrule | monitor | xfrm |\n"
-" netns | l2tp | fou | macsec | tcp_metrics | token | netconf | ila }\n"
+" netns | l2tp | fou | macsec | tcp_metrics | token | netconf | ila |\n"
+" vrf }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
" -h[uman-readable] | -iec |\n"
" -f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |\n"
@@ -99,6 +100,7 @@ static const struct cmd {
{ "mrule", do_multirule },
{ "netns", do_netns },
{ "netconf", do_ipnetconf },
+ { "vrf", do_ipvrf},
{ "help", do_help },
{ 0 }
};
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 3162f1ca5b2c..28763e81e4a4 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -57,6 +57,8 @@ extern int do_ipila(int argc, char **argv);
int do_tcp_metrics(int argc, char **argv);
int do_ipnetconf(int argc, char **argv);
int do_iptoken(int argc, char **argv);
+int do_ipvrf(int argc, char **argv);
+
int iplink_get(unsigned int flags, char *name, __u32 filt_mask);
static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
diff --git a/ip/ipvrf.c b/ip/ipvrf.c
new file mode 100644
index 000000000000..c4f0e53532e2
--- /dev/null
+++ b/ip/ipvrf.c
@@ -0,0 +1,289 @@
+/*
+ * ipvrf.c "ip vrf"
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: David Ahern <dsa@cumulusnetworks.com>
+ *
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/mount.h>
+#include <linux/bpf.h>
+#include <linux/if.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <limits.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+#include "libbpf.h"
+#include "bpf_util.h"
+
+#define CGRP_PROC_FILE "/cgroup.procs"
+
+static void usage(void)
+{
+ fprintf(stderr, "Usage: ip vrf exec [NAME] cmd ...\n");
+ fprintf(stderr, " ip vrf identify [PID]\n");
+ fprintf(stderr, " ip vrf pids [NAME]\n");
+
+ exit(-1);
+}
+
+static int ipvrf_identify(int argc, char **argv)
+{
+ char path[PATH_MAX];
+ char buf[4096];
+ char *vrf, *end;
+ int fd, rc = -1;
+ unsigned int pid;
+ ssize_t n;
+
+ if (argc < 1)
+ pid = getpid();
+ else if (argc > 1)
+ invarg("Extra arguments specified\n", argv[1]);
+ else if (get_unsigned(&pid, argv[0], 10))
+ invarg("Invalid pid\n", argv[0]);
+
+ snprintf(path, sizeof(path), "/proc/%u/cgroup", pid);
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr,
+ "Failed to open cgroups file: %s\n", strerror(errno));
+ return -1;
+ }
+
+ n = read(fd, buf, sizeof(buf) - 1);
+ if (n < 0) {
+ fprintf(stderr,
+ "Failed to read cgroups file: %s\n", strerror(errno));
+ goto out;
+ }
+ buf[n] = '\0';
+ vrf = strstr(buf, "::/vrf/");
+ if (vrf) {
+ vrf += 7; /* skip past "::/vrf/" */
+ end = strchr(vrf, '\n');
+ if (end)
+ *end = '\0';
+
+ printf("%s\n", vrf);
+ }
+
+ rc = 0;
+out:
+ close(fd);
+
+ return rc;
+}
+
+static int ipvrf_pids(int argc, char **argv)
+{
+ char path[PATH_MAX];
+ char buf[4096];
+ char *mnt, *vrf;
+ int fd, rc = -1;
+ ssize_t n;
+
+ if (argc != 1) {
+ fprintf(stderr, "Invalid arguments\n");
+ return -1;
+ }
+
+ vrf = argv[0];
+
+ mnt = find_cgroup2_mount();
+ if (!mnt)
+ return -1;
+
+ snprintf(path, sizeof(path), "%s/vrf/%s%s", mnt, vrf, CGRP_PROC_FILE);
+ free(mnt);
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ return 0; /* no cgroup file, nothing to show */
+
+ while (1) {
+ n = read(fd, buf, sizeof(buf) - 1);
+ if (n < 0) {
+ fprintf(stderr,
+ "Failed to read cgroups file: %s\n", strerror(errno));
+ break;
+ } else if (n == 0) {
+ rc = 0;
+ break;
+ }
+ printf("%.*s", (int)n, buf);
+ }
+
+ close(fd);
+
+ return rc;
+}
+
+/* load BPF program to set sk_bound_dev_if for sockets */
+static char bpf_log_buf[256*1024];
+
+static int prog_load(int idx)
+{
+ struct bpf_insn prog[] = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_MOV64_IMM(BPF_REG_3, idx),
+ BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
+ BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+ BPF_EXIT_INSN(),
+ };
+
+ return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+ "GPL", bpf_log_buf, sizeof(bpf_log_buf));
+}
+
+static int vrf_configure_cgroup(const char *path, int ifindex)
+{
+ int rc = -1, cg_fd, prog_fd = -1;
+
+ cg_fd = open(path, O_DIRECTORY | O_RDONLY);
+ if (cg_fd < 0) {
+ fprintf(stderr, "Failed to open cgroup path: '%s'\n", strerror(errno));
+ goto out;
+ }
+
+ /*
+ * Load bpf program into kernel and attach to cgroup to affect
+ * socket creates
+ */
+ prog_fd = prog_load(ifindex);
+ if (prog_fd < 0) {
+ fprintf(stderr, "Failed to load BPF prog: '%s'\n", strerror(errno));
+ goto out;
+ }
+
+ if (bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE)) {
+ fprintf(stderr, "Failed to attach prog to cgroup: '%s'\n",
+ strerror(errno));
+ fprintf(stderr, "Kernel compiled with CGROUP_BPF enabled?\n");
+ goto out;
+ }
+
+ rc = 0;
+out:
+ close(cg_fd);
+ close(prog_fd);
+
+ return rc;
+}
+
+static int vrf_switch(const char *name)
+{
+ char path[PATH_MAX], *mnt, pid[16];
+ int ifindex = name_is_vrf(name);
+ bool default_vrf = false;
+ int rc = -1, len, fd = -1;
+
+ if (!ifindex) {
+ if (strcmp(name, "default")) {
+ fprintf(stderr, "Invalid VRF name\n");
+ return -1;
+ }
+ default_vrf = true;
+ }
+
+ mnt = find_cgroup2_mount();
+ if (!mnt)
+ return -1;
+
+ /* path to cgroup; make sure buffer has room to cat "/cgroup.procs"
+ * to the end of the path
+ */
+ len = snprintf(path, sizeof(path) - sizeof(CGRP_PROC_FILE), "%s%s/%s",
+ mnt, default_vrf ? "" : "/vrf", name);
+ if (len >= sizeof(path) - sizeof(CGRP_PROC_FILE)) {
+ fprintf(stderr, "Invalid path to cgroup2 mount\n");
+ goto out;
+ }
+
+ if (make_path(path, 0755)) {
+ fprintf(stderr, "Failed to setup vrf cgroup2 directory\n");
+ goto out;
+ }
+
+ if (!default_vrf && vrf_configure_cgroup(path, ifindex))
+ goto out;
+
+ /*
+ * write pid to cgroup.procs making process part of cgroup
+ */
+ strcat(path, CGRP_PROC_FILE);
+ fd = open(path, O_RDWR | O_APPEND);
+ if (fd < 0) {
+ fprintf(stderr, "cgroup.procs file does not exist.\n");
+ goto out;
+ }
+
+ snprintf(pid, sizeof(pid), "%d", getpid());
+ if (write(fd, pid, strlen(pid)) < 0) {
+ fprintf(stderr, "Failed to join cgroup\n");
+ goto out;
+ }
+
+ rc = 0;
+out:
+ free(mnt);
+ close(fd);
+
+ return rc;
+}
+
+static int ipvrf_exec(int argc, char **argv)
+{
+ if (argc < 1) {
+ fprintf(stderr, "No VRF name specified\n");
+ return -1;
+ }
+ if (argc < 2) {
+ fprintf(stderr, "No command specified\n");
+ return -1;
+ }
+
+ if (vrf_switch(argv[0]))
+ return -1;
+
+ return -cmd_exec(argv[1], argv + 1, !!batch_mode);
+}
+
+int do_ipvrf(int argc, char **argv)
+{
+ if (argc == 0) {
+ fprintf(stderr, "No command given. Try \"ip vrf help\".\n");
+ exit(-1);
+ }
+
+ if (matches(*argv, "identify") == 0)
+ return ipvrf_identify(argc-1, argv+1);
+
+ if (matches(*argv, "pids") == 0)
+ return ipvrf_pids(argc-1, argv+1);
+
+ if (matches(*argv, "exec") == 0)
+ return ipvrf_exec(argc-1, argv+1);
+
+ if (matches(*argv, "help") == 0)
+ usage();
+
+ fprintf(stderr, "Command \"%s\" is unknown, try \"ip vrf help\".\n",
+ *argv);
+
+ exit(-1);
+}
diff --git a/man/man8/ip-vrf.8 b/man/man8/ip-vrf.8
new file mode 100644
index 000000000000..57a7c7692ce8
--- /dev/null
+++ b/man/man8/ip-vrf.8
@@ -0,0 +1,88 @@
+.TH IP\-VRF 8 "7 Dec 2016" "iproute2" "Linux"
+.SH NAME
+ip-vrf \- run a command against a vrf
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B ip
+.B vrf
+.RI " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.BR "ip vrf identify"
+.RI "[ " PID " ]"
+
+.ti -8
+.BR "ip vrf pids"
+.I NAME
+
+.ti -8
+.BR "ip vrf exec "
+.RI "[ " NAME " ] " command ...
+
+.SH DESCRIPTION
+A VRF provides traffic isolation at layer 3 for routing, similar to how a
+VLAN is used to isolate traffic at layer 2. Fundamentally, a VRF is a separate
+routing table. Network devices are associated with a VRF by enslaving the
+device to the VRF. At that point network addresses assigned to the device are
+local to the VRF with host and connected routes moved to the table associated
+with the VRF.
+
+A process can specify a VRF using several APIs -- binding the socket to the
+VRF device using SO_BINDTODEVICE, setting the VRF association using
+IP_UNICAST_IF or IPV6_UNICAST_IF, or specifying the VRF for a specific message
+using IP_PKTINFO or IPV6_PKTINFO.
+
+By default a process is not bound to any VRF. An association can be set
+explicitly by making the program use one of the APIs mentioned above or
+implicitly using a helper to set SO_BINDTODEVICE for all IPv4 and IPv6
+sockets (AF_INET and AF_INET6) when the socket is created. This ip-vrf command
+is a helper to run a command against a specific VRF with the VRF association
+inherited parent to child.
+
+.TP
+.B ip vrf exec [ NAME ] cmd ... - Run cmd against the named VRF
+.sp
+This command allows applications that are VRF unaware to be run against
+a VRF other than the default VRF (main table). A command can be run against
+the default VRF by passing "default" as the VRF name. This is useful if
+the current shell is associated with another VRF (e.g., Management VRF).
+
+.TP
+.B ip vrf identify [PID] - Report VRF association for process
+.sp
+This command shows the VRF association of the specified process. If PID is
+not specified then the id of the current process is used.
+
+.TP
+.B ip vrf pids NAME - Report processes associated with the named VRF
+.sp
+This command shows all process ids that are associated with the given
+VRF.
+
+.SH CAVEATS
+This command requires a kernel compiled with CGROUPS and CGROUP_BPF enabled.
+
+The VRF helper *only* affects network layer sockets.
+
+.SH EXAMPLES
+.PP
+ip vrf exec red ssh 10.100.1.254
+.RS
+Executes ssh to 10.100.1.254 against the VRF red table.
+.RE
+
+.SH SEE ALSO
+.br
+.BR ip (8),
+.BR ip-link (8),
+.BR ip-address (8),
+.BR ip-route (8),
+.BR ip-neighbor (8)
+
+.SH AUTHOR
+Original Manpage by David Ahern
--
2.1.4
^ permalink raw reply related
* [iproute2 net-next 7/8] libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
From: David Ahern @ 2016-12-10 20:32 UTC (permalink / raw)
To: netdev, stephen; +Cc: David Ahern
In-Reply-To: <1481401934-4026-1-git-send-email-dsa@cumulusnetworks.com>
iplink_vrf has 2 functions used to validate that a user-given device name
is a VRF device and to return the table id. If the user string is not a
device name, ip commands with a vrf keyword show a confusing error
message: "RTNETLINK answers: No such device".
Add a variant of rtnl_talk that does not display the "RTNETLINK answers"
message and update iplink_vrf to use it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
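The approach described above (one internal worker that takes a flag, plus
two thin public wrappers) can be sketched on its own like this. It is a
stand-alone illustration, not the libnetlink code: talk_common(), talk()
and talk_quiet() are hypothetical names, and the log stream is passed in
explicitly so the behaviour is easy to observe.

```c
#include <assert.h>
#include <errno.h>	/* callers pass codes such as -ENODEV */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the shared worker: err is 0 on success or a
 * negative errno value; the message is printed only when show_err is set. */
static int talk_common(int err, bool show_err, FILE *log)
{
	assert(log);

	if (err == 0)
		return 0;

	if (show_err)
		fprintf(log, "RTNETLINK answers: %s\n", strerror(-err));

	return -1;
}

/* Existing behaviour: always report the error. */
static int talk(int err, FILE *log)
{
	return talk_common(err, true, log);
}

/* New variant: same return value, but the caller words its own error. */
static int talk_quiet(int err, FILE *log)
{
	return talk_common(err, false, log);
}
```

Both wrappers return the same value, so a caller can switch between them
without touching its error handling; only the stderr noise changes.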
include/libnetlink.h | 3 +++
ip/iplink_vrf.c | 14 +++++++++++---
lib/libnetlink.c | 20 +++++++++++++++++---
3 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/include/libnetlink.h b/include/libnetlink.h
index 751ebf186dd4..bd0267dfcc02 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -81,6 +81,9 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
struct nlmsghdr *answer, size_t len)
__attribute__((warn_unused_result));
+int rtnl_talk_suppress_rtnl_errmsg(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+ struct nlmsghdr *answer, size_t len)
+ __attribute__((warn_unused_result));
int rtnl_send(struct rtnl_handle *rth, const void *buf, int)
__attribute__((warn_unused_result));
int rtnl_send_check(struct rtnl_handle *rth, const void *buf, int)
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
index c101ed770f87..917630e85337 100644
--- a/ip/iplink_vrf.c
+++ b/ip/iplink_vrf.c
@@ -13,6 +13,7 @@
#include <string.h>
#include <sys/socket.h>
#include <linux/if_link.h>
+#include <errno.h>
#include "rt_names.h"
#include "utils.h"
@@ -126,8 +127,14 @@ __u32 ipvrf_get_table(const char *name)
addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
- if (rtnl_talk(&rth, &req.n, &answer.n, sizeof(answer)) < 0)
- return 0;
+ if (rtnl_talk_suppress_rtnl_errmsg(&rth, &req.n,
+ &answer.n, sizeof(answer)) < 0) {
+ /* special case "default" vrf to be the main table */
+ if (errno == ENODEV && !strcmp(name, "default"))
+ rtnl_rttable_a2n(&tb_id, "main");
+
+ return tb_id;
+ }
ifi = NLMSG_DATA(&answer.n);
len = answer.n.nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
@@ -186,7 +193,8 @@ int name_is_vrf(const char *name)
addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
- if (rtnl_talk(&rth, &req.n, &answer.n, sizeof(answer)) < 0)
+ if (rtnl_talk_suppress_rtnl_errmsg(&rth, &req.n,
+ &answer.n, sizeof(answer)) < 0)
return 0;
ifi = NLMSG_DATA(&answer.n);
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index a5db168e50eb..9d7e89aebbd0 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -12,6 +12,7 @@
#include <stdio.h>
#include <stdlib.h>
+#include <stdbool.h>
#include <unistd.h>
#include <syslog.h>
#include <fcntl.h>
@@ -397,8 +398,9 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
return rtnl_dump_filter_l(rth, a);
}
-int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
- struct nlmsghdr *answer, size_t maxlen)
+static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+ struct nlmsghdr *answer, size_t maxlen,
+ bool show_rtnl_err)
{
int status;
unsigned int seq;
@@ -485,7 +487,7 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
return 0;
}
- if (rtnl->proto != NETLINK_SOCK_DIAG)
+ if (rtnl->proto != NETLINK_SOCK_DIAG && show_rtnl_err)
fprintf(stderr,
"RTNETLINK answers: %s\n",
strerror(-err->error));
@@ -517,6 +519,18 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
}
}
+int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+ struct nlmsghdr *answer, size_t maxlen)
+{
+ return __rtnl_talk(rtnl, n, answer, maxlen, true);
+}
+
+int rtnl_talk_suppress_rtnl_errmsg(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+ struct nlmsghdr *answer, size_t maxlen)
+{
+ return __rtnl_talk(rtnl, n, answer, maxlen, false);
+}
+
int rtnl_listen_all_nsid(struct rtnl_handle *rth)
{
unsigned int on = 1;
--
2.1.4
^ permalink raw reply related
* [iproute2 net-next 6/8] change name_is_vrf to return index
From: David Ahern @ 2016-12-10 20:32 UTC (permalink / raw)
To: netdev, stephen; +Cc: David Ahern
In-Reply-To: <1481401934-4026-1-git-send-email-dsa@cumulusnetworks.com>
An index of 0 means the name is not a valid VRF.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
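A small sketch of why returning the index works as both predicate and
lookup: kernel interface indexes start at 1, so 0 is free to mean "not a
VRF". The table and vrf_index() below are hypothetical stand-ins for the
netlink query done by name_is_vrf().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct link {
	const char *name;
	const char *kind;	/* IFLA_INFO_KIND equivalent, "" if none */
	int ifindex;
};

/* Hypothetical link table standing in for the netlink answer. */
static const struct link links[] = {
	{ "eth0", "",    2 },
	{ "red",  "vrf", 5 },
};

/* Return the device index if name is a VRF device, 0 otherwise. */
static int vrf_index(const char *name)
{
	size_t i;

	assert(name);

	for (i = 0; i < sizeof(links) / sizeof(links[0]); i++) {
		if (strcmp(links[i].name, name) == 0 &&
		    strcmp(links[i].kind, "vrf") == 0)
			return links[i].ifindex;
	}

	return 0;
}
```

Callers that only need a yes/no answer can keep writing
"if (!vrf_index(name))", while callers that need the device index get it
from the same call.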
ip/ip_common.h | 2 +-
ip/iplink_vrf.c | 15 +++++++++------
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 0147f45a7a31..3162f1ca5b2c 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -91,7 +91,7 @@ struct link_util *get_link_kind(const char *kind);
void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
__u32 ipvrf_get_table(const char *name);
-bool name_is_vrf(const char *name);
+int name_is_vrf(const char *name);
#ifndef INFINITY_LIFE_TIME
#define INFINITY_LIFE_TIME 0xFFFFFFFFU
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
index a238b2906805..c101ed770f87 100644
--- a/ip/iplink_vrf.c
+++ b/ip/iplink_vrf.c
@@ -159,7 +159,7 @@ __u32 ipvrf_get_table(const char *name)
return tb_id;
}
-bool name_is_vrf(const char *name)
+int name_is_vrf(const char *name)
{
struct {
struct nlmsghdr n;
@@ -187,24 +187,27 @@ bool name_is_vrf(const char *name)
addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
if (rtnl_talk(&rth, &req.n, &answer.n, sizeof(answer)) < 0)
- return false;
+ return 0;
ifi = NLMSG_DATA(&answer.n);
len = answer.n.nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
if (len < 0) {
fprintf(stderr, "BUG: Invalid response to link query.\n");
- return false;
+ return 0;
}
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
if (!tb[IFLA_LINKINFO])
- return false;
+ return 0;
parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]);
if (!li[IFLA_INFO_KIND])
- return false;
+ return 0;
+
+ if (strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf"))
+ return 0;
- return strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf") == 0;
+ return ifi->ifi_index;
}
--
2.1.4
^ permalink raw reply related
* [iproute2 net-next 4/8] move cmd_exec to lib utils
From: David Ahern @ 2016-12-10 20:32 UTC (permalink / raw)
To: netdev, stephen; +Cc: David Ahern
In-Reply-To: <1481401934-4026-1-git-send-email-dsa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
include/utils.h | 2 ++
ip/ipnetns.c | 34 ----------------------------------
lib/Makefile | 2 +-
lib/exec.c | 41 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 44 insertions(+), 35 deletions(-)
create mode 100644 lib/exec.c
diff --git a/include/utils.h b/include/utils.h
index 26c970daa5d0..ac4517a3bde1 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -256,4 +256,6 @@ char *int_to_str(int val, char *buf);
int get_guid(__u64 *guid, const char *arg);
int get_real_family(int rtm_type, int rtm_family);
+int cmd_exec(const char *cmd, char **argv, bool do_fork);
+
#endif /* __UTILS_H__ */
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index bd1e9013706c..db9a541769f1 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -357,40 +357,6 @@ static int netns_list(int argc, char **argv)
return 0;
}
-static int cmd_exec(const char *cmd, char **argv, bool do_fork)
-{
- fflush(stdout);
- if (do_fork) {
- int status;
- pid_t pid;
-
- pid = fork();
- if (pid < 0) {
- perror("fork");
- exit(1);
- }
-
- if (pid != 0) {
- /* Parent */
- if (waitpid(pid, &status, 0) < 0) {
- perror("waitpid");
- exit(1);
- }
-
- if (WIFEXITED(status)) {
- return WEXITSTATUS(status);
- }
-
- exit(1);
- }
- }
-
- if (execvp(cmd, argv) < 0)
- fprintf(stderr, "exec of \"%s\" failed: %s\n",
- cmd, strerror(errno));
- _exit(1);
-}
-
static int on_netns_exec(char *nsname, void *arg)
{
char **argv = arg;
diff --git a/lib/Makefile b/lib/Makefile
index 5b7ec169048a..749073261c49 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -8,7 +8,7 @@ CFLAGS += -fPIC
UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
inet_proto.o namespace.o json_writer.o \
- names.o color.o bpf.o
+ names.o color.o bpf.o exec.o
NLOBJ=libgenl.o ll_map.o libnetlink.o
diff --git a/lib/exec.c b/lib/exec.c
new file mode 100644
index 000000000000..96edbc422e84
--- /dev/null
+++ b/lib/exec.c
@@ -0,0 +1,41 @@
+#define _ATFILE_SOURCE
+#include <sys/wait.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include "utils.h"
+
+int cmd_exec(const char *cmd, char **argv, bool do_fork)
+{
+ fflush(stdout);
+ if (do_fork) {
+ int status;
+ pid_t pid;
+
+ pid = fork();
+ if (pid < 0) {
+ perror("fork");
+ exit(1);
+ }
+
+ if (pid != 0) {
+ /* Parent */
+ if (waitpid(pid, &status, 0) < 0) {
+ perror("waitpid");
+ exit(1);
+ }
+
+ if (WIFEXITED(status)) {
+ return WEXITSTATUS(status);
+ }
+
+ exit(1);
+ }
+ }
+
+ if (execvp(cmd, argv) < 0)
+ fprintf(stderr, "exec of \"%s\" failed: %s\n",
+ cmd, strerror(errno));
+ _exit(1);
+}
--
2.1.4
^ permalink raw reply related
* [iproute2 net-next 5/8] Add filesystem APIs to lib
From: David Ahern @ 2016-12-10 20:32 UTC (permalink / raw)
To: netdev, stephen; +Cc: David Ahern
In-Reply-To: <1481401934-4026-1-git-send-email-dsa@cumulusnetworks.com>
Add make_path to recursively call mkdir as needed to create a given
path with the given mode.
Add find_cgroup2_mount to lookup path where cgroup2 is mounted. If it
is not already mounted, cgroup2 is mounted under /var/run/cgroup2 for
use by iproute2.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
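The mount lookup described above boils down to matching the third field of
each /proc/mounts line. A minimal per-line sketch, assuming the usual
"device mountpoint fstype options dump pass" layout; mount_line_matches()
is a hypothetical helper and not part of this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/mounts line. Returns 1 and copies the mount point into
 * path when the line's filesystem type equals fs_to_find, else returns 0. */
static int mount_line_matches(const char *line, const char *fs_to_find,
			      char *path, size_t pathlen)
{
	char mnt[256];
	char fstype[128];

	assert(line && fs_to_find && path);

	/* field widths are one less than the buffer sizes so that the
	 * terminating NUL always fits */
	if (sscanf(line, "%*s %255s %127s", mnt, fstype) != 2)
		return 0;

	if (strcmp(fstype, fs_to_find) != 0 || strlen(mnt) >= pathlen)
		return 0;

	strcpy(path, mnt);
	return 1;
}
```

Feeding it each line of /proc/mounts in turn and stopping at the first
match gives the "first occurrence of given fstype" behaviour.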
include/utils.h | 2 +
lib/Makefile | 2 +-
lib/fs.c | 143 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 146 insertions(+), 1 deletion(-)
create mode 100644 lib/fs.c
diff --git a/include/utils.h b/include/utils.h
index ac4517a3bde1..dc1d6b9607dd 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -257,5 +257,7 @@ int get_guid(__u64 *guid, const char *arg);
int get_real_family(int rtm_type, int rtm_family);
int cmd_exec(const char *cmd, char **argv, bool do_fork);
+int make_path(const char *path, mode_t mode);
+char *find_cgroup2_mount(void);
#endif /* __UTILS_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index 749073261c49..0c57662b4f8f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -8,7 +8,7 @@ CFLAGS += -fPIC
UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
inet_proto.o namespace.o json_writer.o \
- names.o color.o bpf.o exec.o
+ names.o color.o bpf.o exec.o fs.o
NLOBJ=libgenl.o ll_map.o libnetlink.o
diff --git a/lib/fs.c b/lib/fs.c
new file mode 100644
index 000000000000..39cc96dccca9
--- /dev/null
+++ b/lib/fs.c
@@ -0,0 +1,143 @@
+/*
+ * fs.c filesystem APIs
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: David Ahern <dsa@cumulusnetworks.com>
+ *
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/mount.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <limits.h>
+
+#include "utils.h"
+
+#define CGROUP2_FS_NAME "cgroup2"
+
+/* if not already mounted cgroup2 is mounted here for iproute2's use */
+#define MNT_CGRP2_PATH "/var/run/cgroup2"
+
+/* return mount path of first occurrence of given fstype */
+static char *find_fs_mount(const char *fs_to_find)
+{
+ char path[4096];
+ char fstype[128]; /* max length of any filesystem name */
+ char *mnt = NULL;
+ FILE *fp;
+
+ fp = fopen("/proc/mounts", "r");
+ if (!fp) {
+ fprintf(stderr,
+ "Failed to open mounts file: %s\n", strerror(errno));
+ return NULL;
+ }
+
+ while (fscanf(fp, "%*s %4095s %127s %*s %*d %*d\n",
+ path, fstype) == 2) {
+ if (strcmp(fstype, fs_to_find) == 0) {
+ mnt = strdup(path);
+ break;
+ }
+ }
+
+ fclose(fp);
+
+ return mnt;
+}
+
+/* caller needs to free string returned */
+char *find_cgroup2_mount(void)
+{
+ char *mnt = find_fs_mount(CGROUP2_FS_NAME);
+
+ if (mnt)
+ return mnt;
+
+ mnt = strdup(MNT_CGRP2_PATH);
+ if (!mnt) {
+ fprintf(stderr, "Failed to allocate memory for cgroup2 path\n");
+ return NULL;
+
+ }
+
+ if (make_path(mnt, 0755)) {
+ fprintf(stderr, "Failed to setup vrf cgroup2 directory\n");
+ free(mnt);
+ return NULL;
+ }
+
+ if (mount("none", mnt, CGROUP2_FS_NAME, 0, NULL)) {
+ /* EBUSY means already mounted */
+ if (errno != EBUSY) {
+ fprintf(stderr,
+ "Failed to mount cgroup2. Are CGROUPS enabled in your kernel?\n");
+ free(mnt);
+ return NULL;
+ }
+ }
+ return mnt;
+}
+
+int make_path(const char *path, mode_t mode)
+{
+ char *dir, *delim;
+ struct stat sbuf;
+ int rc = -1;
+
+ delim = dir = strdup(path);
+ if (dir == NULL) {
+ fprintf(stderr, "strdup failed copying path\n");
+ return -1;
+ }
+
+ /* skip '/' -- it had better exist */
+ if (*delim == '/')
+ delim++;
+
+ while (1) {
+ delim = strchr(delim, '/');
+ if (delim)
+ *delim = '\0';
+
+ if (stat(dir, &sbuf) != 0) {
+ if (errno != ENOENT) {
+ fprintf(stderr,
+ "stat failed for %s: %s\n",
+ dir, strerror(errno));
+ goto out;
+ }
+
+ if (mkdir(dir, mode) != 0) {
+ fprintf(stderr,
+ "mkdir failed for %s: %s\n",
+ dir, strerror(errno));
+ goto out;
+ }
+ }
+
+ if (delim == NULL)
+ break;
+
+ *delim = '/';
+ delim++;
+ if (*delim == '\0')
+ break;
+ }
+ rc = 0;
+out:
+ free(dir);
+
+ return rc;
+}
--
2.1.4
^ permalink raw reply related