* Re: [RFC 1/3] devlink: Add config parameter get/set operations
From: Andy Gospodarek @ 2017-10-12 18:12 UTC (permalink / raw)
To: Jiri Pirko
Cc: Steve Lin, netdev, jiri, davem, michael.chan, linux-pci, linville
In-Reply-To: <20171012140317.GC14672@nanopsycho>
On Thu, Oct 12, 2017 at 04:03:17PM +0200, Jiri Pirko wrote:
> Thu, Oct 12, 2017 at 03:34:20PM CEST, steven.lin1@broadcom.com wrote:
> >Add support for config parameter get/set commands. Initially used by
> >bnxt driver, but other drivers can use the same, or new, attributes.
> >The config_get() and config_set() operations operate as expected, but
> >note that the driver implementation of the config_set() operation can
> >indicate whether a restart is necessary for the setting to take
> >effect.
> >
>
> First of all, I like this approach.
>
> I would like to see this patch split into:
> 1) config-options infrastructure introduction
> 2) specific config options introductions - would be best to have it
> per-option. We need to make sure every option is very well described
> and explained usecases. This is needed in order vendors to share
> attributes among drivers.
>
> More nits inlined.
>
>
> >Signed-off-by: Steve Lin <steven.lin1@broadcom.com>
> >Acked-by: Andy Gospodarek <gospo@broadcom.com>
> >---
> > include/net/devlink.h | 4 +
> > include/uapi/linux/devlink.h | 108 ++++++++++++++++++++++
> > net/core/devlink.c | 207 +++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 319 insertions(+)
> >
> > static inline void *devlink_priv(struct devlink *devlink)
> >diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> >index 0cbca96..e959716 100644
> >--- a/include/uapi/linux/devlink.h
> >+++ b/include/uapi/linux/devlink.h
[...]
> >@@ -202,6 +267,49 @@ enum devlink_attr {
> >
> > DEVLINK_ATTR_ESWITCH_ENCAP_MODE, /* u8 */
> >
> >+ /* Configuration Parameters */
> >+ DEVLINK_ATTR_SRIOV_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_NUM_VF_PER_PF, /* u32 */
> >+ DEVLINK_ATTR_MAX_NUM_PF_MSIX_VECT, /* u32 */
> >+ DEVLINK_ATTR_MSIX_VECTORS_PER_VF, /* u32 */
> >+ DEVLINK_ATTR_NPAR_NUM_PARTITIONS_PER_PORT, /* u32 */
> >+ DEVLINK_ATTR_NPAR_BW_IN_PERCENT, /* u8 */
> >+ DEVLINK_ATTR_NPAR_BW_RESERVATION, /* u8 */
> >+ DEVLINK_ATTR_NPAR_BW_RESERVATION_VALID, /* u8 */
> >+ DEVLINK_ATTR_NPAR_BW_LIMIT, /* u8 */
> >+ DEVLINK_ATTR_NPAR_BW_LIMIT_VALID, /* u8 */
> >+ DEVLINK_ATTR_DCBX_MODE, /* u8 */
> >+ DEVLINK_ATTR_RDMA_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_MULTIFUNC_MODE, /* u8 */
> >+ DEVLINK_ATTR_SECURE_NIC_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_IGNORE_ARI_CAPABILITY, /* u8 */
> >+ DEVLINK_ATTR_LLDP_NEAREST_BRIDGE_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_LLDP_NEAREST_NONTPMR_BRIDGE_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_PME_CAPABILITY_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_MAGIC_PACKET_WOL_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_EEE_PWR_SAVE_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_AUTONEG_PROTOCOL, /* u8 */
> >+ DEVLINK_ATTR_MEDIA_AUTO_DETECT, /* u8 */
> >+ DEVLINK_ATTR_PHY_SELECT, /* u8 */
> >+ DEVLINK_ATTR_PRE_OS_LINK_SPEED_D0, /* u8 */
> >+ DEVLINK_ATTR_PRE_OS_LINK_SPEED_D3, /* u8 */
> >+ DEVLINK_ATTR_MBA_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_MBA_BOOT_TYPE, /* u8 */
> >+ DEVLINK_ATTR_MBA_DELAY_TIME, /* u32 */
> >+ DEVLINK_ATTR_MBA_SETUP_HOT_KEY, /* u8 */
> >+ DEVLINK_ATTR_MBA_HIDE_SETUP_PROMPT, /* u8 */
> >+ DEVLINK_ATTR_MBA_BOOT_RETRY_COUNT, /* u32 */
> >+ DEVLINK_ATTR_MBA_VLAN_ENABLED, /* u8 */
> >+ DEVLINK_ATTR_MBA_VLAN_TAG, /* u16 */
> >+ DEVLINK_ATTR_MBA_BOOT_PROTOCOL, /* u8 */
> >+ DEVLINK_ATTR_MBA_LINK_SPEED, /* u8 */
>
> Okay, I think it is about the time we should start thinking about
> putting this new config attributes under nester attribute. What do you
> think?
>
Steve and I actually had a similar discussion yesterday when I was doing
a final review of the patches.
My only objection to nesting was coming up with a way to describe these
functions that made them seem different than existing configuration
options. In this case of the hardware we are trying to support these
are all permanent config options, so we would call them
DEVLINK_ATTR_NVRAM or DEVLINK_ATTR_PERM. Does that seem reasonable to
others?
^ permalink raw reply
* Re: [PATCH v2] net: ftgmac100: Request clock and set speed
From: Benjamin Herrenschmidt @ 2017-10-12 18:20 UTC (permalink / raw)
To: Joel Stanley, David S . Miller; +Cc: netdev, linux-kernel, Andrew Jeffery
In-Reply-To: <20171012033201.12845-1-joel@jms.id.au>
On Thu, 2017-10-12 at 11:32 +0800, Joel Stanley wrote:
> According to the ASPEED datasheet, gigabit speeds require a clock of
> 100MHz or higher. Other speeds require 25MHz or higher. This patch
> configures a 100MHz clock if the system has a direct-attached
> PHY, or 25MHz if the system is running NC-SI which is limited to 100MHz.
>
> There appear to be no other upstream users of the FTGMAC100 driver so it
> is hard to know the clocking requirements of other platforms. Therefore
> a conservative approach was taken with enabling clocks. If the platform
> is not ASPEED, both requesting the clock and configuring the speed is
> skipped.
We might still be able to check the PHY capabilities and it might also
be possible to do the live change as you were doing previously but it
needs testing. So I'm ok with this patch for now, and later I might be
able to try the live change option on the eval board (provided I still
have one) when I come back to Australia.
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
> Andrew, as I'm travelling can you please test this on the evb and a
> palmetto? Use my wip/aspeed-v4.14-clk branch, or OpenBMC's dev-4.13.
>
> David, please wait for Andrew's tested-by before applying.
>
> Cheers!
>
> v2:
> - only touch the clocks on Aspeed platforms
> - unconditionally call clk_unprepare_disable
>
> drivers/net/ethernet/faraday/ftgmac100.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index 9ed8e4b81530..cd352bf41da1 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -21,6 +21,7 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/clk.h>
> #include <linux/dma-mapping.h>
> #include <linux/etherdevice.h>
> #include <linux/ethtool.h>
> @@ -59,6 +60,9 @@
> /* Min number of tx ring entries before stopping queue */
> #define TX_THRESHOLD (MAX_SKB_FRAGS + 1)
>
> +#define FTGMAC_100MHZ 100000000
> +#define FTGMAC_25MHZ 25000000
> +
> struct ftgmac100 {
> /* Registers */
> struct resource *res;
> @@ -96,6 +100,7 @@ struct ftgmac100 {
> struct napi_struct napi;
> struct work_struct reset_task;
> struct mii_bus *mii_bus;
> + struct clk *clk;
>
> /* Link management */
> int cur_speed;
> @@ -1734,6 +1739,22 @@ static void ftgmac100_ncsi_handler(struct ncsi_dev *nd)
> nd->link_up ? "up" : "down");
> }
>
> +static void ftgmac100_setup_clk(struct ftgmac100_priv *priv)
> +{
> + priv->clk = devm_clk_get(&pdev->dev, NULL);
> + if (IS_ERR(priv->clk))
> + return;
> +
> + clk_prepare_enable(priv->clk);
> +
> + /* Aspeed specifies a 100MHz clock is required for up to
> + * 1000Mbit link speeds. As NCSI is limited to 100Mbit, 25MHz
> + * is sufficient
> + */
> + clk_set_rate(priv->clk, priv->is_ncsi ? FTGMAC_25MHZ :
> + FTGMAC_100MHZ);
> +}
> +
> static int ftgmac100_probe(struct platform_device *pdev)
> {
> struct resource *res;
> @@ -1830,6 +1851,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
> goto err_setup_mdio;
> }
>
> + if (priv->is_aspeed)
> + ftgmac100_setup_clk(priv);
> +
> /* Default ring sizes */
> priv->rx_q_entries = priv->new_rx_q_entries = DEF_RX_QUEUE_ENTRIES;
> priv->tx_q_entries = priv->new_tx_q_entries = DEF_TX_QUEUE_ENTRIES;
> @@ -1883,6 +1907,8 @@ static int ftgmac100_remove(struct platform_device *pdev)
>
> unregister_netdev(netdev);
>
> + clk_disable_unprepare(priv->clk);
> +
> /* There's a small chance the reset task will have been re-queued,
> * during stop, make sure it's gone before we free the structure.
> */
^ permalink raw reply
* Re: [PATCH net-next v3 5/5] selinux: bpf: Add addtional check for bpf object file receive
From: Chenbo Feng @ 2017-10-12 18:28 UTC (permalink / raw)
To: Stephen Smalley
Cc: Daniel Borkmann, netdev, Chenbo Feng, linux-security-module,
SELinux, Alexei Starovoitov, Lorenzo Colitti
In-Reply-To: <1507811112.30307.2.camel@tycho.nsa.gov>
On Thu, Oct 12, 2017 at 5:25 AM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On Wed, 2017-10-11 at 13:43 -0700, Chenbo Feng via Selinux wrote:
>> On Wed, Oct 11, 2017 at 5:54 AM, Stephen Smalley <sds@tycho.nsa.gov>
>> wrote:
>> > On Tue, 2017-10-10 at 17:09 -0700, Chenbo Feng wrote:
>> > > From: Chenbo Feng <fengc@google.com>
>> > >
>> > > Introduce a bpf object related check when sending and receiving
>> > > files
>> > > through unix domain socket as well as binder. It checks if the
>> > > receiving
>> > > process have privilege to read/write the bpf map or use the bpf
>> > > program.
>> > > This check is necessary because the bpf maps and programs are
>> > > using a
>> > > anonymous inode as their shared inode so the normal way of
>> > > checking
>> > > the
>> > > files and sockets when passing between processes cannot work
>> > > properly
>> > > on
>> > > eBPF object. This check only works when the BPF_SYSCALL is
>> > > configured.
>> > > The information stored inside the file security struct is the
>> > > same as
>> > > the information in bpf object security struct.
>> > >
>> > > Signed-off-by: Chenbo Feng <fengc@google.com>
>> > > ---
>> > > include/linux/lsm_hooks.h | 17 ++++++++++
>> > > include/linux/security.h | 9 ++++++
>> > > kernel/bpf/syscall.c | 27 ++++++++++++++--
>> > > security/security.c | 8 +++++
>> > > security/selinux/hooks.c | 67
>> > > +++++++++++++++++++++++++++++++++++++++
>> > > security/selinux/include/objsec.h | 9 ++++++
>> > > 6 files changed, 135 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/include/linux/lsm_hooks.h
>> > > b/include/linux/lsm_hooks.h
>> > > index 7161d8e7ee79..517dea60b87b 100644
>> > > --- a/include/linux/lsm_hooks.h
>> > > +++ b/include/linux/lsm_hooks.h
>> > > @@ -1385,6 +1385,19 @@
>> > > * @bpf_prog_free_security:
>> > > * Clean up the security information stored inside bpf prog.
>> > > *
>> > > + * @bpf_map_file:
>> > > + * When creating a bpf map fd, set up the file security
>> > > information with
>> > > + * the bpf security information stored in the map struct. So
>> > > when the map
>> > > + * fd is passed between processes, the security module can
>> > > directly read
>> > > + * the security information from file security struct rather
>> > > than the bpf
>> > > + * security struct.
>> > > + *
>> > > + * @bpf_prog_file:
>> > > + * When creating a bpf prog fd, set up the file security
>> > > information with
>> > > + * the bpf security information stored in the prog struct. So
>> > > when the prog
>> > > + * fd is passed between processes, the security module can
>> > > directly read
>> > > + * the security information from file security struct rather
>> > > than the bpf
>> > > + * security struct.
>> > > */
>> > > union security_list_options {
>> > > int (*binder_set_context_mgr)(struct task_struct *mgr);
>> > > @@ -1726,6 +1739,8 @@ union security_list_options {
>> > > void (*bpf_map_free_security)(struct bpf_map *map);
>> > > int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>> > > void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
>> > > + void (*bpf_map_file)(struct bpf_map *map, struct file
>> > > *file);
>> > > + void (*bpf_prog_file)(struct bpf_prog_aux *aux, struct file
>> > > *file);
>> > > #endif /* CONFIG_BPF_SYSCALL */
>> > > };
>> > >
>> > > @@ -1954,6 +1969,8 @@ struct security_hook_heads {
>> > > struct list_head bpf_map_free_security;
>> > > struct list_head bpf_prog_alloc_security;
>> > > struct list_head bpf_prog_free_security;
>> > > + struct list_head bpf_map_file;
>> > > + struct list_head bpf_prog_file;
>> > > #endif /* CONFIG_BPF_SYSCALL */
>> > > } __randomize_layout;
>> > >
>> > > diff --git a/include/linux/security.h b/include/linux/security.h
>> > > index 18800b0911e5..57573b794e2d 100644
>> > > --- a/include/linux/security.h
>> > > +++ b/include/linux/security.h
>> > > @@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct
>> > > bpf_map *map);
>> > > extern void security_bpf_map_free(struct bpf_map *map);
>> > > extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
>> > > extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
>> > > +extern void security_bpf_map_file(struct bpf_map *map, struct
>> > > file
>> > > *file);
>> > > +extern void security_bpf_prog_file(struct bpf_prog_aux *aux,
>> > > struct
>> > > file *file);
>> > > #else
>> > > static inline int security_bpf(int cmd, union bpf_attr *attr,
>> > > unsigned int size)
>> > > @@ -1772,6 +1774,13 @@ static inline int
>> > > security_bpf_prog_alloc(struct bpf_prog_aux *aux)
>> > >
>> > > static inline void security_bpf_prog_free(struct bpf_prog_aux
>> > > *aux)
>> > > { }
>> > > +
>> > > +static inline void security_bpf_map_file(struct bpf_map *map,
>> > > struct
>> > > file *file)
>> > > +{ }
>> > > +
>> > > +static inline void security_bpf_prog_file(struct bpf_prog_aux
>> > > *aux,
>> > > + struct file *file)
>> > > +{ }
>> > > #endif /* CONFIG_SECURITY */
>> > > #endif /* CONFIG_BPF_SYSCALL */
>> > >
>> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> > > index 1cf31ddd7616..aee69e564c50 100644
>> > > --- a/kernel/bpf/syscall.c
>> > > +++ b/kernel/bpf/syscall.c
>> > > @@ -324,11 +324,22 @@ static const struct file_operations
>> > > bpf_map_fops = {
>> > >
>> > > int bpf_map_new_fd(struct bpf_map *map, int flags)
>> > > {
>> > > + int fd;
>> > > + struct fd f;
>> > > if (security_bpf_map(map, OPEN_FMODE(flags)))
>> > > return -EPERM;
>> > >
>> > > - return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
>> > > + fd = anon_inode_getfd("bpf-map", &bpf_map_fops, map,
>> > > flags | O_CLOEXEC);
>> > > + if (fd < 0)
>> > > + return fd;
>> > > +
>> > > + f = fdget(fd);
>> > > + if (!f.file)
>> > > + return -EBADF;
>> >
>> > This seems convoluted and unnecessarily inefficient, since
>> > anon_inode_getfd() has the struct file and could have directly
>> > returned
>> > it instead of having to go through fdget() on a fd we just
>> > installed.
>> > Also, couldn't the fd->file mapping have changed underneath us
>> > between
>> > fd_install() and fdget()?
>> > I would think it would be safer and more efficient to create an
>> > anon_inode_getfdandfile() or similar interface and use that, so
>> > that we
>> > can just pass the file it set up to the hook. Obviously that would
>> > need to be reviewed by the vfs folks.
>> >
>>
>> Do you mean create a anonymous inode interface specifically for eBPF
>> object? Is it okay that we add the hooks inside anon_inode_getfd and
>> pass the file to the hook before fd install.
>
> No, I meant to create a general helper, anon_inode_getfile(), that
> returns the file and the fd to the caller, and then the BPF-specific
> logic can stay in the BPF code.
>
> However, if storing the bpf type in the file_security_struct is in fact
> having a significant impact on per-file memory usage, then perhaps your
> original approach of exporting and testing the fops was the right one,
> albeit ugly.
>
It seems to me adding a bpf_type field inside the file_security_struct
have the same impact on selinux_file_security cache size as adding
both bpf_type and bpf_sid. Both implementation will cause a
significant increase on memory size. I am testing on a ARM 64bit
device and it might be different on x86 devices. I will return to the
original way to implement the bpf_ops check instead of adding new
field.
>> > > + security_bpf_map_file(map, f.file);
>> > > + fdput(f);
>> > > + return fd;
>> > > }
>> > >
>> > > int bpf_get_file_flag(int flags)
>> > > @@ -975,11 +986,23 @@ static const struct file_operations
>> > > bpf_prog_fops = {
>> > >
>> > > int bpf_prog_new_fd(struct bpf_prog *prog)
>> > > {
>> > > + int fd;
>> > > + struct fd f;
>> > > +
>> > > if (security_bpf_prog(prog))
>> > > return -EPERM;
>> > >
>> > > - return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
>> > > + fd = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
>> > > O_RDWR | O_CLOEXEC);
>> > > + if (fd < 0)
>> > > + return fd;
>> > > +
>> > > + f = fdget(fd);
>> > > + if (!f.file)
>> > > + return -EBADF;
>> > > + security_bpf_prog_file(prog->aux, f.file);
>> > > + fdput(f);
>> > > + return fd;
>> > > }
>> > >
>> > > static struct bpf_prog *____bpf_prog_get(struct fd f)
>> > > diff --git a/security/security.c b/security/security.c
>> > > index 1cd8526cb0b7..dacf649b8cfa 100644
>> > > --- a/security/security.c
>> > > +++ b/security/security.c
>> > > @@ -1734,4 +1734,12 @@ void security_bpf_prog_free(struct
>> > > bpf_prog_aux *aux)
>> > > {
>> > > call_void_hook(bpf_prog_free_security, aux);
>> > > }
>> > > +void security_bpf_map_file(struct bpf_map *map, struct file
>> > > *file)
>> > > +{
>> > > + call_void_hook(bpf_map_file, map, file);
>> > > +}
>> > > +void security_bpf_prog_file(struct bpf_prog_aux *aux, struct
>> > > file
>> > > *file)
>> > > +{
>> > > + call_void_hook(bpf_prog_file, aux, file);
>> > > +}
>> > > #endif /* CONFIG_BPF_SYSCALL */
>> > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> > > index 94e473b9c884..0a6ef20513b0 100644
>> > > --- a/security/selinux/hooks.c
>> > > +++ b/security/selinux/hooks.c
>> > > @@ -1815,6 +1815,10 @@ static inline int file_path_has_perm(const
>> > > struct cred *cred,
>> > > return inode_has_perm(cred, file_inode(file), av, &ad);
>> > > }
>> > >
>> > > +#ifdef CONFIG_BPF_SYSCALL
>> > > +static int bpf_file_check(struct file *file, u32 sid);
>> > > +#endif
>> > > +
>> > > /* Check whether a task can use an open file descriptor to
>> > > access an inode in a given way. Check access to the
>> > > descriptor itself, and then use dentry_has_perm to
>> > > @@ -1845,6 +1849,14 @@ static int file_has_perm(const struct cred
>> > > *cred,
>> > > goto out;
>> > > }
>> > >
>> > > +#ifdef CONFIG_BPF_SYSCALL
>> > > + if (fsec->bpf_type) {
>> > > + rc = bpf_file_check(file, cred_sid(cred));
>> > > + if (rc)
>> > > + goto out;
>> > > + }
>> > > +#endif
>> > > +
>> > > /* av is zero if only checking access to the descriptor. */
>> > > rc = 0;
>> > > if (av)
>> > > @@ -2165,6 +2177,14 @@ static int
>> > > selinux_binder_transfer_file(struct
>> > > task_struct *from,
>> > > return rc;
>> > > }
>> > >
>> > > +#ifdef CONFIG_BPF_SYSCALL
>> > > + if (fsec->bpf_type) {
>> > > + rc = bpf_file_check(file, sid);
>> > > + if (rc)
>> > > + return rc;
>> > > + }
>> > > +#endif
>> > > +
>> > > if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
>> > > return 0;
>> > >
>> > > @@ -6288,6 +6308,33 @@ static u32 bpf_map_fmode_to_av(fmode_t
>> > > fmode)
>> > > return av;
>> > > }
>> > >
>> > > +/* This function will check the file pass through unix socket or
>> > > binder to see
>> > > + * if it is a bpf related object. And apply correspinding checks
>> > > on
>> > > the bpf
>> > > + * object based on the type. The bpf maps and programs, not like
>> > > other files and
>> > > + * socket, are using a shared anonymous inode inside the kernel
>> > > as
>> > > their inode.
>> > > + * So checking that inode cannot identify if the process have
>> > > privilege to
>> > > + * access the bpf object and that's why we have to add this
>> > > additional check in
>> > > + * selinux_file_receive and selinux_binder_transfer_files.
>> > > + */
>> > > +static int bpf_file_check(struct file *file, u32 sid)
>> > > +{
>> > > + struct file_security_struct *fsec = file->f_security;
>> > > + int ret;
>> > > +
>> > > + if (fsec->bpf_type == BPF_MAP) {
>> > > + ret = avc_has_perm(sid, fsec->bpf_sid,
>> > > SECCLASS_BPF,
>> > > + bpf_map_fmode_to_av(file-
>> > > > f_mode), NULL);
>> > >
>> > > + if (ret)
>> > > + return ret;
>> > > + } else if (fsec->bpf_type == BPF_PROG) {
>> > > + ret = avc_has_perm(sid, fsec->bpf_sid,
>> > > SECCLASS_BPF,
>> > > + BPF__PROG_USE, NULL);
>> > > + if (ret)
>> > > + return ret;
>> > > + }
>> > > + return 0;
>> > > +}
>> > > +
>> > > static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
>> > > {
>> > > u32 sid = current_sid();
>> > > @@ -6351,6 +6398,24 @@ static void selinux_bpf_prog_free(struct
>> > > bpf_prog_aux *aux)
>> > > aux->security = NULL;
>> > > kfree(bpfsec);
>> > > }
>> > > +
>> > > +static void selinux_bpf_map_file(struct bpf_map *map, struct
>> > > file
>> > > *file)
>> > > +{
>> > > + struct bpf_security_struct *bpfsec = map->security;
>> > > + struct file_security_struct *fsec = file->f_security;
>> > > +
>> > > + fsec->bpf_type = BPF_MAP;
>> > > + fsec->bpf_sid = bpfsec->sid;
>> > > +}
>> > > +
>> > > +static void selinux_bpf_prog_file(struct bpf_prog_aux *aux,
>> > > struct
>> > > file *file)
>> > > +{
>> > > + struct bpf_security_struct *bpfsec = aux->security;
>> > > + struct file_security_struct *fsec = file->f_security;
>> > > +
>> > > + fsec->bpf_type = BPF_PROG;
>> > > + fsec->bpf_sid = bpfsec->sid;
>> > > +}
>> > > #endif
>> > >
>> > > static struct security_hook_list selinux_hooks[]
>> > > __lsm_ro_after_init
>> > > = {
>> > > @@ -6581,6 +6646,8 @@ static struct security_hook_list
>> > > selinux_hooks[] __lsm_ro_after_init = {
>> > > LSM_HOOK_INIT(bpf_prog_alloc_security,
>> > > selinux_bpf_prog_alloc),
>> > > LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
>> > > LSM_HOOK_INIT(bpf_prog_free_security,
>> > > selinux_bpf_prog_free),
>> > > + LSM_HOOK_INIT(bpf_map_file, selinux_bpf_map_file),
>> > > + LSM_HOOK_INIT(bpf_prog_file, selinux_bpf_prog_file),
>> > > #endif
>> > > };
>> > >
>> > > diff --git a/security/selinux/include/objsec.h
>> > > b/security/selinux/include/objsec.h
>> > > index 3d54468ce334..0162648761f9 100644
>> > > --- a/security/selinux/include/objsec.h
>> > > +++ b/security/selinux/include/objsec.h
>> > > @@ -67,11 +67,20 @@ struct inode_security_struct {
>> > > spinlock_t lock;
>> > > };
>> > >
>> > > +enum bpf_obj_type {
>> > > + BPF_MAP = 1,
>> > > + BPF_PROG,
>> > > +};
>> > > +
>> > > struct file_security_struct {
>> > > u32 sid; /* SID of open file description */
>> > > u32 fown_sid; /* SID of file owner (for
>> > > SIGIO) */
>> > > u32 isid; /* SID of inode at the time of file
>> > > open */
>> > > u32 pseqno; /* Policy seqno at the time of
>> > > file open */
>> > > +#ifdef CONFIG_BPF_SYSCALL
>> > > + unsigned char bpf_type;
>> > > + u32 bpf_sid;
>> > > +#endif
>> > > };
>> >
>> > Can you check how this impacts the size of the file_security_cache
>> > objects, and thus the memory overhead imposed on all open files?
>> >
>> > If it is significant, do we need to cache the bpf_sid here or could
>> > we
>> > just store the bpf_type and then dereference the bpfsec if it is a
>> > map
>> > or prog?
>> >
>> > From proc/slabinfo I find the number of object and the object size
>>
>> grows a lot after adding this two field. I will try to dereference
>> the
>> bpfsec instead to see if it helps.
>> > >
>> > > struct superblock_security_struct {
^ permalink raw reply
* [net-next PATCH] mqprio: Reserve last 32 classid values for HW traffic classes and misc IDs
From: Alexander Duyck @ 2017-10-12 18:38 UTC (permalink / raw)
To: jiri, amritha.nambiar, vinicius.gomes, netdev, jhs,
jesus.sanchez-palencia, davem
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch makes a slight tweak to mqprio in order to bring the
classid values used back in line with what is used for mq. The general idea
is to reserve values :ffe0 - :ffef to identify hardware traffic classes
normally reported via dev->num_tc. By doing this we can maintain a
consistent behavior with mq for classid where :1 - :ffdf will represent a
physical qdisc mapped onto a Tx queue represented by classid - 1, and the
traffic classes will be mapped onto a known subset of classid values
reserved for our virtual qdiscs.
Note I reserved the range from :fff0 - :ffff since this way we might be
able to reuse these classid values with clsact and ingress which would mean
that for mq, mqprio, ingress, and clsact we should be able to maintain a
similar classid layout.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
So I thought I would put this out here as a first step towards trying to
address some of Jiri's concerns about wanting to have a consistent
userspace API.
The plan is to follow this up with patches to ingress and clsact to look at
exposing a set of virtual qdiscs similar to what we already have for the HW
traffic classes in mqprio, although I won't bother with the ability to dump
class stats since they don't actually enqueue anything.
include/uapi/linux/pkt_sched.h | 1 +
net/sched/sch_mqprio.c | 79 +++++++++++++++++++++++-----------------
2 files changed, 47 insertions(+), 33 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..174f1cf7e7f9 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -74,6 +74,7 @@ struct tc_estimator {
#define TC_H_INGRESS (0xFFFFFFF1U)
#define TC_H_CLSACT TC_H_INGRESS
+#define TC_H_MIN_PRIORITY 0xFFE0U
#define TC_H_MIN_INGRESS 0xFFF2U
#define TC_H_MIN_EGRESS 0xFFF3U
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6bcdfe6e7b63..a61ef119a556 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -115,6 +115,10 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
if (!netif_is_multiqueue(dev))
return -EOPNOTSUPP;
+ /* make certain can allocate enough classids to handle queues */
+ if (dev->num_tx_queues >= TC_H_MIN_PRIORITY)
+ return -ENOMEM;
+
if (!opt || nla_len(opt) < sizeof(*qopt))
return -EINVAL;
@@ -193,7 +197,7 @@ static struct netdev_queue *mqprio_queue_get(struct Qdisc *sch,
unsigned long cl)
{
struct net_device *dev = qdisc_dev(sch);
- unsigned long ntx = cl - 1 - netdev_get_num_tc(dev);
+ unsigned long ntx = cl - 1;
if (ntx >= dev->num_tx_queues)
return NULL;
@@ -282,38 +286,35 @@ static unsigned long mqprio_find(struct Qdisc *sch, u32 classid)
struct net_device *dev = qdisc_dev(sch);
unsigned int ntx = TC_H_MIN(classid);
- if (ntx > dev->num_tx_queues + netdev_get_num_tc(dev))
- return 0;
- return ntx;
+ /* There are essentially two regions here that have valid classid
+ * values. The first region will have a classid value of 1 through
+ * num_tx_queues. All of these are backed by actual Qdiscs.
+ */
+ if (ntx < TC_H_MIN_PRIORITY)
+ return (ntx <= dev->num_tx_queues) ? ntx : 0;
+
+ /* The second region represents the hardware traffic classes. These
+ * are represented by classid values of TC_H_MIN_PRIORITY through
+ * TC_H_MIN_PRIORITY + netdev_get_num_tc - 1
+ */
+ return ((ntx - TC_H_MIN_PRIORITY) < netdev_get_num_tc(dev)) ? ntx : 0;
}
static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
struct sk_buff *skb, struct tcmsg *tcm)
{
- struct net_device *dev = qdisc_dev(sch);
+ if (cl < TC_H_MIN_PRIORITY) {
+ struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
+ struct net_device *dev = qdisc_dev(sch);
+ int tc = netdev_txq_to_tc(dev, cl - 1);
- if (cl <= netdev_get_num_tc(dev)) {
+ tcm->tcm_parent = (tc < 0) ? 0 :
+ TC_H_MAKE(TC_H_MAJ(sch->handle),
+ TC_H_MIN(tc + TC_H_MIN_PRIORITY));
+ tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
+ } else {
tcm->tcm_parent = TC_H_ROOT;
tcm->tcm_info = 0;
- } else {
- int i;
- struct netdev_queue *dev_queue;
-
- dev_queue = mqprio_queue_get(sch, cl);
- tcm->tcm_parent = 0;
- for (i = 0; i < netdev_get_num_tc(dev); i++) {
- struct netdev_tc_txq tc = dev->tc_to_txq[i];
- int q_idx = cl - netdev_get_num_tc(dev);
-
- if (q_idx > tc.offset &&
- q_idx <= tc.offset + tc.count) {
- tcm->tcm_parent =
- TC_H_MAKE(TC_H_MAJ(sch->handle),
- TC_H_MIN(i + 1));
- break;
- }
- }
- tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
}
tcm->tcm_handle |= TC_H_MIN(cl);
return 0;
@@ -324,15 +325,14 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
__releases(d->lock)
__acquires(d->lock)
{
- struct net_device *dev = qdisc_dev(sch);
-
- if (cl <= netdev_get_num_tc(dev)) {
+ if (cl >= TC_H_MIN_PRIORITY) {
int i;
__u32 qlen = 0;
struct Qdisc *qdisc;
struct gnet_stats_queue qstats = {0};
struct gnet_stats_basic_packed bstats = {0};
- struct netdev_tc_txq tc = dev->tc_to_txq[cl - 1];
+ struct net_device *dev = qdisc_dev(sch);
+ struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
/* Drop lock here it will be reclaimed before touching
* statistics this is required because the d->lock we
@@ -385,12 +385,25 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
/* Walk hierarchy with a virtual class per tc */
arg->count = arg->skip;
- for (ntx = arg->skip;
- ntx < dev->num_tx_queues + netdev_get_num_tc(dev);
- ntx++) {
+ for (ntx = arg->skip; ntx < netdev_get_num_tc(dev); ntx++) {
+ if (arg->fn(sch, ntx + TC_H_MIN_PRIORITY, arg) < 0) {
+ arg->stop = 1;
+ return;
+ }
+ arg->count++;
+ }
+
+ /* Pad the values and skip over unused traffic classes */
+ if (ntx < TC_MAX_QUEUE) {
+ arg->count = TC_MAX_QUEUE;
+ ntx = TC_MAX_QUEUE;
+ }
+
+ /* Reset offset, sort out remaining per-queue qdiscs */
+ for (ntx -= TC_MAX_QUEUE; ntx < dev->num_tx_queues; ntx++) {
if (arg->fn(sch, ntx + 1, arg) < 0) {
arg->stop = 1;
- break;
+ return;
}
arg->count++;
}
^ permalink raw reply related
* Re: BUG:af_packet fails to TX TSO frames
From: Willem de Bruijn @ 2017-10-12 18:41 UTC (permalink / raw)
To: Anton Ivanov; +Cc: Network Development, David Miller
In-Reply-To: <beef71a3-2cfc-95d7-f334-04685d31b640@cambridgegreys.com>
On Thu, Oct 12, 2017 at 1:58 PM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
>
>
> On 10/12/17 18:25, Willem de Bruijn wrote:
>>
>> On Thu, Oct 12, 2017 at 12:30 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>>
>>> On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov
>>> <anton.ivanov@cambridgegreys.com> wrote:
>>>>
>>>> Found it.
>>>>
>>>> Two bugs canceling each other.
>>>> The bind sequence in: psock_txring_vnet.c is wrong.
>>>>
>>>> It does the following addr.sll_protocol = htons(ETH_P_IP);
>>>> before calling bind.
>>>>
>>>> If you set addr.sll_protocol to ETH_P_ALL where it should have been in
>>>> the
>>>> first place the test program blows up with -ENOBUFS
>>>
>>> There is no such requirement that the socket should bind to ETH_P_ALL.
>
>
> There is no requirement to bind to ETH_P_IP either and most code examples
> going back more than 10 years to the days of TCP Illustrated use ALL.
For send only sockets, it is often advised to pass 0 as protocol to
socket(), so as to avoid having to spend cycles on packet reception
at all.
Commit c72219b75fde ("packet: infer protocol from ethernet header
if unset") explicitly added logic to infer skb->protocol for this common
case of sockets, if using rings.
> I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken
> and returns NOBUF and vice versa.
Given that skb->protocol is set from proto, that is indeed not expected to work.
>>>> I think what is happening is that this value is taken into account when
>>>> looking at "what should I use to segment it with" in skb_mac_gso_segment
>>>> which is invoked at the end of the verification chain which starts in
>>>> packet_direct_xmit in af_packet.c
>>>
>>> packet_snd sets skb->protocol based on the protocol that the packet
>>> socket is bound to. Binding to ETH_P_IP is the right choice here.
>>
>> To avoid having to open multiple sockets for different protocols,
>> sockaddr_ll can also be passed in the msg_name argument on
>> each call.
>
>
> Does not work for vnet headers - it honors what you bound with. I tried to
> bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed
Odd. The code for looking up proto in packet_snd looks fairly straightforward:
/*
* Get and verify the address.
*/
if (likely(saddr == NULL)) {
dev = packet_cached_dev_get(po);
proto = po->num;
addr = NULL;
} else {
err = -EINVAL;
if (msg->msg_namelen < sizeof(struct sockaddr_ll))
goto out;
if (msg->msg_namelen < (saddr->sll_halen +
offsetof(struct sockaddr_ll, sll_addr)))
goto out;
proto = saddr->sll_protocol;
addr = saddr->sll_addr;
dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex);
}
followed later by
skb->protocol = proto;
skb->dev = dev;
^ permalink raw reply
* Re: [net-next 01/18] tipc: add ability to order and receive topology events in driver
From: David Miller @ 2017-10-12 18:54 UTC (permalink / raw)
To: jon.maloy; +Cc: netdev, parthasarathy.bhuvaragan, ying.xue, tipc-discussion
In-Reply-To: <1507816959-31787-2-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 12 Oct 2017 16:02:22 +0200
> @@ -288,6 +289,7 @@ static int tipc_receive_from_sock(struct tipc_conn *con)
> }
>
> static int tipc_accept_from_sock(struct tipc_conn *con)
> +
> {
This whitespace change seems unintentional.
> +bool tipc_topsrv_kern_subscr(struct net *net, u32 port, u32 type,
> + u32 lower, u32 upper, int *conid)
> +{
> + struct tipc_server *s;
> + struct tipc_conn *con;
> + struct tipc_subscriber *scbr;
> + struct tipc_subscr sub;
Please order local variables from longest to shortest line.
> static void tipc_send_to_sock(struct tipc_conn *con)
> {
> int count = 0;
> struct tipc_server *s = con->server;
> + struct tipc_event *evt;
> struct outqueue_entry *e;
> struct msghdr msg;
> int ret;
Likewise.
^ permalink raw reply
* Re: [net-next 04/18] tipc: refactor function filter_rcv()
From: David Miller @ 2017-10-12 18:56 UTC (permalink / raw)
To: jon.maloy; +Cc: netdev, parthasarathy.bhuvaragan, ying.xue, tipc-discussion
In-Reply-To: <1507816959-31787-5-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 12 Oct 2017 16:02:25 +0200
> +static void tipc_sk_proto_rcv(struct sock *sk,
> + struct sk_buff_head *inputq,
> + struct sk_buff_head *xmitq)
> +{
> + struct tipc_sock *tsk = tipc_sk(sk);
> + struct sk_buff *skb = __skb_dequeue(inputq);
> + struct tipc_msg *hdr = buf_msg(skb);
Please order local variables from longest to shortest line.
^ permalink raw reply
* Re: [net-next 06/18] tipc: improve destination linked list
From: David Miller @ 2017-10-12 18:57 UTC (permalink / raw)
To: jon.maloy; +Cc: netdev, parthasarathy.bhuvaragan, ying.xue, tipc-discussion
In-Reply-To: <1507816959-31787-7-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 12 Oct 2017 16:02:27 +0200
> @@ -259,19 +259,19 @@ static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *pkts,
> struct tipc_nlist *dests, u16 *cong_link_cnt)
> {
> struct sk_buff_head _pkts;
> - struct u32_item *n, *tmp;
> - u32 dst, selector;
> + struct tipc_dest *dst, *tmp;
> + u32 dnode, selector;
Order local variables from longest to shortest line please.
> -bool u32_find(struct list_head *l, u32 value)
> +struct tipc_dest *tipc_dest_find(struct list_head *l, u32 node, u32 port)
> {
> - struct u32_item *item;
> + struct tipc_dest *dst;
> + u64 value = (u64)node << 32 | port;
Likewise.
> -bool u32_push(struct list_head *l, u32 value)
> +bool tipc_dest_push(struct list_head *l, u32 node, u32 port)
> {
> - struct u32_item *item;
> + struct tipc_dest *dst;
> + u64 value = (u64)node << 32 | port;
Likewise.
You're probably tired of hearing me say this, so I'll just ask that you
audit the rest of your patch series for this problem as well.
Thank you :-)
^ permalink raw reply
* Re: [RFC 0/3] Adding config get/set to devlink
From: David Miller @ 2017-10-12 18:59 UTC (permalink / raw)
To: jiri
Cc: roopa, steven.lin1, netdev, jiri, michael.chan, linux-pci,
linville, gospo
In-Reply-To: <20171012144032.GG14672@nanopsycho>
From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 12 Oct 2017 16:40:32 +0200
> Thu, Oct 12, 2017 at 04:35:10PM CEST, roopa@cumulusnetworks.com wrote:
>>On Thu, Oct 12, 2017 at 6:34 AM, Steve Lin <steven.lin1@broadcom.com> wrote:
>>> Adds a devlink command for getting & setting device configuration
>>> parameters, and enumerates a bunch of those parameters as devlink
>>> attributes. Also introduces an attribute that can be set by a
>>> driver to indicate that the config change doesn't take effect
>>> until the next restart (as in the case of the bnxt driver changes
>>> in this patchset, for which all the configuration changes affect NVM
>>> only, and aren't loaded until the next restart.)
>>>
>>> bnxt driver patches make use of these new devlink cmds/attributes.
>>>
>>> Steve Lin (3):
>>> devlink: Add config parameter get/set operations
>>> bnxt: Move generic devlink code to new file
>>> bnxt: Add devlink support for config get/set
>>>
>>
>>Is the goal here to move all ethtool operations to devlink (I saw some
>>attrs related to speed etc). ?.
>>We do need to move ethtool attrs to netlink and devlink is a good
>>place (and of-course leave the current ethtool api around for backward
>>compatibility).
>
> We need to make sure we are not moving things to devlink which don't
> belong there. All options that use "netdev" as a handle should go into
> rtnetlink instead.
I agree. Let's keep devlink what is was created for, which is situations
where we don't have a netdev as the object to work upon.
^ permalink raw reply
* Re: [RFC 0/3] Adding config get/set to devlink
From: David Miller @ 2017-10-12 19:01 UTC (permalink / raw)
To: roopa
Cc: jiri, steven.lin1, netdev, jiri, michael.chan, linux-pci,
linville, gospo
In-Reply-To: <CAJieiUiFmBr72yYP9VNtm0VBTNjJcA6Lj5nHJNBjDBC4moxE2A@mail.gmail.com>
From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Thu, 12 Oct 2017 07:46:24 -0700
> FWIS, devlink is a driver api just like ethtool is.
Devlink a driver API for doing operations where we don't have a
specific 'netdev' object to work upon.
> and ethtool needs to move to netlink soon...and It would be better to
> not put the rtnl_lock burden on ethtool driver operations. Instead of
> adding yet another driver api, extending devlink seems like a great
> fit to me.
You can use genetlink and avoid RTNL.
Also, Florian Westphal has been pushing the RTNL lock down into the
actual rtnetlink operation implementations. So one does not have to
avoid rtnetlink to avoid the RTNL mutex at all.
^ permalink raw reply
* Re: [RFC 0/3] Adding config get/set to devlink
From: David Miller @ 2017-10-12 19:01 UTC (permalink / raw)
To: jiri
Cc: roopa, steven.lin1, netdev, jiri, michael.chan, linux-pci,
linville, gospo
In-Reply-To: <20171012150419.GI14672@nanopsycho>
From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 12 Oct 2017 17:04:19 +0200
> Not sure we want to go this way and add "netdev"-handle things into
> devlink. Thoughts?
I think we should avoid this and keep devlink to it's designed purpose.
^ permalink raw reply
* Re: [RFC 0/3] Adding config get/set to devlink
From: David Miller @ 2017-10-12 19:06 UTC (permalink / raw)
To: f.fainelli
Cc: jiri, roopa, steven.lin1, netdev, jiri, michael.chan, linux-pci,
linville, gospo
In-Reply-To: <24E5DE7C-A401-48BF-BF80-673ACC38FBBE@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 12 Oct 2017 08:43:59 -0700
> Once we move ethtool (or however we name its successor) over to
> netlink there is an opportunity for accessing objects that do and do
> not have a netdevice representor today (e.g: management ports on
> switches) with the same interface, and devlink could be used for
> that.
That is an interesting angle for including this in devlink.
I'm not so sure what to do about this.
One suggestion is that devlink is used for getting ethtool stats for
objects lacking netdev representor's, and a new genetlink family is
used for netdev based ethtool.
I think it's important that we don't expand the scope of devlink
beyond what it was originally designed for.
^ permalink raw reply
* Re: [PATCH net-next 0/5] Enable ACB for bcm_sf2 and bcmsysport
From: David Miller @ 2017-10-12 19:19 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <20171011175752.22030-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 11 Oct 2017 10:57:47 -0700
> This patch series enables Broadcom's Advanced Congestion Buffering mechanism
> which requires cooperation between the CPU/Management Ethernet MAC controller
> and the switch.
>
> I took the notifier approach because ultimately the information we need to
> carry to the master network device is DSA specific and I saw little room for
> generalizing beyond what DSA requires. Chances are that this is highly specific
> to the Broadcom HW as I don't know of any HW out there that supports something
> nearly similar for similar or identical needs.
Series applied, thanks.
^ permalink raw reply
* Re: [RFC 0/3] Adding config get/set to devlink
From: Florian Fainelli @ 2017-10-12 19:20 UTC (permalink / raw)
To: David Miller
Cc: jiri, roopa, steven.lin1, netdev, jiri, michael.chan, linux-pci,
linville, gospo
In-Reply-To: <20171012.120650.1063812043202847517.davem@davemloft.net>
On 10/12/2017 12:06 PM, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Thu, 12 Oct 2017 08:43:59 -0700
>
>> Once we move ethtool (or however we name its successor) over to
>> netlink there is an opportunity for accessing objects that do and do
>> not have a netdevice representor today (e.g: management ports on
>> switches) with the same interface, and devlink could be used for
>> that.
>
> That is an interesting angle for including this in devlink.
>
> I'm not so sure what to do about this.
>
> One suggestion is that devlink is used for getting ethtool stats for
> objects lacking netdev representor's, and a new genetlink family is
> used for netdev based ethtool.
Right, I was also thinking along those lines that we we would have a new
generic netlink family for ethtool to support ethtool over netlink.
>
> I think it's important that we don't expand the scope of devlink
> beyond what it was originally designed for.
It seems to me like devlink is well defined in what it is not for: it is
not meant to be used for any object that is/has a net_device, but it is
not well defined for what it can offer to these non network devices. For
instance, we have a tremendous amount of operations that are extremely
specific to its single user(s) such as mlx5 and mlxsw.
For instance, I am not sure how the buffer reservation scheme can be
generalized, and this is always the tricky part with a single user
facility in that you try to generalize the best you can based on the HW
you know. This is not a criticism or meant to be anything negative, this
just happens to be the case, and we did not have anything better.
So maybe the first thing is to clarify what devlink operations can and
should be and what they are absolutely not allowed to cover. We should
also clarify whether a generic set/get that Steven is proposing is
something that we tolerate, or whether there should be specific function
pointers implemented for each attribute, which would be more in line
with what has been done thus far.
--
Florian
^ permalink raw reply
* Re: [PATCH v2 net 0/2] net/smc: ib_query_gid() patches
From: David Miller @ 2017-10-12 19:20 UTC (permalink / raw)
To: ubraun
Cc: netdev, linux-rdma, linux-s390, jwi, schwidefsky, heiko.carstens,
raspl, parav
In-Reply-To: <20171011114723.30733-1-ubraun@linux.vnet.ibm.com>
From: Ursula Braun <ubraun@linux.vnet.ibm.com>
Date: Wed, 11 Oct 2017 13:47:21 +0200
> triggered by Parav Pandit here are 2 cleanup patches for usage of
> ib_query_gid() in the smc-code.
Series applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH] [net-next] ip_tunnel: fix building with NET_IP_TUNNEL=m
From: David Miller @ 2017-10-12 19:21 UTC (permalink / raw)
To: arnd; +Cc: netdev, linux-kernel
In-Reply-To: <20171011135546.3536829-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 11 Oct 2017 15:55:31 +0200
> When af_mpls is built-in but the tunnel support is a module,
> we get a link failure:
>
> net/mpls/af_mpls.o: In function `mpls_init':
> af_mpls.c:(.init.text+0xdc): undefined reference to `ip_tunnel_encap_add_ops'
>
> This adds a Kconfig statement to prevent the broken
> configuration and force mpls to be a module as well in
> this case.
>
> Fixes: bdc476413dcd ("ip_tunnel: add mpls over gre support")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied, thanks Arnd.
^ permalink raw reply
* Re: [PATCH net-next 0/2] Fix IFE meta modules loading
From: David Miller @ 2017-10-12 19:23 UTC (permalink / raw)
To: mrv; +Cc: netdev, jhs
In-Reply-To: <1507733430-17860-1-git-send-email-mrv@mojatatu.com>
From: Roman Mashak <mrv@mojatatu.com>
Date: Wed, 11 Oct 2017 10:50:28 -0400
> Adjust module alias names of IFE meta modules and fix the bug that
> prevented auto-loading IFE modules in run-time.
Anyone using the existing alises will be broken by these changes
no?
Is it possible to define multiple aliases? That would be a way to
get the more desirable names whilst not breaking existing users.
^ permalink raw reply
* Re: [PATCH net-next] sched: tc_mirred: Remove whitespaces
From: David Miller @ 2017-10-12 19:24 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, jhs, xiyou.wangcong, jiri, linux-kernel
In-Reply-To: <20171011181450.22648-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 11 Oct 2017 11:14:49 -0700
> This file contains unnecessary whitespaces as newlines, remove them,
> found by looking at what struct tc_mirred looks like.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH] vxge: Clean up unused variables in vxge-traffic
From: David Miller @ 2017-10-12 19:25 UTC (permalink / raw)
To: chris.gekas; +Cc: jdmason, netdev, linux-kernel
In-Reply-To: <1507750018-7131-1-git-send-email-chris.gekas@gmail.com>
From: Christos Gkekas <chris.gekas@gmail.com>
Date: Wed, 11 Oct 2017 20:26:58 +0100
> Delete unused channel variables in vxge-traffic.
>
> Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH] ravb: Consolidate clock handling
From: Sergei Shtylyov @ 2017-10-12 19:31 UTC (permalink / raw)
To: Geert Uytterhoeven, David S . Miller, Simon Horman,
Niklas Söderlund
Cc: netdev, linux-renesas-soc
In-Reply-To: <1507796693-14268-1-git-send-email-geert+renesas@glider.be>
Hello!
On 10/12/2017 11:24 AM, Geert Uytterhoeven wrote:
> The module clock is used for two purposes:
> - Wake-on-LAN (WoL), which is optional,
> - gPTP Timer Increment (GTI) configuration, which is mandatory.
>
> As the clock is needed for GTI configuration anyway, WoL is always
> available. Hence remove duplication and repeated obtaining of the clock
> by making GTI use the stored clock for WoL use.
>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
MBR, Sergei
^ permalink raw reply
* Re: BUG:af_packet fails to TX TSO frames
From: Anton Ivanov @ 2017-10-12 19:55 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: Network Development, David Miller
In-Reply-To: <CAF=yD-K6f_+dw=ivu=jgrqdv+og4gvzPanYSBzizJBvM5rbiig@mail.gmail.com>
[snip]
>> There is no requirement to bind to ETH_P_IP either and most code examples
>> going back more than 10 years to the days of TCP Illustrated use ALL.
> For send only sockets, it is often advised to pass 0 as protocol to
> socket(), so as to avoid having to spend cycles on packet reception
> at all.
Normally, it would have been a single rw socket. Thankfully, I built
into the UML vector IO patchset the ability to use different fds for
read and write per virtual NIC.
Coming back to "normally" - as an application developer I would have
expected a rw socket to work and not to need 2+ sockets for this. Having
to work around at that level is IMHO a bit over the top.
>
> Commit c72219b75fde ("packet: infer protocol from ethernet header
> if unset") explicitly added logic to infer skb->protocol for this common
> case of sockets, if using rings.
I will look into it.
>
>> I just did the v6 test - if you bind with ETH_P_IP GSO on TCPv6 is broken
>> and returns NOBUF and vice versa.
> Given that skb->protocol is set from proto, that is indeed not expected to work.
>
>>>>> I think what is happening is that this value is taken into account when
>>>>> looking at "what should I use to segment it with" in skb_mac_gso_segment
>>>>> which is invoked at the end of the verification chain which starts in
>>>>> packet_direct_xmit in af_packet.c
>>>> packet_snd sets skb->protocol based on the protocol that the packet
>>>> socket is bound to. Binding to ETH_P_IP is the right choice here.
>>> To avoid having to open multiple sockets for different protocols,
>>> sockaddr_ll can also be passed in the msg_name argument on
>>> each call.
>>
>> Does not work for vnet headers - it honors what you bound with. I tried to
>> bind with ETH_ALL and pass ETH_P_IP as an arg and it ENOBUF-ed
> Odd. The code for looking up proto in packet_snd looks fairly straightforward:
I will double-check it tomorrow and send you a pull request for the
updated test application.
>
> /*
> * Get and verify the address.
> */
>
> if (likely(saddr == NULL)) {
> dev = packet_cached_dev_get(po);
> proto = po->num;
> addr = NULL;
> } else {
> err = -EINVAL;
> if (msg->msg_namelen < sizeof(struct sockaddr_ll))
> goto out;
> if (msg->msg_namelen < (saddr->sll_halen +
> offsetof(struct sockaddr_ll, sll_addr)))
> goto out;
> proto = saddr->sll_protocol;
> addr = saddr->sll_addr;
> dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex);
> }
>
> followed later by
>
> skb->protocol = proto;
> skb->dev = dev;
>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
^ permalink raw reply
* Re: Linux 4.9.55 break network setup because dhcp client gets an error
From: Cong Wang @ 2017-10-12 19:56 UTC (permalink / raw)
To: Shuah Khan
Cc: eric.valette, LKML, stable, Greg KH, David Miller, shuahkh,
Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAKocOOPvO4GjWPGVt+dUkti3=GruUz5eu3WXV8D-9fdF+KK0sg@mail.gmail.com>
(Cc'ing netdev and the author Eric)
On Thu, Oct 12, 2017 at 10:57 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
> On Thu, Oct 12, 2017 at 11:52 AM, Eric Valette <eric.valette@free.fr> wrote:
>> Hi,
>>
>> Just compiled a fresh 4.9.55, with same .config, same user space than 4.9.54
>> and discovered I had no network because ifup fails because dhcp cleint
>> fails. As everything is identical, 4.9.54 still works, I guess one patch in
>> net/core/... did break something.
>>
>> ifup eth0
>> Internet Systems Consortium DHCP Client 4.3.5
>> Copyright 2004-2016 Internet Systems Consortium.
>> All rights reserved.
>> For info, please visit https://www.isc.org/software/dhcp/
>>
>> Can't install packet filter program: Cannot allocate memory
>> <=====
>> If you think you have received this message due to a bug rather
>> than a configuration issue please read the section on submitting
>> bugs on either our web page at www.isc.org or in the README file
>> before submitting a bug. These pages explain the proper
>> process and the information we find helpful for debugging..
>>
>> exiting.
>> ifup: failed to bring up eth0
>>
>> Ps: CC me I'm not subscribed
>>
>
> Please revert commit 345c66695569db83eed100723e4df72cb54df7de
>
> Author: Eric Dumazet <edumazet@google.com>
> Date: Mon Oct 2 12:20:51 2017 -0700
>
> socket, bpf: fix possible use after free
>
> This will fix the problem.
>
> thanks,
> -- Shuah
^ permalink raw reply
* RE: [PATCH net-next 1/2] mqprio: Add a new hardware offload type in mqprio
From: Yuval Mintz @ 2017-10-12 20:10 UTC (permalink / raw)
To: Yunsheng Lin, davem@davemloft.net
Cc: huangdaode@hisilicon.com, xuwei5@hisilicon.com,
liguozhu@hisilicon.com, Yisen.Zhuang@huawei.com,
gabriele.paoloni@huawei.com, john.garry@huawei.com,
linuxarm@huawei.com, yisen.zhuang@huawei.com,
salil.mehta@huawei.com, lipeng321@huawei.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1507775912-22402-2-git-send-email-linyunsheng@huawei.com>
> When a driver supports both dcb and hardware offloaded mqprio, and
> user is running mqprio and dcb tool concurrently, the configuration
> set by each tool may be conflicted with each other because the dcb
(for second 'each') s/each/the
> and mqprio may be using the same hardwere offload component and share
s/hardwere/hardware
> the tc system in the network stack.
>
> This patch adds a new offload type to indicate that the underlying
> driver offload prio mapping as part of DCB. If the driver would be
'should' offload
> incapable of that it would refuse the offload. User would then have
> to explicitly request that qdisc offload.
^ permalink raw reply
* Re: Linux 4.9.55 break network setup because dhcp client gets an error
From: Greg KH @ 2017-10-12 20:11 UTC (permalink / raw)
To: Cong Wang
Cc: Shuah Khan, eric.valette, LKML, stable, David Miller, shuahkh,
Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpXcx3rzHf-vw_D0miPC+asUKWN8kUcXSQzgV8vMXHJpFA@mail.gmail.com>
On Thu, Oct 12, 2017 at 12:56:57PM -0700, Cong Wang wrote:
> (Cc'ing netdev and the author Eric)
Already fixed, try 4.9.56 :)
thanks,
greg k-h
^ permalink raw reply
* Re: [RFC 0/3] Adding config get/set to devlink
From: Steve Lin @ 2017-10-12 20:12 UTC (permalink / raw)
To: Florian Fainelli
Cc: David Miller, Jiri Pirko, Roopa Prabhu, netdev, Jiri Pirko,
Michael Chan, linux-pci, John Linville, Andy Gospodarek
In-Reply-To: <21ab4a5d-0b6a-7976-7bf0-acd334f2613f@gmail.com>
On Thu, Oct 12, 2017 at 3:20 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 10/12/2017 12:06 PM, David Miller wrote:
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Date: Thu, 12 Oct 2017 08:43:59 -0700
>>
>>> Once we move ethtool (or however we name its successor) over to
>>> netlink there is an opportunity for accessing objects that do and do
>>> not have a netdevice representor today (e.g: management ports on
>>> switches) with the same interface, and devlink could be used for
>>> that.
>>
>> That is an interesting angle for including this in devlink.
>>
>> I'm not so sure what to do about this.
>>
>> One suggestion is that devlink is used for getting ethtool stats for
>> objects lacking netdev representor's, and a new genetlink family is
>> used for netdev based ethtool.
>
> Right, I was also thinking along those lines that we we would have a new
> generic netlink family for ethtool to support ethtool over netlink.
>
>>
>> I think it's important that we don't expand the scope of devlink
>> beyond what it was originally designed for.
>
> It seems to me like devlink is well defined in what it is not for: it is
> not meant to be used for any object that is/has a net_device, but it is
> not well defined for what it can offer to these non network devices. For
> instance, we have a tremendous amount of operations that are extremely
> specific to its single user(s) such as mlx5 and mlxsw.
>
> For instance, I am not sure how the buffer reservation scheme can be
> generalized, and this is always the tricky part with a single user
> facility in that you try to generalize the best you can based on the HW
> you know. This is not a criticism or meant to be anything negative, this
> just happens to be the case, and we did not have anything better.
>
> So maybe the first thing is to clarify what devlink operations can and
> should be and what they are absolutely not allowed to cover. We should
> also clarify whether a generic set/get that Steven is proposing is
> something that we tolerate, or whether there should be specific function
> pointers implemented for each attribute, which would be more in line
> with what has been done thus far.
Hi Florian,
Some of this is subjective, of course, but just to clarify, it did
seem like implementing a new devlink_op function pointer per attribute
might be more consistent with what's been done so far. But for code
reuse purposes - i.e. to avoid replicating essentially the same
function for each of the 30+ config attributes - I elected to just
implement a single generic get and set devlink_op.
Steve
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox