Netdev List


* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov @ 2018-05-05  1:37 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto, netdev,
	linux-kernel, kernel-team, Al Viro, David Howells, Mimi Zohar,
	Kees Cook, Andrew Morton, Dominik Brodowski, Hugh Dickins,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	Rafael J. Wysocki, Linux FS Devel, p
In-Reply-To: <20180504195642.GB12838@wotan.suse.de>

On Fri, May 04, 2018 at 07:56:43PM +0000, Luis R. Rodriguez wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

sure.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> > Introduce helper:
> > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > struct umh_info {
> >        struct file *pipe_to_umh;
> >        struct file *pipe_from_umh;
> >        pid_t pid;
> > };
> > 
> > that GPLed kernel modules (signed or unsigned) can use to execute part
> > of their own data as a swappable user mode process.
> > 
> > The kernel will do:
> > - mount "tmpfs"
> 
> Actually it's a *shared* vfsmount tmpfs for all umh blobs.

yep

> > - allocate a unique file in tmpfs
> > - populate that file with [data, data + len] bytes
> > - user-mode-helper code will do_execve that file and, before the process
> >   starts, the kernel will create two unix pipes for bidirectional
> >   communication between kernel module and umh
> > - close tmpfs file, effectively deleting it
> > - the fork_usermode_blob will return zero on success and populate
> >   'struct umh_info' with two unix pipes and the pid of the user process
> 
> But since it's using UMH_WAIT_EXEC, all we can guarantee currently is the
> inception point was intended, well thought out, and will run, but the return
> value in no way reflects the success or not of the execution. More below.

yep
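
For readers following the thread: the pipe arrangement from the quoted
commit message can be pictured with a userspace analogy. This is only a
sketch under stated assumptions, not the kernel code. struct helper_info and
spawn_helper() are hypothetical names, and plain pipe()/fork()/dup2() stand
in for the create_pipe_files()/replace_fd() calls that the patch performs on
fds 0 and 1 of the helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for struct umh_info (fds, not struct file *). */
struct helper_info {
	int to_helper;   /* parent writes here; helper reads it on stdin */
	int from_helper; /* parent reads here; helper writes it on stdout */
	pid_t pid;       /* pid of the helper, needed to manage its lifetime */
};

/* Start 'prog' with its stdin/stdout wired to two pipes. Returns 0 or -1. */
static int spawn_helper(const char *prog, struct helper_info *info)
{
	int to[2], from[2];
	pid_t pid;

	if (pipe(to) < 0)
		return -1;
	if (pipe(from) < 0) {
		close(to[0]);
		close(to[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(to[0]);
		close(to[1]);
		close(from[0]);
		close(from[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: read end becomes fd 0, write end becomes fd 1. */
		dup2(to[0], STDIN_FILENO);
		dup2(from[1], STDOUT_FILENO);
		close(to[0]);
		close(to[1]);
		close(from[0]);
		close(from[1]);
		execlp(prog, prog, (char *)NULL);
		_exit(127); /* exec failed */
	}

	/* Parent keeps only its own ends, like pipe_to_umh/pipe_from_umh. */
	close(to[0]);
	close(from[1]);
	info->to_helper = to[1];
	info->from_helper = from[0];
	info->pid = pid;
	return 0;
}
```

As with UMH_WAIT_EXEC, a zero return here only says the helper was started.
Whether it is healthy has to be learned over the pipes or from the pid.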

> > As the first step in the development of the bpfilter project
> > the fork_usermode_blob() helper is introduced to allow user mode code
> > to be invoked from a kernel module. The idea is that user mode code plus
> > normal kernel module code are built as part of the kernel build
> > and installed as traditional kernel module into distro specified location,
> > such that from a distribution point of view, there is
> > no difference between regular kernel modules and kernel modules + umh code.
> > Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> > by a kernel module doesn't make it any special from kernel and user space
> > tooling point of view.
> > 
> > Such approach enables kernel to delegate functionality traditionally done
> > by the kernel modules into the user space processes (either root or !root) and
> > reduces security attack surface of the new code. The buggy umh code would crash
> > the user process, but not the kernel. Another advantage is that umh code
> > of the kernel module can be debugged and tested out of user space
> > (e.g. opening the possibility to run clang sanitizers, fuzzers or
> > user space test suites on the umh code).
> > In case of the bpfilter project such architecture allows complex control plane
> > to be done in the user space while bpf based data plane stays in the kernel.
> > 
> > Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> > the kernel module that uses them (like bpfilter) needs to manage life
> > time of umh on its own via two unix pipes and the pid of umh.
> > 
> > The exit code of such kernel module should kill the umh it started,
> > so that rmmod of the kernel module will cleanup the corresponding umh.
> > Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  fs/exec.c               |  38 ++++++++---
> >  include/linux/binfmts.h |   1 +
> >  include/linux/umh.h     |  12 ++++
> >  kernel/umh.c            | 176 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 215 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 183059c427b9..30a36c2a39bf 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
> >  /*
> >   * sys_execve() executes a new program.
> >   */
> > -static int do_execveat_common(int fd, struct filename *filename,
> > -			      struct user_arg_ptr argv,
> > -			      struct user_arg_ptr envp,
> > -			      int flags)
> > +static int __do_execve_file(int fd, struct filename *filename,
> > +			    struct user_arg_ptr argv,
> > +			    struct user_arg_ptr envp,
> > +			    int flags, struct file *file)
> >  {
> >  	char *pathbuf = NULL;
> >  	struct linux_binprm *bprm;
> > -	struct file *file;
> >  	struct files_struct *displaced;
> >  	int retval;
> 
> Keeping in mind a fuzzer...
> 
> Note, right below this, and not shown here in the hunk, is:
> 
>         if (IS_ERR(filename))
>                 return PTR_ERR(filename);
> >  
> > @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	check_unsafe_exec(bprm);
> >  	current->in_execve = 1;
> >  
> > -	file = do_open_execat(fd, filename, flags);
> > +	if (!file)
> > +		file = do_open_execat(fd, filename, flags);
> 
> 
> Here we now seem to allow !file and open the file with the passed fd as in
> the old days. This is an expected change.
> 
> >  	retval = PTR_ERR(file);
> >  	if (IS_ERR(file))
> >  		goto out_unmark;
> > @@ -1760,7 +1760,9 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	sched_exec();
> >  
> >  	bprm->file = file;
> > -	if (fd == AT_FDCWD || filename->name[0] == '/') {
> > +	if (!filename) {
> 
> If anything shouldn't this be:
> 
> 	if (IS_ERR(filename))

nope. it should be !filename as do_execve_file() passes NULL.
IS_ERR != IS_ERR_OR_NULL
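
A userspace re-creation of the error-pointer convention may make the
distinction concrete. This is a sketch that assumes the usual encoding, where
the top 4095 addresses represent errno values. The helpers below mirror the
kernel names but are written for illustration and are not copied from
include/linux/err.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Illustrative userspace versions of the kernel helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True only for pointers in the last MAX_ERRNO addresses; NULL is not one. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The combined check treats NULL as a failure as well. */
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}
```

With these definitions IS_ERR(NULL) is false, so the NULL filename passed by
do_execve_file() gets past an early IS_ERR(filename) check and reaches the
!filename branch.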

> But, wouldn't the above first branch in the routine catch this?
> 
> > +		bprm->filename = "none";
> 
> Given this seems like a desirable branch which was tested, wonder how this
> ever got set if the above branch in the first hunk I noted hit true?
> 
> In any case, we seem to have two cases, can we rule out the exact requirements
> at the top so we can bail out with an error code if one or the other way to
> call this function does not align with expectations?

I think you're misreading the code or I don't understand the concern at all.

> > +	} else if (fd == AT_FDCWD || filename->name[0] == '/') {
> >  		bprm->filename = filename->name;
> >  	} else {
> >  		if (filename->name[0] == '\0')
> > @@ -1826,7 +1828,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	task_numa_free(current);
> >  	free_bprm(bprm);
> >  	kfree(pathbuf);
> > -	putname(filename);
> > +	if (filename)
> > +		putname(filename);
> >  	if (displaced)
> >  		put_files_struct(displaced);
> >  	return retval;
> > @@ -1849,10 +1852,27 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	if (displaced)
> >  		reset_files_struct(displaced);
> >  out_ret:
> > -	putname(filename);
> > +	if (filename)
> > +		putname(filename);
> >  	return retval;
> >  }
> >  
> > +static int do_execveat_common(int fd, struct filename *filename,
> 
> Further signs the filename is now optional. But I don't understand how these
> branches can ever be true, but perhaps I'm missing something?
> 
> > +			      struct user_arg_ptr argv,
> > +			      struct user_arg_ptr envp,
> > +			      int flags)
> > +{
> > +	return __do_execve_file(fd, filename, argv, envp, flags, NULL);
> > +}
> > +
> > +int do_execve_file(struct file *file, void *__argv, void *__envp)
> > +{
> > +	struct user_arg_ptr argv = { .ptr.native = __argv };
> > +	struct user_arg_ptr envp = { .ptr.native = __envp };
> > +
> > +	return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
> > +}
> 
> Or maybe do the semantics expectations checks here, so we don't clutter
> do_execveat_common() with them?

specifically ?

> > +
> >  int do_execve(struct filename *filename,
> >  	const char __user *const __user *__argv,
> >  	const char __user *const __user *__envp)
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 4955e0863b83..c05f24fac4f6 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -150,5 +150,6 @@ extern int do_execveat(int, struct filename *,
> >  		       const char __user * const __user *,
> >  		       const char __user * const __user *,
> >  		       int);
> > +int do_execve_file(struct file *file, void *__argv, void *__envp);
> >  
> >  #endif /* _LINUX_BINFMTS_H */
> > diff --git a/include/linux/umh.h b/include/linux/umh.h
> > index 244aff638220..5c812acbb80a 100644
> > --- a/include/linux/umh.h
> > +++ b/include/linux/umh.h
> > @@ -22,8 +22,10 @@ struct subprocess_info {
> >  	const char *path;
> >  	char **argv;
> >  	char **envp;
> > +	struct file *file;
> >  	int wait;
> >  	int retval;
> > +	pid_t pid;
> >  	int (*init)(struct subprocess_info *info, struct cred *new);
> >  	void (*cleanup)(struct subprocess_info *info);
> >  	void *data;
> 
> While at it, can you kdocify struct subprocess_info and add new docs for at
> least these two entries you are adding?

Sorry, 'while at it' doesn't sound like a good reason to
add kdoc now instead of later.

> > @@ -38,6 +40,16 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
> >  			  int (*init)(struct subprocess_info *info, struct cred *new),
> >  			  void (*cleanup)(struct subprocess_info *), void *data);
> >  
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > +			  int (*init)(struct subprocess_info *info, struct cred *new),
> > +			  void (*cleanup)(struct subprocess_info *), void *data);
> 
> Likewise but on the umh.c file.
> 
> > +struct umh_info {
> > +	struct file *pipe_to_umh;
> > +	struct file *pipe_from_umh;
> > +	pid_t pid;
> > +};
> 
> Likewise.

what 'likewise' ? The kdoc ?

> 
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> 
> Likewise but on the umh.c file.
> 
> > +
> >  extern int
> >  call_usermodehelper_exec(struct subprocess_info *info, int wait);
> >  
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index f76b3ff876cf..c3f418d7d51a 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -25,6 +25,8 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/async.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/shmem_fs.h>
> > +#include <linux/pipe_fs_i.h>
> >  
> >  #include <trace/events/module.h>
> >  
> > @@ -97,9 +99,13 @@ static int call_usermodehelper_exec_async(void *data)
> >  
> >  	commit_creds(new);
> >  
> > -	retval = do_execve(getname_kernel(sub_info->path),
> > -			   (const char __user *const __user *)sub_info->argv,
> > -			   (const char __user *const __user *)sub_info->envp);
> > +	if (sub_info->file)
> > +		retval = do_execve_file(sub_info->file,
> > +					sub_info->argv, sub_info->envp);
> > +	else
> > +		retval = do_execve(getname_kernel(sub_info->path),
> > +				   (const char __user *const __user *)sub_info->argv,
> > +				   (const char __user *const __user *)sub_info->envp);
> >  out:
> >  	sub_info->retval = retval;
> >  	/*
> > @@ -185,6 +191,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
> >  		if (pid < 0) {
> >  			sub_info->retval = pid;
> >  			umh_complete(sub_info);
> > +		} else {
> > +			sub_info->pid = pid;
> >  		}
> >  	}
> >  }
> > @@ -393,6 +401,168 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
> >  }
> >  EXPORT_SYMBOL(call_usermodehelper_setup);
> >  
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > +		int (*init)(struct subprocess_info *info, struct cred *new),
> > +		void (*cleanup)(struct subprocess_info *info), void *data)
> 
> Should be static, no other users outside of this file.

good catch. will change to static.

> Please use umh_setup_file().

sorry. makes no sense.
There is call_usermodehelper_setup() right above it.
call_usermodehelper_setup_file() just follows the naming convention.
If you prefer shorter names, both have to be renamed in the separate patch series.

> > +{
> > +	struct subprocess_info *sub_info;
> 
> Considering a possible fuzzer triggering random data we should probably
> return NULL early and avoid the kzalloc if:

I'm missing the 'fuzzer' point here and earlier.
A 'fuzzer' cannot reach here. It's all internal api.

> 	if (!file || !init || !cleanup)
> 		return NULL;

sorry, nope. in kernel we don't do defensive programming like this.

> Is data optional? The kdoc could clarify this.

No. Should be obvious from this patch.
The only caller of call_usermodehelper_setup_file() is fork_usermode_blob()
and it passes 'struct umh_info *info'.

> 
> > +
> > +	sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
> > +	if (!sub_info)
> > +		return NULL;
> > +
> > +	INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
> > +	sub_info->path = "none";
> > +	sub_info->file = file;
> > +	sub_info->init = init;
> > +	sub_info->cleanup = cleanup;
> > +	sub_info->data = data;
> > +	return sub_info;
> > +}
> > +
> > +static struct vfsmount *umh_fs;
> > +
> > +static int init_tmpfs(void)
> 
> Please use umh_init_tmpfs(). 

ok

> Also see init/main.c do_basic_setup() which calls
> usermodehelper_enable() prior to do_initcalls(). Now, fortunately TMPFS is only
> bool, saving us from some races and we do call tmpfs's init first shmem_init():
> 
> static void __init do_basic_setup(void)
> {
> 	cpuset_init_smp();
> 	shmem_init();
> 	driver_init();
> 	init_irq_proc();
> 	do_ctors();
> 	usermodehelper_enable();
> 	do_initcalls();
> }
> 
> But it also means we're enabling your new call fork_usermode_blob() on
> early init code even if we're not setup. Since this umh tmpfs vfsmount is
> shared I'd say just call this init right before usermodehelper_enable()
> on do_basic_setup().

Not following.
Why should init_tmpfs() be called by an __init function?
Are you saying make 'static struct vfsmount *shm_mnt;'
global and use it here, so no init_tmpfs() is necessary?
I think that can work, but it feels that having two
tmpfs mounts (one for shmem and one for umh) is cleaner.

> 
> > +{
> > +	struct file_system_type *type;
> > +
> > +	if (umh_fs)
> > +		return 0;
> > +	type = get_fs_type("tmpfs");
> > +	if (!type)
> > +		return -ENODEV;
> > +	umh_fs = kern_mount(type);
> > +	if (IS_ERR(umh_fs)) {
> > +		int err = PTR_ERR(umh_fs);
> > +
> > +		put_filesystem(type);
> > +		umh_fs = NULL;
> > +		return err;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int alloc_tmpfs_file(size_t size, struct file **filp)
> 
> Please use umh_alloc_tmpfs_file()

ok

> > +{
> > +	struct file *file;
> > +	int err;
> > +
> > +	err = init_tmpfs();
> > +	if (err)
> > +		return err;
> > +	file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> > +	if (IS_ERR(file))
> > +		return PTR_ERR(file);
> > +	*filp = file;
> > +	return 0;
> > +}
> > +
> > +static int populate_file(struct file *file, const void *data, size_t size)
> 
> Please use umh_populate_file()

ok

> > +{
> > +	size_t offset = 0;
> > +	int err;
> > +
> > +	do {
> > +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > +		struct page *page;
> > +		void *pgdata, *vaddr;
> > +
> > +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > +					    0, &page, &pgdata);
> > +		if (err < 0)
> > +			goto fail;
> > +
> > +		vaddr = kmap(page);
> > +		memcpy(vaddr, data, len);
> > +		kunmap(page);
> > +
> > +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> > +					  len, page, pgdata);
> > +		if (err < 0)
> > +			goto fail;
> > +
> > +		size -= len;
> > +		data += len;
> > +		offset += len;
> > +	} while (size);
> 
> Character for character, this looks like a wonderful copy and paste from
> i915_gem_object_create_from_data()'s own loop which does the same exact
> thing. Perhaps it's time for a helper on mm/filemap.c with an export so
> if a bug is fixed in one place it's fixed in both places.

yes, of course, but not right now.
Once it all lands that will be the time to create common helper.
It's not a good idea to mess with i915 in one patch set.

> > +	return 0;
> > +fail:
> > +	return err;
> > +}
> > +
> > +static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> 
> The function name umh_pipe_setup() is also used on fs/coredump.c, with the same
> prototype, perhaps rename that before we take this on, even if both are static.

hmm. why?
These are two static functions with the same name, so?
tags get confusing?

> > +{
> > +	struct umh_info *umh_info = info->data;
> > +	struct file *from_umh[2];
> > +	struct file *to_umh[2];
> > +	int err;
> > +
> > +	/* create pipe to send data to umh */
> > +	err = create_pipe_files(to_umh, 0);
> > +	if (err)
> > +		return err;
> > +	err = replace_fd(0, to_umh[0], 0);
> > +	fput(to_umh[0]);
> > +	if (err < 0) {
> > +		fput(to_umh[1]);
> > +		return err;
> > +	}
> > +
> > +	/* create pipe to receive data from umh */
> > +	err = create_pipe_files(from_umh, 0);
> > +	if (err) {
> > +		fput(to_umh[1]);
> > +		replace_fd(0, NULL, 0);
> > +		return err;
> > +	}
> > +	err = replace_fd(1, from_umh[1], 0);
> > +	fput(from_umh[1]);
> > +	if (err < 0) {
> > +		fput(to_umh[1]);
> > +		replace_fd(0, NULL, 0);
> > +		fput(from_umh[0]);
> > +		return err;
> > +	}
> > +
> > +	umh_info->pipe_to_umh = to_umh[1];
> > +	umh_info->pipe_from_umh = from_umh[0];
> > +	return 0;
> > +}
> > +
> > +static void umh_save_pid(struct subprocess_info *info)
> > +{
> > +	struct umh_info *umh_info = info->data;
> > +
> > +	umh_info->pid = info->pid;
> > +}
> > +
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> 
> Please use umh_fork_blob()

sorry, no. fork_usermode_blob() is much more descriptive name.

> > +{
> > +	struct subprocess_info *sub_info;
> > +	struct file *file = NULL;
> > +	int err;
> > +
> > +	err = alloc_tmpfs_file(len, &file);
> > +	if (err)
> > +		return err;
> > +
> > +	err = populate_file(file, data, len);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = -ENOMEM;
> > +	sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
> > +						  umh_save_pid, info);
> > +	if (!sub_info)
> > +		goto out;
> > +
> > +	err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
> 
> Alright, neat, so to be clear, we're just glad to try inception, we have no
> clue or idea what the real return value would be, it's up to the caller to track
> the progress somehow?

yep.

> Can you add a kdoc entry for this and clarify requirements?

ok. I'll add a comment to this helper.

> Also, can you extend lib/test_kmod.c with a test case for this with its own
> demo and try to blow it up?

in what sense? bpfilter is the test and the driving component for it.
I'm expecting that folks who want to use this helper to do usb drivers
in user space may want to extend this helper further, but that's their job.

> I hadn't tried suspend/resume during a kmod test, but since we're using a
> kernel_thread() I wouldn't be surprised if we barf while stress testing the
> module loader. It's surely a corner case, but better mention that now than cry
> later if we get heavy umh modules and all of a sudden we start using this for
> whatever reason close to suspend.

folks that care about suspend/resume should do that.
I'm happy to gate this helper for !CONFIG_SUSPEND, since I have
no idea what issues can be uncovered, how to fix them and no desire to do so.

Thanks


* Re: [PATCH v2 net-next] net: stmmac: Add support for U32 TC filter using Flexible RX Parser
From: Jakub Kicinski @ 2018-05-05  1:33 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <c933954a111938eb8aef46082993aa541372a9ee.1525424332.git.joabreu@synopsys.com>

On Fri,  4 May 2018 10:01:38 +0100, Jose Abreu wrote:
> This adds support for U32 filter by using an HW only feature called
> Flexible RX Parser. This allows us to match any given packet field with a
> pattern and accept/reject or even route the packet to a specific DMA
> channel.
> 
> Right now we only support acceptance or rejection of frames and we only
> support simple rules. Though, the Parser has the flexibility of jumping to
> specific rules as an if condition so complex rules can be established.
> 
> This is only supported in GMAC5.10+.
> 
> The following commands can be used to test this code:
> 
> 	1) Setup an ingress qdisc:
> 	# tc qdisc add dev eth0 handle ffff: ingress
> 
> 	2) Setup a filter (e.g. filter by IP):
> 	# tc filter add dev eth0 parent ffff: protocol ip u32 match ip \
> 		src 192.168.0.3 skip_sw action drop
> 
> In every test performed we always used the "skip_sw" flag to make sure
> only the RX Parser was involved.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Vitor Soares <soares@synopsys.com>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> Cc: Jakub Kicinski <kubakici@wp.pl>
> ---
> Changes from v1:
> 	- Follow Linux network coding style (David)
> 	- Use tc_cls_can_offload_and_chain0() (Jakub)

Thanks!

> @@ -4223,6 +4277,11 @@ int stmmac_dvr_probe(struct device *device,
>  	ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
>  			    NETIF_F_RXCSUM;
>  
> +	ret = stmmac_tc_init(priv, priv);
> +	if (!ret) {
> +		ndev->hw_features |= NETIF_F_HW_TC;
> +	}
> +
>  	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
>  		ndev->hw_features |= NETIF_F_TSO | NETIF_F_TSO6;
>  		priv->tso = true;

One more comment, but perhaps not a showstopper, it's considered good
practice to disallow clearing/disabling this flag while filters are
installed.  Driver should return -EBUSY from .ndo_set_features if TC
rules are offloaded and user wants to disable HW_TC feature flag.
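
A minimal sketch of that check, modelled in plain C so it can be compiled
outside the kernel. Everything prefixed demo_ or DEMO_ is hypothetical,
including the feature bit value and the rule counter. A real driver would do
this in its .ndo_set_features callback against NETIF_F_HW_TC and its own
record of offloaded rules.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit, chosen for illustration only (not the kernel's value). */
#define DEMO_F_HW_TC (1ULL << 0)

/* Hypothetical driver state: enabled features and offloaded rule count. */
struct demo_priv {
	uint64_t features;
	unsigned int tc_rules;
};

/* Refuse to clear the HW_TC bit while offloaded rules are still installed. */
static int demo_set_features(struct demo_priv *priv, uint64_t wanted)
{
	uint64_t changed = priv->features ^ wanted;

	if ((changed & DEMO_F_HW_TC) && !(wanted & DEMO_F_HW_TC) &&
	    priv->tc_rules > 0)
		return -EBUSY;

	priv->features = wanted;
	return 0;
}
```

The user then has to delete the filters before turning the flag off, so the
hardware never keeps rules that the stack believes are gone.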


* [PATCH 24/24] selftests: net: return Kselftest Skip code for skipped tests
From: Shuah Khan (Samsung OSG) @ 2018-05-05  1:13 UTC (permalink / raw)
  To: shuah, davem; +Cc: linux-kselftest, linux-kernel, netdev
In-Reply-To: <20180505011328.32078-1-shuah@kernel.org>

When net test is skipped because of unmet dependencies and/or unsupported
configuration, it returns 0 which is treated as a pass by the Kselftest
framework. This leads to false positive result even when the test could
not be run.

Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.

Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.

Change psock_tpacket to use ksft_exit_skip() when a non-root user runs
the test and add an explicit check for root and a clear message, instead
of failing the test when /sys/power/state file open fails.
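
As a rough illustration of the convention, assuming the usual values of
KSFT_PASS (0) and KSFT_FAIL (1) next to the SKIP code of 4 stated above:
demo_exit_code() is a hypothetical helper, and a real C test would take these
constants from kselftest.h rather than redefine them.

```c
#include <assert.h>
#include <sys/types.h>

/* Redefined here only to keep the sketch self-contained. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/*
 * Hypothetical helper: pick the exit code for a test that needs root.
 * An unmet precondition is reported as SKIP, never as PASS or FAIL.
 */
static int demo_exit_code(uid_t euid, int test_ok)
{
	if (euid != 0)
		return KSFT_SKIP;
	return test_ok ? KSFT_PASS : KSFT_FAIL;
}
```

A test's main() could end with something like
"return demo_exit_code(geteuid(), ok);" so that a non-root run shows up as
skipped rather than passed.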

Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
---
 tools/testing/selftests/net/fib_tests.sh    |  8 +++++---
 tools/testing/selftests/net/netdevice.sh    | 16 +++++++++------
 tools/testing/selftests/net/pmtu.sh         |  5 ++++-
 tools/testing/selftests/net/psock_tpacket.c |  4 +++-
 tools/testing/selftests/net/rtnetlink.sh    | 31 ++++++++++++++++-------------
 5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9164e60d4b66..5baac82b9287 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -5,6 +5,8 @@
 # different events.
 
 ret=0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
 
 VERBOSE=${VERBOSE:=0}
 PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
@@ -579,18 +581,18 @@ fib_test()
 
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
-	exit 0
+	exit $ksft_skip;
 fi
 
 if [ ! -x "$(command -v ip)" ]; then
 	echo "SKIP: Could not run test without ip tool"
-	exit 0
+	exit $ksft_skip
 fi
 
 ip route help 2>&1 | grep -q fibmatch
 if [ $? -ne 0 ]; then
 	echo "SKIP: iproute2 too old, missing fibmatch"
-	exit 0
+	exit $ksft_skip
 fi
 
 # start clean
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index 903679e0ff31..e3afcb424710 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -8,6 +8,9 @@
 # if not they probably have failed earlier in the boot process and their logged error will be catched by another test
 #
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 # this function will try to up the interface
 # if already up, nothing done
 # arg1: network interface name
@@ -18,7 +21,7 @@ kci_net_start()
 	ip link show "$netdev" |grep -q UP
 	if [ $? -eq 0 ];then
 		echo "SKIP: $netdev: interface already up"
-		return 0
+		return $ksft_skip
 	fi
 
 	ip link set "$netdev" up
@@ -61,12 +64,12 @@ kci_net_setup()
 	ip address show "$netdev" |grep '^[[:space:]]*inet'
 	if [ $? -eq 0 ];then
 		echo "SKIP: $netdev: already have an IP"
-		return 0
+		return $ksft_skip
 	fi
 
 	# TODO what ipaddr to set ? DHCP ?
 	echo "SKIP: $netdev: set IP address"
-	return 0
+	return $ksft_skip
 }
 
 # test an ethtool command
@@ -84,6 +87,7 @@ kci_netdev_ethtool_test()
 	if [ $ret -ne 0 ];then
 		if [ $ret -eq "$1" ];then
 			echo "SKIP: $netdev: ethtool $2 not supported"
+			return $ksft_skip
 		else
 			echo "FAIL: $netdev: ethtool $2"
 			return 1
@@ -104,7 +108,7 @@ kci_netdev_ethtool()
 	ethtool --version 2>/dev/null >/dev/null
 	if [ $? -ne 0 ];then
 		echo "SKIP: ethtool not present"
-		return 1
+		return $ksft_skip
 	fi
 
 	TMP_ETHTOOL_FEATURES="$(mktemp)"
@@ -176,13 +180,13 @@ kci_test_netdev()
 #check for needed privileges
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
-	exit 0
+	exit $ksft_skip
 fi
 
 ip link show 2>/dev/null >/dev/null
 if [ $? -ne 0 ];then
 	echo "SKIP: Could not run test without the ip tool"
-	exit 0
+	exit $ksft_skip
 fi
 
 TMP_LIST_NETDEV="$(mktemp)"
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 1e428781a625..7514f93e1624 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -43,6 +43,9 @@
 #	that MTU is properly calculated instead when MTU is not configured from
 #	userspace
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 tests="
 	pmtu_vti6_exception		vti6: PMTU exceptions
 	pmtu_vti4_exception		vti4: PMTU exceptions
@@ -162,7 +165,7 @@ setup_xfrm6() {
 }
 
 setup() {
-	[ "$(id -u)" -ne 0 ] && echo "  need to run as root" && return 1
+	[ "$(id -u)" -ne 0 ] && echo "  need to run as root" && return $ksft_skip
 
 	cleanup_done=0
 	for arg do
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 7f6cd9fdacf3..7ec4fa4d55dc 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -60,6 +60,8 @@
 
 #include "psock_lib.h"
 
+#include "../kselftest.h"
+
 #ifndef bug_on
 # define bug_on(cond)		assert(!(cond))
 #endif
@@ -825,7 +827,7 @@ static int test_tpacket(int version, int type)
 		fprintf(stderr, "test: skip %s %s since user and kernel "
 			"space have different bit width\n",
 			tpacket_str[version], type_str[type]);
-		return 0;
+		return KSFT_SKIP;
 	}
 
 	sock = pfsocket(version);
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index e6f485235435..fb3767844e42 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -7,6 +7,9 @@
 devdummy="test-dummy0"
 ret=0
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
 # set global exit status, but never reset nonzero one.
 check_err()
 {
@@ -333,7 +336,7 @@ kci_test_vrf()
 	ip link show type vrf 2>/dev/null
 	if [ $? -ne 0 ]; then
 		echo "SKIP: vrf: iproute2 too old"
-		return 0
+		return $ksft_skip
 	fi
 
 	ip link add "$vrfname" type vrf table 10
@@ -409,7 +412,7 @@ kci_test_encap_fou()
 	ip fou help 2>&1 |grep -q 'Usage: ip fou'
 	if [ $? -ne 0 ];then
 		echo "SKIP: fou: iproute2 too old"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns exec "$testns" ip fou add port 7777 ipproto 47 2>/dev/null
@@ -444,7 +447,7 @@ kci_test_encap()
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP encap tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns exec "$testns" ip link set lo up
@@ -469,7 +472,7 @@ kci_test_macsec()
 	ip macsec help 2>&1 | grep -q "^Usage: ip macsec"
 	if [ $? -ne 0 ]; then
 		echo "SKIP: macsec: iproute2 too old"
-		return 0
+		return $ksft_skip
 	fi
 
 	ip link add link "$devdummy" "$msname" type macsec port 42 encrypt on
@@ -511,14 +514,14 @@ kci_test_gretap()
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP gretap tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip link help gretap 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: gretap: iproute2 too old"
 		ip netns del "$testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel
@@ -561,14 +564,14 @@ kci_test_ip6gretap()
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP ip6gretap tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip link help ip6gretap 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: ip6gretap: iproute2 too old"
 		ip netns del "$testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel
@@ -611,13 +614,13 @@ kci_test_erspan()
 	ip link help erspan 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: erspan: iproute2 too old"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP erspan tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel erspan v1
@@ -676,13 +679,13 @@ kci_test_ip6erspan()
 	ip link help ip6erspan 2>&1 | grep -q "^Usage:"
 	if [ $? -ne 0 ];then
 		echo "SKIP: ip6erspan: iproute2 too old"
-		return 1
+		return $ksft_skip
 	fi
 
 	ip netns add "$testns"
 	if [ $? -ne 0 ]; then
 		echo "SKIP ip6erspan tests: cannot add net namespace $testns"
-		return 1
+		return $ksft_skip
 	fi
 
 	# test native tunnel ip6erspan v1
@@ -762,14 +765,14 @@ kci_test_rtnl()
 #check for needed privileges
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
-	exit 0
+	exit $ksft_skip
 fi
 
 for x in ip tc;do
 	$x -Version 2>/dev/null >/dev/null
 	if [ $? -ne 0 ];then
 		echo "SKIP: Could not run test without the $x tool"
-		exit 0
+		exit $ksft_skip
 	fi
 done
 
-- 
2.14.1


* [PATCH net-next] vlan: correct the file path in vlan_dev_change_flags() comment
From: Sun Lianwen @ 2018-05-05  1:08 UTC (permalink / raw)
  To: davem; +Cc: netdev

The vlan_flags enum is defined in the include/uapi/linux/if_vlan.h file,
not in the include/linux/if_vlan.h file.

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
---
 net/8021q/vlan_dev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 236452ebbd9e..546af0e73ac3 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -215,7 +215,9 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
 	return 0;
 }
 
-/* Flags are defined in the vlan_flags enum in include/linux/if_vlan.h file. */
+/* Flags are defined in the vlan_flags enum in
+ * include/uapi/linux/if_vlan.h file.
+ */
 int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
 {
 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
-- 
2.17.0


* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: Alexei Starovoitov @ 2018-05-05  1:00 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto, netdev,
	linux-kernel, kernel-team
In-Reply-To: <c0ad0d12-418f-8dfd-ed2c-92e36316310b@solarflare.com>

On Thu, May 03, 2018 at 03:23:55PM +0100, Edward Cree wrote:
> On 03/05/18 05:36, Alexei Starovoitov wrote:
> > bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
> > and user mode helper code that is embedded into bpfilter.ko
> >
> > The steps to build bpfilter.ko are the following:
> > - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
> > - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
> >   is converted into bpfilter_umh.o object file
> >   with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
> >   Example:
> >   $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
> >   0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
> >   0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
> >   0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
> > - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
> >
> > bpfilter_kern.c is a normal kernel module code that calls
> > the fork_usermode_blob() helper to execute part of its own data
> > as a user mode process.
> >
> > Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> > is placed into .init.rodata section, so it's freed as soon as __init
> > function of bpfilter.ko is finished.
> > As part of __init the bpfilter.ko does first request/reply action
> > via two unix pipe provided by fork_usermode_blob() helper to
> > make sure that umh is healthy. If not it will kill it via pid.
> >
> > Later bpfilter_process_sockopt() will be called from bpfilter hooks
> > in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
> >
> > If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
> > kill umh as well.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
...
> > +static void stop_umh(void)
> > +{
> > +	if (bpfilter_process_sockopt) {
> I worry about locking here.  Is it possible for two calls to
>  bpfilter_process_sockopt() to run in parallel, both fail, and thus both
>  call stop_umh()?  And if both end up calling shutdown_umh(), we double
>  fput().

I thought the iptables sockopt path was serialized earlier. Nope.
We need to grab a mutex to access these pipes.
Will fix.

Thanks for spelling nits. Will fix as well.
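
A minimal userspace sketch of the kind of fix discussed above, assuming a single mutex guards the helper state so that two failing callers cannot both release the pipes. The struct and function names are hypothetical, and pthread primitives stand in for the kernel mutex API; this is not the code from the patch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state shared with the helper process. */
struct umh_state {
	pthread_mutex_t lock;
	bool running;       /* true while the pipes are still open */
	int release_count;  /* how many times the pipes were released */
};

static void umh_state_init(struct umh_state *st)
{
	pthread_mutex_init(&st->lock, NULL);
	st->running = true;
	st->release_count = 0;
}

/* Safe to call from several failing paths: only the first caller releases. */
static void stop_umh_once(struct umh_state *st)
{
	pthread_mutex_lock(&st->lock);
	if (st->running) {
		st->running = false;
		st->release_count++; /* stands in for the two fput() calls */
	}
	pthread_mutex_unlock(&st->lock);
}
```

The check and the release happen under the same lock, so a second caller sees running == false and returns without touching the pipes again.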

* Re: [PATCH iproute2-next] bpf: don't offload perf array maps
From: Daniel Borkmann @ 2018-05-05  0:48 UTC (permalink / raw)
  To: Jakub Kicinski, dsahern, alexei.starovoitov; +Cc: stephen, netdev, oss-drivers
In-Reply-To: <20180505003751.2232-1-jakub.kicinski@netronome.com>

On 05/05/2018 02:37 AM, Jakub Kicinski wrote:
> Perf arrays are handled specially by the kernel, don't request
> offload even when used by an offloaded program.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

* [PATCH iproute2-next] bpf: don't offload perf array maps
From: Jakub Kicinski @ 2018-05-05  0:37 UTC (permalink / raw)
  To: dsahern, alexei.starovoitov, daniel
  Cc: stephen, netdev, oss-drivers, Jakub Kicinski

Perf arrays are handled specially by the kernel, don't request
offload even when used by an offloaded program.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 lib/bpf.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index d9a406bf55f2..4e26c0df76c5 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -97,6 +97,11 @@ static const struct bpf_prog_meta __bpf_prog_meta[] = {
 	},
 };
 
+static bool bpf_map_offload_neutral(enum bpf_map_type type)
+{
+	return type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
+}
+
 static const char *bpf_prog_to_subdir(enum bpf_prog_type type)
 {
 	assert(type < ARRAY_SIZE(__bpf_prog_meta) &&
@@ -1594,7 +1599,7 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
 			  const struct bpf_elf_map *map, struct bpf_map_ext *ext,
 			  int *have_map_in_map)
 {
-	int fd, ret, map_inner_fd = 0;
+	int fd, ifindex, ret, map_inner_fd = 0;
 
 	fd = bpf_probe_pinned(name, ctx, map->pinning);
 	if (fd > 0) {
@@ -1631,10 +1636,10 @@ static int bpf_map_attach(const char *name, struct bpf_elf_ctx *ctx,
 		}
 	}
 
+	ifindex = bpf_map_offload_neutral(map->type) ? 0 : ctx->ifindex;
 	errno = 0;
 	fd = bpf_map_create(map->type, map->size_key, map->size_value,
-			    map->max_elem, map->flags, map_inner_fd,
-			    ctx->ifindex);
+			    map->max_elem, map->flags, map_inner_fd, ifindex);
 
 	if (fd < 0 || ctx->verbose) {
 		bpf_map_report(fd, name, map, ctx, map_inner_fd);
-- 
2.17.0
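
The selection logic in the patch above can be sketched in isolation as follows. The enum and function names here are hypothetical stand-ins rather than the iproute2 or kernel definitions, and the sketch assumes that an ifindex of 0 means the map is created on the host rather than on the offload device.

```c
#include <stdbool.h>

/* Hypothetical subset of map types, for illustration only. */
enum sketch_map_type {
	SKETCH_MAP_HASH,
	SKETCH_MAP_PERF_EVENT_ARRAY,
};

/* Offload-neutral maps stay on the host even for offloaded programs. */
static bool sketch_map_offload_neutral(enum sketch_map_type type)
{
	return type == SKETCH_MAP_PERF_EVENT_ARRAY;
}

/* Pick the ifindex to pass at map creation; 0 requests a host map. */
static int sketch_map_ifindex(enum sketch_map_type type, int ctx_ifindex)
{
	return sketch_map_offload_neutral(type) ? 0 : ctx_ifindex;
}
```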

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Alexei Starovoitov @ 2018-05-05  0:34 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Daniel Borkmann, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Alexander Duyck, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Willem de Bruijn,
	Michael S. Tsirkin, Network Development, Björn Töpel,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z
In-Reply-To: <CAJ8uoz3V8x4uv8Xeb+qaVB0_Rkd73TuU=3ubvkDh9b7nAkXSyw@mail.gmail.com>

On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:
> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:
> >> > From: Björn Töpel <bjorn.topel@intel.com>
> >> >
> >> > This patch set introduces a new address family called AF_XDP that is
> >> > optimized for high performance packet processing and, in upcoming
> >> > patch sets, zero-copy semantics. In this patch set, we have removed
> >> > all zero-copy related code in order to make it smaller, simpler and
> >> > hopefully more review friendly. This patch set only supports copy-mode
> >> > for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> >> > for RX using the XDP_DRV path. Zero-copy support requires XDP and
> >> > driver changes that Jesper Dangaard Brouer is working on. Some of his
> >> > work has already been accepted. We will publish our zero-copy support
> >> > for RX and TX on top of his patch sets at a later point in time.
> >>
> >> +1, would be great to see it land this cycle. Saw few minor nits here
> >> and there but nothing to hold it up, for the series:
> >>
> >> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> >>
> >> Thanks everyone!
> >
> > Great stuff!
> >
> > Applied to bpf-next, with one condition.
> > Upcoming zero-copy patches for both RX and TX need to be posted
> > and reviewed within this release window.
> > If netdev community as a whole won't be able to agree on the zero-copy
> > bits we'd need to revert this feature before the next merge window.
> 
> Thanks everyone for reviewing this. Highly appreciated.
> 
> Just so we understand the purpose correctly:
> 
> 1: Do you want to see the ZC patches in order to verify that the user
> space API holds? If so, we can produce an additional RFC  patch set
> using a big chunk of code that we had in RFC V1. We are not proud of
> this code since it is clunky, but it hopefully proves the point with
> the uapi being the same.
> 
> 2: And/Or are you worried about us all (the netdev community) not
> agreeing on a way to implement ZC internally in the drivers and the
> XDP infrastructure? This is not going to be possible to finish during
> this cycle since we do not like the implementation we had in RFC V1.
> Too intrusive and now we also have nicer abstractions from Jesper that
> we can use and extend to provide a (hopefully) much cleaner and less
> intrusive solution.

short answer: both.

Cleanliness and performance of the ZC code are not as important as
getting the API right. The main concern is that during the ZC review
process we will find out that the existing API has issues, so we have to
do this exercise before the merge window.
And an RFC won't fly. Send the patches for real. They have to go
through the proper code review. The hackers of the netdev community
can accept a partial, or a bit unclean, or slightly inefficient
implementation, since it can be and will be improved later,
but we cannot change the API once it goes into an official release.

Here is an example of an API concern:
this patch set added the shared umem concept. It sounds good in theory,
but will it perform well with ZC? Earlier RFCs didn't have that
feature. If it won't perform well then it shouldn't be in the tree.
The key reason to let AF_XDP into the tree is its performance promise.
If it doesn't perform we should rip it out and redesign.

* pull-request: bpf-next 2018-05-05
From: Daniel Borkmann @ 2018-05-05  0:25 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add initial infrastructure for AF_XDP sockets, which is optimized
   for high performance packet processing. This early work only adds
   copy-mode, and zero-copy semantics with driver changes will land in
   subsequent patches. An AF_XDP socket has an RX and/or TX queue
   associated with it for receiving and sending packets. In contrast to
   AF_PACKET v2/3, descriptor queues are separated from packet buffers
   such that an RX or TX descriptor points to a data buffer in a memory
   area called UMEM. The latter can be shared so that packets don't need
   to be copied between RX and TX. An XDP BPF program will steer the
   packets to one of the AF_XDP sockets via a new BPF map called XSKMAP,
   from Björn and Magnus.

2) Add nfp BPF offload support for bpf_event_output() helper. Having
   the driver reimplement and manage the perf array itself seems fragile
   and unnecessary, therefore the approach taken is that FW messages that
   carry the events are pushed out to the RB. Additionally bpftool gets
   support to connect to a perf event map and dump ring buffer contents,
   useful for debugging purposes, from Jakub.

3) Add a new eBPF JIT for x86_32. Like in the arm32 case, 64 bit div/mod
   and xadd are still missing, as well as BPF to BPF calls, but other
   than that it's functional and numbers show a 30% to 50% improvement
   compared to the interpreter, from Wang.

4) Implement a new BPF helper bpf_get_stack() to overcome limitations
   of stackmap and bpf_get_stackid() helper. bpf_get_stack() allows
   to send stack traces directly to the BPF program which can perform
   in-kernel processing and push them out via bpf_perf_event_output(),
   from Yonghong.

5) Remove LD_ABS and LD_IND as native eBPF instructions and implement
   them as rewrites. This significantly reduces complexity from JITs
   while keeping similar performance characteristics, and allows to
   better evolve JITs long term by having them all in C only, from Daniel.

6) Improve the code logic related to managing subprog information by
   unifying main prog and subprogs, unifying entry points and stack
   depth tracking into struct bpf_subprog_info, and adding end marker
   into subprog_info array to simplify iteration logic, from Jiong.

7) Remove tracepoints from BPF core as they started to rot away,
   causing panics triggered from syzkaller. The earlier ones from BPF
   fs were already removed, so follow up with the rest since we also
   have better introspection infrastructure these days, from Alexei.

8) Relax the bpf_current_task_under_cgroup() helper to allow usage in
   interrupt which is particularly useful for BPF programs attached
   to perf events, from Teng.

9) Formatting fixes in the new BPF uapi helper documentation for
   bpf_perf_event_read() and bpf_get_stack() and relaxing whitespace
   constraints in bpf_helpers_doc.py to ease documentation, from Quentin.

10) Dump the bpftool 'loaded at:' information in ISO 8601 format in
    the plain variant and seconds since the Epoch in JSON to ease parsing,
    also from Quentin.

11) Various cleanups mostly around coding and comment style, and several
    capitalization, typo and grammar fixups in comments for the x64 BPF
    JIT, from Ingo.

12) Fix up BPF context struct types in uapi BPF helper documentation
    where some of them were mistakenly using kernel types, from Andrey.

13) Document that under CONFIG_BPF_JIT_ALWAYS_ON mode the bpf_jit_enable
    mode 2 is not available, from Leo.

14) Import erspan uapi header file into tools infra so that BPF tunnel
    helpers can use it and won't cause issues due to missing headers on
    some systems, from William.
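
Item 1 above describes descriptor queues that are kept separate from the packet buffers in UMEM. A minimal sketch of that idea follows, assuming fixed-size frames and a descriptor that holds only a frame index and a length. Every name and size here is hypothetical and chosen for illustration; this is not the AF_XDP uapi layout.

```c
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define SKETCH_FRAME_SIZE 2048u
#define SKETCH_NUM_FRAMES 4u

/* A descriptor carries no packet data, only a reference into the UMEM. */
struct sketch_desc {
	uint32_t frame; /* index of the frame inside the UMEM */
	uint32_t len;   /* number of valid bytes in that frame */
};

/* Resolve a descriptor to its data buffer, or NULL if it is out of range. */
static uint8_t *sketch_desc_data(uint8_t *umem, const struct sketch_desc *d)
{
	if (d->frame >= SKETCH_NUM_FRAMES || d->len > SKETCH_FRAME_SIZE)
		return NULL;
	return umem + (size_t)d->frame * SKETCH_FRAME_SIZE;
}

/* Forwarding RX to TX copies the small descriptor, not the packet. */
static struct sketch_desc sketch_forward(const struct sketch_desc *rx)
{
	return *rx;
}
```

Because the RX and TX descriptors resolve to the same UMEM address, sharing the UMEM lets a packet be forwarded without copying its payload.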

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

This has a minor merge conflict in tools/testing/selftests/bpf/test_progs.c.
Resolution is to take the hunk from bpf-next tree and change the first CHECK()
condition such that the missing '\n' is added to the end of the string, like:

        if (CHECK(build_id_matches < 1, "build id match",
                  "Didn't find expected build ID from the map\n"))
                goto disable_pmu;

Let me know if you run into any other unforeseen issue. Thanks a lot!

----------------------------------------------------------------

The following changes since commit 79741a38b4a2538a68342c45b813ecb9dd648ee8:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (2018-04-26 21:19:50 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to e94fa1d93117e7f1eb783dc9cae6c70650944449:

  bpf, xskmap: fix crash in xsk_map_alloc error path handling (2018-05-04 14:55:54 -0700)

----------------------------------------------------------------
Alexei Starovoitov (5):
      Merge branch 'bpf_get_stack'
      Merge branch 'fix-bpf-helpers-doc'
      bpf: remove tracepoints from bpf core
      Merge branch 'AF_XDP-initial-support'
      Merge branch 'move-ld_abs-to-native-BPF'

Andrey Ignatov (2):
      bpf: Fix helpers ctx struct types in uapi doc
      bpf: Sync bpf.h to tools/

Björn Töpel (7):
      net: initial AF_XDP skeleton
      xsk: add user memory registration support sockopt
      xsk: add Rx queue setup and mmap support
      xsk: add Rx receive functions and poll support
      bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP
      xsk: wire up XDP_DRV side of AF_XDP
      xsk: wire up XDP_SKB side of AF_XDP

Daniel Borkmann (17):
      Merge branch 'bpf-formatting-fixes-helpers'
      bpf: prefix cbpf internal helpers with bpf_
      bpf: migrate ebpf ld_abs/ld_ind tests to test_verifier
      bpf: implement ld_abs/ld_ind in native bpf
      bpf: add skb_load_bytes_relative helper
      bpf, x64: remove ld_abs/ld_ind
      bpf, arm64: remove ld_abs/ld_ind
      bpf, sparc64: remove ld_abs/ld_ind
      bpf, arm32: remove ld_abs/ld_ind
      bpf, mips64: remove ld_abs/ld_ind
      bpf, ppc64: remove ld_abs/ld_ind
      bpf, s390x: remove ld_abs/ld_ind
      bpf, x32: remove ld_abs/ld_ind
      bpf: sync tools bpf.h uapi header
      Merge branch 'bpf-subprog-mgmt-cleanup'
      Merge branch 'bpf-event-output-offload'
      bpf, xskmap: fix crash in xsk_map_alloc error path handling

Ingo Molnar (1):
      x86/bpf: Clean up non-standard comments, to make the code more readable

Jakub Kicinski (10):
      bpf: offload: allow offloaded programs to use perf event arrays
      nfp: bpf: record offload neutral maps in the driver
      bpf: export bpf_event_output()
      bpf: replace map pointer loads before calling into offloads
      nfp: bpf: perf event output helpers support
      nfp: bpf: rewrite map pointers with NFP TIDs
      tools: bpftool: fold hex keyword in command help
      tools: bpftool: move get_possible_cpus() to common code
      tools: bpftool: add simple perf event output reader
      bpf: fix references to free_bpf_prog_info() in comments

Jiong Wang (3):
      bpf: unify main prog and subprog
      bpf: centre subprog information fields
      bpf: add faked "ending" subprog

Leo Yan (1):
      bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON

Magnus Karlsson (8):
      xsk: add umem fill queue support and mmap
      xsk: add support for bind for Rx
      xsk: add umem completion queue support and mmap
      xsk: add Tx queue setup and mmap support
      dev: packet: make packet_direct_xmit a common function
      xsk: support for Tx
      xsk: statistics support
      samples/bpf: sample application and documentation for AF_XDP sockets

Quentin Monnet (5):
      bpf: fix formatting for bpf_perf_event_read() helper doc
      bpf: fix formatting for bpf_get_stack() helper doc
      bpf: update bpf.h uapi header for tools
      tools: bpftool: change time format for program 'loaded at:' information
      bpf: relax constraints on formatting for eBPF helper documentation

Teng Qin (1):
      bpf: Allow bpf_current_task_under_cgroup in interrupt

Wang YanQing (1):
      bpf, x86_32: add eBPF JIT compiler for ia32

William Tu (1):
      tools, include: Grab a copy of linux/erspan.h

Yonghong Song (11):
      bpf: change prototype for stack_map_get_build_id_offset
      bpf: add bpf_get_stack helper
      bpf/verifier: refine retval R0 state for bpf_get_stack helper
      bpf: remove never-hit branches in verifier adjust_scalar_min_max_vals
      bpf/verifier: improve register value range tracking with ARSH
      tools/bpf: add bpf_get_stack helper to tools headers
      samples/bpf: move common-purpose trace functions to selftests
      tools/bpf: add a verifier test case for bpf_get_stack helper and ARSH
      tools/bpf: add a test for bpf_get_stack with raw tracepoint prog
      tools/bpf: add a test for bpf_get_stack with tracepoint prog
      samples/bpf: fix kprobe attachment issue on x64

 Documentation/networking/af_xdp.rst                |  297 +++
 Documentation/networking/filter.txt                |    6 +
 Documentation/networking/index.rst                 |    1 +
 Documentation/sysctl/net.txt                       |    1 +
 MAINTAINERS                                        |    9 +-
 arch/arm/net/bpf_jit_32.c                          |   77 -
 arch/arm64/net/bpf_jit_comp.c                      |   65 -
 arch/mips/net/ebpf_jit.c                           |  104 -
 arch/powerpc/net/Makefile                          |    2 +-
 arch/powerpc/net/bpf_jit64.h                       |   37 +-
 arch/powerpc/net/bpf_jit_asm64.S                   |  180 --
 arch/powerpc/net/bpf_jit_comp64.c                  |  109 +-
 arch/s390/net/Makefile                             |    2 +-
 arch/s390/net/bpf_jit.S                            |  116 -
 arch/s390/net/bpf_jit.h                            |   20 +-
 arch/s390/net/bpf_jit_comp.c                       |  127 +-
 arch/sparc/net/Makefile                            |    5 +-
 arch/sparc/net/bpf_jit_64.h                        |   29 -
 arch/sparc/net/bpf_jit_asm_64.S                    |  162 --
 arch/sparc/net/bpf_jit_comp_64.c                   |   79 +-
 arch/x86/Kconfig                                   |    2 +-
 arch/x86/include/asm/nospec-branch.h               |   30 +-
 arch/x86/net/Makefile                              |    7 +-
 arch/x86/net/bpf_jit.S                             |  154 --
 arch/x86/net/bpf_jit_comp.c                        |  343 +--
 arch/x86/net/bpf_jit_comp32.c                      | 2419 ++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c      |   16 +-
 drivers/net/ethernet/netronome/nfp/bpf/fw.h        |   20 +-
 drivers/net/ethernet/netronome/nfp/bpf/jit.c       |   76 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c      |   28 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h      |   24 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c   |  172 +-
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c  |   78 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.c       |    2 +-
 include/linux/bpf.h                                |   35 +-
 include/linux/bpf_trace.h                          |    1 -
 include/linux/bpf_types.h                          |    3 +
 include/linux/bpf_verifier.h                       |    9 +-
 include/linux/filter.h                             |    9 +-
 include/linux/netdevice.h                          |    1 +
 include/linux/socket.h                             |    5 +-
 include/linux/tnum.h                               |    4 +-
 include/net/xdp.h                                  |    1 +
 include/net/xdp_sock.h                             |   66 +
 include/trace/events/bpf.h                         |  355 ---
 include/uapi/linux/bpf.h                           |   94 +-
 include/uapi/linux/if_xdp.h                        |   87 +
 kernel/bpf/Makefile                                |    3 +
 kernel/bpf/core.c                                  |  108 +-
 kernel/bpf/inode.c                                 |   16 +-
 kernel/bpf/offload.c                               |    6 +-
 kernel/bpf/stackmap.c                              |   80 +-
 kernel/bpf/syscall.c                               |   17 +-
 kernel/bpf/tnum.c                                  |   10 +
 kernel/bpf/verifier.c                              |  247 +-
 kernel/bpf/xskmap.c                                |  241 ++
 kernel/trace/bpf_trace.c                           |   52 +-
 lib/test_bpf.c                                     |  570 +++--
 net/Kconfig                                        |    1 +
 net/Makefile                                       |    1 +
 net/core/dev.c                                     |   73 +-
 net/core/filter.c                                  |  345 ++-
 net/core/sock.c                                    |   12 +-
 net/core/xdp.c                                     |   15 +-
 net/packet/af_packet.c                             |   42 +-
 net/xdp/Kconfig                                    |    7 +
 net/xdp/Makefile                                   |    2 +
 net/xdp/xdp_umem.c                                 |  260 +++
 net/xdp/xdp_umem.h                                 |   67 +
 net/xdp/xdp_umem_props.h                           |   23 +
 net/xdp/xsk.c                                      |  656 ++++++
 net/xdp/xsk_queue.c                                |   73 +
 net/xdp/xsk_queue.h                                |  247 ++
 samples/bpf/Makefile                               |   15 +-
 samples/bpf/bpf_load.c                             |   97 +-
 samples/bpf/bpf_load.h                             |    7 -
 samples/bpf/offwaketime_user.c                     |    1 +
 samples/bpf/sampleip_user.c                        |    1 +
 samples/bpf/spintest_user.c                        |    1 +
 samples/bpf/trace_event_user.c                     |    1 +
 samples/bpf/trace_output_user.c                    |  110 +-
 samples/bpf/xdpsock.h                              |   11 +
 samples/bpf/xdpsock_kern.c                         |   56 +
 samples/bpf/xdpsock_user.c                         |  948 ++++++++
 scripts/bpf_helpers_doc.py                         |   14 +-
 security/selinux/hooks.c                           |    4 +-
 security/selinux/include/classmap.h                |    4 +-
 tools/bpf/bpftool/Documentation/bpftool-map.rst    |   40 +-
 tools/bpf/bpftool/Documentation/bpftool.rst        |    2 +-
 tools/bpf/bpftool/Makefile                         |    7 +-
 tools/bpf/bpftool/bash-completion/bpftool          |   36 +-
 tools/bpf/bpftool/common.c                         |   77 +-
 tools/bpf/bpftool/main.h                           |    7 +-
 tools/bpf/bpftool/map.c                            |   80 +-
 tools/bpf/bpftool/map_perf_ring.c                  |  347 +++
 tools/bpf/bpftool/prog.c                           |    8 +-
 tools/include/uapi/linux/bpf.h                     |   93 +-
 tools/include/uapi/linux/erspan.h                  |   52 +
 tools/testing/selftests/bpf/Makefile               |    4 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |    2 +
 tools/testing/selftests/bpf/test_get_stack_rawtp.c |  102 +
 tools/testing/selftests/bpf/test_progs.c           |  242 +-
 .../selftests/bpf/test_stacktrace_build_id.c       |   20 +-
 tools/testing/selftests/bpf/test_stacktrace_map.c  |   19 +-
 tools/testing/selftests/bpf/test_verifier.c        |  311 ++-
 tools/testing/selftests/bpf/trace_helpers.c        |  180 ++
 tools/testing/selftests/bpf/trace_helpers.h        |   23 +
 107 files changed, 8852 insertions(+), 2713 deletions(-)
 create mode 100644 Documentation/networking/af_xdp.rst
 delete mode 100644 arch/powerpc/net/bpf_jit_asm64.S
 delete mode 100644 arch/s390/net/bpf_jit.S
 delete mode 100644 arch/sparc/net/bpf_jit_asm_64.S
 delete mode 100644 arch/x86/net/bpf_jit.S
 create mode 100644 arch/x86/net/bpf_jit_comp32.c
 create mode 100644 include/net/xdp_sock.h
 delete mode 100644 include/trace/events/bpf.h
 create mode 100644 include/uapi/linux/if_xdp.h
 create mode 100644 kernel/bpf/xskmap.c
 create mode 100644 net/xdp/Kconfig
 create mode 100644 net/xdp/Makefile
 create mode 100644 net/xdp/xdp_umem.c
 create mode 100644 net/xdp/xdp_umem.h
 create mode 100644 net/xdp/xdp_umem_props.h
 create mode 100644 net/xdp/xsk.c
 create mode 100644 net/xdp/xsk_queue.c
 create mode 100644 net/xdp/xsk_queue.h
 create mode 100644 samples/bpf/xdpsock.h
 create mode 100644 samples/bpf/xdpsock_kern.c
 create mode 100644 samples/bpf/xdpsock_user.c
 create mode 100644 tools/bpf/bpftool/map_perf_ring.c
 create mode 100644 tools/include/uapi/linux/erspan.h
 create mode 100644 tools/testing/selftests/bpf/test_get_stack_rawtp.c
 create mode 100644 tools/testing/selftests/bpf/trace_helpers.c
 create mode 100644 tools/testing/selftests/bpf/trace_helpers.h

* Re: [PATCH net-next] net/ipv6: rename rt6_next to fib6_next
From: David Miller @ 2018-05-04 23:55 UTC (permalink / raw)
  To: dsahern; +Cc: netdev
In-Reply-To: <20180504205424.10948-1-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Fri,  4 May 2018 13:54:24 -0700

> This slipped through the cracks in the followup set to the fib6_info flip.
> Rename rt6_next to fib6_next.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied, thanks David.

* Re: pull-request: bpf 2018-05-05
From: David Miller @ 2018-05-04 23:50 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20180504222147.18850-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat,  5 May 2018 00:21:47 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Sanitize attr->{prog,map}_type from bpf(2) since used as an array index
>    to retrieve prog/map specific ops such that we prevent potential out of
>    bounds value under speculation, from Mark and Daniel.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.
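
A userspace sketch of the branch-free index clamping that the quoted change description refers to, modelled on the kernel's generic array_index_nospec() idea. The function names are hypothetical; the sketch assumes size is at most LONG_MAX and that right-shifting a negative long is an arithmetic shift, as it is with GCC and Clang.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Returns all ones when index < size and 0 otherwise, without a branch.
 * For an in-range index both operands of the OR have a clear top bit, so
 * the inverted value is negative and the shift smears the sign bit.
 */
static unsigned long sketch_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (SKETCH_BITS_PER_LONG - 1));
}

/* Clamp an untrusted index: in-range values pass, the rest become 0. */
static unsigned long sketch_clamp_index(unsigned long index, unsigned long size)
{
	return index & sketch_index_mask(index, size);
}
```

The ordinary bounds check still has to reject out-of-range values; the mask only makes sure that a mispredicted check cannot use an out-of-bounds index for the array access.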

* Re: [PATCH] net: disable UDP punt on sockets in RCV_SHUTDWON
From: Eric Dumazet @ 2018-05-04 23:44 UTC (permalink / raw)
  To: Chintan Shah, davem, kuznet, jmorris, yoshfuji, kaber, netdev,
	linux-kernel
  Cc: kamensky, takondra, xe-linux-external, enkechen
In-Reply-To: <1525468117-61242-1-git-send-email-chintsha@cisco.com>



On 05/04/2018 02:08 PM, Chintan Shah wrote:
> A UDP application which opens multiple sockets with same local
> address/port combination (using SO_REUSEPORT/SO_REUSEADDR socket options);
> and issues connect to a remote socket (using one of these local socket).
> Now if the same socket, which issued connect, issues shutdown (SHUT_RD);
> packets would still be queued to this socket (if sent from same remote
> client, which the local socket connected to), and not delivered to the
> other socket in the normal state.
> 

Confusing changelog.

sk_shutdown is on a different cache line, so this additional fetch would cause
loss of performance if many sockets are scanned in the hash bucket.

If you are trying to add full 4-tuple hash table to UDP, and accept() ability,
this would require a bit more than this hack...

* Re: [PATCH net] sctp: delay the authentication for the duplicated cookie-echo chunk
From: Marcelo Ricardo Leitner @ 2018-05-04 22:33 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <091d842812b99059231ff87e9bb7dff175336525.1525424710.git.lucien.xin@gmail.com>

On Fri, May 04, 2018 at 05:05:10PM +0800, Xin Long wrote:
> Now sctp only delays the authentication for the normal cookie-echo
> chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
> for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
> authentication first based on the old asoc, which will definitely
> fail due to the different auth info in the old asoc.
>
> The duplicated cookie-echo chunk will create a new asoc with the
> auth info from this chunk, and the authentication should also be
> done with the new asoc's auth info for all of the collision 'A',
> 'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
> will never pass the authentication and create the new connection.
>
> This issue exists since very beginning, and this fix is to make
> sctp_assoc_bh_rcv() follow the way sctp_assoc_bh_rcv() does for
   I guess you meant sctp_endpoint_bh_rcv here --^ right?

Other than this LGTM

> the normal cookie-echo chunk to delay the authentication.
>
> While at it, remove the unused params from sctp_sf_authenticate()
> and define sctp_auth_chunk_verify() used for all the places that
> do the delayed authentication.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/associola.c    | 30 ++++++++++++++++-
>  net/sctp/sm_statefuns.c | 86 ++++++++++++++++++++++++++-----------------------
>  2 files changed, 75 insertions(+), 41 deletions(-)
>
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 837806d..a47179d 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>  	struct sctp_endpoint *ep;
>  	struct sctp_chunk *chunk;
>  	struct sctp_inq *inqueue;
> -	int state;
> +	int first_time = 1;	/* is this the first time through the loop */
>  	int error = 0;
> +	int state;
>
>  	/* The association should be held so we should be safe. */
>  	ep = asoc->ep;
> @@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>  		state = asoc->state;
>  		subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
>
> +		/* If the first chunk in the packet is AUTH, do special
> +		 * processing specified in Section 6.3 of SCTP-AUTH spec
> +		 */
> +		if (first_time && subtype.chunk == SCTP_CID_AUTH) {
> +			struct sctp_chunkhdr *next_hdr;
> +
> +			next_hdr = sctp_inq_peek(inqueue);
> +			if (!next_hdr)
> +				goto normal;
> +
> +			/* If the next chunk is COOKIE-ECHO, skip the AUTH
> +			 * chunk while saving a pointer to it so we can do
> +			 * Authentication later (during cookie-echo
> +			 * processing).
> +			 */
> +			if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
> +				chunk->auth_chunk = skb_clone(chunk->skb,
> +							      GFP_ATOMIC);
> +				chunk->auth = 1;
> +				continue;
> +			}
> +		}
> +
> +normal:
>  		/* SCTP-AUTH, Section 6.3:
>  		 *    The receiver has a list of chunk types which it expects
>  		 *    to be received only after an AUTH-chunk.  This list has
> @@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>  		/* If there is an error on chunk, discard this packet. */
>  		if (error && chunk)
>  			chunk->pdiscard = 1;
> +
> +		if (first_time)
> +			first_time = 0;
>  	}
>  	sctp_association_put(asoc);
>  }
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 28c070e..c9ae340 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
>  					struct sctp_cmd_seq *commands);
>
>  static enum sctp_ierror sctp_sf_authenticate(
> -					struct net *net,
> -					const struct sctp_endpoint *ep,
>  					const struct sctp_association *asoc,
> -					const union sctp_subtype type,
>  					struct sctp_chunk *chunk);
>
>  static enum sctp_disposition __sctp_sf_do_9_1_abort(
> @@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
>  	return SCTP_DISPOSITION_CONSUME;
>  }
>
> +static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
> +				   const struct sctp_association *asoc)
> +{
> +	struct sctp_chunk auth;
> +
> +	if (!chunk->auth_chunk)
> +		return true;
> +
> +	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
> +	 * is supposed to be authenticated and we have to do delayed
> +	 * authentication.  We've just recreated the association using
> +	 * the information in the cookie and now it's much easier to
> +	 * do the authentication.
> +	 */
> +
> +	/* Make sure that we and the peer are AUTH capable */
> +	if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
> +		return false;
> +
> +	/* set-up our fake chunk so that we can process it */
> +	auth.skb = chunk->auth_chunk;
> +	auth.asoc = chunk->asoc;
> +	auth.sctp_hdr = chunk->sctp_hdr;
> +	auth.chunk_hdr = (struct sctp_chunkhdr *)
> +				skb_push(chunk->auth_chunk,
> +					 sizeof(struct sctp_chunkhdr));
> +	skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
> +	auth.transport = chunk->transport;
> +
> +	return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
> +}
> +
>  /*
>   * Respond to a normal COOKIE ECHO chunk.
>   * We are the side that is being asked for an association.
> @@ -763,37 +792,9 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
>  	if (error)
>  		goto nomem_init;
>
> -	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
> -	 * is supposed to be authenticated and we have to do delayed
> -	 * authentication.  We've just recreated the association using
> -	 * the information in the cookie and now it's much easier to
> -	 * do the authentication.
> -	 */
> -	if (chunk->auth_chunk) {
> -		struct sctp_chunk auth;
> -		enum sctp_ierror ret;
> -
> -		/* Make sure that we and the peer are AUTH capable */
> -		if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
> -			sctp_association_free(new_asoc);
> -			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> -		}
> -
> -		/* set-up our fake chunk so that we can process it */
> -		auth.skb = chunk->auth_chunk;
> -		auth.asoc = chunk->asoc;
> -		auth.sctp_hdr = chunk->sctp_hdr;
> -		auth.chunk_hdr = (struct sctp_chunkhdr *)
> -					skb_push(chunk->auth_chunk,
> -						 sizeof(struct sctp_chunkhdr));
> -		skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
> -		auth.transport = chunk->transport;
> -
> -		ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
> -		if (ret != SCTP_IERROR_NO_ERROR) {
> -			sctp_association_free(new_asoc);
> -			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> -		}
> +	if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
> +		sctp_association_free(new_asoc);
> +		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>  	}
>
>  	repl = sctp_make_cookie_ack(new_asoc, chunk);
> @@ -1797,13 +1798,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
>  	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>  		goto nomem;
>
> +	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
> +		return SCTP_DISPOSITION_DISCARD;
> +
>  	/* Make sure no new addresses are being added during the
>  	 * restart.  Though this is a pretty complicated attack
>  	 * since you'd have to get inside the cookie.
>  	 */
> -	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
> +	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
>  		return SCTP_DISPOSITION_CONSUME;
> -	}
>
>  	/* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
>  	 * the peer has restarted (Action A), it MUST NOT setup a new
> @@ -1912,6 +1915,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
>  	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>  		goto nomem;
>
> +	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
> +		return SCTP_DISPOSITION_DISCARD;
> +
>  	/* Update the content of current association.  */
>  	sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
>  	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
> @@ -2009,6 +2015,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
>  	 * a COOKIE ACK.
>  	 */
>
> +	if (!sctp_auth_chunk_verify(net, chunk, asoc))
> +		return SCTP_DISPOSITION_DISCARD;
> +
>  	/* Don't accidentally move back into established state. */
>  	if (asoc->state < SCTP_STATE_ESTABLISHED) {
>  		sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
> @@ -4171,10 +4180,7 @@ enum sctp_disposition sctp_sf_eat_fwd_tsn_fast(
>   * The return value is the disposition of the chunk.
>   */
>  static enum sctp_ierror sctp_sf_authenticate(
> -					struct net *net,
> -					const struct sctp_endpoint *ep,
>  					const struct sctp_association *asoc,
> -					const union sctp_subtype type,
>  					struct sctp_chunk *chunk)
>  {
>  	struct sctp_shared_key *sh_key = NULL;
> @@ -4275,7 +4281,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
>  						  commands);
>
>  	auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
> -	error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
> +	error = sctp_sf_authenticate(asoc, chunk);
>  	switch (error) {
>  	case SCTP_IERROR_AUTH_BAD_HMAC:
>  		/* Generate the ERROR chunk and discard the rest
> --
> 2.1.0
>

* Re: [PATCH bpf-next 09/10] tools: bpftool: add simple perf event output reader
From: Jakub Kicinski @ 2018-05-04 22:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: daniel, oss-drivers, netdev, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim
In-Reply-To: <20180504212501.hn2rnv7t3ik563mg@ast-mbp>

CC perf folks

On Fri, 4 May 2018 14:25:03 -0700, Alexei Starovoitov wrote:
> > +static void
> > +perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
> > +{
> > +	volatile struct perf_event_mmap_page *header = ring->mem;
> > +	__u64 buffer_size = MMAP_PAGE_CNT * get_page_size();
> > +	__u64 data_tail = header->data_tail;
> > +	__u64 data_head = header->data_head;
> > +	void *base, *begin, *end;
> > +
> > +	asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
> > +	if (data_head == data_tail)
> > +		return;  
> 
> This function was copied several times into different places.
> I think it's time to put it into a common lib, like libbpf.

Agreed, I think libbpf would work, although there is nothing BPF
specific in this loop AFAICT now.

> Would be great if you can do it in the follow up.

Looking into it now, I found these:

$ git grep 'data_head == data_tail'
tools/bpf/bpftool/map_perf_ring.c:      if (data_head == data_tail)
tools/testing/selftests/bpf/trace_helpers.c:    if (data_head == data_tail)

Are there any other copies I should try to cater to?  I have changed a few
things compared to the selftest, I guess others may have modified their
copy too.  Just trying to make sure what we put in libbpf would cater
to most possible use cases.
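
To make the discussion concrete, here is a rough sketch of what a shared
ring reader could look like. It is not the bpftool or selftest code: the
names ring_walk(), rec_hdr, walk_stats and count_rec() are made up for
illustration, and rec_hdr only stands in for struct perf_event_header.
It assumes a power-of-two ring and 8-byte aligned records, so a header
never straddles the end of the buffer. The memory barriers around
data_head and data_tail are left out here and would be needed in real
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct perf_event_header (hypothetical name). */
struct rec_hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, header included */
};

/* Callback invoked once per record; rec points at the record header. */
typedef void (*rec_fn)(const void *rec, size_t len, void *priv);

/* Walk every record in [tail, head) and return the new tail.
 * Records that wrap past the end of the ring are copied into scratch,
 * which must be large enough for the biggest record.
 */
uint64_t ring_walk(const uint8_t *base, uint64_t ring_size,
		   uint64_t tail, uint64_t head,
		   uint8_t *scratch, rec_fn fn, void *priv)
{
	/* Precondition: ring size is a power of two. */
	assert(ring_size && (ring_size & (ring_size - 1)) == 0);

	while (tail != head) {
		uint64_t off = tail & (ring_size - 1);
		uint64_t first = ring_size - off; /* bytes before the wrap */
		struct rec_hdr hdr;

		memcpy(&hdr, base + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > head - tail)
			break;	/* malformed record: stop instead of spinning */

		if (hdr.size > first) {
			/* Record wraps: linearize it in two copies. */
			memcpy(scratch, base + off, first);
			memcpy(scratch + first, base, hdr.size - first);
			fn(scratch, hdr.size, priv);
		} else {
			fn(base + off, hdr.size, priv);
		}
		tail += hdr.size;
	}
	return tail;
}

/* Example callback: count records and bytes. */
struct walk_stats {
	unsigned int records;
	size_t bytes;
};

void count_rec(const void *rec, size_t len, void *priv)
{
	struct walk_stats *st = priv;

	(void)rec;
	st->records++;
	st->bytes += len;
}
```

A caller would load data_head, run ring_walk() from the current
data_tail, and then publish the returned value as the new data_tail.
Taking a callback keeps the loop free of anything BPF specific.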

Should I also move bpf_perf_event_open()/test_bpf_perf_event() to libbpf?

> for the set:
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Thanks!

* Re: [net-next PATCH v2 4/8] udp: Do not pass checksum as a parameter to GSO segmentation
From: Alexander Duyck @ 2018-05-04 22:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, Willem de Bruijn, David Miller
In-Reply-To: <52c9b572-ddcd-94ea-b9b6-787ca924698a@gmail.com>

On Fri, May 4, 2018 at 1:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/04/2018 11:30 AM, Alexander Duyck wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> This patch is meant to allow us to avoid having to recompute the checksum
>> from scratch and have it passed as a parameter.
>>
>> Instead of taking that approach we can take advantage of the fact that the
>> length that was used to compute the existing checksum is included in the
>> UDP header. If we cancel that out by adding the value XOR with 0xFFFF we
>> can then just add the new length in and fold that into the new result.
>>
>
>>
>> +     uh = udp_hdr(segs);
>> +
>> +     /* compute checksum adjustment based on old length versus new */
>> +     newlen = htons(sizeof(*uh) + mss);
>> +     check = ~csum_fold((__force __wsum)((__force u32)uh->check +
>> +                                         ((__force u32)uh->len ^ 0xFFFF) +
>> +                                         (__force u32)newlen));
>> +
>
>
> Can't this use csum_sub() instead of this ^ 0xFFFF trick ?

I could but that actually adds more instructions to all this since
csum_sub will perform the inversion across a 32b checksum when we only
need to bitflip a 16 bit value. I had considered doing (u16)(~uh->len)
but thought type casting it more than once would be a pain as well.

What I wanted to avoid is having to do the extra math to account for
the rollover. Adding 3 16 bit values will result in at most an 18 bit
value which can then be folded. Doing it this way we avoid that extra
add w/ carry logic that is needed for csum_add/sub.
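
The arithmetic can be checked in userspace with a small sketch. This is
not the kernel code: fold16(), sum_words(), sum_update_len() and
checksum_demo() are illustrative names, the words are plain host-order
values, and the sum is kept uncomplemented for simplicity. It only
shows that cancelling the old length with "^ 0xFFFF" and adding the new
one gives the same folded one's-complement sum as recomputing from
scratch, as long as the new length is nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry.
 * Two rounds are enough for any 32-bit input.
 */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xFFFF) + (sum >> 16);
	sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum of n 16-bit words, computed from scratch. */
static uint16_t sum_words(const uint16_t *w, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		acc = fold16(acc + w[i]);
	return (uint16_t)acc;
}

/* Swap old_len for new_len in an existing folded sum.
 * Three 16-bit terms add up to at most 3 * 0xFFFF = 0x2FFFD, which
 * fits in 18 bits, so a single fold at the end is sufficient.
 */
static uint16_t sum_update_len(uint16_t sum, uint16_t old_len,
			       uint16_t new_len)
{
	return fold16((uint32_t)sum + (uint32_t)(old_len ^ 0xFFFF) + new_len);
}

/* words[2] plays the role of the UDP length field. */
void checksum_demo(void)
{
	uint16_t words[] = { 0x1234, 0xABCD, 0x05DC, 0xFFFF };
	uint16_t old_sum = sum_words(words, 4);
	uint16_t old_len = words[2];

	words[2] = 0x0240;	/* new, smaller segment length */
	assert(sum_update_len(old_sum, old_len, words[2]) ==
	       sum_words(words, 4));
}
```

The adjustment costs two adds and a fold, independent of the payload
size.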

* pull-request: bpf 2018-05-05
From: Daniel Borkmann @ 2018-05-04 22:21 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Sanitize attr->{prog,map}_type from bpf(2) since used as an array index
   to retrieve prog/map specific ops such that we prevent potential out of
   bounds value under speculation, from Mark and Daniel.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit a8d7aa17bbc970971ccdf71988ea19230ab368b1:

  dccp: fix tasklet usage (2018-05-03 15:14:57 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to d0f1a451e33d9ca834422622da30aa68daade56b:

  bpf: use array_index_nospec in find_prog_type (2018-05-03 19:29:35 -0700)

----------------------------------------------------------------
Daniel Borkmann (1):
      bpf: use array_index_nospec in find_prog_type

Mark Rutland (1):
      bpf: fix possible spectre-v1 in find_and_alloc_map()

 kernel/bpf/syscall.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
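
For readers unfamiliar with the pattern, here is a userspace sketch of
the index clamping that array_index_nospec() performs. It is modeled
on the kernel's generic array_index_mask_nospec(), but index_mask(),
lookup_name() and the prog_names table are made-up names, and this
shows the arithmetic only. It is not a hardened mitigation, since a
userspace compiler is free to optimize the mask away, which the kernel
guards against. It assumes index and size both fit in a long and that
right-shifting a negative long is an arithmetic shift, as with GCC and
Clang.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return all ones when index < size and zero otherwise, without a
 * conditional branch on the comparison.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	/* Precondition taken from the kernel helper's documentation. */
	assert(index <= LONG_MAX && size <= LONG_MAX);

	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical ops table indexed by a user-supplied type. */
static const char *const prog_names[] = {
	"unspec", "socket_filter", "kprobe",
};

const char *lookup_name(unsigned long type)
{
	size_t n = sizeof(prog_names) / sizeof(prog_names[0]);

	if (type >= n)
		return NULL;

	/* Even if the bounds check above is mispredicted, the masked
	 * index cannot reach outside the table.
	 */
	type &= index_mask(type, n);
	return prog_names[type];
}
```

An in-range index passes through unchanged, while an out-of-range one
is forced to zero before it is used for the array access.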

* Re: [PATCH iproute2] rdma: fix header files
From: David Ahern @ 2018-05-04 22:13 UTC (permalink / raw)
  To: Stephen Hemminger, swise; +Cc: netdev
In-Reply-To: <20180504215608.11305-1-stephen@networkplumber.org>

On 5/4/18 3:56 PM, Stephen Hemminger wrote:
> All user api headers in iproute2 should be in include/uapi
> so that script can be used to put correct sanitized kernel headers
> there. And the header files for rdma must be a complete set; if one
> header file includes another, all must be present.
> 
> This fixes build on older distributions, and Windows Services
> for Linux.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  include/uapi/rdma/ib_user_sa.h                |   77 ++
>  include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
>  .../uapi/rdma/rdma_netlink.h                  |   13 +
>  .../uapi/rdma/rdma_user_cm.h                  |    6 +-
>  4 files changed, 1303 insertions(+), 3 deletions(-)
>  create mode 100644 include/uapi/rdma/ib_user_sa.h
>  create mode 100644 include/uapi/rdma/ib_user_verbs.h
>  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
>  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> 

Stephen:

Per a recent discussion the RDMA folks need to take ownership of the
uapi files. RDMA features do not hit Dave's net-next tree so the rdma
code can never hit iproute2-next during a dev cycle.

* Re: [PATCH v2 4/4] smack: provide socketpair callback
From: Casey Schaufler @ 2018-05-04 22:01 UTC (permalink / raw)
  To: David Herrmann, linux-kernel
  Cc: James Morris, Paul Moore, teg, Stephen Smalley, selinux,
	linux-security-module, Eric Paris, serge, davem, netdev
In-Reply-To: <20180504142822.15233-5-dh.herrmann@gmail.com>

On 5/4/2018 7:28 AM, David Herrmann wrote:
> From: Tom Gundersen <teg@jklm.no>
>
> Make sure to implement the new socketpair callback so the SO_PEERSEC
> call on socketpair(2)s will return correct information.
>
> Signed-off-by: Tom Gundersen <teg@jklm.no>
> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>

This doesn't look like it will cause any problems.
I've only been able to test it in a general way. I
haven't created specific tests, but it passes the
usual Smack use cases.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>

> ---
>  security/smack/smack_lsm.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 0b414836bebd..dcb976f98df2 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -2842,6 +2842,27 @@ static int smack_socket_post_create(struct socket *sock, int family,
>  	return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
>  }
>  
> +/**
> + * smack_socket_socketpair - create socket pair
> + * @socka: one socket
> + * @sockb: another socket
> + *
> + * Cross reference the peer labels for SO_PEERSEC
> + *
> + * Returns 0 on success, and error code otherwise
> + */
> +static int smack_socket_socketpair(struct socket *socka,
> +		                   struct socket *sockb)
> +{
> +	struct socket_smack *asp = socka->sk->sk_security;
> +	struct socket_smack *bsp = sockb->sk->sk_security;
> +
> +	asp->smk_packet = bsp->smk_out;
> +	bsp->smk_packet = asp->smk_out;
> +
> +	return 0;
> +}
> +
>  #ifdef SMACK_IPV6_PORT_LABELING
>  /**
>   * smack_socket_bind - record port binding information.
> @@ -4724,6 +4745,7 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(unix_may_send, smack_unix_may_send),
>  
>  	LSM_HOOK_INIT(socket_post_create, smack_socket_post_create),
> +	LSM_HOOK_INIT(socket_socketpair, smack_socket_socketpair),
>  #ifdef SMACK_IPV6_PORT_LABELING
>  	LSM_HOOK_INIT(socket_bind, smack_socket_bind),
>  #endif

* [PATCH iproute2] rdma: fix header files
From: Stephen Hemminger @ 2018-05-04 21:56 UTC (permalink / raw)
  To: swise; +Cc: netdev, Stephen Hemminger

All user api headers in iproute2 should be in include/uapi
so that script can be used to put correct sanitized kernel headers
there. And the header files for rdma must be a complete set; if one
header file includes another, all must be present.

This fixes build on older distributions, and Windows Services
for Linux.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 include/uapi/rdma/ib_user_sa.h                |   77 ++
 include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
 .../uapi/rdma/rdma_netlink.h                  |   13 +
 .../uapi/rdma/rdma_user_cm.h                  |    6 +-
 4 files changed, 1303 insertions(+), 3 deletions(-)
 create mode 100644 include/uapi/rdma/ib_user_sa.h
 create mode 100644 include/uapi/rdma/ib_user_verbs.h
 rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
 rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)

diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
new file mode 100644
index 000000000000..0d2607f0cd20
--- /dev/null
+++ b/include/uapi/rdma/ib_user_sa.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USER_SA_H
+#define IB_USER_SA_H
+
+#include <linux/types.h>
+
+enum {
+	IB_PATH_GMP		= 1,
+	IB_PATH_PRIMARY		= (1<<1),
+	IB_PATH_ALTERNATE	= (1<<2),
+	IB_PATH_OUTBOUND	= (1<<3),
+	IB_PATH_INBOUND		= (1<<4),
+	IB_PATH_INBOUND_REVERSE = (1<<5),
+	IB_PATH_BIDIRECTIONAL	= IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE
+};
+
+struct ib_path_rec_data {
+	__u32	flags;
+	__u32	reserved;
+	__u32	path_rec[16];
+};
+
+struct ib_user_path_rec {
+	__u8	dgid[16];
+	__u8	sgid[16];
+	__be16	dlid;
+	__be16	slid;
+	__u32	raw_traffic;
+	__be32	flow_label;
+	__u32	reversible;
+	__u32	mtu;
+	__be16	pkey;
+	__u8	hop_limit;
+	__u8	traffic_class;
+	__u8	numb_path;
+	__u8	sl;
+	__u8	mtu_selector;
+	__u8	rate_selector;
+	__u8	rate;
+	__u8	packet_life_time_selector;
+	__u8	packet_life_time;
+	__u8	preference;
+};
+
+#endif /* IB_USER_SA_H */
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
new file mode 100644
index 000000000000..9be07394fdbe
--- /dev/null
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -0,0 +1,1210 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
+ * Copyright (c) 2006 Mellanox Technologies.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USER_VERBS_H
+#define IB_USER_VERBS_H
+
+#include <linux/types.h>
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+#define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD    50
+
+enum {
+	IB_USER_VERBS_CMD_GET_CONTEXT,
+	IB_USER_VERBS_CMD_QUERY_DEVICE,
+	IB_USER_VERBS_CMD_QUERY_PORT,
+	IB_USER_VERBS_CMD_ALLOC_PD,
+	IB_USER_VERBS_CMD_DEALLOC_PD,
+	IB_USER_VERBS_CMD_CREATE_AH,
+	IB_USER_VERBS_CMD_MODIFY_AH,
+	IB_USER_VERBS_CMD_QUERY_AH,
+	IB_USER_VERBS_CMD_DESTROY_AH,
+	IB_USER_VERBS_CMD_REG_MR,
+	IB_USER_VERBS_CMD_REG_SMR,
+	IB_USER_VERBS_CMD_REREG_MR,
+	IB_USER_VERBS_CMD_QUERY_MR,
+	IB_USER_VERBS_CMD_DEREG_MR,
+	IB_USER_VERBS_CMD_ALLOC_MW,
+	IB_USER_VERBS_CMD_BIND_MW,
+	IB_USER_VERBS_CMD_DEALLOC_MW,
+	IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL,
+	IB_USER_VERBS_CMD_CREATE_CQ,
+	IB_USER_VERBS_CMD_RESIZE_CQ,
+	IB_USER_VERBS_CMD_DESTROY_CQ,
+	IB_USER_VERBS_CMD_POLL_CQ,
+	IB_USER_VERBS_CMD_PEEK_CQ,
+	IB_USER_VERBS_CMD_REQ_NOTIFY_CQ,
+	IB_USER_VERBS_CMD_CREATE_QP,
+	IB_USER_VERBS_CMD_QUERY_QP,
+	IB_USER_VERBS_CMD_MODIFY_QP,
+	IB_USER_VERBS_CMD_DESTROY_QP,
+	IB_USER_VERBS_CMD_POST_SEND,
+	IB_USER_VERBS_CMD_POST_RECV,
+	IB_USER_VERBS_CMD_ATTACH_MCAST,
+	IB_USER_VERBS_CMD_DETACH_MCAST,
+	IB_USER_VERBS_CMD_CREATE_SRQ,
+	IB_USER_VERBS_CMD_MODIFY_SRQ,
+	IB_USER_VERBS_CMD_QUERY_SRQ,
+	IB_USER_VERBS_CMD_DESTROY_SRQ,
+	IB_USER_VERBS_CMD_POST_SRQ_RECV,
+	IB_USER_VERBS_CMD_OPEN_XRCD,
+	IB_USER_VERBS_CMD_CLOSE_XRCD,
+	IB_USER_VERBS_CMD_CREATE_XSRQ,
+	IB_USER_VERBS_CMD_OPEN_QP,
+};
+
+enum {
+	IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE,
+	IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ,
+	IB_USER_VERBS_EX_CMD_CREATE_QP = IB_USER_VERBS_CMD_CREATE_QP,
+	IB_USER_VERBS_EX_CMD_MODIFY_QP = IB_USER_VERBS_CMD_MODIFY_QP,
+	IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
+	IB_USER_VERBS_EX_CMD_DESTROY_FLOW,
+	IB_USER_VERBS_EX_CMD_CREATE_WQ,
+	IB_USER_VERBS_EX_CMD_MODIFY_WQ,
+	IB_USER_VERBS_EX_CMD_DESTROY_WQ,
+	IB_USER_VERBS_EX_CMD_CREATE_RWQ_IND_TBL,
+	IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL,
+	IB_USER_VERBS_EX_CMD_MODIFY_CQ
+};
+
+/*
+ * Make sure that all structs defined in this file remain laid out so
+ * that they pack the same way on 32-bit and 64-bit architectures (to
+ * avoid incompatibility between 32-bit userspace and 64-bit kernels).
+ * Specifically:
+ *  - Do not use pointer types -- pass pointers in __u64 instead.
+ *  - Make sure that any structure larger than 4 bytes is padded to a
+ *    multiple of 8 bytes.  Otherwise the structure size will be
+ *    different between 32-bit and 64-bit architectures.
+ */
+
+struct ib_uverbs_async_event_desc {
+	__aligned_u64 element;
+	__u32 event_type;	/* enum ib_event_type */
+	__u32 reserved;
+};
+
+struct ib_uverbs_comp_event_desc {
+	__aligned_u64 cq_handle;
+};
+
+struct ib_uverbs_cq_moderation_caps {
+	__u16     max_cq_moderation_count;
+	__u16     max_cq_moderation_period;
+	__u32     reserved;
+};
+
+/*
+ * All commands from userspace should start with a __u32 command field
+ * followed by __u16 in_words and out_words fields (which give the
+ * length of the command block and response buffer if any in 32-bit
+ * words).  The kernel driver will read these fields first and read
+ * the rest of the command struct based on these value.
+ */
+
+#define IB_USER_VERBS_CMD_COMMAND_MASK 0xff
+#define IB_USER_VERBS_CMD_FLAG_EXTENDED 0x80000000u
+
+struct ib_uverbs_cmd_hdr {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+};
+
+struct ib_uverbs_ex_cmd_hdr {
+	__aligned_u64 response;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
+struct ib_uverbs_get_context {
+	__aligned_u64 response;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_get_context_resp {
+	__u32 async_fd;
+	__u32 num_comp_vectors;
+};
+
+struct ib_uverbs_query_device {
+	__aligned_u64 response;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_query_device_resp {
+	__aligned_u64 fw_ver;
+	__be64 node_guid;
+	__be64 sys_image_guid;
+	__aligned_u64 max_mr_size;
+	__aligned_u64 page_size_cap;
+	__u32 vendor_id;
+	__u32 vendor_part_id;
+	__u32 hw_ver;
+	__u32 max_qp;
+	__u32 max_qp_wr;
+	__u32 device_cap_flags;
+	__u32 max_sge;
+	__u32 max_sge_rd;
+	__u32 max_cq;
+	__u32 max_cqe;
+	__u32 max_mr;
+	__u32 max_pd;
+	__u32 max_qp_rd_atom;
+	__u32 max_ee_rd_atom;
+	__u32 max_res_rd_atom;
+	__u32 max_qp_init_rd_atom;
+	__u32 max_ee_init_rd_atom;
+	__u32 atomic_cap;
+	__u32 max_ee;
+	__u32 max_rdd;
+	__u32 max_mw;
+	__u32 max_raw_ipv6_qp;
+	__u32 max_raw_ethy_qp;
+	__u32 max_mcast_grp;
+	__u32 max_mcast_qp_attach;
+	__u32 max_total_mcast_qp_attach;
+	__u32 max_ah;
+	__u32 max_fmr;
+	__u32 max_map_per_fmr;
+	__u32 max_srq;
+	__u32 max_srq_wr;
+	__u32 max_srq_sge;
+	__u16 max_pkeys;
+	__u8  local_ca_ack_delay;
+	__u8  phys_port_cnt;
+	__u8  reserved[4];
+};
+
+struct ib_uverbs_ex_query_device {
+	__u32 comp_mask;
+	__u32 reserved;
+};
+
+struct ib_uverbs_odp_caps {
+	__aligned_u64 general_caps;
+	struct {
+		__u32 rc_odp_caps;
+		__u32 uc_odp_caps;
+		__u32 ud_odp_caps;
+	} per_transport_caps;
+	__u32 reserved;
+};
+
+struct ib_uverbs_rss_caps {
+	/* Corresponding bit will be set if qp type from
+	 * 'enum ib_qp_type' is supported, e.g.
+	 * supported_qpts |= 1 << IB_QPT_UD
+	 */
+	__u32 supported_qpts;
+	__u32 max_rwq_indirection_tables;
+	__u32 max_rwq_indirection_table_size;
+	__u32 reserved;
+};
+
+struct ib_uverbs_tm_caps {
+	/* Max size of rendezvous request message */
+	__u32 max_rndv_hdr_size;
+	/* Max number of entries in tag matching list */
+	__u32 max_num_tags;
+	/* TM flags */
+	__u32 flags;
+	/* Max number of outstanding list operations */
+	__u32 max_ops;
+	/* Max number of SGE in tag matching entry */
+	__u32 max_sge;
+	__u32 reserved;
+};
+
+struct ib_uverbs_ex_query_device_resp {
+	struct ib_uverbs_query_device_resp base;
+	__u32 comp_mask;
+	__u32 response_length;
+	struct ib_uverbs_odp_caps odp_caps;
+	__aligned_u64 timestamp_mask;
+	__aligned_u64 hca_core_clock; /* in KHZ */
+	__aligned_u64 device_cap_flags_ex;
+	struct ib_uverbs_rss_caps rss_caps;
+	__u32  max_wq_type_rq;
+	__u32 raw_packet_caps;
+	struct ib_uverbs_tm_caps tm_caps;
+	struct ib_uverbs_cq_moderation_caps cq_moderation_caps;
+	__aligned_u64 max_dm_size;
+};
+
+struct ib_uverbs_query_port {
+	__aligned_u64 response;
+	__u8  port_num;
+	__u8  reserved[7];
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_query_port_resp {
+	__u32 port_cap_flags;
+	__u32 max_msg_sz;
+	__u32 bad_pkey_cntr;
+	__u32 qkey_viol_cntr;
+	__u32 gid_tbl_len;
+	__u16 pkey_tbl_len;
+	__u16 lid;
+	__u16 sm_lid;
+	__u8  state;
+	__u8  max_mtu;
+	__u8  active_mtu;
+	__u8  lmc;
+	__u8  max_vl_num;
+	__u8  sm_sl;
+	__u8  subnet_timeout;
+	__u8  init_type_reply;
+	__u8  active_width;
+	__u8  active_speed;
+	__u8  phys_state;
+	__u8  link_layer;
+	__u8  reserved[2];
+};
+
+struct ib_uverbs_alloc_pd {
+	__aligned_u64 response;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_alloc_pd_resp {
+	__u32 pd_handle;
+};
+
+struct ib_uverbs_dealloc_pd {
+	__u32 pd_handle;
+};
+
+struct ib_uverbs_open_xrcd {
+	__aligned_u64 response;
+	__u32 fd;
+	__u32 oflags;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_open_xrcd_resp {
+	__u32 xrcd_handle;
+};
+
+struct ib_uverbs_close_xrcd {
+	__u32 xrcd_handle;
+};
+
+struct ib_uverbs_reg_mr {
+	__aligned_u64 response;
+	__aligned_u64 start;
+	__aligned_u64 length;
+	__aligned_u64 hca_va;
+	__u32 pd_handle;
+	__u32 access_flags;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_reg_mr_resp {
+	__u32 mr_handle;
+	__u32 lkey;
+	__u32 rkey;
+};
+
+struct ib_uverbs_rereg_mr {
+	__aligned_u64 response;
+	__u32 mr_handle;
+	__u32 flags;
+	__aligned_u64 start;
+	__aligned_u64 length;
+	__aligned_u64 hca_va;
+	__u32 pd_handle;
+	__u32 access_flags;
+};
+
+struct ib_uverbs_rereg_mr_resp {
+	__u32 lkey;
+	__u32 rkey;
+};
+
+struct ib_uverbs_dereg_mr {
+	__u32 mr_handle;
+};
+
+struct ib_uverbs_alloc_mw {
+	__aligned_u64 response;
+	__u32 pd_handle;
+	__u8  mw_type;
+	__u8  reserved[3];
+};
+
+struct ib_uverbs_alloc_mw_resp {
+	__u32 mw_handle;
+	__u32 rkey;
+};
+
+struct ib_uverbs_dealloc_mw {
+	__u32 mw_handle;
+};
+
+struct ib_uverbs_create_comp_channel {
+	__aligned_u64 response;
+};
+
+struct ib_uverbs_create_comp_channel_resp {
+	__u32 fd;
+};
+
+struct ib_uverbs_create_cq {
+	__aligned_u64 response;
+	__aligned_u64 user_handle;
+	__u32 cqe;
+	__u32 comp_vector;
+	__s32 comp_channel;
+	__u32 reserved;
+	__aligned_u64 driver_data[0];
+};
+
+enum ib_uverbs_ex_create_cq_flags {
+	IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION = 1 << 0,
+	IB_UVERBS_CQ_FLAGS_IGNORE_OVERRUN = 1 << 1,
+};
+
+struct ib_uverbs_ex_create_cq {
+	__aligned_u64 user_handle;
+	__u32 cqe;
+	__u32 comp_vector;
+	__s32 comp_channel;
+	__u32 comp_mask;
+	__u32 flags;  /* bitmask of ib_uverbs_ex_create_cq_flags */
+	__u32 reserved;
+};
+
+struct ib_uverbs_create_cq_resp {
+	__u32 cq_handle;
+	__u32 cqe;
+};
+
+struct ib_uverbs_ex_create_cq_resp {
+	struct ib_uverbs_create_cq_resp base;
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
+struct ib_uverbs_resize_cq {
+	__aligned_u64 response;
+	__u32 cq_handle;
+	__u32 cqe;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_resize_cq_resp {
+	__u32 cqe;
+	__u32 reserved;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_poll_cq {
+	__aligned_u64 response;
+	__u32 cq_handle;
+	__u32 ne;
+};
+
+struct ib_uverbs_wc {
+	__aligned_u64 wr_id;
+	__u32 status;
+	__u32 opcode;
+	__u32 vendor_err;
+	__u32 byte_len;
+	union {
+		__be32 imm_data;
+		__u32 invalidate_rkey;
+	} ex;
+	__u32 qp_num;
+	__u32 src_qp;
+	__u32 wc_flags;
+	__u16 pkey_index;
+	__u16 slid;
+	__u8 sl;
+	__u8 dlid_path_bits;
+	__u8 port_num;
+	__u8 reserved;
+};
+
+struct ib_uverbs_poll_cq_resp {
+	__u32 count;
+	__u32 reserved;
+	struct ib_uverbs_wc wc[0];
+};
+
+struct ib_uverbs_req_notify_cq {
+	__u32 cq_handle;
+	__u32 solicited_only;
+};
+
+struct ib_uverbs_destroy_cq {
+	__aligned_u64 response;
+	__u32 cq_handle;
+	__u32 reserved;
+};
+
+struct ib_uverbs_destroy_cq_resp {
+	__u32 comp_events_reported;
+	__u32 async_events_reported;
+};
+
+struct ib_uverbs_global_route {
+	__u8  dgid[16];
+	__u32 flow_label;
+	__u8  sgid_index;
+	__u8  hop_limit;
+	__u8  traffic_class;
+	__u8  reserved;
+};
+
+struct ib_uverbs_ah_attr {
+	struct ib_uverbs_global_route grh;
+	__u16 dlid;
+	__u8  sl;
+	__u8  src_path_bits;
+	__u8  static_rate;
+	__u8  is_global;
+	__u8  port_num;
+	__u8  reserved;
+};
+
+struct ib_uverbs_qp_attr {
+	__u32	qp_attr_mask;
+	__u32	qp_state;
+	__u32	cur_qp_state;
+	__u32	path_mtu;
+	__u32	path_mig_state;
+	__u32	qkey;
+	__u32	rq_psn;
+	__u32	sq_psn;
+	__u32	dest_qp_num;
+	__u32	qp_access_flags;
+
+	struct ib_uverbs_ah_attr ah_attr;
+	struct ib_uverbs_ah_attr alt_ah_attr;
+
+	/* ib_qp_cap */
+	__u32	max_send_wr;
+	__u32	max_recv_wr;
+	__u32	max_send_sge;
+	__u32	max_recv_sge;
+	__u32	max_inline_data;
+
+	__u16	pkey_index;
+	__u16	alt_pkey_index;
+	__u8	en_sqd_async_notify;
+	__u8	sq_draining;
+	__u8	max_rd_atomic;
+	__u8	max_dest_rd_atomic;
+	__u8	min_rnr_timer;
+	__u8	port_num;
+	__u8	timeout;
+	__u8	retry_cnt;
+	__u8	rnr_retry;
+	__u8	alt_port_num;
+	__u8	alt_timeout;
+	__u8	reserved[5];
+};
+
+struct ib_uverbs_create_qp {
+	__aligned_u64 response;
+	__aligned_u64 user_handle;
+	__u32 pd_handle;
+	__u32 send_cq_handle;
+	__u32 recv_cq_handle;
+	__u32 srq_handle;
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u8  sq_sig_all;
+	__u8  qp_type;
+	__u8  is_srq;
+	__u8  reserved;
+	__aligned_u64 driver_data[0];
+};
+
+enum ib_uverbs_create_qp_mask {
+	IB_UVERBS_CREATE_QP_MASK_IND_TABLE = 1UL << 0,
+};
+
+enum {
+	IB_UVERBS_CREATE_QP_SUP_COMP_MASK = IB_UVERBS_CREATE_QP_MASK_IND_TABLE,
+};
+
+enum {
+	/*
+	 * This value is equal to IB_QP_DEST_QPN.
+	 */
+	IB_USER_LEGACY_LAST_QP_ATTR_MASK = 1ULL << 20,
+};
+
+enum {
+	/*
+	 * This value is equal to IB_QP_RATE_LIMIT.
+	 */
+	IB_USER_LAST_QP_ATTR_MASK = 1ULL << 25,
+};
+
+struct ib_uverbs_ex_create_qp {
+	__aligned_u64 user_handle;
+	__u32 pd_handle;
+	__u32 send_cq_handle;
+	__u32 recv_cq_handle;
+	__u32 srq_handle;
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u8  sq_sig_all;
+	__u8  qp_type;
+	__u8  is_srq;
+	__u8 reserved;
+	__u32 comp_mask;
+	__u32 create_flags;
+	__u32 rwq_ind_tbl_handle;
+	__u32  source_qpn;
+};
+
+struct ib_uverbs_open_qp {
+	__aligned_u64 response;
+	__aligned_u64 user_handle;
+	__u32 pd_handle;
+	__u32 qpn;
+	__u8  qp_type;
+	__u8  reserved[7];
+	__aligned_u64 driver_data[0];
+};
+
+/* also used for open response */
+struct ib_uverbs_create_qp_resp {
+	__u32 qp_handle;
+	__u32 qpn;
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u32 reserved;
+};
+
+struct ib_uverbs_ex_create_qp_resp {
+	struct ib_uverbs_create_qp_resp base;
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
+/*
+ * This struct needs to remain a multiple of 8 bytes to keep the
+ * alignment of the modify QP parameters.
+ */
+struct ib_uverbs_qp_dest {
+	__u8  dgid[16];
+	__u32 flow_label;
+	__u16 dlid;
+	__u16 reserved;
+	__u8  sgid_index;
+	__u8  hop_limit;
+	__u8  traffic_class;
+	__u8  sl;
+	__u8  src_path_bits;
+	__u8  static_rate;
+	__u8  is_global;
+	__u8  port_num;
+};
+
+struct ib_uverbs_query_qp {
+	__aligned_u64 response;
+	__u32 qp_handle;
+	__u32 attr_mask;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_query_qp_resp {
+	struct ib_uverbs_qp_dest dest;
+	struct ib_uverbs_qp_dest alt_dest;
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u32 qkey;
+	__u32 rq_psn;
+	__u32 sq_psn;
+	__u32 dest_qp_num;
+	__u32 qp_access_flags;
+	__u16 pkey_index;
+	__u16 alt_pkey_index;
+	__u8  qp_state;
+	__u8  cur_qp_state;
+	__u8  path_mtu;
+	__u8  path_mig_state;
+	__u8  sq_draining;
+	__u8  max_rd_atomic;
+	__u8  max_dest_rd_atomic;
+	__u8  min_rnr_timer;
+	__u8  port_num;
+	__u8  timeout;
+	__u8  retry_cnt;
+	__u8  rnr_retry;
+	__u8  alt_port_num;
+	__u8  alt_timeout;
+	__u8  sq_sig_all;
+	__u8  reserved[5];
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_modify_qp {
+	struct ib_uverbs_qp_dest dest;
+	struct ib_uverbs_qp_dest alt_dest;
+	__u32 qp_handle;
+	__u32 attr_mask;
+	__u32 qkey;
+	__u32 rq_psn;
+	__u32 sq_psn;
+	__u32 dest_qp_num;
+	__u32 qp_access_flags;
+	__u16 pkey_index;
+	__u16 alt_pkey_index;
+	__u8  qp_state;
+	__u8  cur_qp_state;
+	__u8  path_mtu;
+	__u8  path_mig_state;
+	__u8  en_sqd_async_notify;
+	__u8  max_rd_atomic;
+	__u8  max_dest_rd_atomic;
+	__u8  min_rnr_timer;
+	__u8  port_num;
+	__u8  timeout;
+	__u8  retry_cnt;
+	__u8  rnr_retry;
+	__u8  alt_port_num;
+	__u8  alt_timeout;
+	__u8  reserved[2];
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_ex_modify_qp {
+	struct ib_uverbs_modify_qp base;
+	__u32	rate_limit;
+	__u32	reserved;
+};
+
+struct ib_uverbs_modify_qp_resp {
+};
+
+struct ib_uverbs_ex_modify_qp_resp {
+	__u32  comp_mask;
+	__u32  response_length;
+};
+
+struct ib_uverbs_destroy_qp {
+	__aligned_u64 response;
+	__u32 qp_handle;
+	__u32 reserved;
+};
+
+struct ib_uverbs_destroy_qp_resp {
+	__u32 events_reported;
+};
+
+/*
+ * The ib_uverbs_sge structure isn't used anywhere, since we assume
+ * the ib_sge structure is packed the same way on 32-bit and 64-bit
+ * architectures in both kernel and user space.  It's just here to
+ * document the ABI.
+ */
+struct ib_uverbs_sge {
+	__aligned_u64 addr;
+	__u32 length;
+	__u32 lkey;
+};
+
+struct ib_uverbs_send_wr {
+	__aligned_u64 wr_id;
+	__u32 num_sge;
+	__u32 opcode;
+	__u32 send_flags;
+	union {
+		__be32 imm_data;
+		__u32 invalidate_rkey;
+	} ex;
+	union {
+		struct {
+			__aligned_u64 remote_addr;
+			__u32 rkey;
+			__u32 reserved;
+		} rdma;
+		struct {
+			__aligned_u64 remote_addr;
+			__aligned_u64 compare_add;
+			__aligned_u64 swap;
+			__u32 rkey;
+			__u32 reserved;
+		} atomic;
+		struct {
+			__u32 ah;
+			__u32 remote_qpn;
+			__u32 remote_qkey;
+			__u32 reserved;
+		} ud;
+	} wr;
+};
+
+struct ib_uverbs_post_send {
+	__aligned_u64 response;
+	__u32 qp_handle;
+	__u32 wr_count;
+	__u32 sge_count;
+	__u32 wqe_size;
+	struct ib_uverbs_send_wr send_wr[0];
+};
+
+struct ib_uverbs_post_send_resp {
+	__u32 bad_wr;
+};
+
+struct ib_uverbs_recv_wr {
+	__aligned_u64 wr_id;
+	__u32 num_sge;
+	__u32 reserved;
+};
+
+struct ib_uverbs_post_recv {
+	__aligned_u64 response;
+	__u32 qp_handle;
+	__u32 wr_count;
+	__u32 sge_count;
+	__u32 wqe_size;
+	struct ib_uverbs_recv_wr recv_wr[0];
+};
+
+struct ib_uverbs_post_recv_resp {
+	__u32 bad_wr;
+};
+
+struct ib_uverbs_post_srq_recv {
+	__aligned_u64 response;
+	__u32 srq_handle;
+	__u32 wr_count;
+	__u32 sge_count;
+	__u32 wqe_size;
+	struct ib_uverbs_recv_wr recv[0];
+};
+
+struct ib_uverbs_post_srq_recv_resp {
+	__u32 bad_wr;
+};
+
+struct ib_uverbs_create_ah {
+	__aligned_u64 response;
+	__aligned_u64 user_handle;
+	__u32 pd_handle;
+	__u32 reserved;
+	struct ib_uverbs_ah_attr attr;
+};
+
+struct ib_uverbs_create_ah_resp {
+	__u32 ah_handle;
+};
+
+struct ib_uverbs_destroy_ah {
+	__u32 ah_handle;
+};
+
+struct ib_uverbs_attach_mcast {
+	__u8  gid[16];
+	__u32 qp_handle;
+	__u16 mlid;
+	__u16 reserved;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_detach_mcast {
+	__u8  gid[16];
+	__u32 qp_handle;
+	__u16 mlid;
+	__u16 reserved;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_flow_spec_hdr {
+	__u32 type;
+	__u16 size;
+	__u16 reserved;
+	/* followed by flow_spec */
+	__aligned_u64 flow_spec_data[0];
+};
+
+struct ib_uverbs_flow_eth_filter {
+	__u8  dst_mac[6];
+	__u8  src_mac[6];
+	__be16 ether_type;
+	__be16 vlan_tag;
+};
+
+struct ib_uverbs_flow_spec_eth {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	struct ib_uverbs_flow_eth_filter val;
+	struct ib_uverbs_flow_eth_filter mask;
+};
+
+struct ib_uverbs_flow_ipv4_filter {
+	__be32 src_ip;
+	__be32 dst_ip;
+	__u8	proto;
+	__u8	tos;
+	__u8	ttl;
+	__u8	flags;
+};
+
+struct ib_uverbs_flow_spec_ipv4 {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	struct ib_uverbs_flow_ipv4_filter val;
+	struct ib_uverbs_flow_ipv4_filter mask;
+};
+
+struct ib_uverbs_flow_tcp_udp_filter {
+	__be16 dst_port;
+	__be16 src_port;
+};
+
+struct ib_uverbs_flow_spec_tcp_udp {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	struct ib_uverbs_flow_tcp_udp_filter val;
+	struct ib_uverbs_flow_tcp_udp_filter mask;
+};
+
+struct ib_uverbs_flow_ipv6_filter {
+	__u8    src_ip[16];
+	__u8    dst_ip[16];
+	__be32	flow_label;
+	__u8	next_hdr;
+	__u8	traffic_class;
+	__u8	hop_limit;
+	__u8	reserved;
+};
+
+struct ib_uverbs_flow_spec_ipv6 {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	struct ib_uverbs_flow_ipv6_filter val;
+	struct ib_uverbs_flow_ipv6_filter mask;
+};
+
+struct ib_uverbs_flow_spec_action_tag {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	__u32			      tag_id;
+	__u32			      reserved1;
+};
+
+struct ib_uverbs_flow_spec_action_drop {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+};
+
+struct ib_uverbs_flow_spec_action_handle {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	__u32			      handle;
+	__u32			      reserved1;
+};
+
+struct ib_uverbs_flow_tunnel_filter {
+	__be32 tunnel_id;
+};
+
+struct ib_uverbs_flow_spec_tunnel {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	struct ib_uverbs_flow_tunnel_filter val;
+	struct ib_uverbs_flow_tunnel_filter mask;
+};
+
+struct ib_uverbs_flow_spec_esp_filter {
+	__u32 spi;
+	__u32 seq;
+};
+
+struct ib_uverbs_flow_spec_esp {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	struct ib_uverbs_flow_spec_esp_filter val;
+	struct ib_uverbs_flow_spec_esp_filter mask;
+};
+
+struct ib_uverbs_flow_attr {
+	__u32 type;
+	__u16 size;
+	__u16 priority;
+	__u8  num_of_specs;
+	__u8  reserved[2];
+	__u8  port;
+	__u32 flags;
+	/* Following are the optional layers according to user request
+	 * struct ib_flow_spec_xxx
+	 * struct ib_flow_spec_yyy
+	 */
+	struct ib_uverbs_flow_spec_hdr flow_specs[0];
+};
+
+struct ib_uverbs_create_flow  {
+	__u32 comp_mask;
+	__u32 qp_handle;
+	struct ib_uverbs_flow_attr flow_attr;
+};
+
+struct ib_uverbs_create_flow_resp {
+	__u32 comp_mask;
+	__u32 flow_handle;
+};
+
+struct ib_uverbs_destroy_flow  {
+	__u32 comp_mask;
+	__u32 flow_handle;
+};
+
+struct ib_uverbs_create_srq {
+	__aligned_u64 response;
+	__aligned_u64 user_handle;
+	__u32 pd_handle;
+	__u32 max_wr;
+	__u32 max_sge;
+	__u32 srq_limit;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_create_xsrq {
+	__aligned_u64 response;
+	__aligned_u64 user_handle;
+	__u32 srq_type;
+	__u32 pd_handle;
+	__u32 max_wr;
+	__u32 max_sge;
+	__u32 srq_limit;
+	__u32 max_num_tags;
+	__u32 xrcd_handle;
+	__u32 cq_handle;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_create_srq_resp {
+	__u32 srq_handle;
+	__u32 max_wr;
+	__u32 max_sge;
+	__u32 srqn;
+};
+
+struct ib_uverbs_modify_srq {
+	__u32 srq_handle;
+	__u32 attr_mask;
+	__u32 max_wr;
+	__u32 srq_limit;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_query_srq {
+	__aligned_u64 response;
+	__u32 srq_handle;
+	__u32 reserved;
+	__aligned_u64 driver_data[0];
+};
+
+struct ib_uverbs_query_srq_resp {
+	__u32 max_wr;
+	__u32 max_sge;
+	__u32 srq_limit;
+	__u32 reserved;
+};
+
+struct ib_uverbs_destroy_srq {
+	__aligned_u64 response;
+	__u32 srq_handle;
+	__u32 reserved;
+};
+
+struct ib_uverbs_destroy_srq_resp {
+	__u32 events_reported;
+};
+
+struct ib_uverbs_ex_create_wq  {
+	__u32 comp_mask;
+	__u32 wq_type;
+	__aligned_u64 user_handle;
+	__u32 pd_handle;
+	__u32 cq_handle;
+	__u32 max_wr;
+	__u32 max_sge;
+	__u32 create_flags; /* Use enum ib_wq_flags */
+	__u32 reserved;
+};
+
+struct ib_uverbs_ex_create_wq_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u32 wq_handle;
+	__u32 max_wr;
+	__u32 max_sge;
+	__u32 wqn;
+};
+
+struct ib_uverbs_ex_destroy_wq  {
+	__u32 comp_mask;
+	__u32 wq_handle;
+};
+
+struct ib_uverbs_ex_destroy_wq_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u32 events_reported;
+	__u32 reserved;
+};
+
+struct ib_uverbs_ex_modify_wq  {
+	__u32 attr_mask;
+	__u32 wq_handle;
+	__u32 wq_state;
+	__u32 curr_wq_state;
+	__u32 flags; /* Use enum ib_wq_flags */
+	__u32 flags_mask; /* Use enum ib_wq_flags */
+};
+
+/* Prevent memory allocation rather than max expected size */
+#define IB_USER_VERBS_MAX_LOG_IND_TBL_SIZE 0x0d
+struct ib_uverbs_ex_create_rwq_ind_table  {
+	__u32 comp_mask;
+	__u32 log_ind_tbl_size;
+	/* Following are the wq handles according to log_ind_tbl_size
+	 * wq_handle1
+	 * wq_handle2
+	 */
+	__u32 wq_handles[0];
+};
+
+struct ib_uverbs_ex_create_rwq_ind_table_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u32 ind_tbl_handle;
+	__u32 ind_tbl_num;
+};
+
+struct ib_uverbs_ex_destroy_rwq_ind_table  {
+	__u32 comp_mask;
+	__u32 ind_tbl_handle;
+};
+
+struct ib_uverbs_cq_moderation {
+	__u16 cq_count;
+	__u16 cq_period;
+};
+
+struct ib_uverbs_ex_modify_cq {
+	__u32 cq_handle;
+	__u32 attr_mask;
+	struct ib_uverbs_cq_moderation attr;
+	__u32 reserved;
+};
+
+#define IB_DEVICE_NAME_MAX 64
+
+#endif /* IB_USER_VERBS_H */
diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
similarity index 95%
rename from rdma/include/uapi/rdma/rdma_netlink.h
rename to include/uapi/rdma/rdma_netlink.h
index 9446a72136e8..60416ed71c0f 100644
--- a/rdma/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -388,6 +388,19 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_RES_LOCAL_DMA_LKEY,	/* u32 */
 	RDMA_NLDEV_ATTR_RES_UNSAFE_GLOBAL_RKEY,	/* u32 */
 
+	/*
+	 * Provides logical name and index of netdevice which is
+	 * connected to physical port. This information is relevant
+	 * for RoCE and iWARP.
+	 *
+	 * The netdevices which are associated with containers are
+	 * supposed to be exported together with GID table once it
+	 * will be exposed through the netlink. Because the
+	 * associated netdevices are properties of GIDs.
+	 */
+	RDMA_NLDEV_ATTR_NDEV_INDEX,		/* u32 */
+	RDMA_NLDEV_ATTR_NDEV_NAME,		/* string */
+
 	RDMA_NLDEV_ATTR_MAX
 };
 #endif /* _RDMA_NETLINK_H */
diff --git a/rdma/include/uapi/rdma/rdma_user_cm.h b/include/uapi/rdma/rdma_user_cm.h
similarity index 98%
rename from rdma/include/uapi/rdma/rdma_user_cm.h
rename to include/uapi/rdma/rdma_user_cm.h
index da099af0ace7..e1269024af47 100644
--- a/rdma/include/uapi/rdma/rdma_user_cm.h
+++ b/include/uapi/rdma/rdma_user_cm.h
@@ -31,8 +31,8 @@
  * SOFTWARE.
  */
 
-#ifndef _RDMA_USER_CM_H
-#define _RDMA_USER_CM_H
+#ifndef RDMA_USER_CM_H
+#define RDMA_USER_CM_H
 
 #include <linux/types.h>
 #include <linux/socket.h>
@@ -321,4 +321,4 @@ struct rdma_ucm_migrate_resp {
 	__u32 events_reported;
 };
 
-#endif /* _RDMA_USER_CM_H */
+#endif /* RDMA_USER_CM_H */
-- 
2.17.0


* Re: [PATCH bpf-next 09/10] tools: bpftool: add simple perf event output reader
From: Daniel Borkmann @ 2018-05-04 21:53 UTC (permalink / raw)
  To: Jakub Kicinski, alexei.starovoitov; +Cc: oss-drivers, netdev
In-Reply-To: <20180504013717.29317-10-jakub.kicinski@netronome.com>

On 05/04/2018 03:37 AM, Jakub Kicinski wrote:
> Users of BPF sooner or later discover perf_event_output() helpers
> and BPF_MAP_TYPE_PERF_EVENT_ARRAY.  Dumping this array type is
> not possible, however, we can add simple reading of perf events.
> Create a new event_pipe subcommand for maps, this sub command
> will only work with BPF_MAP_TYPE_PERF_EVENT_ARRAY maps.
> 
> Parts of the code from samples/bpf/trace_output_user.c.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
[...]

One remark below:

[...]
> +static void
> +print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
> +{
> +	struct {
> +		struct perf_event_header header;
> +		__u64 id;
> +		__u64 lost;
> +	} *lost = (void *)e;
> +	struct timespec ts;
> +
> +	if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
> +		perror("Can't read clock for timestamp");
> +		return;
> +	}

Instead of taking the timestamp above, it is probably better to pick it
up via PERF_SAMPLE_TIME, which needs to be added to sample_type so that
it also ends up in the RB. Given that below you poll with 200 and don't
set a wakeup event for the perf RB (it's probably fine not to here, but
it can be done based on watermark or events), the clock_gettime() value
will be off compared to when the sample was actually put into the RB.
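
As a rough sketch of the PERF_SAMPLE_TIME idea, assuming the event is
opened as a PERF_COUNT_SW_BPF_OUTPUT software event with only the raw and
time sample bits set: the helper names init_bpf_output_attr() and
sample_timestamp() and the struct sample_with_time are made up for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical view of a PERF_RECORD_SAMPLE once sample_type is
 * PERF_SAMPLE_TIME | PERF_SAMPLE_RAW: the time field precedes the raw data.
 */
struct sample_with_time {
	struct perf_event_header header;
	uint64_t time;		/* present because of PERF_SAMPLE_TIME */
	uint32_t size;		/* length of the PERF_SAMPLE_RAW payload */
	unsigned char data[];
};

static_assert(offsetof(struct sample_with_time, time) ==
	      sizeof(struct perf_event_header),
	      "time must directly follow the record header");

/* Fill an attr that asks the kernel to timestamp each sample itself. */
static void init_bpf_output_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_BPF_OUTPUT;
	attr->sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_TIME;
}

/* Read the kernel-provided timestamp out of one sample record. */
static uint64_t sample_timestamp(const void *rec)
{
	uint64_t t;

	memcpy(&t, (const unsigned char *)rec +
		   offsetof(struct sample_with_time, time), sizeof(t));
	return t;
}
```

With something along these lines the printed timestamp would be the one
recorded when the sample entered the RB, independent of the poll latency.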

> +	if (json_output) {
> +		jsonw_start_object(json_wtr);
> +		jsonw_name(json_wtr, "timestamp");
> +		jsonw_uint(json_wtr, ts.tv_sec * 1000000000ull + ts.tv_nsec);
> +		jsonw_name(json_wtr, "type");
> +		jsonw_uint(json_wtr, e->header.type);
> +		jsonw_name(json_wtr, "cpu");
> +		jsonw_uint(json_wtr, ring->cpu);
> +		jsonw_name(json_wtr, "index");
> +		jsonw_uint(json_wtr, ring->key);
> +		if (e->header.type == PERF_RECORD_SAMPLE) {
> +			jsonw_name(json_wtr, "data");
> +			print_data_json(e->data, e->size);
> +		} else if (e->header.type == PERF_RECORD_LOST) {
> +			jsonw_name(json_wtr, "lost");
> +			jsonw_start_object(json_wtr);
> +			jsonw_name(json_wtr, "id");
> +			jsonw_uint(json_wtr, lost->id);
> +			jsonw_name(json_wtr, "count");
> +			jsonw_uint(json_wtr, lost->lost);
> +			jsonw_end_object(json_wtr);
> +		}
> +		jsonw_end_object(json_wtr);
> +	} else {
> +		if (e->header.type == PERF_RECORD_SAMPLE) {
> +			printf("== @%ld.%ld CPU: %d index: %d =====\n",
> +			       (long)ts.tv_sec, ts.tv_nsec,
> +			       ring->cpu, ring->key);
> +			fprint_hex(stdout, e->data, e->size, " ");
> +			printf("\n");
> +		} else if (e->header.type == PERF_RECORD_LOST) {
> +			printf("lost %lld events\n", lost->lost);
> +		} else {
> +			printf("unknown event type=%d size=%d\n",
> +			       e->header.type, e->header.size);
> +		}


* [PATCH bpf-next 5/6] bpf: btf: Update tools/include/uapi/linux/btf.h with BTF ID
From: Martin KaFai Lau @ 2018-05-04 21:49 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180504214955.1058805-1-kafai@fb.com>

This patch syncs tools/include/uapi/linux/btf.h with
the newly introduced BTF ID support.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
---
 tools/include/uapi/linux/bpf.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 83a95ae388dd..fff51c187d1e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -96,6 +96,7 @@ enum bpf_cmd {
 	BPF_PROG_QUERY,
 	BPF_RAW_TRACEPOINT_OPEN,
 	BPF_BTF_LOAD,
+	BPF_BTF_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -343,6 +344,7 @@ union bpf_attr {
 			__u32		start_id;
 			__u32		prog_id;
 			__u32		map_id;
+			__u32		btf_id;
 		};
 		__u32		next_id;
 		__u32		open_flags;
@@ -2129,6 +2131,15 @@ struct bpf_map_info {
 	__u32 ifindex;
 	__u64 netns_dev;
 	__u64 netns_ino;
+	__u32 btf_id;
+	__u32 btf_key_id;
+	__u32 btf_value_id;
+} __attribute__((aligned(8)));
+
+struct bpf_btf_info {
+	__aligned_u64 btf;
+	__u32 btf_size;
+	__u32 id;
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
-- 
2.9.5


* [PATCH bpf-next 1/6] bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y
From: Martin KaFai Lau @ 2018-05-04 21:49 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180504214955.1058805-1-kafai@fb.com>

If CONFIG_REFCOUNT_FULL=y, refcount_inc() WARNs when the refcount is 0.
When creating a new btf, the initial btf->refcnt is 0, which
triggers the following:

[   34.855452] refcount_t: increment on 0; use-after-free.
[   34.856252] WARNING: CPU: 6 PID: 1857 at lib/refcount.c:153 refcount_inc+0x26/0x30
....
[   34.868809] Call Trace:
[   34.869168]  btf_new_fd+0x1af6/0x24d0
[   34.869645]  ? btf_type_seq_show+0x200/0x200
[   34.870212]  ? lock_acquire+0x3b0/0x3b0
[   34.870726]  ? security_capable+0x54/0x90
[   34.871247]  __x64_sys_bpf+0x1b2/0x310
[   34.871761]  ? __ia32_sys_bpf+0x310/0x310
[   34.872285]  ? bad_area_access_error+0x310/0x310
[   34.872894]  do_syscall_64+0x95/0x3f0

This patch uses refcount_set() instead.
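
For illustration only, the difference between the two calls can be
modelled in user space.  This is a sketch, not the kernel's
lib/refcount.c, and the model_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_refcount {
	unsigned int refs;
};

/*
 * Model of a checked increment: going from 0 to 1 is refused and
 * reported, because a zero count normally means the object is gone.
 */
static bool model_refcount_inc(struct model_refcount *r)
{
	if (r->refs == 0) {
		fprintf(stderr, "refcount_t: increment on 0; use-after-free.\n");
		return false;
	}
	r->refs++;
	return true;
}

/* Model of initialisation: the first reference is set, not incremented. */
static void model_refcount_set(struct model_refcount *r, unsigned int n)
{
	assert(r != NULL);
	r->refs = n;
}
```

A freshly allocated object therefore takes its first reference with the
set operation, and only later users go through the increment.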

Reported-by: Yonghong Song <yhs@fb.com>
Tested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/btf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 22e1046a1a86..fa0dce0452e7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -1977,7 +1977,7 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
 
 	if (!err) {
 		btf_verifier_env_free(env);
-		btf_get(btf);
+		refcount_set(&btf->refcnt, 1);
 		return btf;
 	}
 
-- 
2.9.5


* [PATCH bpf-next 3/6] bpf: btf: Add struct bpf_btf_info
From: Martin KaFai Lau @ 2018-05-04 21:49 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180504214955.1058805-1-kafai@fb.com>

During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's
info.info is directly filled with the BTF binary data.  It is
not extensible.  In this case, we want to add BTF ID.

This patch adds "struct bpf_btf_info" which has the BTF ID as
one of its members.  The BTF binary data itself is exposed through
the "btf" and "btf_size" members.
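
The intended semantics of the three members can be sketched with a small
user-space model.  The struct btf_info_sketch below is a local mirror of
the new uapi struct, and model_fill_btf_info() is a hypothetical stand-in
for the kernel-side fill step, not the actual kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct bpf_btf_info so the sketch builds on its own. */
struct btf_info_sketch {
	uint64_t btf;		/* in: user buffer that receives the raw BTF */
	uint32_t btf_size;	/* in: buffer size, out: full BTF data size */
	uint32_t id;		/* out: BTF ID */
} __attribute__((aligned(8)));

/*
 * Model of the fill step: copy at most btf_size bytes into the caller's
 * buffer, then report the real data size and the ID.
 */
static void model_fill_btf_info(struct btf_info_sketch *info,
				const void *btf_data, uint32_t data_size,
				uint32_t btf_id)
{
	uint32_t copy = data_size < info->btf_size ? data_size : info->btf_size;

	assert(info->btf != 0 || copy == 0);
	if (copy)
		memcpy((void *)(uintptr_t)info->btf, btf_data, copy);

	info->btf_size = data_size;
	info->id = btf_id;
}
```

A caller that passes a buffer that is too small can compare the returned
btf_size with its buffer size and retry with a larger buffer.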

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
---
 include/uapi/linux/bpf.h |  6 ++++++
 kernel/bpf/btf.c         | 26 +++++++++++++++++++++-----
 kernel/bpf/syscall.c     | 17 ++++++++++++++++-
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6106f23a9a8a..d615c777b573 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2137,6 +2137,12 @@ struct bpf_map_info {
 	__u32 btf_value_id;
 } __attribute__((aligned(8)));
 
+struct bpf_btf_info {
+	__aligned_u64 btf;
+	__u32 btf_size;
+	__u32 id;
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 40950b6bf395..ded10ab47b8a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -2114,12 +2114,28 @@ int btf_get_info_by_fd(const struct btf *btf,
 		       const union bpf_attr *attr,
 		       union bpf_attr __user *uattr)
 {
-	void __user *udata = u64_to_user_ptr(attr->info.info);
-	u32 copy_len = min_t(u32, btf->data_size,
-			     attr->info.info_len);
+	struct bpf_btf_info __user *uinfo;
+	struct bpf_btf_info info = {};
+	u32 info_copy, btf_copy;
+	void __user *ubtf;
+	u32 uinfo_len;
 
-	if (copy_to_user(udata, btf->data, copy_len) ||
-	    put_user(btf->data_size, &uattr->info.info_len))
+	uinfo = u64_to_user_ptr(attr->info.info);
+	uinfo_len = attr->info.info_len;
+
+	info_copy = min_t(u32, uinfo_len, sizeof(info));
+	if (copy_from_user(&info, uinfo, info_copy))
+		return -EFAULT;
+
+	info.id = btf->id;
+	ubtf = u64_to_user_ptr(info.btf);
+	btf_copy = min_t(u32, btf->data_size, info.btf_size);
+	if (copy_to_user(ubtf, btf->data, btf_copy))
+		return -EFAULT;
+	info.btf_size = btf->data_size;
+
+	if (copy_to_user(uinfo, &info, info_copy) ||
+	    put_user(info_copy, &uattr->info.info_len))
 		return -EFAULT;
 
 	return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8b0a45d65454..d2895e3e5cbf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2019,6 +2019,21 @@ static int bpf_map_get_info_by_fd(struct bpf_map *map,
 	return 0;
 }
 
+static int bpf_btf_get_info_by_fd(struct btf *btf,
+				  const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	u32 info_len = attr->info.info_len;
+	int err;
+
+	err = check_uarg_tail_zero(uinfo, sizeof(*uinfo), info_len);
+	if (err)
+		return err;
+
+	return btf_get_info_by_fd(btf, attr, uattr);
+}
+
 #define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info
 
 static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
@@ -2042,7 +2057,7 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 		err = bpf_map_get_info_by_fd(f.file->private_data, attr,
 					     uattr);
 	else if (f.file->f_op == &btf_fops)
-		err = btf_get_info_by_fd(f.file->private_data, attr, uattr);
+		err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr);
 	else
 		err = -EINVAL;
 
-- 
2.9.5


* [PATCH bpf-next 4/6] bpf: btf: Some test_btf clean up
From: Martin KaFai Lau @ 2018-05-04 21:49 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180504214955.1058805-1-kafai@fb.com>

This patch adds a CHECK() macro for condition checking
and error reporting, similar to the one in test_progs.c.

It also counts the number of tests passed/skipped/failed and
prints them at the end of the test run.
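
A minimal sketch of how the two pieces are meant to be combined.  The
macro and count_result() have the same shape as in the diff below, while
demo_test() is a hypothetical test body added only for illustration.
The macro relies on GNU statement expressions and named variadic macro
arguments, so it assumes gcc or clang:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t pass_cnt;
static uint32_t error_cnt;

/* Evaluates to non-zero on failure and prints where the check failed. */
#define CHECK(condition, format...) ({					\
	int __ret = !!(condition);					\
	if (__ret) {							\
		fprintf(stderr, "%s:%d:FAIL ", __func__, __LINE__);	\
		fprintf(stderr, format);				\
	}								\
	__ret;								\
})

/* Bump the pass/fail counters and terminate the test's output line. */
static int count_result(int err)
{
	if (err)
		error_cnt++;
	else
		pass_cnt++;

	fprintf(stderr, "\n");
	return err;
}

/* Hypothetical test body: fails when the value is negative. */
static int demo_test(const char *name, int value)
{
	assert(name != NULL);
	fprintf(stderr, "%s......", name);
	if (CHECK(value < 0, "value:%d", value))
		return -1;

	fprintf(stderr, "OK");
	return 0;
}
```

Because count_result() emits the trailing newline, the individual
messages no longer carry their own "\n".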

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
---
 tools/testing/selftests/bpf/test_btf.c | 201 ++++++++++++++++-----------------
 1 file changed, 99 insertions(+), 102 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index 7b39b1f712a1..b7880a20fad1 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -20,6 +20,30 @@
 
 #include "bpf_rlimit.h"
 
+static uint32_t pass_cnt;
+static uint32_t error_cnt;
+static uint32_t skip_cnt;
+
+#define CHECK(condition, format...) ({					\
+	int __ret = !!(condition);					\
+	if (__ret) {							\
+		fprintf(stderr, "%s:%d:FAIL ", __func__, __LINE__);	\
+		fprintf(stderr, format);				\
+	}								\
+	__ret;								\
+})
+
+static int count_result(int err)
+{
+	if (err)
+		error_cnt++;
+	else
+		pass_cnt++;
+
+	fprintf(stderr, "\n");
+	return err;
+}
+
 #define min(a, b) ((a) < (b) ? (a) : (b))
 #define __printf(a, b)	__attribute__((format(printf, a, b)))
 
@@ -894,17 +918,13 @@ static void *btf_raw_create(const struct btf_header *hdr,
 	void *raw_btf;
 
 	type_sec_size = get_type_sec_size(raw_types);
-	if (type_sec_size < 0) {
-		fprintf(stderr, "Cannot get nr_raw_types\n");
+	if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types"))
 		return NULL;
-	}
 
 	size_needed = sizeof(*hdr) + type_sec_size + str_sec_size;
 	raw_btf = malloc(size_needed);
-	if (!raw_btf) {
-		fprintf(stderr, "Cannot allocate memory for raw_btf\n");
+	if (CHECK(!raw_btf, "Cannot allocate memory for raw_btf"))
 		return NULL;
-	}
 
 	/* Copy header */
 	memcpy(raw_btf, hdr, sizeof(*hdr));
@@ -915,8 +935,7 @@ static void *btf_raw_create(const struct btf_header *hdr,
 	for (i = 0; i < type_sec_size / sizeof(raw_types[0]); i++) {
 		if (raw_types[i] == NAME_TBD) {
 			next_str = get_next_str(next_str, end_str);
-			if (!next_str) {
-				fprintf(stderr, "Error in getting next_str\n");
+			if (CHECK(!next_str, "Error in getting next_str")) {
 				free(raw_btf);
 				return NULL;
 			}
@@ -973,9 +992,8 @@ static int do_test_raw(unsigned int test_num)
 	free(raw_btf);
 
 	err = ((btf_fd == -1) != test->btf_load_err);
-	if (err)
-		fprintf(stderr, "btf_load_err:%d btf_fd:%d\n",
-			test->btf_load_err, btf_fd);
+	CHECK(err, "btf_fd:%d test->btf_load_err:%u",
+	      btf_fd, test->btf_load_err);
 
 	if (err || btf_fd == -1)
 		goto done;
@@ -992,16 +1010,15 @@ static int do_test_raw(unsigned int test_num)
 	map_fd = bpf_create_map_xattr(&create_attr);
 
 	err = ((map_fd == -1) != test->map_create_err);
-	if (err)
-		fprintf(stderr, "map_create_err:%d map_fd:%d\n",
-			test->map_create_err, map_fd);
+	CHECK(err, "map_fd:%d test->map_create_err:%u",
+	      map_fd, test->map_create_err);
 
 done:
 	if (!err)
-		fprintf(stderr, "OK\n");
+		fprintf(stderr, "OK");
 
 	if (*btf_log_buf && (err || args.always_log))
-		fprintf(stderr, "%s\n", btf_log_buf);
+		fprintf(stderr, "\n%s", btf_log_buf);
 
 	if (btf_fd != -1)
 		close(btf_fd);
@@ -1017,10 +1034,10 @@ static int test_raw(void)
 	int err = 0;
 
 	if (args.raw_test_num)
-		return do_test_raw(args.raw_test_num);
+		return count_result(do_test_raw(args.raw_test_num));
 
 	for (i = 1; i <= ARRAY_SIZE(raw_tests); i++)
-		err |= do_test_raw(i);
+		err |= count_result(do_test_raw(i));
 
 	return err;
 }
@@ -1080,8 +1097,7 @@ static int do_test_get_info(unsigned int test_num)
 	*btf_log_buf = '\0';
 
 	user_btf = malloc(raw_btf_size);
-	if (!user_btf) {
-		fprintf(stderr, "Cannot allocate memory for user_btf\n");
+	if (CHECK(!user_btf, "!user_btf")) {
 		err = -1;
 		goto done;
 	}
@@ -1089,9 +1105,7 @@ static int do_test_get_info(unsigned int test_num)
 	btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
 			      btf_log_buf, BTF_LOG_BUF_SIZE,
 			      args.always_log);
-	if (btf_fd == -1) {
-		fprintf(stderr, "bpf_load_btf:%s(%d)\n",
-			strerror(errno), errno);
+	if (CHECK(btf_fd == -1, "errno:%d", errno)) {
 		err = -1;
 		goto done;
 	}
@@ -1103,31 +1117,31 @@ static int do_test_get_info(unsigned int test_num)
 		       raw_btf_size - expected_nbytes);
 
 	err = bpf_obj_get_info_by_fd(btf_fd, user_btf, &user_btf_size);
-	if (err || user_btf_size != raw_btf_size ||
-	    memcmp(raw_btf, user_btf, expected_nbytes)) {
-		fprintf(stderr,
-			"err:%d(errno:%d) raw_btf_size:%u user_btf_size:%u expected_nbytes:%u memcmp:%d\n",
-			err, errno,
-			raw_btf_size, user_btf_size, expected_nbytes,
-			memcmp(raw_btf, user_btf, expected_nbytes));
+	if (CHECK(err || user_btf_size != raw_btf_size ||
+		  memcmp(raw_btf, user_btf, expected_nbytes),
+		  "err:%d(errno:%d) raw_btf_size:%u user_btf_size:%u expected_nbytes:%u memcmp:%d",
+		  err, errno,
+		  raw_btf_size, user_btf_size, expected_nbytes,
+		  memcmp(raw_btf, user_btf, expected_nbytes))) {
 		err = -1;
 		goto done;
 	}
 
 	while (expected_nbytes < raw_btf_size) {
 		fprintf(stderr, "%u...", expected_nbytes);
-		if (user_btf[expected_nbytes++] != 0xff) {
-			fprintf(stderr, "!= 0xff\n");
+		if (CHECK(user_btf[expected_nbytes++] != 0xff,
+			  "user_btf[%u]:%x != 0xff", expected_nbytes - 1,
+			  user_btf[expected_nbytes - 1])) {
 			err = -1;
 			goto done;
 		}
 	}
 
-	fprintf(stderr, "OK\n");
+	fprintf(stderr, "OK");
 
 done:
 	if (*btf_log_buf && (err || args.always_log))
-		fprintf(stderr, "%s\n", btf_log_buf);
+		fprintf(stderr, "\n%s", btf_log_buf);
 
 	free(raw_btf);
 	free(user_btf);
@@ -1144,10 +1158,10 @@ static int test_get_info(void)
 	int err = 0;
 
 	if (args.get_info_test_num)
-		return do_test_get_info(args.get_info_test_num);
+		return count_result(do_test_get_info(args.get_info_test_num));
 
 	for (i = 1; i <= ARRAY_SIZE(get_info_tests); i++)
-		err |= do_test_get_info(i);
+		err |= count_result(do_test_get_info(i));
 
 	return err;
 }
@@ -1175,28 +1189,21 @@ static int file_has_btf_elf(const char *fn)
 	Elf *elf;
 	int ret;
 
-	if (elf_version(EV_CURRENT) == EV_NONE) {
-		fprintf(stderr, "Failed to init libelf\n");
+	if (CHECK(elf_version(EV_CURRENT) == EV_NONE,
+		  "elf_version(EV_CURRENT) == EV_NONE"))
 		return -1;
-	}
 
 	elf_fd = open(fn, O_RDONLY);
-	if (elf_fd == -1) {
-		fprintf(stderr, "Cannot open file %s: %s(%d)\n",
-			fn, strerror(errno), errno);
+	if (CHECK(elf_fd == -1, "open(%s): errno:%d", fn, errno))
 		return -1;
-	}
 
 	elf = elf_begin(elf_fd, ELF_C_READ, NULL);
-	if (!elf) {
-		fprintf(stderr, "Failed to read ELF from %s. %s\n", fn,
-			elf_errmsg(elf_errno()));
+	if (CHECK(!elf, "elf_begin(%s): %s", fn, elf_errmsg(elf_errno()))) {
 		ret = -1;
 		goto done;
 	}
 
-	if (!gelf_getehdr(elf, &ehdr)) {
-		fprintf(stderr, "Failed to get EHDR from %s\n", fn);
+	if (CHECK(!gelf_getehdr(elf, &ehdr), "!gelf_getehdr(%s)", fn)) {
 		ret = -1;
 		goto done;
 	}
@@ -1205,9 +1212,8 @@ static int file_has_btf_elf(const char *fn)
 		const char *sh_name;
 		GElf_Shdr sh;
 
-		if (gelf_getshdr(scn, &sh) != &sh) {
-			fprintf(stderr,
-				"Failed to get section header from %s\n", fn);
+		if (CHECK(gelf_getshdr(scn, &sh) != &sh,
+			  "file:%s gelf_getshdr != &sh", fn)) {
 			ret = -1;
 			goto done;
 		}
@@ -1243,53 +1249,44 @@ static int do_test_file(unsigned int test_num)
 		return err;
 
 	if (err == 0) {
-		fprintf(stderr, "SKIP. No ELF %s found\n", BTF_ELF_SEC);
+		fprintf(stderr, "SKIP. No ELF %s found", BTF_ELF_SEC);
+		skip_cnt++;
 		return 0;
 	}
 
 	obj = bpf_object__open(test->file);
-	if (IS_ERR(obj))
+	if (CHECK(IS_ERR(obj), "obj: %ld", PTR_ERR(obj)))
 		return PTR_ERR(obj);
 
 	err = bpf_object__btf_fd(obj);
-	if (err == -1) {
-		fprintf(stderr, "bpf_object__btf_fd: -1\n");
+	if (CHECK(err == -1, "bpf_object__btf_fd: -1"))
 		goto done;
-	}
 
 	prog = bpf_program__next(NULL, obj);
-	if (!prog) {
-		fprintf(stderr, "Cannot find bpf_prog\n");
+	if (CHECK(!prog, "Cannot find bpf_prog")) {
 		err = -1;
 		goto done;
 	}
 
 	bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
 	err = bpf_object__load(obj);
-	if (err < 0) {
-		fprintf(stderr, "bpf_object__load: %d\n", err);
+	if (CHECK(err < 0, "bpf_object__load: %d", err))
 		goto done;
-	}
 
 	map = bpf_object__find_map_by_name(obj, "btf_map");
-	if (!map) {
-		fprintf(stderr, "btf_map not found\n");
+	if (CHECK(!map, "btf_map not found")) {
 		err = -1;
 		goto done;
 	}
 
 	err = (bpf_map__btf_key_id(map) == 0 || bpf_map__btf_value_id(map) == 0)
 		!= test->btf_kv_notfound;
-	if (err) {
-		fprintf(stderr,
-			"btf_kv_notfound:%u btf_key_id:%u btf_value_id:%u\n",
-			test->btf_kv_notfound,
-			bpf_map__btf_key_id(map),
-			bpf_map__btf_value_id(map));
+	if (CHECK(err, "btf_key_id:%u btf_value_id:%u test->btf_kv_notfound:%u",
+		  bpf_map__btf_key_id(map), bpf_map__btf_value_id(map),
+		  test->btf_kv_notfound))
 		goto done;
-	}
 
-	fprintf(stderr, "OK\n");
+	fprintf(stderr, "OK");
 
 done:
 	bpf_object__close(obj);
@@ -1302,10 +1299,10 @@ static int test_file(void)
 	int err = 0;
 
 	if (args.file_test_num)
-		return do_test_file(args.file_test_num);
+		return count_result(do_test_file(args.file_test_num));
 
 	for (i = 1; i <= ARRAY_SIZE(file_tests); i++)
-		err |= do_test_file(i);
+		err |= count_result(do_test_file(i));
 
 	return err;
 }
@@ -1425,7 +1422,7 @@ static int test_pprint(void)
 	unsigned int key;
 	uint8_t *raw_btf;
 	ssize_t nread;
-	int err;
+	int err, ret;
 
 	fprintf(stderr, "%s......", test->descr);
 	raw_btf = btf_raw_create(&hdr_tmpl, test->raw_types,
@@ -1441,10 +1438,8 @@ static int test_pprint(void)
 			      args.always_log);
 	free(raw_btf);
 
-	if (btf_fd == -1) {
+	if (CHECK(btf_fd == -1, "errno:%d", errno)) {
 		err = -1;
-		fprintf(stderr, "bpf_load_btf: %s(%d)\n",
-			strerror(errno), errno);
 		goto done;
 	}
 
@@ -1458,26 +1453,23 @@ static int test_pprint(void)
 	create_attr.btf_value_id = test->value_id;
 
 	map_fd = bpf_create_map_xattr(&create_attr);
-	if (map_fd == -1) {
+	if (CHECK(map_fd == -1, "errno:%d", errno)) {
 		err = -1;
-		fprintf(stderr, "bpf_creat_map_btf: %s(%d)\n",
-			strerror(errno), errno);
 		goto done;
 	}
 
-	if (snprintf(pin_path, sizeof(pin_path), "%s/%s",
-		     "/sys/fs/bpf", test->map_name) == sizeof(pin_path)) {
+	ret = snprintf(pin_path, sizeof(pin_path), "%s/%s",
+		       "/sys/fs/bpf", test->map_name);
+
+	if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long",
+		  "/sys/fs/bpf", test->map_name)) {
 		err = -1;
-		fprintf(stderr, "pin_path is too long\n");
 		goto done;
 	}
 
 	err = bpf_obj_pin(map_fd, pin_path);
-	if (err) {
-		fprintf(stderr, "Cannot pin to %s. %s(%d).\n", pin_path,
-			strerror(errno), errno);
+	if (CHECK(err, "bpf_obj_pin(%s): errno:%d.", pin_path, errno))
 		goto done;
-	}
 
 	for (key = 0; key < test->max_entries; key++) {
 		set_pprint_mapv(&mapv, key);
@@ -1485,10 +1477,8 @@ static int test_pprint(void)
 	}
 
 	pin_file = fopen(pin_path, "r");
-	if (!pin_file) {
+	if (CHECK(!pin_file, "fopen(%s): errno:%d", pin_path, errno)) {
 		err = -1;
-		fprintf(stderr, "fopen(%s): %s(%d)\n", pin_path,
-			strerror(errno), errno);
 		goto done;
 	}
 
@@ -1497,9 +1487,8 @@ static int test_pprint(void)
 	       *line == '#')
 		;
 
-	if (nread <= 0) {
+	if (CHECK(nread <= 0, "Unexpected EOF")) {
 		err = -1;
-		fprintf(stderr, "Unexpected EOF\n");
 		goto done;
 	}
 
@@ -1518,9 +1507,9 @@ static int test_pprint(void)
 					  mapv.ui8a[4], mapv.ui8a[5], mapv.ui8a[6], mapv.ui8a[7],
 					  pprint_enum_str[mapv.aenum]);
 
-		if (nexpected_line == sizeof(expected_line)) {
+		if (CHECK(nexpected_line == sizeof(expected_line),
+			  "expected_line is too long")) {
 			err = -1;
-			fprintf(stderr, "expected_line is too long\n");
 			goto done;
 		}
 
@@ -1535,15 +1524,15 @@ static int test_pprint(void)
 		nread = getline(&line, &line_len, pin_file);
 	} while (++key < test->max_entries && nread > 0);
 
-	if (key < test->max_entries) {
+	if (CHECK(key < test->max_entries,
+		  "Unexpected EOF. key:%u test->max_entries:%u",
+		  key, test->max_entries)) {
 		err = -1;
-		fprintf(stderr, "Unexpected EOF\n");
 		goto done;
 	}
 
-	if (nread > 0) {
+	if (CHECK(nread > 0, "Unexpected extra pprint output: %s", line)) {
 		err = -1;
-		fprintf(stderr, "Unexpected extra pprint output: %s\n", line);
 		goto done;
 	}
 
@@ -1551,9 +1540,9 @@ static int test_pprint(void)
 
 done:
 	if (!err)
-		fprintf(stderr, "OK\n");
+		fprintf(stderr, "OK");
 	if (*btf_log_buf && (err || args.always_log))
-		fprintf(stderr, "%s\n", btf_log_buf);
+		fprintf(stderr, "\n%s", btf_log_buf);
 	if (btf_fd != -1)
 		close(btf_fd);
 	if (map_fd != -1)
@@ -1634,6 +1623,12 @@ static int parse_args(int argc, char **argv)
 	return 0;
 }
 
+static void print_summary(void)
+{
+	fprintf(stderr, "PASS:%u SKIP:%u FAIL:%u\n",
+		pass_cnt - skip_cnt, skip_cnt, error_cnt);
+}
+
 int main(int argc, char **argv)
 {
 	int err = 0;
@@ -1655,15 +1650,17 @@ int main(int argc, char **argv)
 		err |= test_file();
 
 	if (args.pprint_test)
-		err |= test_pprint();
+		err |= count_result(test_pprint());
 
 	if (args.raw_test || args.get_info_test || args.file_test ||
 	    args.pprint_test)
-		return err;
+		goto done;
 
 	err |= test_raw();
 	err |= test_get_info();
 	err |= test_file();
 
+done:
+	print_summary();
 	return err;
 }
-- 
2.9.5


* [PATCH bpf-next 6/6] bpf: btf: Tests for BPF_OBJ_GET_INFO_BY_FD and BPF_BTF_GET_FD_BY_ID
From: Martin KaFai Lau @ 2018-05-04 21:49 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180504214955.1058805-1-kafai@fb.com>

This patch adds tests for BPF_BTF_GET_FD_BY_ID and the new
btf_id/btf_key_id/btf_value_id in the "struct bpf_map_info".

It also modifies the existing BPF_OBJ_GET_INFO_BY_FD test
to reflect the new "struct bpf_btf_info".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
---
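Reader's note (below the "---", so not part of the commit message): the
size rule exercised by the new "Large bpf_btf_info" test can be modelled
in plain userspace C. This is only a sketch under stated assumptions:
struct model_btf_info and model_check_info_len() are hypothetical names,
the struct is a stand-in for whatever layout the running kernel knows
about, and -E2BIG is the error value assumed here (the test itself only
checks that the call fails).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the info layout the "kernel" knows about. */
struct model_btf_info {
	uint64_t btf;
	uint32_t btf_size;
	uint32_t id;
};

/*
 * Userspace model of the length check the test expects from GET_INFO:
 *  - any byte past the known struct size must be zero, otherwise the
 *    call fails (-E2BIG is an assumption of this model);
 *  - on success, *info_len is clamped to the size the "kernel" supports.
 */
static int model_check_info_len(const void *uinfo, uint32_t *info_len)
{
	const unsigned char *p = uinfo;
	uint32_t known = sizeof(struct model_btf_info);
	uint32_t i;

	assert(uinfo && info_len);

	/* Reject non-zero bytes beyond the part of the struct we know. */
	for (i = known; i < *info_len; i++)
		if (p[i])
			return -E2BIG;

	/* Report back how much of the struct is actually supported. */
	if (*info_len > known)
		*info_len = known;

	return 0;
}
```

Fed a buffer shaped like info_garbage in test_big_btf_info(), the model
fails while the extra field is non-zero and, once that field is cleared,
succeeds and shrinks info_len to sizeof(struct model_btf_info), which
mirrors the two conditions the test asserts.
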
 tools/lib/bpf/bpf.c                    |  10 ++
 tools/lib/bpf/bpf.h                    |   1 +
 tools/testing/selftests/bpf/test_btf.c | 289 +++++++++++++++++++++++++++++++--
 3 files changed, 287 insertions(+), 13 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 76b36cc16e7f..a3a8fb2ac697 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -458,6 +458,16 @@ int bpf_map_get_fd_by_id(__u32 id)
 	return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
 }
 
+int bpf_btf_get_fd_by_id(__u32 id)
+{
+	union bpf_attr attr;
+
+	bzero(&attr, sizeof(attr));
+	attr.btf_id = id;
+
+	return sys_bpf(BPF_BTF_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
 {
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 553b11ad52b3..fb3a146d92ff 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -98,6 +98,7 @@ int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 int bpf_prog_get_fd_by_id(__u32 id);
 int bpf_map_get_fd_by_id(__u32 id);
+int bpf_btf_get_fd_by_id(__u32 id);
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index b7880a20fad1..c8bceae7ec02 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -1047,9 +1047,13 @@ struct btf_get_info_test {
 	const char *str_sec;
 	__u32 raw_types[MAX_NR_RAW_TYPES];
 	__u32 str_sec_size;
-	int info_size_delta;
+	int btf_size_delta;
+	int (*special_test)(unsigned int test_num);
 };
 
+static int test_big_btf_info(unsigned int test_num);
+static int test_btf_id(unsigned int test_num);
+
 const struct btf_get_info_test get_info_tests[] = {
 {
 	.descr = "== raw_btf_size+1",
@@ -1060,7 +1064,7 @@ const struct btf_get_info_test get_info_tests[] = {
 	},
 	.str_sec = "",
 	.str_sec_size = sizeof(""),
-	.info_size_delta = 1,
+	.btf_size_delta = 1,
 },
 {
 	.descr = "== raw_btf_size-3",
@@ -1071,20 +1075,274 @@ const struct btf_get_info_test get_info_tests[] = {
 	},
 	.str_sec = "",
 	.str_sec_size = sizeof(""),
-	.info_size_delta = -3,
+	.btf_size_delta = -3,
+},
+{
+	.descr = "Large bpf_btf_info",
+	.raw_types = {
+		/* int */				/* [1] */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+		BTF_END_RAW,
+	},
+	.str_sec = "",
+	.str_sec_size = sizeof(""),
+	.special_test = test_big_btf_info,
+},
+{
+	.descr = "BTF ID",
+	.raw_types = {
+		/* int */				/* [1] */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+		/* unsigned int */			/* [2] */
+		BTF_TYPE_INT_ENC(0, 0, 0, 32, 4),
+		BTF_END_RAW,
+	},
+	.str_sec = "",
+	.str_sec_size = sizeof(""),
+	.special_test = test_btf_id,
 },
 };
 
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+	return (__u64)(unsigned long)ptr;
+}
+
+static int test_big_btf_info(unsigned int test_num)
+{
+	const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
+	uint8_t *raw_btf = NULL, *user_btf = NULL;
+	unsigned int raw_btf_size;
+	struct {
+		struct bpf_btf_info info;
+		uint64_t garbage;
+	} info_garbage;
+	struct bpf_btf_info *info;
+	int btf_fd = -1, err;
+	uint32_t info_len;
+
+	raw_btf = btf_raw_create(&hdr_tmpl,
+				 test->raw_types,
+				 test->str_sec,
+				 test->str_sec_size,
+				 &raw_btf_size);
+
+	if (!raw_btf)
+		return -1;
+
+	*btf_log_buf = '\0';
+
+	user_btf = malloc(raw_btf_size);
+	if (CHECK(!user_btf, "!user_btf")) {
+		err = -1;
+		goto done;
+	}
+
+	btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+			      btf_log_buf, BTF_LOG_BUF_SIZE,
+			      args.always_log);
+	if (CHECK(btf_fd == -1, "errno:%d", errno)) {
+		err = -1;
+		goto done;
+	}
+
+	/*
+	 * GET_INFO should error out if the userspace info
+	 * has non-zero trailing bytes.
+	 */
+	info = &info_garbage.info;
+	memset(info, 0, sizeof(*info));
+	info_garbage.garbage = 0xdeadbeef;
+	info_len = sizeof(info_garbage);
+	info->btf = ptr_to_u64(user_btf);
+	info->btf_size = raw_btf_size;
+
+	err = bpf_obj_get_info_by_fd(btf_fd, info, &info_len);
+	if (CHECK(!err, "!err")) {
+		err = -1;
+		goto done;
+	}
+
+	/*
+	 * GET_INFO should succeed even if info_len is larger than what
+	 * the kernel supports, as long as the trailing bytes are zero.
+	 * The kernel supported info len should also be returned
+	 * to userspace.
+	 */
+	info_garbage.garbage = 0;
+	err = bpf_obj_get_info_by_fd(btf_fd, info, &info_len);
+	if (CHECK(err || info_len != sizeof(*info),
+		  "err:%d errno:%d info_len:%u sizeof(*info):%lu",
+		  err, errno, info_len, sizeof(*info))) {
+		err = -1;
+		goto done;
+	}
+
+	fprintf(stderr, "OK");
+
+done:
+	if (*btf_log_buf && (err || args.always_log))
+		fprintf(stderr, "\n%s", btf_log_buf);
+
+	free(raw_btf);
+	free(user_btf);
+
+	if (btf_fd != -1)
+		close(btf_fd);
+
+	return err;
+}
+
+static int test_btf_id(unsigned int test_num)
+{
+	const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
+	struct bpf_create_map_attr create_attr = {};
+	uint8_t *raw_btf = NULL, *user_btf[2] = {};
+	int btf_fd[2] = {-1, -1}, map_fd = -1;
+	struct bpf_map_info map_info = {};
+	struct bpf_btf_info info[2] = {};
+	unsigned int raw_btf_size;
+	uint32_t info_len;
+	int err, i, ret;
+
+	raw_btf = btf_raw_create(&hdr_tmpl,
+				 test->raw_types,
+				 test->str_sec,
+				 test->str_sec_size,
+				 &raw_btf_size);
+
+	if (!raw_btf)
+		return -1;
+
+	*btf_log_buf = '\0';
+
+	for (i = 0; i < 2; i++) {
+		user_btf[i] = malloc(raw_btf_size);
+		if (CHECK(!user_btf[i], "!user_btf[%d]", i)) {
+			err = -1;
+			goto done;
+		}
+		info[i].btf = ptr_to_u64(user_btf[i]);
+		info[i].btf_size = raw_btf_size;
+	}
+
+	btf_fd[0] = bpf_load_btf(raw_btf, raw_btf_size,
+				 btf_log_buf, BTF_LOG_BUF_SIZE,
+				 args.always_log);
+	if (CHECK(btf_fd[0] == -1, "errno:%d", errno)) {
+		err = -1;
+		goto done;
+	}
+
+	/* Test BPF_OBJ_GET_INFO_BY_ID on btf_id */
+	info_len = sizeof(info[0]);
+	err = bpf_obj_get_info_by_fd(btf_fd[0], &info[0], &info_len);
+	if (CHECK(err, "errno:%d", errno)) {
+		err = -1;
+		goto done;
+	}
+
+	btf_fd[1] = bpf_btf_get_fd_by_id(info[0].id);
+	if (CHECK(btf_fd[1] == -1, "errno:%d", errno)) {
+		err = -1;
+		goto done;
+	}
+
+	ret = 0;
+	err = bpf_obj_get_info_by_fd(btf_fd[1], &info[1], &info_len);
+	if (CHECK(err || info[0].id != info[1].id ||
+		  info[0].btf_size != info[1].btf_size ||
+		  (ret = memcmp(user_btf[0], user_btf[1], info[0].btf_size)),
+		  "err:%d errno:%d id0:%u id1:%u btf_size0:%u btf_size1:%u memcmp:%d",
+		  err, errno, info[0].id, info[1].id,
+		  info[0].btf_size, info[1].btf_size, ret)) {
+		err = -1;
+		goto done;
+	}
+
+	/* Test btf members in struct bpf_map_info */
+	create_attr.name = "test_btf_id";
+	create_attr.map_type = BPF_MAP_TYPE_ARRAY;
+	create_attr.key_size = sizeof(int);
+	create_attr.value_size = sizeof(unsigned int);
+	create_attr.max_entries = 4;
+	create_attr.btf_fd = btf_fd[0];
+	create_attr.btf_key_id = 1;
+	create_attr.btf_value_id = 2;
+
+	map_fd = bpf_create_map_xattr(&create_attr);
+	if (CHECK(map_fd == -1, "errno:%d", errno)) {
+		err = -1;
+		goto done;
+	}
+
+	info_len = sizeof(map_info);
+	err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len);
+	if (CHECK(err || map_info.btf_id != info[0].id ||
+		  map_info.btf_key_id != 1 || map_info.btf_value_id != 2,
+		  "err:%d errno:%d info.id:%u btf_id:%u btf_key_id:%u btf_value_id:%u",
+		  err, errno, info[0].id, map_info.btf_id, map_info.btf_key_id,
+		  map_info.btf_value_id)) {
+		err = -1;
+		goto done;
+	}
+
+	for (i = 0; i < 2; i++) {
+		close(btf_fd[i]);
+		btf_fd[i] = -1;
+	}
+
+	/* Test BTF ID is removed from the kernel */
+	btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
+	if (CHECK(btf_fd[0] == -1, "errno:%d", errno)) {
+		err = -1;
+		goto done;
+	}
+	close(btf_fd[0]);
+	btf_fd[0] = -1;
+
+	/* The map holds the last ref to BTF and its btf_id */
+	close(map_fd);
+	map_fd = -1;
+	btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
+	if (CHECK(btf_fd[0] != -1, "BTF lingers")) {
+		err = -1;
+		goto done;
+	}
+
+	fprintf(stderr, "OK");
+
+done:
+	if (*btf_log_buf && (err || args.always_log))
+		fprintf(stderr, "\n%s", btf_log_buf);
+
+	free(raw_btf);
+	if (map_fd != -1)
+		close(map_fd);
+	for (i = 0; i < 2; i++) {
+		free(user_btf[i]);
+		if (btf_fd[i] != -1)
+			close(btf_fd[i]);
+	}
+
+	return err;
+}
+
 static int do_test_get_info(unsigned int test_num)
 {
 	const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
 	unsigned int raw_btf_size, user_btf_size, expected_nbytes;
 	uint8_t *raw_btf = NULL, *user_btf = NULL;
-	int btf_fd = -1, err;
+	struct bpf_btf_info info = {};
+	int btf_fd = -1, err, ret;
+	uint32_t info_len;
 
-	fprintf(stderr, "BTF GET_INFO_BY_ID test[%u] (%s): ",
+	fprintf(stderr, "BTF GET_INFO test[%u] (%s): ",
 		test_num, test->descr);
 
+	if (test->special_test)
+		return test->special_test(test_num);
+
 	raw_btf = btf_raw_create(&hdr_tmpl,
 				 test->raw_types,
 				 test->str_sec,
@@ -1110,19 +1368,24 @@ static int do_test_get_info(unsigned int test_num)
 		goto done;
 	}
 
-	user_btf_size = (int)raw_btf_size + test->info_size_delta;
+	user_btf_size = (int)raw_btf_size + test->btf_size_delta;
 	expected_nbytes = min(raw_btf_size, user_btf_size);
 	if (raw_btf_size > expected_nbytes)
 		memset(user_btf + expected_nbytes, 0xff,
 		       raw_btf_size - expected_nbytes);
 
-	err = bpf_obj_get_info_by_fd(btf_fd, user_btf, &user_btf_size);
-	if (CHECK(err || user_btf_size != raw_btf_size ||
-		  memcmp(raw_btf, user_btf, expected_nbytes),
-		  "err:%d(errno:%d) raw_btf_size:%u user_btf_size:%u expected_nbytes:%u memcmp:%d",
-		  err, errno,
-		  raw_btf_size, user_btf_size, expected_nbytes,
-		  memcmp(raw_btf, user_btf, expected_nbytes))) {
+	info_len = sizeof(info);
+	info.btf = ptr_to_u64(user_btf);
+	info.btf_size = user_btf_size;
+
+	ret = 0;
+	err = bpf_obj_get_info_by_fd(btf_fd, &info, &info_len);
+	if (CHECK(err || !info.id || info_len != sizeof(info) ||
+		  info.btf_size != raw_btf_size ||
+		  (ret = memcmp(raw_btf, user_btf, expected_nbytes)),
+		  "err:%d errno:%d info.id:%u info_len:%u sizeof(info):%lu raw_btf_size:%u info.btf_size:%u expected_nbytes:%u memcmp:%d",
+		  err, errno, info.id, info_len, sizeof(info),
+		  raw_btf_size, info.btf_size, expected_nbytes, ret)) {
 		err = -1;
 		goto done;
 	}
-- 
2.9.5
