Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH 1/3] kcmp: Move kcmp.h into uapi
From: Arnd Bergmann @ 2014-10-23  7:49 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	shuahkh-JPH+aEBZ4P+UEJcrhfAQsw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, gorcunov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <1414040834-30209-1-git-send-email-mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>

On Thursday 23 October 2014 16:07:12 Michael Ellerman wrote:
> --- a/include/linux/kcmp.h
> +++ b/include/linux/kcmp.h
> @@ -1,17 +1,6 @@
>  #ifndef _LINUX_KCMP_H
>  #define _LINUX_KCMP_H
>  
> -/* Comparison type */
> -enum kcmp_type {
> -       KCMP_FILE,
> -       KCMP_VM,
> -       KCMP_FILES,
> -       KCMP_FS,
> -       KCMP_SIGHAND,
> -       KCMP_IO,
> -       KCMP_SYSVSEM,
> -
> -       KCMP_TYPES,
> -};
> +#include <uapi/linux/kcmp.h>
>  
>  #endif /* _LINUX_KCMP_H */
> 

If the file is empty except for the uapi include, I think it's better to
delete it completely. The include path logic should ensure we pick the
other one up.

	Arnd

^ permalink raw reply

* Re: [PATCH RFC v3 01/16] virtio: memory access APIs
From: Cornelia Huck @ 2014-10-23  7:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Bjarke Istrup Pedersen, Anup Patel, Greg Kroah-Hartman,
	linux-kernel, virtualization, Geert Uytterhoeven, Sakari Ailus,
	linux-api, David S. Miller, Piotr Król, Alexei Starovoitov
In-Reply-To: <1414003404-505-2-git-send-email-mst@redhat.com>

On Wed, 22 Oct 2014 21:44:08 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> virtio 1.0 makes all memory structures LE, so
> we need APIs to conditionally do a byteswap on BE
> architectures.
> 
> To make it easier to check code statically,
> add virtio specific types for multi-byte integers
> in memory.
> 
> Add low level wrappers that do a byteswap conditionally, these will be
> useful e.g. for vhost.  Add high level wrappers that will (in the
> future) query device endian-ness and act accordingly.
> 
> At the moment, stub them out and assume native endian-ness everywhere.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/linux/virtio_byteorder.h | 29 ++++++++++++++++++++++++
>  include/linux/virtio_config.h    | 16 +++++++++++++
>  include/uapi/linux/virtio_ring.h | 49 ++++++++++++++++++++--------------------
>  include/uapi/linux/Kbuild        |  1 +
>  4 files changed, 71 insertions(+), 24 deletions(-)
>  create mode 100644 include/linux/virtio_byteorder.h
> 
> diff --git a/include/linux/virtio_byteorder.h b/include/linux/virtio_byteorder.h
> new file mode 100644
> index 0000000..7afdd8a
> --- /dev/null
> +++ b/include/linux/virtio_byteorder.h
> @@ -0,0 +1,29 @@
> +#ifndef _LINUX_VIRTIO_BYTEORDER_H
> +#define _LINUX_VIRTIO_BYTEORDER_H
> +#include <linux/types.h>
> +#include <uapi/linux/virtio_types.h>
> +
> +/* Memory accessors for handling virtio in modern little endian and in
> + * compatibility big endian format. */

s/big/native/

> +
> +#define __DEFINE_VIRTIO_XX_TO_CPU(bits) \
> +static inline u##bits __virtio##bits##_to_cpu(bool little_endian, __virtio##bits val) \
> +{ \
> +	if (little_endian) \
> +		return le##bits##_to_cpu((__force __le##bits)val); \
> +	else \
> +		return (__force u##bits)val; \
> +} \
> +static inline __virtio##bits __cpu_to_virtio##bits(bool little_endian, u##bits val) \
> +{ \
> +	if (little_endian) \
> +		return (__force __virtio##bits)cpu_to_le##bits(val); \
> +	else \
> +		return val; \
> +}
> +
> +__DEFINE_VIRTIO_XX_TO_CPU(16)
> +__DEFINE_VIRTIO_XX_TO_CPU(32)
> +__DEFINE_VIRTIO_XX_TO_CPU(64)

...although I'm still not too happy with macro-generated helpers.

> +
> +#endif /* _LINUX_VIRTIO_BYTEORDER */

> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index a99f9b7..6c00632 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h

> @@ -61,32 +62,32 @@
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc {
>  	/* Address (guest-physical). */
> -	__u64 addr;
> +	__virtio64 addr;
>  	/* Length. */
> -	__u32 len;
> +	__virtio32 len;
>  	/* The flags as indicated above. */
> -	__u16 flags;
> +	__virtio16 flags;
>  	/* We chain unused descriptors via this, too */
> -	__u16 next;
> +	__virtio16 next;
>  };

I think all of these __virtio types need an explanation somewhere as to
what they mean, e.g.:

/*
 * __virtio{16,32,64} have the following meaning:
 * - __u{16,32,64} for virtio devices in legacy mode,
 *   accessed in native endian
 * - __le{16,32,64} for standard-compliant virtio devices
 */

^ permalink raw reply

* Re: [PATCH 1/3] kcmp: Move kcmp.h into uapi
From: Cyrill Gorcunov @ 2014-10-23  8:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Michael Ellerman, linux-kernel, shuahkh, linux-api, Andrew Morton
In-Reply-To: <2940943.qiYEMaLEXx@wuerfel>

On Thu, Oct 23, 2014 at 09:49:14AM +0200, Arnd Bergmann wrote:
> On Thursday 23 October 2014 16:07:12 Michael Ellerman wrote:
> > --- a/include/linux/kcmp.h
> > +++ b/include/linux/kcmp.h
> > @@ -1,17 +1,6 @@
> >  #ifndef _LINUX_KCMP_H
> >  #define _LINUX_KCMP_H
> >  
> > -/* Comparison type */
> > -enum kcmp_type {
> > -       KCMP_FILE,
> > -       KCMP_VM,
> > -       KCMP_FILES,
> > -       KCMP_FS,
> > -       KCMP_SIGHAND,
> > -       KCMP_IO,
> > -       KCMP_SYSVSEM,
> > -
> > -       KCMP_TYPES,
> > -};
> > +#include <uapi/linux/kcmp.h>
> >  
> >  #endif /* _LINUX_KCMP_H */
> > 
> 
> If the file is empty except for the uapi include, I think it's better to
> delete it completely. The include path logic should ensure we pick the
> other one up.

Good point, somehow managed to miss this.

^ permalink raw reply

* Re: [PATCH RFC v3 01/16] virtio: memory access APIs
From: Michael S. Tsirkin @ 2014-10-23  9:15 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Bjarke Istrup Pedersen,
	Anup Patel, Greg Kroah-Hartman,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Geert Uytterhoeven, Sakari Ailus,
	linux-api-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
	Piotr Król, Alexei Starovoitov
In-Reply-To: <20141023095405.6bdd5a1a.cornelia.huck-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>

On Thu, Oct 23, 2014 at 09:54:05AM +0200, Cornelia Huck wrote:
> On Wed, 22 Oct 2014 21:44:08 +0300
> "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > virtio 1.0 makes all memory structures LE, so
> > we need APIs to conditionally do a byteswap on BE
> > architectures.
> > 
> > To make it easier to check code statically,
> > add virtio specific types for multi-byte integers
> > in memory.
> > 
> > Add low level wrappers that do a byteswap conditionally, these will be
> > useful e.g. for vhost.  Add high level wrappers that will (in the
> > future) query device endian-ness and act accordingly.
> > 
> > At the moment, stub them out and assume native endian-ness everywhere.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  include/linux/virtio_byteorder.h | 29 ++++++++++++++++++++++++
> >  include/linux/virtio_config.h    | 16 +++++++++++++
> >  include/uapi/linux/virtio_ring.h | 49 ++++++++++++++++++++--------------------
> >  include/uapi/linux/Kbuild        |  1 +
> >  4 files changed, 71 insertions(+), 24 deletions(-)
> >  create mode 100644 include/linux/virtio_byteorder.h
> > 
> > diff --git a/include/linux/virtio_byteorder.h b/include/linux/virtio_byteorder.h
> > new file mode 100644
> > index 0000000..7afdd8a
> > --- /dev/null
> > +++ b/include/linux/virtio_byteorder.h
> > @@ -0,0 +1,29 @@
> > +#ifndef _LINUX_VIRTIO_BYTEORDER_H
> > +#define _LINUX_VIRTIO_BYTEORDER_H
> > +#include <linux/types.h>
> > +#include <uapi/linux/virtio_types.h>
> > +
> > +/* Memory accessors for handling virtio in modern little endian and in
> > + * compatibility big endian format. */
> 
> s/big/native/

Thanks.

> > +
> > +#define __DEFINE_VIRTIO_XX_TO_CPU(bits) \
> > +static inline u##bits __virtio##bits##_to_cpu(bool little_endian, __virtio##bits val) \
> > +{ \
> > +	if (little_endian) \
> > +		return le##bits##_to_cpu((__force __le##bits)val); \
> > +	else \
> > +		return (__force u##bits)val; \
> > +} \
> > +static inline __virtio##bits __cpu_to_virtio##bits(bool little_endian, u##bits val) \
> > +{ \
> > +	if (little_endian) \
> > +		return (__force __virtio##bits)cpu_to_le##bits(val); \
> > +	else \
> > +		return val; \
> > +}
> > +
> > +__DEFINE_VIRTIO_XX_TO_CPU(16)
> > +__DEFINE_VIRTIO_XX_TO_CPU(32)
> > +__DEFINE_VIRTIO_XX_TO_CPU(64)
> 
> ...although I'm still not too happy with macro-generated helpers.

I'm fine with open-coding them.

> > +
> > +#endif /* _LINUX_VIRTIO_BYTEORDER */
> 
> > diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> > index a99f9b7..6c00632 100644
> > --- a/include/uapi/linux/virtio_ring.h
> > +++ b/include/uapi/linux/virtio_ring.h
> 
> > @@ -61,32 +62,32 @@
> >  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
> >  struct vring_desc {
> >  	/* Address (guest-physical). */
> > -	__u64 addr;
> > +	__virtio64 addr;
> >  	/* Length. */
> > -	__u32 len;
> > +	__virtio32 len;
> >  	/* The flags as indicated above. */
> > -	__u16 flags;
> > +	__virtio16 flags;
> >  	/* We chain unused descriptors via this, too */
> > -	__u16 next;
> > +	__virtio16 next;
> >  };
> 
> I think all of these __virtio types need an explanation somewhere as to
> what they mean, e.g.:
> 
> /*
>  * __virtio{16,32,64} have the following meaning:
>  * - __u{16,32,64} for virtio devices in legacy mode,
>  *   accessed in native endian
>  * - __le{16,32,64} for standard-compliant virtio devices
>  */

Will do.

-- 
MST

^ permalink raw reply

* Re: [PATCH RFC v3 05/16] virtio: add virtio 1.0 feature bit
From: Cornelia Huck @ 2014-10-23 11:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-api, linux-kernel, virtualization
In-Reply-To: <1414003404-505-6-git-send-email-mst@redhat.com>

On Wed, 22 Oct 2014 21:44:31 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Based on original patches by Rusty Russell, Thomas Huth
> and Cornelia Huck.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/uapi/linux/virtio_config.h | 7 +++++--
>  drivers/virtio/virtio_ring.c       | 2 ++
>  2 files changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply

* Re: [PATCH RFC v3 09/16] virtio: set FEATURES_OK
From: Cornelia Huck @ 2014-10-23 12:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <1414003404-505-10-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, 22 Oct 2014 21:44:44 +0300
"Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> set FEATURES_OK as per virtio 1.0 spec
> 
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  include/uapi/linux/virtio_config.h |  2 ++
>  drivers/virtio/virtio.c            | 29 ++++++++++++++++++++++-------
>  2 files changed, 24 insertions(+), 7 deletions(-)
> 

>  	dev->config->finalize_features(dev);
> 
> +	if (virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> +		add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> +		status = dev->config->get_status(dev);
> +		if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> +			printk(KERN_ERR "virtio: device refuses features: %x\n",
> +			       status);
> +			err = -ENODEV;
> +			goto err;
> +		}
> +	}
> +

Ugh, I just realize that virtio-ccw has a problem with that mechanism :(

Up to now, the driver only propagated status to the device: For
virtio-ccw, this was easily implemented via a ccw that transmitted
"status" to the device. However, the "read back status" part now
actually requires that the driver can get "status" from the device, or
has a comparable way to find out that the device won't accept the
status it tried to write.

I can think of two solutions:

(1) Introduce a new ccw that actually reads the device status.
(2) Make the WRITE_STATUS ccw fail (with a unit check) if the driver
    sets FEATURES_OK after it tried to set features the device won't
    accept.

(1) is probably more generic, while (2) is more straightforward to
implement.

Good thing we actually try to finally implement this, I did not notice
this problem during the review :(

^ permalink raw reply

* Re: [PATCH RFC v3 09/16] virtio: set FEATURES_OK
From: Michael S. Tsirkin @ 2014-10-23 12:51 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: linux-api, linux-kernel, virtualization
In-Reply-To: <20141023142808.377551d8.cornelia.huck@de.ibm.com>

On Thu, Oct 23, 2014 at 02:28:08PM +0200, Cornelia Huck wrote:
> On Wed, 22 Oct 2014 21:44:44 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > set FEATURES_OK as per virtio 1.0 spec
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  include/uapi/linux/virtio_config.h |  2 ++
> >  drivers/virtio/virtio.c            | 29 ++++++++++++++++++++++-------
> >  2 files changed, 24 insertions(+), 7 deletions(-)
> > 
> 
> >  	dev->config->finalize_features(dev);
> > 
> > +	if (virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> > +		add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> > +		status = dev->config->get_status(dev);
> > +		if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> > +			printk(KERN_ERR "virtio: device refuses features: %x\n",
> > +			       status);
> > +			err = -ENODEV;
> > +			goto err;
> > +		}
> > +	}
> > +
> 
> Ugh, I just realize that virtio-ccw has a problem with that mechanism :(
> 
> Up to now, the driver only propagated status to the device: For
> virtio-ccw, this was easily implemented via a ccw that transmitted
> "status" to the device. However, the "read back status" part now
> actually requires that the driver can get "status" from the device, or
> has a comparable way to find out that the device won't accept the
> status it tried to write.

Ugh, it actually caches the status in the transport :(


> I can think of two solutions:
> 
> (1) Introduce a new ccw that actually reads the device status.
> (2) Make the WRITE_STATUS ccw fail (with a unit check) if the driver
>     sets FEATURES_OK after it tried to set features the device won't
>     accept.
> 
> (1) is probably more generic, while (2) is more straightforward to
> implement.
> 
> Good thing we actually try to finally implement this,

> I did not notice
> this problem during the review :(

Well, it's a nuisance, but the spec is out.
It seems to me a new command would be a substantive change so we can't
do this in errata.

Option (2) would require two statements for drivers and devices,
but since it's clearly the case for correct drivers/devices
that command does not fail, it follows that this
is not a substantive change so it can be fixed
in an errata.

So the new command would have to be optional, please open
two issues in the TC: one documenting that driver must check
WRITE_STATUS and device can fail WRITE_STATUS, and another
for adding READ_STATUS (which will have to wait until
the next CS).


-- 
MST

^ permalink raw reply

* [PATCH V3] kernel, add bug_on_warn
From: Prarit Bhargava @ 2014-10-23 12:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, Jonathan Corbet, Andrew Morton, Rusty Russell,
	H. Peter Anvin, Andi Kleen, Masami Hiramatsu, Fabian Frederick,
	vgoyal, isimatu.yasuaki, linux-doc, kexec, linux-api

There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to the
user.

A much easier method would be a switch to change the WARN() over to a
BUG().  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a bug_on_warn kernel parameter, which calls BUG() in the
warn_slowpath_common() path.  The function will still print out the
location of the warning.

An example of the bug_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s location.
After that the new BUG() call is displayed.

 WARNING: CPU: 27 PID: 3204 at
/home/rhel7/redhat/debug/dummy-module/dummy-module.c:25 init_dummy+0x28/0x30
[dummy_module]()
 bug_on_warn set, calling BUG()...
 ------------[ cut here ]------------
 kernel BUG at kernel/panic.c:434!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: dummy_module(OE+) sg nfsv3 rpcsec_gss_krb5 nfsv4
dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel igb iTCO_wdt aesni_intel iTCO_vendor_support lrw gf128mul
sb_edac ptp edac_core glue_helper lpc_ich ioatdma pcspkr ablk_helper pps_core
i2c_i801 mfd_core cryptd dca shpchp ipmi_si wmi ipmi_msghandler acpi_cpufreq
nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod cdrom sd_mod
mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper isci ttm
drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror
dm_region_hash dm_log dm_mod
 CPU: 27 PID: 3204 Comm: insmod Tainted: G           OE  3.17.0+ #19
 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS
RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 task: ffff880034e75160 ti: ffff8807fc5ac000 task.ti: ffff8807fc5ac000
 RIP: 0010:[<ffffffff81076b81>]  [<ffffffff81076b81>] warn_slowpath_common+0xc1/0xd0
 RSP: 0018:ffff8807fc5afc68  EFLAGS: 00010246
 RAX: 0000000000000021 RBX: ffff8807fc5afcb0 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff88081efee5f8 RDI: ffff88081efee5f8
 RBP: ffff8807fc5afc98 R08: 0000000000000096 R09: 0000000000000000
 R10: 0000000000000711 R11: ffff8807fc5af93e R12: ffffffffa0424070
 R13: 0000000000000019 R14: ffffffffa0423068 R15: 0000000000000009
 FS:  00007f2d4b034740(0000) GS:ffff88081efe0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f2d4a99f3c0 CR3: 00000007fd88b000 CR4: 00000000001407e0
 Stack:
  ffff8807fc5afcb8 ffffffff8199f020 ffff88080e396160 0000000000000000
  ffffffffa0423040 ffffffffa0425000 ffff8807fc5afd08 ffffffff81076be5
  0000000000000008 ffffffffa0424053 ffff880700000018 ffff8807fc5afd18
 Call Trace:
  [<ffffffffa0423040>] ? dummy_greetings+0x40/0x40 [dummy_module]
  [<ffffffff81076be5>] warn_slowpath_fmt+0x55/0x70
  [<ffffffffa0423068>] init_dummy+0x28/0x30 [dummy_module]
  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
  [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
  [<ffffffff810f8889>] load_module+0x16a9/0x1b30
  [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
  [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
  [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
  [<ffffffff8166ce29>] system_call_fastpath+0x12/0x17
 Code: c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c7 20 42 8a 81 31 c0 e8 fc
80 5e 00 eb 80 48 c7 c7 78 42 8a 81 31 c0 e8 ec 80 5e 00 <0f> 0b 66 66 66 66 2e
0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
 RIP  [<ffffffff81076b81>] warn_slowpath_common+0xc1/0xd0
  RSP <ffff8807fc5afc68>
 ---[ end trace 428218934a12088b ]---

Successfully tested by me.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: vgoyal@redhat.com
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: linux-doc@vger.kernel.org
Cc: kexec@lists.infradead.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>

[v2]: add /proc/sys/kernel/bug_on_warn, additional documentation, modify
      !slowpath cases
[v3]: use proc_dointvec_minmax() in sysctl handler
---
 Documentation/kdump/kdump.txt       |    7 +++++++
 Documentation/kernel-parameters.txt |    3 +++
 Documentation/sysctl/kernel.txt     |   12 ++++++++++++
 include/asm-generic/bug.h           |   12 ++++++++++--
 include/linux/kernel.h              |    1 +
 include/uapi/linux/sysctl.h         |    1 +
 kernel/panic.c                      |   21 ++++++++++++++++++++-
 kernel/sysctl.c                     |    9 +++++++++
 kernel/sysctl_binary.c              |    1 +
 9 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 6c0b9f2..a04ed72 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -471,6 +471,13 @@ format. Crash is available on Dave Anderson's site at the following URL:
 
    http://people.redhat.com/~anderson/
 
+Trigger Kdump on WARN()
+=======================
+
+The kernel parameter, bug_on_warn, calls BUG() in all WARN() paths.  This
+will cause a kdump to occur at the BUG() call.  In cases where a user
+wants to specify this during runtime, /proc/sys/kernel/bug_on_warn can be
+set to 1 to achieve the same behaviour.
 
 Contact
 =======
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 988160a..3890a3a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -553,6 +553,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	bttv.pll=	See Documentation/video4linux/bttv/Insmod-options
 	bttv.tuner=
 
+	bug_on_warn	BUG() instead of WARN().  Useful to cause kdump
+			on a WARN().
+
 	bulk_remove=off	[PPC]  This parameter disables the use of the pSeries
 			firmware feature for flushing multiple hpte entries
 			at a time.
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 57baff5..dcadcdc 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -23,6 +23,7 @@ show up in /proc/sys/kernel:
 - auto_msgmni
 - bootloader_type	     [ X86 only ]
 - bootloader_version	     [ X86 only ]
+- bug_on_warn
 - callhome		     [ S390 only ]
 - cap_last_cap
 - core_pattern
@@ -152,6 +153,17 @@ Documentation/x86/boot.txt for additional information.
 
 ==============================================================
 
+bug_on_warn:
+
+Calls BUG() in the WARN() path when set to 1.  This is useful to avoid
+a kernel rebuild when attempting to kdump at the location of a WARN().
+
+0: only WARN(), default behaviour.
+
+1: call BUG() after printing out WARN() location.
+
+==============================================================
+
 callhome:
 
 Controls the kernel's callhome behavior in case of a kernel panic.
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 630dd23..4d0c763 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -75,10 +75,18 @@ extern void warn_slowpath_null(const char *file, const int line);
 #define __WARN_printf_taint(taint, arg...)				\
 	warn_slowpath_fmt_taint(__FILE__, __LINE__, taint, arg)
 #else
-#define __WARN()		__WARN_TAINT(TAINT_WARN)
+#define check_bug_on_warn()						\
+	do {								\
+		if (bug_on_warn)					\
+			BUG();						\
+	} while (0)
+
+#define __WARN()							\
+	do { __WARN_TAINT(TAINT_WARN); check_bug_on_warn(); } while (0)
+
 #define __WARN_printf(arg...)	do { printk(arg); __WARN(); } while (0)
 #define __WARN_printf_taint(taint, arg...)				\
-	do { printk(arg); __WARN_TAINT(taint); } while (0)
+	do { printk(arg); __WARN_TAINT(taint); check_bug_on_warn(); } while (0)
 #endif
 
 #ifndef WARN_ON
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 40728cf..4094a60 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -422,6 +422,7 @@ extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int sysctl_panic_on_stackoverflow;
+extern int bug_on_warn;
 /*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 43aaba1..2ba0a58 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -153,6 +153,7 @@ enum
 	KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
 	KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
 	KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
+	KERN_BUG_ON_WARN=77, /* int: call BUG() in WARN() functions */
 };
 
 
diff --git a/kernel/panic.c b/kernel/panic.c
index d09dc5c..a6d2e2f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -33,6 +33,7 @@ static int pause_on_oops;
 static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
 static bool crash_kexec_post_notifiers;
+int bug_on_warn;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
@@ -420,13 +421,24 @@ static void warn_slowpath_common(const char *file, int line, void *caller,
 {
 	disable_trace_on_warning();
 
-	pr_warn("------------[ cut here ]------------\n");
+	if (!bug_on_warn)
+		pr_warn("------------[ cut here ]------------\n");
 	pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS()\n",
 		raw_smp_processor_id(), current->pid, file, line, caller);
 
 	if (args)
 		vprintk(args->fmt, args->args);
 
+	if (bug_on_warn) {
+		pr_warn("bug_on_warn set, calling BUG()...\n");
+		/*
+		 * A flood of WARN()s may occur.  Prevent further WARN()s
+		 * from panicking the system.
+		 */
+		bug_on_warn = 0;
+		BUG();
+	}
+
 	print_modules();
 	dump_stack();
 	print_oops_end_marker();
@@ -501,3 +513,10 @@ static int __init oops_setup(char *s)
 	return 0;
 }
 early_param("oops", oops_setup);
+
+static int __init bug_on_warn_setup(char *s)
+{
+	bug_on_warn = 1;
+	return 0;
+}
+early_param("bug_on_warn", bug_on_warn_setup);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4aada6d..818cd31 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1103,6 +1103,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif
+	{
+		.procname	= "bug_on_warn",
+		.data		= &bug_on_warn,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 	{ }
 };
 
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 9a4f750..28376bf 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
 	{ CTL_INT,	KERN_COMPAT_LOG,		"compat-log" },
 	{ CTL_INT,	KERN_MAX_LOCK_DEPTH,		"max_lock_depth" },
 	{ CTL_INT,	KERN_PANIC_ON_NMI,		"panic_on_unrecovered_nmi" },
+	{ CTL_INT,	KERN_BUG_ON_WARN,		"bug_on_warn" },
 	{}
 };
 
-- 
1.7.9.3

^ permalink raw reply related

* Re: [PATCH 3/3] selftests/kcmp: Always try to build the test
From: Christopher Covington @ 2014-10-23 13:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	shuahkh-JPH+aEBZ4P+UEJcrhfAQsw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, gorcunov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <1414040834-30209-3-git-send-email-mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>

Hi Michael,

On 10/23/2014 01:07 AM, Michael Ellerman wrote:
> Don't prevent the test building on non-x86. Just try and build it and
> let the chips fall where they may.

As a user of kcmp via CRIU on arm and arm64, thanks!

> diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
> index 4f00c0524501..cda9cc4004c9 100644
> --- a/tools/testing/selftests/kcmp/Makefile
> +++ b/tools/testing/selftests/kcmp/Makefile
> @@ -1,21 +1,7 @@
> -uname_M := $(shell uname -m 2>/dev/null || echo not)
> -ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
> -ifeq ($(ARCH),i386)
> -        ARCH := x86
> -	CFLAGS := -DCONFIG_X86_32 -D__i386__
> -endif
> -ifeq ($(ARCH),x86_64)
> -	ARCH := x86
> -	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
> -endif
>  CFLAGS += -I../../../../usr/include/
>  
>  all:
> -ifeq ($(ARCH),x86)
>  	gcc $(CFLAGS) kcmp_test.c -o kcmp_test

Not that this needs to be addressed in this patch, but this looks broken for
cross compilation. It looks like some of the other selftests use:

CC = $(CROSS_COMPILE)gcc

But perhaps this should be set (and perhaps with ':=') once at the top level.

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply

* Re: [PATCH RFC v3 09/16] virtio: set FEATURES_OK
From: Cornelia Huck @ 2014-10-23 13:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-api, linux-kernel, virtualization
In-Reply-To: <20141023125139.GA10033@redhat.com>

On Thu, 23 Oct 2014 15:51:39 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 23, 2014 at 02:28:08PM +0200, Cornelia Huck wrote:
> > On Wed, 22 Oct 2014 21:44:44 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > set FEATURES_OK as per virtio 1.0 spec
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  include/uapi/linux/virtio_config.h |  2 ++
> > >  drivers/virtio/virtio.c            | 29 ++++++++++++++++++++++-------
> > >  2 files changed, 24 insertions(+), 7 deletions(-)
> > > 
> > 
> > >  	dev->config->finalize_features(dev);
> > > 
> > > +	if (virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> > > +		add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> > > +		status = dev->config->get_status(dev);
> > > +		if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> > > +			printk(KERN_ERR "virtio: device refuses features: %x\n",
> > > +			       status);
> > > +			err = -ENODEV;
> > > +			goto err;
> > > +		}
> > > +	}
> > > +
> > 
> > Ugh, I just realize that virtio-ccw has a problem with that mechanism :(
> > 
> > Up to now, the driver only propagated status to the device: For
> > virtio-ccw, this was easily implemented via a ccw that transmitted
> > "status" to the device. However, the "read back status" part now
> > actually requires that the driver can get "status" from the device, or
> > has a comparable way to find out that the device won't accept the
> > status it tried to write.
> 
> Ugh, it actually caches the status in the transport :(

Well, it worked as long as this was unidirectional...

> 
> 
> > I can think of two solutions:
> > 
> > (1) Introduce a new ccw that actually reads the device status.
> > (2) Make the WRITE_STATUS ccw fail (with a unit check) if the driver
> >     sets FEATURES_OK after it tried to set features the device won't
> >     accept.
> > 
> > (1) is probably more generic, while (2) is more straightforward to
> > implement.
> > 
> > Good thing we actually try to finally implement this,
> 
> > I did not notice
> > this problem during the review :(
> 
> Well, it's a nuisance, but the spec is out.
> It seems to me a new command would be a substantive change so we can't
> do this in errata.
> 
> Option (2) would require two statements for drivers and devices,
> but since it's clearly the case for correct drivers/devices
> that command does not fail, it follows that this
> is not a substantive change so it can be fixed
> in an errata.

It only adds a new failure case, so it's not really magic, agreed.

> 
> So the new command would have to be optional, 

I think a new command should be tied to a new virtio-ccw revision.

> please open
> two issues in the TC: one documenting that driver must check
> WRITE_STATUS and device can fail WRITE_STATUS, and another
> for adding READ_STATUS (which will have to wait until
> the next CS).

I think I need to contemplate that a bit more. The problem with a new
READ_STATUS is that it is, by nature, an asynchronous command (as all
ccw commands are - the Linux guest driver uses a simple wrapper that
makes it appear synchronous). This means, for example, that every
status update now involves two channel programs instead of one, and
that reading e.g. the status sysfs attribute in Linux now triggers
channel I/O, which means several exits (at least ssch, interrupt
injection, tsch; probably more depending on what the guest does)...

^ permalink raw reply

* Re: [PATCH 3/3] selftests/kcmp: Always try to build the test
From: Shuah Khan @ 2014-10-23 13:52 UTC (permalink / raw)
  To: Christopher Covington, Michael Ellerman
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	gorcunov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <5448FD72.9040301-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

On 10/23/2014 07:06 AM, Christopher Covington wrote:
> Hi Michael,
> 
> On 10/23/2014 01:07 AM, Michael Ellerman wrote:
>> Don't prevent the test building on non-x86. Just try and build it and
>> let the chips fall where they may.
> 
> As a user of kcmp via CRIU on arm and arm64, thanks!
> 
>> diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
>> index 4f00c0524501..cda9cc4004c9 100644
>> --- a/tools/testing/selftests/kcmp/Makefile
>> +++ b/tools/testing/selftests/kcmp/Makefile
>> @@ -1,21 +1,7 @@
>> -uname_M := $(shell uname -m 2>/dev/null || echo not)
>> -ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
>> -ifeq ($(ARCH),i386)
>> -        ARCH := x86
>> -	CFLAGS := -DCONFIG_X86_32 -D__i386__
>> -endif
>> -ifeq ($(ARCH),x86_64)
>> -	ARCH := x86
>> -	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
>> -endif
>>  CFLAGS += -I../../../../usr/include/
>>  
>>  all:
>> -ifeq ($(ARCH),x86)
>>  	gcc $(CFLAGS) kcmp_test.c -o kcmp_test
> 
> Not that this needs to be addressed in this patch, but this looks broken for
> cross compilation. It looks like some of the other selftests use:
> 
> CC = $(CROSS_COMPILE)gcc
> 

It makes sense to fix the cross-compile problem now, since
this patch is extending the support to other archs.

thanks,
-- Shuah


-- 
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org | (970) 217-8978

^ permalink raw reply

* Re: [PATCH RFC v3 09/16] virtio: set FEATURES_OK
From: Michael S. Tsirkin @ 2014-10-23 14:03 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: linux-api, linux-kernel, virtualization
In-Reply-To: <20141023153006.0cb1e7c3.cornelia.huck@de.ibm.com>

On Thu, Oct 23, 2014 at 03:30:06PM +0200, Cornelia Huck wrote:
> On Thu, 23 Oct 2014 15:51:39 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Oct 23, 2014 at 02:28:08PM +0200, Cornelia Huck wrote:
> > > On Wed, 22 Oct 2014 21:44:44 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > set FEATURES_OK as per virtio 1.0 spec
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  include/uapi/linux/virtio_config.h |  2 ++
> > > >  drivers/virtio/virtio.c            | 29 ++++++++++++++++++++++-------
> > > >  2 files changed, 24 insertions(+), 7 deletions(-)
> > > > 
> > > 
> > > >  	dev->config->finalize_features(dev);
> > > > 
> > > > +	if (virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> > > > +		add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> > > > +		status = dev->config->get_status(dev);
> > > > +		if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> > > > +			printk(KERN_ERR "virtio: device refuses features: %x\n",
> > > > +			       status);
> > > > +			err = -ENODEV;
> > > > +			goto err;
> > > > +		}
> > > > +	}
> > > > +
> > > 
> > > Ugh, I just realize that virtio-ccw has a problem with that mechanism :(
> > > 
> > > Up to now, the driver only propagated status to the device: For
> > > virtio-ccw, this was easily implemented via a ccw that transmitted
> > > "status" to the device. However, the "read back status" part now
> > > actually requires that the driver can get "status" from the device, or
> > > has a comparable way to find out that the device won't accept the
> > > status it tried to write.
> > 
> > Ugh, it actually caches the status in the transport :(
> 
> Well, it worked as long as this was unidirectional...
> 
> > 
> > 
> > > I can think of two solutions:
> > > 
> > > (1) Introduce a new ccw that actually reads the device status.
> > > (2) Make the WRITE_STATUS ccw fail (with a unit check) if the driver
> > >     sets FEATURES_OK after it tried to set features the device won't
> > >     accept.
> > > 
> > > (1) is probably more generic, while (2) is more straightforward to
> > > implement.
> > > 
> > > Good thing we actually try to finally implement this,
> > 
> > > I did not notice
> > > this problem during the review :(
> > 
> > Well, it's a nuisance, but the spec is out.
> > It seems to me a new command would be a substantive change so we can't
> > do this in errata.
> > 
> > Option (2) would require two statements for drivers and devices,
> > but since it's clearly the case for correct drivers/devices
> > that command does not fail, it follows that this
> > is not a substantive change so it can be fixed
> > in an errata.
> 
> It only adds a new failure case, so it's not really magic, agreed.
> 
> > 
> > So the new command would have to be optional, 
> 
> I think a new command should be tied to a new virtio-ccw revision.

Or a feature bit.

> > please open
> > two issues in the TC: one documenting that driver must check
> > WRITE_STATUS and device can fail WRITE_STATUS, and another
> > for adding READ_STATUS (which will have to wait until
> > the next CS).
> 
> I think I need to contemplate that a bit more. The problem with a new
> READ_STATUS is that it is, by nature, an asynchronous command (as all
> ccw commands are - the Linux guest driver uses a simple wrapper that
> makes it appear synchronous). This means, for example, that every
> status update now involves two channel programs instead of one, and
> that reading e.g. the status sysfs attribute in Linux now triggers
> channel I/O, which means several exits (at least ssch, interrupt
> injection, tsch; probably more depending on what the guest does)...

Well it seems more robust than just caching it within guest.
Anyway, let's start with issue tracker, it does not force any specific
solution.

-- 
MST

^ permalink raw reply

* [PATCH RFC v4 01/17] virtio: memory access APIs
From: Michael S. Tsirkin @ 2014-10-23 16:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Bjarke Istrup Pedersen, Anup Patel, Greg Kroah-Hartman,
	virtualization, Geert Uytterhoeven, linux-api, Alexei Starovoitov,
	David S. Miller, =?UTF-8?q?Piotr=20Kr=C3=B3l?=,
	Mauro Carvalho Chehab
In-Reply-To: <1414081380-14623-1-git-send-email-mst@redhat.com>

virtio 1.0 makes all memory structures LE, so
we need APIs to conditionally do a byteswap on BE
architectures.

To make it easier to check code statically,
add virtio specific types for multi-byte integers
in memory.

Add low level wrappers that do a byteswap conditionally, these will be
useful e.g. for vhost.  Add high level wrappers that will (in the
future) query device endian-ness and act accordingly.

At the moment, stub them out and assume native endian-ness everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/virtio_byteorder.h  | 59 +++++++++++++++++++++++++++++++++++++++
 include/linux/virtio_config.h     | 42 ++++++++++++++++++++++++++++
 include/uapi/linux/virtio_ring.h  | 45 ++++++++++++++---------------
 include/uapi/linux/virtio_types.h | 48 +++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild         |  1 +
 5 files changed, 173 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/virtio_byteorder.h
 create mode 100644 include/uapi/linux/virtio_types.h

diff --git a/include/linux/virtio_byteorder.h b/include/linux/virtio_byteorder.h
new file mode 100644
index 0000000..824ed0b
--- /dev/null
+++ b/include/linux/virtio_byteorder.h
@@ -0,0 +1,59 @@
+#ifndef _LINUX_VIRTIO_BYTEORDER_H
+#define _LINUX_VIRTIO_BYTEORDER_H
+#include <linux/types.h>
+#include <uapi/linux/virtio_types.h>
+
+/*
+ * Memory accessors for handling virtio in modern little endian and in
+ * compatibility native endian format.
+ */
+
+static inline u16 __virtio16_to_cpu(bool little_endian, __virtio16 val)
+{
+	if (little_endian)
+		return le16_to_cpu((__force __le16)val);
+	else
+		return (__force u16)val;
+}
+
+static inline __virtio16 __cpu_to_virtio16(bool little_endian, u16 val)
+{
+	if (little_endian)
+		return (__force __virtio16)cpu_to_le16(val);
+	else
+		return (__force __virtio16)val;
+}
+
+static inline u32 __virtio32_to_cpu(bool little_endian, __virtio32 val)
+{
+	if (little_endian)
+		return le32_to_cpu((__force __le32)val);
+	else
+		return (__force u32)val;
+}
+
+static inline __virtio32 __cpu_to_virtio32(bool little_endian, u32 val)
+{
+	if (little_endian)
+		return (__force __virtio32)cpu_to_le32(val);
+	else
+		return (__force __virtio32)val;
+}
+
+static inline u64 __virtio64_to_cpu(bool little_endian, __virtio64 val)
+{
+	if (little_endian)
+		return le64_to_cpu((__force __le64)val);
+	else
+		return (__force u64)val;
+}
+
+static inline __virtio64 __cpu_to_virtio64(bool little_endian, u64 val)
+{
+	if (little_endian)
+		return (__force __virtio64)cpu_to_le64(val);
+	else
+		return (__force __virtio64)val;
+}
+
+#endif /* _LINUX_VIRTIO_BYTEORDER */
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 7f4ef66..93c2b617 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -4,6 +4,7 @@
 #include <linux/err.h>
 #include <linux/bug.h>
 #include <linux/virtio.h>
+#include <linux/virtio_byteorder.h>
 #include <uapi/linux/virtio_config.h>
 
 /**
@@ -152,6 +153,47 @@ int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
 	return 0;
 }
 
+/* Memory accessors */
+#define DEFINE_VIRTIO_XX_TO_CPU(bits) \
+static inline u##bits virtio##bits##_to_cpu(struct virtio_device *vdev, __virtio##bits val) \
+{ \
+	return __virtio##bits##_to_cpu(false, val); \
+} \
+static inline __virtio##bits cpu_to_virtio##bits(struct virtio_device *vdev, u##bits val) \
+{ \
+	return __cpu_to_virtio##bits(false, val); \
+}
+
+static inline u16 virtio16_to_cpu(struct virtio_device *vdev, __virtio16 val)
+{
+	return __virtio16_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio16 cpu_to_virtio16(struct virtio_device *vdev, u16 val)
+{
+	return __cpu_to_virtio16(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
+}
+
+static inline u32 virtio32_to_cpu(struct virtio_device *vdev, __virtio32 val)
+{
+	return __virtio32_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio32 cpu_to_virtio32(struct virtio_device *vdev, u32 val)
+{
+	return __cpu_to_virtio32(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
+}
+
+static inline u64 virtio64_to_cpu(struct virtio_device *vdev, __virtio64 val)
+{
+	return __virtio64_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio64 cpu_to_virtio64(struct virtio_device *vdev, u64 val)
+{
+	return __cpu_to_virtio64(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), val);
+}
+
 /* Config space accessors. */
 #define virtio_cread(vdev, structname, member, ptr)			\
 	do {								\
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index a99f9b7..61c818a 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -32,6 +32,7 @@
  *
  * Copyright Rusty Russell IBM Corporation 2007. */
 #include <linux/types.h>
+#include <linux/virtio_types.h>
 
 /* This marks a buffer as continuing via the next field. */
 #define VRING_DESC_F_NEXT	1
@@ -61,32 +62,32 @@
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc {
 	/* Address (guest-physical). */
-	__u64 addr;
+	__virtio64 addr;
 	/* Length. */
-	__u32 len;
+	__virtio32 len;
 	/* The flags as indicated above. */
-	__u16 flags;
+	__virtio16 flags;
 	/* We chain unused descriptors via this, too */
-	__u16 next;
+	__virtio16 next;
 };
 
 struct vring_avail {
-	__u16 flags;
-	__u16 idx;
-	__u16 ring[];
+	__virtio16 flags;
+	__virtio16 idx;
+	__virtio16 ring[];
 };
 
 /* u32 is used here for ids for padding reasons. */
 struct vring_used_elem {
 	/* Index of start of used descriptor chain. */
-	__u32 id;
+	__virtio32 id;
 	/* Total length of the descriptor chain which was used (written to) */
-	__u32 len;
+	__virtio32 len;
 };
 
 struct vring_used {
-	__u16 flags;
-	__u16 idx;
+	__virtio16 flags;
+	__virtio16 idx;
 	struct vring_used_elem ring[];
 };
 
@@ -109,25 +110,25 @@ struct vring {
  *	struct vring_desc desc[num];
  *
  *	// A ring of available descriptor heads with free-running index.
- *	__u16 avail_flags;
- *	__u16 avail_idx;
- *	__u16 available[num];
- *	__u16 used_event_idx;
+ *	__virtio16 avail_flags;
+ *	__virtio16 avail_idx;
+ *	__virtio16 available[num];
+ *	__virtio16 used_event_idx;
  *
  *	// Padding to the next align boundary.
  *	char pad[];
  *
  *	// A ring of used descriptor heads with free-running index.
- *	__u16 used_flags;
- *	__u16 used_idx;
+ *	__virtio16 used_flags;
+ *	__virtio16 used_idx;
  *	struct vring_used_elem used[num];
- *	__u16 avail_event_idx;
+ *	__virtio16 avail_event_idx;
  * };
  */
 /* We publish the used event index at the end of the available ring, and vice
  * versa. They are at the end for backwards compatibility. */
 #define vring_used_event(vr) ((vr)->avail->ring[(vr)->num])
-#define vring_avail_event(vr) (*(__u16 *)&(vr)->used->ring[(vr)->num])
+#define vring_avail_event(vr) (*(__virtio16 *)&(vr)->used->ring[(vr)->num])
 
 static inline void vring_init(struct vring *vr, unsigned int num, void *p,
 			      unsigned long align)
@@ -135,15 +136,15 @@ static inline void vring_init(struct vring *vr, unsigned int num, void *p,
 	vr->num = num;
 	vr->desc = p;
 	vr->avail = p + num*sizeof(struct vring_desc);
-	vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + sizeof(__u16)
+	vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + sizeof(__virtio16)
 		+ align-1) & ~(align - 1));
 }
 
 static inline unsigned vring_size(unsigned int num, unsigned long align)
 {
-	return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (3 + num)
+	return ((sizeof(struct vring_desc) * num + sizeof(__virtio16) * (3 + num)
 		 + align - 1) & ~(align - 1))
-		+ sizeof(__u16) * 3 + sizeof(struct vring_used_elem) * num;
+		+ sizeof(__virtio16) * 3 + sizeof(struct vring_used_elem) * num;
 }
 
 /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
diff --git a/include/uapi/linux/virtio_types.h b/include/uapi/linux/virtio_types.h
new file mode 100644
index 0000000..b90385f
--- /dev/null
+++ b/include/uapi/linux/virtio_types.h
@@ -0,0 +1,48 @@
+#ifndef _UAPI_LINUX_VIRTIO_TYPES_H
+#define _UAPI_LINUX_VIRTIO_TYPES_H
+/* An interface for efficient virtio implementation, currently for use by KVM
+ * and lguest, but hopefully others soon.  Do NOT change this since it will
+ * break existing servers and clients.
+ *
+ * This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) 2014 Red Hat, Inc.
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ */
+#include <linux/types.h>
+
+/*
+ * __virtio{16,32,64} have the following meaning:
+ * - __u{16,32,64} for virtio devices in legacy mode, accessed in native endian
+ * - __le{16,32,64} for standard-compliant virtio devices
+ */
+
+typedef __u16 __bitwise__ __virtio16;
+typedef __u32 __bitwise__ __virtio32;
+typedef __u64 __bitwise__ __virtio64;
+
+#endif /* _UAPI_LINUX_VIRTIO_TYPES_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 6cad974..39c161a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -419,6 +419,7 @@ header-y += virtio_blk.h
 header-y += virtio_config.h
 header-y += virtio_console.h
 header-y += virtio_ids.h
+header-y += virtio_types.h
 header-y += virtio_net.h
 header-y += virtio_pci.h
 header-y += virtio_ring.h
-- 
MST

^ permalink raw reply related

* [PATCH RFC v4 05/17] virtio: add virtio 1.0 feature bit
From: Michael S. Tsirkin @ 2014-10-23 16:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, virtualization
In-Reply-To: <1414081380-14623-1-git-send-email-mst@redhat.com>

Based on original patches by Rusty Russell, Thomas Huth
and Cornelia Huck.

Note: at this time, we do not negotiate this feature bit
in core, drivers have to declare VERSION_1 support explicitly.

After all drivers are converted, we will be able to
move VERSION_1 to core and drop it from all drivers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 include/uapi/linux/virtio_config.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 3ce768c..f3fe33a 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -41,11 +41,11 @@
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED		0x80
 
-/* Some virtio feature bits (currently bits 28 through 31) are reserved for the
+/* Some virtio feature bits (currently bits 28 through 32) are reserved for the
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		32
+#define VIRTIO_TRANSPORT_F_END		33
 
 /* Do we get callbacks when the ring is completely used, even if we've
  * suppressed them? */
@@ -54,4 +54,7 @@
 /* Can the device handle any descriptor layout? */
 #define VIRTIO_F_ANY_LAYOUT		27
 
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1		32
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
MST

^ permalink raw reply related

* [PATCH RFC v4 09/17] virtio: set FEATURES_OK
From: Michael S. Tsirkin @ 2014-10-23 16:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, virtualization
In-Reply-To: <1414081380-14623-1-git-send-email-mst@redhat.com>

set FEATURES_OK as per virtio 1.0 spec

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/virtio_config.h |  2 ++
 drivers/virtio/virtio.c            | 29 ++++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index f3fe33a..a6d0cde 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -38,6 +38,8 @@
 #define VIRTIO_CONFIG_S_DRIVER		2
 /* Driver has used its parts of the config, and is happy */
 #define VIRTIO_CONFIG_S_DRIVER_OK	4
+/* Driver has finished configuring features */
+#define VIRTIO_CONFIG_S_FEATURES_OK	8
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED		0x80
 
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index d213567..a3df817 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -160,6 +160,7 @@ static int virtio_dev_probe(struct device *_d)
 	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
 	u64 device_features;
+	unsigned status;
 
 	/* We have a driver! */
 	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
@@ -183,18 +184,32 @@ static int virtio_dev_probe(struct device *_d)
 
 	dev->config->finalize_features(dev);
 
+	if (virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
+		add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+		status = dev->config->get_status(dev);
+		if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
+			printk(KERN_ERR "virtio: device refuses features: %x\n",
+			       status);
+			err = -ENODEV;
+			goto err;
+		}
+	}
+
 	err = drv->probe(dev);
 	if (err)
-		add_status(dev, VIRTIO_CONFIG_S_FAILED);
-	else {
-		add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-		if (drv->scan)
-			drv->scan(dev);
+		goto err;
 
-		virtio_config_enable(dev);
-	}
+	add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	if (drv->scan)
+		drv->scan(dev);
+
+	virtio_config_enable(dev);
 
+	return 0;
+err:
+	add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
+
 }
 
 static int virtio_dev_remove(struct device *_d)
-- 
MST

^ permalink raw reply related

* [PATCH RFC v4 12/17] virtio_net: v1.0 support
From: Michael S. Tsirkin @ 2014-10-23 16:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rusty Russell, Cornelia Huck, virtualization, netdev, linux-api
In-Reply-To: <1414081380-14623-1-git-send-email-mst@redhat.com>

Based on patches by Rusty Russell, Cornelia Huck.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/virtio_net.h | 15 ++++++++-------
 drivers/net/virtio_net.c        | 34 +++++++++++++++++++++-------------
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 172a7f0..b5f1677 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -28,6 +28,7 @@
 #include <linux/types.h>
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
 #include <linux/if_ether.h>
 
 /* The feature bitmap for virtio net */
@@ -84,17 +85,17 @@ struct virtio_net_hdr {
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	// GSO frame, IPv6 TCP
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	// TCP has ECN set
 	__u8 gso_type;
-	__u16 hdr_len;		/* Ethernet + IP + tcp/udp hdrs */
-	__u16 gso_size;		/* Bytes to append to hdr_len per frame */
-	__u16 csum_start;	/* Position to start checksumming from */
-	__u16 csum_offset;	/* Offset after that to place checksum */
+	__virtio16 hdr_len;		/* Ethernet + IP + tcp/udp hdrs */
+	__virtio16 gso_size;		/* Bytes to append to hdr_len per frame */
+	__virtio16 csum_start;	/* Position to start checksumming from */
+	__virtio16 csum_offset;	/* Offset after that to place checksum */
 };
 
 /* This is the version of the header to use when the MRG_RXBUF
  * feature has been negotiated. */
 struct virtio_net_hdr_mrg_rxbuf {
 	struct virtio_net_hdr hdr;
-	__u16 num_buffers;	/* Number of merged rx buffers */
+	__virtio16 num_buffers;	/* Number of merged rx buffers */
 };
 
 /*
@@ -149,7 +150,7 @@ typedef __u8 virtio_net_ctrl_ack;
  * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
  */
 struct virtio_net_ctrl_mac {
-	__u32 entries;
+	__virtio32 entries;
 	__u8 macs[][ETH_ALEN];
 } __attribute__((packed));
 
@@ -193,7 +194,7 @@ struct virtio_net_ctrl_mac {
  * specified.
  */
 struct virtio_net_ctrl_mq {
-	__u16 virtqueue_pairs;
+	__virtio16 virtqueue_pairs;
 };
 
 #define VIRTIO_NET_CTRL_MQ   4
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d75256bd..57cbc7d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -347,13 +347,14 @@ err:
 }
 
 static struct sk_buff *receive_mergeable(struct net_device *dev,
+					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
 					 unsigned long ctx,
 					 unsigned int len)
 {
 	void *buf = mergeable_ctx_to_buf_address(ctx);
 	struct skb_vnet_hdr *hdr = buf;
-	int num_buf = hdr->mhdr.num_buffers;
+	u16 num_buf = virtio16_to_cpu(rq->vq->vdev, hdr->mhdr.num_buffers);
 	struct page *page = virt_to_head_page(buf);
 	int offset = buf - page_address(page);
 	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
@@ -369,7 +370,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &len);
 		if (unlikely(!ctx)) {
 			pr_debug("%s: rx error: %d buffers out of %d missing\n",
-				 dev->name, num_buf, hdr->mhdr.num_buffers);
+				 dev->name, num_buf,
+				 virtio16_to_cpu(rq->vq->vdev,
+						 hdr->mhdr.num_buffers));
 			dev->stats.rx_length_errors++;
 			goto err_buf;
 		}
@@ -454,7 +457,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	}
 
 	if (vi->mergeable_rx_bufs)
-		skb = receive_mergeable(dev, rq, (unsigned long)buf, len);
+		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
 	else if (vi->big_packets)
 		skb = receive_big(dev, rq, buf, len);
 	else
@@ -473,8 +476,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		pr_debug("Needs csum!\n");
 		if (!skb_partial_csum_set(skb,
-					  hdr->hdr.csum_start,
-					  hdr->hdr.csum_offset))
+			  virtio16_to_cpu(vi->vdev, hdr->hdr.csum_start),
+			  virtio16_to_cpu(vi->vdev, hdr->hdr.csum_offset)))
 			goto frame_err;
 	} else if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID) {
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -505,7 +508,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 		if (hdr->hdr.gso_type & VIRTIO_NET_HDR_GSO_ECN)
 			skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
 
-		skb_shinfo(skb)->gso_size = hdr->hdr.gso_size;
+		skb_shinfo(skb)->gso_size = virtio16_to_cpu(vi->vdev,
+							    hdr->hdr.gso_size);
 		if (skb_shinfo(skb)->gso_size == 0) {
 			net_warn_ratelimited("%s: zero gso size.\n", dev->name);
 			goto frame_err;
@@ -867,16 +871,19 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-		hdr->hdr.csum_start = skb_checksum_start_offset(skb);
-		hdr->hdr.csum_offset = skb->csum_offset;
+		hdr->hdr.csum_start = cpu_to_virtio16(vi->vdev,
+						skb_checksum_start_offset(skb));
+		hdr->hdr.csum_offset = cpu_to_virtio16(vi->vdev,
+							 skb->csum_offset);
 	} else {
 		hdr->hdr.flags = 0;
 		hdr->hdr.csum_offset = hdr->hdr.csum_start = 0;
 	}
 
 	if (skb_is_gso(skb)) {
-		hdr->hdr.hdr_len = skb_headlen(skb);
-		hdr->hdr.gso_size = skb_shinfo(skb)->gso_size;
+		hdr->hdr.hdr_len = cpu_to_virtio16(vi->vdev, skb_headlen(skb));
+		hdr->hdr.gso_size = cpu_to_virtio16(vi->vdev,
+						    skb_shinfo(skb)->gso_size);
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
 			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
@@ -1105,7 +1112,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
 
-	s.virtqueue_pairs = queue_pairs;
+	s.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
 	sg_init_one(&sg, &s, sizeof(s));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
@@ -1182,7 +1189,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	sg_init_table(sg, 2);
 
 	/* Store the unicast list and count in the front of the buffer */
-	mac_data->entries = uc_count;
+	mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
 	i = 0;
 	netdev_for_each_uc_addr(ha, dev)
 		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
@@ -1193,7 +1200,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	/* multicast list and count fill the end */
 	mac_data = (void *)&mac_data->macs[uc_count][0];
 
-	mac_data->entries = mc_count;
+	mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
 	i = 0;
 	netdev_for_each_mc_addr(ha, dev)
 		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
@@ -1960,6 +1967,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
 	VIRTIO_NET_F_CTRL_MAC_ADDR,
 	VIRTIO_F_ANY_LAYOUT,
+	VIRTIO_F_VERSION_1,
 };
 
 static struct virtio_driver virtio_net_driver = {
-- 
MST

^ permalink raw reply related

* [PATCH RFC v4 13/17] virtio_blk: v1.0 support
From: Michael S. Tsirkin @ 2014-10-23 16:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Rusty Russell,
	virtualization, linux-api
In-Reply-To: <1414081380-14623-1-git-send-email-mst@redhat.com>

Based on patch by Cornelia Huck.

Note: for consistency, and to avoid sparse errors,
      convert all fields, even those no longer in use
      for virtio v1.0.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/virtio_blk.h | 15 ++++-----
 drivers/block/virtio_blk.c      | 67 ++++++++++++++++++++++++-----------------
 2 files changed, 47 insertions(+), 35 deletions(-)

diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 9ad67b2..247c8ba 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -28,6 +28,7 @@
 #include <linux/types.h>
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
 
 /* Feature bits */
 #define VIRTIO_BLK_F_BARRIER	0	/* Does host support barriers? */
@@ -114,18 +115,18 @@ struct virtio_blk_config {
 /* This is the first element of the read scatter-gather list. */
 struct virtio_blk_outhdr {
 	/* VIRTIO_BLK_T* */
-	__u32 type;
+	__virtio32 type;
 	/* io priority. */
-	__u32 ioprio;
+	__virtio32 ioprio;
 	/* Sector (ie. 512 byte offset) */
-	__u64 sector;
+	__virtio64 sector;
 };
 
 struct virtio_scsi_inhdr {
-	__u32 errors;
-	__u32 data_len;
-	__u32 sense_len;
-	__u32 residual;
+	__virtio32 errors;
+	__virtio32 data_len;
+	__virtio32 sense_len;
+	__virtio32 residual;
 };
 
 /* And this is the final byte of the write scatter-gather list. */
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c6a27d5..17d3d91 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -80,7 +80,7 @@ static int __virtblk_add_req(struct virtqueue *vq,
 {
 	struct scatterlist hdr, status, cmd, sense, inhdr, *sgs[6];
 	unsigned int num_out = 0, num_in = 0;
-	int type = vbr->out_hdr.type & ~VIRTIO_BLK_T_OUT;
+	__virtio32 type = vbr->out_hdr.type & ~cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT);
 
 	sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
 	sgs[num_out++] = &hdr;
@@ -91,19 +91,19 @@ static int __virtblk_add_req(struct virtqueue *vq,
 	 * block, and before the normal inhdr we put the sense data and the
 	 * inhdr with additional status information.
 	 */
-	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+	if (type == cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_SCSI_CMD)) {
 		sg_init_one(&cmd, vbr->req->cmd, vbr->req->cmd_len);
 		sgs[num_out++] = &cmd;
 	}
 
 	if (have_data) {
-		if (vbr->out_hdr.type & VIRTIO_BLK_T_OUT)
+		if (vbr->out_hdr.type & cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT))
 			sgs[num_out++] = data_sg;
 		else
 			sgs[num_out + num_in++] = data_sg;
 	}
 
-	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+	if (type == cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_SCSI_CMD)) {
 		sg_init_one(&sense, vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
 		sgs[num_out + num_in++] = &sense;
 		sg_init_one(&inhdr, &vbr->in_hdr, sizeof(vbr->in_hdr));
@@ -119,12 +119,13 @@ static int __virtblk_add_req(struct virtqueue *vq,
 static inline void virtblk_request_done(struct request *req)
 {
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
+	struct virtio_blk *vblk = req->q->queuedata;
 	int error = virtblk_result(vbr);
 
 	if (req->cmd_type == REQ_TYPE_BLOCK_PC) {
-		req->resid_len = vbr->in_hdr.residual;
-		req->sense_len = vbr->in_hdr.sense_len;
-		req->errors = vbr->in_hdr.errors;
+		req->resid_len = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.residual);
+		req->sense_len = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.sense_len);
+		req->errors = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.errors);
 	} else if (req->cmd_type == REQ_TYPE_SPECIAL) {
 		req->errors = (error != 0);
 	}
@@ -173,25 +174,25 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req,
 
 	vbr->req = req;
 	if (req->cmd_flags & REQ_FLUSH) {
-		vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
+		vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_FLUSH);
 		vbr->out_hdr.sector = 0;
-		vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+		vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 	} else {
 		switch (req->cmd_type) {
 		case REQ_TYPE_FS:
 			vbr->out_hdr.type = 0;
-			vbr->out_hdr.sector = blk_rq_pos(vbr->req);
-			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, blk_rq_pos(vbr->req));
+			vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 			break;
 		case REQ_TYPE_BLOCK_PC:
-			vbr->out_hdr.type = VIRTIO_BLK_T_SCSI_CMD;
+			vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_SCSI_CMD);
 			vbr->out_hdr.sector = 0;
-			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 			break;
 		case REQ_TYPE_SPECIAL:
-			vbr->out_hdr.type = VIRTIO_BLK_T_GET_ID;
+			vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_GET_ID);
 			vbr->out_hdr.sector = 0;
-			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 			break;
 		default:
 			/* We don't put anything else in the queue. */
@@ -204,9 +205,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req,
 	num = blk_rq_map_sg(hctx->queue, vbr->req, vbr->sg);
 	if (num) {
 		if (rq_data_dir(vbr->req) == WRITE)
-			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
+			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_OUT);
 		else
-			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
+			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_IN);
 	}
 
 	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
@@ -821,25 +822,35 @@ static const struct virtio_device_id id_table[] = {
 	{ 0 },
 };
 
-static unsigned int features[] = {
+static unsigned int features_legacy[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
 	VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
 	VIRTIO_BLK_F_MQ,
+}
+;
+static unsigned int features[] = {
+	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
+	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
+	VIRTIO_BLK_F_TOPOLOGY,
+	VIRTIO_BLK_F_MQ,
+	VIRTIO_F_VERSION_1,
 };
 
 static struct virtio_driver virtio_blk = {
-	.feature_table		= features,
-	.feature_table_size	= ARRAY_SIZE(features),
-	.driver.name		= KBUILD_MODNAME,
-	.driver.owner		= THIS_MODULE,
-	.id_table		= id_table,
-	.probe			= virtblk_probe,
-	.remove			= virtblk_remove,
-	.config_changed		= virtblk_config_changed,
+	.feature_table			= features,
+	.feature_table_size		= ARRAY_SIZE(features),
+	.feature_table_legacy		= features_legacy,
+	.feature_table_size_legacy	= ARRAY_SIZE(features_legacy),
+	.driver.name			= KBUILD_MODNAME,
+	.driver.owner			= THIS_MODULE,
+	.id_table			= id_table,
+	.probe				= virtblk_probe,
+	.remove				= virtblk_remove,
+	.config_changed			= virtblk_config_changed,
 #ifdef CONFIG_PM_SLEEP
-	.freeze			= virtblk_freeze,
-	.restore		= virtblk_restore,
+	.freeze				= virtblk_freeze,
+	.restore			= virtblk_restore,
 #endif
 };
 
-- 
MST

^ permalink raw reply related

* Re: [PATCH V3] kernel, add bug_on_warn
From: Andrew Morton @ 2014-10-23 19:40 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jonathan Corbet,
	Rusty Russell, H. Peter Anvin, Andi Kleen, Masami Hiramatsu,
	Fabian Frederick, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
	isimatu.yasuaki-+CUm20s59erQFUHtdCDX3A,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1414068794-22788-1-git-send-email-prarit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, 23 Oct 2014 08:53:14 -0400 Prarit Bhargava <prarit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> There have been several times where I have had to rebuild a kernel to
> cause a panic when hitting a WARN() in the code in order to get a crash
> dump from a system.  Sometimes this is easy to do, other times (such as
> in the case of a remote admin) it is not trivial to send new images to the
> user.
> 
> A much easier method would be a switch to change the WARN() over to a
> BUG().  This makes debugging easier in that I can now test the actual
> image the WARN() was seen on and I do not have to engage in remote
> debugging.
> 
> This patch adds a bug_on_warn kernel parameter, which calls BUG() in the
> warn_slowpath_common() path.  The function will still print out the
> location of the warning.

The changelog doesn't mention /proc/sys/kernel/bug_on_warn?

Why do we need both the sysctl and the kernel parameter?  Only to
trigger BUG for warnings which occur prior to initscripts.  Is there a
legitimate case for this?  Is kdump even usable at this time?

> --- a/include/asm-generic/bug.h
> +++ b/include/asm-generic/bug.h
> @@ -75,10 +75,18 @@ extern void warn_slowpath_null(const char *file, const int line);
>  #define __WARN_printf_taint(taint, arg...)				\
>  	warn_slowpath_fmt_taint(__FILE__, __LINE__, taint, arg)
>  #else
> -#define __WARN()		__WARN_TAINT(TAINT_WARN)
> +#define check_bug_on_warn()						\
> +	do {								\
> +		if (bug_on_warn)					\
> +			BUG();						\
> +	} while (0)

#define check_bug_on_warn() BUG_ON(bug_on_warn)

would suffice?

> +#define __WARN()							\
> +	do { __WARN_TAINT(TAINT_WARN); check_bug_on_warn(); } while (0)
> +
>  #define __WARN_printf(arg...)	do { printk(arg); __WARN(); } while (0)
>  #define __WARN_printf_taint(taint, arg...)				\
> -	do { printk(arg); __WARN_TAINT(taint); } while (0)
> +	do { printk(arg); __WARN_TAINT(taint); check_bug_on_warn(); } while (0)
>  #endif

What's this code here for anyway?  The changes to
warn_slowpath_common() aren't sufficient?

>  #ifndef WARN_ON
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 40728cf..4094a60 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -422,6 +422,7 @@ extern int panic_on_oops;
>  extern int panic_on_unrecovered_nmi;
>  extern int panic_on_io_nmi;
>  extern int sysctl_panic_on_stackoverflow;
> +extern int bug_on_warn;
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> index 43aaba1..2ba0a58 100644
> --- a/include/uapi/linux/sysctl.h
> +++ b/include/uapi/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
>  	KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
>  	KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
>  	KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> +	KERN_BUG_ON_WARN=77, /* int: call BUG() in WARN() functions */
>  };
>  
>  
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d09dc5c..a6d2e2f 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -33,6 +33,7 @@ static int pause_on_oops;
>  static int pause_on_oops_flag;
>  static DEFINE_SPINLOCK(pause_on_oops_lock);
>  static bool crash_kexec_post_notifiers;
> +int bug_on_warn;

I suppose this should be __read_mostly.  Assuming __read_mostly is
useful :(

>
> ...
>

^ permalink raw reply

* Re: [PATCHv1 5/8] cgroup: introduce cgroup namespaces
From: Aditya Kali @ 2014-10-24  1:03 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux API, Linux Containers, Serge Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andy Lutomirski, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Ingo Molnar
In-Reply-To: <20141016163703.GE1392-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>

I will include the suggested changes in the new patchset. Some comments inline.

On Thu, Oct 16, 2014 at 9:37 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org):
>> Introduce the ability to create new cgroup namespace. The newly created
>> cgroup namespace remembers the 'struct cgroup *root_cgrp' at the point
>> of creation of the cgroup namespace. The task that creates the new
>> cgroup namespace and all its future children will now be restricted only
>> to the cgroup hierarchy under this root_cgrp.
>> The main purpose of cgroup namespace is to virtualize the contents
>> of /proc/self/cgroup file. Processes inside a cgroup namespace
>> are only able to see paths relative to their namespace root.
>> This allows container-tools (like libcontainer, lxc, lmctfy, etc.)
>> to create completely virtualized containers without leaking system
>> level cgroup hierarchy to the task.
>> This patch only implements the 'unshare' part of the cgroupns.
>>
>> Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>
> I'm not sure that the CONFIG_CGROUP_NS is worthwhile.  If you already
> have cgroups in the kernel this won't add much in the way of memory
> usage, right?  And I think the 'experimental' argument has long since
> been squashed.  So I'd argue for simplifying this patch by removing
> CONFIG_CGROUP_NS.
>

With no pinning involved, I think its safe to enable the feature
without needing a config option. Removed it from next version. This
feature is now implicitly available with CONFIG_CGROUPS.

> (more below)
>
>> ---
>>  fs/proc/namespaces.c             |   3 +
>>  include/linux/cgroup.h           |  18 +++++-
>>  include/linux/cgroup_namespace.h |  62 +++++++++++++++++++
>>  include/linux/nsproxy.h          |   2 +
>>  include/linux/proc_ns.h          |   4 ++
>>  init/Kconfig                     |   9 +++
>>  kernel/Makefile                  |   1 +
>>  kernel/cgroup.c                  |  11 ++++
>>  kernel/cgroup_namespace.c        | 128 +++++++++++++++++++++++++++++++++++++++
>>  kernel/fork.c                    |   2 +-
>>  kernel/nsproxy.c                 |  19 +++++-
>>  11 files changed, 255 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
>> index 8902609..e04ed4b 100644
>> --- a/fs/proc/namespaces.c
>> +++ b/fs/proc/namespaces.c
>> @@ -32,6 +32,9 @@ static const struct proc_ns_operations *ns_entries[] = {
>>       &userns_operations,
>>  #endif
>>       &mntns_operations,
>> +#ifdef CONFIG_CGROUP_NS
>> +     &cgroupns_operations,
>> +#endif
>>  };
>>
>>  static const struct file_operations ns_file_operations = {
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 4a0eb2d..aa86495 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -22,6 +22,8 @@
>>  #include <linux/seq_file.h>
>>  #include <linux/kernfs.h>
>>  #include <linux/wait.h>
>> +#include <linux/nsproxy.h>
>> +#include <linux/types.h>
>>
>>  #ifdef CONFIG_CGROUPS
>>
>> @@ -460,6 +462,13 @@ struct cftype {
>>  #endif
>>  };
>>
>> +struct cgroup_namespace {
>> +     atomic_t                count;
>> +     unsigned int            proc_inum;
>> +     struct user_namespace   *user_ns;
>> +     struct cgroup           *root_cgrp;
>> +};
>> +
>>  extern struct cgroup_root cgrp_dfl_root;
>>  extern struct css_set init_css_set;
>>
>> @@ -584,10 +593,17 @@ static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
>>       return kernfs_name(cgrp->kn, buf, buflen);
>>  }
>>
>> +static inline char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
>> +                                              struct cgroup *cgrp, char *buf,
>> +                                              size_t buflen)
>> +{
>> +     return kernfs_path_from_node(ns->root_cgrp->kn, cgrp->kn, buf, buflen);
>> +}
>> +
>>  static inline char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
>>                                             size_t buflen)
>>  {
>> -     return kernfs_path(cgrp->kn, buf, buflen);
>> +     return cgroup_path_ns(current->nsproxy->cgroup_ns, cgrp, buf, buflen);
>>  }
>>
>>  static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
>> diff --git a/include/linux/cgroup_namespace.h b/include/linux/cgroup_namespace.h
>> new file mode 100644
>> index 0000000..9f637fe
>> --- /dev/null
>> +++ b/include/linux/cgroup_namespace.h
>> @@ -0,0 +1,62 @@
>> +#ifndef _LINUX_CGROUP_NAMESPACE_H
>> +#define _LINUX_CGROUP_NAMESPACE_H
>> +
>> +#include <linux/nsproxy.h>
>> +#include <linux/cgroup.h>
>> +#include <linux/types.h>
>> +#include <linux/user_namespace.h>
>> +
>> +extern struct cgroup_namespace init_cgroup_ns;
>> +
>> +static inline struct cgroup *task_cgroupns_root(struct task_struct *tsk)
>> +{
>> +     return tsk->nsproxy->cgroup_ns->root_cgrp;
>
> Per the rules in nsproxy.h, you should be taking the task_lock here.
>
> (If you are making assumptions about tsk then you need to state them
> here - I only looked quickly enough that you pass in 'leader')
>

In the new version of the patch, we call this function only for the
'current' task. As per nsproxy.h, no special precautions needed when
reading current task's nsproxy. So I just remodeled this function into
"current_cgroupns_root(void)".

>> +}
>> +
>> +#ifdef CONFIG_CGROUP_NS
>> +
>> +extern void free_cgroup_ns(struct cgroup_namespace *ns);
>> +
>> +static inline struct cgroup_namespace *get_cgroup_ns(
>> +             struct cgroup_namespace *ns)
>> +{
>> +     if (ns)
>> +             atomic_inc(&ns->count);
>> +     return ns;
>> +}
>> +
>> +static inline void put_cgroup_ns(struct cgroup_namespace *ns)
>> +{
>> +     if (ns && atomic_dec_and_test(&ns->count))
>> +             free_cgroup_ns(ns);
>> +}
>> +
>> +extern struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
>> +                                            struct user_namespace *user_ns,
>> +                                            struct cgroup_namespace *old_ns);
>> +
>> +#else  /* CONFIG_CGROUP_NS */
>> +
>> +static inline struct cgroup_namespace *get_cgroup_ns(
>> +             struct cgroup_namespace *ns)
>> +{
>> +     return &init_cgroup_ns;
>> +}
>> +
>> +static inline void put_cgroup_ns(struct cgroup_namespace *ns)
>> +{
>> +}
>> +
>> +static inline struct cgroup_namespace *copy_cgroup_ns(
>> +             unsigned long flags,
>> +             struct user_namespace *user_ns,
>> +             struct cgroup_namespace *old_ns) {
>> +     if (flags & CLONE_NEWCGROUP)
>> +             return ERR_PTR(-EINVAL);
>> +
>> +     return old_ns;
>> +}
>> +
>> +#endif  /* CONFIG_CGROUP_NS */
>> +
>> +#endif  /* _LINUX_CGROUP_NAMESPACE_H */
>> diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
>> index 35fa08f..ac0d65b 100644
>> --- a/include/linux/nsproxy.h
>> +++ b/include/linux/nsproxy.h
>> @@ -8,6 +8,7 @@ struct mnt_namespace;
>>  struct uts_namespace;
>>  struct ipc_namespace;
>>  struct pid_namespace;
>> +struct cgroup_namespace;
>>  struct fs_struct;
>>
>>  /*
>> @@ -33,6 +34,7 @@ struct nsproxy {
>>       struct mnt_namespace *mnt_ns;
>>       struct pid_namespace *pid_ns_for_children;
>>       struct net           *net_ns;
>> +     struct cgroup_namespace *cgroup_ns;
>>  };
>>  extern struct nsproxy init_nsproxy;
>>
>> diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
>> index 34a1e10..e56dd73 100644
>> --- a/include/linux/proc_ns.h
>> +++ b/include/linux/proc_ns.h
>> @@ -6,6 +6,8 @@
>>
>>  struct pid_namespace;
>>  struct nsproxy;
>> +struct task_struct;
>> +struct inode;
>>
>>  struct proc_ns_operations {
>>       const char *name;
>> @@ -27,6 +29,7 @@ extern const struct proc_ns_operations ipcns_operations;
>>  extern const struct proc_ns_operations pidns_operations;
>>  extern const struct proc_ns_operations userns_operations;
>>  extern const struct proc_ns_operations mntns_operations;
>> +extern const struct proc_ns_operations cgroupns_operations;
>>
>>  /*
>>   * We always define these enumerators
>> @@ -37,6 +40,7 @@ enum {
>>       PROC_UTS_INIT_INO       = 0xEFFFFFFEU,
>>       PROC_USER_INIT_INO      = 0xEFFFFFFDU,
>>       PROC_PID_INIT_INO       = 0xEFFFFFFCU,
>> +     PROC_CGROUP_INIT_INO    = 0xEFFFFFFBU,
>>  };
>>
>>  #ifdef CONFIG_PROC_FS
>> diff --git a/init/Kconfig b/init/Kconfig
>> index e84c642..c3be001 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1144,6 +1144,15 @@ config DEBUG_BLK_CGROUP
>>       Enable some debugging help. Currently it exports additional stat
>>       files in a cgroup which can be useful for debugging.
>>
>> +config CGROUP_NS
>> +     bool "CGroup Namespaces"
>> +     default n
>> +     help
>> +       This options enables CGroup Namespaces which can be used to isolate
>> +       cgroup paths. This feature is only useful when unified cgroup
>> +       hierarchy is in use (i.e. cgroups are mounted with sane_behavior
>> +       option).
>> +
>>  endif # CGROUPS
>>
>>  config CHECKPOINT_RESTORE
>> diff --git a/kernel/Makefile b/kernel/Makefile
>> index dc5c775..75334f8 100644
>> --- a/kernel/Makefile
>> +++ b/kernel/Makefile
>> @@ -51,6 +51,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
>>  obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
>>  obj-$(CONFIG_COMPAT) += compat.o
>>  obj-$(CONFIG_CGROUPS) += cgroup.o
>> +obj-$(CONFIG_CGROUP_NS) += cgroup_namespace.o
>>  obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
>>  obj-$(CONFIG_CPUSETS) += cpuset.o
>>  obj-$(CONFIG_UTS_NS) += utsname.o
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index 2b3e9f9..f8099b4 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -57,6 +57,8 @@
>>  #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
>>  #include <linux/kthread.h>
>>  #include <linux/delay.h>
>> +#include <linux/proc_ns.h>
>> +#include <linux/cgroup_namespace.h>
>>
>>  #include <linux/atomic.h>
>>
>> @@ -195,6 +197,15 @@ static void kill_css(struct cgroup_subsys_state *css);
>>  static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
>>                             bool is_add);
>>
>> +struct cgroup_namespace init_cgroup_ns = {
>> +     .count = {
>> +             .counter = 1,
>> +     },
>> +     .proc_inum = PROC_CGROUP_INIT_INO,
>> +     .user_ns = &init_user_ns,
>
> This might mean that you should bump the init_user_ns refcount.
>

Humm. Doesn't look like all other namespaces are doing it though (ex:
init_pid_ns or init_ipc_ns). The initial count in init_user_ns is set
to 3 which only accounts for some current users, but not all. I will
increment it for init_cgroup_ns nevertheless (in cgroup_init()).

>> +     .root_cgrp = &cgrp_dfl_root.cgrp,
>> +};
>> +
>>  /* IDR wrappers which synchronize using cgroup_idr_lock */
>>  static int cgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end,
>>                           gfp_t gfp_mask)
>> diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
>> new file mode 100644
>> index 0000000..c16604f
>> --- /dev/null
>> +++ b/kernel/cgroup_namespace.c
>> @@ -0,0 +1,128 @@
>> +
>> +#include <linux/cgroup.h>
>> +#include <linux/cgroup_namespace.h>
>> +#include <linux/sched.h>
>> +#include <linux/slab.h>
>> +#include <linux/nsproxy.h>
>> +#include <linux/proc_ns.h>
>> +
>> +static struct cgroup_namespace *alloc_cgroup_ns(void)
>> +{
>> +     struct cgroup_namespace *new_ns;
>> +
>> +     new_ns = kmalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
>> +     if (new_ns)
>> +             atomic_set(&new_ns->count, 1);
>> +     return new_ns;
>> +}
>> +
>> +void free_cgroup_ns(struct cgroup_namespace *ns)
>> +{
>> +     cgroup_put(ns->root_cgrp);
>> +     put_user_ns(ns->user_ns);
>
> This is a problem on error patch in copy_cgroup_ns.  The
> alloc_cgroup_ns() doesn't initialize these values, so if
> you should fail in proc_alloc_inum() you'll show up here
> with fandom values in ns->*.
>

I don't see the codepath that leads to calling free_cgroup_ns() with
uninitialized members. We don't call free_cgroup_ns() on the error
path in copy_cgroup_ns().

>> +     proc_free_inum(ns->proc_inum);

BTW, I was missing the actual kfree(ns) here. Added it.

>> +}
>> +EXPORT_SYMBOL(free_cgroup_ns);
>> +
>> +struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
>> +                                     struct user_namespace *user_ns,
>> +                                     struct cgroup_namespace *old_ns)
>> +{
>> +     struct cgroup_namespace *new_ns = NULL;
>> +     struct cgroup *cgrp = NULL;
>> +     int err;
>> +
>> +     BUG_ON(!old_ns);
>> +
>> +     if (!(flags & CLONE_NEWCGROUP))
>> +             return get_cgroup_ns(old_ns);
>> +
>> +     /* Allow only sysadmin to create cgroup namespace. */
>> +     err = -EPERM;
>> +     if (!ns_capable(user_ns, CAP_SYS_ADMIN))
>> +             goto err_out;
>> +
>> +     /* Prevent cgroup changes for this task. */
>> +     threadgroup_lock(current);
>> +
>> +     cgrp = get_task_cgroup(current);
>> +
>> +     /* Creating new CGROUPNS is supported only when unified hierarchy is in
>> +      * use. */
>
> Oh, drat.  Well, I'll take, it, but under protest  :)
>

Actually, I realized that this comment and the check below is bogus.
The 'get_task_cgroup(current)' always only returns the cgroup on the
default hierarchy. And so, the check below is unnecessary.
What this comment should really say is that cgroup namespace only
virtualizes the cgroup path for the default(unified) hierarchy. Its
fine if you have other hierarchies mounted too. Just that for those
hierarchies, full (non-virtualized) cgroup path will be visible in
/proc/self/cgroup. So cgroupns won't help there.

I have updated the comment in the new version of the patch.

>> +     err = -EINVAL;
>> +     if (!cgroup_on_dfl(cgrp))
>> +             goto err_out_unlock;
>> +
>> +     err = -ENOMEM;
>> +     new_ns = alloc_cgroup_ns();
>> +     if (!new_ns)
>> +             goto err_out_unlock;
>> +
>> +     err = proc_alloc_inum(&new_ns->proc_inum);
>> +     if (err)
>> +             goto err_out_unlock;
>> +
>> +     new_ns->user_ns = get_user_ns(user_ns);
>> +     new_ns->root_cgrp = cgrp;
>> +
>> +     threadgroup_unlock(current);
>> +
>> +     return new_ns;
>> +
>> +err_out_unlock:
>> +     threadgroup_unlock(current);
>> +err_out:
>> +     if (cgrp)
>> +             cgroup_put(cgrp);
>> +     kfree(new_ns);
>> +     return ERR_PTR(err);
>> +}
>> +
>> +static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
>> +{
>> +     pr_info("setns not supported for cgroup namespace");
>> +     return -EINVAL;
>> +}
>> +
>> +static void *cgroupns_get(struct task_struct *task)
>> +{
>> +     struct cgroup_namespace *ns = NULL;
>> +     struct nsproxy *nsproxy;
>> +
>> +     rcu_read_lock();
>> +     nsproxy = task->nsproxy;
>> +     if (nsproxy) {
>> +             ns = nsproxy->cgroup_ns;
>> +             get_cgroup_ns(ns);
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     return ns;
>> +}
>> +
>> +static void cgroupns_put(void *ns)
>> +{
>> +     put_cgroup_ns(ns);
>> +}
>> +
>> +static unsigned int cgroupns_inum(void *ns)
>> +{
>> +     struct cgroup_namespace *cgroup_ns = ns;
>> +
>> +     return cgroup_ns->proc_inum;
>> +}
>> +
>> +const struct proc_ns_operations cgroupns_operations = {
>> +     .name           = "cgroup",
>> +     .type           = CLONE_NEWCGROUP,
>> +     .get            = cgroupns_get,
>> +     .put            = cgroupns_put,
>> +     .install        = cgroupns_install,
>> +     .inum           = cgroupns_inum,
>> +};
>> +
>> +static __init int cgroup_namespaces_init(void)
>> +{
>> +     return 0;
>> +}
>> +subsys_initcall(cgroup_namespaces_init);
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 0cf9cdb..cc06851 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -1790,7 +1790,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
>>       if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
>>                               CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
>>                               CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
>> -                             CLONE_NEWUSER|CLONE_NEWPID))
>> +                             CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP))
>>               return -EINVAL;
>>       /*
>>        * Not implemented, but pretend it works if there is nothing to
>> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
>> index ef42d0a..a8b1970 100644
>> --- a/kernel/nsproxy.c
>> +++ b/kernel/nsproxy.c
>> @@ -25,6 +25,7 @@
>>  #include <linux/proc_ns.h>
>>  #include <linux/file.h>
>>  #include <linux/syscalls.h>
>> +#include <linux/cgroup_namespace.h>
>>
>>  static struct kmem_cache *nsproxy_cachep;
>>
>> @@ -39,6 +40,7 @@ struct nsproxy init_nsproxy = {
>>  #ifdef CONFIG_NET
>>       .net_ns                 = &init_net,
>>  #endif
>> +     .cgroup_ns              = &init_cgroup_ns,
>>  };
>>
>>  static inline struct nsproxy *create_nsproxy(void)
>> @@ -92,6 +94,13 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
>>               goto out_pid;
>>       }
>>
>> +     new_nsp->cgroup_ns = copy_cgroup_ns(flags, user_ns,
>> +                                         tsk->nsproxy->cgroup_ns);
>> +     if (IS_ERR(new_nsp->cgroup_ns)) {
>> +             err = PTR_ERR(new_nsp->cgroup_ns);
>> +             goto out_cgroup;
>> +     }
>> +
>>       new_nsp->net_ns = copy_net_ns(flags, user_ns, tsk->nsproxy->net_ns);
>>       if (IS_ERR(new_nsp->net_ns)) {
>>               err = PTR_ERR(new_nsp->net_ns);
>> @@ -101,6 +110,9 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
>>       return new_nsp;
>>
>>  out_net:
>> +     if (new_nsp->cgroup_ns)
>> +             put_cgroup_ns(new_nsp->cgroup_ns);
>> +out_cgroup:
>>       if (new_nsp->pid_ns_for_children)
>>               put_pid_ns(new_nsp->pid_ns_for_children);
>>  out_pid:
>> @@ -128,7 +140,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>>       struct nsproxy *new_ns;
>>
>>       if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
>> -                           CLONE_NEWPID | CLONE_NEWNET)))) {
>> +                           CLONE_NEWPID | CLONE_NEWNET |
>> +                           CLONE_NEWCGROUP)))) {
>>               get_nsproxy(old_ns);
>>               return 0;
>>       }
>> @@ -165,6 +178,8 @@ void free_nsproxy(struct nsproxy *ns)
>>               put_ipc_ns(ns->ipc_ns);
>>       if (ns->pid_ns_for_children)
>>               put_pid_ns(ns->pid_ns_for_children);
>> +     if (ns->cgroup_ns)
>> +             put_cgroup_ns(ns->cgroup_ns);
>>       put_net(ns->net_ns);
>>       kmem_cache_free(nsproxy_cachep, ns);
>>  }
>> @@ -180,7 +195,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
>>       int err = 0;
>>
>>       if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
>> -                            CLONE_NEWNET | CLONE_NEWPID)))
>> +                            CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP)))
>>               return 0;
>>
>>       user_ns = new_cred ? new_cred->user_ns : current_user_ns();
>> --
>> 2.1.0.rc2.206.gedb03e5
>>
>> _______________________________________________
>> Containers mailing list
>> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> https://lists.linuxfoundation.org/mailman/listinfo/containers


Thanks for the review!
-- 
Aditya

^ permalink raw reply

* Re: [PATCH] staging: android: binder: move to the "real" part of the kernel
From: Dan Carpenter @ 2014-10-24  5:00 UTC (permalink / raw)
  To: Rom Lemarchand
  Cc: devel@driverdev.osuosl.org, Anup Patel, Greg Kroah-Hartman, LKML,
	Arve Hjønnevåg, John Stultz, linux-api,
	Rebecca Schultz Zavin, Santosh Shilimkar, Sumit Semwal,
	Christoffer Dall
In-Reply-To: <CAABpnA8Oe9_8r9hyAQADPwNh4Zn63tUQAUD7f0WSYh1xcdtz6g@mail.gmail.com>

On Tue, Oct 21, 2014 at 08:10:32PM -0700, Rom Lemarchand wrote:
> We would also like to make kernel-team@android.com the maintainer of
> the whole android directory.

Mailing lists can't be a maintainer, but you could add it as a private
list in the MAINTAINERS file.

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH 1/4] vfs: Prepare for adding a new preadv/pwritev with user flags.
From: Christoph Hellwig @ 2014-10-24  9:35 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: linux-kernel, Christoph Hellwig, linux-fsdevel, linux-aio,
	Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
	Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk
In-Reply-To: <9306856a3ea74bcc41523430d74ef4be93d8f436.1413923420.git.milosz@adfin.com>

On Tue, Oct 21, 2014 at 04:46:56PM -0400, Milosz Tanski wrote:
> Plumbing the flags argument through the vfs code so they can be passed down to
> __generic_file_(read/write)_iter function that do the acctual work.
> 
> Signed-off-by: Milosz Tanski <milosz@adfin.com>


The last hunk doesn't belong into this patch, both logically and because
the code won't actually compile after applying just this patch.

Otherwise looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


This is the culprit:

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 14b4642..cb7f530 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1465,7 +1465,7 @@ static void shrink_readahead_size_eio(struct file *filp,
>   * of the logic when it comes to error handling etc.
>   */
>  static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
> -		struct iov_iter *iter, ssize_t written)
> +		struct iov_iter *iter, ssize_t written, int flags)
>  {
>  	struct address_space *mapping = filp->f_mapping;
>  	struct inode *inode = mapping->host;

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply

* Re: [PATCH 2/4] vfs: Define new syscalls preadv2,pwritev2
From: Christoph Hellwig @ 2014-10-24  9:36 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: linux-kernel, Christoph Hellwig, linux-fsdevel, linux-aio,
	Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
	Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk
In-Reply-To: <1613303dfaa91e6ff09cdd7860e0316765663ba6.1413923420.git.milosz@adfin.com>

> diff --git a/mm/filemap.c b/mm/filemap.c
> index cb7f530..45964c8 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1735,7 +1735,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  		}
>  	}
>  
> -	retval = do_generic_file_read(file, ppos, iter, retval);
> +	retval = do_generic_file_read(file, ppos, iter, retval, iocb->ki_rwflags);

I think this belongs into the last patch, together with the last hunk
from the first patch.

Otherwise looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply

* Re: [PATCH 3/4] vfs: Export new vector IO syscalls (with flags) to userland
From: Christoph Hellwig @ 2014-10-24  9:38 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-aio-Bw31MaZKKs3YtjvyW6yDsg, Mel Gorman, Volker Lendecke,
	Tejun Heo, Jeff Moyer, Theodore Ts'o, Al Viro,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk
In-Reply-To: <e43b43c1e7b14b454b6425a650cd45cbb75fd325.1413923420.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>

On Tue, Oct 21, 2014 at 04:46:58PM -0400, Milosz Tanski wrote:
> This is only for x86_64 and x86. Will add other arch later.

Looks good, but I think the patch subject should read something like:

x86: wite up preadv2 and pwritev2

in which case you might also get away with an empty description.

Please also Cc linux-arch to get the attention of other architecture
maintainers for the next repost.

Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

^ permalink raw reply

* Re: [PATCH 4/4] vfs: RWF_NONBLOCK flag for preadv2
From: Christoph Hellwig @ 2014-10-24  9:40 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-aio-Bw31MaZKKs3YtjvyW6yDsg, Mel Gorman, Volker Lendecke,
	Tejun Heo, Jeff Moyer, Theodore Ts'o, Al Viro,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk
In-Reply-To: <ba8d704a12c7ceffce610fc438a0af11a3422002.1413923420.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>

On Tue, Oct 21, 2014 at 04:46:59PM -0400, Milosz Tanski wrote:
> Filesystems that generic_file_read_iter will not be allowed to perform
> non-blocking reads. This only will read data if it's in the page cache and if
> there is no page error (causing a re-read).

I can't quite parse the first sentence, can you try to reword it?

> Christoph Hellwig wrote the filesystem specify code (cifs, ofs, shm, xfs).

s/specify/specific/.  It's so trivial that I don't think it matters.

^ permalink raw reply

* Re: [PATCH 0/4] vfs: Non-blockling buffered fs read (page cache only)
From: Christoph Hellwig @ 2014-10-24  9:41 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: linux-kernel, Christoph Hellwig, linux-fsdevel, linux-aio,
	Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
	Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk
In-Reply-To: <cover.1413923420.git.milosz@adfin.com>

Any reason you didn't include my patch to support XFS, and the RWF_DSYNC
one?

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox