Linux userland API discussions
 help / color / mirror / Atom feed
* [PATCH V6] Allow compaction of unevictable pages
From: Eric B Munson @ 2015-03-19 13:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Rik van Riel, Michal Hocko, linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of is is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today, does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavoir with respect to the unevictable lru.  Users that demand no page
faults after a page is present can set compact_unevictable to 0 and
users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable sysctl is 0, I cannot allocate hugepages after
fragmenting memory.  When the value is set to 1, allocations succeed.

Signed-off-by: Eric B Munson <emunson-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org>
Cc: Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>
Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
Cc: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>
Cc: David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: linux-rt-users-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
Changes from V5:
* Default sysctl value is now 1
* Used more descriptive sysctl name
* documentation calss out default value

 Documentation/sysctl/vm.txt |   11 +++++++++++
 include/linux/compaction.h  |    1 +
 kernel/sysctl.c             |    9 +++++++++
 mm/compaction.c             |    7 +++++++
 4 files changed, 28 insertions(+)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 902b457..9832ec5 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -21,6 +21,7 @@ Currently, these files are in /proc/sys/vm:
 - admin_reserve_kbytes
 - block_dump
 - compact_memory
+- compact_unevictable_allowed
 - dirty_background_bytes
 - dirty_background_ratio
 - dirty_bytes
@@ -106,6 +107,16 @@ huge pages although processes will also directly compact memory as required.
 
 ==============================================================
 
+compact_unevictable_allowed
+
+Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
+allowed to examine the unevictable lru (mlocked pages) for pages to compact.
+This should be used on systems where stalls for minor page faults are an
+acceptable trade for large contiguous free memory.  Set to 0 to prevent
+compaction from moving pages that are unevictable.  Default value is 1.
+
+==============================================================
+
 dirty_background_bytes
 
 Contains the amount of dirty memory at which the background kernel
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a014559..aa8f61c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -34,6 +34,7 @@ extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 extern int sysctl_extfrag_threshold;
 extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
+extern int sysctl_compact_unevictable_allowed;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 88ea2d6..2f6c880 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1313,6 +1313,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &min_extfrag_threshold,
 		.extra2		= &max_extfrag_threshold,
 	},
+	{
+		.procname	= "compact_unevictable_allowed",
+		.data		= &sysctl_compact_unevictable_allowed,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 
 #endif /* CONFIG_COMPACTION */
 	{
diff --git a/mm/compaction.c b/mm/compaction.c
index 8c0d945..ad88a8c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1047,6 +1047,12 @@ typedef enum {
 } isolate_migrate_t;
 
 /*
+ * Allow userspace to control policy on scanning the unevictable LRU for
+ * compactable pages.
+ */
+int sysctl_compact_unevictable_allowed __read_mostly = 1;
+
+/*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
  * compact_control.
@@ -1057,6 +1063,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long low_pfn, end_pfn;
 	struct page *page;
 	const isolate_mode_t isolate_mode =
+		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
 		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH v3 10/15] serial: stm32-usart: Add STM32 USART Driver
From: Maxime Coquelin @ 2015-03-19 13:55 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Uwe Kleine-König, Andreas Färber, Geert Uytterhoeven,
	Rob Herring, Philipp Zabel, Linus Walleij, Arnd Bergmann,
	Stefan Agner, Peter Meerwald, Paul Bolle, Jonathan Corbet,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Russell King,
	Daniel Lezcano, Thomas Gleixner, Greg Kroah-Hartman, Jiri Slaby,
	Andrew Morton, David S. Miller
In-Reply-To: <CAHp75Vdf6=z1d5+eYJ+QGTko9kkZf-oPfoWaf8FRoTdqcdWxjw@mail.gmail.com>

2015-03-17 18:56 GMT+01:00 Andy Shevchenko <andy.shevchenko@gmail.com>:
> On Tue, Mar 17, 2015 at 7:32 PM, Maxime Coquelin
> <mcoquelin.stm32@gmail.com> wrote:
>> 2015-03-13 15:19 GMT+01:00 Andy Shevchenko <andy.shevchenko@gmail.com>:
>
>>>> +static void stm32_set_termios(struct uart_port *port, struct ktermios *termios,
>>>> +                           struct ktermios *old)
>>>> +{
>>>> +       unsigned int baud;
>>>> +       u32 usardiv, mantissa, fraction;
>>>> +       tcflag_t cflag;
>>>> +       u32 cr1, cr2, cr3;
>>>> +       unsigned long flags;
>>>> +
>>>> +       baud = uart_get_baud_rate(port, termios, old, 0, port->uartclk / 16);
>>>> +       cflag = termios->c_cflag;
>>>> +
>>>> +       spin_lock_irqsave(&port->lock, flags);
>>>> +
>>>> +       /* Stop serial port and reset value */
>>>> +       writel_relaxed(0, port->membase + USART_CR1);
>>>> +
>>>> +       cr1 = USART_CR1_TE | USART_CR1_RE | USART_CR1_UE | USART_CR1_RXNEIE;
>>>> +
>>>> +       if (cflag & CSTOPB)
>>>> +               cr2 = USART_CR2_STOP_2B;
>>>> +
>>>> +       if (cflag & PARENB) {
>>>> +               cr1 |= USART_CR1_PCE;
>>>> +               if ((cflag & CSIZE) == CS8)
>>>> +                       cr1 |= USART_CR1_M;
>>>> +       }
>>>> +
>>>> +       if (cflag & PARODD)
>>>> +               cr1 |= USART_CR1_PS;
>>>> +
>>>> +       if (cflag & CRTSCTS)
>>>> +               cr3 = USART_CR3_RTSE | USART_CR3_CTSE;
>>>> +
>>>> +       usardiv = (port->uartclk * 25) / (baud * 4);
>>>> +       mantissa = (usardiv / 100) << USART_BRR_DIV_M_SHIFT;
>>>> +       fraction = DIV_ROUND_CLOSEST((usardiv % 100) * 16, 100);
>>>> +       if (fraction & ~USART_BRR_DIV_F_MASK) {
>>>> +               fraction = 0;
>>>> +               mantissa += (1 << USART_BRR_DIV_M_SHIFT);
>>>> +       }
>>>
>>> So, it's a fractional divider. right? Could it be then fractional
>>> divider clock in this first place (see
>>> drivers/clk/clk-fractional-divider.c)?
>>>
>>
>> I'm not sure it makes sense to represent this baudrate divider within
>> the UART IP as a clock.
>
> You have it already. I mean it should be fractional divider clock.
>
> stm32port->clk = devm_clk_get(&pdev->dev, NULL);
> +
> +       if (WARN_ON(IS_ERR(stm32port->clk)))
> +               return -EINVAL;
> +
> +       /* ensure that clk rate is correct by enabling the clk */
> +       clk_prepare_enable(stm32port->clk);
> +       stm32port->port.uartclk = clk_get_rate(stm32port->clk);

No I don't have it already.
The clock I get here is coming from outside the IP, so this driver is
a clock consumer.
If I follow your advice, this driver will become both the clock
provider and the only clock consumer of this clock.
I think this is something that should be avoided.

Also, it would require to add another clock in between (a by-16 fixed
divider), because the formula is:
baud = fck / (16 * usart_div)

Finally, representing this as a clock is not really matching the
reality, because we are generating a baud rate, not a frequency.

Really, I would prefer keeping this fractional divider as it is
implemented today.

Kind regards,
Maxime

>
>> What would be the gain?
>
> Remove custom implementation of the calculations. It will be just one
> line to refresh the baud rate. I think we have a lot of duplicates
> here and there in the kernel. It would be nice not to bring another
> one.
>
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply

* Re: [PATCH 1/1] Add virtio-input driver.
From: David Herrmann @ 2015-03-19 12:29 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: virtio-dev, virtualization, mst, Rusty Russell, open list,
	open list:ABI/API, Dmitry Torokhov
In-Reply-To: <1426756391-26585-2-git-send-email-kraxel@redhat.com>

Hi

(CC: Dmitry)

On Thu, Mar 19, 2015 at 10:13 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> virtio-input is basically evdev-events-over-virtio, so this driver isn't
> much more than reading configuration from config space and forwarding
> incoming events to the linux input layer.
>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/virtio/Kconfig            |  10 ++
>  drivers/virtio/Makefile           |   1 +
>  drivers/virtio/virtio_input.c     | 313 ++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_input.h |  65 ++++++++
>  5 files changed, 390 insertions(+)
>  create mode 100644 drivers/virtio/virtio_input.c
>  create mode 100644 include/uapi/linux/virtio_input.h
>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index b546da5..cab9f3f 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -48,6 +48,16 @@ config VIRTIO_BALLOON
>
>          If unsure, say M.
>
> +config VIRTIO_INPUT
> +       tristate "Virtio input driver"
> +       depends on VIRTIO
> +       depends on INPUT
> +       ---help---
> +        This driver supports virtio input devices such as
> +        keyboards, mice and tablets.
> +
> +        If unsure, say M.
> +
>   config VIRTIO_MMIO
>         tristate "Platform bus driver for memory mapped virtio devices"
>         depends on HAS_IOMEM
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index d85565b..41e30e3 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
> new file mode 100644
> index 0000000..2d425cf
> --- /dev/null
> +++ b/drivers/virtio/virtio_input.c
> @@ -0,0 +1,313 @@
> +#include <linux/module.h>
> +#include <linux/virtio.h>
> +#include <linux/input.h>
> +
> +#include <uapi/linux/virtio_ids.h>
> +#include <uapi/linux/virtio_input.h>
> +
> +struct virtio_input {
> +       struct virtio_device       *vdev;
> +       struct input_dev           *idev;
> +       char                       name[64];
> +       char                       serial[64];
> +       char                       phys[64];
> +       struct virtqueue           *evt, *sts;
> +       struct virtio_input_event  evts[64];
> +};
> +
> +static ssize_t serial_show(struct device *dev,
> +                          struct device_attribute *attr, char *buf)
> +{
> +       struct input_dev *idev = to_input_dev(dev);
> +       struct virtio_input *vi = input_get_drvdata(idev);
> +       return sprintf(buf, "%s\n", vi->serial);
> +}
> +static DEVICE_ATTR_RO(serial);
> +
> +static struct attribute *dev_attrs[] = {
> +       &dev_attr_serial.attr,
> +       NULL
> +};
> +
> +static umode_t dev_attrs_are_visible(struct kobject *kobj,
> +                                    struct attribute *a, int n)
> +{
> +       struct device *dev = container_of(kobj, struct device, kobj);
> +       struct input_dev *idev = to_input_dev(dev);
> +       struct virtio_input *vi = input_get_drvdata(idev);
> +
> +       if (a == &dev_attr_serial.attr && !strlen(vi->serial))
> +               return 0;
> +
> +       return a->mode;
> +}
> +
> +static struct attribute_group dev_attr_grp = {
> +       .attrs =        dev_attrs,
> +       .is_visible =   dev_attrs_are_visible,
> +};
> +
> +static const struct attribute_group *dev_attr_groups[] = {
> +       &dev_attr_grp,
> +       NULL
> +};
> +
> +static void virtinput_queue_evtbuf(struct virtio_input *vi,
> +                                  struct virtio_input_event *evtbuf)
> +{
> +       struct scatterlist sg[1];
> +
> +       sg_init_one(sg, evtbuf, sizeof(*evtbuf));
> +       virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
> +}
> +
> +static void virtinput_recv_events(struct virtqueue *vq)
> +{
> +       struct virtio_input *vi = vq->vdev->priv;
> +       struct virtio_input_event *event;
> +       unsigned int len;
> +
> +       while ((event = virtqueue_get_buf(vi->evt, &len)) != NULL) {
> +               input_event(vi->idev,
> +                           le16_to_cpu(event->type),
> +                           le16_to_cpu(event->code),
> +                           le32_to_cpu(event->value));
> +               virtinput_queue_evtbuf(vi, event);
> +       }
> +       virtqueue_kick(vq);
> +}
> +
> +static int virtinput_send_status(struct virtio_input *vi,
> +                                u16 type, u16 code, s32 value)
> +{
> +       struct virtio_input_event *stsbuf;
> +       struct scatterlist sg[1];
> +
> +       stsbuf = kzalloc(sizeof(*stsbuf), GFP_ATOMIC);
> +       if (!stsbuf)
> +               return -ENOMEM;
> +
> +       stsbuf->type  = cpu_to_le16(type);
> +       stsbuf->code  = cpu_to_le16(code);
> +       stsbuf->value = cpu_to_le32(value);
> +       sg_init_one(sg, stsbuf, sizeof(*stsbuf));
> +       virtqueue_add_outbuf(vi->sts, sg, 1, stsbuf, GFP_ATOMIC);
> +       virtqueue_kick(vi->sts);

GFP_ATOMIC, eww. But everyone does that for input_event() callbacks..
we should fix that for user-space input one day.

> +       return 0;
> +}
> +
> +static void virtinput_recv_status(struct virtqueue *vq)
> +{
> +       struct virtio_input *vi = vq->vdev->priv;
> +       struct virtio_input_event *stsbuf;
> +       unsigned int len;
> +
> +       while ((stsbuf = virtqueue_get_buf(vi->sts, &len)) != NULL)
> +               kfree(stsbuf);
> +       virtqueue_kick(vq);
> +}
> +
> +static int virtinput_status(struct input_dev *idev, unsigned int type,
> +                           unsigned int code, int value)
> +{
> +       struct virtio_input *vi = input_get_drvdata(idev);
> +
> +       return virtinput_send_status(vi, type, code, value);
> +}
> +
> +static size_t virtinput_cfg_select(struct virtio_input *vi,
> +                                  u8 select, u8 subsel)
> +{
> +       u8 size;
> +
> +       virtio_cwrite(vi->vdev, struct virtio_input_config, select, &select);
> +       virtio_cwrite(vi->vdev, struct virtio_input_config, subsel, &subsel);
> +       virtio_cread(vi->vdev, struct virtio_input_config, size, &size);
> +       return size;
> +}
> +
> +static void virtinput_cfg_bits(struct virtio_input *vi, int select, int subsel,
> +                              unsigned long *bits, unsigned int bitcount)
> +{
> +       unsigned int bit;
> +       size_t bytes;
> +       u8 cfg = 0;
> +
> +       bytes = virtinput_cfg_select(vi, select, subsel);
> +       if (!bytes)
> +               return;
> +       if (bitcount > bytes*8)
> +               bitcount = bytes*8;
> +       if (select == VIRTIO_INPUT_CFG_EV_BITS)
> +               set_bit(subsel, vi->idev->evbit);
> +       for (bit = 0; bit < bitcount; bit++) {
> +               if ((bit % 8) == 0)
> +                       virtio_cread(vi->vdev, struct virtio_input_config,
> +                                    u.bitmap[bit/8], &cfg);
> +               if (cfg & (1 << (bit % 8)))
> +                       set_bit(bit, bits);
> +       }
> +}
> +
> +static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
> +{
> +       u32 size, mi, ma, fu, fl;
> +
> +       size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
> +       virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
> +       virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
> +       virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
> +       virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);

abs.resolution is missing. Please add it, we really also need to add
it to uinput one day.

> +       input_set_abs_params(vi->idev, abs,
> +                            le32_to_cpu(mi), le32_to_cpu(ma),
> +                            le32_to_cpu(fu), le32_to_cpu(fl));
> +}
> +
> +static int virtinput_init_vqs(struct virtio_input *vi)
> +{
> +       struct virtqueue *vqs[2];
> +       vq_callback_t *cbs[] = { virtinput_recv_events,
> +                                virtinput_recv_status };
> +       const char *names[] = { "events", "status" };
> +       int i, err, size;
> +
> +       err = vi->vdev->config->find_vqs(vi->vdev, 2, vqs, cbs, names);
> +       if (err)
> +               return err;
> +       vi->evt = vqs[0];
> +       vi->sts = vqs[1];
> +
> +       size = virtqueue_get_vring_size(vi->evt);
> +       if (size > ARRAY_SIZE(vi->evts))
> +               size = ARRAY_SIZE(vi->evts);
> +       for (i = 0; i < size; i++)
> +               virtinput_queue_evtbuf(vi, &vi->evts[i]);
> +
> +       return 0;
> +}
> +
> +static int virtinput_probe(struct virtio_device *vdev)
> +{
> +       struct virtio_input *vi;
> +       size_t size;
> +       int abs, err;
> +
> +       vi = kzalloc(sizeof(*vi), GFP_KERNEL);
> +       if (!vi) {
> +               err = -ENOMEM;
> +               goto out1;
> +       }
> +       vdev->priv = vi;
> +       vi->vdev = vdev;
> +
> +       err = virtinput_init_vqs(vi);
> +       if (err)
> +               goto out2;
> +
> +       vi->idev = input_allocate_device();
> +       if (!vi->idev) {
> +               err = -ENOMEM;
> +               goto out3;
> +       }
> +       input_set_drvdata(vi->idev, vi);
> +
> +       size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_NAME, 0);
> +       virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
> +                          vi->name, min(size, sizeof(vi->name)));
> +       size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_SERIAL, 0);
> +       virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
> +                          vi->serial, min(size, sizeof(vi->serial)));
> +       snprintf(vi->phys, sizeof(vi->phys),
> +                "virtio%d/input0", vdev->index);
> +
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_PROP_BITS, 0,
> +                          vi->idev->propbit, INPUT_PROP_CNT);
> +       size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REP);
> +       if (size)
> +               set_bit(EV_REP, vi->idev->evbit);
> +
> +       vi->idev->name = vi->name;
> +       vi->idev->phys = vi->phys;

Can you set vi->idev->uniq to the virtio-bus path?

> +       vi->idev->id.bustype = BUS_VIRTUAL;
> +       vi->idev->id.vendor  = 0x0001;
> +       vi->idev->id.product = 0x0001;
> +       vi->idev->id.version = 0x0100;

Please don't hardcode those. All user-space based interaction with
input-devices relies on those IDs. Can we retrieve it from the host
just like the name?

> +       vi->idev->dev.parent = &vdev->dev;
> +       vi->idev->dev.groups = dev_attr_groups;
> +       vi->idev->event = virtinput_status;
> +
> +       /* device -> kernel */
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_KEY,
> +                          vi->idev->keybit, KEY_CNT);
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REL,
> +                          vi->idev->relbit, REL_CNT);
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_ABS,
> +                          vi->idev->absbit, ABS_CNT);
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_MSC,
> +                          vi->idev->mscbit, MSC_CNT);
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SW,
> +                          vi->idev->swbit,  SW_CNT);
> +
> +       /* kernel -> device */
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_LED,
> +                          vi->idev->ledbit, LED_CNT);
> +       virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SND,
> +                          vi->idev->sndbit, SND_CNT);
> +
> +       if (test_bit(EV_ABS, vi->idev->evbit)) {
> +               for (abs = 0; abs < ABS_CNT; abs++) {
> +                       if (!test_bit(abs, vi->idev->absbit))
> +                               continue;
> +                       virtinput_cfg_abs(vi, abs);
> +               }
> +       }
> +
> +       err = input_register_device(vi->idev);
> +       if (err)
> +               goto out4;
> +
> +       return 0;
> +
> +out4:
> +       input_free_device(vi->idev);
> +out3:
> +       vdev->config->del_vqs(vdev);
> +out2:
> +       kfree(vi);
> +out1:
> +       return err;
> +}
> +
> +static void virtinput_remove(struct virtio_device *vdev)
> +{
> +       struct virtio_input *vi = vdev->priv;
> +
> +       input_unregister_device(vi->idev);
> +       vdev->config->reset(vdev);
> +       vdev->config->del_vqs(vdev);
> +       kfree(vi);
> +}
> +
> +static unsigned int features[] = {
> +};
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_INPUT, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> +static struct virtio_driver virtio_input_driver = {
> +       .driver.name         = KBUILD_MODNAME,
> +       .driver.owner        = THIS_MODULE,
> +       .feature_table       = features,
> +       .feature_table_size  = ARRAY_SIZE(features),
> +       .id_table            = id_table,
> +       .probe               = virtinput_probe,
> +       .remove              = virtinput_remove,
> +};
> +
> +module_virtio_driver(virtio_input_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Virtio input device driver");
> +MODULE_AUTHOR("Gerd Hoffmann <kraxel@redhat.com>");
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 284fc3a..5f60aa4 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -39,5 +39,6 @@
>  #define VIRTIO_ID_9P           9 /* 9p virtio console */
>  #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
>  #define VIRTIO_ID_CAIF        12 /* Virtio caif */
> +#define VIRTIO_ID_INPUT        18 /* virtio input */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_input.h b/include/uapi/linux/virtio_input.h
> new file mode 100644
> index 0000000..190d04a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_input.h
> @@ -0,0 +1,65 @@
> +#ifndef _LINUX_VIRTIO_INPUT_H
> +#define _LINUX_VIRTIO_INPUT_H
> +/* This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE. */
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +
> +enum virtio_input_config_select {
> +       VIRTIO_INPUT_CFG_UNSET      = 0x00,
> +       VIRTIO_INPUT_CFG_ID_NAME    = 0x01,
> +       VIRTIO_INPUT_CFG_ID_SERIAL  = 0x02,
> +       VIRTIO_INPUT_CFG_PROP_BITS  = 0x10,
> +       VIRTIO_INPUT_CFG_EV_BITS    = 0x11,
> +       VIRTIO_INPUT_CFG_ABS_INFO   = 0x12,
> +};
> +
> +struct virtio_input_absinfo {
> +       __le32  min;
> +       __le32  max;
> +       __le32  fuzz;
> +       __le32  flat;
> +};
> +
> +struct virtio_input_config {
> +       __u8    select;
> +       __u8    subsel;
> +       __u8    size;
> +       __u8    reserved;
> +       union {
> +               char string[128];
> +               __u8 bitmap[128];
> +               struct virtio_input_absinfo abs;
> +       } u;
> +};
> +
> +struct virtio_input_event {
> +       __le16 type;
> +       __le16 code;
> +       __le32 value;
> +};
> +
> +#endif /* _LINUX_VIRTIO_INPUT_H */
> --

The input-parts look good to me (apart from my comments). No idea how
virtio exactly works, so I'll leave that to Rusty.

I put Dmitry on CC, he might have some more valuable input on the input-parts.

Thanks
David

^ permalink raw reply

* Re: [PATCH 1/1] Add virtio-input driver.
From: Michael S. Tsirkin @ 2015-03-19 12:27 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Rusty Russell, open list, open list:ABI/API
In-Reply-To: <1426756391-26585-2-git-send-email-kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Mar 19, 2015 at 10:13:11AM +0100, Gerd Hoffmann wrote:
> virtio-input is basically evdev-events-over-virtio, so this driver isn't
> much more than reading configuration from config space and forwarding
> incoming events to the linux input layer.
> 
> Signed-off-by: Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

What worries me is how well are these events specified.
Will we be able to write drivers for non-linux guests?
Not sure where to discuss this - this seems like an inappropriate
thread.

Comments on linux driver below.

> ---
>  drivers/virtio/Kconfig            |  10 ++
>  drivers/virtio/Makefile           |   1 +
>  drivers/virtio/virtio_input.c     | 313 ++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_input.h |  65 ++++++++
>  5 files changed, 390 insertions(+)
>  create mode 100644 drivers/virtio/virtio_input.c
>  create mode 100644 include/uapi/linux/virtio_input.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index b546da5..cab9f3f 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -48,6 +48,16 @@ config VIRTIO_BALLOON
>  
>  	 If unsure, say M.
>  
> +config VIRTIO_INPUT
> +	tristate "Virtio input driver"
> +	depends on VIRTIO
> +	depends on INPUT
> +	---help---
> +	 This driver supports virtio input devices such as
> +	 keyboards, mice and tablets.
> +
> +	 If unsure, say M.
> +
>   config VIRTIO_MMIO
>  	tristate "Platform bus driver for memory mapped virtio devices"
>  	depends on HAS_IOMEM
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index d85565b..41e30e3 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
> new file mode 100644
> index 0000000..2d425cf
> --- /dev/null
> +++ b/drivers/virtio/virtio_input.c
> @@ -0,0 +1,313 @@
> +#include <linux/module.h>
> +#include <linux/virtio.h>
> +#include <linux/input.h>
> +
> +#include <uapi/linux/virtio_ids.h>
> +#include <uapi/linux/virtio_input.h>
> +
> +struct virtio_input {
> +	struct virtio_device       *vdev;
> +	struct input_dev           *idev;
> +	char                       name[64];
> +	char                       serial[64];
> +	char                       phys[64];
> +	struct virtqueue           *evt, *sts;
> +	struct virtio_input_event  evts[64];
> +};
> +
> +static ssize_t serial_show(struct device *dev,
> +			   struct device_attribute *attr, char *buf)
> +{
> +	struct input_dev *idev = to_input_dev(dev);
> +	struct virtio_input *vi = input_get_drvdata(idev);
> +	return sprintf(buf, "%s\n", vi->serial);
> +}
> +static DEVICE_ATTR_RO(serial);
> +
> +static struct attribute *dev_attrs[] = {
> +	&dev_attr_serial.attr,
> +	NULL
> +};
> +
> +static umode_t dev_attrs_are_visible(struct kobject *kobj,
> +				     struct attribute *a, int n)
> +{
> +	struct device *dev = container_of(kobj, struct device, kobj);
> +	struct input_dev *idev = to_input_dev(dev);
> +	struct virtio_input *vi = input_get_drvdata(idev);
> +
> +	if (a == &dev_attr_serial.attr && !strlen(vi->serial))
> +		return 0;
> +
> +	return a->mode;
> +}
> +
> +static struct attribute_group dev_attr_grp = {
> +	.attrs =	dev_attrs,
> +	.is_visible =	dev_attrs_are_visible,
> +};
> +
> +static const struct attribute_group *dev_attr_groups[] = {
> +	&dev_attr_grp,
> +	NULL
> +};
> +
> +static void virtinput_queue_evtbuf(struct virtio_input *vi,
> +				   struct virtio_input_event *evtbuf)
> +{
> +	struct scatterlist sg[1];
> +
> +	sg_init_one(sg, evtbuf, sizeof(*evtbuf));
> +	virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
> +}
> +
> +static void virtinput_recv_events(struct virtqueue *vq)
> +{
> +	struct virtio_input *vi = vq->vdev->priv;
> +	struct virtio_input_event *event;
> +	unsigned int len;
> +
> +	while ((event = virtqueue_get_buf(vi->evt, &len)) != NULL) {
> +		input_event(vi->idev,
> +			    le16_to_cpu(event->type),
> +			    le16_to_cpu(event->code),
> +			    le32_to_cpu(event->value));
> +		virtinput_queue_evtbuf(vi, event);
> +	}
> +	virtqueue_kick(vq);
> +}
> +
> +static int virtinput_send_status(struct virtio_input *vi,
> +				 u16 type, u16 code, s32 value)
> +{
> +	struct virtio_input_event *stsbuf;
> +	struct scatterlist sg[1];
> +
> +	stsbuf = kzalloc(sizeof(*stsbuf), GFP_ATOMIC);
> +	if (!stsbuf)
> +		return -ENOMEM;

Does this return an error to userspace?
If so it's not a good idea I think, GFP_ATOMIC failures are
transient conditions and should not be reported
to userspace.
Can use GFP_KERNEL here?

> +
> +	stsbuf->type  = cpu_to_le16(type);
> +	stsbuf->code  = cpu_to_le16(code);
> +	stsbuf->value = cpu_to_le32(value);
> +	sg_init_one(sg, stsbuf, sizeof(*stsbuf));
> +	virtqueue_add_outbuf(vi->sts, sg, 1, stsbuf, GFP_ATOMIC);

This can fail if queue is full. What prevents this from happening?

> +	virtqueue_kick(vi->sts);

Also what prevents multiple virtinput_send_status calls
from racing with each other? Is there locking at a higher level?

> +	return 0;
> +}
> +
> +static void virtinput_recv_status(struct virtqueue *vq)
> +{
> +	struct virtio_input *vi = vq->vdev->priv;
> +	struct virtio_input_event *stsbuf;
> +	unsigned int len;
> +
> +	while ((stsbuf = virtqueue_get_buf(vi->sts, &len)) != NULL)

looks like this get_buf can race with add above.
Need some locking.

Also pls avoid != NULL, removing it makes code less verbose.

> +		kfree(stsbuf);
> +	virtqueue_kick(vq);

Why are you kicking here?

> +}
> +
> +static int virtinput_status(struct input_dev *idev, unsigned int type,
> +			    unsigned int code, int value)
> +{
> +	struct virtio_input *vi = input_get_drvdata(idev);
> +
> +	return virtinput_send_status(vi, type, code, value);
> +}
> +
> +static size_t virtinput_cfg_select(struct virtio_input *vi,
> +				   u8 select, u8 subsel)
> +{
> +	u8 size;
> +
> +	virtio_cwrite(vi->vdev, struct virtio_input_config, select, &select);
> +	virtio_cwrite(vi->vdev, struct virtio_input_config, subsel, &subsel);
> +	virtio_cread(vi->vdev, struct virtio_input_config, size, &size);
> +	return size;
> +}
> +
> +static void virtinput_cfg_bits(struct virtio_input *vi, int select, int subsel,
> +			       unsigned long *bits, unsigned int bitcount)
> +{
> +	unsigned int bit;
> +	size_t bytes;
> +	u8 cfg = 0;
> +
> +	bytes = virtinput_cfg_select(vi, select, subsel);
> +	if (!bytes)
> +		return;
> +	if (bitcount > bytes*8)
> +		bitcount = bytes*8;
> +	if (select == VIRTIO_INPUT_CFG_EV_BITS)
> +		set_bit(subsel, vi->idev->evbit);
> +	for (bit = 0; bit < bitcount; bit++) {
> +		if ((bit % 8) == 0)
> +			virtio_cread(vi->vdev, struct virtio_input_config,
> +				     u.bitmap[bit/8], &cfg);

coding style violations above. you need spaces around ops like / and *.
Please run checkpatch.pl

> +		if (cfg & (1 << (bit % 8)))
> +			set_bit(bit, bits);

what if not set? does something clear the mask?


> +	}

doesn't above just implement bitmap_copy or bitmap_or?  Will it hurt to
just do virtio_cread_bytes into a temporary buffer and then invoke
bitmap ops? Looks like the buffer is at most 256 bytes:
too large to be on stack, but you can allocate it at probe time.


> +}
> +
> +static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
> +{
> +	u32 size, mi, ma, fu, fl;
> +
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);

you read le field into u32 value.
Please run sparse on this code. you will get a ton
of warnings. Same error appears elsewhere.

> +	input_set_abs_params(vi->idev, abs,
> +			     le32_to_cpu(mi), le32_to_cpu(ma),
> +			     le32_to_cpu(fu), le32_to_cpu(fl));
> +}
> +
> +static int virtinput_init_vqs(struct virtio_input *vi)
> +{
> +	struct virtqueue *vqs[2];
> +	vq_callback_t *cbs[] = { virtinput_recv_events,
> +				 virtinput_recv_status };
> +	const char *names[] = { "events", "status" };
> +	int i, err, size;
> +
> +	err = vi->vdev->config->find_vqs(vi->vdev, 2, vqs, cbs, names);
> +	if (err)
> +		return err;
> +	vi->evt = vqs[0];
> +	vi->sts = vqs[1];
> +
> +	size = virtqueue_get_vring_size(vi->evt);
> +	if (size > ARRAY_SIZE(vi->evts))
> +		size = ARRAY_SIZE(vi->evts);
> +	for (i = 0; i < size; i++)
> +		virtinput_queue_evtbuf(vi, &vi->evts[i]);
> +

you add a bunch of bufs but never kick apparently.
you really must kick otherwise device might not
see the buffers.

> +	return 0;
> +}
> +
> +static int virtinput_probe(struct virtio_device *vdev)
> +{
> +	struct virtio_input *vi;
> +	size_t size;
> +	int abs, err;

How about checking VERSION_1 and bailing out of not there?

> +
> +	vi = kzalloc(sizeof(*vi), GFP_KERNEL);
> +	if (!vi) {
> +		err = -ENOMEM;
> +		goto out1;
> +	}
> +	vdev->priv = vi;
> +	vi->vdev = vdev;
> +
> +	err = virtinput_init_vqs(vi);
> +	if (err)
> +		goto out2;
> +
> +	vi->idev = input_allocate_device();
> +	if (!vi->idev) {
> +		err = -ENOMEM;
> +		goto out3;
> +	}
> +	input_set_drvdata(vi->idev, vi);
> +
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_NAME, 0);
> +	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
> +			   vi->name, min(size, sizeof(vi->name)));
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_SERIAL, 0);
> +	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
> +			   vi->serial, min(size, sizeof(vi->serial)));
> +	snprintf(vi->phys, sizeof(vi->phys),
> +		 "virtio%d/input0", vdev->index);
> +
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_PROP_BITS, 0,
> +			   vi->idev->propbit, INPUT_PROP_CNT);
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REP);
> +	if (size)
> +		set_bit(EV_REP, vi->idev->evbit);
> +
> +	vi->idev->name = vi->name;
> +	vi->idev->phys = vi->phys;
> +	vi->idev->id.bustype = BUS_VIRTUAL;
> +	vi->idev->id.vendor  = 0x0001;
> +	vi->idev->id.product = 0x0001;
> +	vi->idev->id.version = 0x0100;

Add comments explaining why these #s make sense?


> +	vi->idev->dev.parent = &vdev->dev;
> +	vi->idev->dev.groups = dev_attr_groups;
> +	vi->idev->event = virtinput_status;
> +
> +	/* device -> kernel */
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_KEY,
> +			   vi->idev->keybit, KEY_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REL,
> +			   vi->idev->relbit, REL_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_ABS,
> +			   vi->idev->absbit, ABS_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_MSC,
> +			   vi->idev->mscbit, MSC_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SW,
> +			   vi->idev->swbit,  SW_CNT);
> +
> +	/* kernel -> device */
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_LED,
> +			   vi->idev->ledbit, LED_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SND,
> +			   vi->idev->sndbit, SND_CNT);
> +
> +	if (test_bit(EV_ABS, vi->idev->evbit)) {
> +		for (abs = 0; abs < ABS_CNT; abs++) {
> +			if (!test_bit(abs, vi->idev->absbit))
> +				continue;
> +			virtinput_cfg_abs(vi, abs);
> +		}
> +	}
> +
> +	err = input_register_device(vi->idev);

Once you do this, virtinput_status can get called,
and that will kick, correct?
If so, you must call device_ready before this.

> +	if (err)
> +		goto out4;
> +
> +	return 0;
> +
> +out4:
> +	input_free_device(vi->idev);
> +out3:
> +	vdev->config->del_vqs(vdev);
> +out2:
> +	kfree(vi);
> +out1:
> +	return err;

free on error is out of order with initialization.
Might lead to leaks or other bugs.
Also - can you name labels something sensible pls?
out is usually for exiting on success too...
E.g. out4 -> err_register etc.


> +}
> +
> +static void virtinput_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_input *vi = vdev->priv;
> +
> +	input_unregister_device(vi->idev);
> +	vdev->config->reset(vdev);

You don't really need a reset if you just to del_vqs.
People do this if they want to prevent interrupts
without deleting vqs.

> +	vdev->config->del_vqs(vdev);
> +	kfree(vi);

free_device seems to be missing?

> +}
> +
> +static unsigned int features[] = {
> +};
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_INPUT, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> +static struct virtio_driver virtio_input_driver = {
> +	.driver.name         = KBUILD_MODNAME,
> +	.driver.owner        = THIS_MODULE,
> +	.feature_table       = features,
> +	.feature_table_size  = ARRAY_SIZE(features),
> +	.id_table            = id_table,
> +	.probe               = virtinput_probe,
> +	.remove              = virtinput_remove,
> +};
> +
> +module_virtio_driver(virtio_input_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Virtio input device driver");
> +MODULE_AUTHOR("Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>");
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 284fc3a..5f60aa4 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -39,5 +39,6 @@
>  #define VIRTIO_ID_9P		9 /* 9p virtio console */
>  #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
>  #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
> +#define VIRTIO_ID_INPUT        18 /* virtio input */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_input.h b/include/uapi/linux/virtio_input.h
> new file mode 100644
> index 0000000..190d04a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_input.h
> @@ -0,0 +1,65 @@
> +#ifndef _LINUX_VIRTIO_INPUT_H
> +#define _LINUX_VIRTIO_INPUT_H
> +/* This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE. */
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +
> +enum virtio_input_config_select {
> +	VIRTIO_INPUT_CFG_UNSET      = 0x00,
> +	VIRTIO_INPUT_CFG_ID_NAME    = 0x01,
> +	VIRTIO_INPUT_CFG_ID_SERIAL  = 0x02,
> +	VIRTIO_INPUT_CFG_PROP_BITS  = 0x10,
> +	VIRTIO_INPUT_CFG_EV_BITS    = 0x11,
> +	VIRTIO_INPUT_CFG_ABS_INFO   = 0x12,
> +};
> +
> +struct virtio_input_absinfo {
> +	__le32  min;
> +	__le32  max;
> +	__le32  fuzz;
> +	__le32  flat;
> +};
> +
> +struct virtio_input_config {
> +	__u8    select;
> +	__u8    subsel;
> +	__u8    size;
> +	__u8    reserved;
> +	union {
> +		char string[128];
> +		__u8 bitmap[128];
> +		struct virtio_input_absinfo abs;
> +	} u;
> +};
> +
> +struct virtio_input_event {
> +	__le16 type;
> +	__le16 code;
> +	__le32 value;
> +};
> +
> +#endif /* _LINUX_VIRTIO_INPUT_H */
> -- 
> 1.8.3.1

^ permalink raw reply

* Re: [PATCH v4 00/14] Add kdbus implementation
From: David Herrmann @ 2015-03-19 11:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Daniel Mack,
	Djalal Harouni
In-Reply-To: <CALCETrVWbz7YudNQXQD_2PjC1HR0P0cB_1ea8NiYoQPDfQxERg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi

On Wed, Mar 18, 2015 at 7:24 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Wed, Mar 18, 2015 at 6:54 AM, David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
[...]
>> This program sends unicast messages on kdbus and UDS, exactly the same
>> number of times with the same 8kb payload. No parsing, no marshaling
>> is done, just plain message passing. The interesting spikes are
>> sys_read(), sys_write() and the 3 kdbus sys_ioctl(). Everything else
>> should be ignored.
>>
>> sys_read() and sys_ioctl(KDBUS_CMD_RECV) are about the same. But note
>> that we don't copy any payload in RECV, so it scales O(1) compared to
>> message-size.
>>
>> sys_write() is about 3x faster than sys_ioctl(KDBUS_CMD_WRITE).
>
> Is that factor of 3 for 8 kb payloads?  If so, I expect it's a factor
> of much worse than 3 for small payloads.

Yes, factor of 3x for 8kb payloads. ~3.8x for 64byte payloads (see the
second flamegraph I linked, which contains data for 64byte payloads
(which is basically nothing)).

>>>  - The time it takes to connect
>>
>> No idea, never measured it. Why is it of interest?
>
> Gah, sorry, bad terminology.  I mean the time it takes to send a
> message to a receiver that you haven't sent to before.

Cold message transactions are horribly slow for both, kdbus and UDS,
and the performance varies heavily (factor of 10x). I haven't figured
out whether it's icache/dcache misses, cold branch predictor, process
mem faults, scheduler, whatever..

What I can say, is the kdbus paths are more expensive, in both LOC and
execution time. We might be able to improve the cold-transaction
performance with _unlikely_() annotations, shortcuts, etc. But I want
much more benchmark data before I try to outsmart the compiler. It
works reasonably well right now.

Note that from a algorithmic view, there's no difference between the
first transaction and a following transaction. All relevant accesses
are O(1).

(Actually looking at the numbers again, worst-case vs. average-case in
message transaction is exactly 10x for both, UDS and kdbus. Skipping
the first couple, I get <2x. std-dev is roughly 2%)

> (The kdbus terminology is weird.  You don't send to "endpoints", you
> don't "connect" to other participants, and it's not even clear to me
> what a participant in the bus is called.)

A participant is called a "connection" or "peer" (I prefer 'peer'). It
"connects" to a bus via an endpoint of the bus. Endpoints are
file-system entries and can be shared, and usually are.
Unlike binder, kdbus does not know peer-to-peer links. That is, there
is never (not even a temporary) link between only two peers. Messages
are always sent to the bus, and the bus makes sure only the addressed
recipients will get the message.

>>
>>> I'm also interested in whether the current design is actually amenable
>>> to good performance.  I'm told that it is, but ISTM there's a lot of
>>> heavyweight stuff going on with each send operation that will be hard
>>> to remove.
>>
>> I disagree. What heavyweight stuff is going on?
>
> At least metadata generation, metadata free, and policy db checks seem
> expensive.  It could be worth running a bunch of copies of your
> benchmark on different cores and seeing how it scales.

metadata handling is local to the connection that sends the message.
It does not affect the overall performance of other bus operations in
parallel. Furthermore, it's way faster than collecting the "same" data
via /proc, so I don't think it slows down the overall transaction at
all. If a receiver doesn't want metadata, it should not request it (by
setting the receiver-metadata-mask). If a sender doesn't like the
overhead, it should not send the metadata (by setting the
sender-metadata-mask). Only if both peers set the metadata mask, it
will be transmitted.

The policy-db does indeed look like a bottleneck. Until v2 we used to
have a policy-cache, which I ripped out as it didn't meet our
expectations. There are plans to rewrite it, but it's low-priority.
Thing is, policy-setup is bus-privileged. As long as it's done in a
sane manner (keeping the entries per name minimal), the hash-table
based name-lookup gives suitable performance. Only if the number of
entries per name rises, it gets problematic due to O(n)
list-traversal. But even that could be optimized without a policy
cache, by merging matching entries (see kdbus_policy_db_entry_access).
Furthermore, the policy-db is skipped for privileged peers or if both,
sender and recipient, trust each other (eg., have the same
endpoint+uid). Thus, if you have a trusted transaction, the policy-db
is skipped, anyway.

One real bottleneck I see is the name-registry rwlock.
Acquiring/releasing names is still a heavy operation, as it blocks the
whole bus due to acquiring the write-lock. Again, I have plans to
optimize it (srcu would be an idea, syncing on name-acquire/release),
but it's an implementation detail.

Thanks
David

^ permalink raw reply

* Re: [PATCH 3/3] mm: idle memory tracking
From: Cyrill Gorcunov @ 2015-03-19 10:45 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Jonathan Corbet, linux-api, linux-doc, linux-mm, linux-kernel
In-Reply-To: <20150319104103.GA12162@esperanza>

On Thu, Mar 19, 2015 at 01:41:03PM +0300, Vladimir Davydov wrote:
> > 
> > Vladimir, might we need get_online_mems/put_online_mems here,
> > or if node gets offline this wont be a problem? (Asking
> > because i don't know).
> 
> I only need to dereference page structs corresponding to the node here,
> and page structs are not freed when the node gets offline AFAICS, so I
> guess it must be safe.

OK, thanks for info!

^ permalink raw reply

* Re: [PATCH 3/3] mm: idle memory tracking
From: Vladimir Davydov @ 2015-03-19 10:41 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Jonathan Corbet, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150319101205.GC27066@moon>

On Thu, Mar 19, 2015 at 01:12:05PM +0300, Cyrill Gorcunov wrote:
> On Wed, Mar 18, 2015 at 11:44:36PM +0300, Vladimir Davydov wrote:
> > +static void set_mem_idle(void)
> > +{
> > +	int nid;
> > +
> > +	for_each_online_node(nid)
> > +		set_mem_idle_node(nid);
> > +}
> 
> Vladimir, might we need get_online_mems/put_online_mems here,
> or if node gets offline this wont be a problem? (Asking
> because i don't know).

I only need to dereference page structs corresponding to the node here,
and page structs are not freed when the node gets offline AFAICS, so I
guess it must be safe.

Thanks,
Vladimir

^ permalink raw reply

* Re: [PATCH v2] add support for Freescale's MMA8653FC 10 bit accelerometer
From: Bastien Nocera @ 2015-03-19 10:22 UTC (permalink / raw)
  To: Martin Kepplinger
  Cc: Alexander Stein, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ,
	dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Martin Kepplinger,
	Christoph Muellner
In-Reply-To: <5509C3E6.8070000-1KBjaw7Xf1+zQB+pC5nmwQ@public.gmane.org>

On Wed, 2015-03-18 at 19:28 +0100, Martin Kepplinger wrote:
> Am 2015-03-18 um 19:05 schrieb Bastien Nocera:
> > On Wed, 2015-03-18 at 19:02 +0100, Martin Kepplinger wrote:
> > > Am 2015-03-18 um 17:59 schrieb Bastien Nocera:
> > > > On Wed, 2015-03-18 at 17:42 +0100, Martin Kepplinger wrote:
> > > > > 
> > > > <snip>
> > > > > It could have gone to drivers/iio/accel if it would use an 
> > > > > iio   interface, which would make more sense, you are right, 
> > > > > but I 
> > > > > simply  don't have the time to merge it in to iio.
> > > > > 
> > > > > It doesn't use an input interface either but I don't see a 
> > > > > good   place for an accelerometer that uses sysfs only.
> > > > > 
> > > > > It works well, is a relatively recent chip and a clean 
> > > > > dirver.  But  this is all I can provide.
> > > > 
> > > > As a person who works on the user-space interaction of those 
> > > > with   desktops [1]: Urgh.
> > > > 
> > > > I already have 3 (probably 4) types of accelerometers to 
> > > > contend  with,  I'm not fond of adding yet another type.
> > > > 
> > > > Is there any way to get this hardware working outside the SoCs 
> > > > it's  designed for (say, a device with I2C like a Raspberry 
> > > > Pi),  so that a  kind soul could handle getting this using the 
> > > > right  interfaces?
> > > > 
> > > 
> > > It works on basically any SoC and is in no way limited in this  
> > > regard. Sure, userspace has to expicitely support it and I hear 
> > > you.  Using the iio interface would make more sense. I can only 
> > > say I'd  love to have the time to move this driver over. I'm 
> > > very sorry.
> > 
> > How can we get the hardware for somebody to use on their own  
> > laptops/embedded boards to implement this driver?
> > 
> 
> It's connected over I2C. If the included documentation is not clear 
> please tell me what exacly. Thanks!


I'll ask the question a different way: can you please give the address 
of a shop where that hardware is available?

^ permalink raw reply

* Re: [PATCH 3/3] mm: idle memory tracking
From: Cyrill Gorcunov @ 2015-03-19 10:12 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Jonathan Corbet, linux-api, linux-doc, linux-mm, linux-kernel
In-Reply-To: <0b70e70137aa5232cce44a69c0b5e320f2745f7d.1426706637.git.vdavydov@parallels.com>

On Wed, Mar 18, 2015 at 11:44:36PM +0300, Vladimir Davydov wrote:
> Knowing the portion of memory that is not used by a certain application
> or memory cgroup (idle memory) can be useful for partitioning the system
> efficiently. Currently, the only means to estimate the amount of idle
> memory provided by the kernel is /proc/PID/clear_refs. However, it has
> two serious shortcomings:
> 
>  - it does not count unmapped file pages
>  - it affects the reclaimer logic
> 
> This patch attempts to provide the userspace with the means to track
> idle memory without the above mentioned limitations.
...
> +static void set_mem_idle(void)
> +{
> +	int nid;
> +
> +	for_each_online_node(nid)
> +		set_mem_idle_node(nid);
> +}

Vladimir, might we need get_online_mems/put_online_mems here,
or if node gets offline this wont be a problem? (Asking
because i don't know).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [v10 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
From: Jan Kara @ 2015-03-19  9:56 UTC (permalink / raw)
  To: Li Xi
  Cc: linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack, viro,
	hch, dmonakhov
In-Reply-To: <1426705497-22158-5-git-send-email-lixi@ddn.com>

On Thu 19-03-15 04:04:56, Li Xi wrote:
> This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
> support for ext4. The interface is kept consistent with
> XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.
  Two more comments below.

> 
> Signed-off-by: Li Xi <lixi@ddn.com>
...
> +static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
> +{
> +	struct inode *inode = file_inode(filp);
> +	struct super_block *sb = inode->i_sb;
> +	struct ext4_inode_info *ei = EXT4_I(inode);
> +	int err;
> +	handle_t *handle;
> +	kprojid_t kprojid;
> +	struct ext4_iloc iloc;
> +	struct ext4_inode *raw_inode;
> +
> +	struct dquot *transfer_to[EXT4_MAXQUOTAS] = { };
> +
> +	/* Make sure caller can change project. */
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EACCES;
  Why do you test capabilities here when you already test permission in
EXT4_IOC_FSSETXATTR? Furthermore this test is too restrictive. Just delete
it.

...
> -flags_out:
> +		err = ext4_ioctl_setflags(inode, flags);
>  		mutex_unlock(&inode->i_mutex);
> -		mnt_drop_write_file(filp);
> +		mnt_drop_write(filp->f_path.mnt);
  Why did you change this? mnt_drop_write_file() should stay here.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply

* Re: [v10 3/5] ext4: adds project quota support
From: Jan Kara @ 2015-03-19  9:41 UTC (permalink / raw)
  To: Li Xi
  Cc: linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack, viro,
	hch, dmonakhov
In-Reply-To: <1426705497-22158-4-git-send-email-lixi@ddn.com>

On Thu 19-03-15 04:04:55, Li Xi wrote:
> This patch adds mount options for enabling/disabling project quota
> accounting and enforcement. A new specific inode is also used for
> project quota accounting.
  Here we still don't have resolved the problem that inode number 11 may be
used. Please note that in the changelog as FIXME so that we don't forget.

> @@ -987,6 +988,7 @@ struct ext4_inode_info {
>  #define EXT4_MOUNT_DIOREAD_NOLOCK	0x400000 /* Enable support for dio read nolocking */
>  #define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000 /* Journal checksums */
>  #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
> +#define EXT4_MOUNT_PRJJQUOTA		0x2000000 /* Journaled Project quota */
>  #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
>  #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
>  #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
  I'm ok with the fact that you removed support for project quota files not
in system files. However why do you keep the PRJJQUOTA mount option then?
Quota tools are clever enough to query kernel for enabled quota without
checking for mount options these days. So they don't need the mount option
for anything. And kernel doesn't need the option for anything either
(usrquota and grpquota mount options are there only for quota in standard
files case).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply

* [PATCH 1/1] Add virtio-input driver.
From: Gerd Hoffmann @ 2015-03-19  9:13 UTC (permalink / raw)
  To: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: mst-H+wXaHxf7aLQT0dZR+AlfA, Gerd Hoffmann, Rusty Russell,
	open list, open list:ABI/API
In-Reply-To: <1426756391-26585-1-git-send-email-kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

virtio-input is basically evdev-events-over-virtio, so this driver isn't
much more than reading configuration from config space and forwarding
incoming events to the linux input layer.

Signed-off-by: Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/virtio/Kconfig            |  10 ++
 drivers/virtio/Makefile           |   1 +
 drivers/virtio/virtio_input.c     | 313 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_input.h |  65 ++++++++
 5 files changed, 390 insertions(+)
 create mode 100644 drivers/virtio/virtio_input.c
 create mode 100644 include/uapi/linux/virtio_input.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index b546da5..cab9f3f 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -48,6 +48,16 @@ config VIRTIO_BALLOON
 
 	 If unsure, say M.
 
+config VIRTIO_INPUT
+	tristate "Virtio input driver"
+	depends on VIRTIO
+	depends on INPUT
+	---help---
+	 This driver supports virtio input devices such as
+	 keyboards, mice and tablets.
+
+	 If unsure, say M.
+
  config VIRTIO_MMIO
 	tristate "Platform bus driver for memory mapped virtio devices"
 	depends on HAS_IOMEM
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index d85565b..41e30e3 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
new file mode 100644
index 0000000..2d425cf
--- /dev/null
+++ b/drivers/virtio/virtio_input.c
@@ -0,0 +1,313 @@
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/input.h>
+
+#include <uapi/linux/virtio_ids.h>
+#include <uapi/linux/virtio_input.h>
+
+struct virtio_input {
+	struct virtio_device       *vdev;
+	struct input_dev           *idev;
+	char                       name[64];
+	char                       serial[64];
+	char                       phys[64];
+	struct virtqueue           *evt, *sts;
+	struct virtio_input_event  evts[64];
+};
+
+static ssize_t serial_show(struct device *dev,
+			   struct device_attribute *attr, char *buf)
+{
+	struct input_dev *idev = to_input_dev(dev);
+	struct virtio_input *vi = input_get_drvdata(idev);
+	return sprintf(buf, "%s\n", vi->serial);
+}
+static DEVICE_ATTR_RO(serial);
+
+static struct attribute *dev_attrs[] = {
+	&dev_attr_serial.attr,
+	NULL
+};
+
+static umode_t dev_attrs_are_visible(struct kobject *kobj,
+				     struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct input_dev *idev = to_input_dev(dev);
+	struct virtio_input *vi = input_get_drvdata(idev);
+
+	if (a == &dev_attr_serial.attr && !strlen(vi->serial))
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute_group dev_attr_grp = {
+	.attrs =	dev_attrs,
+	.is_visible =	dev_attrs_are_visible,
+};
+
+static const struct attribute_group *dev_attr_groups[] = {
+	&dev_attr_grp,
+	NULL
+};
+
+static void virtinput_queue_evtbuf(struct virtio_input *vi,
+				   struct virtio_input_event *evtbuf)
+{
+	struct scatterlist sg[1];
+
+	sg_init_one(sg, evtbuf, sizeof(*evtbuf));
+	virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
+}
+
+static void virtinput_recv_events(struct virtqueue *vq)
+{
+	struct virtio_input *vi = vq->vdev->priv;
+	struct virtio_input_event *event;
+	unsigned int len;
+
+	while ((event = virtqueue_get_buf(vi->evt, &len)) != NULL) {
+		input_event(vi->idev,
+			    le16_to_cpu(event->type),
+			    le16_to_cpu(event->code),
+			    le32_to_cpu(event->value));
+		virtinput_queue_evtbuf(vi, event);
+	}
+	virtqueue_kick(vq);
+}
+
+static int virtinput_send_status(struct virtio_input *vi,
+				 u16 type, u16 code, s32 value)
+{
+	struct virtio_input_event *stsbuf;
+	struct scatterlist sg[1];
+
+	stsbuf = kzalloc(sizeof(*stsbuf), GFP_ATOMIC);
+	if (!stsbuf)
+		return -ENOMEM;
+
+	stsbuf->type  = cpu_to_le16(type);
+	stsbuf->code  = cpu_to_le16(code);
+	stsbuf->value = cpu_to_le32(value);
+	sg_init_one(sg, stsbuf, sizeof(*stsbuf));
+	virtqueue_add_outbuf(vi->sts, sg, 1, stsbuf, GFP_ATOMIC);
+	virtqueue_kick(vi->sts);
+	return 0;
+}
+
+static void virtinput_recv_status(struct virtqueue *vq)
+{
+	struct virtio_input *vi = vq->vdev->priv;
+	struct virtio_input_event *stsbuf;
+	unsigned int len;
+
+	while ((stsbuf = virtqueue_get_buf(vi->sts, &len)) != NULL)
+		kfree(stsbuf);
+	virtqueue_kick(vq);
+}
+
+static int virtinput_status(struct input_dev *idev, unsigned int type,
+			    unsigned int code, int value)
+{
+	struct virtio_input *vi = input_get_drvdata(idev);
+
+	return virtinput_send_status(vi, type, code, value);
+}
+
+static size_t virtinput_cfg_select(struct virtio_input *vi,
+				   u8 select, u8 subsel)
+{
+	u8 size;
+
+	virtio_cwrite(vi->vdev, struct virtio_input_config, select, &select);
+	virtio_cwrite(vi->vdev, struct virtio_input_config, subsel, &subsel);
+	virtio_cread(vi->vdev, struct virtio_input_config, size, &size);
+	return size;
+}
+
+static void virtinput_cfg_bits(struct virtio_input *vi, int select, int subsel,
+			       unsigned long *bits, unsigned int bitcount)
+{
+	unsigned int bit;
+	size_t bytes;
+	u8 cfg = 0;
+
+	bytes = virtinput_cfg_select(vi, select, subsel);
+	if (!bytes)
+		return;
+	if (bitcount > bytes*8)
+		bitcount = bytes*8;
+	if (select == VIRTIO_INPUT_CFG_EV_BITS)
+		set_bit(subsel, vi->idev->evbit);
+	for (bit = 0; bit < bitcount; bit++) {
+		if ((bit % 8) == 0)
+			virtio_cread(vi->vdev, struct virtio_input_config,
+				     u.bitmap[bit/8], &cfg);
+		if (cfg & (1 << (bit % 8)))
+			set_bit(bit, bits);
+	}
+}
+
+static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
+{
+	u32 size, mi, ma, fu, fl;
+
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);
+	input_set_abs_params(vi->idev, abs,
+			     le32_to_cpu(mi), le32_to_cpu(ma),
+			     le32_to_cpu(fu), le32_to_cpu(fl));
+}
+
+static int virtinput_init_vqs(struct virtio_input *vi)
+{
+	struct virtqueue *vqs[2];
+	vq_callback_t *cbs[] = { virtinput_recv_events,
+				 virtinput_recv_status };
+	const char *names[] = { "events", "status" };
+	int i, err, size;
+
+	err = vi->vdev->config->find_vqs(vi->vdev, 2, vqs, cbs, names);
+	if (err)
+		return err;
+	vi->evt = vqs[0];
+	vi->sts = vqs[1];
+
+	size = virtqueue_get_vring_size(vi->evt);
+	if (size > ARRAY_SIZE(vi->evts))
+		size = ARRAY_SIZE(vi->evts);
+	for (i = 0; i < size; i++)
+		virtinput_queue_evtbuf(vi, &vi->evts[i]);
+
+	return 0;
+}
+
+static int virtinput_probe(struct virtio_device *vdev)
+{
+	struct virtio_input *vi;
+	size_t size;
+	int abs, err;
+
+	vi = kzalloc(sizeof(*vi), GFP_KERNEL);
+	if (!vi) {
+		err = -ENOMEM;
+		goto out1;
+	}
+	vdev->priv = vi;
+	vi->vdev = vdev;
+
+	err = virtinput_init_vqs(vi);
+	if (err)
+		goto out2;
+
+	vi->idev = input_allocate_device();
+	if (!vi->idev) {
+		err = -ENOMEM;
+		goto out3;
+	}
+	input_set_drvdata(vi->idev, vi);
+
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_NAME, 0);
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
+			   vi->name, min(size, sizeof(vi->name)));
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_SERIAL, 0);
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
+			   vi->serial, min(size, sizeof(vi->serial)));
+	snprintf(vi->phys, sizeof(vi->phys),
+		 "virtio%d/input0", vdev->index);
+
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_PROP_BITS, 0,
+			   vi->idev->propbit, INPUT_PROP_CNT);
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REP);
+	if (size)
+		set_bit(EV_REP, vi->idev->evbit);
+
+	vi->idev->name = vi->name;
+	vi->idev->phys = vi->phys;
+	vi->idev->id.bustype = BUS_VIRTUAL;
+	vi->idev->id.vendor  = 0x0001;
+	vi->idev->id.product = 0x0001;
+	vi->idev->id.version = 0x0100;
+	vi->idev->dev.parent = &vdev->dev;
+	vi->idev->dev.groups = dev_attr_groups;
+	vi->idev->event = virtinput_status;
+
+	/* device -> kernel */
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_KEY,
+			   vi->idev->keybit, KEY_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REL,
+			   vi->idev->relbit, REL_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_ABS,
+			   vi->idev->absbit, ABS_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_MSC,
+			   vi->idev->mscbit, MSC_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SW,
+			   vi->idev->swbit,  SW_CNT);
+
+	/* kernel -> device */
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_LED,
+			   vi->idev->ledbit, LED_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SND,
+			   vi->idev->sndbit, SND_CNT);
+
+	if (test_bit(EV_ABS, vi->idev->evbit)) {
+		for (abs = 0; abs < ABS_CNT; abs++) {
+			if (!test_bit(abs, vi->idev->absbit))
+				continue;
+			virtinput_cfg_abs(vi, abs);
+		}
+	}
+
+	err = input_register_device(vi->idev);
+	if (err)
+		goto out4;
+
+	return 0;
+
+out4:
+	input_free_device(vi->idev);
+out3:
+	vdev->config->del_vqs(vdev);
+out2:
+	kfree(vi);
+out1:
+	return err;
+}
+
+static void virtinput_remove(struct virtio_device *vdev)
+{
+	struct virtio_input *vi = vdev->priv;
+
+	input_unregister_device(vi->idev);
+	vdev->config->reset(vdev);
+	vdev->config->del_vqs(vdev);
+	kfree(vi);
+}
+
+static unsigned int features[] = {
+};
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_INPUT, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_input_driver = {
+	.driver.name         = KBUILD_MODNAME,
+	.driver.owner        = THIS_MODULE,
+	.feature_table       = features,
+	.feature_table_size  = ARRAY_SIZE(features),
+	.id_table            = id_table,
+	.probe               = virtinput_probe,
+	.remove              = virtinput_remove,
+};
+
+module_virtio_driver(virtio_input_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Virtio input device driver");
+MODULE_AUTHOR("Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>");
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 284fc3a..5f60aa4 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -39,5 +39,6 @@
 #define VIRTIO_ID_9P		9 /* 9p virtio console */
 #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
 #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
+#define VIRTIO_ID_INPUT        18 /* virtio input */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_input.h b/include/uapi/linux/virtio_input.h
new file mode 100644
index 0000000..190d04a
--- /dev/null
+++ b/include/uapi/linux/virtio_input.h
@@ -0,0 +1,65 @@
+#ifndef _LINUX_VIRTIO_INPUT_H
+#define _LINUX_VIRTIO_INPUT_H
+/* This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. */
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+
+enum virtio_input_config_select {
+	VIRTIO_INPUT_CFG_UNSET      = 0x00,
+	VIRTIO_INPUT_CFG_ID_NAME    = 0x01,
+	VIRTIO_INPUT_CFG_ID_SERIAL  = 0x02,
+	VIRTIO_INPUT_CFG_PROP_BITS  = 0x10,
+	VIRTIO_INPUT_CFG_EV_BITS    = 0x11,
+	VIRTIO_INPUT_CFG_ABS_INFO   = 0x12,
+};
+
+struct virtio_input_absinfo {
+	__le32  min;
+	__le32  max;
+	__le32  fuzz;
+	__le32  flat;
+};
+
+struct virtio_input_config {
+	__u8    select;
+	__u8    subsel;
+	__u8    size;
+	__u8    reserved;
+	union {
+		char string[128];
+		__u8 bitmap[128];
+		struct virtio_input_absinfo abs;
+	} u;
+};
+
+struct virtio_input_event {
+	__le16 type;
+	__le16 code;
+	__le32 value;
+};
+
+#endif /* _LINUX_VIRTIO_INPUT_H */
-- 
1.8.3.1

^ permalink raw reply related

* Re: [v10 1/5] vfs: adds general codes to enforces project quota limits
From: Jan Kara @ 2015-03-19  9:06 UTC (permalink / raw)
  To: Li Xi
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, tytso-3s7WtUTddSA,
	adilger-m1MBpc4rdrD3fQ9qLvQP4Q, jack-AlSwsSmVLrQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, hch-wEGCiKHe2LqWVfeAwA7xHQ,
	dmonakhov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <1426705497-22158-2-git-send-email-lixi-LfVdkaOWEx8@public.gmane.org>

On Thu 19-03-15 04:04:53, Li Xi wrote:
> This patch adds support for a new quota type PRJQUOTA for project quota
> enforcement. Also a new method get_projid() is added into dquot_operations
> structure.
> 
> Signed-off-by: Li Xi <lixi-LfVdkaOWEx8@public.gmane.org>
> Signed-off-by: Dmitry Monakhov <dmonakhov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
  Thanks. I've added this patch to my tree and will push it to Linus in the
next merge window (I had to modify it a bit to work with changes in my tree).
This patch is independent of the dispute with Konstantin so however that is
resolved this should go in.

								Honza

> ---
>  fs/quota/dquot.c           |   35 ++++++++++++++++++++++++++++++-----
>  fs/quota/quota.c           |    5 ++++-
>  fs/quota/quotaio_v2.h      |    6 ++++--
>  include/linux/quota.h      |    2 ++
>  include/uapi/linux/quota.h |    6 ++++--
>  5 files changed, 44 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 8f0acef..a02bb68 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1159,8 +1159,8 @@ static int need_print_warning(struct dquot_warn *warn)
>  			return uid_eq(current_fsuid(), warn->w_dq_id.uid);
>  		case GRPQUOTA:
>  			return in_group_p(warn->w_dq_id.gid);
> -		case PRJQUOTA:	/* Never taken... Just make gcc happy */
> -			return 0;
> +		case PRJQUOTA:
> +			return 1;
>  	}
>  	return 0;
>  }
> @@ -1399,6 +1399,9 @@ static void __dquot_initialize(struct inode *inode, int type)
>  	/* First get references to structures we might need. */
>  	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
>  		struct kqid qid;
> +		kprojid_t projid;
> +		int rc;
> +
>  		got[cnt] = NULL;
>  		if (type != -1 && cnt != type)
>  			continue;
> @@ -1409,6 +1412,10 @@ static void __dquot_initialize(struct inode *inode, int type)
>  		 */
>  		if (i_dquot(inode)[cnt])
>  			continue;
> +
> +		if (!sb_has_quota_active(sb, cnt))
> +			continue;
> +
>  		init_needed = 1;
>  
>  		switch (cnt) {
> @@ -1418,6 +1425,12 @@ static void __dquot_initialize(struct inode *inode, int type)
>  		case GRPQUOTA:
>  			qid = make_kqid_gid(inode->i_gid);
>  			break;
> +		case PRJQUOTA:
> +			rc = inode->i_sb->dq_op->get_projid(inode, &projid);
> +			if (rc)
> +				continue;
> +			qid = make_kqid_projid(projid);
> +			break;
>  		}
>  		got[cnt] = dqget(sb, qid);
>  	}
> @@ -2161,7 +2174,8 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id,
>  		error = -EROFS;
>  		goto out_fmt;
>  	}
> -	if (!sb->s_op->quota_write || !sb->s_op->quota_read) {
> +	if (!sb->s_op->quota_write || !sb->s_op->quota_read ||
> +	    (type == PRJQUOTA && sb->dq_op->get_projid == NULL)) {
>  		error = -EINVAL;
>  		goto out_fmt;
>  	}
> @@ -2402,8 +2416,19 @@ static void do_get_dqblk(struct dquot *dquot, struct fs_disk_quota *di)
>  
>  	memset(di, 0, sizeof(*di));
>  	di->d_version = FS_DQUOT_VERSION;
> -	di->d_flags = dquot->dq_id.type == USRQUOTA ?
> -			FS_USER_QUOTA : FS_GROUP_QUOTA;
> +	switch (dquot->dq_id.type) {
> +	case USRQUOTA:
> +		di->d_flags = FS_USER_QUOTA;
> +		break;
> +	case GRPQUOTA:
> +		di->d_flags = FS_GROUP_QUOTA;
> +		break;
> +	case PRJQUOTA:
> +		di->d_flags = FS_PROJ_QUOTA;
> +		break;
> +	default:
> +		BUG();
> +	}
>  	di->d_id = from_kqid_munged(current_user_ns(), dquot->dq_id);
>  
>  	spin_lock(&dq_data_lock);
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 2aa4151..33b30b1 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -30,7 +30,10 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
>  	case Q_XGETQSTATV:
>  	case Q_XQUOTASYNC:
>  		break;
> -	/* allow to query information for dquots we "own" */
> +	/*
> +	 * allow to query information for dquots we "own"
> +	 * always allow querying project quota
> +	 */
>  	case Q_GETQUOTA:
>  	case Q_XGETQUOTA:
>  		if ((type == USRQUOTA && uid_eq(current_euid(), make_kuid(current_user_ns(), id))) ||
> diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
> index f1966b4..4e95430 100644
> --- a/fs/quota/quotaio_v2.h
> +++ b/fs/quota/quotaio_v2.h
> @@ -13,12 +13,14 @@
>   */
>  #define V2_INITQMAGICS {\
>  	0xd9c01f11,	/* USRQUOTA */\
> -	0xd9c01927	/* GRPQUOTA */\
> +	0xd9c01927,	/* GRPQUOTA */\
> +	0xd9c03f14,	/* PRJQUOTA */\
>  }
>  
>  #define V2_INITQVERSIONS {\
>  	1,		/* USRQUOTA */\
> -	1		/* GRPQUOTA */\
> +	1,		/* GRPQUOTA */\
> +	1,		/* PRJQUOTA */\
>  }
>  
>  /* First generic header */
> diff --git a/include/linux/quota.h b/include/linux/quota.h
> index 50978b7..ba51f7e 100644
> --- a/include/linux/quota.h
> +++ b/include/linux/quota.h
> @@ -50,6 +50,7 @@
>  
>  #undef USRQUOTA
>  #undef GRPQUOTA
> +#undef PRJQUOTA
>  enum quota_type {
>  	USRQUOTA = 0,		/* element used for user quotas */
>  	GRPQUOTA = 1,		/* element used for group quotas */
> @@ -317,6 +318,7 @@ struct dquot_operations {
>  	/* get reserved quota for delayed alloc, value returned is managed by
>  	 * quota code only */
>  	qsize_t *(*get_reserved_space) (struct inode *);
> +	int (*get_projid) (struct inode *, kprojid_t *);/* Get project ID */
>  };
>  
>  struct path;
> diff --git a/include/uapi/linux/quota.h b/include/uapi/linux/quota.h
> index 3b6cfbe..b2d9486 100644
> --- a/include/uapi/linux/quota.h
> +++ b/include/uapi/linux/quota.h
> @@ -36,11 +36,12 @@
>  #include <linux/errno.h>
>  #include <linux/types.h>
>  
> -#define __DQUOT_VERSION__	"dquot_6.5.2"
> +#define __DQUOT_VERSION__	"dquot_6.6.0"
>  
> -#define MAXQUOTAS 2
> +#define MAXQUOTAS 3
>  #define USRQUOTA  0		/* element used for user quotas */
>  #define GRPQUOTA  1		/* element used for group quotas */
> +#define PRJQUOTA  2		/* element used for project quotas */
>  
>  /*
>   * Definitions for the default names of the quotas files.
> @@ -48,6 +49,7 @@
>  #define INITQFNAMES { \
>  	"user",    /* USRQUOTA */ \
>  	"group",   /* GRPQUOTA */ \
> +	"project", /* PRJQUOTA */ \
>  	"undefined", \
>  };
>  
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH 0/3] idle memory tracking
From: Vladimir Davydov @ 2015-03-19  8:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Cyrill Gorcunov, Jonathan Corbet, linux-api, linux-doc, linux-mm,
	linux-kernel
In-Reply-To: <20150319021337.GD9153@blaptop>

On Thu, Mar 19, 2015 at 11:13:37AM +0900, Minchan Kim wrote:
> On Wed, Mar 18, 2015 at 11:44:33PM +0300, Vladimir Davydov wrote:
> >  1. Write 1 to /proc/sys/vm/set_idle.
> > 
> >     This will set the IDLE flag for all user pages. The IDLE flag is cleared
> >     when the page is read or the ACCESS/YOUNG bit is cleared in any PTE pointing
> >     to the page. It is also cleared when the page is freed.
> 
> We should scan all of pages periodically? I understand why you did but
> someone might not take care of unmapped pages so I hope it should be optional.
> if someone just want to catch mapped file+anon pages, he can do it
> by scanning of address space of the process he selects.
> Even, someone might want to scan just part of address space rather than
> all address space of the process. Acutally, I have such scenario.

You still can estimate the working set size of a particular process, or
even by a part of its address space, by setting the IDLE bit for all
user pages, but clearing refs for and analyzing only those pages you are
interested in. You can filter them by scanning /proc/PID/pagemap.

If you are concerned about performance, I don't think it would be an
issue: on my test machine setting the IDLE bit for 20 GB of user pages
takes about 150 ms. Provided that this kind of work is supposed to be
done relatively rarely (every several minutes or so), the overhead looks
negligible to me. Anyway, we can introduce /proc/PID/set_mem_idle for
setting the IDLE bit only on pages of a particular address space.

> 
> > 
> >  2. Wait some time.
> > 
> >  3. Write 6 to /proc/PID/clear_refs for each PID of interest.
> > 
> >     This will clear the IDLE flag for recently accessed pages.
> > 
> >  4. Count the number of idle pages as reported by /proc/kpageflags. One may use
> >     /proc/PID/pagemap and/or /proc/kpagecgroup to filter pages that belong to a
> >     certain application/container.
> > 
> 
> Adding two new page flags? I don't know it's okay for 64bit but there is no
> room for 32bit. Please take care of 32 bit. It would be good feature for
> embedded. How about using page_ext if you couldn't make room for page->flags
> for 32bit? You would add per-page meta data in there.

For the time being, I made it dependant on 64BIT explicitly, because I
am only interested in analyzing working set size of containers running
on big machines, but I admit one could use page_ext for storing the
additional flags if compiled for 32 bit.

> 
> Your suggestion is generic so my concern is overhead. On every iteration,
> we should set/clear/investigate page flags. I don't know how much overhead
> is in there but it surely could be big if memory is big.
> Couldn't we do that at one go? Maybe, like mincore
> 
>         int idlecore(pid_t pid, void *addr, size_t length, unsigned char *vec)
> 
> So, we could know what pages of the process[pid] were idle by vec in
> [addr, lentgh] and reset idle of the pages for the process
> in the system call at one go.

I don't think adding yet another syscall for such a specialized feature
is a good idea. Besides, I want to keep the interface consistent with
/proc/PID/clear_refs, which IMO suits perfectly well for clearing the
IDLE flag on referenced pages. As I mentioned above, to reduce the
overhead in case the user is not interested in unmapped file pages, we
could introduce /proc/PID/set_mem_idle, though I think this only should
be done if there are complains about /proc/sys/vm/set_idle performance.

Thanks,
Vladimir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Daniel Micay @ 2015-03-19  5:34 UTC (permalink / raw)
  To: Andrew Morton, Shaohua Li
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski, Aliaksey Kandratsenka
In-Reply-To: <20150318153100.5658b741277f3717b52e42d9-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2780 bytes --]

On 18/03/15 06:31 PM, Andrew Morton wrote:
> On Tue, 17 Mar 2015 14:09:39 -0700 Shaohua Li <shli-b10kYP2dOMg@public.gmane.org> wrote:
> 
>> There was a similar patch posted before, but it doesn't get merged. I'd like
>> to try again if there are more discussions.
>> http://marc.info/?l=linux-mm&m=141230769431688&w=2
>>
>> mremap can be used to accelerate realloc. The problem is mremap will
>> punch a hole in original VMA, which makes specific memory allocator
>> unable to utilize it. Jemalloc is an example. It manages memory in 4M
>> chunks. mremap a range of the chunk will punch a hole, which other
>> mmap() syscall can fill into. The 4M chunk is then fragmented, jemalloc
>> can't handle it.
> 
> Daniel's changelog had additional details regarding the userspace
> allocators' behaviour.  It would be best to incorporate that into your
> changelog.
>
> Daniel also had microbenchmark testing results for glibc and jemalloc. 
> Can you please do this?
> 
> I'm not seeing any testing results for tcmalloc and I'm not seeing
> confirmation that this patch will be useful for tcmalloc.  Has anyone
> tried it, or sought input from tcmalloc developers?

TCMalloc and jemalloc are currently equally slow in this benchmark, as
neither makes use of mremap. They're ~2-3x slower than glibc. I CC'ed
the currently most active TCMalloc developer so they can give input
into whether this patch would let them use it.

#include <string.h>
#include <stdlib.h>

int main(void) {
  void *ptr = NULL;
  size_t old_size = 0;
  for (size_t size = 4 * 1024 * 1024; size < 1024 * 1024 * 1024; size *= 2) {
    ptr = realloc(ptr, size);
    if (!ptr) return 1;
    memset(ptr, 0xff, size - old_size);
    old_size = size;
  }
  free(ptr);
}

If an outer loop is wrapped around this, jemalloc's master branch will
at least be able to do in-place resizing for everything after the 1st
run, but that's much rarer in the real world where there are many users
of the allocator. The lack of mremap still ends up hurting a lot.

FWIW, jemalloc is now the default allocator on Android so there are an
increasing number of Linux machines unable to leverage mremap. It could
be worked around by attempting to use an mmap hint to get the memory
back, but that can fail as it's a race with the other threads and that
leads increases fragmentation over the long term.

It's especially problematic if a large range of virtual memory is
reserved and divided up between per-CPU arenas for concurrency, but
only garbage collectors tend to do stuff like this at the moment. This
can still be dealt with by checking internal uses of mmap and returning
any memory from the reserved range to the right place, but it shouldn't
have to be that ugly.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Andrew Morton @ 2015-03-19  5:22 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	danielmicay-Re5JQEeQqe8AvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Rik van Riel, Hugh Dickins,
	Mel Gorman, Johannes Weiner, Michal Hocko, Andy Lutomirski
In-Reply-To: <20150319050826.GA1591708-XA4dbxeItU7BTsLV8vAZyg2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>

On Wed, 18 Mar 2015 22:08:26 -0700 Shaohua Li <shli-b10kYP2dOMg@public.gmane.org> wrote:

> > Daniel also had microbenchmark testing results for glibc and jemalloc. 
> > Can you please do this?
> 
> I run Daniel's microbenchmark too, and not surprise the result is
> similar:
> glibc: 32.82
> jemalloc: 70.35
> jemalloc+mremap: 33.01
> tcmalloc: 68.81
> 
> but tcmalloc doesn't support mremap currently, so I cant test it.

But Daniel's changelog implies strongly that tcmalloc would benefit
from his patch.  Was that inaccurate or is this a difference between
his patch and yours?

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Shaohua Li @ 2015-03-19  5:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	danielmicay-Re5JQEeQqe8AvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Rik van Riel, Hugh Dickins,
	Mel Gorman, Johannes Weiner, Michal Hocko, Andy Lutomirski
In-Reply-To: <20150318153100.5658b741277f3717b52e42d9-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

On Wed, Mar 18, 2015 at 03:31:00PM -0700, Andrew Morton wrote:
> On Tue, 17 Mar 2015 14:09:39 -0700 Shaohua Li <shli-b10kYP2dOMg@public.gmane.org> wrote:
> 
> > There was a similar patch posted before, but it doesn't get merged. I'd like
> > to try again if there are more discussions.
> > http://marc.info/?l=linux-mm&m=141230769431688&w=2
> > 
> > mremap can be used to accelerate realloc. The problem is mremap will
> > punch a hole in original VMA, which makes specific memory allocator
> > unable to utilize it. Jemalloc is an example. It manages memory in 4M
> > chunks. mremap a range of the chunk will punch a hole, which other
> > mmap() syscall can fill into. The 4M chunk is then fragmented, jemalloc
> > can't handle it.
> 
> Daniel's changelog had additional details regarding the userspace
> allocators' behaviour.  It would be best to incorporate that into your
> changelog.

I'll extract some from his changelog in next post
 
> Daniel also had microbenchmark testing results for glibc and jemalloc. 
> Can you please do this?

I run Daniel's microbenchmark too, and not surprise the result is
similar:
glibc: 32.82
jemalloc: 70.35
jemalloc+mremap: 33.01
tcmalloc: 68.81

but tcmalloc doesn't support mremap currently, so I cant test it.
 
> I'm not seeing any testing results for tcmalloc and I'm not seeing
> confirmation that this patch will be useful for tcmalloc.  Has anyone
> tried it, or sought input from tcmalloc developers?
> 
> > This patch adds a new flag for mremap. With it, mremap will not punch the
> > hole. page tables of original vma will be zapped in the same way, but
> > vma is still there. That is original vma will look like a vma without
> > pagefault. Behavior of new vma isn't changed.
> > 
> > For private vma, accessing original vma will cause
> > page fault and just like the address of the vma has never been accessed.
> > So for anonymous, new page/zero page will be fault in. For file mapping,
> > new page will be allocated with file reading for cow, or pagefault will
> > use existing page cache.
> > 
> > For shared vma, original and new vma will map to the same file. We can
> > optimize this without zaping original vma's page table in this case, but
> > this patch doesn't do it yet.
> > 
> > Since with MREMAP_NOHOLE, original vma still exists. pagefault handler
> > for special vma might not able to handle pagefault for mremap'd area.
> > The patch doesn't allow vmas with VM_PFNMAP|VM_MIXEDMAP flags do NOHOLE
> > mremap.
> 
> At some point (preferably an early point) we'd like a manpage update
> and a cc: to linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org please.

ok, will add in next post.

Thanks,
Shaohua

^ permalink raw reply

* Re: [PATCH 0/3] idle memory tracking
From: Minchan Kim @ 2015-03-19  2:13 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Cyrill Gorcunov, Jonathan Corbet, linux-api, linux-doc, linux-mm,
	linux-kernel
In-Reply-To: <cover.1426706637.git.vdavydov@parallels.com>

Hello,

On Wed, Mar 18, 2015 at 11:44:33PM +0300, Vladimir Davydov wrote:
> Hi,
> 
> Knowing the portion of memory that is not used by a certain application
> or memory cgroup (idle memory) can be useful for partitioning the system
> efficiently. Currently, the only means to estimate the amount of idle
> memory provided by the kernel is /proc/PID/clear_refs. However, it has
> two serious shortcomings:
> 
>  - it does not count unmapped file pages
>  - it affects the reclaimer logic
> 
> Back in 2011 an attempt was made by Michel Lespinasse to improve the
> situation (see http://lwn.net/Articles/459269/). He proposed a kernel
> space daemon which would periodically scan physical address range,
> testing and clearing ACCESS/YOUNG PTE bits, and counting pages that had
> not been referenced since the last scan. The daemon avoided interference
> with the page reclaimer by setting the new PG_young flag on referenced
> pages and making page_referenced() return >= 1 if PG_young was set.
> 
> This patch set reuses the idea of Michel's patch set, but the
> implementation is quite different. Instead of introducing a kernel space
> daemon, it only provides the userspace with the necessary means to
> estimate the amount of idle memory, leaving the daemon to be implemented
> in the userspace. In order to achieve that, it adds two new proc files,
> /proc/kpagecgroup and /proc/sys/vm/set_idle, and extends the clear_refs
> interface.
> 
> 
>  1. Write 1 to /proc/sys/vm/set_idle.
> 
>     This will set the IDLE flag for all user pages. The IDLE flag is cleared
>     when the page is read or the ACCESS/YOUNG bit is cleared in any PTE pointing
>     to the page. It is also cleared when the page is freed.

We should scan all of pages periodically? I understand why you did but
someone might not take care of unmapped pages so I hope it should be optional.
if someone just want to catch mapped file+anon pages, he can do it
by scanning of address space of the process he selects.
Even, someone might want to scan just part of address space rather than
all address space of the process. Acutally, I have such scenario.

> 
>  2. Wait some time.
> 
>  3. Write 6 to /proc/PID/clear_refs for each PID of interest.
> 
>     This will clear the IDLE flag for recently accessed pages.
> 
>  4. Count the number of idle pages as reported by /proc/kpageflags. One may use
>     /proc/PID/pagemap and/or /proc/kpagecgroup to filter pages that belong to a
>     certain application/container.
> 

Adding two new page flags? I don't know it's okay for 64bit but there is no
room for 32bit. Please take care of 32 bit. It would be good feature for
embedded. How about using page_ext if you couldn't make room for page->flags
for 32bit? You would add per-page meta data in there.

Your suggestion is generic so my concern is overhead. On every iteration,
we should set/clear/investigate page flags. I don't know how much overhead
is in there but it surely could be big if memory is big.
Couldn't we do that at one go? Maybe, like mincore

        int idlecore(pid_t pid, void *addr, size_t length, unsigned char *vec)

So, we could know what pages of the process[pid] were idle by vec in
[addr, lentgh] and reset idle of the pages for the process
in the system call at one go.

Anyway, Thanks for the good feature.

> An example of using this new interface is below. It is a script that
> counts the number of pages charged to a specified cgroup that have not
> been accessed for a given time interval.
> 
> ---- BEGIN SCRIPT ----
> #! /usr/bin/python
> #
> 
> import struct
> import sys
> import os
> import stat
> import time
> 
> def get_end_pfn():
>     f = open("/proc/zoneinfo", "r")
>     end_pfn = 0
>     for l in f.readlines():
>         l = l.split()
>         if l[0] == "spanned":
>             end_pfn = int(l[1])
>         elif l[0] == "start_pfn:":
>             end_pfn += int(l[1])
>     return end_pfn
> 
> def set_idle():
>     open("/proc/sys/vm/set_idle", "w").write("1")
> 
> def clear_refs(target_cg_path):
>     procs = open(target_cg_path + "/cgroup.procs", "r")
>     for pid in procs.readlines():
>         try:
>             with open("/proc/" + pid.rstrip() + "/clear_refs", "w") as f:
>                 f.write("6")
>         except IOError as e:
>             print "Failed to clear_refs for pid " + pid + ": " + str(e)
> 
> def count_idle(target_cg_path):
>     target_cg_ino = os.stat(target_cg_path)[stat.ST_INO]
> 
>     pgflags = open("/proc/kpageflags", "rb")
>     pgcgroup = open("/proc/kpagecgroup", "rb")
> 
>     nidle = 0
> 
>     for i in range(0, get_end_pfn()):
>         cg_ino = struct.unpack('Q', pgcgroup.read(8))[0]
>         flags = struct.unpack('Q', pgflags.read(8))[0]
> 
>         if cg_ino != target_cg_ino:
>             continue
> 
>         # unevictable?
>         if (flags >> 18) & 1 != 0:
>             continue
> 
>         # huge?
>         if (flags >> 22) & 1 != 0:
>             npages = 512
>         else:
>             npages = 1
> 
>         # idle?
>         if (flags >> 25) & 1 != 0:
>             nidle += npages
> 
>     return nidle
> 
> if len(sys.argv) <> 3:
>     print "Usage: %s cgroup_path scan_interval" % sys.argv[0]
>     exit(1)
> 
> cg_path = sys.argv[1]
> scan_interval = int(sys.argv[2])
> 
> while True:
>     set_idle()
>     time.sleep(scan_interval)
>     clear_refs(cg_path)
>     print count_idle(cg_path)
> ---- END SCRIPT ----
> 
> Thanks,
> 
> Vladimir Davydov (3):
>   memcg: add page_cgroup_ino helper
>   proc: add kpagecgroup file
>   mm: idle memory tracking
> 
>  Documentation/filesystems/proc.txt     |    3 +
>  Documentation/vm/00-INDEX              |    2 +
>  Documentation/vm/idle_mem_tracking.txt |   23 +++++++
>  Documentation/vm/pagemap.txt           |   10 ++-
>  fs/proc/Kconfig                        |    5 +-
>  fs/proc/page.c                         |  107 ++++++++++++++++++++++++++++++++
>  fs/proc/task_mmu.c                     |   22 ++++++-
>  include/linux/memcontrol.h             |    8 +--
>  include/linux/page-flags.h             |   12 ++++
>  include/uapi/linux/kernel-page-flags.h |    1 +
>  kernel/sysctl.c                        |   14 +++++
>  mm/Kconfig                             |   12 ++++
>  mm/debug.c                             |    4 ++
>  mm/hwpoison-inject.c                   |    5 +-
>  mm/memcontrol.c                        |   61 ++++++++----------
>  mm/memory-failure.c                    |   16 +----
>  mm/rmap.c                              |    7 +++
>  mm/swap.c                              |    2 +
>  18 files changed, 248 insertions(+), 66 deletions(-)
>  create mode 100644 Documentation/vm/idle_mem_tracking.txt
> 
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] selftests: Fix kcmp build to not require headers install
From: Michael Ellerman @ 2015-03-19  0:51 UTC (permalink / raw)
  To: Shuah Khan
  Cc: gorcunov-GEFAQzZX7r8dnm+yROfE0A,
	tranmanphong-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <550A15EB.1060900-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>

On Wed, 2015-03-18 at 18:18 -0600, Shuah Khan wrote:
> On 03/18/2015 06:02 PM, Michael Ellerman wrote:
> > On Wed, 2015-03-18 at 09:04 -0600, Shuah Khan wrote:
> >> On 03/16/2015 05:00 AM, Michael Ellerman wrote:
> >>> On Fri, 2015-03-13 at 19:45 -0600, Shuah Khan wrote:
> >>>> Change CFLAGS to look in uapi to allow kcmp to be built without
> >>>> requiring headers install. This will make it easier to run tests
> >>>> without going through the headers install step.
> >>>>
> >>>> Signed-off-by: Shuah Khan <shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>
> >>>> ---
> >>>>  tools/testing/selftests/kcmp/Makefile | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
> >>>> index ff0eefd..d405ad4 100644
> >>>> --- a/tools/testing/selftests/kcmp/Makefile
> >>>> +++ b/tools/testing/selftests/kcmp/Makefile
> >>>> @@ -1,5 +1,5 @@
> >>>>  CC := $(CROSS_COMPILE)$(CC)
> >>>> -CFLAGS += -I../../../../usr/include/
> >>>> +CFLAGS += -I../../../../include/uapi -I../../../../usr/include/
> >>>
> >>> Hi Shuah,
> >>>
> >>> Sorry but this is wrong. The contents of include/uapi are not the same as the
> >>> exported kernel headers.
> >>>
> >>> Mixing the unprocessed kernel headers with user headers leads to all sorts of
> >>> mess, eg:
> >>>
> >>> $ cc -I../../../../include/uapi -I../../../../usr/include/ kcmp_test.c   -o kcmp_test
> >>
> >> Do you see this error when you run the compile using kcmp Makefile
> >> or using make ksefltest target?
> > 
> > $ cd tools/testing/selftests
> > $ make TARGETS=kcmp
> > for TARGET in kcmp; do \
> > 	make -C $TARGET; \
> > done;
> > make[1]: Entering directory '/home/michael/work/topics/powerpc-maint/src/misc-test/tools/testing/selftests/kcmp'
> > ppc64-cc -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c   -o kcmp_test
> > In file included from /usr/powerpc-linux-gnu/include/asm/ptrace.h:27:0,
> >                  from /usr/powerpc-linux-gnu/include/asm/sigcontext.h:11,
> >                  from /usr/powerpc-linux-gnu/include/bits/sigcontext.h:27,
> >                  from /usr/powerpc-linux-gnu/include/signal.h:332,
> >                  from kcmp_test.c:5:
> > ../../../../include/uapi/linux/types.h:9:2: warning: #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders" [-Wcpp]
> >  #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders"
> >   ^
> > 
> 
> Hmm..  I don't see this when I run it on x86
> 
> make TARGETS=kcmp
> for TARGET in kcmp; do \
> 	make -C $TARGET; \
> done;
> make[1]: Entering directory
> '/lkml/linux-kselftest/tools/testing/selftests/kcmp'
> cc -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c
>   -o kcmp_test
> make[1]: Leaving directory
> '/lkml/linux-kselftest/tools/testing/selftests/kcmp'

Yeah, but that's basically just luck.

$ cc -w -H -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c   -o kcmp_test
. /usr/include/stdio.h
.. /usr/include/features.h
... /usr/include/x86_64-linux-gnu/sys/cdefs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/gnu/stubs.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/types.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/typesizes.h
.. /usr/include/libio.h
... /usr/include/_G_config.h
.... /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
.... /usr/include/wchar.h
... /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stdarg.h
.. /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
.. /usr/include/x86_64-linux-gnu/bits/sys_errlist.h
. /usr/include/stdlib.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/waitflags.h
.. /usr/include/x86_64-linux-gnu/bits/waitstatus.h
... /usr/include/endian.h
.... /usr/include/x86_64-linux-gnu/bits/endian.h
.... /usr/include/x86_64-linux-gnu/bits/byteswap.h
..... /usr/include/x86_64-linux-gnu/bits/wordsize.h
..... /usr/include/x86_64-linux-gnu/bits/byteswap-16.h
.. /usr/include/xlocale.h
.. /usr/include/x86_64-linux-gnu/sys/types.h
... /usr/include/time.h
... /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
... /usr/include/x86_64-linux-gnu/sys/select.h
.... /usr/include/x86_64-linux-gnu/bits/select.h
..... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/bits/sigset.h
.... /usr/include/time.h
.... /usr/include/x86_64-linux-gnu/bits/time.h
... /usr/include/x86_64-linux-gnu/sys/sysmacros.h
... /usr/include/x86_64-linux-gnu/bits/pthreadtypes.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.. /usr/include/alloca.h
... /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/stdlib-float.h
. /usr/include/signal.h
.. /usr/include/x86_64-linux-gnu/bits/sigset.h
.. /usr/include/x86_64-linux-gnu/bits/signum.h
.. /usr/include/time.h
.. /usr/include/x86_64-linux-gnu/bits/siginfo.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.. /usr/include/x86_64-linux-gnu/bits/sigaction.h
.. /usr/include/x86_64-linux-gnu/bits/sigcontext.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/sigstack.h
.. /usr/include/x86_64-linux-gnu/sys/ucontext.h
... /usr/include/signal.h
.. /usr/include/x86_64-linux-gnu/bits/sigthread.h
. /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed/limits.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed/syslimits.h
... /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed/limits.h
.... /usr/include/limits.h
..... /usr/include/x86_64-linux-gnu/bits/posix1_lim.h
...... /usr/include/x86_64-linux-gnu/bits/local_lim.h
....... ../../../../include/uapi/linux/limits.h
..... /usr/include/x86_64-linux-gnu/bits/posix2_lim.h
..... /usr/include/x86_64-linux-gnu/bits/xopen_lim.h
...... /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
. /usr/include/unistd.h
.. /usr/include/x86_64-linux-gnu/bits/posix_opt.h
.. /usr/include/x86_64-linux-gnu/bits/environments.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/confname.h
.. /usr/include/getopt.h
. /usr/include/errno.h
.. /usr/include/x86_64-linux-gnu/bits/errno.h
... ../../../../include/uapi/linux/errno.h
.... /usr/include/x86_64-linux-gnu/asm/errno.h
..... ../../../../include/uapi/asm-generic/errno.h
...... ../../../../include/uapi/asm-generic/errno-base.h
. /usr/include/string.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.9/include/stddef.h
. /usr/include/fcntl.h
.. /usr/include/x86_64-linux-gnu/bits/fcntl.h
... /usr/include/x86_64-linux-gnu/bits/fcntl-linux.h
.... /usr/include/x86_64-linux-gnu/bits/uio.h
.. /usr/include/time.h
.. /usr/include/x86_64-linux-gnu/bits/stat.h
. ../../../../include/uapi/linux/unistd.h
.. /usr/include/x86_64-linux-gnu/asm/unistd.h
... /usr/include/x86_64-linux-gnu/asm/unistd_64.h
. ../../../../include/uapi/linux/kcmp.h
. /usr/include/x86_64-linux-gnu/sys/syscall.h
.. /usr/include/x86_64-linux-gnu/bits/syscall.h
. /usr/include/x86_64-linux-gnu/sys/stat.h
.. /usr/include/time.h
.. /usr/include/x86_64-linux-gnu/bits/stat.h
. /usr/include/x86_64-linux-gnu/sys/wait.h
.. /usr/include/x86_64-linux-gnu/bits/siginfo.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
. ../kselftest.h
Multiple include guards may be useful for:
../../../../include/uapi/linux/errno.h
/usr/include/errno.h
/usr/include/limits.h
/usr/include/wchar.h
/usr/include/x86_64-linux-gnu/asm/errno.h
/usr/include/x86_64-linux-gnu/bits/byteswap-16.h
/usr/include/x86_64-linux-gnu/bits/byteswap.h
/usr/include/x86_64-linux-gnu/bits/confname.h
/usr/include/x86_64-linux-gnu/bits/endian.h
/usr/include/x86_64-linux-gnu/bits/environments.h
/usr/include/x86_64-linux-gnu/bits/errno.h
/usr/include/x86_64-linux-gnu/bits/fcntl-linux.h
/usr/include/x86_64-linux-gnu/bits/fcntl.h
/usr/include/x86_64-linux-gnu/bits/local_lim.h
/usr/include/x86_64-linux-gnu/bits/select.h
/usr/include/x86_64-linux-gnu/bits/sigaction.h
/usr/include/x86_64-linux-gnu/bits/signum.h
/usr/include/x86_64-linux-gnu/bits/sigstack.h
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h
/usr/include/x86_64-linux-gnu/bits/sys_errlist.h
/usr/include/x86_64-linux-gnu/bits/syscall.h
/usr/include/x86_64-linux-gnu/bits/time.h
/usr/include/x86_64-linux-gnu/bits/typesizes.h
/usr/include/x86_64-linux-gnu/bits/uio.h
/usr/include/x86_64-linux-gnu/bits/waitflags.h
/usr/include/x86_64-linux-gnu/bits/waitstatus.h
/usr/include/x86_64-linux-gnu/gnu/stubs-64.h
/usr/include/x86_64-linux-gnu/gnu/stubs.h
/usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed/syslimits.h


As you can see it uses a few uapi headers, but not many, and we seem to get
away with it.

On powerpc instead we have:

$ ppc64-cc -w -H -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c   -o kcmp_test
. /usr/powerpc-linux-gnu/include/stdio.h
.. /usr/powerpc-linux-gnu/include/features.h
... /usr/powerpc-linux-gnu/include/sys/cdefs.h
.... /usr/powerpc-linux-gnu/include/bits/wordsize.h
... /usr/powerpc-linux-gnu/include/gnu/stubs.h
.... /usr/powerpc-linux-gnu/include/bits/wordsize.h
.... /usr/powerpc-linux-gnu/include/gnu/stubs-32.h
.. /usr/lib/gcc-cross/powerpc-linux-gnu/4.9/include/stddef.h
.. /usr/powerpc-linux-gnu/include/bits/types.h
... /usr/powerpc-linux-gnu/include/bits/wordsize.h
... /usr/powerpc-linux-gnu/include/bits/typesizes.h
.. /usr/powerpc-linux-gnu/include/libio.h
... /usr/powerpc-linux-gnu/include/_G_config.h
.... /usr/lib/gcc-cross/powerpc-linux-gnu/4.9/include/stddef.h
.... /usr/powerpc-linux-gnu/include/wchar.h
... /usr/lib/gcc-cross/powerpc-linux-gnu/4.9/include/stdarg.h
.. /usr/powerpc-linux-gnu/include/bits/stdio_lim.h
.. /usr/powerpc-linux-gnu/include/bits/sys_errlist.h
. /usr/powerpc-linux-gnu/include/stdlib.h
.. /usr/lib/gcc-cross/powerpc-linux-gnu/4.9/include/stddef.h
.. /usr/powerpc-linux-gnu/include/bits/waitflags.h
.. /usr/powerpc-linux-gnu/include/bits/waitstatus.h
... /usr/powerpc-linux-gnu/include/endian.h
.... /usr/powerpc-linux-gnu/include/bits/endian.h
.... /usr/powerpc-linux-gnu/include/bits/byteswap.h
..... /usr/powerpc-linux-gnu/include/bits/byteswap-16.h
.. /usr/powerpc-linux-gnu/include/xlocale.h
.. /usr/powerpc-linux-gnu/include/sys/types.h
... /usr/powerpc-linux-gnu/include/time.h
... /usr/lib/gcc-cross/powerpc-linux-gnu/4.9/include/stddef.h
... /usr/powerpc-linux-gnu/include/sys/select.h
.... /usr/powerpc-linux-gnu/include/bits/select.h
.... /usr/powerpc-linux-gnu/include/bits/sigset.h
.... /usr/powerpc-linux-gnu/include/time.h
.... /usr/powerpc-linux-gnu/include/bits/time.h
... /usr/powerpc-linux-gnu/include/sys/sysmacros.h
... /usr/powerpc-linux-gnu/include/bits/pthreadtypes.h
.... /usr/powerpc-linux-gnu/include/bits/wordsize.h
.. /usr/powerpc-linux-gnu/include/alloca.h
... /usr/lib/gcc-cross/powerpc-linux-gnu/4.9/include/stddef.h
.. /usr/powerpc-linux-gnu/include/bits/stdlib-float.h
. /usr/powerpc-linux-gnu/include/signal.h
.. /usr/powerpc-linux-gnu/include/bits/sigset.h
.. /usr/powerpc-linux-gnu/include/bits/signum.h
.. /usr/powerpc-linux-gnu/include/time.h
.. /usr/powerpc-linux-gnu/include/bits/siginfo.h
... /usr/powerpc-linux-gnu/include/bits/wordsize.h
.. /usr/powerpc-linux-gnu/include/bits/sigaction.h
.. /usr/powerpc-linux-gnu/include/bits/sigcontext.h
... /usr/powerpc-linux-gnu/include/asm/sigcontext.h
.... /usr/powerpc-linux-gnu/include/asm/ptrace.h
..... ../../../../include/uapi/linux/types.h
...... /usr/powerpc-linux-gnu/include/asm/types.h
....... ../../../../include/uapi/asm-generic/int-ll64.h
........ /usr/powerpc-linux-gnu/include/asm/bitsperlong.h
......... ../../../../include/uapi/asm-generic/bitsperlong.h
...... ../../../../include/uapi/linux/posix_types.h
....... ../../../../include/uapi/linux/stddef.h
In file included from ../../../../include/uapi/linux/posix_types.h:4:0,
                 from ../../../../include/uapi/linux/types.h:13,
                 from /usr/powerpc-linux-gnu/include/asm/ptrace.h:27,
                 from /usr/powerpc-linux-gnu/include/asm/sigcontext.h:11,
                 from /usr/powerpc-linux-gnu/include/bits/sigcontext.h:27,
                 from /usr/powerpc-linux-gnu/include/signal.h:332,
                 from kcmp_test.c:5:
../../../../include/uapi/linux/stddef.h:1:28: fatal error: linux/compiler.h: No such file or directory
 #include <linux/compiler.h>
                            ^
compilation terminated.


So things start to go badly when we include asm/sigcontext.h. It just comes
down to the exact sequence the headers are included, and we can't do much about
that.

cheers

^ permalink raw reply

* Re: [PATCH 2/2] selftests/timers: change to use shared logic to run and install tests
From: Michael Ellerman @ 2015-03-19  0:21 UTC (permalink / raw)
  To: Shuah Khan; +Cc: John Stultz, Thomas Gleixner, Linux API, lkml
In-Reply-To: <55099FD8.1080502-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>

On Wed, 2015-03-18 at 09:55 -0600, Shuah Khan wrote:
> On 03/15/2015 08:48 PM, Michael Ellerman wrote:
> > On Fri, 2015-03-13 at 20:14 -0700, John Stultz wrote:
> >> My only thoughts:
> >> 1) Would it be better if threadtest can be made to have better
> >> defaults for kselftest so you don't need that extra logic?
> > 
> > That would help. But with the patch I just sent I think it's no bother, it's
> > only a little extra logic and it's only in the timers Makefile.
> 
> Let's go with a threadtest patch with better defaults. It will emit
> scripts logic as well. Awesome. I just saw John's new patch in my
> Inbox. Thanks.

Fine by me.

> >> 2) While I get that TEST_FILES is likely going to be used to copy the
> >> destructive tests over, It feels a little like its being bundled in
> >> with something like data files that tests might need, which seems sort
> >> of hackish. Would TEST_PROGS_EXTENDED or something be more clear and
> >> make more sense?
> > 
> > That doesn't really bother me. You're right that TEST_FILES is originally
> > intended for data files etc. but I don't think it's a big hack to use it for
> > other tests that shouldn't be run by default. Still if it bothers you I'm happy
> > to add a separate variable for it, they are cheap :)
> 
> Could you please make the change from TEST_FILES to TEST_PROGS_EXTENDED
> which is definitely better than overloading TEST_FILES?

I don't see the point, but I don't care that much.

> I think this would probably only change Add Install target patch. You
> can send me patch v5 with that change and I will override the next with
> your new one. Thanks for doing this.

You shouldn't rebase your next branch, people may have already merged it (like
me). You should apply it as an additional patch on top.

Patch sent.

cheers

^ permalink raw reply

* [PATCH v2] selftests/timers: Use shared logic to run and install tests
From: Michael Ellerman @ 2015-03-19  0:20 UTC (permalink / raw)
  To: shuahkh-JPH+aEBZ4P+UEJcrhfAQsw
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Change the timers Makefile to make use of shared run and install logic
in lib.mk. Destructive tests are installed but not run by default.

Add a new variable, TEST_PROGS_EXTENDED, which is a list of extra
programs to install, but which are not run by the default run_tests
logic.

Signed-off-by: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
---

v2: Add and use TEST_PROGS_EXTENDED.
    Just use the standard run_tests/install/emit_tests, this relies on the
    patch from John to change the defaults for threadtest.

 tools/testing/selftests/lib.mk          |  2 +-
 tools/testing/selftests/timers/Makefile | 27 ++++++++++++---------------
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 7bd3dabe2846..0baf7d32a67d 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -9,7 +9,7 @@ run_tests: all
 
 define INSTALL_RULE
 	mkdir -p $(INSTALL_PATH)
-	install -t $(INSTALL_PATH) $(TEST_PROGS) $(TEST_FILES)
+	install -t $(INSTALL_PATH) $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES)
 endef
 
 install: all
diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile
index 9da3498987c8..670aebdb4a99 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -2,24 +2,21 @@ CC = $(CROSS_COMPILE)gcc
 BUILD_FLAGS = -DKTEST
 CFLAGS += -O3 -Wl,-no-as-needed -Wall $(BUILD_FLAGS)
 LDFLAGS += -lrt -lpthread
-bins = posix_timers nanosleep inconsistency-check nsleep-lat raw_skew \
-	set-timer-lat threadtest mqueue-lat valid-adjtimex \
-	alarmtimer-suspend change_skew skew_consistency clocksource-switch \
-	leap-a-day leapcrash set-tai set-2038
-
-all: ${bins}
 
 # these are all "safe" tests that don't modify
 # system time or require escalated privledges
-run_tests: all
-	./posix_timers
-	./nanosleep
-	./nsleep-lat
-	./set-timer-lat
-	./mqueue-lat
-	./inconsistency-check
-	./raw_skew
-	./threadtest -t 30 -n 8
+TEST_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \
+	     inconsistency-check raw_skew threadtest
+
+TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex change_skew \
+		      skew_consistency clocksource-switch leap-a-day \
+		      leapcrash set-tai set-2038
+
+bins = $(TEST_PROGS) $(TEST_PROGS_EXTENDED)
+
+all: ${bins}
+
+include ../lib.mk
 
 # these tests require escalated privledges
 # and may modify the system time or trigger
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH] selftests: Fix kcmp build to not require headers install
From: Shuah Khan @ 2015-03-19  0:18 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: gorcunov, tranmanphong, akpm, linux-api, linux-kernel, Shuah Khan
In-Reply-To: <1426723357.14101.1.camel@ellerman.id.au>

On 03/18/2015 06:02 PM, Michael Ellerman wrote:
> On Wed, 2015-03-18 at 09:04 -0600, Shuah Khan wrote:
>> On 03/16/2015 05:00 AM, Michael Ellerman wrote:
>>> On Fri, 2015-03-13 at 19:45 -0600, Shuah Khan wrote:
>>>> Change CFLAGS to look in uapi to allow kcmp to be built without
>>>> requiring headers install. This will make it easier to run tests
>>>> without going through the headers install step.
>>>>
>>>> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
>>>> ---
>>>>  tools/testing/selftests/kcmp/Makefile | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
>>>> index ff0eefd..d405ad4 100644
>>>> --- a/tools/testing/selftests/kcmp/Makefile
>>>> +++ b/tools/testing/selftests/kcmp/Makefile
>>>> @@ -1,5 +1,5 @@
>>>>  CC := $(CROSS_COMPILE)$(CC)
>>>> -CFLAGS += -I../../../../usr/include/
>>>> +CFLAGS += -I../../../../include/uapi -I../../../../usr/include/
>>>
>>> Hi Shuah,
>>>
>>> Sorry but this is wrong. The contents of include/uapi are not the same as the
>>> exported kernel headers.
>>>
>>> Mixing the unprocessed kernel headers with user headers leads to all sorts of
>>> mess, eg:
>>>
>>> $ cc -I../../../../include/uapi -I../../../../usr/include/ kcmp_test.c   -o kcmp_test
>>
>> Do you see this error when you run the compile using kcmp Makefile
>> or using make ksefltest target?
> 
> $ cd tools/testing/selftests
> $ make TARGETS=kcmp
> for TARGET in kcmp; do \
> 	make -C $TARGET; \
> done;
> make[1]: Entering directory '/home/michael/work/topics/powerpc-maint/src/misc-test/tools/testing/selftests/kcmp'
> ppc64-cc -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c   -o kcmp_test
> In file included from /usr/powerpc-linux-gnu/include/asm/ptrace.h:27:0,
>                  from /usr/powerpc-linux-gnu/include/asm/sigcontext.h:11,
>                  from /usr/powerpc-linux-gnu/include/bits/sigcontext.h:27,
>                  from /usr/powerpc-linux-gnu/include/signal.h:332,
>                  from kcmp_test.c:5:
> ../../../../include/uapi/linux/types.h:9:2: warning: #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders" [-Wcpp]
>  #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders"
>   ^
> 

Hmm..  I don't see this when I run it on x86

make TARGETS=kcmp
for TARGET in kcmp; do \
	make -C $TARGET; \
done;
make[1]: Entering directory
'/lkml/linux-kselftest/tools/testing/selftests/kcmp'
cc -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c
  -o kcmp_test
make[1]: Leaving directory
'/lkml/linux-kselftest/tools/testing/selftests/kcmp'

oh well. I don't want to make the change, since it introduces warnings
on powerpc.

-- Shuah


-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978

^ permalink raw reply

* Re: [PATCH v2] selftests: Fix build failures when invoked from kselftest target
From: Michael Ellerman @ 2015-03-19  0:05 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1426701942-25217-1-git-send-email-shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>

On Wed, 2015-03-18 at 12:05 -0600, Shuah Khan wrote:
> Several tests that rely on implicit build rules fail to build,
> when invoked from the main Makefile kselftest target. These
> failures are due to --no-builtin-rules and --no-builtin-variables
> options set in the inherited MAKEFLAGS.
> 
> --no-builtin-rules eliminates the use of built-in implicit rules
> and --no-builtin-variables is for not defining built-in variables.
> These two options override the use of implicit rules resulting in
> build failures. In addition, inherited LDFLAGS result in build
> failures and there is no need to define LDFLAGS.  Clear LDFLAGS
> and MAKEFLAG when make is invoked from the main Makefile kselftest
> target. Fixing this at selftests Makefile avoids changing the main
> Makefile and keeps this change self contained at selftests level.

Yep this looks good.

Acked-by: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>

cheers

^ permalink raw reply

* Re: [PATCH] selftests: Fix kcmp build to not require headers install
From: Michael Ellerman @ 2015-03-19  0:02 UTC (permalink / raw)
  To: Shuah Khan
  Cc: gorcunov-GEFAQzZX7r8dnm+yROfE0A,
	tranmanphong-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <55099415.5060400-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>

On Wed, 2015-03-18 at 09:04 -0600, Shuah Khan wrote:
> On 03/16/2015 05:00 AM, Michael Ellerman wrote:
> > On Fri, 2015-03-13 at 19:45 -0600, Shuah Khan wrote:
> >> Change CFLAGS to look in uapi to allow kcmp to be built without
> >> requiring headers install. This will make it easier to run tests
> >> without going through the headers install step.
> >>
> >> Signed-off-by: Shuah Khan <shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>
> >> ---
> >>  tools/testing/selftests/kcmp/Makefile | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
> >> index ff0eefd..d405ad4 100644
> >> --- a/tools/testing/selftests/kcmp/Makefile
> >> +++ b/tools/testing/selftests/kcmp/Makefile
> >> @@ -1,5 +1,5 @@
> >>  CC := $(CROSS_COMPILE)$(CC)
> >> -CFLAGS += -I../../../../usr/include/
> >> +CFLAGS += -I../../../../include/uapi -I../../../../usr/include/
> > 
> > Hi Shuah,
> > 
> > Sorry but this is wrong. The contents of include/uapi are not the same as the
> > exported kernel headers.
> > 
> > Mixing the unprocessed kernel headers with user headers leads to all sorts of
> > mess, eg:
> > 
> > $ cc -I../../../../include/uapi -I../../../../usr/include/ kcmp_test.c   -o kcmp_test
> 
> Do you see this error when you run the compile using kcmp Makefile
> or using make ksefltest target?

$ cd tools/testing/selftests
$ make TARGETS=kcmp
for TARGET in kcmp; do \
	make -C $TARGET; \
done;
make[1]: Entering directory '/home/michael/work/topics/powerpc-maint/src/misc-test/tools/testing/selftests/kcmp'
ppc64-cc -I../../../../include/uapi -I../../../../usr/include/    kcmp_test.c   -o kcmp_test
In file included from /usr/powerpc-linux-gnu/include/asm/ptrace.h:27:0,
                 from /usr/powerpc-linux-gnu/include/asm/sigcontext.h:11,
                 from /usr/powerpc-linux-gnu/include/bits/sigcontext.h:27,
                 from /usr/powerpc-linux-gnu/include/signal.h:332,
                 from kcmp_test.c:5:
../../../../include/uapi/linux/types.h:9:2: warning: #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders" [-Wcpp]
 #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders"
  ^

etc.

I don't use the kselftest target, because I'm cross compiling, and it assumes
you are running natively (it runs the tests). (And it doesn't work at the
moment because of -rR and LDFLAGS).

> Do you consider running cc directly isn't a valid use-case for
> this test?

Running cc directly is completely legitimate. It makes no difference anyway.

> I see your point that kernel headers shouldn't be included from user
> space. 

Good!

> But kcmp_test isn't really a regular use space app. It is
> intended for testing kernel. 

It is 100% a regular user space app. It tests the kernel using syscalls, just
like every other user space app.

> There are other tests that do include uapi headers.

They are also wrong and should be fixed.
 
> Requiring header install is a big hammer. Thinking about a developer
> use-case, if a developer is building and testing several kernels, it
> is tedious install headers each time and remembering to cleanup headers.

But if you are testing a kernel without installing the headers for that
kernel, you are not actually testing that kernel. You're testing that kernel
with some other kernel's headers, which is something different.

Because we are (generally) careful not to break the userspace API, it usually
works to build things with one kernels headers and run them on another kernel.
But if you're interested in testing a particular kernel version properly then
you absolutely must install that kernel's headers before building the tests.

cheers

^ permalink raw reply

* Re: [PATCH v3  00/15] Add support to STMicroelectronics STM32 family
From: Chanwoo Choi @ 2015-03-18 23:35 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: u.kleine-koenig, afaerber, geert, Rob Herring, Philipp Zabel,
	Linus Walleij, Arnd Bergmann, stefan, pmeerw, pebolle,
	Jonathan Corbet, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, Russell King, Daniel Lezcano, Thomas Gleixner,
	Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, David S. Miller,
	Mauro Carvalho Chehab, Joe Perches, Antti Palosaari, Tejun Heo
In-Reply-To: <1426197361-19290-1-git-send-email-maxime.coquelin@st.com>

Hi Maxime,

I tested this patch-set on Linux 4.0-rc4 for STM32F429IDISCOVERY board.
I completed the kernel booting successfully without any modification .

Looks good to me for this patch-set.

Tested-by: Chanwoo Choi <cw00.choi@samsung.com>

Best Regards,
Chanwoo Choi

On 03/13/2015 06:55 AM, Maxime Coquelin wrote:
> From: Maxime Coquelin <mcoquelin.stm32@gmail.com>
> 
> This third round tries to address most of the comments made on previous series.
> 
> It contains few less patches, as the reset_controller_of_init() patch has been
> removed, now that the bootlaoder handles the reset of the timers.
> 
> The pinctrl driver has also been removed after Linus review.
> It will be reworked to use the generic pinconf bindings, and may contain
> changes for other machines (Mediatek), to add support for pinmux property
> handling directly in pinconf-generic.
> 
> STM32 MCUs are Cortex-M CPU, used in various applications (consumer
> electronics, industrial applications, hobbyists...).
> Datasheets, user and programming manuals are publicly available on
> STMicroelectronics website.
> 
> With this series applied, the STM32F419 Discovery can boot succesfully.
> 
> 
> Changes since v2:
> -----------------
>  - Remove pinctrl driver from the series. 
>  - Remove reset_controller_of_init(), and reset the timers in the bootloader
>  - Add HW flow contrl property for serial driver
>  - Lots of changes in the DTS file, as per Andreas recommendations
>  - Some Kconfig clean-ups
>  - Adapt the config to be compatible with Andreas' bootwrapper, except UART port.
>  - Various fixes in documentation
> 
> Changes since v1:
> -----------------
>  - Move bindings documentation in their own patches (Andreas)
>  - Rename ARM System timer to armv7m-systick (Rob)
>  - Add clock-frequency property handling in armv7m-systick (Rob)
>  - Re-factor the reset controllers into a single controller (Philipp)
>  - Add kerneldoc to reset_controller_of_init (Philipp)
>  - Add named constants in include/dt-bindings/reset/ (Philipp)
>  - Make pinctrl driver to depend on ARCH_STM32 or COMPILE_TEST (Geert)
>  - Introduce CPUV7M_NUM_IRQ config flag to indicate the number of interrupts
> supported by the MCU, in order to limit memory waste in vectors' table (Uwe)
> 
> 
> Maxime Coquelin (15):
>   scripts: link-vmlinux: Don't pass page offset to kallsyms if XIP
>     Kernel
>   ARM: ARMv7-M: Enlarge vector table up to 256 entries
>   dt-bindings: Document the ARM System timer bindings
>   clocksource: Add ARM System timer driver
>   dt-bindings: Document the STM32 reset bindings
>   drivers: reset: Add STM32 reset driver
>   dt-bindings: Document the STM32 timer bindings
>   clockevent: Add STM32 Timer driver
>   dt-bindings: Document the STM32 USART bindings
>   serial: stm32-usart: Add STM32 USART Driver
>   ARM: Add STM32 family machine
>   ARM: dts: Add ARM System timer as clockevent in armv7m
>   ARM: dts: Introduce STM32F429 MCU
>   ARM: configs: Add STM32 defconfig
>   MAINTAINERS: Add entry for STM32 MCUs
> 
>  Documentation/arm/stm32/overview.txt               |  32 +
>  Documentation/arm/stm32/stm32f429-overview.txt     |  22 +
>  .../devicetree/bindings/arm/armv7m_systick.txt     |  26 +
>  .../devicetree/bindings/reset/st,stm32-rcc.txt     | 102 +++
>  .../devicetree/bindings/serial/st,stm32-usart.txt  |  32 +
>  .../devicetree/bindings/timer/st,stm32-timer.txt   |  22 +
>  MAINTAINERS                                        |   8 +
>  arch/arm/Kconfig                                   |  18 +
>  arch/arm/Makefile                                  |   1 +
>  arch/arm/boot/dts/Makefile                         |   1 +
>  arch/arm/boot/dts/armv7-m.dtsi                     |   6 +
>  arch/arm/boot/dts/stm32f429-disco.dts              |  71 +++
>  arch/arm/boot/dts/stm32f429.dtsi                   | 226 +++++++
>  arch/arm/configs/stm32_defconfig                   |  71 +++
>  arch/arm/kernel/entry-v7m.S                        |  13 +-
>  arch/arm/mach-stm32/Makefile                       |   1 +
>  arch/arm/mach-stm32/Makefile.boot                  |   3 +
>  arch/arm/mach-stm32/board-dt.c                     |  19 +
>  arch/arm/mm/Kconfig                                |  15 +
>  drivers/clocksource/Kconfig                        |  15 +
>  drivers/clocksource/Makefile                       |   2 +
>  drivers/clocksource/armv7m_systick.c               |  78 +++
>  drivers/clocksource/timer-stm32.c                  | 184 ++++++
>  drivers/reset/Makefile                             |   1 +
>  drivers/reset/reset-stm32.c                        | 125 ++++
>  drivers/tty/serial/Kconfig                         |  17 +
>  drivers/tty/serial/Makefile                        |   1 +
>  drivers/tty/serial/stm32-usart.c                   | 695 +++++++++++++++++++++
>  include/uapi/linux/serial_core.h                   |   3 +
>  scripts/link-vmlinux.sh                            |   2 +-
>  30 files changed, 1807 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/arm/stm32/overview.txt
>  create mode 100644 Documentation/arm/stm32/stm32f429-overview.txt
>  create mode 100644 Documentation/devicetree/bindings/arm/armv7m_systick.txt
>  create mode 100644 Documentation/devicetree/bindings/reset/st,stm32-rcc.txt
>  create mode 100644 Documentation/devicetree/bindings/serial/st,stm32-usart.txt
>  create mode 100644 Documentation/devicetree/bindings/timer/st,stm32-timer.txt
>  create mode 100644 arch/arm/boot/dts/stm32f429-disco.dts
>  create mode 100644 arch/arm/boot/dts/stm32f429.dtsi
>  create mode 100644 arch/arm/configs/stm32_defconfig
>  create mode 100644 arch/arm/mach-stm32/Makefile
>  create mode 100644 arch/arm/mach-stm32/Makefile.boot
>  create mode 100644 arch/arm/mach-stm32/board-dt.c
>  create mode 100644 drivers/clocksource/armv7m_systick.c
>  create mode 100644 drivers/clocksource/timer-stm32.c
>  create mode 100644 drivers/reset/reset-stm32.c
>  create mode 100644 drivers/tty/serial/stm32-usart.c
> 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox