Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [PATCH v3] Add virtio-input driver.
From: Gerd Hoffmann @ 2015-03-24 10:26 UTC (permalink / raw)
  To: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b
  Cc: virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mst-H+wXaHxf7aLQT0dZR+AlfA, David Herrmann, Dmitry Torokhov,
	Rusty Russell, open list, open list:ABI/API
In-Reply-To: <1427182321-19451-1-git-send-email-kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

  Hi,

> +static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
> +{
> +	u32 mi, ma, re, fu, fl;
> +
> +	virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.res, &re);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);
> +	input_set_abs_params(vi->idev, abs, mi, ma, fu, fl);
> +	input_abs_set_res(vi->idev, abs, re);
> +}

> +struct virtio_input_absinfo {
> +	__virtio32  min;
> +	__virtio32  max;
> +	__virtio32  fuzz;
> +	__virtio32  flat;
> +	__virtio32  res;
> +};

Damn, had sparse disabled for the test builds.  [ Too bad there are way
too many warnings on a full kernel build so having sparse enabled all
the time doesn't fly. ]

So this doesn't work either.

Hmm, back to using "u32" in the virtio config structs?

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH v2 1/7] eeprom: Add a simple EEPROM framework for eeprom providers
From: Srinivas Kandagatla @ 2015-03-24  9:18 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Maxime Ripard,
	Rob Herring, Pawel Moll, Kumar Gala,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd, Arnd Bergmann,
	Greg Kroah-Hartman, linux-arm-msm-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <55108E2B.7050305-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>



On 23/03/15 22:05, Srinivas Kandagatla wrote:
>
>
> On 23/03/15 21:09, Mark Brown wrote:
>> On Fri, Mar 13, 2015 at 09:50:14AM +0000, Srinivas Kandagatla wrote:
>>
>> A couple of *very* minor points below, otherwise this looks OK to me.
>>
> Thankyou for the review.
>
>>> +struct eeprom_device *eeprom_register(struct eeprom_config *config)
>>> +{
>>> +    struct eeprom_device *eeprom;
>>> +    int rval;
>>> +
>>> +    if (!config->regmap || !config->size) {
>>> +        dev_err(config->dev, "Regmap not found\n");
>>> +        return ERR_PTR(-EINVAL);
>>> +    }
>>
>> You have a struct device in the config and the regmap API has
>> dev_get_regmap() which for most devices that don't have multiple regmaps
>> will give the right regmap.  It would be nice to support this as a
>> convenience for users.
> Yes, sure that makes sense, I will give it a try.
>
I did try your suggestion, by which I could remove the regmap from 
config. One thing I did not like was eeprom-core getting size/stride 
info directly from providers and regmap from regmap apis. I was 
wondering if we could take a step further and introduce new regmap 
helpers like

regmap_get_size(regmap)
regmap_get_stride(regmap)

Which would be give eeprom-core the size and stride info, doing this way 
would cut down regmap related things from eeprom_config structure to 
minimal and also the source of information would come from just regmap apis.


--srini

>>
>>> +    eeprom = kzalloc(sizeof(*eeprom), GFP_KERNEL);
>>> +    if (!eeprom)
>>> +        return ERR_PTR(-ENOMEM);
>>
>> ...
>>
>>> +    rval = device_add(&eeprom->dev);
>>> +    if (rval)
>>> +        return ERR_PTR(rval);
>>
>> Don't you need a kfree() if device_add() fails?
> I will fix it in next version.
>
> --srini
>>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/3] idle memory tracking
From: Vladimir Davydov @ 2015-03-24  7:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov,
	Cyrill Gorcunov, Jonathan Corbet, linux-api, linux-doc, linux-mm,
	linux-kernel
In-Reply-To: <cover.1426706637.git.vdavydov@parallels.com>

On Wed, Mar 18, 2015 at 11:44:33PM +0300, Vladimir Davydov wrote:
> Usage:
> 
>  1. Write 1 to /proc/sys/vm/set_idle.
> 
>     This will set the IDLE flag for all user pages. The IDLE flag is cleared
>     when the page is read or the ACCESS/YOUNG bit is cleared in any PTE pointing
>     to the page. It is also cleared when the page is freed.
> 
>  2. Wait some time.
> 
>  3. Write 6 to /proc/PID/clear_refs for each PID of interest.
> 
>     This will clear the IDLE flag for recently accessed pages.
> 
>  4. Count the number of idle pages as reported by /proc/kpageflags. One may use
>     /proc/PID/pagemap and/or /proc/kpagecgroup to filter pages that belong to a
>     certain application/container.

Any more thoughts on this? I am particularly interested in the user
interface. I think that /proc/kpagecgroup is OK, but I have my
reservations about using /proc/sys/vm/set_idle and /proc/PID/clear_refs
for setting and clearing the idle flag. The point is it is impossible to
scan memory for setting/clearing page idle flags in the background with
some predefined rate - one has to scan it all at once, which might
result in CPU load spikes on huge machines with TBs of RAM. May be, we'd
better introduce /proc/sys/vm/{set_idle,clear_refs_idle}, which would
receive pfn range to set/clear idle flags?

Any thoughts/ideas are more than welcome.

Thanks,
Vladimir

^ permalink raw reply

* [PATCH v3] Add virtio-input driver.
From: Gerd Hoffmann @ 2015-03-24  7:32 UTC (permalink / raw)
  To: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: mst-H+wXaHxf7aLQT0dZR+AlfA, David Herrmann, Dmitry Torokhov,
	Gerd Hoffmann, Rusty Russell, open list, open list:ABI/API

virtio-input is basically evdev-events-over-virtio, so this driver isn't
much more than reading configuration from config space and forwarding
incoming events to the linux input layer.

Signed-off-by: Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 MAINTAINERS                       |   6 +
 drivers/virtio/Kconfig            |  10 ++
 drivers/virtio/Makefile           |   1 +
 drivers/virtio/virtio_input.c     | 313 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_input.h |  76 +++++++++
 7 files changed, 408 insertions(+)
 create mode 100644 drivers/virtio/virtio_input.c
 create mode 100644 include/uapi/linux/virtio_input.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 358eb01..6f233dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10442,6 +10442,12 @@ S:	Maintained
 F:	drivers/vhost/
 F:	include/uapi/linux/vhost.h
 
+VIRTIO INPUT DRIVER
+M:	Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
+S:	Maintained
+F:	drivers/virtio/virtio_input.c
+F:	include/uapi/linux/virtio_input.h
+
 VIA RHINE NETWORK DRIVER
 M:	Roger Luethi <rl-7uj+XXdSDtwfv37vnLkPlQ@public.gmane.org>
 S:	Maintained
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index b546da5..cab9f3f 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -48,6 +48,16 @@ config VIRTIO_BALLOON
 
 	 If unsure, say M.
 
+config VIRTIO_INPUT
+	tristate "Virtio input driver"
+	depends on VIRTIO
+	depends on INPUT
+	---help---
+	 This driver supports virtio input devices such as
+	 keyboards, mice and tablets.
+
+	 If unsure, say M.
+
  config VIRTIO_MMIO
 	tristate "Platform bus driver for memory mapped virtio devices"
 	depends on HAS_IOMEM
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index d85565b..41e30e3 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
new file mode 100644
index 0000000..cf112b2
--- /dev/null
+++ b/drivers/virtio/virtio_input.c
@@ -0,0 +1,313 @@
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/input.h>
+
+#include <uapi/linux/virtio_ids.h>
+#include <uapi/linux/virtio_input.h>
+
+struct virtio_input {
+	struct virtio_device       *vdev;
+	struct input_dev           *idev;
+	char                       name[64];
+	char                       serial[64];
+	char                       phys[64];
+	struct virtqueue           *evt, *sts;
+	struct virtio_input_event  evts[64];
+	spinlock_t                 lock;
+};
+
+static void virtinput_queue_evtbuf(struct virtio_input *vi,
+				   struct virtio_input_event *evtbuf)
+{
+	struct scatterlist sg[1];
+
+	sg_init_one(sg, evtbuf, sizeof(*evtbuf));
+	virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
+}
+
+static void virtinput_recv_events(struct virtqueue *vq)
+{
+	struct virtio_input *vi = vq->vdev->priv;
+	struct virtio_input_event *event;
+	unsigned long flags;
+	unsigned int len;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	while ((event = virtqueue_get_buf(vi->evt, &len)) != NULL) {
+		input_event(vi->idev,
+			    le16_to_cpu(event->type),
+			    le16_to_cpu(event->code),
+			    le32_to_cpu(event->value));
+		virtinput_queue_evtbuf(vi, event);
+	}
+	virtqueue_kick(vq);
+	spin_unlock_irqrestore(&vi->lock, flags);
+}
+
+static int virtinput_send_status(struct virtio_input *vi,
+				 u16 type, u16 code, s32 value)
+{
+	struct virtio_input_event *stsbuf;
+	struct scatterlist sg[1];
+	unsigned long flags;
+	int rc;
+
+	stsbuf = kzalloc(sizeof(*stsbuf), GFP_ATOMIC);
+	if (!stsbuf)
+		return -ENOMEM;
+
+	stsbuf->type  = cpu_to_le16(type);
+	stsbuf->code  = cpu_to_le16(code);
+	stsbuf->value = cpu_to_le32(value);
+	sg_init_one(sg, stsbuf, sizeof(*stsbuf));
+
+	spin_lock_irqsave(&vi->lock, flags);
+	rc = virtqueue_add_outbuf(vi->sts, sg, 1, stsbuf, GFP_ATOMIC);
+	virtqueue_kick(vi->sts);
+	spin_unlock_irqrestore(&vi->lock, flags);
+
+	if (rc != 0)
+		kfree(stsbuf);
+	return rc;
+}
+
+static void virtinput_recv_status(struct virtqueue *vq)
+{
+	struct virtio_input *vi = vq->vdev->priv;
+	struct virtio_input_event *stsbuf;
+	unsigned long flags;
+	unsigned int len;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	while ((stsbuf = virtqueue_get_buf(vi->sts, &len)) != NULL)
+		kfree(stsbuf);
+	spin_unlock_irqrestore(&vi->lock, flags);
+}
+
+static int virtinput_status(struct input_dev *idev, unsigned int type,
+			    unsigned int code, int value)
+{
+	struct virtio_input *vi = input_get_drvdata(idev);
+
+	return virtinput_send_status(vi, type, code, value);
+}
+
+static size_t virtinput_cfg_select(struct virtio_input *vi,
+				   u8 select, u8 subsel)
+{
+	u8 size;
+
+	virtio_cwrite(vi->vdev, struct virtio_input_config, select, &select);
+	virtio_cwrite(vi->vdev, struct virtio_input_config, subsel, &subsel);
+	virtio_cread(vi->vdev, struct virtio_input_config, size, &size);
+	return size;
+}
+
+static void virtinput_cfg_bits(struct virtio_input *vi, int select, int subsel,
+			       unsigned long *bits, unsigned int bitcount)
+{
+	unsigned int bit;
+	size_t bytes;
+	u8 *virtio_bits;
+
+	bytes = virtinput_cfg_select(vi, select, subsel);
+	if (!bytes)
+		return;
+	if (bitcount > bytes*8)
+		bitcount = bytes*8;
+
+	/*
+	 * Bitmap in virtio config space is a simple stream of bytes,
+	 * with the first byte carrying bits 0-7, second bits 8-15 and
+	 * so on.
+	 */
+	virtio_bits = kzalloc(bytes, GFP_KERNEL);
+	if (!virtio_bits)
+		return;
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
+			   virtio_bits, bytes);
+	for (bit = 0; bit < bitcount; bit++) {
+		if (virtio_bits[bit / 8] & (1 << (bit % 8)))
+			__set_bit(bit, bits);
+	}
+	kfree(virtio_bits);
+
+	if (select == VIRTIO_INPUT_CFG_EV_BITS)
+		__set_bit(subsel, vi->idev->evbit);
+}
+
+static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
+{
+	u32 mi, ma, re, fu, fl;
+
+	virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.res, &re);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);
+	input_set_abs_params(vi->idev, abs, mi, ma, fu, fl);
+	input_abs_set_res(vi->idev, abs, re);
+}
+
+static int virtinput_init_vqs(struct virtio_input *vi)
+{
+	struct virtqueue *vqs[2];
+	vq_callback_t *cbs[] = { virtinput_recv_events,
+				 virtinput_recv_status };
+	static const char * names[] = { "events", "status" };
+	int i, err, size;
+
+	err = vi->vdev->config->find_vqs(vi->vdev, 2, vqs, cbs, names);
+	if (err)
+		return err;
+	vi->evt = vqs[0];
+	vi->sts = vqs[1];
+
+	size = virtqueue_get_vring_size(vi->evt);
+	if (size > ARRAY_SIZE(vi->evts))
+		size = ARRAY_SIZE(vi->evts);
+	for (i = 0; i < size; i++)
+		virtinput_queue_evtbuf(vi, &vi->evts[i]);
+	virtqueue_kick(vi->evt);
+
+	return 0;
+}
+
+static int virtinput_probe(struct virtio_device *vdev)
+{
+	struct virtio_input *vi;
+	size_t size;
+	int abs, err;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+		return -ENODEV;
+
+	vi = kzalloc(sizeof(*vi), GFP_KERNEL);
+	if (!vi)
+		return -ENOMEM;
+
+	vdev->priv = vi;
+	vi->vdev = vdev;
+	spin_lock_init(&vi->lock);
+
+	err = virtinput_init_vqs(vi);
+	if (err)
+		goto err_init_vq;
+
+	vi->idev = input_allocate_device();
+	if (!vi->idev) {
+		err = -ENOMEM;
+		goto err_input_alloc;
+	}
+	input_set_drvdata(vi->idev, vi);
+
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_NAME, 0);
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
+			   vi->name, min(size, sizeof(vi->name)));
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_SERIAL, 0);
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config, u),
+			   vi->serial, min(size, sizeof(vi->serial)));
+	snprintf(vi->phys, sizeof(vi->phys),
+		 "virtio%d/input0", vdev->index);
+	vi->idev->name = vi->name;
+	vi->idev->phys = vi->phys;
+	vi->idev->uniq = vi->serial;
+
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_DEVIDS, 0);
+	if (size >= 8) {
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.bustype, &vi->idev->id.bustype);
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.vendor, &vi->idev->id.vendor);
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.product, &vi->idev->id.product);
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.version, &vi->idev->id.version);
+	} else {
+		vi->idev->id.bustype = BUS_VIRTUAL;
+	}
+
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_PROP_BITS, 0,
+			   vi->idev->propbit, INPUT_PROP_CNT);
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REP);
+	if (size)
+		__set_bit(EV_REP, vi->idev->evbit);
+
+	vi->idev->dev.parent = &vdev->dev;
+	vi->idev->event = virtinput_status;
+
+	/* device -> kernel */
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_KEY,
+			   vi->idev->keybit, KEY_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REL,
+			   vi->idev->relbit, REL_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_ABS,
+			   vi->idev->absbit, ABS_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_MSC,
+			   vi->idev->mscbit, MSC_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SW,
+			   vi->idev->swbit,  SW_CNT);
+
+	/* kernel -> device */
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_LED,
+			   vi->idev->ledbit, LED_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SND,
+			   vi->idev->sndbit, SND_CNT);
+
+	if (test_bit(EV_ABS, vi->idev->evbit)) {
+		for (abs = 0; abs < ABS_CNT; abs++) {
+			if (!test_bit(abs, vi->idev->absbit))
+				continue;
+			virtinput_cfg_abs(vi, abs);
+		}
+	}
+	virtio_device_ready(vdev);
+
+	err = input_register_device(vi->idev);
+	if (err)
+		goto err_input_register;
+
+	return 0;
+
+err_input_register:
+	input_free_device(vi->idev);
+err_input_alloc:
+	vdev->config->del_vqs(vdev);
+err_init_vq:
+	kfree(vi);
+	return err;
+}
+
+static void virtinput_remove(struct virtio_device *vdev)
+{
+	struct virtio_input *vi = vdev->priv;
+
+	input_unregister_device(vi->idev);
+	vdev->config->del_vqs(vdev);
+	kfree(vi);
+}
+
+static unsigned int features[] = {
+};
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_INPUT, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_input_driver = {
+	.driver.name         = KBUILD_MODNAME,
+	.driver.owner        = THIS_MODULE,
+	.feature_table       = features,
+	.feature_table_size  = ARRAY_SIZE(features),
+	.id_table            = id_table,
+	.probe               = virtinput_probe,
+	.remove              = virtinput_remove,
+};
+
+module_virtio_driver(virtio_input_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Virtio input device driver");
+MODULE_AUTHOR("Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 68ceb97..04b829e 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -430,6 +430,7 @@ header-y += virtio_blk.h
 header-y += virtio_config.h
 header-y += virtio_console.h
 header-y += virtio_ids.h
+header-y += virtio_input.h
 header-y += virtio_net.h
 header-y += virtio_pci.h
 header-y += virtio_ring.h
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 284fc3a..5f60aa4 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -39,5 +39,6 @@
 #define VIRTIO_ID_9P		9 /* 9p virtio console */
 #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
 #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
+#define VIRTIO_ID_INPUT        18 /* virtio input */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_input.h b/include/uapi/linux/virtio_input.h
new file mode 100644
index 0000000..7fceabd
--- /dev/null
+++ b/include/uapi/linux/virtio_input.h
@@ -0,0 +1,76 @@
+#ifndef _LINUX_VIRTIO_INPUT_H
+#define _LINUX_VIRTIO_INPUT_H
+/* This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. */
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+
+enum virtio_input_config_select {
+	VIRTIO_INPUT_CFG_UNSET      = 0x00,
+	VIRTIO_INPUT_CFG_ID_NAME    = 0x01,
+	VIRTIO_INPUT_CFG_ID_SERIAL  = 0x02,
+	VIRTIO_INPUT_CFG_ID_DEVIDS  = 0x03,
+	VIRTIO_INPUT_CFG_PROP_BITS  = 0x10,
+	VIRTIO_INPUT_CFG_EV_BITS    = 0x11,
+	VIRTIO_INPUT_CFG_ABS_INFO   = 0x12,
+};
+
+struct virtio_input_absinfo {
+	__virtio32  min;
+	__virtio32  max;
+	__virtio32  fuzz;
+	__virtio32  flat;
+	__virtio32  res;
+};
+
+struct virtio_input_devids {
+	__virtio16  bustype;
+	__virtio16  vendor;
+	__virtio16  product;
+	__virtio16  version;
+};
+
+struct virtio_input_config {
+	__u8    select;
+	__u8    subsel;
+	__u8    size;
+	__u8    reserved;
+	union {
+		char string[128];
+		__u8 bitmap[128];
+		struct virtio_input_absinfo abs;
+		struct virtio_input_devids ids;
+	} u;
+};
+
+struct virtio_input_event {
+	__le16 type;
+	__le16 code;
+	__le32 value;
+};
+
+#endif /* _LINUX_VIRTIO_INPUT_H */
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Aliaksey Kandratsenka @ 2015-03-24  5:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Daniel Micay, Andrew Morton, linux-mm, linux-api, Rik van Riel,
	Hugh Dickins, Mel Gorman, Johannes Weiner, Michal Hocko,
	Andy Lutomirski, google-perftools@googlegroups.com
In-Reply-To: <20150323051731.GA2616341@devbig257.prn2.facebook.com>

Hi.

On Sun, Mar 22, 2015 at 10:17 PM, Shaohua Li <shli@fb.com> wrote:
> On Sat, Mar 21, 2015 at 11:06:14PM -0700, Aliaksey Kandratsenka wrote:

>> But now I realize that it is more interesting than that. I.e. because as
>> Daniel
>> pointed out, mremap holds mmap_sem exclusively, while page faults are
>> holding it
>> for read. That could be optimized of course. Either by separate "teleport
>> ptes"
>> syscall (again, as noted by Daniel), or by having mremap drop mmap_sem for
>> write
>> and retaking it for read for "moving pages" part of work. Being not really
>> familiar with kernel code I have no idea if that's doable or not. But it
>> looks
>> like it might be quite important.
>
> Does mmap_sem contend in your workload? Otherwise, there is no big
> difference of read or write lock. memcpy to new allocation could trigger
> page fault, new page allocation overhead and etc.

Well, I don't have any workloads. I'm just maintaining a library that
others run various workloads on. Part of the problem is lack of good
and varied malloc benchmarks which could allow us that prevent
regression. So this makes me a bit more cautious on performance
matters.

But I see your point. Indeed I have no evidence at all that exclusive
locking might cause observable performance difference.

>> b) is that optimization worth having at all ?
>>
>> After all, memcpy is actually known to be fast. I understand that copying
>> memory
>> in user space can be slowed down by minor page faults (results below seem to
>> confirm that). But this is something where either allocator may retain
>> populated
>> pages a bit longer or where kernel could help. E.g. maybe by exposing
>> something
>> similar to MAP_POPULATE in madvise, or even doing some safe combination of
>> madvise and MAP_UNINITIALIZED.
>
> This option will make allocator use more memory than expected.
> Eventually the memory must be reclaimed, which has big overhead too.
>
>> I've played with Daniel's original benchmark (copied from
>> http://marc.info/?l=linux-mm&m=141230769431688&w=2) with some tiny
>> modifications:
>>

...

>> Another notable thing is how mlock effectively disables MADV_DONTNEED for
>> jemalloc{1,2} and tcmalloc, lowers page faults count and thus improves
>> runtime. It can be seen that tcmalloc+mlock on thp-less configuration is
>> slightly better on runtime to glibc. The later spends a ton of time in
>> kernel,
>> probably handling minor page faults, and the former burns cpu in user space
>> doing memcpy-s. So "tons of memcpys" seems to be competitive to what glibc
>> is
>> doing in this benchmark.
>
> mlock disables MADV_DONTNEED, so this is an unfair comparsion. With it,
> allocator will use more memory than expected.

Do not agree with unfair. I'm actually hoping MADV_FREE to provide
most if not all of benefits of mlock in this benchmark. I believe it's
not too unreasonable expectation.

>
> I'm kind of confused why we talk about THP, mlock here. When application
> uses allocator, it doesn't need to be forced to use THP or mlock. Can we
> forcus on normal case?

See my note on mlock above.

THP it is actually "normal". I know for certain, that many production
workloads are run on boxes with THP enabled. Red Hat famously ships
it's distros with THP set to "always". And I also know that some other
many production workloads are run on boxes with THP disabled. Also, as
seen above, "teleporting" pages is more efficient with THP due to much
smaller overhead of moving those pages. So I felt it was important not
to omit THP in my runs.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Aliaksey Kandratsenka @ 2015-03-24  4:36 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Andrew Morton, Shaohua Li, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Rik van Riel, Hugh Dickins,
	Mel Gorman, Johannes Weiner, Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <550E6D9D.1060507-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Hi.

First of all, I'd like to apologize for messing up formatting of my
past email. I've learned my lesson.

On Sun, Mar 22, 2015 at 12:22 AM, Daniel Micay <danielmicay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> My initial thinking was that we'd likely use mremap in all cases where
>> we know
>> that touching destination would cause minor page faults (i.e. when
>> destination
>> chunk was MADV_DONTNEED-ed or is brand new mapping). And then also
>> always when
>> size is large enough, i.e. because "teleporting" large count of pages is
>> likely
>> to be faster than copying them.
>>
>> But now I realize that it is more interesting than that. I.e. because as
>> Daniel
>> pointed out, mremap holds mmap_sem exclusively, while page faults are
>> holding it
>> for read. That could be optimized of course. Either by separate
>> "teleport ptes"
>> syscall (again, as noted by Daniel), or by having mremap drop mmap_sem
>> for write
>> and retaking it for read for "moving pages" part of work. Being not really
>> familiar with kernel code I have no idea if that's doable or not. But it
>> looks
>> like it might be quite important.
>
> I think it's doable but it would pessimize the case where the dest VMA
> isn't reusable. It would need to optimistically take the reader lock to
> find out and then drop it. However, userspace knows when this is surely
> going to work and could give it a hint.
>
> I have a good idea about what the *ideal* API for the jemalloc/tcmalloc
> case would be. It would be extremely specific though... they want the
> kernel to move pages from a source VMA to a destination VMA where both
> are anon/private with identical flags so only the reader lock is
> necessary. On top of that, they really want to keep around as many
> destination pages as possible, maybe by swapping as many as possible
> back to the source.
>
> That's *extremely* specific though and I now think the best way to get
> there is by landing this feature and then extending it as necessary down
> the road. An allocator may actually want to manage other kinds of
> mappings itself and it would want the mmap_sem optimization to be an
> optional hint.

Interesting. But what might be other users of MREMAP_NOHOLE/MREMAP_RETAIN ?

I believe it can be argued that "exchange vmas/pages" as separate
syscall is actually more general and thus possibly more useful thing
to have. Regardless of locking. And MREMAP_NOHOLE/MREMAP_RETAIN
functionality can be built on top of that syscall in userspace if
needed (with more than one syscall naturally, but maybe still with
relatively small overhead).

I'm not saying this is good idea, but just asking.

And here is another observation just to make sure that more options
are considered.

Given that mremap is holding mmap_sem exclusively, how about userspace
malloc implementation taking some exclusive malloc lock and doing
normal mremap followed by mmap with MAP_FIXED to fill the hole ? It
might end up having largely same overhead. Well, modulo some extra TLB
flushing. But arguably, reducing TLB flushes for sequence of page
table updates could be usefully addressed separately (e.g. maybe by
matching those syscalls, maybe via syslets).

^ permalink raw reply

* Re: [PATCH V2 2/2] af_packet: pass checksum validation status to the user
From: David Miller @ 2015-03-24  2:01 UTC (permalink / raw)
  To: al.drozdov
  Cc: corbet, dborkman, edumazet, viro, willemb, mst, netdev,
	linux-kernel, tklauser, linux-doc, linux-api
In-Reply-To: <1427091073-8129-2-git-send-email-al.drozdov@gmail.com>

From: Alexander Drozdov <al.drozdov@gmail.com>
Date: Mon, 23 Mar 2015 09:11:13 +0300

> Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
> af_packet user that at least the transport header checksum
> has been already validated.
> 
> For now, the flag may be set for incoming packets only.
> 
> Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>

Applied to net-next

^ permalink raw reply

* Re: [PATCH V2 1/2] af_packet: make tpacket_rcv to not set status value before run_filter
From: David Miller @ 2015-03-24  2:01 UTC (permalink / raw)
  To: al.drozdov
  Cc: corbet, dborkman, edumazet, viro, willemb, mst, netdev,
	linux-kernel, tklauser, linux-doc, linux-api
In-Reply-To: <1427091073-8129-1-git-send-email-al.drozdov@gmail.com>

From: Alexander Drozdov <al.drozdov@gmail.com>
Date: Mon, 23 Mar 2015 09:11:12 +0300

> It is just an optimization. We don't need the value of status variable
> if the packet is filtered.
> 
> Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>

Applied to net-next

^ permalink raw reply

* Re: [PATCH v4 00/14] Add kdbus implementation
From: Eric W. Biederman @ 2015-03-24  0:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Herrmann, Greg Kroah-Hartman, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Linux API,
	linux-kernel@vger.kernel.org, Daniel Mack, Djalal Harouni,
	Linus Torvalds
In-Reply-To: <CALCETrXqYBeZuOWhm9mz_nt+aWPXHFwkQPEAfwBXzDxnAP7f+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Mon, Mar 23, 2015 at 8:28 AM, David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi
>>
>> On Thu, Mar 19, 2015 at 4:48 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>> On Thu, Mar 19, 2015 at 4:26 AM, David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

>>> But you're comparing to the wrong thing, IMO.  Of course it's much
>>> faster than /proc hackery, but it's probably much slower to do the
>>> metadata operation once per message than to do it when you connect to
>>> the endpoint.  (Gah!  It's a "bus" that could easily have tons of
>>> users but a single "endpoint".  I'm still not used to it.)
>>
>> Yes, of course your assumption is right if you compare against
>> per-connection caches, instead of per-message metadata. But we do
>> support _both_ use-cases, so we don't impose any policy.
>> We still believe "live"-metadata is a crucial feature of kdbus,
>> despite the known performance penalties.
>
> And you still have not described a single use case for which it's
> better than per-connection metadata.
>
> I'm against adding a feature to the kernel (per-message metadata) if
> the primary reason it's being added is to support what appears to be a
> misfeature in *new* userspace that we have no obligation whatsoever to
> be ABI-compatible with.  This is especially true if that feature is
> slower than the alternatives.  This is even more true if this feature
> is *inconsistent* with legacy userspace (i.e. userspace dbus).
>
> I could be wrong about the lack of use cases.  If so, please enlighten
> me.

Please.  I asked this same question on the first revision of this code
to go up for review and I was told that there are no existing users of
dbus that cares.

Right now this looks like a case of the deplorable habit of getting
review comments, and then resubmitting a patch with trivial changes and
ignoring the substantial review comments.  Again and again and again
until the reviewers run out of energy to object.

That seems like a very poor way to add a new ABI that to the kernel that
we will have to support for the next 20 years, because huge swaths of
userspace are going to be using it.

Not to mention that per message meta-data is known to be both a
performance problem but also that it tends to turn in to a security
problem.  It is the kind of information that is easy to mess up and hard
to support long term.

So I agree with Andy that we really need something better than it would
be nice to have.

Eric

^ permalink raw reply

* Re: [PATCH v4 00/14] Add kdbus implementation
From: Andy Lutomirski @ 2015-03-23 23:24 UTC (permalink / raw)
  To: David Herrmann
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Daniel Mack,
	Djalal Harouni
In-Reply-To: <CANq1E4TtXq02Ug22fsTr8j0X6+vJKvyryALiTEMbT4SMQB1j3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, Mar 23, 2015 at 8:28 AM, David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi
>
> On Thu, Mar 19, 2015 at 4:48 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> On Thu, Mar 19, 2015 at 4:26 AM, David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> metadata handling is local to the connection that sends the message.
>>> It does not affect the overall performance of other bus operations in
>>> parallel.
>>
>> Sure it does if it writes to shared cachelines.  Given that you're
>> incrementing refcounts, I'm reasonable sure that you're touching lots
>> of shared cachelines.
>
> Ok, sure, but it's still mostly local to the sending task. We take
> locks and ref-counts on the task-struct and mm, which is for most
> parts local to the CPU the task runs on. But this is inherent to
> accessing this kind of data, which is the fundamental difference in
> our views here, as seen below..

You're also refcounting the struct cred, and there's no good reason
for that to be local.  (It might be a bit more local than intended
because of the absurd things that the key subsystem does to struct
cred, but IMO users should turn that off or the kernel should fix it.)

Even more globally, I think you're touching init_user_ns's refcount in
most scenarios.  That's about as global as it gets.

(Also, is there an easy benchmark to see how much time it takes to
send and receive metadata?  I tried to get the kdbus test to do this,
and I failed.  I probably did it wrong.)

>
>>> Furthermore, it's way faster than collecting the "same" data
>>> via /proc, so I don't think it slows down the overall transaction at
>>> all. If a receiver doesn't want metadata, it should not request it (by
>>> setting the receiver-metadata-mask). If a sender doesn't like the
>>> overhead, it should not send the metadata (by setting the
>>> sender-metadata-mask). Only if both peers set the metadata mask, it
>>> will be transmitted.
>>
>> But you're comparing to the wrong thing, IMO.  Of course it's much
>> faster than /proc hackery, but it's probably much slower to do the
>> metadata operation once per message than to do it when you connect to
>> the endpoint.  (Gah!  It's a "bus" that could easily have tons of
>> users but a single "endpoint".  I'm still not used to it.)
>
> Yes, of course your assumption is right if you compare against
> per-connection caches, instead of per-message metadata. But we do
> support _both_ use-cases, so we don't impose any policy.
> We still believe "live"-metadata is a crucial feature of kdbus,
> despite the known performance penalties.

And you still have not described a single use case for which it's
better than per-connection metadata.

I'm against adding a feature to the kernel (per-message metadata) if
the primary reason it's being added is to support what appears to be a
misfeature in *new* userspace that we have no obligation whatsoever to
be ABI-compatible with.  This is especially true if that feature is
slower than the alternatives.  This is even more true if this feature
is *inconsistent* with legacy userspace (i.e. userspace dbus).

I could be wrong about the lack of use cases.  If so, please enlighten me.

--Andy

^ permalink raw reply

* Re: [PATCH v2 1/7] eeprom: Add a simple EEPROM framework for eeprom providers
From: Srinivas Kandagatla @ 2015-03-23 22:05 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Maxime Ripard,
	Rob Herring, Pawel Moll, Kumar Gala,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd, Arnd Bergmann,
	Greg Kroah-Hartman, linux-arm-msm-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150323210918.GS14954-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>



On 23/03/15 21:09, Mark Brown wrote:
> On Fri, Mar 13, 2015 at 09:50:14AM +0000, Srinivas Kandagatla wrote:
>
> A couple of *very* minor points below, otherwise this looks OK to me.
>
Thankyou for the review.

>> +struct eeprom_device *eeprom_register(struct eeprom_config *config)
>> +{
>> +	struct eeprom_device *eeprom;
>> +	int rval;
>> +
>> +	if (!config->regmap || !config->size) {
>> +		dev_err(config->dev, "Regmap not found\n");
>> +		return ERR_PTR(-EINVAL);
>> +	}
>
> You have a struct device in the config and the regmap API has
> dev_get_regmap() which for most devices that don't have multiple regmaps
> will give the right regmap.  It would be nice to support this as a
> convenience for users.
Yes, sure that makes sense, I will give it a try.

>
>> +	eeprom = kzalloc(sizeof(*eeprom), GFP_KERNEL);
>> +	if (!eeprom)
>> +		return ERR_PTR(-ENOMEM);
>
> ...
>
>> +	rval = device_add(&eeprom->dev);
>> +	if (rval)
>> +		return ERR_PTR(rval);
>
> Don't you need a kfree() if device_add() fails?
I will fix it in next version.

--srini
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 13/14] twl4030_charger: Increase current carefully while watching voltage.
From: Pavel Machek @ 2015-03-23 21:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Sebastian Reichel, NeilBrown, linux-api, linux-kernel,
	GTA04 owners, inux-pm, linux-omap, Lee Jones
In-Reply-To: <20150322232029.23789.30185.stgit@notabene.brown>

Hi!

> The USB Battery Charging spec (BC1.2) suggests a dedicated
> charging port can deliver from 0.5 to 5.0A at between 4.75 and 5.25
> volts.
> 
> To choose the "correct" current voltage setting requires a trial
> and error approach: try to draw current and see if the voltage drops
> too low.
> 
> Even with a configured Standard Downstream Port, it may not be possible
> to reliably pull 500mA - depending on cable quality and source
> quality I have reports of charging failure due to the voltage dropping
> too low.
> 
> To address both these concerns, this patch introduce incremental
> current setting.
> The current pull from VBUS is increased in steps of 20mA every 100ms
> until the target is reached or until the measure voltage drops below
> 4.75V.  If the voltage does go too low, the target current is reduced
> by 20mA and kept there.

Still nervous. If it is possible to overheat the charger, without
tripping internal fuse, then you'll do it.

> This applies to currents selected automatically, or to values
> set via sysfs.  So setting a large value will cause the maximum
> available to be used - up to the limit of 1.7A imposed by the
> hardware.
>

> +	printk("v=%d cur=%d target=%d\n", v, bci->usb_cur,
> +	       bci->usb_cur_target);

dev_info() and a bit better message, or drop it for production?

> +	if (v < USB_MIN_VOLT) {
> +		/* Back up and stop adjusting. */
> +		bci->usb_cur -= USB_CUR_STEP;
> +		bci->usb_cur_target = bci->usb_cur;

More importantly.... how does it work with device drawing power for
operation, too?

Imagine device need 500mA with wifi hotspot, nearly nothing while idle.

Idle device. Code will find that it can charge using 1A, backs up to
0.9A. User starts hotspot. Now device will draw 1.4A, overloading the
charger and not charging at all...?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 02/14] twl4030_charger: use devres for power_supply_register and kzalloc.
From: Pavel Machek @ 2015-03-23 21:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Sebastian Reichel, NeilBrown, linux-api, linux-kernel,
	GTA04 owners, inux-pm, linux-omap, Lee Jones
In-Reply-To: <20150322232028.23789.38570.stgit@notabene.brown>

On Mon 2015-03-23 10:20:28, NeilBrown wrote:
> From: NeilBrown <neilb@suse.de>
> 
> Final allocations/registrations are now managed by devres.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

Patches 1,2: Acked-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 0/5] Enhancements to twl4030 phy to support better charging - V2
From: Pavel Machek @ 2015-03-23 21:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Kishon Vijay Abraham I, Tony Lindgren,
	linux-api-u79uwXL29TY76Z2rM5mHXA, GTA04 owners,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150322223307.21765.62974.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>

On Mon 2015-03-23 09:35:23, NeilBrown wrote:
> Hi Kishon,
>  I wonder if you could queue the following for the next merge window.
>  They allow the twl4030 phy to provide more information to the
>  twl4030 battery charger.
>  There are only minimal changes since the first version, particularly
>  documentation has been improved.

Patches 1-3 Acked-by: Pavel Machek <pavel-+ZI9xUNit7I@public.gmane.org>
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH v2 1/7] eeprom: Add a simple EEPROM framework for eeprom providers
From: Mark Brown @ 2015-03-23 21:09 UTC (permalink / raw)
  To: Srinivas Kandagatla
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Maxime Ripard,
	Rob Herring, Pawel Moll, Kumar Gala,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Stephen Boyd, Arnd Bergmann,
	Greg Kroah-Hartman, linux-arm-msm-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1426240214-2434-1-git-send-email-srinivas.kandagatla-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 856 bytes --]

On Fri, Mar 13, 2015 at 09:50:14AM +0000, Srinivas Kandagatla wrote:

A couple of *very* minor points below, otherwise this looks OK to me.

> +struct eeprom_device *eeprom_register(struct eeprom_config *config)
> +{
> +	struct eeprom_device *eeprom;
> +	int rval;
> +
> +	if (!config->regmap || !config->size) {
> +		dev_err(config->dev, "Regmap not found\n");
> +		return ERR_PTR(-EINVAL);
> +	}

You have a struct device in the config and the regmap API has
dev_get_regmap() which for most devices that don't have multiple regmaps
will give the right regmap.  It would be nice to support this as a
convenience for users.

> +	eeprom = kzalloc(sizeof(*eeprom), GFP_KERNEL);
> +	if (!eeprom)
> +		return ERR_PTR(-ENOMEM);

...

> +	rval = device_add(&eeprom->dev);
> +	if (rval)
> +		return ERR_PTR(rval);

Don't you need a kfree() if device_add() fails?

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* Re: [PATCH v2 01/11] stm class: Introduce an abstraction for System Trace Module devices
From: Alexander Shishkin @ 2015-03-23 19:41 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Greg Kroah-Hartman, linux-kernel@vger.kernel.org, Paul Bolle,
	peter.lachner-ral2JQCrhuEAvxtiuMwx3w,
	norbert.schulz-ral2JQCrhuEAvxtiuMwx3w,
	keven.boell-ral2JQCrhuEAvxtiuMwx3w,
	yann.fouassier-ral2JQCrhuEAvxtiuMwx3w,
	laurent.fert-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Pratik Patel, Chunyan Zhang,
	Kaixu Xia
In-Reply-To: <CANLsYkycrE-VyyLWLYtBFNijhLP8UejVjDEXYv4FBP88eyZrRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Mathieu Poirier <mathieu.poirier-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> writes:

>> +source "drivers/hwtracing/stm/Kconfig"
>> +
>>  endmenu
>
> When the coresight framework and drivers were submitted for review
> people asked that I move the Kconfig options to
> "arch/arm[64]/kernel.debug", resulting in coresight configurable
> options showing up under the "Kernel Hacking" department.  To me the
> request was not deprived of logic since if one is dealing with HW
> tracing, some serious kernel hacking is likely happening.

To me this is very much a non-sequitur conclusion: if one's likely to
use CONFIG_x for kernel debugging doesn't necessarily mean CONFIG_x is a
strictly kernel debugging feature. One might move serial drivers under
"Kernel Hacking" by the same token. I also suspect that sweeping things
under "Kernel Hacking" is kind of a license to go easy on code review.

> Now that the Intel drivers are coming in, that we have a generic STM,
> and "drivers/hwtracing" has already been created, we should take a
> minute to ponder if tracers for various architecture should go under
> "arch/XYX/kernel.debug" or if we should introduce a new "hwtracing"
> submenu in the drivers list.

I'm in favor of the latter. If some bits in this submenu are strictly
specific to kernel hacking an appropriate dependency can be used.

> Because of the STM sources (which are bound to grow in numbers) I
> _think_ it would be easier to have a new submenu in the drivers list
> but I'm not strongly opinionated on the topic.  Please take a minute
> to think about it and get back to me with your opinion.  I'd also be
> interested to know what other community members think - it's
> definitely not the first time this kind of dilemma happens...

I don't see why any of these should be hidden under Kconfig.debug, they
are mostly device drivers. They shouldn't add any runtime footprint
unless they are actually used and it's up to the user how to use them
(provided we handle capabilities/permissions correctly, which we should
do regardless).

So yes, my opinion -- let's have a submenu in hwtracing.

Regards,
--
Alex

^ permalink raw reply

* Re: [PATCH 1/1] Add virtio-input driver.
From: Michael S. Tsirkin @ 2015-03-23 18:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: virtio-dev, open list, open list:ABI/API, virtualization
In-Reply-To: <1427123129.27137.62.camel@nilsson.home.kraxel.org>

On Mon, Mar 23, 2015 at 04:05:29PM +0100, Gerd Hoffmann wrote:
> On Mo, 2015-03-23 at 15:54 +0100, Michael S. Tsirkin wrote:
> > On Mon, Mar 23, 2015 at 03:27:35PM +0100, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > > > > > You are doing leXXX everywhere, that's VERSION_1 dependency.
> > > > > > > virtio_cread will do byteswaps differently without VERSION_1.
> > > > > > > Just don't go there.
> > > 
> > > > So to clarify, you dislike using __virtio32 in virtio input header?
> > > 
> > > Well, as I understand things __virtio32 implies byteorder depends on
> > > whenever we are using VERSION_1 or not.  And non-transitional drivers
> > > should not need it as everything is by definition little endian.
> > > 
> > > So, yes, your suggestion to just require VERSION_1 in the driver implies
> > > in my eyes that there should be no reason to use __virtio32 instead of
> > > __le32.
> > > 
> > > Or do I miss something here?
> > > 
> > > cheers,
> > >   Gerd
> > > 
> > 
> > You are right but then if you do require VERSION_1 then
> > __virtio32 becomes identical to __le32.
> 
> Except that sparse doesn't know that and throws errors when I mix the
> two.
> 
> > There's some runtime overhead as we check on each access,
> > but it shouldn't matter here, right?
> 
> Correct, config space is used at initialization time only.
> 
> > I guess we could add virtio_cread_le - is this what
> > you'd like?
> 
> I just want something that makes both you and sparse happy.  I don't
> care much whenever that is adding virtio_cread_le() or using __virtio32
> even though it'll effectively is __le32 due to VERSION_1 being required.
> 
> cheers,
>   Gerd
> 

OK so how about we just use __virtio32 everywhere for now?

-- 
MST

^ permalink raw reply

* Re: [PATCH v10 tip 5/9] tracing: allow BPF programs to call bpf_trace_printk()
From: Alexei Starovoitov @ 2015-03-23 17:50 UTC (permalink / raw)
  To: Ingo Molnar, David Laight
  Cc: Steven Rostedt, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
	Masami Hiramatsu, David S. Miller, Daniel Borkmann,
	Peter Zijlstra, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20150323120753.GA22560-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 3/23/15 5:07 AM, Ingo Molnar wrote:
>
> * David Laight <David.Laight-ZS65k/vG3HxXrIkS9f7CXA@public.gmane.org> wrote:
>
>> From: Alexei Starovoitov
>>> Debugging of BPF programs needs some form of printk from the program,
>>> so let programs call limited trace_printk() with %d %u %x %p modifiers only.
>>
>> Should anyone be allowed to use BPF programs to determine the kernel
>> addresses of any items?
>> Looks as though it is leaking kernel addresses to userspace.
>> Note that the problem is with the arguments, not the format string.
>
> All of these are privileged operations - inherent if you are trying to
> debug the kernel.

yep.

There is a plan to add 'pointer leak detector' to bpf verifier and
'constant blinding' pass, so in the future we may let unprivileged
users load programs. seccomp will be first such user. But it will
take long time.

^ permalink raw reply

* Re: [PATCH v9 tip 8/9] samples: bpf: IO latency analysis (iosnoop/heatmap)
From: Alexei Starovoitov @ 2015-03-23 17:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
	Masami Hiramatsu, David S. Miller, Daniel Borkmann,
	Peter Zijlstra, linux-api, netdev, linux-kernel
In-Reply-To: <20150323074028.GE25184@gmail.com>

On 3/23/15 12:40 AM, Ingo Molnar wrote:
>
> * Alexei Starovoitov <ast@plumgrid.com> wrote:
>
>> BPF C program attaches to blk_mq_start_request/blk_update_request kprobe events
>> to calculate IO latency.
>
>   ...
>
>> +/* kprobe is NOT a stable ABI
>> + * This bpf+kprobe example can stop working any time.
>> + */
>> +SEC("kprobe/blk_mq_start_request")
>> +int bpf_prog1(struct pt_regs *ctx)
>> +{
>> +	long rq = ctx->di;
>> +	u64 val = bpf_ktime_get_ns();
>> +
>> +	bpf_map_update_elem(&my_map, &rq, &val, BPF_ANY);
>> +	return 0;
>> +}
>
> So just to make sure the original BPF instrumentation model is still
> upheld: no matter in what way the kernel changes, neither the kprobe,
> nor the BPF program can ever crash or corrupt the kernel, assuming the
> kprobes, perf and BPF subsystem has no bugs, correct?

yes. of course. That was always #1 requirement.

> So 'stops working' here means that the instrumentation data might not
> be reliable if kernel internal interfaces change - but it won't ever
> make the kernel unreliable in any fashion. Right?

yes. of course.

The only situations where it can 'stop working':
- in-kernel blk_mq_start_request function is renamed, so kprobe cannot 
find it and cannot attach.
- arguments to blk_mq_start_request change. Then ctx->di can be 
meaningless and using it as key into map is useless.
- whole logic of blk_mq_start_request/blk_update_request pair changes.
then this sample code won't be measuring any useful io latency.

In all cases kernel will never crash (barring bugs in bpf, kprobe 
subsystems).

^ permalink raw reply

* Re: [PATCH v2 7/7] clone4: Add a CLONE_FD flag to get task exit notification via fd
From: David Drysdale @ 2015-03-23 17:38 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Al Viro, Andrew Morton, Andy Lutomirski, Ingo Molnar, Kees Cook,
	Oleg Nesterov, Paul E. McKenney, H. Peter Anvin, Rik van Riel,
	Thomas Gleixner, Michael Kerrisk, Thiago Macieira,
	linux-kernel@vger.kernel.org, Linux API, Linux FS Devel, X86 ML
In-Reply-To: <fdec4b70c7cd34e2eacf6a0e41d36f606a696da1.1426376419.git.josh@joshtriplett.org>

On Sun, Mar 15, 2015 at 8:00 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index 6c4a68d..c90df5a 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -299,6 +299,8 @@ struct compat_clone4_args {
>         compat_ulong_t stack_start;
>         compat_ulong_t stack_size;
>         compat_ulong_t tls;
> +       compat_uptr_t clonefd;
> +       u32 clonefd_flags;
>  };
>
>  struct compat_statfs;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9daa017..1dc680b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1374,6 +1374,11 @@ struct task_struct {
>
>         unsigned autoreap:1; /* Do not become a zombie on exit */
>
> +#ifdef CONFIG_CLONEFD
> +       unsigned clonefd:1; /* Notify clonefd_wqh on exit */
> +       wait_queue_head_t clonefd_wqh;
> +#endif
> +
>         unsigned long atomic_flags; /* Flags needing atomic access. */
>
>         struct restart_block restart_block;

Idle thought: are there any concerns about the occupancy
impact of adding a wait_queue_head to every task_struct,
whether it has a clonefd or not?

I guess we could reduce the size somewhat by just
storing a struct file *clonefd_file in the task, and then have
a separate structure (with the wqh and a task_struct*) referenced
by file->private_data.  Not sure whether the added complication
would be worthwhile, though.

> diff --git a/kernel/clonefd.c b/kernel/clonefd.c
> new file mode 100644
> index 0000000..eac560c
> --- /dev/null
> +++ b/kernel/clonefd.c
> @@ -0,0 +1,121 @@
> +/*
> + * Support functions for CLONE_FD
> + *
> + * Copyright (c) 2015 Intel Corporation
> + * Original authors: Josh Triplett <josh@joshtriplett.org>
> + *                   Thiago Macieira <thiago@macieira.org>
> + */
> +#include <linux/anon_inodes.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/poll.h>
> +#include <linux/slab.h>
> +#include "clonefd.h"
> +
> +static int clonefd_release(struct inode *inode, struct file *file)
> +{
> +       put_task_struct(file->private_data);
> +       return 0;
> +}
> +
> +static unsigned int clonefd_poll(struct file *file, poll_table *wait)
> +{
> +       struct task_struct *p = file->private_data;
> +       poll_wait(file, &p->clonefd_wqh, wait);
> +       return p->exit_state ? (POLLIN | POLLRDNORM | POLLHUP) : 0;
> +}
> +
> +static ssize_t clonefd_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
> +{
> +       struct task_struct *p = file->private_data;
> +       int ret = 0;
> +
> +       /* EOF after first read */
> +       if (*ppos)
> +               return 0;
> +
> +       if (file->f_flags & O_NONBLOCK)
> +               ret = -EAGAIN;
> +       else
> +               ret = wait_event_interruptible(p->clonefd_wqh, p->exit_state);
> +
> +       if (p->exit_state) {
> +               struct clonefd_info info = {};
> +               cputime_t utime, stime;
> +               task_exit_code_status(p->exit_code, &info.code, &info.status);
> +               info.code &= ~__SI_MASK;
> +               task_cputime(p, &utime, &stime);
> +               info.utime = cputime_to_clock_t(utime + p->signal->utime);
> +               info.stime = cputime_to_clock_t(stime + p->signal->stime);
> +               ret = simple_read_from_buffer(buf, count, ppos, &info, sizeof(info));
> +       }
> +       return ret;
> +}
> +
> +static struct file_operations clonefd_fops = {
> +       .release = clonefd_release,
> +       .poll = clonefd_poll,
> +       .read = clonefd_read,
> +       .llseek = no_llseek,
> +};

It might be nice to include a show_fdinfo() implementation that shows
(say) the pid that the clonefd refers to.  E.g. something like:

static void clonefd_show_fdinfo(struct seq_file *m, struct file *file)
{
    struct task_struct *p = file->private_data;

    seq_printf(m, "tid:\t%d\n", task_tgid_vnr(p));
}

> +
> +/* Do process exit notification for clonefd. */
> +void clonefd_do_notify(struct task_struct *p)
> +{
> +       if (p->clonefd)
> +               wake_up_all(&p->clonefd_wqh);
> +}
> +
> +/* Handle the CLONE_FD case for copy_process. */
> +int clonefd_do_clone(u64 clone_flags, struct task_struct *p,
> +                    struct clone4_args *args, struct clonefd_setup *setup)
> +{
> +       int flags;
> +       struct file *file;
> +       int fd;
> +
> +       p->clonefd = !!(clone_flags & CLONE_FD);
> +       if (!p->clonefd)
> +               return 0;
> +
> +       if (args->clonefd_flags & ~(O_CLOEXEC | O_NONBLOCK))
> +               return -EINVAL;
> +

Maybe also check for (args->clonefd == NULL) in advance, and
return -EINVAL or -EFAULT?

> +       init_waitqueue_head(&p->clonefd_wqh);
> +
> +       get_task_struct(p);
> +       flags = O_RDONLY | FMODE_ATOMIC_POS | args->clonefd_flags;
> +       file = anon_inode_getfile("[process]", &clonefd_fops, p, flags);
> +       if (IS_ERR(file)) {
> +               put_task_struct(p);
> +               return PTR_ERR(file);
> +       }
> +
> +       fd = get_unused_fd_flags(flags);
> +       if (fd < 0) {
> +               fput(file);
> +               return fd;
> +       }
> +
> +       setup->fd = fd;
> +       setup->file = file;
> +       return 0;
> +}
> +
> +/* Clean up clonefd information after a partially complete clone */
> +void clonefd_cleanup_failed_clone(struct clonefd_setup *setup)
> +{
> +       if (setup->file) {
> +               put_unused_fd(setup->fd);
> +               fput(setup->file);
> +       }
> +}
> +
> +/* Finish setting up the clonefd */
> +void clonefd_install_fd(struct clone4_args *args, struct clonefd_setup *setup)
> +{
> +       if (setup->file) {
> +               fd_install(setup->fd, setup->file);
> +               put_user(setup->fd, args->clonefd);
> +       }
> +}

^ permalink raw reply

* Re: [PATCH v9 tip 6/9] samples: bpf: simple non-portable kprobe filter example
From: Alexei Starovoitov @ 2015-03-23 17:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
	Masami Hiramatsu, David S. Miller, Daniel Borkmann,
	Peter Zijlstra, linux-api-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150323073506.GC25184-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 3/23/15 12:35 AM, Ingo Molnar wrote:
>
> * Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
>
>> +void read_trace_pipe(void)
>> +{
>> +	int trace_fd;
>> +
>> +	trace_fd = open(DEBUGFS "trace_pipe", O_RDONLY, 0);
>> +	if (trace_fd < 0)
>> +		return;
>> +
>> +	while (1) {
>> +		static char buf[4096];
>> +		ssize_t sz;
>> +
>> +		sz = read(trace_fd, buf, sizeof(buf));
>
> read() will return -1 on failure ...
>
>> +		if (sz) {
>
> ... this test passes ...
>
>> +			buf[sz] = 0;
>
> ... and here we smash the stack?

good point. If it was normal file, for sure it's a bug, but trace_pipe
is a pseudo file and I think read cannot return -1. Regardless, it makes
sense to fix it. Will do. Do you mind I address it as follow up patch?
Or if the rest is ok, can you change the condition to sz>0 while
applying? I can respin the whole thing too, if you like.
Thanks!

^ permalink raw reply

* Re: [PATCH v9 tip 6/9] samples: bpf: simple non-portable kprobe filter example
From: Alexei Starovoitov @ 2015-03-23 17:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa,
	Masami Hiramatsu, David S. Miller, Daniel Borkmann,
	Peter Zijlstra, linux-api-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150323072929.GB25184-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 3/23/15 12:29 AM, Ingo Molnar wrote:
>
>> **                                                      **
>> ** This means that this is a DEBUG kernel and it is     **
>> ** unsafe for production use.                           **
>
> But I think printing that it's unsafe for production use is over the
> top: it's up to the admin whether it's safe or unsafe, just like
> inserting a kprobe can be safe or unsafe.
>
> Informing that something happened is enough.

Well that is Steven's banner and I agree that it's a bit extreme.
I think it's done on purpose to scary people away from using
trace_printk() for anything other than debug.
It applies to both native trace_printk() for kernel debugging and
for bpf_trace_printk() for debugging of bpf programs.
I don't have a strong opinion about native case, but for bpf I do want
this banner to be scary. Otherwise it's too easy to start using
bpf_trace_printk() to pass event notifications to user space.
bpf_trace_printk and trace_pipe parsing shouldn't be used as a way
to communicate between programs and user space.
At the end, in actual production use, bpf programs won't be using it
and no banner will be seen.
Anyway, I don't think I can change this banner in this patch set.
If we decide to relax it, it should be done via Steven's tree.

^ permalink raw reply

* Re: [PATCH 1/1] Add virtio-input driver.
From: Cornelia Huck @ 2015-03-23 16:17 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Michael S. Tsirkin, virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	open list, open list:ABI/API,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <1427123129.27137.62.camel-3OfP5uLMi4C46o+2HkPkLj4oCIwMql/M@public.gmane.org>

On Mon, 23 Mar 2015 16:05:29 +0100
Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Mo, 2015-03-23 at 15:54 +0100, Michael S. Tsirkin wrote:
> > On Mon, Mar 23, 2015 at 03:27:35PM +0100, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > > > > > You are doing leXXX everywhere, that's VERSION_1 dependency.
> > > > > > > virtio_cread will do byteswaps differently without VERSION_1.
> > > > > > > Just don't go there.
> > > 
> > > > So to clarify, you dislike using __virtio32 in virtio input header?
> > > 
> > > Well, as I understand things __virtio32 implies byteorder depends on
> > > whenever we are using VERSION_1 or not.  And non-transitional drivers
> > > should not need it as everything is by definition little endian.
> > > 
> > > So, yes, your suggestion to just require VERSION_1 in the driver implies
> > > in my eyes that there should be no reason to use __virtio32 instead of
> > > __le32.
> > > 
> > > Or do I miss something here?
> > > 
> > > cheers,
> > >   Gerd
> > > 
> > 
> > You are right but then if you do require VERSION_1 then
> > __virtio32 becomes identical to __le32.
> 
> Except that sparse doesn't know that and throws errors when I mix the
> two.
> 
> > There's some runtime overhead as we check on each access,
> > but it shouldn't matter here, right?
> 
> Correct, config space is used at initialization time only.
> 
> > I guess we could add virtio_cread_le - is this what
> > you'd like?
> 
> I just want something that makes both you and sparse happy.  I don't
> care much whenever that is adding virtio_cread_le() or using __virtio32
> even though it'll effectively is __le32 due to VERSION_1 being required.

I think it is clearer if you use __virtio32: The fact that virtio-input
is VERSION_1 only is not apparent from looking at the config space
definition, and explicitly forcing le accesses makes it look similar to
the oddballs like virtio-balloon. Just my two cents.

^ permalink raw reply

* Re: [PATCH v4 00/14] Add kdbus implementation
From: David Herrmann @ 2015-03-23 15:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Linux API,
	linux-kernel@vger.kernel.org, Daniel Mack, Djalal Harouni
In-Reply-To: <CALCETrWCBYzbzG3xZznG4AvVf67E_d8BVmBi1bZsDx7x_2Bh5g@mail.gmail.com>

Hi

On Thu, Mar 19, 2015 at 4:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Mar 19, 2015 at 4:26 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
>> metadata handling is local to the connection that sends the message.
>> It does not affect the overall performance of other bus operations in
>> parallel.
>
> Sure it does if it writes to shared cachelines.  Given that you're
> incrementing refcounts, I'm reasonable sure that you're touching lots
> of shared cachelines.

Ok, sure, but it's still mostly local to the sending task. We take
locks and ref-counts on the task-struct and mm, which is for most
parts local to the CPU the task runs on. But this is inherent to
accessing this kind of data, which is the fundamental difference in
our views here, as seen below..

>> Furthermore, it's way faster than collecting the "same" data
>> via /proc, so I don't think it slows down the overall transaction at
>> all. If a receiver doesn't want metadata, it should not request it (by
>> setting the receiver-metadata-mask). If a sender doesn't like the
>> overhead, it should not send the metadata (by setting the
>> sender-metadata-mask). Only if both peers set the metadata mask, it
>> will be transmitted.
>
> But you're comparing to the wrong thing, IMO.  Of course it's much
> faster than /proc hackery, but it's probably much slower to do the
> metadata operation once per message than to do it when you connect to
> the endpoint.  (Gah!  It's a "bus" that could easily have tons of
> users but a single "endpoint".  I'm still not used to it.)

Yes, of course your assumption is right if you compare against
per-connection caches, instead of per-message metadata. But we do
support _both_ use-cases, so we don't impose any policy.
We still believe "live"-metadata is a crucial feature of kdbus,
despite the known performance penalties.

Thanks
David

^ permalink raw reply

* Re: [PATCH 1/1] Add virtio-input driver.
From: Gerd Hoffmann @ 2015-03-23 15:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, virtualization, Rusty Russell, open list,
	open list:ABI/API
In-Reply-To: <20150323155106-mutt-send-email-mst@redhat.com>

On Mo, 2015-03-23 at 15:54 +0100, Michael S. Tsirkin wrote:
> On Mon, Mar 23, 2015 at 03:27:35PM +0100, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > > > > You are doing leXXX everywhere, that's VERSION_1 dependency.
> > > > > > virtio_cread will do byteswaps differently without VERSION_1.
> > > > > > Just don't go there.
> > 
> > > So to clarify, you dislike using __virtio32 in virtio input header?
> > 
> > Well, as I understand things __virtio32 implies byteorder depends on
> > whenever we are using VERSION_1 or not.  And non-transitional drivers
> > should not need it as everything is by definition little endian.
> > 
> > So, yes, your suggestion to just require VERSION_1 in the driver implies
> > in my eyes that there should be no reason to use __virtio32 instead of
> > __le32.
> > 
> > Or do I miss something here?
> > 
> > cheers,
> >   Gerd
> > 
> 
> You are right but then if you do require VERSION_1 then
> __virtio32 becomes identical to __le32.

Except that sparse doesn't know that and throws errors when I mix the
two.

> There's some runtime overhead as we check on each access,
> but it shouldn't matter here, right?

Correct, config space is used at initialization time only.

> I guess we could add virtio_cread_le - is this what
> you'd like?

I just want something that makes both you and sparse happy.  I don't
care much whenever that is adding virtio_cread_le() or using __virtio32
even though it'll effectively is __le32 due to VERSION_1 being required.

cheers,
  Gerd

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox