Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH 0/5] Enhancements to twl4030 phy to support better charging - V2
From: NeilBrown @ 2015-03-25 21:22 UTC (permalink / raw)
  To: Kishon Vijay Abraham I
  Cc: NeilBrown, Tony Lindgren, linux-api, GTA04 owners, linux-kernel,
	Pavel Machek
In-Reply-To: <551325B0.1090308@ti.com>

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On Thu, 26 Mar 2015 02:46:32 +0530 Kishon Vijay Abraham I <kishon@ti.com>
wrote:

> Hi,
> 
> On Monday 23 March 2015 04:05 AM, NeilBrown wrote:
> > Hi Kishon,
> >  I wonder if you could queue the following for the next merge window.
> >  They allow the twl4030 phy to provide more information to the
> >  twl4030 battery charger.
> >  There are only minimal changes since the first version, particularly
> >  documentation has been improved.
> 
> There are quite a few things in this series which use the USB PHY library
> interface which is kindof deprecated. We should try and use the Generic PHY
> library for all of them. It would also be better to add features to the
> PHY framework if the we can't achieve something with the existing PHY
> framework.

Hi,
 are you able to more specific at all?  What is the "USB PHY library"?
Where exactly is the "PHY framework"?

I know none of the history here and while I could try to guess I suspect
there is an even chance I would get wrong.
I'm happy to do the work but I want to be sure of what you are asking.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply

* Re: [PATCH 1/5] usb: phy: twl4030: make runtime pm more reliable.
From: Kishon Vijay Abraham I @ 2015-03-25 21:32 UTC (permalink / raw)
  To: NeilBrown
  Cc: Tony Lindgren, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, GTA04 owners, NeilBrown,
	Pavel Machek
In-Reply-To: <20150322223523.21765.3199.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>

Hi,

On Monday 23 March 2015 04:05 AM, NeilBrown wrote:
> From: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
> 
> A construct like:
> 
>         if (pm_runtime_suspended(twl->dev))
>                pm_runtime_get_sync(twl->dev);
> 
> is against the spirit of the runtime_pm interface as it
> makes the internal refcounting useless.
> 
> In this case it is also racy, particularly as 'put_autosuspend'
> is use to drop a reference.
> When that happens a timer is started and the device is
> runtime-suspended after the timeout.
> If the above code runs in this window, the device will not be
> found to be suspended so no pm_runtime reference is taken.
> When the timer expires the device will be suspended, which is
> against the intention of the code.
> 
> So be more direct is taking and dropping references.
> If twl->linkstat is VBUS_VALID or ID_GROUND, then hold a
> pm_runtime reference, otherwise don't.
> Define "cable_present()" to test for this condition.

minor comment below..
> 
> Tested-by: Tony Lindgren <tony-4v6yS6AI5VpBDgjK7y7TUQ@public.gmane.org>
> Signed-off-by: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
> ---
>  drivers/phy/phy-twl4030-usb.c |   29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/phy/phy-twl4030-usb.c b/drivers/phy/phy-twl4030-usb.c
> index 8e87f54671f3..1a244f34b748 100644
> --- a/drivers/phy/phy-twl4030-usb.c
> +++ b/drivers/phy/phy-twl4030-usb.c
> @@ -144,6 +144,16 @@
>  #define PMBR1				0x0D
>  #define GPIO_USB_4PIN_ULPI_2430C	(3 << 0)
>  
> +/*
> + * If VBUS is valid or ID is ground, then we know a
> + * cable is present and we need to be runtime-enabled
> + */
> +static inline bool cable_present(enum omap_musb_vbus_id_status stat)

twl4030_cable_present?

-Kishon

^ permalink raw reply

* Re: [PATCH 0/5] Enhancements to twl4030 phy to support better charging - V2
From: Kishon Vijay Abraham I @ 2015-03-25 23:59 UTC (permalink / raw)
  To: NeilBrown
  Cc: NeilBrown, Tony Lindgren, linux-api-u79uwXL29TY76Z2rM5mHXA,
	GTA04 owners, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Pavel Machek
In-Reply-To: <20150326082219.510ac598-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>

Hi NeilBrown,

On Thursday 26 March 2015 02:52 AM, NeilBrown wrote:
> On Thu, 26 Mar 2015 02:46:32 +0530 Kishon Vijay Abraham I <kishon-l0cyMroinI0@public.gmane.org>
> wrote:
> 
>> Hi,
>>
>> On Monday 23 March 2015 04:05 AM, NeilBrown wrote:
>>> Hi Kishon,
>>>  I wonder if you could queue the following for the next merge window.
>>>  They allow the twl4030 phy to provide more information to the
>>>  twl4030 battery charger.
>>>  There are only minimal changes since the first version, particularly
>>>  documentation has been improved.
>>
>> There are quite a few things in this series which use the USB PHY library
>> interface which is kindof deprecated. We should try and use the Generic PHY
>> library for all of them. It would also be better to add features to the
>> PHY framework if the we can't achieve something with the existing PHY
>> framework.
> 
> Hi,
>  are you able to more specific at all?  What is the "USB PHY library"?
> Where exactly is the "PHY framework"?

There is a USB PHY library that exists in drivers/usb/phy/phy.c and there is
a Generic PHY framework that is present in drivers/phy/phy-core.c. twl4030
actually supports both the framework.

In your patch whatever uses struct usb_phy uses the old USB PHY library and
whatever uses struct phy uses the generic PHY framework. (Actually your patch
does not use the PHY framework at all). We want to deprecate using the USB PHY
library and make everyone use the generic PHY framework. Adding features
to a driver using the USB PHY library will make the transition to generic PHY
framework a bit more difficult.

Now all the features that is supported in the USB PHY library may not be
supported by the PHY framework. So we should start extending the PHY framework
instead of using the USB PHY library.

One think I noticed in your driver is using atomic notifier chain. IMO extcon
framework should be used in twl4030 USB driver to notify the controller driver
instead of using USB PHY notifier. For all other things we have to see if it
can be added in the PHY framework.

Thanks
Kishon
> 
> I know none of the history here and while I could try to guess I suspect
> there is an even chance I would get wrong.
> I'm happy to do the work but I want to be sure of what you are asking.
> 
> Thanks,
> NeilBrown
> 

^ permalink raw reply

* Re: [PATCH 0/5] Enhancements to twl4030 phy to support better charging - V2
From: NeilBrown @ 2015-03-26  0:16 UTC (permalink / raw)
  To: Kishon Vijay Abraham I
  Cc: NeilBrown, Tony Lindgren, linux-api-u79uwXL29TY76Z2rM5mHXA,
	GTA04 owners, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Pavel Machek
In-Reply-To: <55134BEE.7050406-l0cyMroinI0@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2492 bytes --]

On Thu, 26 Mar 2015 05:29:42 +0530 Kishon Vijay Abraham I <kishon-l0cyMroinI0@public.gmane.org>
wrote:

> Hi NeilBrown,
> 
> On Thursday 26 March 2015 02:52 AM, NeilBrown wrote:
> > On Thu, 26 Mar 2015 02:46:32 +0530 Kishon Vijay Abraham I <kishon-Bv2c3lPp1Ag@public.gmane.orgm>
> > wrote:
> > 
> >> Hi,
> >>
> >> On Monday 23 March 2015 04:05 AM, NeilBrown wrote:
> >>> Hi Kishon,
> >>>  I wonder if you could queue the following for the next merge window.
> >>>  They allow the twl4030 phy to provide more information to the
> >>>  twl4030 battery charger.
> >>>  There are only minimal changes since the first version, particularly
> >>>  documentation has been improved.
> >>
> >> There are quite a few things in this series which use the USB PHY library
> >> interface which is kindof deprecated. We should try and use the Generic PHY
> >> library for all of them. It would also be better to add features to the
> >> PHY framework if the we can't achieve something with the existing PHY
> >> framework.
> > 
> > Hi,
> >  are you able to more specific at all?  What is the "USB PHY library"?
> > Where exactly is the "PHY framework"?
> 
> There is a USB PHY library that exists in drivers/usb/phy/phy.c and there is
> a Generic PHY framework that is present in drivers/phy/phy-core.c. twl4030
> actually supports both the framework.
> 
> In your patch whatever uses struct usb_phy uses the old USB PHY library and
> whatever uses struct phy uses the generic PHY framework. (Actually your patch
> does not use the PHY framework at all). We want to deprecate using the USB PHY
> library and make everyone use the generic PHY framework. Adding features
> to a driver using the USB PHY library will make the transition to generic PHY
> framework a bit more difficult.
> 
> Now all the features that is supported in the USB PHY library may not be
> supported by the PHY framework. So we should start extending the PHY framework
> instead of using the USB PHY library.
> 
> One think I noticed in your driver is using atomic notifier chain. IMO extcon
> framework should be used in twl4030 USB driver to notify the controller driver
> instead of using USB PHY notifier. For all other things we have to see if it
> can be added in the PHY framework.

Thanks a lot - exactly what I wanted.
I agree about extcon - I'll be very happy to make that work properly for twl.

I'll let you know when I have something for review.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply

* Re: [RFC] simple_char: New infrastructure to simplify chardev management
From: Andy Lutomirski @ 2015-03-26  0:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Jiri Kosina,
	linux-kernel@vger.kernel.org, Linux API
  Cc: Andy Lutomirski
In-Reply-To: <d3ab93173d51cebf00dd2263fd0ce9f8cd6258f7.1423609645.git.luto@amacapital.net>

Quick ping: does anyone want to review this?

--Andy

On Tue, Feb 10, 2015 at 3:44 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> This isn't adequately tested, and I don't have a demonstration (yet).
> It's here for review for whether it's a good idea in the first place
> and for weather the fully_dynamic mechanism is a good idea.
>
> The current character device interfaces are IMO awful.  There's a
> reservation mechanism (alloc_chrdev_region, etc), a bizarre
> sort-of-hashtable lookup mechanism that character drivers need to
> interact with (cdev_add, etc), and number of lookup stages on open
> with various levels of optimization.  Despite all the code, there's no
> core infrastructure to map from a dev_t back to a kobject, struct
> device, or any other useful device pointer.
>
> This means that lots of drivers need to implement all of this
> themselves.  The drivers don't even all seem to do it right.  For
> example, the UIO code seems to be missing any protection against
> chardev open racing against device removal.
>
> On top of the complexity of writing a chardev driver, the user
> interface is odd.  We have /proc/devices, which nothing should use,
> since it conveys no information about minor numbers, and minors are
> mostly dynamic these days.  Even the major numbers aren't terribly
> useful, since sysfs refers to (major, minor) pairs.
>
> This adds simple helpers simple_char_minor_create and
> simple_char_minor_free to create and destroy chardev minors.  Common
> code handles minor allocation and lookup and provides a simple helper
> to allow (and even mostly require!) users to reference count their
> devices correctly.
>
> Users can either allocation a traditional dynamic major using
> simple_char_major_create, or they can use a global "fully_dynamic"
> major and avoid thinking about major numbers at all.
>
> This currently does not integrate with the driver core at all.
> Users are responsible for plugging the dev_t into their struct
> devices manually.  I couldn't see a clean way to fix this without
> integrating all of this into the driver core.
>
> Thoughts?  I want to use this for the u2f driver, which will either be
> a chardev driver in its own right or use a simple new iso7816 class.
>
> Ideally we could convert a bunch of drivers to use this, at least
> where there are no legacy minor number considerations.
>
> (Note: simple_char users will *not* have their devicename%d indices
> match their minor numbers unless they specifically arrange for this to
> be the case.  For new drivers, this shouldn't be a problem at all.  I
> don't know whether it matters for old drivers.)
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>
>  drivers/base/Makefile       |   2 +-
>  drivers/base/simple_char.c  | 231 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/simple_char.h |  38 ++++++++
>  3 files changed, 270 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/base/simple_char.c
>  create mode 100644 include/linux/simple_char.h
>
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 6922cd6850a2..d3832749f74c 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -4,7 +4,7 @@ obj-y                   := component.o core.o bus.o dd.o syscore.o \
>                            driver.o class.o platform.o \
>                            cpu.o firmware.o init.o map.o devres.o \
>                            attribute_container.o transport_class.o \
> -                          topology.o container.o
> +                          topology.o container.o simple_char.o
>  obj-$(CONFIG_DEVTMPFS) += devtmpfs.o
>  obj-$(CONFIG_DMA_CMA) += dma-contiguous.o
>  obj-y                  += power/
> diff --git a/drivers/base/simple_char.c b/drivers/base/simple_char.c
> new file mode 100644
> index 000000000000..f3205ef9e44b
> --- /dev/null
> +++ b/drivers/base/simple_char.c
> @@ -0,0 +1,231 @@
> +/*
> + * A simple way to create character devices
> + *
> + * Copyright (c) 2015 Andy Lutomirski <luto@amacapital.net>
> + *
> + * Loosely based, somewhat arbitrarily, on the UIO driver, which is one
> + * of many copies of essentially identical boilerplate.
> + *
> + * Licensed under the GPLv2.
> + */
> +
> +#include <linux/simple_char.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/poll.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/mm.h>
> +#include <linux/idr.h>
> +#include <linux/sched.h>
> +#include <linux/string.h>
> +#include <linux/kobject.h>
> +#include <linux/cdev.h>
> +
> +#define MAX_MINORS (1U << MINORBITS)
> +
> +struct simple_char_major {
> +       struct cdev cdev;
> +       unsigned majornum;
> +       struct idr idr;
> +       struct mutex lock;
> +};
> +
> +static struct simple_char_major *fully_dynamic_major;
> +static DEFINE_MUTEX(fully_dynamic_major_lock);
> +
> +static int simple_char_open(struct inode *inode, struct file *filep)
> +{
> +       struct simple_char_major *major =
> +               container_of(inode->i_cdev, struct simple_char_major,
> +                            cdev);
> +       void *private;
> +       const struct simple_char_ops *ops;
> +       int ret = 0;
> +
> +       mutex_lock(&major->lock);
> +
> +       {
> +               /*
> +                * This is a separate block to make the locking entirely
> +                * clear.  The only thing keeping minor alive is major->lock.
> +                * We need to be completely done with the simple_char_minor
> +                * by the time we release the lock.
> +                */
> +               struct simple_char_minor *minor;
> +               minor = idr_find(&major->idr, iminor(inode));
> +               if (!minor || !minor->ops->reference(minor->private)) {
> +                       mutex_unlock(&major->lock);
> +                       return -ENODEV;
> +               }
> +               private = minor->private;
> +               ops = minor->ops;
> +       }
> +
> +       mutex_unlock(&major->lock);
> +
> +       replace_fops(filep, ops->fops);
> +       filep->private_data = private;
> +       if (ops->fops->open)
> +               ret = ops->fops->open(inode, filep);
> +
> +       return ret;
> +}
> +
> +static const struct file_operations simple_char_fops = {
> +       .open = simple_char_open,
> +       .llseek = noop_llseek,
> +};
> +
> +struct simple_char_major *simple_char_major_create(const char *name)
> +{
> +       struct simple_char_major *major = NULL;
> +       dev_t devt;
> +       int ret;
> +
> +       ret = alloc_chrdev_region(&devt, 0, MAX_MINORS, name);
> +       if (ret)
> +               goto out;
> +
> +       ret = -ENOMEM;
> +       major = kmalloc(sizeof(struct simple_char_major), GFP_KERNEL);
> +       if (!major)
> +               goto out_unregister;
> +       cdev_init(&major->cdev, &simple_char_fops);
> +       kobject_set_name(&major->cdev.kobj, "%s", name);
> +
> +       ret = cdev_add(&major->cdev, devt, MAX_MINORS);
> +       if (ret)
> +               goto out_free;
> +
> +       major->majornum = MAJOR(devt);
> +       idr_init(&major->idr);
> +       return major;
> +
> +out_free:
> +       cdev_del(&major->cdev);
> +       kfree(major);
> +out_unregister:
> +       unregister_chrdev_region(devt, MAX_MINORS);
> +out:
> +       return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(simple_char_major_create);
> +
> +void simple_char_major_free(struct simple_char_major *major)
> +{
> +       BUG_ON(!idr_is_empty(&major->idr));
> +
> +       cdev_del(&major->cdev);
> +       unregister_chrdev_region(MKDEV(major->majornum, 0), MAX_MINORS);
> +       idr_destroy(&major->idr);
> +       kfree(major);
> +}
> +
> +static struct simple_char_major *get_fully_dynamic_major(void)
> +{
> +       struct simple_char_major *major =
> +               smp_load_acquire(&fully_dynamic_major);
> +       if (major)
> +               return major;
> +
> +       mutex_lock(&fully_dynamic_major_lock);
> +
> +       if (fully_dynamic_major) {
> +               major = fully_dynamic_major;
> +               goto out;
> +       }
> +
> +       major = simple_char_major_create("fully_dynamic");
> +       if (!IS_ERR(major))
> +               smp_store_release(&fully_dynamic_major, major);
> +
> +out:
> +       mutex_unlock(&fully_dynamic_major_lock);
> +       return major;
> +
> +}
> +
> +/**
> + * simple_char_minor_create() - create a chardev minor
> + * @major:     Major to use or NULL for a fully dynamic chardev.
> + * @ops:       simple_char_ops to associate with the minor.
> + * @private:   opaque pointer for @ops's use.
> + *
> + * simple_char_minor_create() creates a minor chardev.  For new code,
> + * @major should be NULL; this will create a minor chardev with fully
> + * dynamic major and minor numbers and without a useful name in
> + * /proc/devices.  (All recent user code should be using sysfs
> + * exclusively to map between devices and device numbers.)  For legacy
> + * code, @major can come from simple_char_major_create().
> + *
> + * The chardev will use @ops->fops for its file operations.  Before any
> + * of those operations are called, the struct file's private_data will
> + * be set to @private.
> + *
> + * To simplify reference counting, @ops->reference will be called before
> + * @ops->fops->open.  @ops->reference should take any needed references
> + * and return true if the object being opened still exists, and it
> + * should return false without taking references if the object is dying.
> + * @ops->reference is called with locks held, so it should neither sleep
> + * nor take heavy locks.
> + *
> + * @ops->fops->release (and @ops->fops->open, if it exists and fails)
> + * are responsible for releasing any references takes by @ops->reference.
> + *
> + * The minor must be destroyed by @simple_char_minor_free.  After
> + * @simple_char_minor_free returns, @ops->reference will not be called.
> + */
> +struct simple_char_minor *
> +simple_char_minor_create(struct simple_char_major *major,
> +                        const struct simple_char_ops *ops,
> +                        void *private)
> +{
> +       int ret;
> +       struct simple_char_minor *minor = NULL;
> +
> +       if (!major) {
> +               major = get_fully_dynamic_major();
> +               if (IS_ERR(major))
> +                       return (void *)major;
> +       }
> +
> +       minor = kmalloc(sizeof(struct simple_char_minor), GFP_KERNEL);
> +       if (!minor)
> +               return ERR_PTR(-ENOMEM);
> +
> +       mutex_lock(&major->lock);
> +       ret = idr_alloc(&major->idr, minor, 0, MAX_MINORS, GFP_KERNEL);
> +       if (ret >= 0) {
> +               minor->devt = MKDEV(major->majornum, ret);
> +               ret = 0;
> +       }
> +       /* Warn on ENOSPC?  It's embarrassing if it ever happens. */
> +       mutex_unlock(&major->lock);
> +
> +       if (ret) {
> +               kfree(minor);
> +               return ERR_PTR(ret);
> +       }
> +
> +       minor->major = major;
> +       minor->private = private;
> +       minor->ops = ops;
> +       return minor;
> +}
> +
> +/**
> + * simple_char_minor_free() - Free a simple_char chardev minor
> + * @minor:     the minor to free.
> + *
> + * This frees a chardev minor and prevents that minor's @ops->reference
> + * op from being called in the future.
> + */
> +void simple_char_minor_free(struct simple_char_minor *minor)
> +{
> +       mutex_lock(&minor->major->lock);
> +       idr_remove(&minor->major->idr, MINOR(minor->devt));
> +       mutex_unlock(&minor->major->lock);
> +       kfree(minor);
> +}
> +EXPORT_SYMBOL(simple_char_minor_free);
> diff --git a/include/linux/simple_char.h b/include/linux/simple_char.h
> new file mode 100644
> index 000000000000..8f391e7b50af
> --- /dev/null
> +++ b/include/linux/simple_char.h
> @@ -0,0 +1,38 @@
> +/*
> + * A simple way to create character devices
> + *
> + * Copyright (c) 2015 Andy Lutomirski <luto@amacapital.net>
> + *
> + * Licensed under the GPLv2.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/kobject.h>
> +#include <linux/file.h>
> +
> +struct simple_char_major;
> +
> +struct simple_char_ops {
> +       bool (*reference)(void *private);
> +       const struct file_operations *fops;
> +};
> +
> +struct simple_char_minor {
> +       struct simple_char_major *major;
> +       const struct simple_char_ops *ops;
> +       void *private;
> +       dev_t devt;
> +};
> +
> +extern struct simple_char_minor *
> +simple_char_minor_create(struct simple_char_major *major,
> +                        const struct simple_char_ops *ops,
> +                        void *private);
> +extern void simple_char_minor_free(struct simple_char_minor *minor);
> +
> +extern void simple_char_file_release(struct file *filep, struct kobject *kobj);
> +
> +/* These exist only to support legacy classes that need their own major. */
> +extern struct simple_char_major *simple_char_major_create(const char *name);
> +extern void simple_char_major_free(struct simple_char_major *major);
> +
> --
> 2.1.0
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: David Rientjes @ 2015-03-26  0:19 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Vlastimil Babka, Aliaksey Kandratsenka, Andrew Morton, Shaohua Li,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <55131F70.7020503-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wed, 25 Mar 2015, Daniel Micay wrote:

> > I'm not sure I get your description right. The problem I know about is
> > where "purging" means madvise(MADV_DONTNEED) and khugepaged later
> > collapses a new hugepage that will repopulate the purged parts,
> > increasing the memory usage. One can limit this via
> > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none . That
> > setting doesn't affect the page fault THP allocations, which however
> > happen only in newly accessed hugepage-sized areas and not partially
> > purged ones, though.
> 
> Since jemalloc doesn't unmap memory but instead does recycling itself in
> userspace, it ends up with large spans of free virtual memory and gets
> *lots* of huge pages from the page fault heuristic. It keeps track of
> active vs. dirty (not purged) vs. clean (purged / untouched) ranges
> everywhere, and will purge dirty ranges as they build up.
> 
> The THP allocation on page faults mean it ends up with memory that's
> supposed to be clean but is really not.
> 
> A worst case example with the (up until recently) default chunk size of
> 4M is allocating a bunch of 2.1M allocations. Chunks are naturally
> aligned, so each one can be represented as 2 huge pages. It increases
> memory usage by nearly *50%*. The allocator thinks the tail is clean
> memory, but it's not. When the allocations are freed, it will purge the
> 2.1M at the head (once enough dirty memory builds up) but all of the
> tail memory will be leaked until something else is allocated there and
> then freed.
> 

With tcmalloc, it's simple to always expand the heap by mmaping 2MB ranges 
for size classes <= 2MB, allocate its own metadata from an arena that is 
also expanded in 2MB range, and always do madvise(MADV_DONTNEED) for the 
longest span on the freelist when it does periodic memory freeing back to 
the kernel, and even better if the freed memory splits at most one 
hugepage.  When memory is pulled from the freelist of memory that has 
already been returned to the kernel, you can return a span that will make 
it eligible to be collapsed into a hugepage based on your setting of 
max_ptes_none, trying to consolidate the memory as much as possible.  If 
your malloc is implemented in a way to understand the benefit of 
hugepages, and how much memory you're willing to sacrifice (max_ptes_none) 
for it, then you should _never_ be increasing memory usage by 50%.

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Daniel Micay @ 2015-03-26  0:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Aliaksey Kandratsenka, Andrew Morton, Shaohua Li,
	linux-mm, linux-api, Rik van Riel, Hugh Dickins, Mel Gorman,
	Johannes Weiner, Michal Hocko, Andy Lutomirski,
	google-perftools@googlegroups.com
In-Reply-To: <alpine.DEB.2.10.1503251710400.31453@chino.kir.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 2727 bytes --]

On 25/03/15 08:19 PM, David Rientjes wrote:
> On Wed, 25 Mar 2015, Daniel Micay wrote:
> 
>>> I'm not sure I get your description right. The problem I know about is
>>> where "purging" means madvise(MADV_DONTNEED) and khugepaged later
>>> collapses a new hugepage that will repopulate the purged parts,
>>> increasing the memory usage. One can limit this via
>>> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none . That
>>> setting doesn't affect the page fault THP allocations, which however
>>> happen only in newly accessed hugepage-sized areas and not partially
>>> purged ones, though.
>>
>> Since jemalloc doesn't unmap memory but instead does recycling itself in
>> userspace, it ends up with large spans of free virtual memory and gets
>> *lots* of huge pages from the page fault heuristic. It keeps track of
>> active vs. dirty (not purged) vs. clean (purged / untouched) ranges
>> everywhere, and will purge dirty ranges as they build up.
>>
>> The THP allocation on page faults mean it ends up with memory that's
>> supposed to be clean but is really not.
>>
>> A worst case example with the (up until recently) default chunk size of
>> 4M is allocating a bunch of 2.1M allocations. Chunks are naturally
>> aligned, so each one can be represented as 2 huge pages. It increases
>> memory usage by nearly *50%*. The allocator thinks the tail is clean
>> memory, but it's not. When the allocations are freed, it will purge the
>> 2.1M at the head (once enough dirty memory builds up) but all of the
>> tail memory will be leaked until something else is allocated there and
>> then freed.
>>
> 
> With tcmalloc, it's simple to always expand the heap by mmaping 2MB ranges 
> for size classes <= 2MB, allocate its own metadata from an arena that is 
> also expanded in 2MB range, and always do madvise(MADV_DONTNEED) for the 
> longest span on the freelist when it does periodic memory freeing back to 
> the kernel, and even better if the freed memory splits at most one 
> hugepage.  When memory is pulled from the freelist of memory that has 
> already been returned to the kernel, you can return a span that will make 
> it eligible to be collapsed into a hugepage based on your setting of 
> max_ptes_none, trying to consolidate the memory as much as possible.  If 
> your malloc is implemented in a way to understand the benefit of 
> hugepages, and how much memory you're willing to sacrifice (max_ptes_none) 
> for it, then you should _never_ be increasing memory usage by 50%.

If khugepaged was the only source of huge pages, sure. The primary
source of huge pages is the heuristic handing out an entire 2M page on
the first page fault in a 2M range.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Minchan Kim @ 2015-03-26  0:50 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Aliaksey Kandratsenka, Shaohua Li, Andrew Morton, linux-mm,
	linux-api, Rik van Riel, Hugh Dickins, Mel Gorman,
	Johannes Weiner, Michal Hocko, Andy Lutomirski,
	google-perftools@googlegroups.com
In-Reply-To: <55117724.6030102@gmail.com>

Hello Daniel,

On Tue, Mar 24, 2015 at 10:39:32AM -0400, Daniel Micay wrote:
> On 24/03/15 01:25 AM, Aliaksey Kandratsenka wrote:
> > 
> > Well, I don't have any workloads. I'm just maintaining a library that
> > others run various workloads on. Part of the problem is lack of good
> > and varied malloc benchmarks which could allow us that prevent
> > regression. So this makes me a bit more cautious on performance
> > matters.
> > 
> > But I see your point. Indeed I have no evidence at all that exclusive
> > locking might cause observable performance difference.
> 
> I'm sure it matters but I expect you'd need *many* cores running many
> threads before it started to outweigh the benefit of copying pages
> instead of data.
> 
> Thinking about it a bit more, it would probably make sense for mremap to
> start with the optimistic assumption that the reader lock is enough here
> when using MREMAP_NOHOLE|MREMAP_FIXED. It only needs the writer lock if
> the destination mapping is incomplete or doesn't match, which is an edge
> case as holes would mean thread unsafety.
> 
> An ideal allocator will toggle on PROT_NONE when overcommit is disabled
> so this assumption would be wrong. The heuristic could just be adjusted
> to assume the dest VMA will match with MREMAP_NOHOLE|MREMAP_FIXED when
> full memory accounting isn't enabled. The fallback would never ended up
> being needed in existing use cases that I'm aware of, and would just add
> the overhead of a quick lock, O(log n) check and unlock with the reader
> lock held anyway. Another flag isn't really necessary.
> 
> >>> Another notable thing is how mlock effectively disables MADV_DONTNEED for
> >>> jemalloc{1,2} and tcmalloc, lowers page faults count and thus improves
> >>> runtime. It can be seen that tcmalloc+mlock on thp-less configuration is
> >>> slightly better on runtime to glibc. The later spends a ton of time in
> >>> kernel,
> >>> probably handling minor page faults, and the former burns cpu in user space
> >>> doing memcpy-s. So "tons of memcpys" seems to be competitive to what glibc
> >>> is
> >>> doing in this benchmark.
> >>
> >> mlock disables MADV_DONTNEED, so this is an unfair comparsion. With it,
> >> allocator will use more memory than expected.
> > 
> > Do not agree with unfair. I'm actually hoping MADV_FREE to provide
> > most if not all of benefits of mlock in this benchmark. I believe it's
> > not too unreasonable expectation.
> 
> MADV_FREE will still result in as many page faults, just no zeroing.

I didn't follow this thread. However, as you mentioned MADV_FREE will
make many page fault, I jump into here.
One of the benefit with MADV_FREE in current implementation is to
avoid page fault as well as no zeroing.
Why did you see many page fault?


> 
> I get ~20k requests/s with jemalloc on the ebizzy benchmark with this
> dual core ivy bridge laptop. It jumps to ~60k requests/s with MADV_FREE
> IIRC, but disabling purging via MALLOC_CONF=lg_dirty_mult:-1 leads to
> 3.5 *million* requests/s. It has a similar impact with TCMalloc.

When I tested MADV_FREE with ebizzy, I saw similar result two or three
times fater than MADV_DONTNEED. But It's no free cost. It incurs MADV_FREE
cost itself*(ie, enumerating all of page table in the range and clear
dirty bit and tlb flush). Of course, it has mmap_sem with read-side lock.
If you see great improve when you disable purging, I guess mainly it's
caused by no lock of mmap_sem so some threads can allocate while other
threads can do page fault. The reason I think so is I saw similar result
when I implemented vrange syscall which hold mmap_sem read-side lock
during very short time(ie, marking the volatile into vma, ie O(1) while
MADV_FREE holds a lock during enumerating all of pages in the range, ie O(N))

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Daniel Micay @ 2015-03-26  1:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Aliaksey Kandratsenka, Shaohua Li, Andrew Morton,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <20150326005009.GA7658@blaptop>

[-- Attachment #1: Type: text/plain, Size: 1865 bytes --]

> I didn't follow this thread. However, as you mentioned MADV_FREE will
> make many page fault, I jump into here.
> One of the benefit with MADV_FREE in current implementation is to
> avoid page fault as well as no zeroing.
> Why did you see many page fault?

I think I just misunderstood why it was still so much slower than not
using purging at all.

>> I get ~20k requests/s with jemalloc on the ebizzy benchmark with this
>> dual core ivy bridge laptop. It jumps to ~60k requests/s with MADV_FREE
>> IIRC, but disabling purging via MALLOC_CONF=lg_dirty_mult:-1 leads to
>> 3.5 *million* requests/s. It has a similar impact with TCMalloc.
> 
> When I tested MADV_FREE with ebizzy, I saw similar result two or three
> times fater than MADV_DONTNEED. But It's no free cost. It incurs MADV_FREE
> cost itself*(ie, enumerating all of page table in the range and clear
> dirty bit and tlb flush). Of course, it has mmap_sem with read-side lock.
> If you see great improve when you disable purging, I guess mainly it's
> caused by no lock of mmap_sem so some threads can allocate while other
> threads can do page fault. The reason I think so is I saw similar result
> when I implemented vrange syscall which hold mmap_sem read-side lock
> during very short time(ie, marking the volatile into vma, ie O(1) while
> MADV_FREE holds a lock during enumerating all of pages in the range, ie O(N))

It stops doing mmap after getting warmed up since it never unmaps so I
don't think mmap_sem is a contention issue. It could just be caused by
the cost of the system call itself and TLB flush. I found perf to be
fairly useless in identifying where the time was being spent.

It might be much more important to purge very large ranges in one go
with MADV_FREE. It's a different direction than the current compromises
forced by MADV_DONTNEED.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v4] Add virtio-input driver.
From: Rusty Russell @ 2015-03-26  1:53 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Gerd Hoffmann, virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mst-H+wXaHxf7aLQT0dZR+AlfA, David Herrmann, Dmitry Torokhov,
	open list, open list:ABI/API
In-Reply-To: <20150325053613.GA4572-IBi9RG/b67k@public.gmane.org>

Vojtech Pavlik <vojtech-IBi9RG/b67k@public.gmane.org> writes:
> On Wed, Mar 25, 2015 at 01:51:43PM +1030, Rusty Russell wrote:
>> Imagine a future virtio standard which incorporates this.  And a Windows
>> or FreeBSD implementation of the device and or driver.  How ugly would
>> they be?
>
> A windows translation layer is fairly easy, people porting software from
> Windows to Linux told me numerous times that adapting isn't hard. I also
> believe that at least one of the BSD's has a compatible implementation
> these days based on the fact that I was asked to allow copying the
> headers in the past.

Thanks Vojtech, that answers my questions.

I figure we apply this and see where it leads...

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: David Rientjes @ 2015-03-26  2:31 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Vlastimil Babka, Aliaksey Kandratsenka, Andrew Morton, Shaohua Li,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <551351CA.3090803-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wed, 25 Mar 2015, Daniel Micay wrote:

> > With tcmalloc, it's simple to always expand the heap by mmaping 2MB ranges 
> > for size classes <= 2MB, allocate its own metadata from an arena that is 
> > also expanded in 2MB range, and always do madvise(MADV_DONTNEED) for the 
> > longest span on the freelist when it does periodic memory freeing back to 
> > the kernel, and even better if the freed memory splits at most one 
> > hugepage.  When memory is pulled from the freelist of memory that has 
> > already been returned to the kernel, you can return a span that will make 
> > it eligible to be collapsed into a hugepage based on your setting of 
> > max_ptes_none, trying to consolidate the memory as much as possible.  If 
> > your malloc is implemented in a way to understand the benefit of 
> > hugepages, and how much memory you're willing to sacrifice (max_ptes_none) 
> > for it, then you should _never_ be increasing memory usage by 50%.
> 
> If khugepaged was the only source of huge pages, sure. The primary
> source of huge pages is the heuristic handing out an entire 2M page on
> the first page fault in a 2M range.
> 

The behavior is a property of what you brk() or mmap() to expand your 
heap, you can intentionally require it to fault hugepages or not fault 
hugepages without any special madvise().

With the example above, the implementation I wrote specifically tries to 
sbrk() in 2MB regions and hands out allocator metadata via a memory arena 
doing the same thing.  Memory is treated as being on a normal freelist so 
that it is considered resident, i.e. the same as faulting 4KB, freeing it, 
before tcmalloc does madvise(MADV_DONTNEED), and we naturally prefer to 
hand that out before going to the returned freelist or mmap() as fallback.  
There will always be fragmentation in your normal freelist spans, so 
there's always wasted memory (with or without thp).  There should never be 
a case where you're always mapping 2MB aligned regions and then only 
touching a small portion of it, for >2MB size classes you could easily map 
only the size required and you would never get an excess of memory due to 
thp at fault.

I think this may be tangential to the thread, though, since this has 
nothing to do with mremap() or any new mremap() flag.

If the thp faulting behavior is going to be changed, then it would need to 
be something that is opted into and not by any system tunable or madvise() 
flag.  It would probably need to be a prctl() like PR_SET_THP_DISABLE is 
that would control only fault behavior.

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Daniel Micay @ 2015-03-26  3:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Aliaksey Kandratsenka, Andrew Morton, Shaohua Li,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <alpine.DEB.2.10.1503251914260.16714-X6Q0R45D7oAcqpCFd4KODRPsWskHk0ljAL8bYrjMMd8@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2017 bytes --]

It's all well and good to say that you shouldn't do that, but it's the
basis of the design in jemalloc and other zone-based arena allocators.

There's a chosen chunk size and chunks are naturally aligned. An
allocation is either a span of chunks (chunk-aligned) or has metadata
stored in the chunk header. This also means chunks can be assigned to
arenas for a high level of concurrency. Thread caching is then only
necessary for batching operations to amortize the cost of locking rather
than to reduce contention. Per-CPU arenas can be implemented quite well
by using sched_getcpu() to move threads around whenever it detects that
another thread allocated from the arena.

With >= 2M chunks, madvise purging works very well at the chunk level
but there's also fine-grained purging within chunks and it completely
breaks down from THP page faults.

The allocator packs memory towards low addresses (address-ordered
best-fit and first-fit can both be done in O(log n) time) so swings in
memory usage will tend to clear large spans of memory which will then
fault in huge pages no matter how it was mapped. Once MADV_FREE can be
used rather than MADV_DONTNEED, this would only happen after memory
pressure... but that's not very comforting.

I don't find it acceptable that programs can have huge (up to ~30% in
real programs) amounts of memory leaked over time due to THP page
faults. This is a very real problem impacting projects like Redis,
MariaDB and Firefox because they all use jemalloc.

https://shk.io/2015/03/22/transparent-huge-pages/
https://www.percona.com/blog/2014/07/23/why-tokudb-hates-transparent-hugepages/
http://dev.nuodb.com/techblog/linux-transparent-huge-pages-jemalloc-and-nuodb
https://bugzilla.mozilla.org/show_bug.cgi?id=770612

Bionic (Android's libc) switched over to jemalloc too.

The only reason you don't hear about this with glibc is because it
doesn't have aggressive, fine-grained purging and a low fragmentation
design in the first place.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Daniel Micay @ 2015-03-26  3:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Aliaksey Kandratsenka, Andrew Morton, Shaohua Li,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <55137C06.9020608-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

jemalloc doesn't really use free lists or sbrk for user allocations much
at all: thread caches are arrays of pointers (easier to flush and no
need to deref stale memory), red-black trees manage chunks and runs
within chunks, and runs use bitmaps. It can use sbrk as an alternate
source of chunks, but it defaults to using mmap and there's no real
advantage to switching it.

THP currently seems to be designed around the assumption that all
userspace allocators are variants of dlmalloc...


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH 1/5] usb: phy: twl4030: make runtime pm more reliable.
From: Pavel Machek @ 2015-03-26  6:39 UTC (permalink / raw)
  To: Kishon Vijay Abraham I
  Cc: NeilBrown, Tony Lindgren, linux-api, linux-kernel, GTA04 owners,
	NeilBrown
In-Reply-To: <55132965.3090002@ti.com>

> > diff --git a/drivers/phy/phy-twl4030-usb.c
    b/drivers/phy/phy-twl4030-usb.c
> > index 8e87f54671f3..1a244f34b748 100644
> > --- a/drivers/phy/phy-twl4030-usb.c
> > +++ b/drivers/phy/phy-twl4030-usb.c
> > @@ -144,6 +144,16 @@
> >  #define PMBR1				0x0D
> >  #define GPIO_USB_4PIN_ULPI_2430C	(3 << 0)
> >  
> > +/*
> > + * If VBUS is valid or ID is ground, then we know a
> > + * cable is present and we need to be runtime-enabled
> > + */
> > +static inline bool cable_present(enum omap_musb_vbus_id_status stat)
> 
> twl4030_cable_present?

It is static function, no need for prefixes, they just make code
harder to read.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
From: Minchan Kim @ 2015-03-26  7:02 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Aliaksey Kandratsenka, Shaohua Li, Andrew Morton,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Rik van Riel, Hugh Dickins, Mel Gorman, Johannes Weiner,
	Michal Hocko, Andy Lutomirski,
	google-perftools-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
In-Reply-To: <55135F06.4000906-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wed, Mar 25, 2015 at 09:21:10PM -0400, Daniel Micay wrote:
> > I didn't follow this thread. However, as you mentioned MADV_FREE will
> > make many page fault, I jump into here.
> > One of the benefit with MADV_FREE in current implementation is to
> > avoid page fault as well as no zeroing.
> > Why did you see many page fault?
> 
> I think I just misunderstood why it was still so much slower than not
> using purging at all.
> 
> >> I get ~20k requests/s with jemalloc on the ebizzy benchmark with this
> >> dual core ivy bridge laptop. It jumps to ~60k requests/s with MADV_FREE
> >> IIRC, but disabling purging via MALLOC_CONF=lg_dirty_mult:-1 leads to
> >> 3.5 *million* requests/s. It has a similar impact with TCMalloc.
> > 
> > When I tested MADV_FREE with ebizzy, I saw similar result two or three
> > times fater than MADV_DONTNEED. But It's no free cost. It incurs MADV_FREE
> > cost itself*(ie, enumerating all of page table in the range and clear
> > dirty bit and tlb flush). Of course, it has mmap_sem with read-side lock.
> > If you see great improve when you disable purging, I guess mainly it's
> > caused by no lock of mmap_sem so some threads can allocate while other
> > threads can do page fault. The reason I think so is I saw similar result
> > when I implemented vrange syscall which hold mmap_sem read-side lock
> > during very short time(ie, marking the volatile into vma, ie O(1) while
> > MADV_FREE holds a lock during enumerating all of pages in the range, ie O(N))
> 
> It stops doing mmap after getting warmed up since it never unmaps so I
> don't think mmap_sem is a contention issue. It could just be caused by
> the cost of the system call itself and TLB flush. I found perf to be
> fairly useless in identifying where the time was being spent.
> 
> It might be much more important to purge very large ranges in one go
> with MADV_FREE. It's a different direction than the current compromises
> forced by MADV_DONTNEED.
> 

I tested ebizzy + recent jemalloc in my KVM guest.

Apparently, no purging was best(ie, 4925 records/s) while purging with
MADV_DONTNEED was worst(ie, 1814 records/s).
However, in my machine, purging with MADV_FREE was not bad as yourr.

        4338 records/s vs 4925 records/s.

Still, no purging was win but if we consider the num of madvise syscall
between no purging and MADV_FREE purging, it would be better than now.

        0 vs 43724

One thing I am wondering is why the madvise syscall count is increased
when we turns on MADV_FREE compared to MADV_DONTNEED. It might be
aggressive dirty puring rule in jemalloc internal?

Anyway, my point is gap between MADV_FREE and no puring in my machine
is not much like you said.

********
#> lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Stepping:              7
CPU MHz:               1200.000
BogoMIPS:              6399.71
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11

*****

ebizzy 0.2 
(C) 2006-7 Intel Corporation
(C) 2007 Valerie Henson <val-/gER7w9Thpc@public.gmane.org>
always_mmap 0
never_mmap 0
chunks 10
prevent coalescing using permissions 0
prevent coalescing using holes 0
random_size 0
chunk_size 5242880
seconds 10
threads 24
verbose 1
linear 0
touch_pages 0
page size 4096
Allocated memory
Wrote memory
Threads starting
Threads finished

******

jemalloc git head
commit 65db63cf3f0c5dd5126a1b3786756486eaf931ba
Author: Jason Evans <je-b10kYP2dOMg@public.gmane.org>
Date:   Wed Mar 25 18:56:55 2015 -0700

    Fix in-place shrinking huge reallocation purging bugs.


******
1) LD_PRELOAD="/jemalloc/lib/libjemalloc.so.dontneed" strace -c -f ./ebizzy -s $((5<<20))

1814 records/s
real 10.00 s
user 28.18 s
sys  90.08 s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.78   99.368420        5469     18171           madvise
  9.14   10.001131    10001131         1           nanosleep
  0.05    0.050037         807        62        10 futex
  0.03    0.031721         291       109           mmap
  0.00    0.004455         178        25           set_robust_list
  0.00    0.000129           5        24           clone
  0.00    0.000000           0         4           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0        32           mprotect
  0.00    0.000000           0        35           munmap
  0.00    0.000000           0         2           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0         4         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           getrusage
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00  109.455893                 18501        14 total

2) LD_PRELOAD="/jemalloc/lib/libjemalloc.so.dontneed" MALLOC_CONF=lg_dirty_mult:-1 strace -c -f ./ebizzy -s $((5<<20))

4925 records/s
real 10.00 s
user 119.83 s
sys   0.16 s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.73    0.821804       15804        52         6 futex
 15.70    0.156000      156000         1           nanosleep
  1.53    0.015186         115       132           mmap
  0.04    0.000349           4        87           munmap
  0.00    0.000000           0         4           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0        32           mprotect
  0.00    0.000000           0         2           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0         4         3 access
  0.00    0.000000           0        24           madvise
  0.00    0.000000           0        24           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           getrusage
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        25           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.993339                   419        10 total

3) LD_PRELOAD="/jemalloc/lib/libjemalloc.so.free" strace -c -f ./ebizzy -s $((5<<20))

4338 records/s
real 10.00 s
user 91.40 s
sys  12.58 s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.39   36.433483         839     43408           madvise
 21.53   10.004889    10004889         1           nanosleep
  0.04    0.020472         394        52        15 futex
  0.03    0.015464         145       107           mmap
  0.00    0.000041           2        24           clone
  0.00    0.000000           0         4           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0        32           mprotect
  0.00    0.000000           0        33           munmap
  0.00    0.000000           0         2           brk 
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0         4         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           getrusage
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        25           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   46.474349                 43724        19 total

-- 
Kind regards,
Minchan Kim

^ permalink raw reply

* Re: [PATCH] Add virtio gpu driver.
From: Gerd Hoffmann @ 2015-03-26  7:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, open list:ABI/API, Rusty Russell, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <20150325180721-mutt-send-email-mst@redhat.com>

On Mi, 2015-03-25 at 18:09 +0100, Michael S. Tsirkin wrote:
> On Wed, Mar 25, 2015 at 04:37:16PM +0100, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > BTW can we teach virtio-gpu to look for framebuffer using
> > > virtio pci caps?
> > 
> > The virtio-gpu driver doesn't matter much here, it doesn't use it
> > anyway.
> > 
> > >  Or are there limitations such as only
> > > using IO port BARs, or compatibility with
> > > BIOS code etc that limit us to specific BARs anyway?
> > 
> > Yes, vgabios code needs to know.  Currently it has bar #2 for the vga
> > framebuffer bar hardcoded.  It's 16bit code.  I don't feel like making
> > the probing more complicated ...
> > 
> > cheers,
> >   Gerd
> 
> OK - you are saying all VGA cards use bar #2 for this
> functionality, so we are just following
> established practice here?

vgabios checks pci ids to figure.  qxl+stdvga use bar #0, vmware-vga bar
#1, virtio-vga bar #2.

cheers,
  Gerd


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH] Add virtio gpu driver.
From: Michael S. Tsirkin @ 2015-03-26  8:18 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b, Dave Airlie,
	Dave Airlie, David Airlie, Rusty Russell, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET...,
	open list:ABI/API
In-Reply-To: <1427353959.9779.2.camel-3OfP5uLMi4C46o+2HkPkLj4oCIwMql/M@public.gmane.org>

On Thu, Mar 26, 2015 at 08:12:39AM +0100, Gerd Hoffmann wrote:
> On Mi, 2015-03-25 at 18:09 +0100, Michael S. Tsirkin wrote:
> > On Wed, Mar 25, 2015 at 04:37:16PM +0100, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > > BTW can we teach virtio-gpu to look for framebuffer using
> > > > virtio pci caps?
> > > 
> > > The virtio-gpu driver doesn't matter much here, it doesn't use it
> > > anyway.
> > > 
> > > >  Or are there limitations such as only
> > > > using IO port BARs, or compatibility with
> > > > BIOS code etc that limit us to specific BARs anyway?
> > > 
> > > Yes, vgabios code needs to know.  Currently it has bar #2 for the vga
> > > framebuffer bar hardcoded.  It's 16bit code.  I don't feel like making
> > > the probing more complicated ...
> > > 
> > > cheers,
> > >   Gerd
> > 
> > OK - you are saying all VGA cards use bar #2 for this
> > functionality, so we are just following
> > established practice here?
> 
> vgabios checks pci ids to figure.  qxl+stdvga use bar #0, vmware-vga bar
> #1, virtio-vga bar #2.
> 
> cheers,
>   Gerd
> 

And is it possible to use offset within BAR and/or memory BARs?
If yes I'd strongly prefer this.
As for writing 16 bit code, I need to do this for virtio scsi/blk
anyway, so we'll be able to share code.

-- 
MST

^ permalink raw reply

* Re: [virtio-dev] Re: [PATCH] Add virtio gpu driver.
From: Gerd Hoffmann @ 2015-03-26  8:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, open list:ABI/API, Rusty Russell, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <20150326091545-mutt-send-email-mst@redhat.com>

  Hi,

> And is it possible to use offset within BAR and/or memory BARs?
> If yes I'd strongly prefer this.

What is the point?  Do you want place virtio regions and vga framebuffer
in the same pci bar?  Why?  virtio is mmio and traps into qemu on
access, whereas the vga framebuffer is memory-backed (with dirty
tracking turned on).  Don't think this is a good idea, even though the
memory api would probably allow to do this.

cheers,
  Gerd


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH v4] Add virtio-input driver.
From: Michael S. Tsirkin @ 2015-03-26  8:46 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: virtio-dev, virtualization, David Herrmann, Dmitry Torokhov,
	Rusty Russell, open list, open list:ABI/API
In-Reply-To: <1427198927-1745-1-git-send-email-kraxel@redhat.com>

On Tue, Mar 24, 2015 at 01:08:46PM +0100, Gerd Hoffmann wrote:
> virtio-input is basically evdev-events-over-virtio, so this driver isn't
> much more than reading configuration from config space and forwarding
> incoming events to the linux input layer.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Looks good overallm though I think I still see a couple of minor issues.

> ---
>  MAINTAINERS                       |   6 +
>  drivers/virtio/Kconfig            |  10 ++
>  drivers/virtio/Makefile           |   1 +
>  drivers/virtio/virtio_input.c     | 363 ++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/Kbuild         |   1 +
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_input.h |  76 ++++++++
>  7 files changed, 458 insertions(+)
>  create mode 100644 drivers/virtio/virtio_input.c
>  create mode 100644 include/uapi/linux/virtio_input.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 358eb01..6f233dd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10442,6 +10442,12 @@ S:	Maintained
>  F:	drivers/vhost/
>  F:	include/uapi/linux/vhost.h
>  
> +VIRTIO INPUT DRIVER
> +M:	Gerd Hoffmann <kraxel@redhat.com>
> +S:	Maintained
> +F:	drivers/virtio/virtio_input.c
> +F:	include/uapi/linux/virtio_input.h
> +
>  VIA RHINE NETWORK DRIVER
>  M:	Roger Luethi <rl@hellgate.ch>
>  S:	Maintained
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index b546da5..cab9f3f 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -48,6 +48,16 @@ config VIRTIO_BALLOON
>  
>  	 If unsure, say M.
>  
> +config VIRTIO_INPUT
> +	tristate "Virtio input driver"
> +	depends on VIRTIO
> +	depends on INPUT
> +	---help---
> +	 This driver supports virtio input devices such as
> +	 keyboards, mice and tablets.
> +
> +	 If unsure, say M.
> +
>   config VIRTIO_MMIO
>  	tristate "Platform bus driver for memory mapped virtio devices"
>  	depends on HAS_IOMEM
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index d85565b..41e30e3 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
> new file mode 100644
> index 0000000..fc99a15
> --- /dev/null
> +++ b/drivers/virtio/virtio_input.c
> @@ -0,0 +1,363 @@
> +#include <linux/module.h>
> +#include <linux/virtio.h>
> +#include <linux/input.h>
> +
> +#include <uapi/linux/virtio_ids.h>
> +#include <uapi/linux/virtio_input.h>
> +
> +struct virtio_input {
> +	struct virtio_device       *vdev;
> +	struct input_dev           *idev;
> +	char                       name[64];
> +	char                       serial[64];
> +	char                       phys[64];
> +	struct virtqueue           *evt, *sts;
> +	struct virtio_input_event  evts[64];
> +	spinlock_t                 lock;
> +	bool                       ready;
> +};
> +
> +static void virtinput_queue_evtbuf(struct virtio_input *vi,
> +				   struct virtio_input_event *evtbuf)
> +{
> +	struct scatterlist sg[1];
> +
> +	sg_init_one(sg, evtbuf, sizeof(*evtbuf));
> +	virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
> +}
> +
> +static void virtinput_recv_events(struct virtqueue *vq)
> +{
> +	struct virtio_input *vi = vq->vdev->priv;
> +	struct virtio_input_event *event;
> +	unsigned long flags;
> +	unsigned int len;
> +
> +	spin_lock_irqsave(&vi->lock, flags);
> +	if (vi->ready) {
> +		while ((event = virtqueue_get_buf(vi->evt, &len)) != NULL) {
> +			input_event(vi->idev,
> +				    le16_to_cpu(event->type),
> +				    le16_to_cpu(event->code),
> +				    le32_to_cpu(event->value));
> +			virtinput_queue_evtbuf(vi, event);
> +		}
> +		virtqueue_kick(vq);
> +	}
> +	spin_unlock_irqrestore(&vi->lock, flags);
> +}
> +
> +static int virtinput_send_status(struct virtio_input *vi,
> +				 u16 type, u16 code, s32 value)
> +{
> +	struct virtio_input_event *stsbuf;
> +	struct scatterlist sg[1];
> +	unsigned long flags;
> +	int rc;
> +
> +	stsbuf = kzalloc(sizeof(*stsbuf), GFP_ATOMIC);
> +	if (!stsbuf)
> +		return -ENOMEM;
> +
> +	stsbuf->type  = cpu_to_le16(type);
> +	stsbuf->code  = cpu_to_le16(code);
> +	stsbuf->value = cpu_to_le32(value);
> +	sg_init_one(sg, stsbuf, sizeof(*stsbuf));
> +
> +	spin_lock_irqsave(&vi->lock, flags);
> +	if (vi->ready) {
> +		rc = virtqueue_add_outbuf(vi->sts, sg, 1, stsbuf, GFP_ATOMIC);
> +		virtqueue_kick(vi->sts);
> +	} else {
> +		rc = -ENODEV;
> +	}
> +	spin_unlock_irqrestore(&vi->lock, flags);
> +
> +	if (rc != 0)
> +		kfree(stsbuf);
> +	return rc;
> +}
> +
> +static void virtinput_recv_status(struct virtqueue *vq)
> +{
> +	struct virtio_input *vi = vq->vdev->priv;
> +	struct virtio_input_event *stsbuf;
> +	unsigned long flags;
> +	unsigned int len;
> +
> +	spin_lock_irqsave(&vi->lock, flags);
> +	while ((stsbuf = virtqueue_get_buf(vi->sts, &len)) != NULL)
> +		kfree(stsbuf);
> +	spin_unlock_irqrestore(&vi->lock, flags);
> +}
> +
> +static int virtinput_status(struct input_dev *idev, unsigned int type,
> +			    unsigned int code, int value)
> +{
> +	struct virtio_input *vi = input_get_drvdata(idev);
> +
> +	return virtinput_send_status(vi, type, code, value);
> +}
> +
> +static size_t virtinput_cfg_select(struct virtio_input *vi,
> +				   u8 select, u8 subsel)
> +{
> +	u8 size;
> +
> +	virtio_cwrite(vi->vdev, struct virtio_input_config, select, &select);
> +	virtio_cwrite(vi->vdev, struct virtio_input_config, subsel, &subsel);
> +	virtio_cread(vi->vdev, struct virtio_input_config, size, &size);
> +	return size;
> +}
> +
> +static void virtinput_cfg_bits(struct virtio_input *vi, int select, int subsel,
> +			       unsigned long *bits, unsigned int bitcount)
> +{
> +	unsigned int bit;
> +	size_t bytes;
> +	u8 *virtio_bits;
> +
> +	bytes = virtinput_cfg_select(vi, select, subsel);
> +	if (!bytes)
> +		return;
> +	if (bitcount > bytes * 8)
> +		bitcount = bytes * 8;
> +
> +	/*
> +	 * Bitmap in virtio config space is a simple stream of bytes,
> +	 * with the first byte carrying bits 0-7, second bits 8-15 and
> +	 * so on.
> +	 */
> +	virtio_bits = kzalloc(bytes, GFP_KERNEL);
> +	if (!virtio_bits)
> +		return;
> +	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config,
> +					      u.bitmap),
> +			   virtio_bits, bytes);
> +	for (bit = 0; bit < bitcount; bit++) {
> +		if (virtio_bits[bit / 8] & (1 << (bit % 8)))
> +			__set_bit(bit, bits);
> +	}
> +	kfree(virtio_bits);
> +
> +	if (select == VIRTIO_INPUT_CFG_EV_BITS)
> +		__set_bit(subsel, vi->idev->evbit);
> +}
> +
> +static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
> +{
> +	u32 mi, ma, re, fu, fl;
> +
> +	virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.res, &re);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
> +	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);
> +	input_set_abs_params(vi->idev, abs, mi, ma, fu, fl);
> +	input_abs_set_res(vi->idev, abs, re);
> +}
> +
> +static int virtinput_init_vqs(struct virtio_input *vi)
> +{
> +	struct virtqueue *vqs[2];
> +	vq_callback_t *cbs[] = { virtinput_recv_events,
> +				 virtinput_recv_status };
> +	static const char *names[] = { "events", "status" };
> +	int err;
> +
> +	err = vi->vdev->config->find_vqs(vi->vdev, 2, vqs, cbs, names);
> +	if (err)
> +		return err;
> +	vi->evt = vqs[0];
> +	vi->sts = vqs[1];
> +
> +	return 0;
> +}
> +
> +static void virtinput_fill_evt(struct virtio_input *vi)
> +{
> +	unsigned long flags;
> +	int i, size;
> +
> +	spin_lock_irqsave(&vi->lock, flags);
> +	size = virtqueue_get_vring_size(vi->evt);
> +	if (size > ARRAY_SIZE(vi->evts))
> +		size = ARRAY_SIZE(vi->evts);
> +	for (i = 0; i < size; i++)
> +		virtinput_queue_evtbuf(vi, &vi->evts[i]);
> +	virtqueue_kick(vi->evt);
> +	spin_unlock_irqrestore(&vi->lock, flags);
> +}
> +
> +static int virtinput_probe(struct virtio_device *vdev)
> +{
> +	struct virtio_input *vi;
> +	size_t size;
> +	int abs, err;
> +
> +	if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> +		return -ENODEV;
> +
> +	vi = kzalloc(sizeof(*vi), GFP_KERNEL);
> +	if (!vi)
> +		return -ENOMEM;
> +
> +	vdev->priv = vi;
> +	vi->vdev = vdev;
> +	spin_lock_init(&vi->lock);
> +
> +	err = virtinput_init_vqs(vi);
> +	if (err)
> +		goto err_init_vq;
> +
> +	vi->idev = input_allocate_device();
> +	if (!vi->idev) {
> +		err = -ENOMEM;
> +		goto err_input_alloc;
> +	}
> +	input_set_drvdata(vi->idev, vi);
> +
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_NAME, 0);
> +	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config,
> +					      u.string),
> +			   vi->name, min(size, sizeof(vi->name)));
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_SERIAL, 0);
> +	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config,
> +					      u.string),
> +			   vi->serial, min(size, sizeof(vi->serial)));
> +	snprintf(vi->phys, sizeof(vi->phys),
> +		 "virtio%d/input0", vdev->index);
> +	vi->idev->name = vi->name;
> +	vi->idev->phys = vi->phys;
> +	vi->idev->uniq = vi->serial;
> +
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_DEVIDS, 0);
> +	if (size >= sizeof(struct virtio_input_devids)) {
> +		virtio_cread(vi->vdev, struct virtio_input_config,
> +			     u.ids.bustype, &vi->idev->id.bustype);
> +		virtio_cread(vi->vdev, struct virtio_input_config,
> +			     u.ids.vendor, &vi->idev->id.vendor);
> +		virtio_cread(vi->vdev, struct virtio_input_config,
> +			     u.ids.product, &vi->idev->id.product);
> +		virtio_cread(vi->vdev, struct virtio_input_config,
> +			     u.ids.version, &vi->idev->id.version);
> +	} else {
> +		vi->idev->id.bustype = BUS_VIRTUAL;
> +	}
> +
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_PROP_BITS, 0,
> +			   vi->idev->propbit, INPUT_PROP_CNT);
> +	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REP);
> +	if (size)
> +		__set_bit(EV_REP, vi->idev->evbit);
> +
> +	vi->idev->dev.parent = &vdev->dev;
> +	vi->idev->event = virtinput_status;
> +
> +	/* device -> kernel */
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_KEY,
> +			   vi->idev->keybit, KEY_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REL,
> +			   vi->idev->relbit, REL_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_ABS,
> +			   vi->idev->absbit, ABS_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_MSC,
> +			   vi->idev->mscbit, MSC_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SW,
> +			   vi->idev->swbit,  SW_CNT);
> +
> +	/* kernel -> device */
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_LED,
> +			   vi->idev->ledbit, LED_CNT);
> +	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SND,
> +			   vi->idev->sndbit, SND_CNT);
> +
> +	if (test_bit(EV_ABS, vi->idev->evbit)) {
> +		for (abs = 0; abs < ABS_CNT; abs++) {
> +			if (!test_bit(abs, vi->idev->absbit))
> +				continue;
> +			virtinput_cfg_abs(vi, abs);
> +		}
> +	}
> +	err = input_register_device(vi->idev);
> +	if (err)
> +		goto err_input_register;


vi->ready is false here so you might lose status updates.
Losing updates isn't too terrible, but seems easy to fix:


	virtio_device_ready(vdev);
	vi->ready = true;
	err = input_register_device(vi->idev);

...

err_input_register:
	vi->ready = false;

it's up to you whether to do this.


> +
> +	virtio_device_ready(vdev);
> +	vi->ready = true;
> +	virtinput_fill_evt(vi);
> +	return 0;
> +
> +err_input_register:
> +	input_free_device(vi->idev);
> +err_input_alloc:
> +	vdev->config->del_vqs(vdev);
> +err_init_vq:
> +	kfree(vi);
> +	return err;
> +}
> +
> +static void virtinput_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_input *vi = vdev->priv;
> +
> +	vi->ready = false;

This one needs to be done under a lock, otherwise
we can't be sure no callbacks are running here.

> +	input_unregister_device(vi->idev);
> +	vdev->config->del_vqs(vdev);
> +	kfree(vi);
> +}
> +
> +#ifdef CONFIG_PM_SLEEP
> +static int virtinput_freeze(struct virtio_device *vdev)
> +{
> +	struct virtio_input *vi = vdev->priv;
> +
> +	vi->ready = false;

Same here I think.

> +	vdev->config->del_vqs(vdev);
> +	return 0;
> +}
> +
> +static int virtinput_restore(struct virtio_device *vdev)
> +{
> +	struct virtio_input *vi = vdev->priv;
> +	int err;
> +
> +	err = virtinput_init_vqs(vi);
> +	if (err)
> +		return err;
> +
> +	virtio_device_ready(vdev);
> +	vi->ready = true;
> +	virtinput_fill_evt(vi);
> +	return 0;
> +}
> +#endif
> +
> +static unsigned int features[] = {
> +};


an emoty line won't hurt here.

> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_INPUT, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> +static struct virtio_driver virtio_input_driver = {
> +	.driver.name         = KBUILD_MODNAME,
> +	.driver.owner        = THIS_MODULE,
> +	.feature_table       = features,
> +	.feature_table_size  = ARRAY_SIZE(features),
> +	.id_table            = id_table,
> +	.probe               = virtinput_probe,
> +	.remove              = virtinput_remove,
> +#ifdef CONFIG_PM_SLEEP
> +	.freeze	             = virtinput_freeze,
> +	.restore             = virtinput_restore,
> +#endif
> +};
> +
> +module_virtio_driver(virtio_input_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Virtio input device driver");
> +MODULE_AUTHOR("Gerd Hoffmann <kraxel@redhat.com>");
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index 68ceb97..04b829e 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -430,6 +430,7 @@ header-y += virtio_blk.h
>  header-y += virtio_config.h
>  header-y += virtio_console.h
>  header-y += virtio_ids.h
> +header-y += virtio_input.h
>  header-y += virtio_net.h
>  header-y += virtio_pci.h
>  header-y += virtio_ring.h
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 284fc3a..5f60aa4 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -39,5 +39,6 @@
>  #define VIRTIO_ID_9P		9 /* 9p virtio console */
>  #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
>  #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
> +#define VIRTIO_ID_INPUT        18 /* virtio input */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_input.h b/include/uapi/linux/virtio_input.h
> new file mode 100644
> index 0000000..dc61b28
> --- /dev/null
> +++ b/include/uapi/linux/virtio_input.h
> @@ -0,0 +1,76 @@
> +#ifndef _LINUX_VIRTIO_INPUT_H
> +#define _LINUX_VIRTIO_INPUT_H
> +/* This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE. */

You must include linux/types.h for __le types I think.

> +#include <linux/virtio_ids.h>

This one is there in all other headers, so ok,
though I don't know why.

> +#include <linux/virtio_config.h>

I'm sure you don't need this one.

> +
> +enum virtio_input_config_select {
> +	VIRTIO_INPUT_CFG_UNSET      = 0x00,
> +	VIRTIO_INPUT_CFG_ID_NAME    = 0x01,
> +	VIRTIO_INPUT_CFG_ID_SERIAL  = 0x02,
> +	VIRTIO_INPUT_CFG_ID_DEVIDS  = 0x03,
> +	VIRTIO_INPUT_CFG_PROP_BITS  = 0x10,
> +	VIRTIO_INPUT_CFG_EV_BITS    = 0x11,
> +	VIRTIO_INPUT_CFG_ABS_INFO   = 0x12,
> +};
> +
> +struct virtio_input_absinfo {
> +	__u32 min;
> +	__u32 max;
> +	__u32 fuzz;
> +	__u32 flat;
> +	__u32 res;
> +};
> +
> +struct virtio_input_devids {
> +	__u16 bustype;
> +	__u16 vendor;
> +	__u16 product;
> +	__u16 version;
> +};
> +
> +struct virtio_input_config {
> +	__u8    select;
> +	__u8    subsel;
> +	__u8    size;
> +	__u8    reserved;


Thinking more about it, might it be better to make
this reserved[5] so the union is 8 byte aligned?
Will be helpful if you later want a 64 bit field in the union.

> +	union {
> +		char string[128];
> +		__u8 bitmap[128];
> +		struct virtio_input_absinfo abs;
> +		struct virtio_input_devids ids;
> +	} u;
> +};
> +
> +struct virtio_input_event {
> +	__le16 type;
> +	__le16 code;
> +	__le32 value;
> +};
> +
> +#endif /* _LINUX_VIRTIO_INPUT_H */
> -- 
> 1.8.3.1

^ permalink raw reply

* Re: [PATCH] Add virtio gpu driver.
From: Daniel Vetter @ 2015-03-26  8:53 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Daniel Vetter, virtio-dev-sDuHXQ4OtrM4h7I2RyI4rWD2FQJk+8+b,
	Michael S. Tsirkin, open list:ABI/API, Rusty Russell, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <1427295189.23304.6.camel-3OfP5uLMi4C46o+2HkPkLj4oCIwMql/M@public.gmane.org>

On Wed, Mar 25, 2015 at 03:53:09PM +0100, Gerd Hoffmann wrote:
> > > Signed-off-by: Dave Airlie <airlied-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > Signed-off-by: Gerd Hoffmann <kraxel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > Standard request from my side for new drm drivers (especially if they're
> > this simple): Can you please update the drivers to latest drm internal
> > interfaces, i.e. using universal planes and atomic?
> 
> Have a docs / sample code pointer for me?

Picking any of the recently converted drivers or recently merged atomic
drivers should be fairly informative. Overall conversion procedure is
detailed in

http://blog.ffwll.ch/2014/11/atomic-modeset-support-for-kms-drivers.html
http://blog.ffwll.ch/2015/01/update-for-atomic-display-updates.html

And ofc there's the kerneldoc stuff in the drm docbook. If you have
questions probably best to ask them in #dri-devel irc, most of the people
who've done atomic conversions hang out there and can help out.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [virtio-dev] Re: [PATCH] Add virtio gpu driver.
From: Michael S. Tsirkin @ 2015-03-26  9:04 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: virtio-dev, open list:ABI/API, Rusty Russell, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <1427359367.9779.9.camel@nilsson.home.kraxel.org>

On Thu, Mar 26, 2015 at 09:42:47AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > And is it possible to use offset within BAR and/or memory BARs?
> > If yes I'd strongly prefer this.
> 
> What is the point?  Do you want place virtio regions and vga framebuffer
> in the same pci bar?  Why?  virtio is mmio and traps into qemu on
> access, whereas the vga framebuffer is memory-backed (with dirty
> tracking turned on).  Don't think this is a good idea, even though the
> memory api would probably allow to do this.
> 
> cheers,
>   Gerd

Absolutely, it's pretty common to mix regions in a BAR.
For example, we have virtio kick (ioeventfd backed,
handled in kernel) in same BAR as common and device
specific configuration.

We did the same thing you are now doing with the
virtio BAR, and now we have to maintain two code
bases, virtio pci config was designed to be future proof
so why not use it?

This is mostly just making sure we don't paint ourselves into a corner.

-- 
MST
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [RFC] simple_char: New infrastructure to simplify chardev management
From: Greg Kroah-Hartman @ 2015-03-26  9:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Jiri Kosina,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linux API
In-Reply-To: <CALCETrUXgqAY1MzMP80W_YC-zCi3nostJE2sLA+X_U0Xu6hoFw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Mar 25, 2015 at 05:16:28PM -0700, Andy Lutomirski wrote:
> Quick ping: does anyone want to review this?

Yes, sorry, I'm still way behind on my patch queue review, want to get
to this this week.

greg k-h

^ permalink raw reply

* Re: [PATCH v3  04/15] clocksource: Add ARM System timer driver
From: Daniel Lezcano @ 2015-03-26  9:50 UTC (permalink / raw)
  To: Maxime Coquelin, u.kleine-koenig, afaerber, geert, Rob Herring,
	Philipp Zabel, Linus Walleij, Arnd Bergmann, stefan, pmeerw,
	pebolle
  Cc: Jonathan Corbet, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, Russell King, Thomas Gleixner, Greg Kroah-Hartman,
	Jiri Slaby, Andrew Morton, David S. Miller, Mauro Carvalho Chehab,
	Joe Perches, Antti Palosaari, Tejun Heo, Will Deacon,
	Nikolay Borisov, Rusty Russell, Kees Cook, Michal Marek,
	linux-doc, linux-arm-kernel, linux-kernel, devicetree, linux-gpi
In-Reply-To: <1426197361-19290-5-git-send-email-maxime.coquelin@st.com>

On 03/12/2015 10:55 PM, Maxime Coquelin wrote:
> From: Maxime Coquelin <mcoquelin.stm32@gmail.com>
>
> This patch adds clocksource support for ARMv7-M's System timer,
> also known as SysTick.
>
> Signed-off-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>

Hi Maxime,

the driver looks good. Three comments below.

   -- Daniel

> ---
>   drivers/clocksource/Kconfig          |  7 ++++
>   drivers/clocksource/Makefile         |  1 +
>   drivers/clocksource/armv7m_systick.c | 78 ++++++++++++++++++++++++++++++++++++
>   3 files changed, 86 insertions(+)
>   create mode 100644 drivers/clocksource/armv7m_systick.c
>
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index 1c2506f..b82e58b 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -134,6 +134,13 @@ config CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
>   	help
>   	 Use ARM global timer clock source as sched_clock
>
> +config ARMV7M_SYSTICK
> +	bool
> +	select CLKSRC_OF if OF
> +	select CLKSRC_MMIO
> +	help
> +	  This options enables support for the ARMv7M system timer unit
> +
>   config ATMEL_PIT
>   	select CLKSRC_OF if OF
>   	def_bool SOC_AT91SAM9 || SOC_SAMA5
> diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
> index 752d5c7..1c9a643 100644
> --- a/drivers/clocksource/Makefile
> +++ b/drivers/clocksource/Makefile
> @@ -44,6 +44,7 @@ obj-$(CONFIG_MTK_TIMER)		+= mtk_timer.o
>
>   obj-$(CONFIG_ARM_ARCH_TIMER)		+= arm_arch_timer.o
>   obj-$(CONFIG_ARM_GLOBAL_TIMER)		+= arm_global_timer.o
> +obj-$(CONFIG_ARMV7M_SYSTICK)		+= armv7m_systick.o
>   obj-$(CONFIG_CLKSRC_METAG_GENERIC)	+= metag_generic.o
>   obj-$(CONFIG_ARCH_HAS_TICK_BROADCAST)	+= dummy_timer.o
>   obj-$(CONFIG_ARCH_KEYSTONE)		+= timer-keystone.o
> diff --git a/drivers/clocksource/armv7m_systick.c b/drivers/clocksource/armv7m_systick.c
> new file mode 100644
> index 0000000..23d8249
> --- /dev/null
> +++ b/drivers/clocksource/armv7m_systick.c
> @@ -0,0 +1,78 @@
> +/*
> + * Copyright (C) Maxime Coquelin 2015
> + * Author:  Maxime Coquelin <mcoquelin.stm32@gmail.com>
> + * License terms:  GNU General Public License (GPL), version 2
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/clocksource.h>
> +#include <linux/clockchips.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/clk.h>
> +#include <linux/bitops.h>
> +
> +#define SYST_CSR	0x00
> +#define SYST_RVR	0x04
> +#define SYST_CVR	0x08
> +#define SYST_CALIB	0x0c
> +
> +#define SYST_CSR_ENABLE BIT(0)
> +
> +#define SYSTICK_LOAD_RELOAD_MASK 0x00FFFFFF
> +
> +static void __init system_timer_of_register(struct device_node *np)
> +{
> +	struct clk *clk;
> +	void __iomem *base;
> +	u32 rate = 0;
> +	int ret;
> +
> +	base = of_iomap(np, 0);
> +	if (!base) {
> +		pr_warn("system-timer: invalid base address\n");
> +		return;
> +	}
> +
> +	clk = of_clk_get(np, 0);
> +	if (!IS_ERR(clk)) {
> +		ret = clk_prepare_enable(clk);
> +		if (ret) {
> +			clk_put(clk);
> +			goto out_unmap;
> +		}
> +
> +		rate = clk_get_rate(clk);
> +	}
> +
> +	/* If no clock found, try to get clock-frequency property */
> +	if (!rate) {
> +		ret = of_property_read_u32(np, "clock-frequency", &rate);
> +		if (ret)
> +			goto out_unmap;

Shouldn't be 'goto out_clk_disable' ?

> +	}
> +
> +	writel_relaxed(SYSTICK_LOAD_RELOAD_MASK, base + SYST_RVR);
> +	writel_relaxed(SYST_CSR_ENABLE, base + SYST_CSR);
> +
> +	ret = clocksource_mmio_init(base + SYST_CVR, "arm_system_timer", rate,
> +			200, 24, clocksource_mmio_readl_down);
> +	if (ret) {
> +		pr_err("failed to init clocksource (%d)\n", ret);
> +		goto out_clk_disable;
> +	}
> +
> +	pr_info("ARM System timer initialized as clocksource\n");
> +
> +	return;
> +
> +out_clk_disable:
> +	if (!IS_ERR(clk))

Why do you need this check ?

It isn't missing a clk_put ?

> +		clk_disable_unprepare(clk);
> +out_unmap:
> +	iounmap(base);
> +	WARN(ret, "ARM System timer register failed (%d)\n", ret);
> +}
> +
> +CLOCKSOURCE_OF_DECLARE(arm_systick, "arm,armv7m-systick",
> +			system_timer_of_register);
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v5] Add virtio-input driver.
From: Gerd Hoffmann @ 2015-03-26 10:49 UTC (permalink / raw)
  To: virtio-dev, virtualization
  Cc: Dmitry Torokhov, mst, open list:ABI/API, open list,
	David Herrmann

virtio-input is basically evdev-events-over-virtio, so this driver isn't
much more than reading configuration from config space and forwarding
incoming events to the linux input layer.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 MAINTAINERS                       |   6 +
 drivers/virtio/Kconfig            |  10 +
 drivers/virtio/Makefile           |   1 +
 drivers/virtio/virtio_input.c     | 384 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_input.h |  76 ++++++++
 7 files changed, 479 insertions(+)
 create mode 100644 drivers/virtio/virtio_input.c
 create mode 100644 include/uapi/linux/virtio_input.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 358eb01..6f233dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10442,6 +10442,12 @@ S:	Maintained
 F:	drivers/vhost/
 F:	include/uapi/linux/vhost.h
 
+VIRTIO INPUT DRIVER
+M:	Gerd Hoffmann <kraxel@redhat.com>
+S:	Maintained
+F:	drivers/virtio/virtio_input.c
+F:	include/uapi/linux/virtio_input.h
+
 VIA RHINE NETWORK DRIVER
 M:	Roger Luethi <rl@hellgate.ch>
 S:	Maintained
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index b546da5..cab9f3f 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -48,6 +48,16 @@ config VIRTIO_BALLOON
 
 	 If unsure, say M.
 
+config VIRTIO_INPUT
+	tristate "Virtio input driver"
+	depends on VIRTIO
+	depends on INPUT
+	---help---
+	 This driver supports virtio input devices such as
+	 keyboards, mice and tablets.
+
+	 If unsure, say M.
+
  config VIRTIO_MMIO
 	tristate "Platform bus driver for memory mapped virtio devices"
 	depends on HAS_IOMEM
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index d85565b..41e30e3 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
new file mode 100644
index 0000000..60e2a16
--- /dev/null
+++ b/drivers/virtio/virtio_input.c
@@ -0,0 +1,384 @@
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/input.h>
+
+#include <uapi/linux/virtio_ids.h>
+#include <uapi/linux/virtio_input.h>
+
+struct virtio_input {
+	struct virtio_device       *vdev;
+	struct input_dev           *idev;
+	char                       name[64];
+	char                       serial[64];
+	char                       phys[64];
+	struct virtqueue           *evt, *sts;
+	struct virtio_input_event  evts[64];
+	spinlock_t                 lock;
+	bool                       ready;
+};
+
+static void virtinput_queue_evtbuf(struct virtio_input *vi,
+				   struct virtio_input_event *evtbuf)
+{
+	struct scatterlist sg[1];
+
+	sg_init_one(sg, evtbuf, sizeof(*evtbuf));
+	virtqueue_add_inbuf(vi->evt, sg, 1, evtbuf, GFP_ATOMIC);
+}
+
+static void virtinput_recv_events(struct virtqueue *vq)
+{
+	struct virtio_input *vi = vq->vdev->priv;
+	struct virtio_input_event *event;
+	unsigned long flags;
+	unsigned int len;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	if (vi->ready) {
+		while ((event = virtqueue_get_buf(vi->evt, &len)) != NULL) {
+			spin_unlock_irqrestore(&vi->lock, flags);
+			input_event(vi->idev,
+				    le16_to_cpu(event->type),
+				    le16_to_cpu(event->code),
+				    le32_to_cpu(event->value));
+			spin_lock_irqsave(&vi->lock, flags);
+			virtinput_queue_evtbuf(vi, event);
+		}
+		virtqueue_kick(vq);
+	}
+	spin_unlock_irqrestore(&vi->lock, flags);
+}
+
+/*
+ * On error we are losing the status update, which isn't critical as
+ * this is typically used for stuff like keyboard leds.
+ */
+static int virtinput_send_status(struct virtio_input *vi,
+				 u16 type, u16 code, s32 value)
+{
+	struct virtio_input_event *stsbuf;
+	struct scatterlist sg[1];
+	unsigned long flags;
+	int rc;
+
+	stsbuf = kzalloc(sizeof(*stsbuf), GFP_ATOMIC);
+	if (!stsbuf)
+		return -ENOMEM;
+
+	stsbuf->type  = cpu_to_le16(type);
+	stsbuf->code  = cpu_to_le16(code);
+	stsbuf->value = cpu_to_le32(value);
+	sg_init_one(sg, stsbuf, sizeof(*stsbuf));
+
+	spin_lock_irqsave(&vi->lock, flags);
+	if (vi->ready) {
+		rc = virtqueue_add_outbuf(vi->sts, sg, 1, stsbuf, GFP_ATOMIC);
+		virtqueue_kick(vi->sts);
+	} else {
+		rc = -ENODEV;
+	}
+	spin_unlock_irqrestore(&vi->lock, flags);
+
+	if (rc != 0)
+		kfree(stsbuf);
+	return rc;
+}
+
+static void virtinput_recv_status(struct virtqueue *vq)
+{
+	struct virtio_input *vi = vq->vdev->priv;
+	struct virtio_input_event *stsbuf;
+	unsigned long flags;
+	unsigned int len;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	while ((stsbuf = virtqueue_get_buf(vi->sts, &len)) != NULL)
+		kfree(stsbuf);
+	spin_unlock_irqrestore(&vi->lock, flags);
+}
+
+static int virtinput_status(struct input_dev *idev, unsigned int type,
+			    unsigned int code, int value)
+{
+	struct virtio_input *vi = input_get_drvdata(idev);
+
+	return virtinput_send_status(vi, type, code, value);
+}
+
+static u8 virtinput_cfg_select(struct virtio_input *vi,
+			       u8 select, u8 subsel)
+{
+	u8 size;
+
+	virtio_cwrite(vi->vdev, struct virtio_input_config, select, &select);
+	virtio_cwrite(vi->vdev, struct virtio_input_config, subsel, &subsel);
+	virtio_cread(vi->vdev, struct virtio_input_config, size, &size);
+	return size;
+}
+
+static void virtinput_cfg_bits(struct virtio_input *vi, int select, int subsel,
+			       unsigned long *bits, unsigned int bitcount)
+{
+	unsigned int bit;
+	u8 *virtio_bits;
+	u8 bytes;
+
+	bytes = virtinput_cfg_select(vi, select, subsel);
+	if (!bytes)
+		return;
+	if (bitcount > bytes * 8)
+		bitcount = bytes * 8;
+
+	/*
+	 * Bitmap in virtio config space is a simple stream of bytes,
+	 * with the first byte carrying bits 0-7, second bits 8-15 and
+	 * so on.
+	 */
+	virtio_bits = kzalloc(bytes, GFP_KERNEL);
+	if (!virtio_bits)
+		return;
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config,
+					      u.bitmap),
+			   virtio_bits, bytes);
+	for (bit = 0; bit < bitcount; bit++) {
+		if (virtio_bits[bit / 8] & (1 << (bit % 8)))
+			__set_bit(bit, bits);
+	}
+	kfree(virtio_bits);
+
+	if (select == VIRTIO_INPUT_CFG_EV_BITS)
+		__set_bit(subsel, vi->idev->evbit);
+}
+
+static void virtinput_cfg_abs(struct virtio_input *vi, int abs)
+{
+	u32 mi, ma, re, fu, fl;
+
+	virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ABS_INFO, abs);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.min, &mi);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.max, &ma);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.res, &re);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.fuzz, &fu);
+	virtio_cread(vi->vdev, struct virtio_input_config, u.abs.flat, &fl);
+	input_set_abs_params(vi->idev, abs, mi, ma, fu, fl);
+	input_abs_set_res(vi->idev, abs, re);
+}
+
+static int virtinput_init_vqs(struct virtio_input *vi)
+{
+	struct virtqueue *vqs[2];
+	vq_callback_t *cbs[] = { virtinput_recv_events,
+				 virtinput_recv_status };
+	static const char *names[] = { "events", "status" };
+	int err;
+
+	err = vi->vdev->config->find_vqs(vi->vdev, 2, vqs, cbs, names);
+	if (err)
+		return err;
+	vi->evt = vqs[0];
+	vi->sts = vqs[1];
+
+	return 0;
+}
+
+static void virtinput_fill_evt(struct virtio_input *vi)
+{
+	unsigned long flags;
+	int i, size;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	size = virtqueue_get_vring_size(vi->evt);
+	if (size > ARRAY_SIZE(vi->evts))
+		size = ARRAY_SIZE(vi->evts);
+	for (i = 0; i < size; i++)
+		virtinput_queue_evtbuf(vi, &vi->evts[i]);
+	virtqueue_kick(vi->evt);
+	spin_unlock_irqrestore(&vi->lock, flags);
+}
+
+static int virtinput_probe(struct virtio_device *vdev)
+{
+	struct virtio_input *vi;
+	unsigned long flags;
+	size_t size;
+	int abs, err;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+		return -ENODEV;
+
+	vi = kzalloc(sizeof(*vi), GFP_KERNEL);
+	if (!vi)
+		return -ENOMEM;
+
+	vdev->priv = vi;
+	vi->vdev = vdev;
+	spin_lock_init(&vi->lock);
+
+	err = virtinput_init_vqs(vi);
+	if (err)
+		goto err_init_vq;
+
+	vi->idev = input_allocate_device();
+	if (!vi->idev) {
+		err = -ENOMEM;
+		goto err_input_alloc;
+	}
+	input_set_drvdata(vi->idev, vi);
+
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_NAME, 0);
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config,
+					      u.string),
+			   vi->name, min(size, sizeof(vi->name)));
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_SERIAL, 0);
+	virtio_cread_bytes(vi->vdev, offsetof(struct virtio_input_config,
+					      u.string),
+			   vi->serial, min(size, sizeof(vi->serial)));
+	snprintf(vi->phys, sizeof(vi->phys),
+		 "virtio%d/input0", vdev->index);
+	vi->idev->name = vi->name;
+	vi->idev->phys = vi->phys;
+	vi->idev->uniq = vi->serial;
+
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_ID_DEVIDS, 0);
+	if (size >= sizeof(struct virtio_input_devids)) {
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.bustype, &vi->idev->id.bustype);
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.vendor, &vi->idev->id.vendor);
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.product, &vi->idev->id.product);
+		virtio_cread(vi->vdev, struct virtio_input_config,
+			     u.ids.version, &vi->idev->id.version);
+	} else {
+		vi->idev->id.bustype = BUS_VIRTUAL;
+	}
+
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_PROP_BITS, 0,
+			   vi->idev->propbit, INPUT_PROP_CNT);
+	size = virtinput_cfg_select(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REP);
+	if (size)
+		__set_bit(EV_REP, vi->idev->evbit);
+
+	vi->idev->dev.parent = &vdev->dev;
+	vi->idev->event = virtinput_status;
+
+	/* device -> kernel */
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_KEY,
+			   vi->idev->keybit, KEY_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_REL,
+			   vi->idev->relbit, REL_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_ABS,
+			   vi->idev->absbit, ABS_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_MSC,
+			   vi->idev->mscbit, MSC_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SW,
+			   vi->idev->swbit,  SW_CNT);
+
+	/* kernel -> device */
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_LED,
+			   vi->idev->ledbit, LED_CNT);
+	virtinput_cfg_bits(vi, VIRTIO_INPUT_CFG_EV_BITS, EV_SND,
+			   vi->idev->sndbit, SND_CNT);
+
+	if (test_bit(EV_ABS, vi->idev->evbit)) {
+		for (abs = 0; abs < ABS_CNT; abs++) {
+			if (!test_bit(abs, vi->idev->absbit))
+				continue;
+			virtinput_cfg_abs(vi, abs);
+		}
+	}
+
+	virtio_device_ready(vdev);
+	vi->ready = true;
+	err = input_register_device(vi->idev);
+	if (err)
+		goto err_input_register;
+
+	virtinput_fill_evt(vi);
+	return 0;
+
+err_input_register:
+	spin_lock_irqsave(&vi->lock, flags);
+	vi->ready = false;
+	spin_unlock_irqrestore(&vi->lock, flags);
+	input_free_device(vi->idev);
+err_input_alloc:
+	vdev->config->del_vqs(vdev);
+err_init_vq:
+	kfree(vi);
+	return err;
+}
+
+static void virtinput_remove(struct virtio_device *vdev)
+{
+	struct virtio_input *vi = vdev->priv;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	vi->ready = false;
+	spin_unlock_irqrestore(&vi->lock, flags);
+
+	input_unregister_device(vi->idev);
+	vdev->config->del_vqs(vdev);
+	kfree(vi);
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int virtinput_freeze(struct virtio_device *vdev)
+{
+	struct virtio_input *vi = vdev->priv;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vi->lock, flags);
+	vi->ready = false;
+	spin_unlock_irqrestore(&vi->lock, flags);
+
+	vdev->config->del_vqs(vdev);
+	return 0;
+}
+
+static int virtinput_restore(struct virtio_device *vdev)
+{
+	struct virtio_input *vi = vdev->priv;
+	int err;
+
+	err = virtinput_init_vqs(vi);
+	if (err)
+		return err;
+
+	virtio_device_ready(vdev);
+	vi->ready = true;
+	virtinput_fill_evt(vi);
+	return 0;
+}
+#endif
+
+static unsigned int features[] = {
+	/* none */
+};
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_INPUT, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_input_driver = {
+	.driver.name         = KBUILD_MODNAME,
+	.driver.owner        = THIS_MODULE,
+	.feature_table       = features,
+	.feature_table_size  = ARRAY_SIZE(features),
+	.id_table            = id_table,
+	.probe               = virtinput_probe,
+	.remove              = virtinput_remove,
+#ifdef CONFIG_PM_SLEEP
+	.freeze	             = virtinput_freeze,
+	.restore             = virtinput_restore,
+#endif
+};
+
+module_virtio_driver(virtio_input_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Virtio input device driver");
+MODULE_AUTHOR("Gerd Hoffmann <kraxel@redhat.com>");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 68ceb97..04b829e 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -430,6 +430,7 @@ header-y += virtio_blk.h
 header-y += virtio_config.h
 header-y += virtio_console.h
 header-y += virtio_ids.h
+header-y += virtio_input.h
 header-y += virtio_net.h
 header-y += virtio_pci.h
 header-y += virtio_ring.h
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 284fc3a..5f60aa4 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -39,5 +39,6 @@
 #define VIRTIO_ID_9P		9 /* 9p virtio console */
 #define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
 #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
+#define VIRTIO_ID_INPUT        18 /* virtio input */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_input.h b/include/uapi/linux/virtio_input.h
new file mode 100644
index 0000000..a7fe5c8
--- /dev/null
+++ b/include/uapi/linux/virtio_input.h
@@ -0,0 +1,76 @@
+#ifndef _LINUX_VIRTIO_INPUT_H
+#define _LINUX_VIRTIO_INPUT_H
+/* This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. */
+
+#include <linux/types.h>
+
+enum virtio_input_config_select {
+	VIRTIO_INPUT_CFG_UNSET      = 0x00,
+	VIRTIO_INPUT_CFG_ID_NAME    = 0x01,
+	VIRTIO_INPUT_CFG_ID_SERIAL  = 0x02,
+	VIRTIO_INPUT_CFG_ID_DEVIDS  = 0x03,
+	VIRTIO_INPUT_CFG_PROP_BITS  = 0x10,
+	VIRTIO_INPUT_CFG_EV_BITS    = 0x11,
+	VIRTIO_INPUT_CFG_ABS_INFO   = 0x12,
+};
+
+struct virtio_input_absinfo {
+	__u32 min;
+	__u32 max;
+	__u32 fuzz;
+	__u32 flat;
+	__u32 res;
+};
+
+struct virtio_input_devids {
+	__u16 bustype;
+	__u16 vendor;
+	__u16 product;
+	__u16 version;
+};
+
+struct virtio_input_config {
+	__u8    select;
+	__u8    subsel;
+	__u8    size;
+	__u8    reserved[5];
+	union {
+		char string[128];
+		__u8 bitmap[128];
+		struct virtio_input_absinfo abs;
+		struct virtio_input_devids ids;
+	} u;
+};
+
+struct virtio_input_event {
+	__le16 type;
+	__le16 code;
+	__le32 value;
+};
+
+#endif /* _LINUX_VIRTIO_INPUT_H */
-- 
1.8.3.1

^ permalink raw reply related

* Re: [virtio-dev] Re: [PATCH] Add virtio gpu driver.
From: Gerd Hoffmann @ 2015-03-26 11:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, open list:ABI/API, Rusty Russell, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <20150326095736-mutt-send-email-mst@redhat.com>

  Hi,

> Absolutely, it's pretty common to mix regions in a BAR.
> For example, we have virtio kick (ioeventfd backed,
> handled in kernel) in same BAR as common and device
> specific configuration.

> We did the same thing you are now doing with the
> virtio BAR, and now we have to maintain two code
> bases, virtio pci config was designed to be future proof
> so why not use it?

It's not about virtio at all.  It's about vga compatibility, so we have
a simple framebuffer as boot display.  Only used when virtio is *not*
enabled.

> This is mostly just making sure we don't paint ourselves into a corner.

It's a simple memory bar.  vga cards have that since pci was invented
(standalone ones, chipset graphics aside), and there havn't been
fundamental changes ...

cheers,
  Gerd


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox