qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
@ 2015-07-13 20:09 Gabriel L. Somlo
  2015-07-13 23:03 ` Eric Blake
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-13 20:09 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: matt.fleming, mdroth, rjones, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini

Hi,

A while ago I was pondering on the options available for retrieving
a fw_cfg blob from the guest-side (now that we can insert fw_cfg
files on the host-side command line, see commit 81b2b8106).

So over the last couple of weekends I cooked up the sysfs kernel module
below, which lists all fw_cfg files under /sys/firmware/fw_cfg/<filename>.

I'm building it against the current Fedora (22) running kernel (after
installing the kernel-devel rpm) using the following Makefile:

obj-m := fw_cfg.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

I'm looking for early feedback before trying to push this into the
kernel:

1. Right now, only fw_cfg *files* are listed (not sure it's worth
   trying for the pre-FW_CFG_FILE_FIRST blobs, since AFAICT there's
   no good way of figuring out their size, not to mention even
   knowing which ones are present in the first place.

2. File names are listed in /sys/fs/fw_cfg/... with slashes replaced
   exclamation marks, e.g.:

# ls /sys/firmware/fw_cfg/
bootorder           etc!e820                  etc!table-loader
etc!acpi!rsdp       etc!smbios!smbios-anchor  genroms!kvmvapic.bin
etc!acpi!tables     etc!smbios!smbios-tables
etc!boot-fail-wait  etc!system-states

   That's done automatically by kobject_init_and_add(), and I'm hoping
   it's acceptable, since the alternative involves parsing file names
   and building some sort of hierarchy of ksets representing subfolders
   like "etc", "etc/smbios/", "etc/acpi/", "opt/whatever/", etc.

3. I'm currently only handling x86 and I/O ports. I could drop the
   fw_cfg_dmi_whitelist and just check the signature, using mmio where
   appropriate, but I don't have a handy-dandy set of VMs for those
   architectures on which I could test. Wondering if that's something
   we should have before I officially try to submit this to the kernel,
   or whether it could wait for a second iteration.

   Speaking of the kernel: My default plan is to subscribe to
   kernelnewbies@kernelnewbies.org and submit it there (this is after
   all baby's first kernel module :) but if there's a better route for
   pushing it upstream, please feel free to suggest it to me.

Thanks much,
--Gabriel

PS. This still leaves me with the open question of how to do something
    similar on Windows, but I'm less bothered by the concept of compiling
    an in-house, ad-hoc, binary-from-hell program to simply read/write from
    the fw_cfg I/O ports on Windows :) Although in principle it'd be
    nice to have a standard, cross-platform way of doing things, likely
    involving the guest agent...


/* fw_cfg.c
 *
 * Expose entries from QEMU's firmware configuration (fw_cfg) device in
 * sysfs (read-only, under "/sys/firmware/fw_cfg/<fw_cfg_filename>").
 *
 * NOTE: '/' chars in fw_cfg file names automatically converted to '!' by
 *       the kobject_init_and_add() call.
 */

#include <linux/module.h>
#include <linux/capability.h>
#include <linux/slab.h>
#include <linux/dmi.h>


/* fw_cfg i/o ports */
#define FW_CFG_PORT_CTL  0x510
#define FW_CFG_PORT_DATA 0x511

/* selector values for "well-known" fw_cfg entries */
#define FW_CFG_SIGNATURE  0x00
#define FW_CFG_FILE_DIR   0x19

/* size in bytes of fw_cfg signature */
#define FW_CFG_SIG_SIZE 4

/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
#define FW_CFG_MAX_FILE_PATH 56

/* fw_cfg file directory entry type */
struct fw_cfg_file {
	uint32_t size;
	uint16_t select;
	uint16_t reserved;
	char name[FW_CFG_MAX_FILE_PATH];
};


/* provide atomic read access to hardware fw_cfg device
 * (critical section involves potentially lengthy i/o, using mutex) */
static DEFINE_MUTEX(fw_cfg_dev_lock);

/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
static inline void fw_cfg_read_blob(uint16_t select,
				     void *buf, loff_t pos, size_t count)
{
	mutex_lock(&fw_cfg_dev_lock);
	outw(select, FW_CFG_PORT_CTL);
	while (pos-- > 0)
		inb(FW_CFG_PORT_DATA);
	insb(FW_CFG_PORT_DATA, buf, count);
	mutex_unlock(&fw_cfg_dev_lock);
}


/* fw_cfg_sysfs_entry type */
struct fw_cfg_sysfs_entry {
	struct kobject kobj;
	struct fw_cfg_file f;
	struct list_head list;
};

/* get fw_cfg_sysfs_entry from kobject member */
static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
{
	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
}


/* fw_cfg_sysfs_attribute type */
struct fw_cfg_sysfs_attribute {
	struct attribute attr;
	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
};

/* get fw_cfg_sysfs_attribute from attribute member */
static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
{
	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
}


/* global cache of fw_cfg_sysfs_entry objects */
static LIST_HEAD(fw_cfg_entry_cache);

/* kobjects removed lazily by the kernel, so we need mutual exclusion;
 * (critical section is super-short, using spinlock) */
static DEFINE_SPINLOCK(fw_cfg_cache_lock);

static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
{
	spin_lock(&fw_cfg_cache_lock);
	list_add_tail(&entry->list, &fw_cfg_entry_cache);
	spin_unlock(&fw_cfg_cache_lock);
}

static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
{
	spin_lock(&fw_cfg_cache_lock);
	list_del(&entry->list);
	spin_unlock(&fw_cfg_cache_lock);
}

static void fw_cfg_sysfs_cache_cleanup(void)
{
	struct fw_cfg_sysfs_entry *entry, *next;

	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
		/* will end up invoking fw_cfg_sysfs_cache_delist()
	         * via each object's release() method (i.e. destructor) */
		kobject_put(&entry->kobj);
	}
}


/* default_attrs: per-entry attributes and show methods */

#define FW_CFG_SYSFS_ATTR(_attr) \
struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
	.attr = { .name = __stringify(_attr), .mode = 0400 }, \
	.show = fw_cfg_sysfs_show_##_attr, \
}

static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%d\n", e->f.size);
}

static ssize_t fw_cfg_sysfs_show_select(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%d\n", e->f.select);
}

static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%s\n", e->f.name);
}

static FW_CFG_SYSFS_ATTR(size);
static FW_CFG_SYSFS_ATTR(select);
static FW_CFG_SYSFS_ATTR(name);

static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
	&fw_cfg_sysfs_attr_size.attr,
	&fw_cfg_sysfs_attr_select.attr,
	&fw_cfg_sysfs_attr_name.attr,
	NULL,
};

/* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
				      char *buf)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
	struct fw_cfg_sysfs_attribute *attr = to_attr(a);

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	return attr->show(entry, buf);
}

static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
	.show = fw_cfg_sysfs_attr_show,
};


/* release: destructor, to be called via kobject_put() */
static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

	fw_cfg_sysfs_cache_delist(entry);
	kfree(entry);
}

/* kobj_type: ties together all properties required to register an entry */
static struct kobj_type fw_cfg_sysfs_entry_ktype = {
	.default_attrs = fw_cfg_sysfs_entry_attrs,
	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
	.release = &fw_cfg_sysfs_release_entry,
};


/* raw-read method and attribute */
static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
				     struct bin_attribute *bin_attr,
				     char *buf, loff_t pos, size_t count)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	if (pos > entry->f.size)
		return -EINVAL;

	if (count > entry->f.size - pos)
		count = entry->f.size - pos;

	fw_cfg_read_blob(entry->f.select, buf, pos, count);
	return count;
}

static struct bin_attribute fw_cfg_sysfs_attr_raw = {
	.attr = { .name = "raw", .mode = 0400 },
	.read = fw_cfg_sysfs_read_raw,
};


/* whitelist only hardware likely to have a fw_cfg device */
static int fw_cfg_dmi_match(const struct dmi_system_id *id)
{
	return 1;
}

static __initdata struct dmi_system_id fw_cfg_whitelist[] = {
	{ fw_cfg_dmi_match, "QEMU Standard PC", {
	  DMI_MATCH(DMI_SYS_VENDOR, "QEMU"),
	  DMI_MATCH(DMI_PRODUCT_NAME, "Standard PC") },
	},
	{ .ident = NULL }
};


/* object set to represent fw_cfg sysfs subfolder */
static struct kset *fw_cfg_kset;

static int __init fw_cfg_sysfs_init(void)
{
	int ret = 0;
	uint32_t count, i;
	char sig[FW_CFG_SIG_SIZE];
	struct fw_cfg_sysfs_entry *entry;

	/* only install on matching "hardware" */
	if (!dmi_check_system(fw_cfg_whitelist)) {
		pr_warn("Supported QEMU Standard PC hardware not found!\n");
		return -ENODEV;
	}
	
	/* verify fw_cfg device signature */
	fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
	if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
		pr_warn("QEMU fw_cfg signature not found!\n");
		return -ENODEV;
	}

	/* create fw_cfg folder in sysfs */
	fw_cfg_kset = kset_create_and_add("fw_cfg", NULL, firmware_kobj);
	if (!fw_cfg_kset)
		return -ENOMEM;

	/* process fw_cfg file directory entry */
	mutex_lock(&fw_cfg_dev_lock);
	outw(FW_CFG_FILE_DIR, FW_CFG_PORT_CTL);	/* select fw_cfg directory */
	insb(FW_CFG_PORT_DATA, &count, sizeof(count)); /* get file count */
	count = be32_to_cpu(count);
	/* create & register a sysfs node for each file in the directory */
	for (i = 0; i < count; i++) {
		/* allocate new entry */
		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
		if (!entry) {
			ret = -ENOMEM;
			break;
		}

		/* acquire file information from directory */
		insb(FW_CFG_PORT_DATA, &entry->f, sizeof(struct fw_cfg_file));
		entry->f.size = be32_to_cpu(entry->f.size);
		entry->f.select = be16_to_cpu(entry->f.select);

		/* register sysfs entry for this file */
		entry->kobj.kset = fw_cfg_kset;
		ret = kobject_init_and_add(&entry->kobj,
					   &fw_cfg_sysfs_entry_ktype, NULL,
					   /* NOTE: '/' represented as '!' */
					   "%s", entry->f.name);
		if (ret)
			break;

		/* add raw binary content access */
		ret = sysfs_create_bin_file(&entry->kobj,
					    &fw_cfg_sysfs_attr_raw);
		if (ret)
			break;

		/* add entry to global cache */
		fw_cfg_sysfs_cache_enlist(entry);
	}
	mutex_unlock(&fw_cfg_dev_lock);

	if (ret) {
		fw_cfg_sysfs_cache_cleanup();
		kset_unregister(fw_cfg_kset);
	} else
		pr_debug("fw_cfg: loaded.\n");

	return ret;
}

static void __exit fw_cfg_sysfs_exit(void)
{
	pr_debug("fw_cfg: unloading.\n");
	fw_cfg_sysfs_cache_cleanup();
	kset_unregister(fw_cfg_kset);
}

module_init(fw_cfg_sysfs_init);
module_exit(fw_cfg_sysfs_exit);
MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
MODULE_LICENSE("GPL");
MODULE_DEVICE_TABLE(dmi, fw_cfg_whitelist);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-13 20:09 [Qemu-devel] RFC: guest-side retrieval of fw_cfg file Gabriel L. Somlo
@ 2015-07-13 23:03 ` Eric Blake
  2015-07-14 17:00   ` Gabriel L. Somlo
  2015-07-14  9:43 ` Richard W.M. Jones
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Eric Blake @ 2015-07-13 23:03 UTC (permalink / raw)
  To: Gabriel L. Somlo, Laszlo Ersek
  Cc: matt.fleming, mdroth, rjones, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

On 07/13/2015 02:09 PM, Gabriel L. Somlo wrote:
> 2. File names are listed in /sys/fs/fw_cfg/... with slashes replaced
>    exclamation marks, e.g.:

Instead of inventing yet another escaping mechanism, can you mimic an
already existing convention such as systemd's escaping?

http://www.freedesktop.org/software/systemd/man/systemd-escape.html

> 
> # ls /sys/firmware/fw_cfg/
> bootorder           etc!e820                  etc!table-loader
> etc!acpi!rsdp       etc!smbios!smbios-anchor  genroms!kvmvapic.bin
> etc!acpi!tables     etc!smbios!smbios-tables
> etc!boot-fail-wait  etc!system-states

would name these files along the lines of 'etc-e820' and
'etc-boot\x2dfail\x2dwait', and the end user can then use systemd to
unescape the names.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-13 20:09 [Qemu-devel] RFC: guest-side retrieval of fw_cfg file Gabriel L. Somlo
  2015-07-13 23:03 ` Eric Blake
@ 2015-07-14  9:43 ` Richard W.M. Jones
  2015-07-14 18:23   ` Gabriel L. Somlo
  2015-07-16  0:43   ` Gabriel L. Somlo
  2015-07-14 11:38 ` Laszlo Ersek
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2015-07-14  9:43 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> 3. I'm currently only handling x86 and I/O ports. I could drop the
>    fw_cfg_dmi_whitelist and just check the signature, using mmio where
>    appropriate, but I don't have a handy-dandy set of VMs for those
>    architectures on which I could test. Wondering if that's something
>    we should have before I officially try to submit this to the kernel,
>    or whether it could wait for a second iteration.

  $ virt-builder --arch armv7l fedora-22
or:
  $ virt-builder --arch aarch64 fedora-22
then:
  $ virt-builder --get-kernel fedora-22.img

and then boot is using the right qemu command, probably something
like:

  $ qemu-system-arm \
      -M virt,accel=tcg \
      -cpu cortex-a15 \
      -kernel vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
      -initrd initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
      -append "console=ttyAMA0 root=/dev/vda3 ro" \
      -drive file=fedora-22.img,if=none,id=hd \
      -device virtio-blk-device,drive=hd \
      -serial stdio

The root password is printed in virt-builder output.

> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static inline void fw_cfg_read_blob(uint16_t select,
> 				     void *buf, loff_t pos, size_t count)
> {
> 	mutex_lock(&fw_cfg_dev_lock);
> 	outw(select, FW_CFG_PORT_CTL);
> 	while (pos-- > 0)
> 		inb(FW_CFG_PORT_DATA);
> 	insb(FW_CFG_PORT_DATA, buf, count);
> 	mutex_unlock(&fw_cfg_dev_lock);
> }

How slow is this?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-13 20:09 [Qemu-devel] RFC: guest-side retrieval of fw_cfg file Gabriel L. Somlo
  2015-07-13 23:03 ` Eric Blake
  2015-07-14  9:43 ` Richard W.M. Jones
@ 2015-07-14 11:38 ` Laszlo Ersek
  2015-07-15 12:00 ` Matt Fleming
  2015-07-15 14:08 ` Michael S. Tsirkin
  4 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2015-07-14 11:38 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: matt.fleming, mdroth, rjones, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini

On 07/13/15 22:09, Gabriel L. Somlo wrote:
> Hi,
> 
> A while ago I was pondering on the options available for retrieving
> a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> files on the host-side command line, see commit 81b2b8106).
> 
> So over the last couple of weekends I cooked up the sysfs kernel module
> below, which lists all fw_cfg files under /sys/firmware/fw_cfg/<filename>.
> 
> I'm building it against the current Fedora (22) running kernel (after
> installing the kernel-devel rpm) using the following Makefile:
> 
> obj-m := fw_cfg.o
> KDIR := /lib/modules/$(shell uname -r)/build
> PWD := $(shell pwd)
> default:
> 	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
> 
> I'm looking for early feedback before trying to push this into the
> kernel:
> 
> 1. Right now, only fw_cfg *files* are listed (not sure it's worth
>    trying for the pre-FW_CFG_FILE_FIRST blobs, since AFAICT there's
>    no good way of figuring out their size, not to mention even
>    knowing which ones are present in the first place.
> 
> 2. File names are listed in /sys/fs/fw_cfg/... with slashes replaced
>    exclamation marks, e.g.:
> 
> # ls /sys/firmware/fw_cfg/
> bootorder           etc!e820                  etc!table-loader
> etc!acpi!rsdp       etc!smbios!smbios-anchor  genroms!kvmvapic.bin
> etc!acpi!tables     etc!smbios!smbios-tables
> etc!boot-fail-wait  etc!system-states
> 
>    That's done automatically by kobject_init_and_add(), and I'm hoping
>    it's acceptable, since the alternative involves parsing file names
>    and building some sort of hierarchy of ksets representing subfolders
>    like "etc", "etc/smbios/", "etc/acpi/", "opt/whatever/", etc.

In general I hope Matt will respond to you; I have just one note
(although I have no horse in this race :)): I think it would be really
useful if you could somehow translate the fw_cfg "pathname components"
to a real directory structure. Utilities like "find", "dirname",
"basename" etc are generally great, and it would be nice to preserve
that flexibility.

Anyway, this is just an idea (because I like "find" so much :)); feel
free to ignore it.

Thanks!
Laszlo

> 3. I'm currently only handling x86 and I/O ports. I could drop the
>    fw_cfg_dmi_whitelist and just check the signature, using mmio where
>    appropriate, but I don't have a handy-dandy set of VMs for those
>    architectures on which I could test. Wondering if that's something
>    we should have before I officially try to submit this to the kernel,
>    or whether it could wait for a second iteration.
> 
>    Speaking of the kernel: My default plan is to subscribe to
>    kernelnewbies@kernelnewbies.org and submit it there (this is after
>    all baby's first kernel module :) but if there's a better route for
>    pushing it upstream, please feel free to suggest it to me.
> 
> Thanks much,
> --Gabriel
> 
> PS. This still leaves me with the open question of how to do something
>     similar on Windows, but I'm less bothered by the concept of compiling
>     an in-house, ad-hoc, binary-from-hell program to simply read/write from
>     the fw_cfg I/O ports on Windows :) Although in principle it'd be
>     nice to have a standard, cross-platform way of doing things, likely
>     involving the guest agent...
> 
> 
> /* fw_cfg.c
>  *
>  * Expose entries from QEMU's firmware configuration (fw_cfg) device in
>  * sysfs (read-only, under "/sys/firmware/fw_cfg/<fw_cfg_filename>").
>  *
>  * NOTE: '/' chars in fw_cfg file names automatically converted to '!' by
>  *       the kobject_init_and_add() call.
>  */
> 
> #include <linux/module.h>
> #include <linux/capability.h>
> #include <linux/slab.h>
> #include <linux/dmi.h>
> 
> 
> /* fw_cfg i/o ports */
> #define FW_CFG_PORT_CTL  0x510
> #define FW_CFG_PORT_DATA 0x511
> 
> /* selector values for "well-known" fw_cfg entries */
> #define FW_CFG_SIGNATURE  0x00
> #define FW_CFG_FILE_DIR   0x19
> 
> /* size in bytes of fw_cfg signature */
> #define FW_CFG_SIG_SIZE 4
> 
> /* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> #define FW_CFG_MAX_FILE_PATH 56
> 
> /* fw_cfg file directory entry type */
> struct fw_cfg_file {
> 	uint32_t size;
> 	uint16_t select;
> 	uint16_t reserved;
> 	char name[FW_CFG_MAX_FILE_PATH];
> };
> 
> 
> /* provide atomic read access to hardware fw_cfg device
>  * (critical section involves potentially lengthy i/o, using mutex) */
> static DEFINE_MUTEX(fw_cfg_dev_lock);
> 
> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static inline void fw_cfg_read_blob(uint16_t select,
> 				     void *buf, loff_t pos, size_t count)
> {
> 	mutex_lock(&fw_cfg_dev_lock);
> 	outw(select, FW_CFG_PORT_CTL);
> 	while (pos-- > 0)
> 		inb(FW_CFG_PORT_DATA);
> 	insb(FW_CFG_PORT_DATA, buf, count);
> 	mutex_unlock(&fw_cfg_dev_lock);
> }
> 
> 
> /* fw_cfg_sysfs_entry type */
> struct fw_cfg_sysfs_entry {
> 	struct kobject kobj;
> 	struct fw_cfg_file f;
> 	struct list_head list;
> };
> 
> /* get fw_cfg_sysfs_entry from kobject member */
> static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
> {
> 	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
> }
> 
> 
> /* fw_cfg_sysfs_attribute type */
> struct fw_cfg_sysfs_attribute {
> 	struct attribute attr;
> 	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
> };
> 
> /* get fw_cfg_sysfs_attribute from attribute member */
> static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
> {
> 	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
> }
> 
> 
> /* global cache of fw_cfg_sysfs_entry objects */
> static LIST_HEAD(fw_cfg_entry_cache);
> 
> /* kobjects removed lazily by the kernel, so we need mutual exclusion;
>  * (critical section is super-short, using spinlock) */
> static DEFINE_SPINLOCK(fw_cfg_cache_lock);
> 
> static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
> {
> 	spin_lock(&fw_cfg_cache_lock);
> 	list_add_tail(&entry->list, &fw_cfg_entry_cache);
> 	spin_unlock(&fw_cfg_cache_lock);
> }
> 
> static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
> {
> 	spin_lock(&fw_cfg_cache_lock);
> 	list_del(&entry->list);
> 	spin_unlock(&fw_cfg_cache_lock);
> }
> 
> static void fw_cfg_sysfs_cache_cleanup(void)
> {
> 	struct fw_cfg_sysfs_entry *entry, *next;
> 
> 	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
> 		/* will end up invoking fw_cfg_sysfs_cache_delist()
> 	         * via each object's release() method (i.e. destructor) */
> 		kobject_put(&entry->kobj);
> 	}
> }
> 
> 
> /* default_attrs: per-entry attributes and show methods */
> 
> #define FW_CFG_SYSFS_ATTR(_attr) \
> struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
> 	.attr = { .name = __stringify(_attr), .mode = 0400 }, \
> 	.show = fw_cfg_sysfs_show_##_attr, \
> }
> 
> static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
> {
> 	return sprintf(buf, "%d\n", e->f.size);
> }
> 
> static ssize_t fw_cfg_sysfs_show_select(struct fw_cfg_sysfs_entry *e, char *buf)
> {
> 	return sprintf(buf, "%d\n", e->f.select);
> }
> 
> static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
> {
> 	return sprintf(buf, "%s\n", e->f.name);
> }
> 
> static FW_CFG_SYSFS_ATTR(size);
> static FW_CFG_SYSFS_ATTR(select);
> static FW_CFG_SYSFS_ATTR(name);
> 
> static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
> 	&fw_cfg_sysfs_attr_size.attr,
> 	&fw_cfg_sysfs_attr_select.attr,
> 	&fw_cfg_sysfs_attr_name.attr,
> 	NULL,
> };
> 
> /* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
> static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
> 				      char *buf)
> {
> 	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> 	struct fw_cfg_sysfs_attribute *attr = to_attr(a);
> 
> 	if (!capable(CAP_SYS_ADMIN))
> 		return -EACCES;
> 
> 	return attr->show(entry, buf);
> }
> 
> static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
> 	.show = fw_cfg_sysfs_attr_show,
> };
> 
> 
> /* release: destructor, to be called via kobject_put() */
> static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
> {
> 	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> 
> 	fw_cfg_sysfs_cache_delist(entry);
> 	kfree(entry);
> }
> 
> /* kobj_type: ties together all properties required to register an entry */
> static struct kobj_type fw_cfg_sysfs_entry_ktype = {
> 	.default_attrs = fw_cfg_sysfs_entry_attrs,
> 	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
> 	.release = &fw_cfg_sysfs_release_entry,
> };
> 
> 
> /* raw-read method and attribute */
> static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> 				     struct bin_attribute *bin_attr,
> 				     char *buf, loff_t pos, size_t count)
> {
> 	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> 
> 	if (!capable(CAP_SYS_ADMIN))
> 		return -EACCES;
> 
> 	if (pos > entry->f.size)
> 		return -EINVAL;
> 
> 	if (count > entry->f.size - pos)
> 		count = entry->f.size - pos;
> 
> 	fw_cfg_read_blob(entry->f.select, buf, pos, count);
> 	return count;
> }
> 
> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> 	.attr = { .name = "raw", .mode = 0400 },
> 	.read = fw_cfg_sysfs_read_raw,
> };
> 
> 
> /* whitelist only hardware likely to have a fw_cfg device */
> static int fw_cfg_dmi_match(const struct dmi_system_id *id)
> {
> 	return 1;
> }
> 
> static __initdata struct dmi_system_id fw_cfg_whitelist[] = {
> 	{ fw_cfg_dmi_match, "QEMU Standard PC", {
> 	  DMI_MATCH(DMI_SYS_VENDOR, "QEMU"),
> 	  DMI_MATCH(DMI_PRODUCT_NAME, "Standard PC") },
> 	},
> 	{ .ident = NULL }
> };
> 
> 
> /* object set to represent fw_cfg sysfs subfolder */
> static struct kset *fw_cfg_kset;
> 
> static int __init fw_cfg_sysfs_init(void)
> {
> 	int ret = 0;
> 	uint32_t count, i;
> 	char sig[FW_CFG_SIG_SIZE];
> 	struct fw_cfg_sysfs_entry *entry;
> 
> 	/* only install on matching "hardware" */
> 	if (!dmi_check_system(fw_cfg_whitelist)) {
> 		pr_warn("Supported QEMU Standard PC hardware not found!\n");
> 		return -ENODEV;
> 	}
> 	
> 	/* verify fw_cfg device signature */
> 	fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> 	if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> 		pr_warn("QEMU fw_cfg signature not found!\n");
> 		return -ENODEV;
> 	}
> 
> 	/* create fw_cfg folder in sysfs */
> 	fw_cfg_kset = kset_create_and_add("fw_cfg", NULL, firmware_kobj);
> 	if (!fw_cfg_kset)
> 		return -ENOMEM;
> 
> 	/* process fw_cfg file directory entry */
> 	mutex_lock(&fw_cfg_dev_lock);
> 	outw(FW_CFG_FILE_DIR, FW_CFG_PORT_CTL);	/* select fw_cfg directory */
> 	insb(FW_CFG_PORT_DATA, &count, sizeof(count)); /* get file count */
> 	count = be32_to_cpu(count);
> 	/* create & register a sysfs node for each file in the directory */
> 	for (i = 0; i < count; i++) {
> 		/* allocate new entry */
> 		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> 		if (!entry) {
> 			ret = -ENOMEM;
> 			break;
> 		}
> 
> 		/* acquire file information from directory */
> 		insb(FW_CFG_PORT_DATA, &entry->f, sizeof(struct fw_cfg_file));
> 		entry->f.size = be32_to_cpu(entry->f.size);
> 		entry->f.select = be16_to_cpu(entry->f.select);
> 
> 		/* register sysfs entry for this file */
> 		entry->kobj.kset = fw_cfg_kset;
> 		ret = kobject_init_and_add(&entry->kobj,
> 					   &fw_cfg_sysfs_entry_ktype, NULL,
> 					   /* NOTE: '/' represented as '!' */
> 					   "%s", entry->f.name);
> 		if (ret)
> 			break;
> 
> 		/* add raw binary content access */
> 		ret = sysfs_create_bin_file(&entry->kobj,
> 					    &fw_cfg_sysfs_attr_raw);
> 		if (ret)
> 			break;
> 
> 		/* add entry to global cache */
> 		fw_cfg_sysfs_cache_enlist(entry);
> 	}
> 	mutex_unlock(&fw_cfg_dev_lock);
> 
> 	if (ret) {
> 		fw_cfg_sysfs_cache_cleanup();
> 		kset_unregister(fw_cfg_kset);
> 	} else
> 		pr_debug("fw_cfg: loaded.\n");
> 
> 	return ret;
> }
> 
> static void __exit fw_cfg_sysfs_exit(void)
> {
> 	pr_debug("fw_cfg: unloading.\n");
> 	fw_cfg_sysfs_cache_cleanup();
> 	kset_unregister(fw_cfg_kset);
> }
> 
> module_init(fw_cfg_sysfs_init);
> module_exit(fw_cfg_sysfs_exit);
> MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> MODULE_LICENSE("GPL");
> MODULE_DEVICE_TABLE(dmi, fw_cfg_whitelist);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-13 23:03 ` Eric Blake
@ 2015-07-14 17:00   ` Gabriel L. Somlo
  2015-07-15 11:06     ` Matt Fleming
  0 siblings, 1 reply; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-14 17:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: matt.fleming, mdroth, rjones, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Mon, Jul 13, 2015 at 05:03:02PM -0600, Eric Blake wrote:
> On 07/13/2015 02:09 PM, Gabriel L. Somlo wrote:
> > 2. File names are listed in /sys/fs/fw_cfg/... with slashes replaced
> >    exclamation marks, e.g.:
> 
> Instead of inventing yet another escaping mechanism, can you mimic an
> already existing convention such as systemd's escaping?

Not my invention -- as of commit 9f255651f kobject_set_name_vargs()
does some variation of strreplace(s, '/', '!') as a fallback...

> 
> http://www.freedesktop.org/software/systemd/man/systemd-escape.html
> 
> > 
> > # ls /sys/firmware/fw_cfg/
> > bootorder           etc!e820                  etc!table-loader
> > etc!acpi!rsdp       etc!smbios!smbios-anchor  genroms!kvmvapic.bin
> > etc!acpi!tables     etc!smbios!smbios-tables
> > etc!boot-fail-wait  etc!system-states
> 
> would name these files along the lines of 'etc-e820' and
> 'etc-boot\x2dfail\x2dwait', and the end user can then use systemd to
> unescape the names.

That being said, I did reimplement systemd's escape method in cca. 30
lines of C, so that shouldn't be too onerous.

Besides, Laszlo said he liked a real folder hierarchy, and I do too,
so I'm still pondering how much doing that would complicate the module
init function. I'm somewhat worried about what the added complexity
will mean in terms of upstream acceptance in the linux kernel :)

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14  9:43 ` Richard W.M. Jones
@ 2015-07-14 18:23   ` Gabriel L. Somlo
  2015-07-14 18:31     ` Gabriel L. Somlo
                       ` (2 more replies)
  2015-07-16  0:43   ` Gabriel L. Somlo
  1 sibling, 3 replies; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-14 18:23 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Tue, Jul 14, 2015 at 10:43:46AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > 3. I'm currently only handling x86 and I/O ports. I could drop the
> >    fw_cfg_dmi_whitelist and just check the signature, using mmio where
> >    appropriate, but I don't have a handy-dandy set of VMs for those
> >    architectures on which I could test. Wondering if that's something
> >    we should have before I officially try to submit this to the kernel,
> >    or whether it could wait for a second iteration.
> 
>   $ virt-builder --arch armv7l fedora-22
> or:
>   $ virt-builder --arch aarch64 fedora-22
> then:
>   $ virt-builder --get-kernel fedora-22.img
> 
> and then boot is using the right qemu command, probably something
> like:
> 
>   $ qemu-system-arm \
>       -M virt,accel=tcg \
>       -cpu cortex-a15 \
>       -kernel vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
>       -initrd initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
>       -append "console=ttyAMA0 root=/dev/vda3 ro" \
>       -drive file=fedora-22.img,if=none,id=hd \
>       -device virtio-blk-device,drive=hd \
>       -serial stdio
> 
> The root password is printed in virt-builder output.

Thanks, that should help (once I figure out how to *really* start it,
right now it hangs at "reached target basic system", and spews garbage
if I hit 'escape', probably in an attempt to paint the text-mode
progress bar... Then it throws me into a dracut prompt, complaining
that it can't find /dev/vda3...

But I'm sure I'll sort it out eventually :)

> > /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > static inline void fw_cfg_read_blob(uint16_t select,
> > 				     void *buf, loff_t pos, size_t count)
> > {
> > 	mutex_lock(&fw_cfg_dev_lock);
> > 	outw(select, FW_CFG_PORT_CTL);
> > 	while (pos-- > 0)
> > 		inb(FW_CFG_PORT_DATA);
> > 	insb(FW_CFG_PORT_DATA, buf, count);
> > 	mutex_unlock(&fw_cfg_dev_lock);
> > }
> 
> How slow is this?

Well, I think each outw() and inb() will result in a vmexit, with
userspace handling emulation, so much slower comparatively than
inserting into a list (hence mutex here, vs. spinlock there).

Feel free to kick me if I got all that spectacularly wrong :)

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 18:23   ` Gabriel L. Somlo
@ 2015-07-14 18:31     ` Gabriel L. Somlo
  2015-07-14 18:48     ` Richard W.M. Jones
  2015-07-14 19:24     ` Gerd Hoffmann
  2 siblings, 0 replies; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-14 18:31 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Tue, Jul 14, 2015 at 02:23:14PM -0400, Gabriel L. Somlo wrote:
> On Tue, Jul 14, 2015 at 10:43:46AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > 3. I'm currently only handling x86 and I/O ports. I could drop the
> > >    fw_cfg_dmi_whitelist and just check the signature, using mmio where
> > >    appropriate, but I don't have a handy-dandy set of VMs for those
> > >    architectures on which I could test. Wondering if that's something
> > >    we should have before I officially try to submit this to the kernel,
> > >    or whether it could wait for a second iteration.
> > 
> >   $ virt-builder --arch armv7l fedora-22
> > or:
> >   $ virt-builder --arch aarch64 fedora-22
> > then:
> >   $ virt-builder --get-kernel fedora-22.img
> > 
> > and then boot is using the right qemu command, probably something
> > like:
> > 
> >   $ qemu-system-arm \
> >       -M virt,accel=tcg \
> >       -cpu cortex-a15 \
> >       -kernel vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
> >       -initrd initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
> >       -append "console=ttyAMA0 root=/dev/vda3 ro" \
> >       -drive file=fedora-22.img,if=none,id=hd \
> >       -device virtio-blk-device,drive=hd \
> >       -serial stdio
> > 
> > The root password is printed in virt-builder output.
> 
> Thanks, that should help (once I figure out how to *really* start it,
> right now it hangs at "reached target basic system", and spews garbage
> if I hit 'escape', probably in an attempt to paint the text-mode
> progress bar... Then it throws me into a dracut prompt, complaining
> that it can't find /dev/vda3...
> 
> But I'm sure I'll sort it out eventually :)

And I did figure it out, and it was a dumb typo, so nevermind this
part of the thread going forward :)

Thanks much,
--Gabriel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 18:23   ` Gabriel L. Somlo
  2015-07-14 18:31     ` Gabriel L. Somlo
@ 2015-07-14 18:48     ` Richard W.M. Jones
  2015-07-14 18:51       ` Richard W.M. Jones
                         ` (2 more replies)
  2015-07-14 19:24     ` Gerd Hoffmann
  2 siblings, 3 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2015-07-14 18:48 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Tue, Jul 14, 2015 at 02:23:14PM -0400, Gabriel L. Somlo wrote:
> On Tue, Jul 14, 2015 at 10:43:46AM +0100, Richard W.M. Jones wrote:
> > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > 3. I'm currently only handling x86 and I/O ports. I could drop the
> > >    fw_cfg_dmi_whitelist and just check the signature, using mmio where
> > >    appropriate, but I don't have a handy-dandy set of VMs for those
> > >    architectures on which I could test. Wondering if that's something
> > >    we should have before I officially try to submit this to the kernel,
> > >    or whether it could wait for a second iteration.
> > 
> >   $ virt-builder --arch armv7l fedora-22
> > or:
> >   $ virt-builder --arch aarch64 fedora-22
> > then:
> >   $ virt-builder --get-kernel fedora-22.img
> > 
> > and then boot is using the right qemu command, probably something
> > like:
> > 
> >   $ qemu-system-arm \
> >       -M virt,accel=tcg \
> >       -cpu cortex-a15 \
> >       -kernel vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
> >       -initrd initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
> >       -append "console=ttyAMA0 root=/dev/vda3 ro" \
> >       -drive file=fedora-22.img,if=none,id=hd \
> >       -device virtio-blk-device,drive=hd \
> >       -serial stdio
> > 
> > The root password is printed in virt-builder output.
> 
> Thanks, that should help (once I figure out how to *really* start it,
> right now it hangs at "reached target basic system", and spews garbage
> if I hit 'escape', probably in an attempt to paint the text-mode
> progress bar... Then it throws me into a dracut prompt, complaining
> that it can't find /dev/vda3...
> 
> But I'm sure I'll sort it out eventually :)

That error, as I guess you know, indicates that the disk image is not
being presented as virtio-blk to the guest.  I know for sure (because
I tested the command line above earlier) that the armv7l guest can see
virtio-blk.  I didn't test the aarch64 guest.

> > > /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > static inline void fw_cfg_read_blob(uint16_t select,
> > > 				     void *buf, loff_t pos, size_t count)
> > > {
> > > 	mutex_lock(&fw_cfg_dev_lock);
> > > 	outw(select, FW_CFG_PORT_CTL);
> > > 	while (pos-- > 0)
> > > 		inb(FW_CFG_PORT_DATA);
> > > 	insb(FW_CFG_PORT_DATA, buf, count);
> > > 	mutex_unlock(&fw_cfg_dev_lock);
> > > }
> > 
> > How slow is this?
> 
> Well, I think each outw() and inb() will result in a vmexit, with
> userspace handling emulation, so much slower comparatively than
> inserting into a list (hence mutex here, vs. spinlock there).

I wonder if using a string instruction (ie. rep insb etc) would be
faster.  On x86, qemu specifically optimizes these.  Maybe GCC turns
the above into a string instruction?

The reason I note all this is because there has been an ongoing
discussion about the slowness of fw_cfg.  Starting in 2010 in fact:

https://lists.gnu.org/archive/html/qemu-devel/2010-07/msg00962.html
https://lists.gnu.org/archive/html/qemu-devel/2011-10/msg00996.html

On aarch64 kernel loading is really slow because it can only transfer
(IIRC) 8 bytes at a time, and there are no string instructions we can
use to speed it up.

A long time ago I wrote a memcpy and a "pseudo-DMA" interface for
fw_cfg, but they were both roundly rejected as you can find in the
archives.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 18:48     ` Richard W.M. Jones
@ 2015-07-14 18:51       ` Richard W.M. Jones
  2015-07-14 19:16       ` Peter Maydell
  2015-07-14 19:24       ` Gabriel L. Somlo
  2 siblings, 0 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2015-07-14 18:51 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Tue, Jul 14, 2015 at 07:48:29PM +0100, Richard W.M. Jones wrote:
> On aarch64 kernel loading is really slow because it can only transfer
> (IIRC) 8 bytes at a time, and there are no string instructions we can
> use to speed it up.

I should note here I'm talking about AAVMF (ie. guest UEFI) which has
to load the kernel via fw_cfg.

Using the -kernel option qemu-system-aarch64 is really fast because it
prepopulates the guest RAM.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 18:48     ` Richard W.M. Jones
  2015-07-14 18:51       ` Richard W.M. Jones
@ 2015-07-14 19:16       ` Peter Maydell
  2015-07-14 19:24       ` Gabriel L. Somlo
  2 siblings, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2015-07-14 19:16 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: matt.fleming, gleb, Jordan Justen, Gabriel L. Somlo, Michael Roth,
	QEMU Developers, Gabriel L. Somlo, Gerd Hoffmann, Paolo Bonzini,
	Laszlo Ersek

On 14 July 2015 at 19:48, Richard W.M. Jones <rjones@redhat.com> wrote:
> A long time ago I wrote a memcpy and a "pseudo-DMA" interface for
> fw_cfg, but they were both roundly rejected as you can find in the
> archives.

IIRC opinions have changed on the pseudo-DMA approach since then;
last time it was suggested I think the response was positive.

-- PMM

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 18:23   ` Gabriel L. Somlo
  2015-07-14 18:31     ` Gabriel L. Somlo
  2015-07-14 18:48     ` Richard W.M. Jones
@ 2015-07-14 19:24     ` Gerd Hoffmann
  2 siblings, 0 replies; 31+ messages in thread
From: Gerd Hoffmann @ 2015-07-14 19:24 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: matt.fleming, mdroth, qemu-devel, jordan.l.justen,
	Gabriel L. Somlo, gleb, Richard W.M. Jones, pbonzini,
	Laszlo Ersek

  Hi,

> > > /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > static inline void fw_cfg_read_blob(uint16_t select,
> > > 				     void *buf, loff_t pos, size_t count)
> > > {
> > > 	mutex_lock(&fw_cfg_dev_lock);
> > > 	outw(select, FW_CFG_PORT_CTL);
> > > 	while (pos-- > 0)
> > > 		inb(FW_CFG_PORT_DATA);
> > > 	insb(FW_CFG_PORT_DATA, buf, count);
> > > 	mutex_unlock(&fw_cfg_dev_lock);
> > > }
> > 
> > How slow is this?
> 
> Well, I think each outw() and inb() will result in a vmexit, with
> userspace handling emulation, so much slower comparatively than
> inserting into a list (hence mutex here, vs. spinlock there).

There are a few tricks to speed up things, not sure this really matters
here as we don't transfer big stuff like kernel or initrd here.

Trick one (for x86) is that you can use string prefixes (i.e. load ecx
register with count, then use "rep insb").

Trick two (for arm) is that you can read eight instead of one byte per
instruction.

Both reduce the number of vmexits.

But speaking of kernel+initrd:  You eventually might want add them to
the filesystem too.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 18:48     ` Richard W.M. Jones
  2015-07-14 18:51       ` Richard W.M. Jones
  2015-07-14 19:16       ` Peter Maydell
@ 2015-07-14 19:24       ` Gabriel L. Somlo
  2 siblings, 0 replies; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-14 19:24 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: matt.fleming, mdroth, jordan.l.justen, qemu-devel, gleb,
	Gabriel L. Somlo, kraxel, pbonzini, Laszlo Ersek

On Tue, Jul 14, 2015 at 07:48:30PM +0100, Richard W.M. Jones wrote:
> > > > /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> > > > static inline void fw_cfg_read_blob(uint16_t select,
> > > > 				     void *buf, loff_t pos, size_t count)
> > > > {
> > > > 	mutex_lock(&fw_cfg_dev_lock);
> > > > 	outw(select, FW_CFG_PORT_CTL);
> > > > 	while (pos-- > 0)
> > > > 		inb(FW_CFG_PORT_DATA);
> > > > 	insb(FW_CFG_PORT_DATA, buf, count);
> > > > 	mutex_unlock(&fw_cfg_dev_lock);
> > > > }
> > > 
> > > How slow is this?
> > 
> > Well, I think each outw() and inb() will result in a vmexit, with
> > userspace handling emulation, so much slower comparatively than
> > inserting into a list (hence mutex here, vs. spinlock there).
> 
> I wonder if using a string instruction (ie. rep insb etc) would be
> faster.  On x86, qemu specifically optimizes these.  Maybe GCC turns
> the above into a string instruction?

After some digging...

The insb call is indeed implemented as a "rep ins" in the kernel, and
rep appears to be optimized on the host/kvm side, so we might be in
luck.

The "while (pos-- > 0) inb(FW_CFG_PORT_DATA);" portion is there just
in case, since most of the time pos==0 and we don't need to skip any
bytes from the given fw_cfg blob before getting to the optimized insb.

I guess partial interleaved raw reads of different blobs are
*theoretically* possible, but I expect in practice they'll be
rather unlikely...

Thanks,
--Gabriel

> The reason I note all this is because there has been an ongoing
> discussion about the slowness of fw_cfg.  Starting in 2010 in fact:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2010-07/msg00962.html
> https://lists.gnu.org/archive/html/qemu-devel/2011-10/msg00996.html
> 
> On aarch64 kernel loading is really slow because it can only transfer
> (IIRC) 8 bytes at a time, and there are no string instructions we can
> use to speed it up.
> 
> A long time ago I wrote a memcpy and a "pseudo-DMA" interface for
> fw_cfg, but they were both roundly rejected as you can find in the
> archives.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14 17:00   ` Gabriel L. Somlo
@ 2015-07-15 11:06     ` Matt Fleming
  2015-07-15 11:15       ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Matt Fleming @ 2015-07-15 11:06 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: mdroth, rjones, Gabriel L. Somlo, gleb, qemu-devel, kraxel,
	jordan.l.justen, pbonzini, Laszlo Ersek

On Tue, 2015-07-14 at 13:00 -0400, Gabriel L. Somlo wrote:
> 
> That being said, I did reimplement systemd's escape method in cca. 30
> lines of C, so that shouldn't be too onerous.

I really don't see a reason to use systemd's naming scheme instead of
the one already provided by the kernel.

> Besides, Laszlo said he liked a real folder hierarchy, and I do too,
> so I'm still pondering how much doing that would complicate the module
> init function. I'm somewhat worried about what the added complexity
> will mean in terms of upstream acceptance in the linux kernel :)

Yeah, going that route will complicate the patch and you're going to get
asked "Umm.. why?" by Greg Kroah-Hartman (definitely Cc Greg when
sending this to the kernel mailing lists btw).

But if you can provide sound technical arguments for the added
complexity I'm sure the kernel folks will be happy. Laszlo's argument
makes sense to me, i.e. wanting to use standard utilities to know
whether a blob is available.

Instead of rolling all this kobject-creation logic into your driver I'd
suggest writing a patch to lib/kobject.c., since the problem sounds like
something that should be solved with the kobject API.

The question is, how simple can you make the code ;-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-15 11:06     ` Matt Fleming
@ 2015-07-15 11:15       ` Laszlo Ersek
  0 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2015-07-15 11:15 UTC (permalink / raw)
  To: Matt Fleming, Gabriel L. Somlo
  Cc: mdroth, rjones, jordan.l.justen, Gabriel L. Somlo, qemu-devel,
	gleb, kraxel, pbonzini

On 07/15/15 13:06, Matt Fleming wrote:
> On Tue, 2015-07-14 at 13:00 -0400, Gabriel L. Somlo wrote:
>>
>> That being said, I did reimplement systemd's escape method in cca. 30
>> lines of C, so that shouldn't be too onerous.
> 
> I really don't see a reason to use systemd's naming scheme instead of
> the one already provided by the kernel.
> 
>> Besides, Laszlo said he liked a real folder hierarchy, and I do too,
>> so I'm still pondering how much doing that would complicate the module
>> init function. I'm somewhat worried about what the added complexity
>> will mean in terms of upstream acceptance in the linux kernel :)
> 
> Yeah, going that route will complicate the patch and you're going to get
> asked "Umm.. why?" by Greg Kroah-Hartman (definitely Cc Greg when
> sending this to the kernel mailing lists btw).
> 
> But if you can provide sound technical arguments for the added
> complexity I'm sure the kernel folks will be happy. Laszlo's argument
> makes sense to me, i.e. wanting to use standard utilities to know
> whether a blob is available.
> 
> Instead of rolling all this kobject-creation logic into your driver I'd
> suggest writing a patch to lib/kobject.c., since the problem sounds like
> something that should be solved with the kobject API.

Haha, this feature just got extended by six months. :)

> The question is, how simple can you make the code ;-)

The first question is how strong Gabriel's nerves are, but (from past
experience) they *are* strong. :)

Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-13 20:09 [Qemu-devel] RFC: guest-side retrieval of fw_cfg file Gabriel L. Somlo
                   ` (2 preceding siblings ...)
  2015-07-14 11:38 ` Laszlo Ersek
@ 2015-07-15 12:00 ` Matt Fleming
  2015-07-20 21:19   ` Gabriel L. Somlo
  2015-07-15 14:08 ` Michael S. Tsirkin
  4 siblings, 1 reply; 31+ messages in thread
From: Matt Fleming @ 2015-07-15 12:00 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: mdroth, rjones, jordan.l.justen, Gabriel L. Somlo, qemu-devel,
	gleb, kraxel, pbonzini, Laszlo Ersek

On Mon, 2015-07-13 at 16:09 -0400, Gabriel L. Somlo wrote:
> 
> 3. I'm currently only handling x86 and I/O ports. I could drop the
>    fw_cfg_dmi_whitelist and just check the signature, using mmio where
>    appropriate, but I don't have a handy-dandy set of VMs for those
>    architectures on which I could test. Wondering if that's something
>    we should have before I officially try to submit this to the kernel,
>    or whether it could wait for a second iteration.
> 
>    Speaking of the kernel: My default plan is to subscribe to
>    kernelnewbies@kernelnewbies.org and submit it there (this is after
>    all baby's first kernel module :) but if there's a better route for
>    pushing it upstream, please feel free to suggest it to me.

Congrats on your first kernel patch! Cc'ing kernelnewbies is OK but this
patches warrants a wider audience. Cc Greg Kroah-Hartman
(gregkh@linuxfoundation.org) since he polices things related to sysfs,
the x86 maintainers (x86@kernel.org), and linux-kernel@vger.kernel.org.
The closest thing we have to a Linux system firmware list is probably
linux-efi@vger.kernel.org.

By all means, mention that it's your first kernel patch when submitting
this.

Oh, you're also going to want to add documentation under
Documentation/ABI/testing/sysfs-firmware-fw_cfg as part of your patch
series.


> /* provide atomic read access to hardware fw_cfg device
>  * (critical section involves potentially lengthy i/o, using mutex) */
> static DEFINE_MUTEX(fw_cfg_dev_lock);

Just a very small nitpick, but,

	/*
	 * Multi-line comments should look like this
	 * in the kernel.
	 */


> static int __init fw_cfg_sysfs_init(void)
> {
> 	int ret = 0;
> 	uint32_t count, i;
> 	char sig[FW_CFG_SIG_SIZE];
> 	struct fw_cfg_sysfs_entry *entry;
> 
> 	/* only install on matching "hardware" */
> 	if (!dmi_check_system(fw_cfg_whitelist)) {
> 		pr_warn("Supported QEMU Standard PC hardware not found!\n");
> 		return -ENODEV;
> 	}

I would suggest deleting this warning because it's just noise if I build
this code into my kernel and boot it on non-qemu hardware.

> 	/* verify fw_cfg device signature */
> 	fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> 	if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> 		pr_warn("QEMU fw_cfg signature not found!\n");
> 		return -ENODEV;
> 	}
> 
> 	/* create fw_cfg folder in sysfs */
> 	fw_cfg_kset = kset_create_and_add("fw_cfg", NULL, firmware_kobj);
> 	if (!fw_cfg_kset)
> 		return -ENOMEM;
> 
> 	/* process fw_cfg file directory entry */
> 	mutex_lock(&fw_cfg_dev_lock);
> 	outw(FW_CFG_FILE_DIR, FW_CFG_PORT_CTL);	/* select fw_cfg directory */
> 	insb(FW_CFG_PORT_DATA, &count, sizeof(count)); /* get file count */
> 	count = be32_to_cpu(count);
> 	/* create & register a sysfs node for each file in the directory */
> 	for (i = 0; i < count; i++) {
> 		/* allocate new entry */
> 		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> 		if (!entry) {
> 			ret = -ENOMEM;
> 			break;
> 		}
> 
> 		/* acquire file information from directory */
> 		insb(FW_CFG_PORT_DATA, &entry->f, sizeof(struct fw_cfg_file));
> 		entry->f.size = be32_to_cpu(entry->f.size);
> 		entry->f.select = be16_to_cpu(entry->f.select);
> 
> 		/* register sysfs entry for this file */
> 		entry->kobj.kset = fw_cfg_kset;
> 		ret = kobject_init_and_add(&entry->kobj,
> 					   &fw_cfg_sysfs_entry_ktype, NULL,
> 					   /* NOTE: '/' represented as '!' */
> 					   "%s", entry->f.name);
> 		if (ret)
> 			break;

If you take this error path, aren't you leaking 'entry'? It isn't yet on
the 'fw_cfg_entry_cache' list and so won't be freed in
fw_cfg_sysfs_cache_cleanup().

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-13 20:09 [Qemu-devel] RFC: guest-side retrieval of fw_cfg file Gabriel L. Somlo
                   ` (3 preceding siblings ...)
  2015-07-15 12:00 ` Matt Fleming
@ 2015-07-15 14:08 ` Michael S. Tsirkin
  2015-07-15 16:01   ` Igor Mammedov
  4 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2015-07-15 14:08 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: matt.fleming, gleb, qemu-devel, jordan.l.justen, Gabriel L. Somlo,
	mdroth, rjones, kraxel, imammedo, pbonzini, Laszlo Ersek

On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> Hi,
> 
> A while ago I was pondering on the options available for retrieving
> a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> files on the host-side command line, see commit 81b2b8106).
> 
> So over the last couple of weekends I cooked up the sysfs kernel module
> below, which lists all fw_cfg files under /sys/firmware/fw_cfg/<filename>.

One concern here is that there will be a conflict here if fw cfg
is used by ACPI. I don't know whether that last is a good idea
though, so maybe not a real concern. I think Igor
wanted to make it so.

-- 
MST

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-15 14:08 ` Michael S. Tsirkin
@ 2015-07-15 16:01   ` Igor Mammedov
  2015-07-15 16:24     ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Mammedov @ 2015-07-15 16:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: matt.fleming, gleb, qemu-devel, jordan.l.justen, Gabriel L. Somlo,
	mdroth, rjones, Gabriel L. Somlo, kraxel, pbonzini, Laszlo Ersek

On Wed, 15 Jul 2015 17:08:27 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > Hi,
> > 
> > A while ago I was pondering on the options available for retrieving
> > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > files on the host-side command line, see commit 81b2b8106).
> > 
> > So over the last couple of weekends I cooked up the sysfs kernel
> > module below, which lists all fw_cfg files
> > under /sys/firmware/fw_cfg/<filename>.
> 
> One concern here is that there will be a conflict here if fw cfg
> is used by ACPI. I don't know whether that last is a good idea
> though, so maybe not a real concern. I think Igor
> wanted to make it so.

I don't see any conflict here so far, it's just guest side module that
accesses fw_cfg.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-15 16:01   ` Igor Mammedov
@ 2015-07-15 16:24     ` Michael S. Tsirkin
  2015-07-16  1:21       ` Gabriel L. Somlo
  2015-07-16  6:57       ` Igor Mammedov
  0 siblings, 2 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2015-07-15 16:24 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: matt.fleming, gleb, qemu-devel, jordan.l.justen, Gabriel L. Somlo,
	mdroth, rjones, Gabriel L. Somlo, kraxel, pbonzini, Laszlo Ersek

On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> On Wed, 15 Jul 2015 17:08:27 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > Hi,
> > > 
> > > A while ago I was pondering on the options available for retrieving
> > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > files on the host-side command line, see commit 81b2b8106).
> > > 
> > > So over the last couple of weekends I cooked up the sysfs kernel
> > > module below, which lists all fw_cfg files
> > > under /sys/firmware/fw_cfg/<filename>.
> > 
> > One concern here is that there will be a conflict here if fw cfg
> > is used by ACPI. I don't know whether that last is a good idea
> > though, so maybe not a real concern. I think Igor
> > wanted to make it so.
> 
> I don't see any conflict here so far, it's just guest side module that
> accesses fw_cfg.

If there's ACPI code that accesses fw_cfg in response to an interrupt,
it'll race with fw cfg access by guest OS. On linux we might be able to
block ACPI preventing it from running. We probably won't be able to do
it on windows.

-- 
MST

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-14  9:43 ` Richard W.M. Jones
  2015-07-14 18:23   ` Gabriel L. Somlo
@ 2015-07-16  0:43   ` Gabriel L. Somlo
  2015-07-16 19:27     ` Eric Blake
  1 sibling, 1 reply; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-16  0:43 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

On Tue, Jul 14, 2015 at 10:43:46AM +0100, Richard W.M. Jones wrote:
> On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > 3. I'm currently only handling x86 and I/O ports. I could drop the
> >    fw_cfg_dmi_whitelist and just check the signature, using mmio where
> >    appropriate, but I don't have a handy-dandy set of VMs for those
> >    architectures on which I could test. Wondering if that's something
> >    we should have before I officially try to submit this to the kernel,
> >    or whether it could wait for a second iteration.
> 
>   $ virt-builder --arch armv7l fedora-22
> or:
>   $ virt-builder --arch aarch64 fedora-22
> then:
>   $ virt-builder --get-kernel fedora-22.img
> 
> and then boot is using the right qemu command, probably something
> like:
> 
>   $ qemu-system-arm \
>       -M virt,accel=tcg \
>       -cpu cortex-a15 \
>       -kernel vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
>       -initrd initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
>       -append "console=ttyAMA0 root=/dev/vda3 ro" \
>       -drive file=fedora-22.img,if=none,id=hd \
>       -device virtio-blk-device,drive=hd \
>       -serial stdio
> 
> The root password is printed in virt-builder output.

OK, so I replaced my port i/o with mmio equivalents:

-#define FW_CFG_PORT_CTL  0x510
+#define FW_CFG_PORT_CTL  (void *)0x09020008

-#define FW_CFG_PORT_DATA 0x511
+#define FW_CFG_PORT_DATA (void *)0x09020000

-       outw(select, FW_CFG_PORT_CTL);
+       writew(select, FW_CFG_PORT_CTL);

-       inb(FW_CFG_PORT_DATA);
+       readb(FW_CFG_PORT_DATA);

-       insb(FW_CFG_PORT_DATA, buf, count);
+       readsb(FW_CFG_PORT_DATA, buf, count);

and managed to build the module for 4.0.4-301.fc22.armv7hl+lpae, but
when I attempt to insmod the resulting .ko, I get an oops:

"Unable to handle kernel paging request at virtual address 09020008"

I grabbed the control and data addresses from qemu's hw/arm/virt.c:

	a15memmap[VIRT_FW_CFG] is { 0x09020000, 0x0000000a }

and used the "fw_cfg_init_mem_wide(base + 8, base, 8)" call to guess
that CTL should be DATA + 8 :)

I'm probably missing something that'll turn out to be really obvious
in retrospect... :)

TIA for any clue,
--Gabriel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-15 16:24     ` Michael S. Tsirkin
@ 2015-07-16  1:21       ` Gabriel L. Somlo
  2015-07-16  6:57       ` Igor Mammedov
  1 sibling, 0 replies; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-16  1:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: matt.fleming, gleb, qemu-devel, jordan.l.justen, Gabriel L. Somlo,
	mdroth, rjones, kraxel, pbonzini, Igor Mammedov, Laszlo Ersek

On Wed, Jul 15, 2015 at 07:24:33PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> > On Wed, 15 Jul 2015 17:08:27 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > > Hi,
> > > > 
> > > > A while ago I was pondering on the options available for retrieving
> > > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > > files on the host-side command line, see commit 81b2b8106).
> > > > 
> > > > So over the last couple of weekends I cooked up the sysfs kernel
> > > > module below, which lists all fw_cfg files
> > > > under /sys/firmware/fw_cfg/<filename>.
> > > 
> > > One concern here is that there will be a conflict here if fw cfg
> > > is used by ACPI. I don't know whether that last is a good idea
> > > though, so maybe not a real concern. I think Igor
> > > wanted to make it so.
> > 
> > I don't see any conflict here so far, it's just guest side module that
> > accesses fw_cfg.
> 
> If there's ACPI code that accesses fw_cfg in response to an interrupt,
> it'll race with fw cfg access by guest OS. On linux we might be able to
> block ACPI preventing it from running. We probably won't be able to do
> it on windows.

Do you mean something like:

+       uint32_t acpi_lock
        mutex_lock(&fw_cfg_dev_lock);
+       acpi_ackquire_global_lock(ACPI_WAIT_FOREVER, &acpi_lock);
        outw(select, FW_CFG_PORT_CTL);
        while (pos-- > 0)
                inb(FW_CFG_PORT_DATA);
        insb(FW_CFG_PORT_DATA, buf, count);
+       acpi_release_global_lock(0, &acpi_lock);
        mutex_unlock(&fw_cfg_dev_lock);

with the expectation that ACPI would try to grab that global lock
itself before fiddling with (mm)io ?

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-15 16:24     ` Michael S. Tsirkin
  2015-07-16  1:21       ` Gabriel L. Somlo
@ 2015-07-16  6:57       ` Igor Mammedov
  2015-07-16  7:30         ` Michael S. Tsirkin
  1 sibling, 1 reply; 31+ messages in thread
From: Igor Mammedov @ 2015-07-16  6:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: matt.fleming, gleb, qemu-devel, jordan.l.justen, Gabriel L. Somlo,
	mdroth, rjones, Gabriel L. Somlo, kraxel, pbonzini, Laszlo Ersek

On Wed, 15 Jul 2015 19:24:33 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> > On Wed, 15 Jul 2015 17:08:27 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > > Hi,
> > > > 
> > > > A while ago I was pondering on the options available for retrieving
> > > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > > files on the host-side command line, see commit 81b2b8106).
> > > > 
> > > > So over the last couple of weekends I cooked up the sysfs kernel
> > > > module below, which lists all fw_cfg files
> > > > under /sys/firmware/fw_cfg/<filename>.
> > > 
> > > One concern here is that there will be a conflict here if fw cfg
> > > is used by ACPI. I don't know whether that last is a good idea
> > > though, so maybe not a real concern. I think Igor
> > > wanted to make it so.
> > 
> > I don't see any conflict here so far, it's just guest side module that
> > accesses fw_cfg.
> 
> If there's ACPI code that accesses fw_cfg in response to an interrupt,
> it'll race with fw cfg access by guest OS. On linux we might be able to
> block ACPI preventing it from running. We probably won't be able to do
> it on windows.
wrt vmgenid series we were talking about possibility to access fw_cfg
from ACPI device.init so it's unlikely that it will ever collide with
much later sysfs accesses. But if ACPI will start accessing fw_cfg from
other methods it will race for sure.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-16  6:57       ` Igor Mammedov
@ 2015-07-16  7:30         ` Michael S. Tsirkin
  2015-07-16  9:50           ` Igor Mammedov
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2015-07-16  7:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: matt.fleming, gleb, qemu-devel, jordan.l.justen, Gabriel L. Somlo,
	mdroth, rjones, Gabriel L. Somlo, kraxel, pbonzini, Laszlo Ersek

On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote:
> On Wed, 15 Jul 2015 19:24:33 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> > > On Wed, 15 Jul 2015 17:08:27 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > > > Hi,
> > > > > 
> > > > > A while ago I was pondering on the options available for retrieving
> > > > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > > > files on the host-side command line, see commit 81b2b8106).
> > > > > 
> > > > > So over the last couple of weekends I cooked up the sysfs kernel
> > > > > module below, which lists all fw_cfg files
> > > > > under /sys/firmware/fw_cfg/<filename>.
> > > > 
> > > > One concern here is that there will be a conflict here if fw cfg
> > > > is used by ACPI. I don't know whether that last is a good idea
> > > > though, so maybe not a real concern. I think Igor
> > > > wanted to make it so.
> > > 
> > > I don't see any conflict here so far, it's just guest side module that
> > > accesses fw_cfg.
> > 
> > If there's ACPI code that accesses fw_cfg in response to an interrupt,
> > it'll race with fw cfg access by guest OS. On linux we might be able to
> > block ACPI preventing it from running. We probably won't be able to do
> > it on windows.
> wrt vmgenid series we were talking about possibility to access fw_cfg
> from ACPI device.init so it's unlikely that it will ever collide with
> much later sysfs accesses.

Don't we need to get updates when it changes too?

> But if ACPI will start accessing fw_cfg from
> other methods it will race for sure.

Maybe we shouldn't poke at fw cfg in ACPI then.

-- 
MST

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-16  7:30         ` Michael S. Tsirkin
@ 2015-07-16  9:50           ` Igor Mammedov
  2015-07-16 10:25             ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Igor Mammedov @ 2015-07-16  9:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: matt.fleming, rjones, qemu-devel, jordan.l.justen,
	Gabriel L. Somlo, mdroth, gleb, Gabriel L. Somlo, kraxel,
	pbonzini, Laszlo Ersek

On Thu, 16 Jul 2015 10:30:41 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote:
> > On Wed, 15 Jul 2015 19:24:33 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> > > > On Wed, 15 Jul 2015 17:08:27 +0300
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > 
> > > > > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > A while ago I was pondering on the options available for retrieving
> > > > > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > > > > files on the host-side command line, see commit 81b2b8106).
> > > > > > 
> > > > > > So over the last couple of weekends I cooked up the sysfs kernel
> > > > > > module below, which lists all fw_cfg files
> > > > > > under /sys/firmware/fw_cfg/<filename>.
> > > > > 
> > > > > One concern here is that there will be a conflict here if fw cfg
> > > > > is used by ACPI. I don't know whether that last is a good idea
> > > > > though, so maybe not a real concern. I think Igor
> > > > > wanted to make it so.
> > > > 
> > > > I don't see any conflict here so far, it's just guest side module that
> > > > accesses fw_cfg.
> > > 
> > > If there's ACPI code that accesses fw_cfg in response to an interrupt,
> > > it'll race with fw cfg access by guest OS. On linux we might be able to
> > > block ACPI preventing it from running. We probably won't be able to do
> > > it on windows.
> > wrt vmgenid series we were talking about possibility to access fw_cfg
> > from ACPI device.init so it's unlikely that it will ever collide with
> > much later sysfs accesses.
> 
> Don't we need to get updates when it changes too?
I've thought that it's pretty static and doesn't change.

> 
> > But if ACPI will start accessing fw_cfg from
> > other methods it will race for sure.
> 
> Maybe we shouldn't poke at fw cfg in ACPI then.
Yep, maybe we shouldn't.

The last vmgenid series just maps UUID region
at fixed location (which is sufficient to build ACPI
tables at machine done time) so there isn't real need
to export address via fw_cfg.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-16  9:50           ` Igor Mammedov
@ 2015-07-16 10:25             ` Michael S. Tsirkin
  2015-07-16 11:15               ` Igor Mammedov
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2015-07-16 10:25 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: matt.fleming, rjones, qemu-devel, jordan.l.justen,
	Gabriel L. Somlo, mdroth, gleb, Gabriel L. Somlo, kraxel,
	pbonzini, Laszlo Ersek

On Thu, Jul 16, 2015 at 11:50:38AM +0200, Igor Mammedov wrote:
> On Thu, 16 Jul 2015 10:30:41 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote:
> > > On Wed, 15 Jul 2015 19:24:33 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> > > > > On Wed, 15 Jul 2015 17:08:27 +0300
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > 
> > > > > > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > A while ago I was pondering on the options available for retrieving
> > > > > > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > > > > > files on the host-side command line, see commit 81b2b8106).
> > > > > > > 
> > > > > > > So over the last couple of weekends I cooked up the sysfs kernel
> > > > > > > module below, which lists all fw_cfg files
> > > > > > > under /sys/firmware/fw_cfg/<filename>.
> > > > > > 
> > > > > > One concern here is that there will be a conflict here if fw cfg
> > > > > > is used by ACPI. I don't know whether that last is a good idea
> > > > > > though, so maybe not a real concern. I think Igor
> > > > > > wanted to make it so.
> > > > > 
> > > > > I don't see any conflict here so far, it's just guest side module that
> > > > > accesses fw_cfg.
> > > > 
> > > > If there's ACPI code that accesses fw_cfg in response to an interrupt,
> > > > it'll race with fw cfg access by guest OS. On linux we might be able to
> > > > block ACPI preventing it from running. We probably won't be able to do
> > > > it on windows.
> > > wrt vmgenid series we were talking about possibility to access fw_cfg
> > > from ACPI device.init so it's unlikely that it will ever collide with
> > > much later sysfs accesses.
> > 
> > Don't we need to get updates when it changes too?
> I've thought that it's pretty static and doesn't change.

vm gen id? No, its dynamic.

> > 
> > > But if ACPI will start accessing fw_cfg from
> > > other methods it will race for sure.
> > 
> > Maybe we shouldn't poke at fw cfg in ACPI then.
> Yep, maybe we shouldn't.
> 
> The last vmgenid series just maps UUID region
> at fixed location (which is sufficient to build ACPI
> tables at machine done time) so there isn't real need
> to export address via fw_cfg.
> 

Other versions just write into guest memory from QEMU,
that will work fine too.

-- 
MST

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-16 10:25             ` Michael S. Tsirkin
@ 2015-07-16 11:15               ` Igor Mammedov
  0 siblings, 0 replies; 31+ messages in thread
From: Igor Mammedov @ 2015-07-16 11:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: matt.fleming, rjones, qemu-devel, jordan.l.justen,
	Gabriel L. Somlo, mdroth, gleb, Gabriel L. Somlo, kraxel,
	pbonzini, Laszlo Ersek

On Thu, 16 Jul 2015 13:25:56 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jul 16, 2015 at 11:50:38AM +0200, Igor Mammedov wrote:
> > On Thu, 16 Jul 2015 10:30:41 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote:
> > > > On Wed, 15 Jul 2015 19:24:33 +0300
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > 
> > > > > On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote:
> > > > > > On Wed, 15 Jul 2015 17:08:27 +0300
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > 
> > > > > > > On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > A while ago I was pondering on the options available for retrieving
> > > > > > > > a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> > > > > > > > files on the host-side command line, see commit 81b2b8106).
> > > > > > > > 
> > > > > > > > So over the last couple of weekends I cooked up the sysfs kernel
> > > > > > > > module below, which lists all fw_cfg files
> > > > > > > > under /sys/firmware/fw_cfg/<filename>.
> > > > > > > 
> > > > > > > One concern here is that there will be a conflict here if fw cfg
> > > > > > > is used by ACPI. I don't know whether that last is a good idea
> > > > > > > though, so maybe not a real concern. I think Igor
> > > > > > > wanted to make it so.
> > > > > > 
> > > > > > I don't see any conflict here so far, it's just guest side module that
> > > > > > accesses fw_cfg.
> > > > > 
> > > > > If there's ACPI code that accesses fw_cfg in response to an interrupt,
> > > > > it'll race with fw cfg access by guest OS. On linux we might be able to
> > > > > block ACPI preventing it from running. We probably won't be able to do
> > > > > it on windows.
> > > > wrt vmgenid series we were talking about possibility to access fw_cfg
> > > > from ACPI device.init so it's unlikely that it will ever collide with
> > > > much later sysfs accesses.
> > > 
> > > Don't we need to get updates when it changes too?
> > I've thought that it's pretty static and doesn't change.
> 
> vm gen id? No, its dynamic.
nope, I've mean address of buffer, not UUID itself.
 
> > > 
> > > > But if ACPI will start accessing fw_cfg from
> > > > other methods it will race for sure.
> > > 
> > > Maybe we shouldn't poke at fw cfg in ACPI then.
> > Yep, maybe we shouldn't.
> > 
> > The last vmgenid series just maps UUID region
> > at fixed location (which is sufficient to build ACPI
> > tables at machine done time) so there isn't real need
> > to export address via fw_cfg.
> > 
> 
> Other versions just write into guest memory from QEMU,
> that will work fine too.
yep they do,
except of that they are more complex on ACPI side and
it's more difficult to debug. I like the last revision of
series for simplicity of design and behavior.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-16  0:43   ` Gabriel L. Somlo
@ 2015-07-16 19:27     ` Eric Blake
  2015-07-16 20:42       ` Gabriel L. Somlo
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Blake @ 2015-07-16 19:27 UTC (permalink / raw)
  To: Gabriel L. Somlo, Richard W.M. Jones
  Cc: matt.fleming, mdroth, jordan.l.justen, Gabriel L. Somlo,
	qemu-devel, gleb, kraxel, pbonzini, Laszlo Ersek

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]

On 07/15/2015 06:43 PM, Gabriel L. Somlo wrote:

> 
> OK, so I replaced my port i/o with mmio equivalents:
> 
> -#define FW_CFG_PORT_CTL  0x510
> +#define FW_CFG_PORT_CTL  (void *)0x09020008
> 
> -#define FW_CFG_PORT_DATA 0x511
> +#define FW_CFG_PORT_DATA (void *)0x09020000

Under-parenthesized; you'll want:

#define FW_CFG_PORT_DATA ((void *)0x09020000)

to be useful in all possible locations where an identifier can appear in
an expression.

> 
> -       outw(select, FW_CFG_PORT_CTL);
> +       writew(select, FW_CFG_PORT_CTL);
> 
> -       inb(FW_CFG_PORT_DATA);
> +       readb(FW_CFG_PORT_DATA);
> 
> -       insb(FW_CFG_PORT_DATA, buf, count);
> +       readsb(FW_CFG_PORT_DATA, buf, count);

But as it doesn't affect your usage here...

> 
> I'm probably missing something that'll turn out to be really obvious
> in retrospect... :)

I probably didn't spot the really obvious problem.

So much for my drive-by noise :)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-16 19:27     ` Eric Blake
@ 2015-07-16 20:42       ` Gabriel L. Somlo
  0 siblings, 0 replies; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-16 20:42 UTC (permalink / raw)
  To: Eric Blake
  Cc: matt.fleming, mdroth, qemu-devel, jordan.l.justen, gleb,
	Richard W.M. Jones, kraxel, pbonzini, Laszlo Ersek

On Thu, Jul 16, 2015 at 01:27:15PM -0600, Eric Blake wrote:
> On 07/15/2015 06:43 PM, Gabriel L. Somlo wrote:
> 
> > 
> > OK, so I replaced my port i/o with mmio equivalents:
> > 
> > -#define FW_CFG_PORT_CTL  0x510
> > +#define FW_CFG_PORT_CTL  (void *)0x09020008
> > 
> > -#define FW_CFG_PORT_DATA 0x511
> > +#define FW_CFG_PORT_DATA (void *)0x09020000
> 
> Under-parenthesized; you'll want:
> 
> #define FW_CFG_PORT_DATA ((void *)0x09020000)
> 
> to be useful in all possible locations where an identifier can appear in
> an expression.
> 
> > 
> > -       outw(select, FW_CFG_PORT_CTL);
> > +       writew(select, FW_CFG_PORT_CTL);
> > 
> > -       inb(FW_CFG_PORT_DATA);
> > +       readb(FW_CFG_PORT_DATA);
> > 
> > -       insb(FW_CFG_PORT_DATA, buf, count);
> > +       readsb(FW_CFG_PORT_DATA, buf, count);
> 
> But as it doesn't affect your usage here...
> 
> > 
> > I'm probably missing something that'll turn out to be really obvious
> > in retrospect... :)
> 
> I probably didn't spot the really obvious problem.

After some meditation (and digging around), I now think I may have missed
some of the pomp and circumstance surrounding mmio access, beyond the simple
writew/readsb calls I was using. Such as [request|check|release]_mem_region(),
ioremap(), and maybe even ioport_[map|unmap](), to hopefully make things
more uniform across the mmio vs. ioport architectures :)

Guess Section 9.4 of LDD3 is my new bestest friend :)
(http://www.makelinux.net/ldd3/chp-9-sect-4)

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-15 12:00 ` Matt Fleming
@ 2015-07-20 21:19   ` Gabriel L. Somlo
  2015-07-20 22:07     ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-20 21:19 UTC (permalink / raw)
  To: Matt Fleming
  Cc: mdroth, rjones, jordan.l.justen, qemu-devel, gleb, kraxel,
	pbonzini, Laszlo Ersek

New working version of fw_cfg sysfs module enclosed at the end of this
mail, featuring:

	- probing for the appropriate fw_cfg port/address for the
	  architecture we're on. It's either that, or preprocessor
	  #ifdef voodoo to try only the right access method matching
	  the architecture we're compiling for.

		- the module compiles and runs fine now on both
		  x86_64 and armv7hl+lpae, not tested on ppc or sun.

	- Reworked initialization to prevent leaks when bailing out
	  due to errors (thanks Matt for pointing out one of those !)

On Wed, Jul 15, 2015 at 01:00:45PM +0100, Matt Fleming wrote:
> On Mon, 2015-07-13 at 16:09 -0400, Gabriel L. Somlo wrote:
> >    Speaking of the kernel: My default plan is to subscribe to
> >    kernelnewbies@kernelnewbies.org and submit it there (this is after
> >    all baby's first kernel module :) but if there's a better route for
> >    pushing it upstream, please feel free to suggest it to me.
> 
> Congrats on your first kernel patch! Cc'ing kernelnewbies is OK but this
> patches warrants a wider audience. Cc Greg Kroah-Hartman
> (gregkh@linuxfoundation.org) since he polices things related to sysfs,
> the x86 maintainers (x86@kernel.org), and linux-kernel@vger.kernel.org.
> The closest thing we have to a Linux system firmware list is probably
> linux-efi@vger.kernel.org.
> 
> By all means, mention that it's your first kernel patch when submitting
> this.
> 
> Oh, you're also going to want to add documentation under
> Documentation/ABI/testing/sysfs-firmware-fw_cfg as part of your patch
> series.

Also thanks for the good advice. I'm out on PTO next week, so planning
to submit this to the kernel after I get back -- wouldn't want to not
be around when I start getting feedback from the kernel list... :)

On Wed, Jul 15, 2015 at 01:15:08PM +0200, Laszlo Ersek wrote:
> On 07/15/15 13:06, Matt Fleming wrote:
> > On Tue, 2015-07-14 at 13:00 -0400, Gabriel L. Somlo wrote:
> >>
> >> That being said, I did reimplement systemd's escape method in cca. 30
> >> lines of C, so that shouldn't be too onerous.
> > 
> > I really don't see a reason to use systemd's naming scheme instead of
> > the one already provided by the kernel.
> > 
> >> Besides, Laszlo said he liked a real folder hierarchy, and I do too,
> >> so I'm still pondering how much doing that would complicate the module
> >> init function. I'm somewhat worried about what the added complexity
> >> will mean in terms of upstream acceptance in the linux kernel :)
> > 
> > Yeah, going that route will complicate the patch and you're going to get
> > asked "Umm.. why?" by Greg Kroah-Hartman (definitely Cc Greg when
> > sending this to the kernel mailing lists btw).
> > 
> > But if you can provide sound technical arguments for the added
> > complexity I'm sure the kernel folks will be happy. Laszlo's argument
> > makes sense to me, i.e. wanting to use standard utilities to know
> > whether a blob is available.
> > 
> > Instead of rolling all this kobject-creation logic into your driver I'd
> > suggest writing a patch to lib/kobject.c., since the problem sounds like
> > something that should be solved with the kobject API.
> 
> Haha, this feature just got extended by six months. :)
> 
> > The question is, how simple can you make the code ;-)
> 
> The first question is how strong Gabriel's nerves are, but (from past
> experience) they *are* strong. :)

Sir, you are too kind... :)

Speaking of which, I think I hit the first snag, and it's of the
design/bikeshedding kind:

The code to build nested ksets (represending sub-sub-directories of
/sys/firmware/fw_cfg/...) and cleaning them up on exit doesn't promise
to be *too* horrible or bulky, but as I was getting ready to start
writing it, I realized that, in theory, nothing is to stop the fw_cfg
device from having files named e.g.

	"etc/foo"

and

	"etc/foo/bar"

That doesn't happen right now on the qemu side, but it could in
theory, and I have no idea how I'd deal with the file/directory
duality of "foo" in this situation, short of refusing to load the
module, or leaving out one fw_cfg file or the other. Unless there's
a way around this, I think it's safer to either stick with the default
's/\//!/g' scheme of the kobject library, or implement something like
systemd's string escaping scheme that's been suggested earlier in this
thread...

Or maybe leaving out the occasional poorly-named file is still better
than giving up the ability to run find on the fw_cfg sysfs folder ?

Anyway, here goes version 0.2...

Thanks much,
--Gabriel


/*
 * drivers/firmware/fw_cfg.c
 *
 * Expose entries from QEMU's firmware configuration (fw_cfg) device in
 * sysfs (read-only, under "/sys/firmware/fw_cfg/...").
 *
 * Copyright 2015 Gabriel L. Somlo <somlo@cmu.edu>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License v2 as published
 * by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA
 */

#include <linux/module.h>
#include <linux/capability.h>
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/ioport.h>
#include <linux/ctype.h>

/* selector values for "well-known" fw_cfg entries */
#define FW_CFG_SIGNATURE  0x00
#define FW_CFG_FILE_DIR   0x19

/* size in bytes of fw_cfg signature */
#define FW_CFG_SIG_SIZE 4

/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
#define FW_CFG_MAX_FILE_PATH 56

/* fw_cfg file directory entry type */
struct fw_cfg_file {
	uint32_t size;
	uint16_t select;
	uint16_t reserved;
	char name[FW_CFG_MAX_FILE_PATH];
};

/* fw_cfg device i/o access options type */
struct fw_cfg_access {
	phys_addr_t start;
	uint8_t size;
	uint8_t ctrl_offset;
	uint8_t data_offset;
	bool is_mmio;
	const char *name;
};

/* fw_cfg device i/o access available options for known architectures */
static struct fw_cfg_access fw_cfg_modes[] = {
	{ 0x510,       2, 0, 1, false, "fw_cfg on i386, sun4u" },
	{ 0x9020000,  10, 8, 0,  true, "fw_cfg on arm" },
	{ 0xd00000510, 3, 0, 2,  true, "fw_cfg on sun4m" },
	{ 0xf0000510,  3, 0, 2,  true, "fw_cfg on ppc/mac" },
	{ }
};

/* fw_cfg device i/o currently selected option set */
static struct fw_cfg_access *fw_cfg_mode;

/* fw_cfg device i/o register addresses */
static void __iomem *fw_cfg_dev_base;
static void __iomem *fw_cfg_dev_ctrl;
static void __iomem *fw_cfg_dev_data;

/* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
static DEFINE_MUTEX(fw_cfg_dev_lock);

/* type for fw_cfg "directory scan" visitor/callback function */
typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);

/* run a given callback on each fw_cfg directory entry */
static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
{
	int ret = 0;
	uint32_t count, i;
	struct fw_cfg_file f;

	mutex_lock(&fw_cfg_dev_lock);
	iowrite16(FW_CFG_FILE_DIR, fw_cfg_dev_ctrl);
	ioread8_rep(fw_cfg_dev_data, &count, sizeof(count));
	for (i = 0; i < be32_to_cpu(count); i++) {
		ioread8_rep(fw_cfg_dev_data, &f, sizeof(f));
		ret = callback(&f);
		if (ret)
			break;
	}
	mutex_unlock(&fw_cfg_dev_lock);
	return ret;
}

/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
static inline void fw_cfg_read_blob(uint16_t select,
				     void *buf, loff_t pos, size_t count)
{
	mutex_lock(&fw_cfg_dev_lock);
	iowrite16(select, fw_cfg_dev_ctrl);
	while (pos-- > 0)
		ioread8(fw_cfg_dev_data);
	ioread8_rep(fw_cfg_dev_data, buf, count);
	mutex_unlock(&fw_cfg_dev_lock);
}

/* clean up fw_cfg device i/o setup */
static void fw_cfg_io_cleanup(void)
{
	if (fw_cfg_mode->is_mmio) {
		iounmap(fw_cfg_dev_base);
		release_mem_region(fw_cfg_mode->start, fw_cfg_mode->size);
	} else {
		ioport_unmap(fw_cfg_dev_base);
		release_region(fw_cfg_mode->start, fw_cfg_mode->size);
	}
}

/* probe and map fw_cfg device */
static int __init fw_cfg_io_probe(void)
{
	char sig[FW_CFG_SIG_SIZE];

	for (fw_cfg_mode = &fw_cfg_modes[0];
	     fw_cfg_mode->start; fw_cfg_mode++) {

		phys_addr_t start = fw_cfg_mode->start;
		uint8_t size = fw_cfg_mode->size;

		/* reserve and map mmio or ioport region */
		if (fw_cfg_mode->is_mmio) {
			if (!request_mem_region(start, size, fw_cfg_mode->name))
				continue;
			fw_cfg_dev_base = ioremap(start, size);
			if (!fw_cfg_dev_base) {
				release_mem_region(start, size);
				continue;
			}
		} else {
			if (!request_region(start, size, fw_cfg_mode->name))
				continue;
			fw_cfg_dev_base = ioport_map(start, size);
			if (!fw_cfg_dev_base) {
				release_region(start, size);
				continue;
			}
		}

		/* set control and data register addresses */
		fw_cfg_dev_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
		fw_cfg_dev_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;

		/* verify fw_cfg device signature */
		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
			/* success, we're done */
			return 0;

		/* clean up before probing next access mode */
		fw_cfg_io_cleanup();
	}

	return -ENODEV;
}

/* fw_cfg_sysfs_entry type */
struct fw_cfg_sysfs_entry {
	struct kobject kobj;
	struct fw_cfg_file f;
	struct list_head list;
};

/* get fw_cfg_sysfs_entry from kobject member */
static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
{
	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
}

/* fw_cfg_sysfs_attribute type */
struct fw_cfg_sysfs_attribute {
	struct attribute attr;
	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
};

/* get fw_cfg_sysfs_attribute from attribute member */
static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
{
	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
}

/* global cache of fw_cfg_sysfs_entry objects */
static LIST_HEAD(fw_cfg_entry_cache);

/* kobjects removed lazily by kernel, mutual exclusion needed */
static DEFINE_SPINLOCK(fw_cfg_cache_lock);

static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
{
	spin_lock(&fw_cfg_cache_lock);
	list_add_tail(&entry->list, &fw_cfg_entry_cache);
	spin_unlock(&fw_cfg_cache_lock);
}

static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
{
	spin_lock(&fw_cfg_cache_lock);
	list_del(&entry->list);
	spin_unlock(&fw_cfg_cache_lock);
}

static void fw_cfg_sysfs_cache_cleanup(void)
{
	struct fw_cfg_sysfs_entry *entry, *next;

	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
		/* will end up invoking fw_cfg_sysfs_cache_delist()
	         * via each object's release() method (i.e. destructor) */
		kobject_put(&entry->kobj);
	}
}

/* default_attrs: per-entry attributes and show methods */

#define FW_CFG_SYSFS_ATTR(_attr) \
struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
	.attr = { .name = __stringify(_attr), .mode = 0400 }, \
	.show = fw_cfg_sysfs_show_##_attr, \
}

static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%d\n", e->f.size);
}

static ssize_t fw_cfg_sysfs_show_select(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%d\n", e->f.select);
}

static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%s\n", e->f.name);
}

static FW_CFG_SYSFS_ATTR(size);
static FW_CFG_SYSFS_ATTR(select);
static FW_CFG_SYSFS_ATTR(name);

static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
	&fw_cfg_sysfs_attr_size.attr,
	&fw_cfg_sysfs_attr_select.attr,
	&fw_cfg_sysfs_attr_name.attr,
	NULL,
};

/* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
				      char *buf)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
	struct fw_cfg_sysfs_attribute *attr = to_attr(a);

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	return attr->show(entry, buf);
}

static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
	.show = fw_cfg_sysfs_attr_show,
};

/* release: destructor, to be called via kobject_put() */
static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

	fw_cfg_sysfs_cache_delist(entry);
	kfree(entry);
}

/* kobj_type: ties together all properties required to register an entry */
static struct kobj_type fw_cfg_sysfs_entry_ktype = {
	.default_attrs = fw_cfg_sysfs_entry_attrs,
	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
	.release = fw_cfg_sysfs_release_entry,
};

/* raw-read method and attribute */
static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
				     struct bin_attribute *bin_attr,
				     char *buf, loff_t pos, size_t count)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	if (pos > entry->f.size)
		return -EINVAL;

	if (count > entry->f.size - pos)
		count = entry->f.size - pos;

	fw_cfg_read_blob(entry->f.select, buf, pos, count);
	return count;
}

static struct bin_attribute fw_cfg_sysfs_attr_raw = {
	.attr = { .name = "raw", .mode = 0400 },
	.read = fw_cfg_sysfs_read_raw,
};

/* object set to represent fw_cfg sysfs subfolder */
static struct kset *fw_cfg_kset;

static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
{
	int err;
	struct fw_cfg_sysfs_entry *entry;

	/* allocate new entry */
	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
	if (!entry)
		return -ENOMEM;

	/* set file entry information */
	entry->f.size = be32_to_cpu(f->size);
	entry->f.select = be16_to_cpu(f->select);
	strcpy(entry->f.name, f->name);

	//FIXME: we could walk the '/' separated tokens of the file
	//       name, creating kset subdirectories of fw_cfg_kset,
	//	 then use the last one as the parent for the kobj
	//	 being created below:
	/* register sysfs entry for this file */
	entry->kobj.kset = fw_cfg_kset;
	/* NOTE: any '/' char in filename automatically turned into '!' */
	err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
				   NULL, "%s", entry->f.name);
	if (err)
		goto err_register;

	/* add raw binary content access */
	err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
	if (err)
		goto err_add_raw;

	/* success, add entry to global cache */
	fw_cfg_sysfs_cache_enlist(entry);
	return 0;

err_add_raw:
	kobject_del(&entry->kobj);
err_register:
	kfree(entry);
	return err;
}

static int __init fw_cfg_sysfs_init(void)
{
	int err;

	err = fw_cfg_io_probe();
	if (err)
		return err;

	/* create fw_cfg folder in sysfs */
	fw_cfg_kset = kset_create_and_add("fw_cfg", NULL, firmware_kobj);
	if (!fw_cfg_kset) {
		err = -ENOMEM;
		goto err_sysfs;
	}

	/* process fw_cfg file directory entry, registering each file */
	err = fw_cfg_scan_dir(fw_cfg_register_file);
	if (err)
		goto err_scan;

	/* success */
	pr_debug("fw_cfg: loaded.\n");
	return 0;

err_scan:
	fw_cfg_sysfs_cache_cleanup();
	//FIXME: we could recursively (depth-first) walk the kset
	//	 hierarchy, unregistering from the leaf end toward
	//	 the root, once the actual leaves (cache entries)
	//	 are gone via fw_cfg_sysfs_cache_cleanup() above.
	kset_unregister(fw_cfg_kset);
err_sysfs:
	fw_cfg_io_cleanup();
	return err;
}

static void __exit fw_cfg_sysfs_exit(void)
{
	pr_debug("fw_cfg: unloading.\n");
	fw_cfg_sysfs_cache_cleanup();
	kset_unregister(fw_cfg_kset);	//FIXME: depth-first recursion again
	fw_cfg_io_cleanup();
}

module_init(fw_cfg_sysfs_init);
module_exit(fw_cfg_sysfs_exit);
MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-20 21:19   ` Gabriel L. Somlo
@ 2015-07-20 22:07     ` Laszlo Ersek
  2015-07-25 23:21       ` Gabriel L. Somlo
  0 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2015-07-20 22:07 UTC (permalink / raw)
  To: Gabriel L. Somlo, Matt Fleming
  Cc: gleb, rjones, jordan.l.justen, mdroth, qemu-devel, kraxel,
	pbonzini

On 07/20/15 23:19, Gabriel L. Somlo wrote:

> The code to build nested ksets (represending sub-sub-directories of
> /sys/firmware/fw_cfg/...) and cleaning them up on exit doesn't promise
> to be *too* horrible or bulky, but as I was getting ready to start
> writing it, I realized that, in theory, nothing is to stop the fw_cfg
> device from having files named e.g.
> 
> 	"etc/foo"
> 
> and
> 
> 	"etc/foo/bar"
> 
> That doesn't happen right now on the qemu side, but it could in
> theory, and I have no idea how I'd deal with the file/directory
> duality of "foo" in this situation, short of refusing to load the
> module, or leaving out one fw_cfg file or the other. Unless there's
> a way around this, I think it's safer to either stick with the default
> 's/\//!/g' scheme of the kobject library, or implement something like
> systemd's string escaping scheme that's been suggested earlier in this
> thread...
> 
> Or maybe leaving out the occasional poorly-named file is still better
> than giving up the ability to run find on the fw_cfg sysfs folder ?

I assume once you have "etc/foo" in place (ie. as part of the
filesystem), where "foo" is not a directory, then the attempt to create
"etc/foo/bar" will cause whatever kernel API is in use to return ENOTDIR.

Since you are checking return values anyway :), just barf a warning into
syslog, and either skip the fw_cfg file, or abort module loading
completely, as you see fit. This is a user error -- you should punish
the user for it, not yourself. :)

... I just love "find", sorry. :) Anyway, I don't envision myself as a
user of this feature any time soon, so please feel free to ignore me.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-20 22:07     ` Laszlo Ersek
@ 2015-07-25 23:21       ` Gabriel L. Somlo
  2015-07-26  9:37         ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Gabriel L. Somlo @ 2015-07-25 23:21 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Matt Fleming, mdroth, rjones, jordan.l.justen, qemu-devel, gleb,
	kraxel, pbonzini

On Tue, Jul 21, 2015 at 12:07:08AM +0200, Laszlo Ersek wrote:
> On 07/20/15 23:19, Gabriel L. Somlo wrote:
> 
> ... I just love "find", sorry. :) Anyway, I don't envision myself as a
> user of this feature any time soon, so please feel free to ignore me.

How about this: I create /sys/firmware/fw_cfg/by_select with *all*
entries, named after their selector key, in a flat folder.

Then, I also create /sys/firmware/fw_cfg/by_name/... where I try to
build the folder hierarchy given by whatever the file names end up
being, *on a best effort basis*, i.e. I may fail to create *some* of
them, if they have brain-damaged names :)

The kernel will issue big scary warnings if a symlink already exists
where I'm trying to add a new subdirectory of the same name, or the
other way around.

The whole thing is below, tested and working on x86 and arm (so one
example of IOPort and one example of MMIO).

I'm out all of next week, but will submit this to the kernel (and cc
Greg-K-H as advised by Matt) once I get back.

In the mean time, any final words of wisdom, advice, testing, etc.
much much appreciated !

Cheers,
--Gabriel


/*
 * drivers/firmware/fw_cfg.c
 *
 * Expose entries from QEMU's firmware configuration (fw_cfg) device in
 * sysfs (read-only, under "/sys/firmware/fw_cfg/...").
 *
 * Copyright 2015 Gabriel L. Somlo <somlo@cmu.edu>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License v2 as published
 * by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA
 */

#include <linux/module.h>
#include <linux/capability.h>
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/ioport.h>
#include <linux/ctype.h>

/* selector values for "well-known" fw_cfg entries */
#define FW_CFG_SIGNATURE  0x00
#define FW_CFG_FILE_DIR   0x19

/* size in bytes of fw_cfg signature */
#define FW_CFG_SIG_SIZE 4

/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
#define FW_CFG_MAX_FILE_PATH 56

/* fw_cfg file directory entry type */
struct fw_cfg_file {
	uint32_t size;
	uint16_t select;
	uint16_t reserved;
	char name[FW_CFG_MAX_FILE_PATH];
};

/* fw_cfg device i/o access options type */
struct fw_cfg_access {
	phys_addr_t start;
	uint8_t size;
	uint8_t ctrl_offset;
	uint8_t data_offset;
	bool is_mmio;
	const char *name;
};

/* fw_cfg device i/o access available options for known architectures */
static struct fw_cfg_access fw_cfg_modes[] = {
	{ 0x510,       2, 0, 1, false, "fw_cfg on i386, sun4u" },
	{ 0x9020000,  10, 8, 0,  true, "fw_cfg on arm" },
	{ 0xd00000510, 3, 0, 2,  true, "fw_cfg on sun4m" },
	{ 0xf0000510,  3, 0, 2,  true, "fw_cfg on ppc/mac" },
	{ }
};

/* fw_cfg device i/o currently selected option set */
static struct fw_cfg_access *fw_cfg_mode;

/* fw_cfg device i/o register addresses */
static void __iomem *fw_cfg_dev_base;
static void __iomem *fw_cfg_dev_ctrl;
static void __iomem *fw_cfg_dev_data;

/* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
static DEFINE_MUTEX(fw_cfg_dev_lock);

/* pick apropriate endianness for selector key */
static inline uint16_t fw_cfg_sel_endianness(uint16_t select)
{
	return fw_cfg_mode->is_mmio ? cpu_to_be16(select) : cpu_to_le16(select);
}

/* type for fw_cfg "directory scan" visitor/callback function */
typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);

/* run a given callback on each fw_cfg directory entry */
static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
{
	int ret = 0;
	uint32_t count, i;
	struct fw_cfg_file f;

	mutex_lock(&fw_cfg_dev_lock);
	iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_dev_ctrl);
	ioread8_rep(fw_cfg_dev_data, &count, sizeof(count));
	for (i = 0; i < be32_to_cpu(count); i++) {
		ioread8_rep(fw_cfg_dev_data, &f, sizeof(f));
		ret = callback(&f);
		if (ret)
			break;
	}
	mutex_unlock(&fw_cfg_dev_lock);
	return ret;
}

/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
static inline void fw_cfg_read_blob(uint16_t select,
				     void *buf, loff_t pos, size_t count)
{
	mutex_lock(&fw_cfg_dev_lock);
	iowrite16(fw_cfg_sel_endianness(select), fw_cfg_dev_ctrl);
	while (pos-- > 0)
		ioread8(fw_cfg_dev_data);
	ioread8_rep(fw_cfg_dev_data, buf, count);
	mutex_unlock(&fw_cfg_dev_lock);
}

/* clean up fw_cfg device i/o setup */
static void fw_cfg_io_cleanup(void)
{
	if (fw_cfg_mode->is_mmio) {
		iounmap(fw_cfg_dev_base);
		release_mem_region(fw_cfg_mode->start, fw_cfg_mode->size);
	} else {
		ioport_unmap(fw_cfg_dev_base);
		release_region(fw_cfg_mode->start, fw_cfg_mode->size);
	}
}

/* probe and map fw_cfg device */
static int __init fw_cfg_io_probe(void)
{
	char sig[FW_CFG_SIG_SIZE];

	for (fw_cfg_mode = &fw_cfg_modes[0];
	     fw_cfg_mode->start; fw_cfg_mode++) {

		phys_addr_t start = fw_cfg_mode->start;
		uint8_t size = fw_cfg_mode->size;

		/* reserve and map mmio or ioport region */
		if (fw_cfg_mode->is_mmio) {
			if (!request_mem_region(start, size, fw_cfg_mode->name))
				continue;
			fw_cfg_dev_base = ioremap(start, size);
			if (!fw_cfg_dev_base) {
				release_mem_region(start, size);
				continue;
			}
		} else {
			if (!request_region(start, size, fw_cfg_mode->name))
				continue;
			fw_cfg_dev_base = ioport_map(start, size);
			if (!fw_cfg_dev_base) {
				release_region(start, size);
				continue;
			}
		}

		/* set control and data register addresses */
		fw_cfg_dev_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
		fw_cfg_dev_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;

		/* verify fw_cfg device signature */
		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
			/* success, we're done */
			return 0;

		/* clean up before probing next access mode */
		fw_cfg_io_cleanup();
	}

	return -ENODEV;
}

/* fw_cfg_sysfs_entry type */
struct fw_cfg_sysfs_entry {
	struct kobject kobj;
	struct fw_cfg_file f;
	struct list_head list;
};

/* get fw_cfg_sysfs_entry from kobject member */
static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
{
	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
}

/* fw_cfg_sysfs_attribute type */
struct fw_cfg_sysfs_attribute {
	struct attribute attr;
	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
};

/* get fw_cfg_sysfs_attribute from attribute member */
static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
{
	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
}

/* global cache of fw_cfg_sysfs_entry objects */
static LIST_HEAD(fw_cfg_entry_cache);

/* kobjects removed lazily by kernel, mutual exclusion needed */
static DEFINE_SPINLOCK(fw_cfg_cache_lock);

static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
{
	spin_lock(&fw_cfg_cache_lock);
	list_add_tail(&entry->list, &fw_cfg_entry_cache);
	spin_unlock(&fw_cfg_cache_lock);
}

static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
{
	spin_lock(&fw_cfg_cache_lock);
	list_del(&entry->list);
	spin_unlock(&fw_cfg_cache_lock);
}

static void fw_cfg_sysfs_cache_cleanup(void)
{
	struct fw_cfg_sysfs_entry *entry, *next;

	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
		/* will end up invoking fw_cfg_sysfs_cache_delist()
	         * via each object's release() method (i.e. destructor) */
		kobject_put(&entry->kobj);
	}
}

/* default_attrs: per-entry attributes and show methods */

#define FW_CFG_SYSFS_ATTR(_attr) \
struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
	.attr = { .name = __stringify(_attr), .mode = 0400 }, \
	.show = fw_cfg_sysfs_show_##_attr, \
}

static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%d\n", e->f.size);
}

static ssize_t fw_cfg_sysfs_show_select(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%d\n", e->f.select);
}

static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
{
	return sprintf(buf, "%s\n", e->f.name);
}

static FW_CFG_SYSFS_ATTR(size);
static FW_CFG_SYSFS_ATTR(select);
static FW_CFG_SYSFS_ATTR(name);

static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
	&fw_cfg_sysfs_attr_size.attr,
	&fw_cfg_sysfs_attr_select.attr,
	&fw_cfg_sysfs_attr_name.attr,
	NULL,
};

/* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
				      char *buf)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
	struct fw_cfg_sysfs_attribute *attr = to_attr(a);

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	return attr->show(entry, buf);
}

static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
	.show = fw_cfg_sysfs_attr_show,
};

/* release: destructor, to be called via kobject_put() */
static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

	fw_cfg_sysfs_cache_delist(entry);
	kfree(entry);
}

/* kobj_type: ties together all properties required to register an entry */
static struct kobj_type fw_cfg_sysfs_entry_ktype = {
	.default_attrs = fw_cfg_sysfs_entry_attrs,
	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
	.release = fw_cfg_sysfs_release_entry,
};

/* raw-read method and attribute */
static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
				     struct bin_attribute *bin_attr,
				     char *buf, loff_t pos, size_t count)
{
	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	if (pos > entry->f.size)
		return -EINVAL;

	if (count > entry->f.size - pos)
		count = entry->f.size - pos;

	fw_cfg_read_blob(entry->f.select, buf, pos, count);
	return count;
}

static struct bin_attribute fw_cfg_sysfs_attr_raw = {
	.attr = { .name = "raw", .mode = 0400 },
	.read = fw_cfg_sysfs_read_raw,
};

/*
 * kset_find_obj() is not EXPORTed from lib/kobject.c, clone it here for now.
 * FIXME: patch adding EXPORT in lib/kobject.c tested working, but then
 * 	  building this module for older kernels wouldn't work anymore...
 */
struct kobject *my_kset_find_obj(struct kset *kset, const char *name)
{
        struct kobject *k;
        struct kobject *ret = NULL;

        spin_lock(&kset->list_lock);

        list_for_each_entry(k, &kset->list, entry) {
                if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
                        ret = kobject_get(k);
                        break;
                }
        }

        spin_unlock(&kset->list_lock);
        return ret;
}

/*
 * Create a kset subdirectory matching each '/' delimited dirname token
 * in 'name', starting with sysfs kset/folder 'dir'; At the end, create
 * a symlink directed at the given 'target'.
 * NOTE: We do this on a best-effort basis, since 'name' is not guaranteed
 * to be a well-behaved path name. Whenever a symlink vs. kset directory
 * name collision occurs, the kernel will issue big scary warnings while
 * refusing to add the offending link or directory. We follow up with our
 * own, slightly less scary error messages explaining the situation :)
 */
static void __init fw_cfg_build_symlink(struct kset *dir,
					struct kobject *target,
					const char *name)
{
	struct kset *subdir;
	struct kobject *ko;
	char *name_copy, *p, *tok;

	if (!dir || !target || !name || !*name) {
		pr_err("fw_cfg: invalid argument to fw_cfg_build_symlink\n");
		return;
	}

	/* clone a copy of name for parsing */
	name_copy = p = kstrdup(name, GFP_KERNEL);
	if (!name_copy) {
		pr_err("fw_cfg: kstrdup failed while "
		       "creating symlink %s\n", name);
		return;
	}

	/* create folders for each dirname token, then symlink for basename */
	while ((tok = strsep(&p, "/")) && *tok) {

		/* last (basename) token? If so, add symlink here */
		if (!p || !*p) {
			if (sysfs_create_link(&dir->kobj, target, tok)) {
				pr_err("fw_cfg: can't create symlink %s "
				       "of %s\n", tok, name);
			}
			break;
		}

		/* does the current dir contain an item named after tok ? */
		ko = my_kset_find_obj(dir, tok);
		if (ko) {
			/* drop reference added by kset_find_obj */
			kobject_put(ko);

			/* ko MUST be a kset - we're about to use it as one ! */
			if (ko->ktype != dir->kobj.ktype) {
				pr_err("fw_cfg: non-folder token %s already "
				       "exists while creating %s\n", tok, name);
				break;
			}

			/* descend into already existing subdirectory */
			dir = to_kset(ko);
		} else {
			/* create new subdirectory kset */
			subdir = kzalloc(sizeof(struct kset), GFP_KERNEL);
			if (!subdir) {
				pr_err("fw_cfg: alloc error creating subdir %s "
				       "of %s\n", tok, name);
				break;
			}
			subdir->kobj.kset = dir;
			subdir->kobj.ktype = dir->kobj.ktype;
			if (kobject_set_name(&subdir->kobj, "%s", tok)) {
				kfree(subdir);
				pr_err("fw_cfg: set_name failed on subdir %s "
				       "of %s\n", tok, name);
				break;
			}
			if (kset_register(subdir)) {
				kfree(subdir);
				pr_err("fw_cfg: register failed on subdir %s "
				       "of %s\n", tok, name);
				break;
			}

			/* descend into newly created subdirectory */
			dir = subdir;
		}
	}

	/* we're done with cloned copy of name */
	kfree(name_copy);
}

/* recursively unregister fw_cfg/by_name/ kset directory tree */
static void fw_cfg_kset_unregister_recursive(struct kset *kset)
{
	struct kobject *k, *next;

	list_for_each_entry_safe(k, next, &kset->list, entry)
		/* all set members are ksets too, but check just in case... */
		if (k->ktype == kset->kobj.ktype)
			fw_cfg_kset_unregister_recursive(to_kset(k));

	/* symlinks are cleanly and automatically removed with the directory */
	kset_unregister(kset);
}

/* kobjects & kset representing top-level, by_select, and by_name folders */
static struct kobject *fw_cfg_top_ko;
static struct kobject *fw_cfg_sel_ko;
static struct kset *fw_cfg_fname_kset;

/* callback function to register an individual fw_cfg file */
static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
{
	int err;
	struct fw_cfg_sysfs_entry *entry;

	/* allocate new entry */
	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
	if (!entry)
		return -ENOMEM;

	/* set file entry information */
	entry->f.size = be32_to_cpu(f->size);
	entry->f.select = be16_to_cpu(f->select);
	strcpy(entry->f.name, f->name);

	/* register entry under "/sys/firmware/fw_cfg/by_select/" */
	err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
				   fw_cfg_sel_ko, "%d", entry->f.select);
	if (err)
		goto err_register;

	/* add raw binary content access */
	err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
	if (err)
		goto err_add_raw;

	/* try adding "/sys/firmware/fw_cfg/by_name/" symlink (best-effort) */
	fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->f.name);

	/* success, add entry to global cache */
	fw_cfg_sysfs_cache_enlist(entry);
	return 0;

err_add_raw:
	kobject_del(&entry->kobj);
err_register:
	kfree(entry);
	return err;
}

/* unregister top-level or by_select folder */
static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
{
	kobject_del(kobj);
	kobject_put(kobj);
}

static int __init fw_cfg_sysfs_init(void)
{
	int err;

	/* probe for the fw_cfg "hardware" */
	err = fw_cfg_io_probe();
	if (err)
		return err;

	/* create sysfs toplevel, by_select, and by_name folders */
	err = -ENOMEM;
	fw_cfg_top_ko = kobject_create_and_add("fw_cfg", firmware_kobj);
	if (!fw_cfg_top_ko)
		goto err_sysfs_top;
	fw_cfg_sel_ko = kobject_create_and_add("by_select", fw_cfg_top_ko);
	if (!fw_cfg_sel_ko)
		goto err_sysfs_sel;
	fw_cfg_fname_kset = kset_create_and_add("by_name", NULL, fw_cfg_top_ko);
	if (!fw_cfg_fname_kset)
		goto err_sysfs_name;

	/* process fw_cfg file directory entry, registering each file */
	err = fw_cfg_scan_dir(fw_cfg_register_file);
	if (err)
		goto err_scan;

	/* success */
	pr_debug("fw_cfg: loaded.\n");
	return 0;

err_scan:
	fw_cfg_sysfs_cache_cleanup();
	fw_cfg_kset_unregister_recursive(fw_cfg_fname_kset);
err_sysfs_name:
	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
err_sysfs_sel:
	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
err_sysfs_top:
	fw_cfg_io_cleanup();
	return err;
}

static void __exit fw_cfg_sysfs_exit(void)
{
	pr_debug("fw_cfg: unloading.\n");
	fw_cfg_sysfs_cache_cleanup();
	fw_cfg_kset_unregister_recursive(fw_cfg_fname_kset);
	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
	fw_cfg_io_cleanup();
}

module_init(fw_cfg_sysfs_init);
module_exit(fw_cfg_sysfs_exit);
MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
  2015-07-25 23:21       ` Gabriel L. Somlo
@ 2015-07-26  9:37         ` Laszlo Ersek
  0 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2015-07-26  9:37 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Matt Fleming, mdroth, rjones, jordan.l.justen, qemu-devel, gleb,
	kraxel, pbonzini

On 07/26/15 01:21, Gabriel L. Somlo wrote:
> On Tue, Jul 21, 2015 at 12:07:08AM +0200, Laszlo Ersek wrote:
>> On 07/20/15 23:19, Gabriel L. Somlo wrote:
>>
>> ... I just love "find", sorry. :) Anyway, I don't envision myself as a
>> user of this feature any time soon, so please feel free to ignore me.
> 
> How about this: I create /sys/firmware/fw_cfg/by_select with *all*
> entries, named after their selector key, in a flat folder.

I think this is nice. The fw_cfg directory blob itself is available at a
fixed key (FW_CFG_FILE_DIR, 0x0019, and it is even documented :)), so
for a fool-proof, non-interactive lookup, a C language program can
search the 0x0019 blob, grab the relevant key value, and then open
/sys/firmware/fw_cfg/by_select/<THAT_KEY>.

For interactive use or scripting, "by_name" can be used with "find" etc.
I think this is really nice.

Thanks
Laszlo

> Then, I also create /sys/firmware/fw_cfg/by_name/... where I try to
> build the folder hierarchy given by whatever the file names end up
> being, *on a best effort basis*, i.e. I may fail to create *some* of
> them, if they have brain-damaged names :)
> 
> The kernel will issue big scary warnings if a symlink already exists
> where I'm trying to add a new subdirectory of the same name, or the
> other way around.
> 
> The whole thing is below, tested and working on x86 and arm (so one
> example of IOPort and one example of MMIO).
> 
> I'm out all of next week, but will submit this to the kernel (and cc
> Greg-K-H as advised by Matt) once I get back.
> 
> In the mean time, any final words of wisdom, advice, testing, etc.
> much much appreciated !
> 
> Cheers,
> --Gabriel
> 
> 
> /*
>  * drivers/firmware/fw_cfg.c
>  *
>  * Expose entries from QEMU's firmware configuration (fw_cfg) device in
>  * sysfs (read-only, under "/sys/firmware/fw_cfg/...").
>  *
>  * Copyright 2015 Gabriel L. Somlo <somlo@cmu.edu>
>  *
>  * This program is free software; you can redistribute it and/or modify
>  * it under the terms of the GNU General Public License v2 as published
>  * by the Free Software Foundation.
>  *
>  * This program is distributed in the hope that it will be useful, but
>  * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>  * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>  * for more details.
>  *
>  * You should have received a copy of the GNU General Public License along
>  * with this program; if not, write to the Free Software Foundation, Inc.,
>  * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA
>  */
> 
> #include <linux/module.h>
> #include <linux/capability.h>
> #include <linux/slab.h>
> #include <linux/io.h>
> #include <linux/ioport.h>
> #include <linux/ctype.h>
> 
> /* selector values for "well-known" fw_cfg entries */
> #define FW_CFG_SIGNATURE  0x00
> #define FW_CFG_FILE_DIR   0x19
> 
> /* size in bytes of fw_cfg signature */
> #define FW_CFG_SIG_SIZE 4
> 
> /* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> #define FW_CFG_MAX_FILE_PATH 56
> 
> /* fw_cfg file directory entry type */
> struct fw_cfg_file {
> 	uint32_t size;
> 	uint16_t select;
> 	uint16_t reserved;
> 	char name[FW_CFG_MAX_FILE_PATH];
> };
> 
> /* fw_cfg device i/o access options type */
> struct fw_cfg_access {
> 	phys_addr_t start;
> 	uint8_t size;
> 	uint8_t ctrl_offset;
> 	uint8_t data_offset;
> 	bool is_mmio;
> 	const char *name;
> };
> 
> /* fw_cfg device i/o access available options for known architectures */
> static struct fw_cfg_access fw_cfg_modes[] = {
> 	{ 0x510,       2, 0, 1, false, "fw_cfg on i386, sun4u" },
> 	{ 0x9020000,  10, 8, 0,  true, "fw_cfg on arm" },
> 	{ 0xd00000510, 3, 0, 2,  true, "fw_cfg on sun4m" },
> 	{ 0xf0000510,  3, 0, 2,  true, "fw_cfg on ppc/mac" },
> 	{ }
> };
> 
> /* fw_cfg device i/o currently selected option set */
> static struct fw_cfg_access *fw_cfg_mode;
> 
> /* fw_cfg device i/o register addresses */
> static void __iomem *fw_cfg_dev_base;
> static void __iomem *fw_cfg_dev_ctrl;
> static void __iomem *fw_cfg_dev_data;
> 
> /* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
> static DEFINE_MUTEX(fw_cfg_dev_lock);
> 
> /* pick apropriate endianness for selector key */
> static inline uint16_t fw_cfg_sel_endianness(uint16_t select)
> {
> 	return fw_cfg_mode->is_mmio ? cpu_to_be16(select) : cpu_to_le16(select);
> }
> 
> /* type for fw_cfg "directory scan" visitor/callback function */
> typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);
> 
> /* run a given callback on each fw_cfg directory entry */
> static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
> {
> 	int ret = 0;
> 	uint32_t count, i;
> 	struct fw_cfg_file f;
> 
> 	mutex_lock(&fw_cfg_dev_lock);
> 	iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_dev_ctrl);
> 	ioread8_rep(fw_cfg_dev_data, &count, sizeof(count));
> 	for (i = 0; i < be32_to_cpu(count); i++) {
> 		ioread8_rep(fw_cfg_dev_data, &f, sizeof(f));
> 		ret = callback(&f);
> 		if (ret)
> 			break;
> 	}
> 	mutex_unlock(&fw_cfg_dev_lock);
> 	return ret;
> }
> 
> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static inline void fw_cfg_read_blob(uint16_t select,
> 				     void *buf, loff_t pos, size_t count)
> {
> 	mutex_lock(&fw_cfg_dev_lock);
> 	iowrite16(fw_cfg_sel_endianness(select), fw_cfg_dev_ctrl);
> 	while (pos-- > 0)
> 		ioread8(fw_cfg_dev_data);
> 	ioread8_rep(fw_cfg_dev_data, buf, count);
> 	mutex_unlock(&fw_cfg_dev_lock);
> }
> 
> /* clean up fw_cfg device i/o setup */
> static void fw_cfg_io_cleanup(void)
> {
> 	if (fw_cfg_mode->is_mmio) {
> 		iounmap(fw_cfg_dev_base);
> 		release_mem_region(fw_cfg_mode->start, fw_cfg_mode->size);
> 	} else {
> 		ioport_unmap(fw_cfg_dev_base);
> 		release_region(fw_cfg_mode->start, fw_cfg_mode->size);
> 	}
> }
> 
> /* probe and map fw_cfg device */
> static int __init fw_cfg_io_probe(void)
> {
> 	char sig[FW_CFG_SIG_SIZE];
> 
> 	for (fw_cfg_mode = &fw_cfg_modes[0];
> 	     fw_cfg_mode->start; fw_cfg_mode++) {
> 
> 		phys_addr_t start = fw_cfg_mode->start;
> 		uint8_t size = fw_cfg_mode->size;
> 
> 		/* reserve and map mmio or ioport region */
> 		if (fw_cfg_mode->is_mmio) {
> 			if (!request_mem_region(start, size, fw_cfg_mode->name))
> 				continue;
> 			fw_cfg_dev_base = ioremap(start, size);
> 			if (!fw_cfg_dev_base) {
> 				release_mem_region(start, size);
> 				continue;
> 			}
> 		} else {
> 			if (!request_region(start, size, fw_cfg_mode->name))
> 				continue;
> 			fw_cfg_dev_base = ioport_map(start, size);
> 			if (!fw_cfg_dev_base) {
> 				release_region(start, size);
> 				continue;
> 			}
> 		}
> 
> 		/* set control and data register addresses */
> 		fw_cfg_dev_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
> 		fw_cfg_dev_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;
> 
> 		/* verify fw_cfg device signature */
> 		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> 		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
> 			/* success, we're done */
> 			return 0;
> 
> 		/* clean up before probing next access mode */
> 		fw_cfg_io_cleanup();
> 	}
> 
> 	return -ENODEV;
> }
> 
> /* fw_cfg_sysfs_entry type */
> struct fw_cfg_sysfs_entry {
> 	struct kobject kobj;
> 	struct fw_cfg_file f;
> 	struct list_head list;
> };
> 
> /* get fw_cfg_sysfs_entry from kobject member */
> static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
> {
> 	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
> }
> 
> /* fw_cfg_sysfs_attribute type */
> struct fw_cfg_sysfs_attribute {
> 	struct attribute attr;
> 	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
> };
> 
> /* get fw_cfg_sysfs_attribute from attribute member */
> static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
> {
> 	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
> }
> 
> /* global cache of fw_cfg_sysfs_entry objects */
> static LIST_HEAD(fw_cfg_entry_cache);
> 
> /* kobjects removed lazily by kernel, mutual exclusion needed */
> static DEFINE_SPINLOCK(fw_cfg_cache_lock);
> 
> static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
> {
> 	spin_lock(&fw_cfg_cache_lock);
> 	list_add_tail(&entry->list, &fw_cfg_entry_cache);
> 	spin_unlock(&fw_cfg_cache_lock);
> }
> 
> static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
> {
> 	spin_lock(&fw_cfg_cache_lock);
> 	list_del(&entry->list);
> 	spin_unlock(&fw_cfg_cache_lock);
> }
> 
> static void fw_cfg_sysfs_cache_cleanup(void)
> {
> 	struct fw_cfg_sysfs_entry *entry, *next;
> 
> 	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
> 		/* will end up invoking fw_cfg_sysfs_cache_delist()
> 	         * via each object's release() method (i.e. destructor) */
> 		kobject_put(&entry->kobj);
> 	}
> }
> 
> /* default_attrs: per-entry attributes and show methods */
> 
> #define FW_CFG_SYSFS_ATTR(_attr) \
> struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
> 	.attr = { .name = __stringify(_attr), .mode = 0400 }, \
> 	.show = fw_cfg_sysfs_show_##_attr, \
> }
> 
> static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
> {
> 	return sprintf(buf, "%d\n", e->f.size);
> }
> 
> static ssize_t fw_cfg_sysfs_show_select(struct fw_cfg_sysfs_entry *e, char *buf)
> {
> 	return sprintf(buf, "%d\n", e->f.select);
> }
> 
> static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
> {
> 	return sprintf(buf, "%s\n", e->f.name);
> }
> 
> static FW_CFG_SYSFS_ATTR(size);
> static FW_CFG_SYSFS_ATTR(select);
> static FW_CFG_SYSFS_ATTR(name);
> 
> static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
> 	&fw_cfg_sysfs_attr_size.attr,
> 	&fw_cfg_sysfs_attr_select.attr,
> 	&fw_cfg_sysfs_attr_name.attr,
> 	NULL,
> };
> 
> /* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
> static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
> 				      char *buf)
> {
> 	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> 	struct fw_cfg_sysfs_attribute *attr = to_attr(a);
> 
> 	if (!capable(CAP_SYS_ADMIN))
> 		return -EACCES;
> 
> 	return attr->show(entry, buf);
> }
> 
> static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
> 	.show = fw_cfg_sysfs_attr_show,
> };
> 
> /* release: destructor, to be called via kobject_put() */
> static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
> {
> 	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> 
> 	fw_cfg_sysfs_cache_delist(entry);
> 	kfree(entry);
> }
> 
> /* kobj_type: ties together all properties required to register an entry */
> static struct kobj_type fw_cfg_sysfs_entry_ktype = {
> 	.default_attrs = fw_cfg_sysfs_entry_attrs,
> 	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
> 	.release = fw_cfg_sysfs_release_entry,
> };
> 
> /* raw-read method and attribute */
> static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> 				     struct bin_attribute *bin_attr,
> 				     char *buf, loff_t pos, size_t count)
> {
> 	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> 
> 	if (!capable(CAP_SYS_ADMIN))
> 		return -EACCES;
> 
> 	if (pos > entry->f.size)
> 		return -EINVAL;
> 
> 	if (count > entry->f.size - pos)
> 		count = entry->f.size - pos;
> 
> 	fw_cfg_read_blob(entry->f.select, buf, pos, count);
> 	return count;
> }
> 
> static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> 	.attr = { .name = "raw", .mode = 0400 },
> 	.read = fw_cfg_sysfs_read_raw,
> };
> 
> /*
>  * kset_find_obj() is not EXPORTed from lib/kobject.c, clone it here for now.
>  * FIXME: patch adding EXPORT in lib/kobject.c tested working, but then
>  * 	  building this module for older kernels wouldn't work anymore...
>  */
> struct kobject *my_kset_find_obj(struct kset *kset, const char *name)
> {
>         struct kobject *k;
>         struct kobject *ret = NULL;
> 
>         spin_lock(&kset->list_lock);
> 
>         list_for_each_entry(k, &kset->list, entry) {
>                 if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
>                         ret = kobject_get(k);
>                         break;
>                 }
>         }
> 
>         spin_unlock(&kset->list_lock);
>         return ret;
> }
> 
> /*
>  * Create a kset subdirectory matching each '/' delimited dirname token
>  * in 'name', starting with sysfs kset/folder 'dir'; At the end, create
>  * a symlink directed at the given 'target'.
>  * NOTE: We do this on a best-effort basis, since 'name' is not guaranteed
>  * to be a well-behaved path name. Whenever a symlink vs. kset directory
>  * name collision occurs, the kernel will issue big scary warnings while
>  * refusing to add the offending link or directory. We follow up with our
>  * own, slightly less scary error messages explaining the situation :)
>  */
> static void __init fw_cfg_build_symlink(struct kset *dir,
> 					struct kobject *target,
> 					const char *name)
> {
> 	struct kset *subdir;
> 	struct kobject *ko;
> 	char *name_copy, *p, *tok;
> 
> 	if (!dir || !target || !name || !*name) {
> 		pr_err("fw_cfg: invalid argument to fw_cfg_build_symlink\n");
> 		return;
> 	}
> 
> 	/* clone a copy of name for parsing */
> 	name_copy = p = kstrdup(name, GFP_KERNEL);
> 	if (!name_copy) {
> 		pr_err("fw_cfg: kstrdup failed while "
> 		       "creating symlink %s\n", name);
> 		return;
> 	}
> 
> 	/* create folders for each dirname token, then symlink for basename */
> 	while ((tok = strsep(&p, "/")) && *tok) {
> 
> 		/* last (basename) token? If so, add symlink here */
> 		if (!p || !*p) {
> 			if (sysfs_create_link(&dir->kobj, target, tok)) {
> 				pr_err("fw_cfg: can't create symlink %s "
> 				       "of %s\n", tok, name);
> 			}
> 			break;
> 		}
> 
> 		/* does the current dir contain an item named after tok ? */
> 		ko = my_kset_find_obj(dir, tok);
> 		if (ko) {
> 			/* drop reference added by kset_find_obj */
> 			kobject_put(ko);
> 
> 			/* ko MUST be a kset - we're about to use it as one ! */
> 			if (ko->ktype != dir->kobj.ktype) {
> 				pr_err("fw_cfg: non-folder token %s already "
> 				       "exists while creating %s\n", tok, name);
> 				break;
> 			}
> 
> 			/* descend into already existing subdirectory */
> 			dir = to_kset(ko);
> 		} else {
> 			/* create new subdirectory kset */
> 			subdir = kzalloc(sizeof(struct kset), GFP_KERNEL);
> 			if (!subdir) {
> 				pr_err("fw_cfg: alloc error creating subdir %s "
> 				       "of %s\n", tok, name);
> 				break;
> 			}
> 			subdir->kobj.kset = dir;
> 			subdir->kobj.ktype = dir->kobj.ktype;
> 			if (kobject_set_name(&subdir->kobj, "%s", tok)) {
> 				kfree(subdir);
> 				pr_err("fw_cfg: set_name failed on subdir %s "
> 				       "of %s\n", tok, name);
> 				break;
> 			}
> 			if (kset_register(subdir)) {
> 				kfree(subdir);
> 				pr_err("fw_cfg: register failed on subdir %s "
> 				       "of %s\n", tok, name);
> 				break;
> 			}
> 
> 			/* descend into newly created subdirectory */
> 			dir = subdir;
> 		}
> 	}
> 
> 	/* we're done with cloned copy of name */
> 	kfree(name_copy);
> }
> 
> /* recursively unregister fw_cfg/by_name/ kset directory tree */
> static void fw_cfg_kset_unregister_recursive(struct kset *kset)
> {
> 	struct kobject *k, *next;
> 
> 	list_for_each_entry_safe(k, next, &kset->list, entry)
> 		/* all set members are ksets too, but check just in case... */
> 		if (k->ktype == kset->kobj.ktype)
> 			fw_cfg_kset_unregister_recursive(to_kset(k));
> 
> 	/* symlinks are cleanly and automatically removed with the directory */
> 	kset_unregister(kset);
> }
> 
> /* kobjects & kset representing top-level, by_select, and by_name folders */
> static struct kobject *fw_cfg_top_ko;
> static struct kobject *fw_cfg_sel_ko;
> static struct kset *fw_cfg_fname_kset;
> 
> /* callback function to register an individual fw_cfg file */
> static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
> {
> 	int err;
> 	struct fw_cfg_sysfs_entry *entry;
> 
> 	/* allocate new entry */
> 	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> 	if (!entry)
> 		return -ENOMEM;
> 
> 	/* set file entry information */
> 	entry->f.size = be32_to_cpu(f->size);
> 	entry->f.select = be16_to_cpu(f->select);
> 	strcpy(entry->f.name, f->name);
> 
> 	/* register entry under "/sys/firmware/fw_cfg/by_select/" */
> 	err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
> 				   fw_cfg_sel_ko, "%d", entry->f.select);
> 	if (err)
> 		goto err_register;
> 
> 	/* add raw binary content access */
> 	err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
> 	if (err)
> 		goto err_add_raw;
> 
> 	/* try adding "/sys/firmware/fw_cfg/by_name/" symlink (best-effort) */
> 	fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->f.name);
> 
> 	/* success, add entry to global cache */
> 	fw_cfg_sysfs_cache_enlist(entry);
> 	return 0;
> 
> err_add_raw:
> 	kobject_del(&entry->kobj);
> err_register:
> 	kfree(entry);
> 	return err;
> }
> 
> /* unregister top-level or by_select folder */
> static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
> {
> 	kobject_del(kobj);
> 	kobject_put(kobj);
> }
> 
> static int __init fw_cfg_sysfs_init(void)
> {
> 	int err;
> 
> 	/* probe for the fw_cfg "hardware" */
> 	err = fw_cfg_io_probe();
> 	if (err)
> 		return err;
> 
> 	/* create sysfs toplevel, by_select, and by_name folders */
> 	err = -ENOMEM;
> 	fw_cfg_top_ko = kobject_create_and_add("fw_cfg", firmware_kobj);
> 	if (!fw_cfg_top_ko)
> 		goto err_sysfs_top;
> 	fw_cfg_sel_ko = kobject_create_and_add("by_select", fw_cfg_top_ko);
> 	if (!fw_cfg_sel_ko)
> 		goto err_sysfs_sel;
> 	fw_cfg_fname_kset = kset_create_and_add("by_name", NULL, fw_cfg_top_ko);
> 	if (!fw_cfg_fname_kset)
> 		goto err_sysfs_name;
> 
> 	/* process fw_cfg file directory entry, registering each file */
> 	err = fw_cfg_scan_dir(fw_cfg_register_file);
> 	if (err)
> 		goto err_scan;
> 
> 	/* success */
> 	pr_debug("fw_cfg: loaded.\n");
> 	return 0;
> 
> err_scan:
> 	fw_cfg_sysfs_cache_cleanup();
> 	fw_cfg_kset_unregister_recursive(fw_cfg_fname_kset);
> err_sysfs_name:
> 	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
> err_sysfs_sel:
> 	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
> err_sysfs_top:
> 	fw_cfg_io_cleanup();
> 	return err;
> }
> 
> static void __exit fw_cfg_sysfs_exit(void)
> {
> 	pr_debug("fw_cfg: unloading.\n");
> 	fw_cfg_sysfs_cache_cleanup();
> 	fw_cfg_kset_unregister_recursive(fw_cfg_fname_kset);
> 	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
> 	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
> 	fw_cfg_io_cleanup();
> }
> 
> module_init(fw_cfg_sysfs_init);
> module_exit(fw_cfg_sysfs_exit);
> MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
> MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> MODULE_LICENSE("GPL");
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-07-26  9:37 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-13 20:09 [Qemu-devel] RFC: guest-side retrieval of fw_cfg file Gabriel L. Somlo
2015-07-13 23:03 ` Eric Blake
2015-07-14 17:00   ` Gabriel L. Somlo
2015-07-15 11:06     ` Matt Fleming
2015-07-15 11:15       ` Laszlo Ersek
2015-07-14  9:43 ` Richard W.M. Jones
2015-07-14 18:23   ` Gabriel L. Somlo
2015-07-14 18:31     ` Gabriel L. Somlo
2015-07-14 18:48     ` Richard W.M. Jones
2015-07-14 18:51       ` Richard W.M. Jones
2015-07-14 19:16       ` Peter Maydell
2015-07-14 19:24       ` Gabriel L. Somlo
2015-07-14 19:24     ` Gerd Hoffmann
2015-07-16  0:43   ` Gabriel L. Somlo
2015-07-16 19:27     ` Eric Blake
2015-07-16 20:42       ` Gabriel L. Somlo
2015-07-14 11:38 ` Laszlo Ersek
2015-07-15 12:00 ` Matt Fleming
2015-07-20 21:19   ` Gabriel L. Somlo
2015-07-20 22:07     ` Laszlo Ersek
2015-07-25 23:21       ` Gabriel L. Somlo
2015-07-26  9:37         ` Laszlo Ersek
2015-07-15 14:08 ` Michael S. Tsirkin
2015-07-15 16:01   ` Igor Mammedov
2015-07-15 16:24     ` Michael S. Tsirkin
2015-07-16  1:21       ` Gabriel L. Somlo
2015-07-16  6:57       ` Igor Mammedov
2015-07-16  7:30         ` Michael S. Tsirkin
2015-07-16  9:50           ` Igor Mammedov
2015-07-16 10:25             ` Michael S. Tsirkin
2015-07-16 11:15               ` Igor Mammedov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).