All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
From: Christophe Leroy @ 2020-05-29 18:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <5ce7982c72424eb3e7abf78063d454c38c42b343.1590778219.git.christophe.leroy@csgroup.eu>



Le 29/05/2020 à 20:50, Christophe Leroy a écrit :
> From: Christophe Leroy <christophe.leroy@c-s.fr>
> 
> 'thread' doesn't exist in kuap_check() macro.
> 
> Use 'current' instead.
> 
> Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Argh, can you drop this line ?

> Cc: stable@vger.kernel.org
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Reported-by: kbuild test robot <lkp@intel.com>

> ---
>   arch/powerpc/include/asm/book3s/32/kup.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
> index db0a1c281587..668508c8a1b5 100644
> --- a/arch/powerpc/include/asm/book3s/32/kup.h
> +++ b/arch/powerpc/include/asm/book3s/32/kup.h
> @@ -75,7 +75,7 @@
>   
>   .macro kuap_check	current, gpr
>   #ifdef CONFIG_PPC_KUAP_DEBUG
> -	lwz	\gpr, KUAP(thread)
> +	lwz	\gpr, THREAD + KUAP(\current)
>   999:	twnei	\gpr, 0
>   	EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
>   #endif
> 

^ permalink raw reply

* [git pull] IOMMU Fixes for Linux v5.7-rc7
From: Joerg Roedel @ 2020-05-29 18:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: iommu, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1197 bytes --]

Hi Linus,

The following changes since commit 9cb1fd0efd195590b828b9b865421ad345a4a145:

  Linux 5.7-rc7 (2020-05-24 15:32:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.7-rc7

for you to fetch changes up to 7cc31613734c4870ae32f5265d576ef296621343:

  iommu: Fix reference count leak in iommu_group_alloc. (2020-05-29 15:27:50 +0200)

----------------------------------------------------------------
IOMMU Fixes for Linux v5.7-rc7

Including:

	- Two compile test fixes for issues introduced during the
	  5.7-rc1 merge window.

	- A fix for a reference count leak in an error path of
	  iommu_group_alloc().

----------------------------------------------------------------
Krzysztof Kozlowski (2):
      ia64: Hide the archdata.iommu field behind generic IOMMU_API
      x86: Hide the archdata.iommu field behind generic IOMMU_API

Qiushi Wu (1):
      iommu: Fix reference count leak in iommu_group_alloc.

 arch/ia64/include/asm/device.h | 2 +-
 arch/x86/include/asm/device.h  | 2 +-
 drivers/iommu/iommu.c          | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Please pull.

Thanks,

	Joerg

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply

* Re: [PATCH v1 2/2] Sample mtty: Add migration capability to mtty module
From: Alex Williamson @ 2020-05-29 18:58 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: cjia, kevin.tian, ziye.yang, changpeng.liu, yi.l.liu, mlevitsk,
	eskultet, cohuck, dgilbert, jonathan.davies, eauger, aik, pasic,
	felipe, Zhengxiao.zx, shuangtai.tst, Ken.Xue, zhi.a.wang,
	yan.y.zhao, qemu-devel, kvm
In-Reply-To: <1588614860-16330-3-git-send-email-kwankhede@nvidia.com>

On Mon, 4 May 2020 23:24:20 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> This patch makes mtty device migration capable. Purpose od this code is
> to test migration interface. Only stop-and-copy phase is implemented.
> Postcopy migration is not supported.
> 
> Actual data for mtty device migration is very less. Appended dummy data to
> migration data stream, default 100 Mbytes. Added sysfs file
> 'dummy_data_size_MB' to get dummy data size from user which can be used
> to check performance of based of data size. During resuming dummy data is
> read and discarded.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> ---
>  samples/vfio-mdev/mtty.c | 602 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 574 insertions(+), 28 deletions(-)
> 
> diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
> index bf666cce5bb7..f9194234fc6a 100644
> --- a/samples/vfio-mdev/mtty.c
> +++ b/samples/vfio-mdev/mtty.c
> @@ -44,9 +44,23 @@
>  
>  #define MTTY_STRING_LEN		16
>  
> -#define MTTY_CONFIG_SPACE_SIZE  0xff
> -#define MTTY_IO_BAR_SIZE        0x8
> -#define MTTY_MMIO_BAR_SIZE      0x100000
> +#define MTTY_CONFIG_SPACE_SIZE		0xff
> +#define MTTY_IO_BAR_SIZE		0x8
> +#define MTTY_MMIO_BAR_SIZE		0x100000
> +#define MTTY_MIGRATION_REGION_SIZE	0x1000000	// 16M
> +
> +#define MTTY_MIGRATION_REGION_INDEX	VFIO_PCI_NUM_REGIONS
> +#define MTTY_REGIONS_MAX		(MTTY_MIGRATION_REGION_INDEX + 1)
> +
> +/* Data section start from page aligned offset */
> +#define MTTY_MIGRATION_REGION_DATA_OFFSET	(0x1000)

Probably want to work in terms of PAGE_SIZE.

> +
> +/* First page is used for struct vfio_device_migration_info */
> +#define MTTY_MIGRATION_REGION_SIZE_MMAP     \
> +	(MTTY_MIGRATION_REGION_SIZE - MTTY_MIGRATION_REGION_DATA_OFFSET)
> +
> +#define MIGRATION_INFO_OFFSET(MEMBER)	\
> +		offsetof(struct vfio_device_migration_info, MEMBER)
>  
>  #define STORE_LE16(addr, val)   (*(u16 *)addr = val)
>  #define STORE_LE32(addr, val)   (*(u32 *)addr = val)
> @@ -129,6 +143,28 @@ struct serial_port {
>  	u8 intr_trigger_level;  /* interrupt trigger level */
>  };
>  
> +/* Migration packet */
> +#define PACKET_ID		(u16)(0xfeedbaba)
> +
> +#define PACKET_FLAGS_ACTUAL_DATA	(1 << 0)
> +#define PACKET_FLAGS_DUMMY_DATA		(1 << 1)
> +
> +#define PACKET_DATA_SIZE_MAX		(8 * 1024 * 1024)
> +
> +struct packet {
> +	u16 id;
> +	u16 flags;
> +	u32 data_size;
> +	u8 data[];
> +};
> +
> +enum {
> +	PACKET_STATE_NONE = 0,
> +	PACKET_STATE_PREPARED,
> +	PACKET_STATE_COPIED,
> +	PACKET_STATE_LAST,
> +};
> +
>  /* State of each mdev device */
>  struct mdev_state {
>  	int irq_fd;
> @@ -138,22 +174,37 @@ struct mdev_state {
>  	u8 *vconfig;
>  	struct mutex ops_lock;
>  	struct mdev_device *mdev;
> -	struct mdev_region_info region_info[VFIO_PCI_NUM_REGIONS];
> -	u32 bar_mask[VFIO_PCI_NUM_REGIONS];
> +	struct mdev_region_info region_info[MTTY_REGIONS_MAX];
> +	u32 bar_mask[MTTY_REGIONS_MAX];

A new region doesn't imply a new BAR, this should have been simply
bar_mask[2] from the start since this device implements 2 bars.

>  	struct list_head next;
>  	struct serial_port s[2];
>  	struct mutex rxtx_lock;
>  	struct vfio_device_info dev_info;
> -	int nr_ports;
> +	u32 nr_ports;
>  
>  	/* List of pinned gpfns, gpfn as index and content is translated hpfn */
>  	unsigned long *gpfn_to_hpfn;
>  	struct notifier_block nb;
> +
> +	u32 device_state;
> +	u64 saved_size;
> +	void *mig_region_base;
> +	bool is_actual_data_sent;
> +	struct packet *pkt;
> +	u32 packet_state;
> +	u64 dummy_data_size;

Please consider alignment and holes even for a sample driver.

>  };
>  
>  static struct mutex mdev_list_lock;
>  static struct list_head mdev_devices_list;
>  
> +/*
> + * Default dummy data size set to 100 MB. To change value of dummy data size at
> + * runtime but before migration write size in MB to sysfs file
> + * dummy_data_size_MB
> + */
> +static unsigned long user_dummy_data_size = (100 * 1024 * 1024);
> +
>  static const struct file_operations vd_fops = {
>  	.owner          = THIS_MODULE,
>  };
> @@ -639,6 +690,288 @@ static void mdev_read_base(struct mdev_state *mdev_state)
>  	}
>  }
>  
> +static int save_setup(struct mdev_state *mdev_state)
> +{
> +	mdev_state->is_actual_data_sent = false;
> +
> +	memset(mdev_state->pkt, 0, sizeof(struct packet) +
> +				   PACKET_DATA_SIZE_MAX);

I would have expected pkt to be allocated here as well, it looks like
there's an expectation that the user will have mmap'd the migration
region prior to this, but there's no obligation on the part of the user
to make use of the mmap at all.

> +
> +	return 0;
> +}
> +
> +static int set_device_state(struct mdev_state *mdev_state, u32 device_state)
> +{
> +	int ret = 0;
> +
> +	if (mdev_state->device_state == device_state)
> +		return 0;
> +
> +	if (device_state & VFIO_DEVICE_STATE_RUNNING) {
> +#if defined(DEBUG)
> +		if (device_state & VFIO_DEVICE_STATE_SAVING) {
> +			pr_info("%s: %s Pre-copy\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +		} else
> +			pr_info("%s: %s Running\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +#endif
> +	} else {
> +		if (device_state & VFIO_DEVICE_STATE_SAVING) {
> +#if defined(DEBUG)
> +			pr_info("%s: %s Stop-n-copy\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +#endif
> +			ret = save_setup(mdev_state);
> +
> +		} else if (device_state & VFIO_DEVICE_STATE_RESUMING) {
> +#if defined(DEBUG)
> +			pr_info("%s: %s Resuming\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +		} else {
> +			pr_info("%s: %s Stopped\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +#endif
> +		}
> +	}
> +
> +	mdev_state->device_state = device_state;
> +
> +	return ret;
> +}
> +
> +static u32 get_device_state(struct mdev_state *mdev_state)
> +{
> +	return mdev_state->device_state;
> +}
> +
> +static void write_to_packet(struct packet *pkt, u8 *data, size_t size)
> +{
> +	if ((pkt->data_size + size) > PACKET_DATA_SIZE_MAX) {
> +		pr_err("%s: packet data overflow\n", __func__);
> +		return;
> +	}
> +	memcpy((void *)&pkt->data[pkt->data_size], (void *)data, size);
> +	pkt->data_size += size;
> +}
> +
> +static void read_from_packet(struct packet *pkt, u8 *data,
> +			     int index, size_t size)
> +{
> +	if ((index + size) > PACKET_DATA_SIZE_MAX) {
> +		pr_err("%s: packet data overflow\n", __func__);

nit, underflow?  Asking for more data than is available.

> +		return;
> +	}
> +
> +	memcpy((void *)data, (void *)&pkt->data[index], size);
> +}
> +
> +static int save_device_data(struct mdev_state *mdev_state, u64 *pending)
> +{
> +	/* Save device data only during stop-and-copy phase */
> +	if (mdev_state->device_state != VFIO_DEVICE_STATE_SAVING) {
> +		*pending = 0;
> +		return 0;
> +	}
> +
> +	if (mdev_state->packet_state == PACKET_STATE_PREPARED) {
> +		*pending = sizeof(struct packet) + mdev_state->pkt->data_size;
> +		return 0;
> +	}
> +
> +	if (!mdev_state->is_actual_data_sent) {
> +
> +		/* create actual data packet */

I'm afraid this is where we really need a sample driver to set a good
precedent, which I think should include some sort of identification and
version field so that the receiving side can identify this as data
created by and intended for this device. 

> +		write_to_packet(mdev_state->pkt, (u8 *)&mdev_state->nr_ports,
> +				sizeof(mdev_state->nr_ports));
> +		write_to_packet(mdev_state->pkt, (u8 *)&mdev_state->s,
> +				sizeof(struct serial_port) * 2);
> +
> +		write_to_packet(mdev_state->pkt, mdev_state->vconfig,
> +				MTTY_CONFIG_SPACE_SIZE);
> +
> +		write_to_packet(mdev_state->pkt, (u8 *)mdev_state->gpfn_to_hpfn,
> +				sizeof(unsigned long) * MAX_GPFN_COUNT);
> +
> +		mdev_state->pkt->id = PACKET_ID;
> +		mdev_state->pkt->flags = PACKET_FLAGS_ACTUAL_DATA;
> +
> +		mdev_state->is_actual_data_sent = true;
> +	} else {
> +		/* create dummy data packet */
> +		if (mdev_state->dummy_data_size > user_dummy_data_size) {
> +			*pending = 0;
> +			mdev_state->packet_state = PACKET_STATE_NONE;
> +			return 0;
> +		}
> +
> +		memset(mdev_state->pkt->data, 0xa5, PACKET_DATA_SIZE_MAX);
> +
> +		mdev_state->pkt->id = PACKET_ID;
> +		mdev_state->pkt->flags = PACKET_FLAGS_DUMMY_DATA;
> +		mdev_state->pkt->data_size = PACKET_DATA_SIZE_MAX;
> +		mdev_state->dummy_data_size += PACKET_DATA_SIZE_MAX;
> +	}
> +
> +	*pending = sizeof(struct packet) + mdev_state->pkt->data_size;

This feeds back through to pending_bytes:

 * pending_bytes: (read only)
 *      The number of pending bytes still to be migrated from the vendor driver.

But what we're reporting here is size of the data area that we're
currently preparing.  This needs to report the dummy data size plus the
real data size and decrement as we go, not the packet size.

> +	mdev_state->packet_state = PACKET_STATE_PREPARED;
> +	mdev_state->saved_size = 0;
> +
> +	return 0;
> +}
> +
> +static int copy_device_data(struct mdev_state *mdev_state)
> +{
> +	u64 size;
> +
> +	if (!mdev_state->pkt || !mdev_state->mig_region_base)

mig_region_base is dependent on the user mmap'ing the migration region,
which they're not required to do.

> +		return -EINVAL;
> +
> +	if (mdev_state->packet_state == PACKET_STATE_COPIED)
> +		return 0;
> +
> +	if (!mdev_state->pkt->data_size)
> +		return 0;
> +
> +	size = sizeof(struct packet) + mdev_state->pkt->data_size;
> +
> +	memcpy(mdev_state->mig_region_base, mdev_state->pkt, size);

I'm not sure why the user's mmap isn't simply mapping pkt.

> +
> +	mdev_state->saved_size = size;
> +	mdev_state->packet_state = PACKET_STATE_COPIED;
> +	memset(mdev_state->pkt, 0, sizeof(struct packet));
> +	return 0;
> +}
> +
> +static int resume_device_data(struct mdev_state *mdev_state, u64 data_size)
> +{
> +	unsigned long i;
> +
> +	if (mdev_state->device_state != VFIO_DEVICE_STATE_RESUMING)
> +		return -EINVAL;
> +
> +	if (!mdev_state->pkt || !mdev_state->mig_region_base)

Again depends on the user having done something they're not required to
do.

> +		return -EINVAL;
> +
> +	memcpy(mdev_state->pkt, mdev_state->mig_region_base, data_size);
> +
> +	if (mdev_state->pkt->flags & PACKET_FLAGS_ACTUAL_DATA) {
> +		int index = 0;
> +		/* restore device data */
> +		read_from_packet(mdev_state->pkt, (u8 *)&mdev_state->nr_ports,
> +				 index, sizeof(mdev_state->nr_ports));

Zero integrity checking!

> +		index += sizeof(mdev_state->nr_ports);
> +
> +		read_from_packet(mdev_state->pkt, (u8 *)&mdev_state->s,
> +				index, sizeof(struct serial_port) * 2);
> +		index += sizeof(struct serial_port) * 2;
> +
> +		read_from_packet(mdev_state->pkt, mdev_state->vconfig,
> +				 index, MTTY_CONFIG_SPACE_SIZE);
> +		index += MTTY_CONFIG_SPACE_SIZE;
> +
> +		read_from_packet(mdev_state->pkt,
> +				(u8 *)mdev_state->gpfn_to_hpfn,
> +				index, sizeof(unsigned long) * MAX_GPFN_COUNT);
> +		index += sizeof(unsigned long) * MAX_GPFN_COUNT;
> +
> +		for (i = 0; i < MAX_GPFN_COUNT; i++) {
> +			if (mdev_state->gpfn_to_hpfn[i] != PFN_NULL) {
> +				int ret;
> +				unsigned long hpfn;
> +
> +				ret = vfio_pin_pages(mdev_dev(mdev_state->mdev),
> +				       &i, 1, IOMMU_READ | IOMMU_WRITE, &hpfn);
> +				if (ret <= 0) {
> +					pr_err("%s: 0x%lx unpin error %d\n",
> +							__func__, i, ret);
> +					continue;
> +				}
> +				mdev_state->gpfn_to_hpfn[i] = hpfn;
> +			}
> +		}

Where in this migration data did this vendor driver allow that some day
we might support more than 2 ports, we might create a PCIe device with
extended config space, we might enable a bigger table of pinned pages
or use a different data format?  This is a pretty poor example to
follow.

> +	} else {
> +#if defined(DEBUG)
> +		pr_info("%s: %s discard data 0x%llx\n",
> +			 __func__, dev_name(mdev_dev(mdev_state->mdev)),
> +			data_size);
> +#endif
> +	}
> +
> +	return 0;
> +}
> +
> +static int handle_mig_read(unsigned int index, struct mdev_state *mdev_state,
> +			   loff_t offset, u8 *buf, u32 count)
> +{
> +	int ret = 0;
> +	u64 pending = 0;
> +
> +	switch (offset) {
> +	case MIGRATION_INFO_OFFSET(device_state):	// 0x00
> +		*(u32 *)buf = get_device_state(mdev_state);
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(pending_bytes):	// 0x08
> +		ret = save_device_data(mdev_state, &pending);
> +		if (ret)
> +			break;
> +		*(u64 *)buf = pending;
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(data_offset):	// 0x10
> +		if (mdev_state->device_state & VFIO_DEVICE_STATE_SAVING) {
> +			ret = copy_device_data(mdev_state);
> +			if (ret)
> +				break;
> +		}
> +		*(u64 *)buf = MTTY_MIGRATION_REGION_DATA_OFFSET;
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(data_size):		// 0x18
> +		*(u64 *)buf = mdev_state->saved_size;
> +		break;
> +
> +	default:
> +		ret = -EINVAL;

These read/write functions MUST support read and write from the
migration data area, mmaps are optional for the user.  From our header:

  "The user is not required to access through mmap regardless of the
  capabilities of the region mmap."

Thanks,
Alex

> +	}
> +
> +#if defined(DEBUG)
> +	pr_info("%s: %s MIG  RD @0x%llx bytes: %d data: 0x%x\n",
> +			__func__, dev_name(mdev_dev(mdev_state->mdev)),
> +			offset, count, *(u32 *)buf);
> +#endif
> +	return ret;
> +}
> +
> +static int handle_mig_write(unsigned int index, struct mdev_state *mdev_state,
> +				loff_t offset, u8 *buf, u32 count)
> +{
> +	int ret = 0;
> +
> +#if defined(DEBUG)
> +	pr_info("%s: %s MIG  WR @0x%llx bytes: %d data: 0x%x\n",
> +			__func__, dev_name(mdev_dev(mdev_state->mdev)),
> +			offset, count, *(u32 *)buf);
> +#endif
> +	switch (offset) {
> +	case MIGRATION_INFO_OFFSET(device_state):	// 0x00
> +		ret = set_device_state(mdev_state, *(u32 *)buf);
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(data_size):		// 0x18
> +		ret = resume_device_data(mdev_state, *(u64 *)buf);
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(pending_bytes):	// 0x08
> +	case MIGRATION_INFO_OFFSET(data_offset):	// 0x10
> +	default:
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
>  static ssize_t mdev_access(struct mdev_device *mdev, u8 *buf, size_t count,
>  			   loff_t pos, bool is_write)
>  {
> @@ -702,6 +1035,18 @@ static ssize_t mdev_access(struct mdev_device *mdev, u8 *buf, size_t count,
>  		}
>  		break;
>  
> +	case MTTY_MIGRATION_REGION_INDEX:
> +		if (is_write) {
> +			ret = handle_mig_write(index, mdev_state, offset, buf,
> +					      count);
> +		} else {
> +			ret = handle_mig_read(index, mdev_state, offset, buf,
> +					      count);
> +		}
> +		if (ret)
> +			goto accessfailed;
> +		break;
> +
>  	default:
>  		ret = -1;
>  		goto accessfailed;
> @@ -709,7 +1054,6 @@ static ssize_t mdev_access(struct mdev_device *mdev, u8 *buf, size_t count,
>  
>  	ret = count;
>  
> -
>  accessfailed:
>  	mutex_unlock(&mdev_state->ops_lock);
>  
> @@ -819,13 +1163,29 @@ static int mtty_reset(struct mdev_device *mdev)
>  static ssize_t mtty_read(struct mdev_device *mdev, char __user *buf,
>  			 size_t count, loff_t *ppos)
>  {
> -	unsigned int done = 0;
> +	unsigned int done = 0, index;
>  	int ret;
>  
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +
>  	while (count) {
>  		size_t filled;
>  
> -		if (count >= 4 && !(*ppos % 4)) {
> +		if ((index == MTTY_MIGRATION_REGION_INDEX) &&
> +		    (count >= 8 && !(*ppos % 8))) {
> +			u64 val;
> +
> +			ret =  mdev_access(mdev, (u8 *)&val, sizeof(val),
> +					   *ppos, false);
> +			if (ret <= 0)
> +				goto read_err;
> +
> +			if (copy_to_user(buf, &val, sizeof(val)))
> +				goto read_err;
> +
> +			filled = 8;
> +
> +		} else if (count >= 4 && !(*ppos % 4)) {
>  			u32 val;
>  
>  			ret =  mdev_access(mdev, (u8 *)&val, sizeof(val),
> @@ -878,13 +1238,27 @@ static ssize_t mtty_read(struct mdev_device *mdev, char __user *buf,
>  static ssize_t mtty_write(struct mdev_device *mdev, const char __user *buf,
>  		   size_t count, loff_t *ppos)
>  {
> -	unsigned int done = 0;
> +	unsigned int done = 0, index;
>  	int ret;
>  
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>  	while (count) {
>  		size_t filled;
>  
> -		if (count >= 4 && !(*ppos % 4)) {
> +		if ((index == MTTY_MIGRATION_REGION_INDEX) &&
> +		    (count >= 8 && !(*ppos % 8))) {
> +			u64 val;
> +
> +			if (copy_from_user(&val, buf, sizeof(val)))
> +				goto write_err;
> +
> +			ret = mdev_access(mdev, (u8 *)&val, sizeof(val),
> +					  *ppos, true);
> +			if (ret <= 0)
> +				goto write_err;
> +
> +			filled = 8;
> +		} else if (count >= 4 && !(*ppos % 4)) {
>  			u32 val;
>  
>  			if (copy_from_user(&val, buf, sizeof(val)))
> @@ -1061,12 +1435,13 @@ static int mtty_trigger_interrupt(struct mdev_state *mdev_state)
>  }
>  
>  static int mtty_get_region_info(struct mdev_device *mdev,
> -			 struct vfio_region_info *region_info,
> -			 u16 *cap_type_id, void **cap_type)
> +				struct vfio_region_info *region_info,
> +				struct vfio_info_cap *caps)
>  {
>  	unsigned int size = 0;
>  	struct mdev_state *mdev_state;
> -	u32 bar_index;
> +	u32 index;
> +	int ret = 0;
>  
>  	if (!mdev)
>  		return -EINVAL;
> @@ -1075,13 +1450,13 @@ static int mtty_get_region_info(struct mdev_device *mdev,
>  	if (!mdev_state)
>  		return -EINVAL;
>  
> -	bar_index = region_info->index;
> -	if (bar_index >= VFIO_PCI_NUM_REGIONS)
> +	index = region_info->index;
> +	if (index >= MTTY_REGIONS_MAX)
>  		return -EINVAL;
>  
>  	mutex_lock(&mdev_state->ops_lock);
>  
> -	switch (bar_index) {
> +	switch (index) {
>  	case VFIO_PCI_CONFIG_REGION_INDEX:
>  		size = MTTY_CONFIG_SPACE_SIZE;
>  		break;
> @@ -1092,21 +1467,63 @@ static int mtty_get_region_info(struct mdev_device *mdev,
>  		if (mdev_state->nr_ports == 2)
>  			size = MTTY_IO_BAR_SIZE;
>  		break;
> +	case MTTY_MIGRATION_REGION_INDEX:
> +		size = MTTY_MIGRATION_REGION_SIZE;
> +		break;
>  	default:
>  		size = 0;
>  		break;
>  	}
>  
> -	mdev_state->region_info[bar_index].size = size;
> -	mdev_state->region_info[bar_index].vfio_offset =
> -		MTTY_VFIO_PCI_INDEX_TO_OFFSET(bar_index);
> +	mdev_state->region_info[index].size = size;
> +	mdev_state->region_info[index].vfio_offset =
> +					MTTY_VFIO_PCI_INDEX_TO_OFFSET(index);
>  
>  	region_info->size = size;
> -	region_info->offset = MTTY_VFIO_PCI_INDEX_TO_OFFSET(bar_index);
> +	region_info->offset = MTTY_VFIO_PCI_INDEX_TO_OFFSET(index);
>  	region_info->flags = VFIO_REGION_INFO_FLAG_READ |
> -		VFIO_REGION_INFO_FLAG_WRITE;
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +
> +	if (index == MTTY_MIGRATION_REGION_INDEX) {
> +		struct vfio_region_info_cap_sparse {
> +			struct vfio_region_info_cap_sparse_mmap sparse;
> +			struct vfio_region_sparse_mmap_area area;
> +		};
> +
> +		struct vfio_region_info_cap_sparse mig_region;
> +
> +		struct vfio_region_info_cap_type cap_type = {
> +			.header.id = VFIO_REGION_INFO_CAP_TYPE,
> +			.header.version = 1,
> +			.type = VFIO_REGION_TYPE_MIGRATION,
> +			.subtype = VFIO_REGION_SUBTYPE_MIGRATION
> +		};
> +
> +		/* Add REGION CAP type */
> +		ret = vfio_info_add_capability(caps, &cap_type.header,
> +						sizeof(cap_type));
> +		if (ret)
> +			goto exit;
> +
> +		/* Add sparse mmap cap type */
> +		mig_region.sparse.nr_areas = 1;
> +		mig_region.sparse.header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
> +		mig_region.sparse.header.version = 1;
> +
> +		mig_region.area.offset = MTTY_MIGRATION_REGION_DATA_OFFSET;
> +		mig_region.area.size = MTTY_MIGRATION_REGION_SIZE_MMAP;
> +
> +		region_info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
> +
> +		if (region_info->argsz > sizeof(*region_info))
> +			region_info->flags |= VFIO_REGION_INFO_FLAG_MMAP;
> +
> +		ret = vfio_info_add_capability(caps, &mig_region.sparse.header,
> +						sizeof(mig_region));
> +	}
> +exit:
>  	mutex_unlock(&mdev_state->ops_lock);
> -	return 0;
> +	return ret;
>  }
>  
>  static int mtty_get_irq_info(struct mdev_device *mdev,
> @@ -1138,7 +1555,7 @@ static int mtty_get_device_info(struct mdev_device *mdev,
>  			 struct vfio_device_info *dev_info)
>  {
>  	dev_info->flags = VFIO_DEVICE_FLAGS_PCI;
> -	dev_info->num_regions = VFIO_PCI_NUM_REGIONS;
> +	dev_info->num_regions = MTTY_REGIONS_MAX;
>  	dev_info->num_irqs = VFIO_PCI_NUM_IRQS;
>  
>  	return 0;
> @@ -1150,6 +1567,7 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  	int ret = 0;
>  	unsigned long minsz;
>  	struct mdev_state *mdev_state;
> +	struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>  
>  	if (!mdev)
>  		return -EINVAL;
> @@ -1185,8 +1603,6 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  	case VFIO_DEVICE_GET_REGION_INFO:
>  	{
>  		struct vfio_region_info info;
> -		u16 cap_type_id = 0;
> -		void *cap_type = NULL;
>  
>  		minsz = offsetofend(struct vfio_region_info, offset);
>  
> @@ -1196,11 +1612,29 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  		if (info.argsz < minsz)
>  			return -EINVAL;
>  
> -		ret = mtty_get_region_info(mdev, &info, &cap_type_id,
> -					   &cap_type);
> +		ret = mtty_get_region_info(mdev, &info, &caps);
>  		if (ret)
>  			return ret;
>  
> +		if (caps.size) {
> +			info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
> +			if (info.argsz < sizeof(info) + caps.size) {
> +				info.argsz = sizeof(info) + caps.size;
> +				info.cap_offset = 0;
> +			} else {
> +				vfio_info_cap_shift(&caps, sizeof(info));
> +				if (copy_to_user((void __user *)arg +
> +							sizeof(info), caps.buf,
> +							caps.size)) {
> +					kfree(caps.buf);
> +					ret = -EFAULT;
> +					break;
> +				}
> +				info.cap_offset = sizeof(info);
> +			}
> +			kfree(caps.buf);
> +		}
> +
>  		if (copy_to_user((void __user *)arg, &info, minsz))
>  			return -EFAULT;
>  
> @@ -1266,6 +1700,89 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  	return -ENOTTY;
>  }
>  
> +void mmap_close(struct vm_area_struct *vma)
> +{
> +	struct mdev_device *mdev = vma->vm_private_data;
> +	struct mdev_state *mdev_state;
> +	uint32_t index = 0;
> +
> +	if (!mdev)
> +		return;
> +
> +	mdev_state = mdev_get_drvdata(mdev);
> +	if (!mdev_state)
> +		return;
> +
> +	mutex_lock(&mdev_state->ops_lock);
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
> +	if (index == MTTY_MIGRATION_REGION_INDEX) {
> +		if (mdev_state->mig_region_base != NULL) {
> +			vfree(mdev_state->mig_region_base);
> +			mdev_state->mig_region_base = NULL;
> +		}
> +
> +		if (mdev_state->pkt != NULL) {
> +			vfree(mdev_state->pkt);
> +			mdev_state->pkt = NULL;
> +		}
> +	}
> +	mutex_unlock(&mdev_state->ops_lock);
> +}
> +
> +static const struct vm_operations_struct mdev_vm_ops = {
> +	.close = mmap_close,
> +};
> +
> +static int mtty_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
> +{
> +	struct mdev_state *mdev_state;
> +	unsigned int index;
> +	int ret = 0;
> +
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	mdev_state = mdev_get_drvdata(mdev);
> +	if (!mdev_state)
> +		return -ENODEV;
> +
> +	mutex_lock(&mdev_state->ops_lock);
> +
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
> +	if (index == MTTY_MIGRATION_REGION_INDEX) {
> +		mdev_state->mig_region_base =
> +				 vmalloc_user(MTTY_MIGRATION_REGION_SIZE_MMAP);
> +		if (mdev_state->mig_region_base == NULL) {
> +			ret = -ENOMEM;
> +			goto mmap_exit;
> +		}
> +
> +		mdev_state->pkt = vzalloc(sizeof(struct packet) +
> +					  PACKET_DATA_SIZE_MAX);
> +		if (mdev_state->pkt == NULL) {
> +			vfree(mdev_state->mig_region_base);
> +			mdev_state->mig_region_base = NULL;
> +			ret = -ENOMEM;
> +			goto mmap_exit;
> +		}
> +
> +		vma->vm_ops = &mdev_vm_ops;
> +
> +		ret = remap_vmalloc_range(vma, mdev_state->mig_region_base, 0);
> +		if (ret != 0) {
> +			pr_err("remap_vmalloc_range failed, ret= %d\n", ret);
> +			vfree(mdev_state->mig_region_base);
> +			mdev_state->mig_region_base = NULL;
> +			vfree(mdev_state->pkt);
> +			mdev_state->pkt = NULL;
> +			goto mmap_exit;
> +		}
> +	}
> +mmap_exit:
> +	mutex_unlock(&mdev_state->ops_lock);
> +	return ret;
> +}
> +
>  static void unpin_pages_all(struct mdev_state *mdev_state)
>  {
>  	struct mdev_device *mdev = mdev_state->mdev;
> @@ -1339,6 +1856,8 @@ static int mtty_open(struct mdev_device *mdev)
>  
>  	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, &events,
>  				     &mdev_state->nb);
> +	mdev_state->dummy_data_size = 0;
> +	mdev_state->mig_region_base = NULL;
>  	return ret;
>  }
>  
> @@ -1355,6 +1874,15 @@ static void mtty_close(struct mdev_device *mdev)
>  	unpin_pages_all(mdev_state);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>  				 &mdev_state->nb);
> +	if (mdev_state->pkt != NULL) {
> +		vfree(mdev_state->pkt);
> +		mdev_state->pkt = NULL;
> +	}
> +
> +	if (mdev_state->mig_region_base != NULL) {
> +		vfree(mdev_state->mig_region_base);
> +		mdev_state->mig_region_base = NULL;
> +	}
>  }
>  
>  static ssize_t
> @@ -1466,9 +1994,26 @@ pin_pages_store(struct device *dev, struct device_attribute *attr,
>  
>  static DEVICE_ATTR_RW(pin_pages);
>  
> +static ssize_t
> +dummy_data_size_MB_store(struct device *dev, struct device_attribute *attr,
> +			 const char *buf, size_t count)
> +{
> +	int ret;
> +
> +	ret = kstrtoul(buf, 0, &user_dummy_data_size);
> +	if (ret)
> +		return ret;
> +
> +	user_dummy_data_size = user_dummy_data_size << 20;
> +	return count;
> +}
> +
> +static DEVICE_ATTR_WO(dummy_data_size_MB);
> +
>  static struct attribute *mdev_dev_attrs[] = {
>  	&dev_attr_sample_mdev_dev.attr,
>  	&dev_attr_pin_pages.attr,
> +	&dev_attr_dummy_data_size_MB.attr,
>  	NULL,
>  };
>  
> @@ -1573,6 +2118,7 @@ static const struct mdev_parent_ops mdev_fops = {
>  	.read                   = mtty_read,
>  	.write                  = mtty_write,
>  	.ioctl		        = mtty_ioctl,
> +	.mmap			= mtty_mmap,
>  };
>  
>  static void mtty_device_release(struct device *dev)


^ permalink raw reply

* Re: [PATCH rfc v3] New ioctl BTRFS_IOC_GET_CHUNK_INFO.
From: Hans van Kranenburg @ 2020-05-29 18:59 UTC (permalink / raw)
  To: kreijack, dsterba, linux-btrfs, Josef Bacik
In-Reply-To: <f5f75774-0a35-bd35-7d80-71d1f1a285a1@knorrie.org>

On 5/29/20 8:54 PM, Hans van Kranenburg wrote:
> On 5/29/20 6:23 PM, Goffredo Baroncelli wrote:
> 
>> Another case use is to visulize how
>> the chunk are filled, or how the disks are used..
> 
> An example of that is btrfs-heatmap.

Oops, forgot the example. Here's it showing data and metadata getting
sorted out while doing some balance after setting things up for the
preferred metadata/data placement. It's two 10G disks, and it's the
'physical' view, so the left half is the "slow" part and the right half
is the "fast" part of the filesystem, fun. :)

https://syrinx.knorrie.org/~knorrie/btrfs/keep/2020-05-29-ssd_metadata.mp4

Hans

^ permalink raw reply

* [SPDK] Storage Performance Development Kit (SPDK), Persistent Memory Development Kit (PMDK) and Intel® Vtune Amplifier Virtual Forum, June 23rd-25th
From: Kariuki, John K @ 2020-05-29 18:59 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 429 bytes --]

Please, join us for a 3-day virtual experience with technical presentations, engaging demos, and interactive sessions with industry leaders, experts, developers, and user communities.

REGISTER HERE<http://www.cvent.com/events/spdk-pmdk-and-intel-vtune-profiler-virtual-forum/registration-8c934ff0ac7f4aa48bfd223f9ecded79.aspx>

We hope to see you there!

---------------------------------------------------
John Kariuki

^ permalink raw reply

* Re: [PATCH -tip v3 05/11] kcsan: Remove 'noinline' from __no_kcsan_or_inline
From: Peter Zijlstra @ 2020-05-29 18:59 UTC (permalink / raw)
  To: Marco Elver
  Cc: Paul E. McKenney, Dmitry Vyukov, Alexander Potapenko,
	Andrey Konovalov, kasan-dev, LKML, Thomas Gleixner, Ingo Molnar,
	Will Deacon, clang-built-linux, Borislav Petkov
In-Reply-To: <CANpmjNPaL=HRvaJOC37_Cf4S4kskZezmgRiDSGn460rO2dM4+g@mail.gmail.com>

On Fri, May 29, 2020 at 08:36:56PM +0200, Marco Elver wrote:

> > +/* Section for code which can't be instrumented at all */
> > +#define noinstr                                                                \
> > +       noinline notrace __attribute((__section__(".noinstr.text"))) __no_kcsan
> > +
> 
> Will this eventually need __no_sanitize_address?

Yes, and __no_sanitize_undefined and whatever else there is.

https://lkml.kernel.org/r/20200529171104.GD706518@hirez.programming.kicks-ass.net


> Acked -- if you send a patch, do split the test-related change, so
> that Paul can apply it to the test which is currently only in -rcu.

Ok, I'll try not forget over the weekend ;-)

^ permalink raw reply

* Re: [PATCH V4 2/4] RDMA/core: Introduce shared CQ pool API
From: Jason Gunthorpe @ 2020-05-29 18:59 UTC (permalink / raw)
  To: Yamin Friedman
  Cc: Sagi Grimberg, Christoph Hellwig, Leon Romanovsky, linux-rdma
In-Reply-To: <1590568495-101621-3-git-send-email-yaminf@mellanox.com>

On Wed, May 27, 2020 at 11:34:53AM +0300, Yamin Friedman wrote:
> +struct ib_cq *ib_cq_pool_get(struct ib_device *dev, unsigned int nr_cqe,
> +			     int comp_vector_hint,
> +			     enum ib_poll_context poll_ctx)
> +{
> +	static unsigned int default_comp_vector;
> +	unsigned int vector, num_comp_vectors;
> +	struct ib_cq *cq, *found = NULL;
> +	int ret;
> +
> +	if (poll_ctx > IB_POLL_LAST_POOL_TYPE) {
> +		WARN_ON_ONCE(poll_ctx > IB_POLL_LAST_POOL_TYPE);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	num_comp_vectors = min_t(int, dev->num_comp_vectors,
> +				 num_online_cpus());
> +	/* Project the affinty to the device completion vector range */
> +	if (comp_vector_hint < 0)
> +		vector = default_comp_vector++ % num_comp_vectors;

This should not be touching this shared data without some concurrency
management, I changed it to this:

	if (comp_vector_hint < 0) {
		comp_vector_hint =
			(READ_ONCE(default_comp_vector) + 1) % num_comp_vectors;
		WRITE_ONCE(default_comp_vector, comp_vector_hint);
	}
	vector = comp_vector_hint % num_comp_vectors;

Jason

^ permalink raw reply

* [PATCH 05/10] core: extend struct driver_info to point to device
From: Simon Glass @ 2020-05-29 19:00 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <d07e8a0b-2591-4134-afc8-55897cbd29eb@collabora.com>

Hi Walter,

On Fri, 29 May 2020 at 12:56, Walter Lozano <walter.lozano@collabora.com> wrote:
>
>
> On 29/5/20 15:15, Walter Lozano wrote:
> > Currently when creating an U_BOOT_DEVICE entry a struct driver_info
> > is declared, which contains the data needed to instantiate the device.
> > However, the actual device is created at runtime and there is no proper
> > way to get the device based on its struct driver_info.
> >
> > This patch extends struct driver_info adding a pointer to udevice which
> > is populated during the bind process, allowing to generate a set of
> > functions to get the device based on its struct driver_info.
> >
> > Signed-off-by: Walter Lozano <walter.lozano@collabora.com>
> > ---
> >   drivers/core/device.c | 26 +++++++++++++++++++++++---
> >   drivers/core/root.c   |  4 ++++
> >   include/dm/device.h   | 14 ++++++++++++++
> >   include/dm/platdata.h | 14 ++++++++++++++
> >   4 files changed, 55 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/core/device.c b/drivers/core/device.c
> > index a0ad080aaf..5adbc30849 100644
> > --- a/drivers/core/device.c
> > +++ b/drivers/core/device.c
> > @@ -250,6 +250,7 @@ int device_bind_by_name(struct udevice *parent, bool pre_reloc_only,
> >   {
> >       struct driver *drv;
> >       uint platdata_size = 0;
> > +     int ret = 0;
> >
> >       drv = lists_driver_lookup_name(info->name);
> >       if (!drv)
> > @@ -260,9 +261,16 @@ int device_bind_by_name(struct udevice *parent, bool pre_reloc_only,
> >   #if CONFIG_IS_ENABLED(OF_PLATDATA)
> >       platdata_size = info->platdata_size;
> >   #endif
> > -     return device_bind_common(parent, drv, info->name,
> > -                     (void *)info->platdata, 0, ofnode_null(), platdata_size,
> > -                     devp);
> > +     ret = device_bind_common(parent, drv, info->name,
> > +                              (void *)info->platdata, 0, ofnode_null(),
> > +                              platdata_size, devp);
> > +     if (ret)
> > +             return ret;
> > +#if CONFIG_IS_ENABLED(OF_PLATDATA)
> > +     info->dev = *devp;
> > +#endif
>
> I have tried to test this using sandbox_spl_defconfig but I've received
> a segmentation fault when trying to update info->dev, however this code
> works on iMX6.
>
> Could it be some kind of protection? Any thoughts?

Yes, see u-boot-dm/dtoc-working - arch/sandbox/cpu/u-boot-spl.lds has
an attempt to move some of the list stuff into the data region.

[..]

Regards,
Simon

^ permalink raw reply

* Re: [PATCH v1 2/2] Sample mtty: Add migration capability to mtty module
From: Alex Williamson @ 2020-05-29 18:58 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Zhengxiao.zx, kevin.tian, yi.l.liu, cjia, kvm, eskultet,
	ziye.yang, qemu-devel, cohuck, shuangtai.tst, dgilbert,
	zhi.a.wang, mlevitsk, pasic, aik, eauger, felipe, jonathan.davies,
	yan.y.zhao, changpeng.liu, Ken.Xue
In-Reply-To: <1588614860-16330-3-git-send-email-kwankhede@nvidia.com>

On Mon, 4 May 2020 23:24:20 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> This patch makes mtty device migration capable. Purpose od this code is
> to test migration interface. Only stop-and-copy phase is implemented.
> Postcopy migration is not supported.
> 
> Actual data for mtty device migration is very less. Appended dummy data to
> migration data stream, default 100 Mbytes. Added sysfs file
> 'dummy_data_size_MB' to get dummy data size from user which can be used
> to check performance of based of data size. During resuming dummy data is
> read and discarded.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> ---
>  samples/vfio-mdev/mtty.c | 602 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 574 insertions(+), 28 deletions(-)
> 
> diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
> index bf666cce5bb7..f9194234fc6a 100644
> --- a/samples/vfio-mdev/mtty.c
> +++ b/samples/vfio-mdev/mtty.c
> @@ -44,9 +44,23 @@
>  
>  #define MTTY_STRING_LEN		16
>  
> -#define MTTY_CONFIG_SPACE_SIZE  0xff
> -#define MTTY_IO_BAR_SIZE        0x8
> -#define MTTY_MMIO_BAR_SIZE      0x100000
> +#define MTTY_CONFIG_SPACE_SIZE		0xff
> +#define MTTY_IO_BAR_SIZE		0x8
> +#define MTTY_MMIO_BAR_SIZE		0x100000
> +#define MTTY_MIGRATION_REGION_SIZE	0x1000000	// 16M
> +
> +#define MTTY_MIGRATION_REGION_INDEX	VFIO_PCI_NUM_REGIONS
> +#define MTTY_REGIONS_MAX		(MTTY_MIGRATION_REGION_INDEX + 1)
> +
> +/* Data section start from page aligned offset */
> +#define MTTY_MIGRATION_REGION_DATA_OFFSET	(0x1000)

Probably want to work in terms of PAGE_SIZE.

> +
> +/* First page is used for struct vfio_device_migration_info */
> +#define MTTY_MIGRATION_REGION_SIZE_MMAP     \
> +	(MTTY_MIGRATION_REGION_SIZE - MTTY_MIGRATION_REGION_DATA_OFFSET)
> +
> +#define MIGRATION_INFO_OFFSET(MEMBER)	\
> +		offsetof(struct vfio_device_migration_info, MEMBER)
>  
>  #define STORE_LE16(addr, val)   (*(u16 *)addr = val)
>  #define STORE_LE32(addr, val)   (*(u32 *)addr = val)
> @@ -129,6 +143,28 @@ struct serial_port {
>  	u8 intr_trigger_level;  /* interrupt trigger level */
>  };
>  
> +/* Migration packet */
> +#define PACKET_ID		(u16)(0xfeedbaba)
> +
> +#define PACKET_FLAGS_ACTUAL_DATA	(1 << 0)
> +#define PACKET_FLAGS_DUMMY_DATA		(1 << 1)
> +
> +#define PACKET_DATA_SIZE_MAX		(8 * 1024 * 1024)
> +
> +struct packet {
> +	u16 id;
> +	u16 flags;
> +	u32 data_size;
> +	u8 data[];
> +};
> +
> +enum {
> +	PACKET_STATE_NONE = 0,
> +	PACKET_STATE_PREPARED,
> +	PACKET_STATE_COPIED,
> +	PACKET_STATE_LAST,
> +};
> +
>  /* State of each mdev device */
>  struct mdev_state {
>  	int irq_fd;
> @@ -138,22 +174,37 @@ struct mdev_state {
>  	u8 *vconfig;
>  	struct mutex ops_lock;
>  	struct mdev_device *mdev;
> -	struct mdev_region_info region_info[VFIO_PCI_NUM_REGIONS];
> -	u32 bar_mask[VFIO_PCI_NUM_REGIONS];
> +	struct mdev_region_info region_info[MTTY_REGIONS_MAX];
> +	u32 bar_mask[MTTY_REGIONS_MAX];

A new region doesn't imply a new BAR, this should have been simply
bar_mask[2] from the start since this device implements 2 bars.

>  	struct list_head next;
>  	struct serial_port s[2];
>  	struct mutex rxtx_lock;
>  	struct vfio_device_info dev_info;
> -	int nr_ports;
> +	u32 nr_ports;
>  
>  	/* List of pinned gpfns, gpfn as index and content is translated hpfn */
>  	unsigned long *gpfn_to_hpfn;
>  	struct notifier_block nb;
> +
> +	u32 device_state;
> +	u64 saved_size;
> +	void *mig_region_base;
> +	bool is_actual_data_sent;
> +	struct packet *pkt;
> +	u32 packet_state;
> +	u64 dummy_data_size;

Please consider alignment and holes even for a sample driver.

>  };
>  
>  static struct mutex mdev_list_lock;
>  static struct list_head mdev_devices_list;
>  
> +/*
> + * Default dummy data size set to 100 MB. To change value of dummy data size at
> + * runtime but before migration write size in MB to sysfs file
> + * dummy_data_size_MB
> + */
> +static unsigned long user_dummy_data_size = (100 * 1024 * 1024);
> +
>  static const struct file_operations vd_fops = {
>  	.owner          = THIS_MODULE,
>  };
> @@ -639,6 +690,288 @@ static void mdev_read_base(struct mdev_state *mdev_state)
>  	}
>  }
>  
> +static int save_setup(struct mdev_state *mdev_state)
> +{
> +	mdev_state->is_actual_data_sent = false;
> +
> +	memset(mdev_state->pkt, 0, sizeof(struct packet) +
> +				   PACKET_DATA_SIZE_MAX);

I would have expected pkt to be allocated here as well, it looks like
there's an expectation that the user will have mmap'd the migration
region prior to this, but there's no obligation on the part of the user
to make use of the mmap at all.

> +
> +	return 0;
> +}
> +
> +static int set_device_state(struct mdev_state *mdev_state, u32 device_state)
> +{
> +	int ret = 0;
> +
> +	if (mdev_state->device_state == device_state)
> +		return 0;
> +
> +	if (device_state & VFIO_DEVICE_STATE_RUNNING) {
> +#if defined(DEBUG)
> +		if (device_state & VFIO_DEVICE_STATE_SAVING) {
> +			pr_info("%s: %s Pre-copy\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +		} else
> +			pr_info("%s: %s Running\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +#endif
> +	} else {
> +		if (device_state & VFIO_DEVICE_STATE_SAVING) {
> +#if defined(DEBUG)
> +			pr_info("%s: %s Stop-n-copy\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +#endif
> +			ret = save_setup(mdev_state);
> +
> +		} else if (device_state & VFIO_DEVICE_STATE_RESUMING) {
> +#if defined(DEBUG)
> +			pr_info("%s: %s Resuming\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +		} else {
> +			pr_info("%s: %s Stopped\n", __func__,
> +				dev_name(mdev_dev(mdev_state->mdev)));
> +#endif
> +		}
> +	}
> +
> +	mdev_state->device_state = device_state;
> +
> +	return ret;
> +}
> +
> +static u32 get_device_state(struct mdev_state *mdev_state)
> +{
> +	return mdev_state->device_state;
> +}
> +
> +static void write_to_packet(struct packet *pkt, u8 *data, size_t size)
> +{
> +	if ((pkt->data_size + size) > PACKET_DATA_SIZE_MAX) {
> +		pr_err("%s: packet data overflow\n", __func__);
> +		return;
> +	}
> +	memcpy((void *)&pkt->data[pkt->data_size], (void *)data, size);
> +	pkt->data_size += size;
> +}
> +
> +static void read_from_packet(struct packet *pkt, u8 *data,
> +			     int index, size_t size)
> +{
> +	if ((index + size) > PACKET_DATA_SIZE_MAX) {
> +		pr_err("%s: packet data overflow\n", __func__);

nit, underflow?  Asking for more data than is available.

> +		return;
> +	}
> +
> +	memcpy((void *)data, (void *)&pkt->data[index], size);
> +}
> +
> +static int save_device_data(struct mdev_state *mdev_state, u64 *pending)
> +{
> +	/* Save device data only during stop-and-copy phase */
> +	if (mdev_state->device_state != VFIO_DEVICE_STATE_SAVING) {
> +		*pending = 0;
> +		return 0;
> +	}
> +
> +	if (mdev_state->packet_state == PACKET_STATE_PREPARED) {
> +		*pending = sizeof(struct packet) + mdev_state->pkt->data_size;
> +		return 0;
> +	}
> +
> +	if (!mdev_state->is_actual_data_sent) {
> +
> +		/* create actual data packet */

I'm afraid this is where we really need a sample driver to set a good
precedent, which I think should include some sort of identification and
version field so that the receiving side can identify this as data
created by and intended for this device. 

> +		write_to_packet(mdev_state->pkt, (u8 *)&mdev_state->nr_ports,
> +				sizeof(mdev_state->nr_ports));
> +		write_to_packet(mdev_state->pkt, (u8 *)&mdev_state->s,
> +				sizeof(struct serial_port) * 2);
> +
> +		write_to_packet(mdev_state->pkt, mdev_state->vconfig,
> +				MTTY_CONFIG_SPACE_SIZE);
> +
> +		write_to_packet(mdev_state->pkt, (u8 *)mdev_state->gpfn_to_hpfn,
> +				sizeof(unsigned long) * MAX_GPFN_COUNT);
> +
> +		mdev_state->pkt->id = PACKET_ID;
> +		mdev_state->pkt->flags = PACKET_FLAGS_ACTUAL_DATA;
> +
> +		mdev_state->is_actual_data_sent = true;
> +	} else {
> +		/* create dummy data packet */
> +		if (mdev_state->dummy_data_size > user_dummy_data_size) {
> +			*pending = 0;
> +			mdev_state->packet_state = PACKET_STATE_NONE;
> +			return 0;
> +		}
> +
> +		memset(mdev_state->pkt->data, 0xa5, PACKET_DATA_SIZE_MAX);
> +
> +		mdev_state->pkt->id = PACKET_ID;
> +		mdev_state->pkt->flags = PACKET_FLAGS_DUMMY_DATA;
> +		mdev_state->pkt->data_size = PACKET_DATA_SIZE_MAX;
> +		mdev_state->dummy_data_size += PACKET_DATA_SIZE_MAX;
> +	}
> +
> +	*pending = sizeof(struct packet) + mdev_state->pkt->data_size;

This feeds back through to pending_bytes:

 * pending_bytes: (read only)
 *      The number of pending bytes still to be migrated from the vendor driver.

But what we're reporting here is size of the data area that we're
currently preparing.  This needs to report the dummy data size plus the
real data size and decrement as we go, not the packet size.

> +	mdev_state->packet_state = PACKET_STATE_PREPARED;
> +	mdev_state->saved_size = 0;
> +
> +	return 0;
> +}
> +
> +static int copy_device_data(struct mdev_state *mdev_state)
> +{
> +	u64 size;
> +
> +	if (!mdev_state->pkt || !mdev_state->mig_region_base)

mig_region_base is dependent on the user mmap'ing the migration region,
which they're not required to do.

> +		return -EINVAL;
> +
> +	if (mdev_state->packet_state == PACKET_STATE_COPIED)
> +		return 0;
> +
> +	if (!mdev_state->pkt->data_size)
> +		return 0;
> +
> +	size = sizeof(struct packet) + mdev_state->pkt->data_size;
> +
> +	memcpy(mdev_state->mig_region_base, mdev_state->pkt, size);

I'm not sure why the user's mmap isn't simply mapping pkt.

> +
> +	mdev_state->saved_size = size;
> +	mdev_state->packet_state = PACKET_STATE_COPIED;
> +	memset(mdev_state->pkt, 0, sizeof(struct packet));
> +	return 0;
> +}
> +
> +static int resume_device_data(struct mdev_state *mdev_state, u64 data_size)
> +{
> +	unsigned long i;
> +
> +	if (mdev_state->device_state != VFIO_DEVICE_STATE_RESUMING)
> +		return -EINVAL;
> +
> +	if (!mdev_state->pkt || !mdev_state->mig_region_base)

Again depends on the user having done something they're not required to
do.

> +		return -EINVAL;
> +
> +	memcpy(mdev_state->pkt, mdev_state->mig_region_base, data_size);
> +
> +	if (mdev_state->pkt->flags & PACKET_FLAGS_ACTUAL_DATA) {
> +		int index = 0;
> +		/* restore device data */
> +		read_from_packet(mdev_state->pkt, (u8 *)&mdev_state->nr_ports,
> +				 index, sizeof(mdev_state->nr_ports));

Zero integrity checking!

> +		index += sizeof(mdev_state->nr_ports);
> +
> +		read_from_packet(mdev_state->pkt, (u8 *)&mdev_state->s,
> +				index, sizeof(struct serial_port) * 2);
> +		index += sizeof(struct serial_port) * 2;
> +
> +		read_from_packet(mdev_state->pkt, mdev_state->vconfig,
> +				 index, MTTY_CONFIG_SPACE_SIZE);
> +		index += MTTY_CONFIG_SPACE_SIZE;
> +
> +		read_from_packet(mdev_state->pkt,
> +				(u8 *)mdev_state->gpfn_to_hpfn,
> +				index, sizeof(unsigned long) * MAX_GPFN_COUNT);
> +		index += sizeof(unsigned long) * MAX_GPFN_COUNT;
> +
> +		for (i = 0; i < MAX_GPFN_COUNT; i++) {
> +			if (mdev_state->gpfn_to_hpfn[i] != PFN_NULL) {
> +				int ret;
> +				unsigned long hpfn;
> +
> +				ret = vfio_pin_pages(mdev_dev(mdev_state->mdev),
> +				       &i, 1, IOMMU_READ | IOMMU_WRITE, &hpfn);
> +				if (ret <= 0) {
> +					pr_err("%s: 0x%lx unpin error %d\n",
> +							__func__, i, ret);
> +					continue;
> +				}
> +				mdev_state->gpfn_to_hpfn[i] = hpfn;
> +			}
> +		}

Where in this migration data did this vendor driver allow that some day
we might support more than 2 ports, we might create a PCIe device with
extended config space, we might enable a bigger table of pinned pages
or use a different data format?  This is a pretty poor example to
follow.

> +	} else {
> +#if defined(DEBUG)
> +		pr_info("%s: %s discard data 0x%llx\n",
> +			 __func__, dev_name(mdev_dev(mdev_state->mdev)),
> +			data_size);
> +#endif
> +	}
> +
> +	return 0;
> +}
> +
> +static int handle_mig_read(unsigned int index, struct mdev_state *mdev_state,
> +			   loff_t offset, u8 *buf, u32 count)
> +{
> +	int ret = 0;
> +	u64 pending = 0;
> +
> +	switch (offset) {
> +	case MIGRATION_INFO_OFFSET(device_state):	// 0x00
> +		*(u32 *)buf = get_device_state(mdev_state);
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(pending_bytes):	// 0x08
> +		ret = save_device_data(mdev_state, &pending);
> +		if (ret)
> +			break;
> +		*(u64 *)buf = pending;
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(data_offset):	// 0x10
> +		if (mdev_state->device_state & VFIO_DEVICE_STATE_SAVING) {
> +			ret = copy_device_data(mdev_state);
> +			if (ret)
> +				break;
> +		}
> +		*(u64 *)buf = MTTY_MIGRATION_REGION_DATA_OFFSET;
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(data_size):		// 0x18
> +		*(u64 *)buf = mdev_state->saved_size;
> +		break;
> +
> +	default:
> +		ret = -EINVAL;

These read/write functions MUST support read and write from the
migration data area, mmaps are optional for the user.  From our header:

  "The user is not required to access through mmap regardless of the
  capabilities of the region mmap."

Thanks,
Alex

> +	}
> +
> +#if defined(DEBUG)
> +	pr_info("%s: %s MIG  RD @0x%llx bytes: %d data: 0x%x\n",
> +			__func__, dev_name(mdev_dev(mdev_state->mdev)),
> +			offset, count, *(u32 *)buf);
> +#endif
> +	return ret;
> +}
> +
> +static int handle_mig_write(unsigned int index, struct mdev_state *mdev_state,
> +				loff_t offset, u8 *buf, u32 count)
> +{
> +	int ret = 0;
> +
> +#if defined(DEBUG)
> +	pr_info("%s: %s MIG  WR @0x%llx bytes: %d data: 0x%x\n",
> +			__func__, dev_name(mdev_dev(mdev_state->mdev)),
> +			offset, count, *(u32 *)buf);
> +#endif
> +	switch (offset) {
> +	case MIGRATION_INFO_OFFSET(device_state):	// 0x00
> +		ret = set_device_state(mdev_state, *(u32 *)buf);
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(data_size):		// 0x18
> +		ret = resume_device_data(mdev_state, *(u64 *)buf);
> +		break;
> +
> +	case MIGRATION_INFO_OFFSET(pending_bytes):	// 0x08
> +	case MIGRATION_INFO_OFFSET(data_offset):	// 0x10
> +	default:
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
>  static ssize_t mdev_access(struct mdev_device *mdev, u8 *buf, size_t count,
>  			   loff_t pos, bool is_write)
>  {
> @@ -702,6 +1035,18 @@ static ssize_t mdev_access(struct mdev_device *mdev, u8 *buf, size_t count,
>  		}
>  		break;
>  
> +	case MTTY_MIGRATION_REGION_INDEX:
> +		if (is_write) {
> +			ret = handle_mig_write(index, mdev_state, offset, buf,
> +					      count);
> +		} else {
> +			ret = handle_mig_read(index, mdev_state, offset, buf,
> +					      count);
> +		}
> +		if (ret)
> +			goto accessfailed;
> +		break;
> +
>  	default:
>  		ret = -1;
>  		goto accessfailed;
> @@ -709,7 +1054,6 @@ static ssize_t mdev_access(struct mdev_device *mdev, u8 *buf, size_t count,
>  
>  	ret = count;
>  
> -
>  accessfailed:
>  	mutex_unlock(&mdev_state->ops_lock);
>  
> @@ -819,13 +1163,29 @@ static int mtty_reset(struct mdev_device *mdev)
>  static ssize_t mtty_read(struct mdev_device *mdev, char __user *buf,
>  			 size_t count, loff_t *ppos)
>  {
> -	unsigned int done = 0;
> +	unsigned int done = 0, index;
>  	int ret;
>  
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +
>  	while (count) {
>  		size_t filled;
>  
> -		if (count >= 4 && !(*ppos % 4)) {
> +		if ((index == MTTY_MIGRATION_REGION_INDEX) &&
> +		    (count >= 8 && !(*ppos % 8))) {
> +			u64 val;
> +
> +			ret =  mdev_access(mdev, (u8 *)&val, sizeof(val),
> +					   *ppos, false);
> +			if (ret <= 0)
> +				goto read_err;
> +
> +			if (copy_to_user(buf, &val, sizeof(val)))
> +				goto read_err;
> +
> +			filled = 8;
> +
> +		} else if (count >= 4 && !(*ppos % 4)) {
>  			u32 val;
>  
>  			ret =  mdev_access(mdev, (u8 *)&val, sizeof(val),
> @@ -878,13 +1238,27 @@ static ssize_t mtty_read(struct mdev_device *mdev, char __user *buf,
>  static ssize_t mtty_write(struct mdev_device *mdev, const char __user *buf,
>  		   size_t count, loff_t *ppos)
>  {
> -	unsigned int done = 0;
> +	unsigned int done = 0, index;
>  	int ret;
>  
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>  	while (count) {
>  		size_t filled;
>  
> -		if (count >= 4 && !(*ppos % 4)) {
> +		if ((index == MTTY_MIGRATION_REGION_INDEX) &&
> +		    (count >= 8 && !(*ppos % 8))) {
> +			u64 val;
> +
> +			if (copy_from_user(&val, buf, sizeof(val)))
> +				goto write_err;
> +
> +			ret = mdev_access(mdev, (u8 *)&val, sizeof(val),
> +					  *ppos, true);
> +			if (ret <= 0)
> +				goto write_err;
> +
> +			filled = 8;
> +		} else if (count >= 4 && !(*ppos % 4)) {
>  			u32 val;
>  
>  			if (copy_from_user(&val, buf, sizeof(val)))
> @@ -1061,12 +1435,13 @@ static int mtty_trigger_interrupt(struct mdev_state *mdev_state)
>  }
>  
>  static int mtty_get_region_info(struct mdev_device *mdev,
> -			 struct vfio_region_info *region_info,
> -			 u16 *cap_type_id, void **cap_type)
> +				struct vfio_region_info *region_info,
> +				struct vfio_info_cap *caps)
>  {
>  	unsigned int size = 0;
>  	struct mdev_state *mdev_state;
> -	u32 bar_index;
> +	u32 index;
> +	int ret = 0;
>  
>  	if (!mdev)
>  		return -EINVAL;
> @@ -1075,13 +1450,13 @@ static int mtty_get_region_info(struct mdev_device *mdev,
>  	if (!mdev_state)
>  		return -EINVAL;
>  
> -	bar_index = region_info->index;
> -	if (bar_index >= VFIO_PCI_NUM_REGIONS)
> +	index = region_info->index;
> +	if (index >= MTTY_REGIONS_MAX)
>  		return -EINVAL;
>  
>  	mutex_lock(&mdev_state->ops_lock);
>  
> -	switch (bar_index) {
> +	switch (index) {
>  	case VFIO_PCI_CONFIG_REGION_INDEX:
>  		size = MTTY_CONFIG_SPACE_SIZE;
>  		break;
> @@ -1092,21 +1467,63 @@ static int mtty_get_region_info(struct mdev_device *mdev,
>  		if (mdev_state->nr_ports == 2)
>  			size = MTTY_IO_BAR_SIZE;
>  		break;
> +	case MTTY_MIGRATION_REGION_INDEX:
> +		size = MTTY_MIGRATION_REGION_SIZE;
> +		break;
>  	default:
>  		size = 0;
>  		break;
>  	}
>  
> -	mdev_state->region_info[bar_index].size = size;
> -	mdev_state->region_info[bar_index].vfio_offset =
> -		MTTY_VFIO_PCI_INDEX_TO_OFFSET(bar_index);
> +	mdev_state->region_info[index].size = size;
> +	mdev_state->region_info[index].vfio_offset =
> +					MTTY_VFIO_PCI_INDEX_TO_OFFSET(index);
>  
>  	region_info->size = size;
> -	region_info->offset = MTTY_VFIO_PCI_INDEX_TO_OFFSET(bar_index);
> +	region_info->offset = MTTY_VFIO_PCI_INDEX_TO_OFFSET(index);
>  	region_info->flags = VFIO_REGION_INFO_FLAG_READ |
> -		VFIO_REGION_INFO_FLAG_WRITE;
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +
> +	if (index == MTTY_MIGRATION_REGION_INDEX) {
> +		struct vfio_region_info_cap_sparse {
> +			struct vfio_region_info_cap_sparse_mmap sparse;
> +			struct vfio_region_sparse_mmap_area area;
> +		};
> +
> +		struct vfio_region_info_cap_sparse mig_region;
> +
> +		struct vfio_region_info_cap_type cap_type = {
> +			.header.id = VFIO_REGION_INFO_CAP_TYPE,
> +			.header.version = 1,
> +			.type = VFIO_REGION_TYPE_MIGRATION,
> +			.subtype = VFIO_REGION_SUBTYPE_MIGRATION
> +		};
> +
> +		/* Add REGION CAP type */
> +		ret = vfio_info_add_capability(caps, &cap_type.header,
> +						sizeof(cap_type));
> +		if (ret)
> +			goto exit;
> +
> +		/* Add sparse mmap cap type */
> +		mig_region.sparse.nr_areas = 1;
> +		mig_region.sparse.header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
> +		mig_region.sparse.header.version = 1;
> +
> +		mig_region.area.offset = MTTY_MIGRATION_REGION_DATA_OFFSET;
> +		mig_region.area.size = MTTY_MIGRATION_REGION_SIZE_MMAP;
> +
> +		region_info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
> +
> +		if (region_info->argsz > sizeof(*region_info))
> +			region_info->flags |= VFIO_REGION_INFO_FLAG_MMAP;
> +
> +		ret = vfio_info_add_capability(caps, &mig_region.sparse.header,
> +						sizeof(mig_region));
> +	}
> +exit:
>  	mutex_unlock(&mdev_state->ops_lock);
> -	return 0;
> +	return ret;
>  }
>  
>  static int mtty_get_irq_info(struct mdev_device *mdev,
> @@ -1138,7 +1555,7 @@ static int mtty_get_device_info(struct mdev_device *mdev,
>  			 struct vfio_device_info *dev_info)
>  {
>  	dev_info->flags = VFIO_DEVICE_FLAGS_PCI;
> -	dev_info->num_regions = VFIO_PCI_NUM_REGIONS;
> +	dev_info->num_regions = MTTY_REGIONS_MAX;
>  	dev_info->num_irqs = VFIO_PCI_NUM_IRQS;
>  
>  	return 0;
> @@ -1150,6 +1567,7 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  	int ret = 0;
>  	unsigned long minsz;
>  	struct mdev_state *mdev_state;
> +	struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>  
>  	if (!mdev)
>  		return -EINVAL;
> @@ -1185,8 +1603,6 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  	case VFIO_DEVICE_GET_REGION_INFO:
>  	{
>  		struct vfio_region_info info;
> -		u16 cap_type_id = 0;
> -		void *cap_type = NULL;
>  
>  		minsz = offsetofend(struct vfio_region_info, offset);
>  
> @@ -1196,11 +1612,29 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  		if (info.argsz < minsz)
>  			return -EINVAL;
>  
> -		ret = mtty_get_region_info(mdev, &info, &cap_type_id,
> -					   &cap_type);
> +		ret = mtty_get_region_info(mdev, &info, &caps);
>  		if (ret)
>  			return ret;
>  
> +		if (caps.size) {
> +			info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
> +			if (info.argsz < sizeof(info) + caps.size) {
> +				info.argsz = sizeof(info) + caps.size;
> +				info.cap_offset = 0;
> +			} else {
> +				vfio_info_cap_shift(&caps, sizeof(info));
> +				if (copy_to_user((void __user *)arg +
> +							sizeof(info), caps.buf,
> +							caps.size)) {
> +					kfree(caps.buf);
> +					ret = -EFAULT;
> +					break;
> +				}
> +				info.cap_offset = sizeof(info);
> +			}
> +			kfree(caps.buf);
> +		}
> +
>  		if (copy_to_user((void __user *)arg, &info, minsz))
>  			return -EFAULT;
>  
> @@ -1266,6 +1700,89 @@ static long mtty_ioctl(struct mdev_device *mdev, unsigned int cmd,
>  	return -ENOTTY;
>  }
>  
> +void mmap_close(struct vm_area_struct *vma)
> +{
> +	struct mdev_device *mdev = vma->vm_private_data;
> +	struct mdev_state *mdev_state;
> +	uint32_t index = 0;
> +
> +	if (!mdev)
> +		return;
> +
> +	mdev_state = mdev_get_drvdata(mdev);
> +	if (!mdev_state)
> +		return;
> +
> +	mutex_lock(&mdev_state->ops_lock);
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
> +	if (index == MTTY_MIGRATION_REGION_INDEX) {
> +		if (mdev_state->mig_region_base != NULL) {
> +			vfree(mdev_state->mig_region_base);
> +			mdev_state->mig_region_base = NULL;
> +		}
> +
> +		if (mdev_state->pkt != NULL) {
> +			vfree(mdev_state->pkt);
> +			mdev_state->pkt = NULL;
> +		}
> +	}
> +	mutex_unlock(&mdev_state->ops_lock);
> +}
> +
> +static const struct vm_operations_struct mdev_vm_ops = {
> +	.close = mmap_close,
> +};
> +
> +static int mtty_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
> +{
> +	struct mdev_state *mdev_state;
> +	unsigned int index;
> +	int ret = 0;
> +
> +	if (!mdev)
> +		return -EINVAL;
> +
> +	mdev_state = mdev_get_drvdata(mdev);
> +	if (!mdev_state)
> +		return -ENODEV;
> +
> +	mutex_lock(&mdev_state->ops_lock);
> +
> +	index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
> +	if (index == MTTY_MIGRATION_REGION_INDEX) {
> +		mdev_state->mig_region_base =
> +				 vmalloc_user(MTTY_MIGRATION_REGION_SIZE_MMAP);
> +		if (mdev_state->mig_region_base == NULL) {
> +			ret = -ENOMEM;
> +			goto mmap_exit;
> +		}
> +
> +		mdev_state->pkt = vzalloc(sizeof(struct packet) +
> +					  PACKET_DATA_SIZE_MAX);
> +		if (mdev_state->pkt == NULL) {
> +			vfree(mdev_state->mig_region_base);
> +			mdev_state->mig_region_base = NULL;
> +			ret = -ENOMEM;
> +			goto mmap_exit;
> +		}
> +
> +		vma->vm_ops = &mdev_vm_ops;
> +
> +		ret = remap_vmalloc_range(vma, mdev_state->mig_region_base, 0);
> +		if (ret != 0) {
> +			pr_err("remap_vmalloc_range failed, ret= %d\n", ret);
> +			vfree(mdev_state->mig_region_base);
> +			mdev_state->mig_region_base = NULL;
> +			vfree(mdev_state->pkt);
> +			mdev_state->pkt = NULL;
> +			goto mmap_exit;
> +		}
> +	}
> +mmap_exit:
> +	mutex_unlock(&mdev_state->ops_lock);
> +	return ret;
> +}
> +
>  static void unpin_pages_all(struct mdev_state *mdev_state)
>  {
>  	struct mdev_device *mdev = mdev_state->mdev;
> @@ -1339,6 +1856,8 @@ static int mtty_open(struct mdev_device *mdev)
>  
>  	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, &events,
>  				     &mdev_state->nb);
> +	mdev_state->dummy_data_size = 0;
> +	mdev_state->mig_region_base = NULL;
>  	return ret;
>  }
>  
> @@ -1355,6 +1874,15 @@ static void mtty_close(struct mdev_device *mdev)
>  	unpin_pages_all(mdev_state);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>  				 &mdev_state->nb);
> +	if (mdev_state->pkt != NULL) {
> +		vfree(mdev_state->pkt);
> +		mdev_state->pkt = NULL;
> +	}
> +
> +	if (mdev_state->mig_region_base != NULL) {
> +		vfree(mdev_state->mig_region_base);
> +		mdev_state->mig_region_base = NULL;
> +	}
>  }
>  
>  static ssize_t
> @@ -1466,9 +1994,26 @@ pin_pages_store(struct device *dev, struct device_attribute *attr,
>  
>  static DEVICE_ATTR_RW(pin_pages);
>  
> +static ssize_t
> +dummy_data_size_MB_store(struct device *dev, struct device_attribute *attr,
> +			 const char *buf, size_t count)
> +{
> +	int ret;
> +
> +	ret = kstrtoul(buf, 0, &user_dummy_data_size);
> +	if (ret)
> +		return ret;
> +
> +	user_dummy_data_size = user_dummy_data_size << 20;
> +	return count;
> +}
> +
> +static DEVICE_ATTR_WO(dummy_data_size_MB);
> +
>  static struct attribute *mdev_dev_attrs[] = {
>  	&dev_attr_sample_mdev_dev.attr,
>  	&dev_attr_pin_pages.attr,
> +	&dev_attr_dummy_data_size_MB.attr,
>  	NULL,
>  };
>  
> @@ -1573,6 +2118,7 @@ static const struct mdev_parent_ops mdev_fops = {
>  	.read                   = mtty_read,
>  	.write                  = mtty_write,
>  	.ioctl		        = mtty_ioctl,
> +	.mmap			= mtty_mmap,
>  };
>  
>  static void mtty_device_release(struct device *dev)



^ permalink raw reply

* [PATCH] rockchip: rockpro64: Set cooling levels for pwm-fan
From: Simon Glass @ 2020-05-29 19:00 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <1590756151.9708.47.camel@intricatesoftware.com>

Hi Kurt,

On Fri, 29 May 2020 at 06:42, Kurt Miller <kurt@intricatesoftware.com> wrote:
>
> On Fri, 2020-05-29 at 09:27 +0100, Peter Robinson wrote:
> > On Thu, May 28, 2020 at 8:32 PM Kurt Miller <kurt@intricatesoftware.com> wrote:
> > >
> > >
> > > The cooling levels are tuned to the fan that comes with the rockpro64 NAS
> > > case. A gpu_thermal zone was not added because having two active cooling
> > > maps control one physical fan causes them to compete for the fan speed
> > > which results in erratic fan behavior.
> > Is there any reason this shouldn't go to the linux kernel first and
> > then be synced back to the standard rk3399-rockpro64.dtsi?
>
> Is that a requirement? I do my primary development on OpenBSD and
> while I use Linux for work tasks, I don't have available time right
> now to push these changes to Linux kernel first.
>

The problem is that we need to keep Linux and U-Boot in sync. If a DT
change is submitted only to one then it isn't clear who is taking on
the task of syncing them up.

You don't actually need to be using Linux to send a DT change - just
clone Linux, apply your patch and send to devicetree at vger.kernel.org.
I wonder if it would be good enough to cc that group and the Linux
maintainer on these patches, assuming the files are currently in sync?
But probably a new patch is needed.

Regards,
Simon

^ permalink raw reply

* Re: [PATCH] usb: dwc2: Fix shutdown callback in platform
From: Alan Stern @ 2020-05-29 19:00 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Frank Mori Hess, Minas Harutyunyan, John Youn, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb@vger.kernel.org, # 4.0+
In-Reply-To: <CAD=FV=XsLA3w2QPcNF3-mgZbZoGsz4kg_QvHcoZV=XTVDYhnSg@mail.gmail.com>

On Fri, May 29, 2020 at 11:45:53AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Fri, May 29, 2020 at 11:21 AM Frank Mori Hess <fmh6jj@gmail.com> wrote:
> >
> > On Fri, May 29, 2020 at 1:53 PM Doug Anderson <dianders@chromium.org> wrote:
> > > >
> > > > I don't get it.  A hypothetical machine could have literally anything
> > > > sharing the IRQ line, right?
> > >
> > > It's not a real physical line, though?  I don't think it's common to
> > > have a shared interrupt between different IP blocks in a given SoC.
> > > Even if it existed, all the drivers should disable their interrupts?
> >
> > I don't know, it's a hypothetical machine so it can be whatever you
> > want.  The driver requests shared irqs, if it doesn't actually support
> > irq sharing, it shouldn't request them.
> 
> I guess?  As I understood it drivers have to be very carefully coded
> up to support sharing their IRQ with someone else and I'm not
> convinced dwc2 does that anyway.  Certainly it doesn't hurt to keep
> dwc2 clean, but until I see someone that's actually sharing dwc2's
> interrupt and I can actually see an example I'm not sure I'm going to
> spend too much time thinking about it.

This is silly.  If the driver says it supports shared IRQs, then it 
should actually support them.

> > > > Anyways, my screaming interrupt occurs after a a new kernel has been
> > > > booted with kexec.  In this case, it doesn't matter if the old kernel
> > > > called disable_irq or not.  As soon as the new kernel re-enables the
> > > > interrupt line, the kernel immediately disables it again with a
> > > > backtrace due to the unhandled screaming interrupt.  That's why the
> > > > dwc2 hardware needs to have its interrupts turned off when the old
> > > > kernel is shutdown.
> > >
> > > Isn't that a bug with your new kernel?  I've seen plenty of bugs where
> > > drivers enable their interrupt before their interrupt handler is set
> > > to handle it.  You never know what state the bootloader (or previous
> > > kernel) might have left things in and if an interrupt was pending it
> > > shouldn't kill you.
> >
> > It wouldn't hurt to add disabling of the dwc2 irq early in dwc2
> > initialization,
> 
> More than it not hurting, I'd consider it a bug in the driver (and a
> much more serious one than shutdown not disabling the interrupt).

Normally the first thing a driver would do is reset the hardware, and 
that reset should disable any interrupt source.

> > but why leave the irq screaming after shutdown?
> 
> Sure.  So I guess the answer is to just do both disable the interrupt
> and make sure that the interrupt handler has finished.
> 
> dwc2_disable_global_interrupts(hsotg);
> disable_irq(hsotg->irq);

Drivers with shared IRQs don't call disable_irq(); they call 
synchronize_irq().  It will do what you want (wait until all running 
handlers have returned).

Alan Stern

^ permalink raw reply

* Re: [PATCH] ovl: fix some bug exist in ovl_get_inode
From: Vivek Goyal @ 2020-05-29 19:00 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Miklos Szeredi, yangerkun, overlayfs
In-Reply-To: <CAOQ4uxhie2s+yvF1jpPnh6-+a-r8kz589Y5znAX_jmeWqo+SCQ@mail.gmail.com>

On Fri, May 29, 2020 at 06:46:43PM +0300, Amir Goldstein wrote:
> > > > @@ -1023,7 +1020,7 @@ struct dentry *ovl_lookup(struct inode *
> > > >          *
> > > >          * Always lookup index of non-dir non-metacopy and non-upper.
> > > >          */
> > > > -       if (ctr && (!upperdentry || (!d.is_dir && !metacopy)))
> > > > +       if (ctr && (!upperdentry || (!d.is_dir && !uppermetacopy)))
> > > >                 origin = stack[0].dentry;
> > > >
> > >
> > > I think this should be:
> > >
> > >           * Always lookup index of non-dir and non-upper.
> > >           */
> > >           if (!origin && ctr && (!upperdentry || !d.is_dir))
> > >                  origin = stack[0].dentry;
> > >
> > > uppermetacopy is guaranteed to either have origin already set or
> > > exit with an an error for ovl_verify_origin().
> >
> > Only if index is enabled and upper had origin xattr.
> >
> > (!d.is_dir && ofs->config.index && origin_path)
> >
> > So if index is disabled or uppermetacopy did not have "origin" xattr,
> > we will not have origin set by the time we come out of the loop.
> >
> 
> True. But if index is disabled, setting origin is moot. origin is only
> ever used here to lookup the index.

Well, while looking up for index, we are checking for presence of
index dir (and not checking whether index is currently enabled or
not). So if somebody mounts overlayfs with index=on and later remounts
with index=off, we can still start looking up the index even if it
is not enabled. Is it intentional? If not, to simplify it, should
we lookup index only if it is enabled.

        if (origin && ofs->config.index &&
            (!d.is_dir || ovl_index_all(dentry->d_sb))) {
                index = ovl_lookup_index(ofs, upperdentry, origin, true);


> 
> About "origin" xattr. If it is not set in upper that lower fs probably does
> not have file handle support. In that case, index cannot be enabled
> anyway.

What about the case of multiple lower layers. IIUC, we will only 
ensure that top most lower layer has file handle support and not
worry about rest of the layers. This will break the case of setting
origin for !upperdentry. This will lookup index and fail if lower
layer does not support file handle.

So may be while enabling index, we should make sure all lower
layers support file handles otherwise fail?

> 
> > I see for non-metacopy regular files, if upper did not have origin
> > xattr, that means origin_path will by NULL. That means ctr will be
> > 0 and that means we will not set "origin" for non-metacopy regular
> > files in such case. So question is, should we set "origin" for
> > metacopy upper files in such a case.
> >
> > We did not have origin xattr, but we looked up lower layers for
> > upper metacopy. In theory, stack[0].dentry is origin for upper
> > metacopy files. Should we use it? Current logic does not and that's
> > why this additiona check (!d.is_dir && !uppermetacopy).
> >
> 
> I agree with your analysis, but this is a very theoretical discussion.
> Unless I am missing something, I think we have written a very complex
> condition for a corner case that doesn't seem to be valid or interesting.

I agree. I want to simplify it too. Just trying to make sure that
I don't end up breaking some valid configuration.

> 
> Basically, for non-dir, if there is no "origin" xattr, then there should be no
> index, because the metacopy feature was added way long after we
> started storing "origin" on copy up. That's not the case for directories.
> 
> There is one corner case where it may be relevant -
> overlay layers with metacopy that were created on fs with no file handle
> support (or no uuid) that are migrated to a filesystem with file handle
> support (and metacopy xattr are preserved in migration).
> In that case, index may be enabled while upper metacopy exists
> without "origin".
> 
> What happens if we do not set origin and do not lookup index in that case?
> We can get two overlay inodes, both from different metacopy upper inodes
> redirected to the same lower inode, that have the same st_ino, but differnt
> metadata.

We do not set origin on upper for broken hardlinks. So we will report
inode number from upper. I tried. it.

I tried following.

- touch foo.txt
- ln foo.txt foo-link.txt
- mount with metacopy=on
- chwon test:test foo.txt
- umount
- Goto upper/ and remove origin xattr from foo.txt. But there should not
  be one because we do not create ORIGIN for broken hardlinks if index is
  not enabled.
- mount overlay with index=on
- Do stat on foo.txt and foo-link.txt. foo.txt reports inode number from
  upper and foo-link.txt reports inode number from lower.
- chown test:test foo-link.txt
- stat foo-link.txt still reports inode number from lower.

Anyway, at this point of time, how about following.

- For non-upper dentry, always set origin.
- For upperdentry, there are 3 cases.
	- directories
	- regular files
	- regular metacopy files

  For directories and regular metacopy files only use verified origin.
  That means upper has origin xattr and it matches patch based looked
  up dentry. If we did not verify because either ORIGIN xattr is not
  there, or because index is not enabled or because ovl_verify_lower()
  is not set, then don't use path based looked up dentry as origin.

  For the case of regular file upper dentry, use unverified origin.
  It implies that ORGIN xattr is there. As there is no path based
  lookup origin for upper regular files. 

I am attaching a simple patch. Please let me know what do you think
of it.

> 
> > >
> > > HOWEVER, if we set origin to lower, which turns out to be a lower
> > > metacopy, we then skip this layer to the next one, but origin remains
> > > set on the skipped layer dentry, which we had already dput().
> > > Ay ay ay!
> >
> > We only skip the intermediate metacopy entries in lower. So top most
> > lower metacopy will still be retained. For example, if there are 3
> > lower layers where top two are metacopy and one data, then we will
> > only skip middle one. And middle one should not be origin for upper.
> >
> >                 /*
> >                  * Do not store intermediate metacopy dentries in chain,
> >                  * except top most lower metacopy dentry
> >                  */
> >                 if (d.metacopy && ctr) {
> >                         dput(this);
> >                         continue;
> >                 }
> >
> > For the first lower, ctr will be 0 and we will always store it in
> > stack. So if it is metacopy dentry, it will still be stored at
> > stack[0].
> >
> > Do you still see the problem?
> 
> No. it's fine. My eyes missed the ctr condition.
> I still think since you are changing this code.
> It will be much easier to follow if both simple continue statement
> are at the top of the loop.

Ok, will do.

Here is the patch to simplify the condition or origin. I will add some
changelog and comments in code in v2 of patch if you like the patch.

Thanks
Vivek


---
 fs/overlayfs/namei.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: redhat-linux/fs/overlayfs/namei.c
===================================================================
--- redhat-linux.orig/fs/overlayfs/namei.c	2020-05-29 14:24:45.997113946 -0400
+++ redhat-linux/fs/overlayfs/namei.c	2020-05-29 14:46:46.692113946 -0400
@@ -1005,6 +1005,7 @@ struct dentry *ovl_lookup(struct inode *
 		}
 		stack = origin_path;
 		ctr = 1;
+		origin = origin_path->dentry;
 		origin_path = NULL;
 	}
 
@@ -1021,9 +1022,9 @@ struct dentry *ovl_lookup(struct inode *
 	 * index. This case should be handled in same way as a non-dir upper
 	 * without ORIGIN is handled.
 	 *
-	 * Always lookup index of non-dir non-metacopy and non-upper.
+	 * Always lookup index of non-upper.
 	 */
-	if (ctr && (!upperdentry || (!d.is_dir && !metacopy)))
+	if (!origin && ctr && !upperdentry)
 		origin = stack[0].dentry;
 
 	if (origin && ovl_indexdir(dentry->d_sb) &&


^ permalink raw reply

* [PATCH v2 0/2] fix missing handling of __user in nommu's uaccess()
From: Luc Van Oostenryck @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer
  Cc: linux-m68k, linux-kernel, Luc Van Oostenryck

I received a bug report for an unrelated patch when used with m68k-nommu.
It appears that the origin of the problem is that __get_user() and
__put_user() doesn't handle correctly __user. These 2 patches fix this.

Note: this is only minimaly tested but is quite straightforward and
      since this only change __user annotation it will not change the
      generated code.


Changes since v1:
* fix typo: s/plan/plain/
* appease checkpatch with better style: s/__force*/__force */
* avoid excessive line length caused by the added cast.

Luc Van Oostenryck (2):
  m68k,nommu: add missing __user in uaccess' __ptr() macro
  m68k,nommu: fix implicit cast from __user in __{get,put}_user_asm()

 arch/m68k/include/asm/uaccess_no.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [PATCH v2 1/2] m68k,nommu: add missing __user in uaccess' __ptr() macro
From: Luc Van Oostenryck @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer
  Cc: linux-m68k, linux-kernel, Luc Van Oostenryck, kbuild test robot
In-Reply-To: <20200529190218.36560-1-luc.vanoostenryck@gmail.com>

The assembly for __get_user() & __put_user() uses a macro, __ptr(),
to cast the pointer to 'unsigned long *' but the pointer is always
a __user one and so this cast creates a lot of warnings when using
Sparse.

So, change to the cast to 'unsigned long __user *'.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 arch/m68k/include/asm/uaccess_no.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/m68k/include/asm/uaccess_no.h b/arch/m68k/include/asm/uaccess_no.h
index a24cfe4a0d32..9651766a62af 100644
--- a/arch/m68k/include/asm/uaccess_no.h
+++ b/arch/m68k/include/asm/uaccess_no.h
@@ -60,7 +60,7 @@ extern int __put_user_bad(void);
  * aliasing issues.
  */
 
-#define __ptr(x) ((unsigned long *)(x))
+#define __ptr(x) ((unsigned long __user *)(x))
 
 #define __put_user_asm(err,x,ptr,bwl)				\
 	__asm__ ("move" #bwl " %0,%1"				\
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 2/2] m68k,nommu: fix implicit cast from __user in __{get,put}_user_asm()
From: Luc Van Oostenryck @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer
  Cc: linux-m68k, linux-kernel, Luc Van Oostenryck, kbuild test robot
In-Reply-To: <20200529190218.36560-1-luc.vanoostenryck@gmail.com>

The assembly for __get_user_asm() & __put_user_asm() uses memcpy()
when the size is 8.

However, the pointer is always a __user one while memcpy() expects
a plain one and so this cast creates a lot of warnings when using
Sparse.

So, fix this by adding a cast to 'void __force *' at memcpy()'s
argument.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 arch/m68k/include/asm/uaccess_no.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess_no.h b/arch/m68k/include/asm/uaccess_no.h
index 9651766a62af..9959327e99b0 100644
--- a/arch/m68k/include/asm/uaccess_no.h
+++ b/arch/m68k/include/asm/uaccess_no.h
@@ -42,7 +42,7 @@ static inline int _access_ok(unsigned long addr, unsigned long size)
 	__put_user_asm(__pu_err, __pu_val, ptr, l);	\
 	break;						\
     case 8:						\
-	memcpy(ptr, &__pu_val, sizeof (*(ptr))); \
+	memcpy((void __force *)ptr, &__pu_val, sizeof(*(ptr))); \
 	break;						\
     default:						\
 	__pu_err = __put_user_bad();			\
@@ -85,7 +85,7 @@ extern int __put_user_bad(void);
 	    u64 l;						\
 	    __typeof__(*(ptr)) t;				\
 	} __gu_val;						\
-	memcpy(&__gu_val.l, ptr, sizeof(__gu_val.l));		\
+	memcpy(&__gu_val.l, (const void __force *)ptr, sizeof (__gu_val.l)); \
 	(x) = __gu_val.t;					\
 	break;							\
     }								\
-- 
2.26.2


^ permalink raw reply related

* Re: The audit "context" and when to expect it.
From: Paul Moore @ 2020-05-29 19:01 UTC (permalink / raw)
  To: Casey Schaufler; +Cc: Richard Guy Briggs, Linux-Audit Mailing List
In-Reply-To: <45ce3357-ca82-8721-22d6-dabe751ad8fa@schaufler-ca.com>

On Fri, May 29, 2020 at 1:59 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> What does a NULL audit context (e.g. ab->cxt == NULL) tell
> me about the status of the audit buffer? It seems like it should
> be telling me that the audit buffer is being created for some
> purpose unrelated to the current task. And yet there are places
> where information is pulled from the current task even when
> the cxt is NULL.

The simple answer is that a NULL audit_context indicates a standalone
record, meaning a record with a unique timestamp so that it is not
associated with any other records into an event.  If the audit_context
it not NULL then the information in the context is used to group, or
associate, all of the records sharing that context into a single
event.

This is just one example, but a non-NULL audit_context is how PATH
records end up being associated with SYSCALL records in a single
event.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/gem: Taint all shrinkable object locks
From: Patchwork @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx
In-Reply-To: <20200529183204.16850-1-chris@chris-wilson.co.uk>

== Series Details ==

Series: series starting with [1/2] drm/i915/gem: Taint all shrinkable object locks
URL   : https://patchwork.freedesktop.org/series/77799/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8554 -> Patchwork_17821
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17821/index.html


Changes
-------

  No changes found


Participating hosts (49 -> 43)
------------------------------

  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8554 -> Patchwork_17821

  CI-20190529: 20190529
  CI_DRM_8554: ac5c0538eb79074e97a7a5c03c285f339290d961 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5684: bd399f5eb8263bb4a84ae6a5bb1a13d329e0515d @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17821: 35c96a6a980bde9bcf4baf8a35ee85b468640124 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

35c96a6a980b drm/i915/gem: Give each object class a friendly name
9330abaf4eee drm/i915/gem: Taint all shrinkable object locks

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17821/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply

* Re: [PATCH 10/30] KVM: nSVM: extract preparation of VMCB for nested run
From: Paolo Bonzini @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm
In-Reply-To: <bf123ad6-3313-351c-a7b8-e55cefb53f63@oracle.com>

On 29/05/20 20:27, Krish Sadhukhan wrote:
>>
>> +static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct
>> vmcb *nested_vmcb)
> 
> 
> Not a big deal, but I feel that it helps a lot in readability if we keep
> the names symmetric. This one could be named prepare_nested_vmcb_save to
> match load_nested_vmcb_control that you created in the previous patch.
> Or load_nested_vmcb_control could be renamed to nested_load_vmcb_control
> to match the name here.

This is actually intended: while load_nested_vmcb_control loads the
members of nested_vmcb->control into svm->nested, the two functions in
this patch prepare the svm->vmcb.  A couple patches later,
nested_prepare_vmcb_control will not use nested_vmcb anymore.

I could use nested_load_nested_vmcb_control, but that is just too ugly!
 Instead, the best thing to do would be to use the vmcb01/vmcb02/vmcb12
names as in nVMX, in which case the functions would become
nested_load_vmcb12_control and nested_prepare_vmcb02_{save,control}.
However this is a bit hard to do right now because the svm->vmcb acts as
both vmcb01 and vmcb02 depending on what is running.

Thanks,

Paolo


^ permalink raw reply

* Re: [PATCH V4 0/4] Introducing RDMA shared CQ pool
From: Jason Gunthorpe @ 2020-05-29 19:03 UTC (permalink / raw)
  To: Yamin Friedman
  Cc: Sagi Grimberg, Christoph Hellwig, Leon Romanovsky, linux-rdma
In-Reply-To: <1590568495-101621-1-git-send-email-yaminf@mellanox.com>

On Wed, May 27, 2020 at 11:34:51AM +0300, Yamin Friedman wrote:
> This is the fourth re-incarnation of the CQ pool patches proposed
> by Sagi and Christoph. I have started with the patches that Sagi last
> submitted and built the CQ pool as a new API for acquiring shared CQs.
> 
> The main change from Sagi's last proposal is that I have simplified the
> method that ULP drivers interact with the CQ pool. Instead of calling 
> ib_alloc_cq they now call ib_cq_pool_get but use the CQ in the same manner
> that they did before. This allows for a much easier transition to using
> shared CQs by the ULP and makes it easier to deal with IB_POLL_DIRECT
> contexts. Certain types of actions on CQs have been prevented on shared
> CQs in order to prevent one user from harming another.
> 
> Our ULPs often want to make smart decisions on completion vector
> affinitization when using multiple completion queues spread on
> multiple cpu cores. We can see examples for this in iser, srp, nvme-rdma.
> 
> This patch set attempts to move this smartness to the rdma core by
> introducing per-device CQ pools that by definition spread
> across cpu cores. In addition, we completely make the completion
> queue allocation transparent to the ULP by adding affinity hints
> to create_qp which tells the rdma core to select (or allocate)
> a completion queue that has the needed affinity for it.
> 
> This API gives us a similar approach to whats used in the networking
> stack where the device completion queues are hidden from the application.
> With the affinitization hints, we also do not compromise performance
> as the completion queue will be affinitized correctly.
> 
> One thing that should be noticed is that now different ULPs using this
> API may share completion queues (given that they use the same polling context).
> However, even without this API they share interrupt vectors (and CPUs
> that are assigned to them). Thus aggregating consumers on less completion
> queues will result in better overall completion processing efficiency per
> completion event (or interrupt).
> 
> An advantage of this method of using the CQ pool is that changes in the ULP
> driver are minimal (around 14 altered lines of code).
> 
> The patch set converts nvme-rdma and nvmet-rdma to use the new API.
> 
> Test results can be found in patch-0002.
> 
> Comments and feedback are welcome.
> 
> Changes since v3
> 
> *Refactored ib_poll_ctx enum
> *Moved to correct use of unsigned types
> *Removed use of spin_lock_irqsave
> *Moved pool init and destroy function calls
> *Corrected function documentation
> 
> Changes since v2
> 
> *Minor code refactoring
> 
> Changes since v1
> 
> *Simplified cq pool shutdown process
> *Renamed cq pool functions to be like mr pool
> *Simplified process for finding cqs in pool
> *Removed unhelpful WARN prints
> *Removed one liner functions
> *Replaced cq_type with boolean shared
> *Updated test results to more properly show effect of patch
> *Minor bug fixes
> 
> Yamin Friedman (4):
>   RDMA/core: Add protection for shared CQs used by ULPs
>   RDMA/core: Introduce shared CQ pool API
>   nvme-rdma: use new shared CQ mechanism
>   nvmet-rdma: use new shared CQ mechanism
> 
>  drivers/infiniband/core/core_priv.h |   3 +
>  drivers/infiniband/core/cq.c        | 171 ++++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/device.c    |   2 +
>  drivers/infiniband/core/verbs.c     |   9 ++
>  drivers/nvme/host/rdma.c            |  75 ++++++++++------
>  drivers/nvme/target/rdma.c          |  14 +--
>  include/rdma/ib_verbs.h             |  21 ++++-
>  7 files changed, 261 insertions(+), 34 deletions(-)

Applied to for-next, thanks

Jason

^ permalink raw reply

* Re: Kernel crash due to memory corruption with v5.4.26-rt17 and PowerPC e500
From: Mark Marshall @ 2020-05-29 19:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Mark Marshall, thomas.graziadei, Thomas Gleixner,
	linux-kernel, rostedt
In-Reply-To: <20200529161518.svpxhkeljafbtdz2@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 2851 bytes --]

My config is attached.  This is the greatly reduced config that I used
when trying to narrow down the problem.  We normally have much more
enabled, but that had no effect on the bug in my testing.  We do,
unfortunately, have quite a few out-of-tree patches, but they are all
in USB or Networking, which are disabled here.

I've never tried out the kernel under qemu, but I will try that next
week to see if I can reproduce the problem there.  It's certainly
quite a narrow race window though, so it might behave quite
differently under qemu.  In general, how reliable is qemu at showing
these kinds of problems?

Thanks,
Mark

PS.
I've also noticed that THREAD_SHIFT is set in this config.  That's
because when I added lots of debug options, I got warnings about the
stack being too small.  This had no impact on the bug that I had, I
increased the size of the stack, and the stack warnings stopped, but
the bug was still the same.

On Fri, 29 May 2020 at 18:15, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2020-05-29 17:38:39 [+0200], Mark Marshall wrote:
> > Hi Sebastian & list,
> Hi,
>
> > I had assumed that my e-mail had got lost or overlooked, I was meaning to
> > post a follow up message this week...
> >
> > All I could find from the debugging and tracing that we added was that
> > something was going wrong with the mm data structures somewhere in the
> > exec code.  In the end I just spent a week or two pouring over the diffs
> > of this code between the versions that I new worked and didn't work.
> >
> > I eventually found the culprit.  On the working kernel versions there is
> > a patch called "mm: Protect activate_mm() by preempt_[disable&enable]_rt()".
> > This is commit f0b4a9cb253a on the V4.19.82-rt30 branch, for instance.
> > Although the commit message talks about ARM, it seems that we need this for
> > PowerPC too (I guess, any PowerPC with the "nohash" MMU?).
>
> Could you drop me your config, please? I need to dig here a little and I
> should have seen this on qemu, right?
>
> > Could you please add this commit back to the RT branch?  I'm not sure how
> > to find out the history of this commit.  For instance, why has it been
> > removed from the RT patchset?  How are these things tracked, generally?
>
> I dropped that patch in v5.4.3-rt1. I couldn't reproduce the issue that
> was documented in the patch and the code that triggered the warning was
> removed / reworked in commit
>     b5466f8728527 ("ARM: mm: remove IPI broadcasting on ASID rollover")
>
> So it looked like no longer needed and then got dropped during the
> rebase.
> In order to get it back into the RT queue I need to understand why it is
> required. What exactly is it fixing. Let me stare at for a little…
>
> > Best regards,
> > Mark
>
> Sebastian

[-- Attachment #2: config-5.4-rt --]
[-- Type: application/octet-stream, Size: 5142 bytes --]

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_RT=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_RCU_EXPERT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
# CONFIG_RSEQ is not set
CONFIG_EMBEDDED=y
CONFIG_PERF_EVENTS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PPC_85xx=y
CONFIG_MPC85xx_DS=y
CONFIG_MPC85xx_RDB=y
CONFIG_P1010_RDB=y
CONFIG_MAIO400=y
CONFIG_MIC400=y
CONFIG_GEN_RTC=y
CONFIG_HZ_1000=y
CONFIG_THREAD_SHIFT=14
# CONFIG_SUSPEND is not set
# CONFIG_SECCOMP is not set
CONFIG_FSL_LBC=y
CONFIG_JUMP_LABEL=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLK_DEV_INTEGRITY=y
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_COMPACTION is not set
# CONFIG_MIGRATION is not set
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_STANDALONE is not set
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_RAW_NAND=y
CONFIG_MTD_NAND_FSL_IFC=y
CONFIG_MTD_SPI_NOR=y
CONFIG_MTD_UBI=y
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_BLOCK=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=1
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=2
CONFIG_BLK_DEV_RAM_SIZE=131072
CONFIG_EEPROM_AT24=y
CONFIG_EEPROM_AT25=y
CONFIG_EEPROM_93CX6=m
CONFIG_SCSI=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
# CONFIG_SCSI_LOWLEVEL is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_GPIO=y
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO is not set
CONFIG_LEGACY_PTY_COUNT=64
CONFIG_DEVKMEM=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_NR_UARTS=2
CONFIG_SERIAL_8250_RUNTIME_UARTS=2
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y
# CONFIG_NVRAM is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS_SPI=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MPC=y
CONFIG_SPI=y
CONFIG_SPI_FSL_ESPI=y
CONFIG_GPIOLIB=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_MPC8XXX=y
CONFIG_GPIO_PCA953X=y
CONFIG_GPIO_PCA953X_IRQ=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_BOOKE_WDT=y
CONFIG_BOOKE_WDT_DEFAULT_TIMEOUT=34
# CONFIG_VGA_CONSOLE is not set
# CONFIG_HID is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_DRV_DS1307=y
CONFIG_RTC_DRV_CMOS=y
# CONFIG_DNOTIFY is not set
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
CONFIG_JFFS2_FS_WBUF_VERIFY=y
CONFIG_JFFS2_SUMMARY=y
CONFIG_JFFS2_FS_XATTR=y
CONFIG_UBIFS_FS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_DIRECT=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
CONFIG_KEYS=y
CONFIG_CRYPTO_ECDH=y
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_ECHAINIV=m
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_ESSIV=y
CONFIG_CRYPTO_CMAC=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MD5_PPC=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_PPC_SPE=y
CONFIG_CRYPTO_SHA256_PPC_SPE=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_PPC_SPE=y
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_DEV_FSL_CAAM=y
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_PKCS7_MESSAGE_PARSER=y
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_CRC_CCITT=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_IA64 is not set
# CONFIG_XZ_DEC_ARM is not set
# CONFIG_XZ_DEC_ARMTHUMB is not set
# CONFIG_XZ_DEC_SPARC is not set
CONFIG_DYNAMIC_DEBUG=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_PAGE_POISONING=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_VMACACHE=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_POISON=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_KASAN=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=60
CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_RCU_EQS_DEBUG=y
CONFIG_FUNCTION_TRACER=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_UBSAN=y
CONFIG_PPC_DISABLE_WERROR=y
CONFIG_PPC_EMULATED_STATS=y
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y
CONFIG_BDI_SWITCH=y

^ permalink raw reply

* Re: [PATCH net-next v4 3/4] dt-bindings: net: Add RGMII internal delay for DP83869
From: Rob Herring @ 2020-05-29 19:03 UTC (permalink / raw)
  To: Dan Murphy
  Cc: andrew, f.fainelli, hkallweit1, davem, netdev, linux-kernel,
	devicetree
In-Reply-To: <20200527164934.28651-4-dmurphy@ti.com>

On Wed, May 27, 2020 at 11:49:33AM -0500, Dan Murphy wrote:
> Add the internal delay values into the header and update the binding
> with the internal delay properties.
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
>  .../devicetree/bindings/net/ti,dp83869.yaml      | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/ti,dp83869.yaml b/Documentation/devicetree/bindings/net/ti,dp83869.yaml
> index 5b69ef03bbf7..2971dd3fc039 100644
> --- a/Documentation/devicetree/bindings/net/ti,dp83869.yaml
> +++ b/Documentation/devicetree/bindings/net/ti,dp83869.yaml
> @@ -64,6 +64,20 @@ properties:
>         Operational mode for the PHY.  If this is not set then the operational
>         mode is set by the straps. see dt-bindings/net/ti-dp83869.h for values
>  
> +  rx-internal-delay-ps:
> +    $ref: "#/properties/rx-internal-delay-ps"

This just creates a circular reference which probably blows up.

> +    description: Delay is in pico seconds
> +    enum: [ 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000,
> +            3250, 3500, 3750, 4000 ]
> +    default: 2000
> +
> +  tx-internal-delay-ps:
> +    $ref: "#/properties/tx-internal-delay-ps"
> +    description: Delay is in pico seconds
> +    enum: [ 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000,
> +            3250, 3500, 3750, 4000 ]
> +    default: 2000
> +
>  required:
>    - reg
>  
> @@ -80,5 +94,7 @@ examples:
>          ti,op-mode = <DP83869_RGMII_COPPER_ETHERNET>;
>          ti,max-output-impedance = "true";
>          ti,clk-output-sel = <DP83869_CLK_O_SEL_CHN_A_RCLK>;
> +        rx-internal-delay-ps = <2000>;
> +        tx-internal-delay-ps = <2000>;
>        };
>      };
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [PATCH v2 1/2] spi: dw: Make DMA request line assignments explicit for Intel Medfield
From: Serge Semin @ 2020-05-29 19:04 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Serge Semin, Mark Brown, linux-spi
In-Reply-To: <20200529184955.GY1634618@smile.fi.intel.com>

On Fri, May 29, 2020 at 09:49:55PM +0300, Andy Shevchenko wrote:
> On Fri, May 29, 2020 at 09:40:50PM +0300, Serge Semin wrote:
> > On Fri, May 29, 2020 at 09:31:49PM +0300, Andy Shevchenko wrote:
> > > The 2afccbd283ae ("spi: dw: Discard static DW DMA slave structures")
> > > did a clean up of global variables, which is fine, but messed up with
> > > the carefully provided information in the custom DMA slave structures.
> > > There reader can find an assignment of the DMA request lines in use.
> > > 
> > > Partially revert the above mentioned commit to restore readability
> > > and maintainability of the code.
> > > 
> > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > ---
> > > v2: rebased against latest spi/for-next
> > >  drivers/spi/spi-dw-dma.c | 14 ++++++--------
> > >  1 file changed, 6 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c
> > > index 1b96cec6d8cd..53d5257662e8 100644
> > > --- a/drivers/spi/spi-dw-dma.c
> > > +++ b/drivers/spi/spi-dw-dma.c
> > > @@ -61,10 +61,8 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws)
> > >  
> > >  static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws)
> > >  {
> > > -	struct dw_dma_slave slave = {
> > > -		.src_id = 0,
> > > -		.dst_id = 0
> > > -	};
> > 
> > > +	struct dw_dma_slave dma_tx = { .dst_id = 1 }, *tx = &dma_tx;
> > > +	struct dw_dma_slave dma_rx = { .src_id = 0 }, *rx = &dma_rx;
> > 


> > You know my attitude to these changes.) But anyway what's the point in having
> > the *tx and *rx pointers here? Without any harm to the readability you can use
> > the structures names directly, don't you?
> 
> I will wait for Mark to decide.

So no response to a review comment? Shall I do the same when get a review from
you?.)

I am not asking about the whole patch purpose. You know what I think about it.
My question was about why *tx and *rx pointers are required? Just wondering, I
may misunderstand something... As I see it you could use dma_tx and dma_rx here
directly with the same level of readability.

-Sergey
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply

* Re: [PATCH 08/30] KVM: nSVM: move map argument out of enter_svm_guest_mode
From: Paolo Bonzini @ 2020-05-29 19:04 UTC (permalink / raw)
  To: Krish Sadhukhan, linux-kernel, kvm
In-Reply-To: <f7946509-ff69-03e3-ec43-90a04714afe3@oracle.com>

On 29/05/20 20:10, Krish Sadhukhan wrote:
>> Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart,
>> since the map is not used elsewhere in the function.  There are
>> just two calls, so move it there.
> 
> The last sentence sounds bit incomplete.

Good point---more precisely, "calls" should be "callers".  "It" refers
to "unmapping".

> Also, does it make sense to mention the reason why unmapping is not
> required before we enter guest mode ?

Unmapping is a host operation and is not visible by the guest; is this
what you mean?

Paolo


^ permalink raw reply

* Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
From: Logan Gunthorpe @ 2020-05-29 19:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kvm, David Airlie, dri-devel, Bjorn Andersson, linux-tegra,
	Julien Grall, Will Deacon, Marek Szyprowski,
	Jean-Philippe Brucker, linux-samsung-soc, Marc Zyngier,
	Krzysztof Kozlowski, Jonathan Hunter, linux-rockchip, Andy Gross,
	linux-arm-kernel, linux-s390, linux-arm-msm, intel-gfx,
	linux-mediatek, Matthias Brugger, Thomas Gleixner, virtualization,
	Gerald Schaefer
In-Reply-To: <20200529124523.GA11817@infradead.org>



On 2020-05-29 6:45 a.m., Christoph Hellwig wrote:
> On Thu, May 28, 2020 at 06:00:44PM -0600, Logan Gunthorpe wrote:
>>> This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
>>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>>>
>>> Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.  
> 
> Mark did a big audit into the map_sg API abuse and initially had
> some i915 patches, but then gave up on them with this comment:
> 
> "The biggest TODO is DRM/i915 driver and I don't feel brave enough to fix
>  it fully. The driver creatively uses sg_table->orig_nents to store the
>  size of the allocate scatterlist and ignores the number of the entries
>  returned by dma_map_sg function. In this patchset I only fixed the
>  sg_table objects exported by dmabuf related functions. I hope that I
>  didn't break anything there."
> 
> it would be really nice if the i915 maintainers could help with sorting
> that API abuse out.
> 

I agree completely that the API abuse should be sorted out, but I think
that's much larger than just the i915 driver. Pretty much every dma-buf
map_dma_buf implementation I looked at ignores the returned nents of
sg_attrs. This sucks, but I don't think it's the bug Tom ran into. See:

amdgpu_dma_buf_map
armada_gem_prime_map_dma_buf
drm_gem_map_dma_buf
i915_gem_map_dma_buf
tegra_gem_prime_map_dma_buf

So this should probably be addressed by the whole GPU community.

However, as Robin pointed out, there are other ugly tricks like stopping
iterating through the SGL when sg_dma_len() is zero. For example, the
AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does
this trick and thus likely isn't buggy (otherwise, I'd expect someone to
have complained by now seeing AMD has already switched to IOMMU-DMA.

As I tried to point out in my previous email, i915 does not do this
trick. In fact, it completely ignores sg_dma_len() and is thus
completely broken. See i915_scatterlist.h and the __sgt_iter() function.
So it doesn't sound to me like Mark's fix would address the issue at
all. Per my previous email, I'd guess that it can be fixed simply by
adjusting the __sgt_iter() function to do something more sensible.
(Better yet, if possible, ditch __sgt_iter() and use the common DRM
function that AMD uses).

This would at least allow us to make progress with Tom's IOMMU-DMA patch
set and once that gets in, it will be harder for other drivers to make
the same mistake.

Logan


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply

* Re: [PATCH 1/6] x86/entry: Introduce local_db_{save,restore}()
From: Peter Zijlstra @ 2020-05-29 19:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, tglx, linux-kernel, x86, Lai Jiangshan,
	sean.j.christopherson, daniel.thompson
In-Reply-To: <AC94E789-9EC7-4156-955C-841FF7114176@amacapital.net>

On Fri, May 29, 2020 at 10:28:33AM -0700, Andy Lutomirski wrote:
> > +static __always_inline unsigned long local_db_save(void)
> > +{
> > +    unsigned long dr7;
> > +
> > +    get_debugreg(&dr7, 7);
> > +    dr7 ^= 0x400;
> 
> Why xor?  This seems extra confusing.

I'll do the normal mask thing ..

^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.