LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v3] powerpc/kernel/sysfs: cleanup set up macros for PMC/non-PMC SPRs
From: Olof Johansson @ 2013-10-04  4:19 UTC (permalink / raw)
  To: Madhavan Srinivasan; +Cc: linuxppc-dev
In-Reply-To: <1380792456-19128-1-git-send-email-maddy@linux.vnet.ibm.com>

On Thu, Oct 3, 2013 at 2:27 AM, Madhavan Srinivasan
<maddy@linux.vnet.ibm.com> wrote:
> Currently PMC (Performance Monitor Counter) setup macros are used
> for other SPRs. Since not all SPRs are PMC related, this patch
> modifies the exisiting macro and uses it to setup both PMC and
> non PMC SPRs accordingly.
>
> V3 changes:
>
> 1) No logic change, just renamed generic macro and removed #define for empty string
> 2) Changes in the comment to explain better.
>
> V2 changes:
>
> 1) Modified SYSFS_PMCSETUP to a generic macro with additional parameter
> 2) Added PMC and SPR macro to call the generic macro
> 3) Changes in the comment to explain better.
>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

Acked-by: Olof Johansson <olof@lixom.net>

For the pa6t parts


-Olof

^ permalink raw reply

* Re: [PATCH RFC 06/77] PCI/MSI: Factor out pci_get_msi_cap() interface
From: Alexander Gordeev @ 2013-10-04  5:13 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, linux-s390,
	Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar, linux-pci,
	iss_storagedev, linux-driver, Tejun Heo, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Martin Schwidefsky,
	linux390, linuxppc-dev
In-Reply-To: <1380837174.3419.21.camel@bwh-desktop.uk.level5networks.com>

On Thu, Oct 03, 2013 at 10:52:54PM +0100, Ben Hutchings wrote:
> On Wed, 2013-10-02 at 12:48 +0200, Alexander Gordeev wrote:
> >  #ifndef CONFIG_PCI_MSI
> > +static inline int pci_get_msi_cap(struct pci_dev *dev)
> > +{
> > +	return -1;
> [...]
> 
> Shouldn't this also return -EINVAL?

Yep, all inliners here are better to return -EINVAL.
Will do unless someone speaks out against.

> Ben.

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Eric Dumazet @ 2013-10-04  5:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-s390, netdev, Eric Dumazet, Daniel Borkmann, linuxppc-dev,
	David S. Miller, linux-arm-kernel
In-Reply-To: <1380859875-31025-1-git-send-email-ast@plumgrid.com>

On Thu, 2013-10-03 at 21:11 -0700, Alexei Starovoitov wrote:

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a6ac848..5d66cd9 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -25,15 +25,20 @@ struct sk_filter
>  {
>  	atomic_t		refcnt;
>  	unsigned int         	len;	/* Number of filter blocks */
> +	struct rcu_head		rcu;
>  	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>  					    const struct sock_filter *filter);
> -	struct rcu_head		rcu;
> +	/* insns start right after bpf_func, so that sk_run_filter() fetches
> +	 * first insn from the same cache line that was used to call into
> +	 * sk_run_filter()
> +	 */
>  	struct sock_filter     	insns[0];
>  };
>  
>  static inline unsigned int sk_filter_len(const struct sk_filter *fp)
>  {
> -	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
> +	return max(fp->len * sizeof(struct sock_filter),
> +		   sizeof(struct work_struct)) + sizeof(*fp);
>  }

I would use for include/linux/filter.h this (untested) diff :

(Note the include <linux/workqueue.h>)

I also remove your comment about cache lines, since there is nothing
to align stuff on a cache line boundary.

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a6ac848..281b05c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -6,6 +6,7 @@
 
 #include <linux/atomic.h>
 #include <linux/compat.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/filter.h>
 
 #ifdef CONFIG_COMPAT
@@ -25,15 +26,20 @@ struct sk_filter
 {
 	atomic_t		refcnt;
 	unsigned int         	len;	/* Number of filter blocks */
+	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
-	struct rcu_head		rcu;
-	struct sock_filter     	insns[0];
+	union {
+		struct work_struct	work;
+		struct sock_filter	insns[0];
+	};
 };
 
-static inline unsigned int sk_filter_len(const struct sk_filter *fp)
+static inline unsigned int sk_filter_size(const struct sk_filter *fp,
+					  unsigned int proglen)
 {
-	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+	return max(sizeof(*fp),
+		   offsetof(struct sk_filter, insns[proglen]));
 }
 
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);



This way, you can use sk_filter_size(fp, fprog->len)
instead of doing the max() games in sk_attach_filter() and
sk_unattached_filter_create()

Other than that, I think your patch is fine.

^ permalink raw reply related

* RE: [PATCH 1/7] powerpc: Add interface to get msi region information
From: Bhushan Bharat-R65777 @ 2013-10-04  5:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: agraf@suse.de, Wood Scott-B07421, joro@8bytes.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	alex.williamson@redhat.com, linux-pci@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20130924235812.GD9302@google.com>



> -----Original Message-----
> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-owner@vger.kernel=
.org]
> On Behalf Of Bjorn Helgaas
> Sent: Wednesday, September 25, 2013 5:28 AM
> To: Bhushan Bharat-R65777
> Cc: alex.williamson@redhat.com; joro@8bytes.org; benh@kernel.crashing.org=
;
> galak@kernel.crashing.org; linux-kernel@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org; linux-pci@vger.kernel.org; agraf@suse.de; Wood Scot=
t-
> B07421; iommu@lists.linux-foundation.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region informa=
tion
>=20
> On Thu, Sep 19, 2013 at 12:59:17PM +0530, Bharat Bhushan wrote:
> > This patch adds interface to get following information
> >   - Number of MSI regions (which is number of MSI banks for powerpc).
> >   - Get the region address range: Physical page which have the
> >      address/addresses used for generating MSI interrupt
> >      and size of the page.
> >
> > These are required to create IOMMU (Freescale PAMU) mapping for
> > devices which are directly assigned using VFIO.
> >
> > Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> > ---
> >  arch/powerpc/include/asm/machdep.h |    8 +++++++
> >  arch/powerpc/include/asm/pci.h     |    2 +
> >  arch/powerpc/kernel/msi.c          |   18 ++++++++++++++++
> >  arch/powerpc/sysdev/fsl_msi.c      |   39 ++++++++++++++++++++++++++++=
+++++--
> >  arch/powerpc/sysdev/fsl_msi.h      |   11 ++++++++-
> >  drivers/pci/msi.c                  |   26 ++++++++++++++++++++++++
> >  include/linux/msi.h                |    8 +++++++
> >  include/linux/pci.h                |   13 ++++++++++++
> >  8 files changed, 120 insertions(+), 5 deletions(-)
> >
> > ...
>=20
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index
> > aca7578..6d85c15 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -30,6 +30,20 @@ static int pci_msi_enable =3D 1;
> >
> >  /* Arch hooks */
> >
> > +#ifndef arch_msi_get_region_count
> > +int arch_msi_get_region_count(void)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> > +#ifndef arch_msi_get_region
> > +int arch_msi_get_region(int region_num, struct msi_region *region) {
> > +	return 0;
> > +}
> > +#endif
>=20
> This #define strategy is gone; see 4287d824 ("PCI: use weak functions for=
 MSI
> arch-specific functions").  Please use the weak function strategy for you=
r new
> MSI region functions.

ok

>=20
> > +
> >  #ifndef arch_msi_check_device
> >  int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)  {
> > @@ -903,6 +917,18 @@ void pci_disable_msi(struct pci_dev *dev)  }
> > EXPORT_SYMBOL(pci_disable_msi);
> >
> > +int msi_get_region_count(void)
> > +{
> > +	return arch_msi_get_region_count();
> > +}
> > +EXPORT_SYMBOL(msi_get_region_count);
> > +
> > +int msi_get_region(int region_num, struct msi_region *region) {
> > +	return arch_msi_get_region(region_num, region); }
> > +EXPORT_SYMBOL(msi_get_region);
>=20
> Please split these interface additions, i.e., the drivers/pci/msi.c,
> include/linux/msi.h, and include/linux/pci.h changes, into a separate pat=
ch.

ok

>=20
> I don't know enough about VFIO to understand why these new interfaces are
> needed.  Is this the first VFIO IOMMU driver?  I see vfio_iommu_spapr_tce=
.c and
> vfio_iommu_type1.c but I don't know if they're comparable to the Freescal=
e PAMU.
> Do other VFIO IOMMU implementations support MSI?  If so, do they handle t=
he
> problem of mapping the MSI regions in a different way?

PAMU is an aperture type of IOMMU while other are paging type, So they are =
completely different from what PAMU is and handle that differently.

>=20
> >  /**
> >   * pci_msix_table_size - return the number of device's MSI-X table ent=
ries
> >   * @dev: pointer to the pci_dev data structure of MSI-X device
> > function diff --git a/include/linux/msi.h b/include/linux/msi.h index
> > ee66f3a..ae32601 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -50,6 +50,12 @@ struct msi_desc {
> >  	struct kobject kobj;
> >  };
> >
> > +struct msi_region {
> > +	int region_num;
> > +	dma_addr_t addr;
> > +	size_t size;
> > +};
>=20
> This needs some sort of explanatory comment.

Ok

-Bharat

>=20
> >  /*
> >   * The arch hook for setup up msi irqs
> >   */
> > @@ -58,5 +64,7 @@ void arch_teardown_msi_irq(unsigned int irq);  int
> > arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);  void
> > arch_teardown_msi_irqs(struct pci_dev *dev);  int
> > arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
> > +int arch_msi_get_region_count(void);
> > +int arch_msi_get_region(int region_num, struct msi_region *region);
> >
> >  #endif /* LINUX_MSI_H */
> > diff --git a/include/linux/pci.h b/include/linux/pci.h index
> > 186540d..2b26a59 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1126,6 +1126,7 @@ struct msix_entry {
> >  	u16	entry;	/* driver uses to specify entry, OS writes */
> >  };
> >
> > +struct msi_region;
> >
> >  #ifndef CONFIG_PCI_MSI
> >  static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned
> > int nvec) @@ -1168,6 +1169,16 @@ static inline int
> > pci_msi_enabled(void)  {
> >  	return 0;
> >  }
> > +
> > +static inline int msi_get_region_count(void) {
> > +	return 0;
> > +}
> > +
> > +static inline int msi_get_region(int region_num, struct msi_region
> > +*region) {
> > +	return 0;
> > +}
> >  #else
> >  int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
> > int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int
> > *maxvec); @@ -1180,6 +1191,8 @@ void pci_disable_msix(struct pci_dev
> > *dev);  void msi_remove_pci_irq_vectors(struct pci_dev *dev);  void
> > pci_restore_msi_state(struct pci_dev *dev);  int
> > pci_msi_enabled(void);
> > +int msi_get_region_count(void);
> > +int msi_get_region(int region_num, struct msi_region *region);
> >  #endif
> >
> >  #ifdef CONFIG_PCIEPORTBUS
> > --
> > 1.7.0.4
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in t=
he body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04  6:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-s390, netdev, Eric Dumazet, Daniel Borkmann, linuxppc-dev,
	David S. Miller, linux-arm-kernel, Alexei Starovoitov
In-Reply-To: <1380863798.3564.12.camel@edumazet-glaptop.roam.corp.google.com>

On Thu, Oct 3, 2013 at 10:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-10-03 at 21:11 -0700, Alexei Starovoitov wrote:
>
> -static inline unsigned int sk_filter_len(const struct sk_filter *fp)
> +static inline unsigned int sk_filter_size(const struct sk_filter *fp,
> +                                         unsigned int proglen)
>  {
> -       return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
> +       return max(sizeof(*fp),
> +                  offsetof(struct sk_filter, insns[proglen]));
>  }

indeed that's cleaner.
Like this then:
-static inline unsigned int sk_filter_len(const struct sk_filter *fp)
+static inline unsigned int sk_filter_size(unsigned int proglen)
 {
-       return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+       return max(sizeof(struct sk_filter),
+                  offsetof(struct sk_filter, insns[proglen]));
 }

testing it... will send v4 shortly

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Heiko Carstens @ 2013-10-04  6:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Paul Mackerras, H. Peter Anvin, sparclinux,
	Nicolas Dichtel, linux-s390, Russell King, x86, James Morris,
	Ingo Molnar, Alexey Kuznetsov, Paul E. McKenney, Xi Wang,
	Matt Evans, Thomas Gleixner, linux-arm-kernel, Stelian Nirlu,
	Nicolas Schichan, Hideaki YOSHIFUJI, netdev, linux-kernel,
	David S. Miller, Mircea Gherzan, Daniel Borkmann,
	Martin Schwidefsky, linux390, linuxppc-dev, Patrick McHardy
In-Reply-To: <1380853446-30537-1-git-send-email-ast@plumgrid.com>

On Thu, Oct 03, 2013 at 07:24:06PM -0700, Alexei Starovoitov wrote:
> diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> index 7092392..a5df511 100644
> --- a/arch/s390/net/bpf_jit_comp.c
> +++ b/arch/s390/net/bpf_jit_comp.c
> @@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
>  	struct bpf_binary_header *header = (void *)addr;
> 
>  	if (fp->bpf_func == sk_run_filter)
> -		return;
> +		goto free_filter;
>  	set_memory_rw(addr, header->pages);
>  	module_free(NULL, header);
> +free_filter:
> +	kfree(fp);
>  }

For the s390 part:

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>

^ permalink raw reply

* [PATCH v4 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04  7:14 UTC (permalink / raw)
  To: David S. Miller
  Cc: linux-s390, netdev, Heiko Carstens, Eric Dumazet, Daniel Borkmann,
	linuxppc-dev, linux-arm-kernel

on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 arch/arm/net/bpf_jit_32.c       |    1 +
 arch/powerpc/net/bpf_jit_comp.c |    1 +
 arch/s390/net/bpf_jit_comp.c    |    4 +++-
 arch/sparc/net/bpf_jit_comp.c   |    1 +
 arch/x86/net/bpf_jit_comp.c     |   18 +++++++++++++-----
 include/linux/filter.h          |   15 +++++++++++----
 include/net/sock.h              |    6 ++----
 net/core/filter.c               |    8 ++++----
 8 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f50d223..99b44e0 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..2345bdb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 7092392..a5df511 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
 	struct bpf_binary_header *header = (void *)addr;
 
 	if (fp->bpf_func == sk_run_filter)
-		return;
+		goto free_filter;
 	set_memory_rw(addr, header->pages);
 	module_free(NULL, header);
+free_filter:
+	kfree(fp);
 }
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 9c7be59..218b6b2 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 79c216a..516593e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,13 +772,21 @@ out:
 	return;
 }
 
+static void bpf_jit_free_deferred(struct work_struct *work)
+{
+	struct sk_filter *fp = container_of(work, struct sk_filter, work);
+	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+	struct bpf_binary_header *header = (void *)addr;
+
+	set_memory_rw(addr, header->pages);
+	module_free(NULL, header);
+	kfree(fp);
+}
+
 void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter) {
-		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
-		struct bpf_binary_header *header = (void *)addr;
-
-		set_memory_rw(addr, header->pages);
-		module_free(NULL, header);
+		INIT_WORK(&fp->work, bpf_jit_free_deferred);
+		schedule_work(&fp->work);
 	}
 }
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a6ac848..ff4e40c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -6,6 +6,7 @@
 
 #include <linux/atomic.h>
 #include <linux/compat.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/filter.h>
 
 #ifdef CONFIG_COMPAT
@@ -25,15 +26,19 @@ struct sk_filter
 {
 	atomic_t		refcnt;
 	unsigned int         	len;	/* Number of filter blocks */
+	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
-	struct rcu_head		rcu;
-	struct sock_filter     	insns[0];
+	union {
+		struct sock_filter     	insns[0];
+		struct work_struct	work;
+	};
 };
 
-static inline unsigned int sk_filter_len(const struct sk_filter *fp)
+static inline unsigned int sk_filter_size(unsigned int proglen)
 {
-	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+	return max(sizeof(struct sk_filter),
+		   offsetof(struct sk_filter, insns[proglen]));
 }
 
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);
@@ -67,11 +72,13 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
 }
 #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
 #else
+#include <linux/slab.h>
 static inline void bpf_jit_compile(struct sk_filter *fp)
 {
 }
 static inline void bpf_jit_free(struct sk_filter *fp)
 {
+	kfree(fp);
 }
 #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
 #endif
diff --git a/include/net/sock.h b/include/net/sock.h
index e3bf213..1105357 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1612,16 +1612,14 @@ static inline void sk_filter_release(struct sk_filter *fp)
 
 static inline void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
 {
-	unsigned int size = sk_filter_len(fp);
-
-	atomic_sub(size, &sk->sk_omem_alloc);
+	atomic_sub(sk_filter_size(fp->len), &sk->sk_omem_alloc);
 	sk_filter_release(fp);
 }
 
 static inline void sk_filter_charge(struct sock *sk, struct sk_filter *fp)
 {
 	atomic_inc(&fp->refcnt);
-	atomic_add(sk_filter_len(fp), &sk->sk_omem_alloc);
+	atomic_add(sk_filter_size(fp->len), &sk->sk_omem_alloc);
 }
 
 /*
diff --git a/net/core/filter.c b/net/core/filter.c
index 6438f29..01b7808 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -644,7 +644,6 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
 	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
 	bpf_jit_free(fp);
-	kfree(fp);
 }
 EXPORT_SYMBOL(sk_filter_release_rcu);
 
@@ -683,7 +682,7 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
 	if (fprog->filter == NULL)
 		return -EINVAL;
 
-	fp = kmalloc(fsize + sizeof(*fp), GFP_KERNEL);
+	fp = kmalloc(sk_filter_size(fprog->len), GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
 	memcpy(fp->insns, fprog->filter, fsize);
@@ -723,6 +722,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 {
 	struct sk_filter *fp, *old_fp;
 	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+	unsigned int sk_fsize = sk_filter_size(fprog->len);
 	int err;
 
 	if (sock_flag(sk, SOCK_FILTER_LOCKED))
@@ -732,11 +732,11 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	if (fprog->filter == NULL)
 		return -EINVAL;
 
-	fp = sock_kmalloc(sk, fsize+sizeof(*fp), GFP_KERNEL);
+	fp = sock_kmalloc(sk, sk_fsize, GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
 	if (copy_from_user(fp->insns, fprog->filter, fsize)) {
-		sock_kfree_s(sk, fp, fsize+sizeof(*fp));
+		sock_kfree_s(sk, fp, sk_fsize);
 		return -EFAULT;
 	}
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH RFC 04/77] PCI/MSI/s390: Remove superfluous check of MSI type
From: Martin Schwidefsky @ 2013-10-04  7:40 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, stable,
	linux-s390, Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar,
	linux-pci, iss_storagedev, linux-driver, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Tejun Heo, linux390,
	linuxppc-dev
In-Reply-To: <bae65aa3e30dfd23bd5ed47add7310cfbb96243a.1380703262.git.agordeev@redhat.com>

On Wed,  2 Oct 2013 12:48:20 +0200
Alexander Gordeev <agordeev@redhat.com> wrote:

> arch_setup_msi_irqs() hook can only be called from the generic
> MSI code which ensures correct MSI type parameter.
> 
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  arch/s390/pci/pci.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index c79c6e4..61a3c2c 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -425,8 +425,6 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>  	int rc;
> 
>  	pr_debug("%s: requesting %d MSI-X interrupts...", __func__, nvec);
> -	if (type != PCI_CAP_ID_MSIX && type != PCI_CAP_ID_MSI)
> -		return -EINVAL;
>  	if (type == PCI_CAP_ID_MSI && nvec > 1)
>  		return 1;
>  	msi_vecs = min(nvec, ZPCI_MSI_VEC_MAX);

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply

* Re: [PATCH RFC 03/77] PCI/MSI/s390: Fix single MSI only check
From: Martin Schwidefsky @ 2013-10-04  7:39 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, stable,
	linux-s390, Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar,
	linux-pci, iss_storagedev, linux-driver, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Tejun Heo, linux390,
	linuxppc-dev
In-Reply-To: <8c9811b13fd93e73641dab8e3bd1bd5b2dc37a61.1380703262.git.agordeev@redhat.com>

On Wed,  2 Oct 2013 12:48:19 +0200
Alexander Gordeev <agordeev@redhat.com> wrote:

> Multiple MSIs have never been supported on s390 architecture,
> but the platform code fails to report single MSI only.
> 
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  arch/s390/pci/pci.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index f17a834..c79c6e4 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -427,6 +427,8 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
>  	pr_debug("%s: requesting %d MSI-X interrupts...", __func__, nvec);
>  	if (type != PCI_CAP_ID_MSIX && type != PCI_CAP_ID_MSI)
>  		return -EINVAL;
> +	if (type == PCI_CAP_ID_MSI && nvec > 1)
> +		return 1;
>  	msi_vecs = min(nvec, ZPCI_MSI_VEC_MAX);
>  	msi_vecs = min_t(unsigned int, msi_vecs, CONFIG_PCI_NR_MSI);
> 

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Ingo Molnar @ 2013-10-04  7:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Heiko Carstens, Eric Dumazet, Paul Mackerras, H. Peter Anvin,
	sparclinux, Nicolas Dichtel, linux-s390, Russell King, x86,
	James Morris, Ingo Molnar, Alexey Kuznetsov, Paul E. McKenney,
	Xi Wang, Matt Evans, Thomas Gleixner, linux-arm-kernel,
	Stelian Nirlu, Nicolas Schichan, Hideaki YOSHIFUJI, netdev,
	linux-kernel, David S. Miller, Mircea Gherzan, Daniel Borkmann,
	Martin Schwidefsky, linux390, linuxppc-dev, Patrick McHardy
In-Reply-To: <1380853446-30537-1-git-send-email-ast@plumgrid.com>


* Alexei Starovoitov <ast@plumgrid.com> wrote:

> on x86 system with net.core.bpf_jit_enable = 1
> 
> sudo tcpdump -i eth1 'tcp port 22'
> 
> causes the warning:
> [   56.766097]  Possible unsafe locking scenario:
> [   56.766097]
> [   56.780146]        CPU0
> [   56.786807]        ----
> [   56.793188]   lock(&(&vb->lock)->rlock);
> [   56.799593]   <Interrupt>
> [   56.805889]     lock(&(&vb->lock)->rlock);
> [   56.812266]
> [   56.812266]  *** DEADLOCK ***
> [   56.812266]
> [   56.830670] 1 lock held by ksoftirqd/1/13:
> [   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
> [   56.849757]
> [   56.849757] stack backtrace:
> [   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
> [   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
> [   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
> [   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
> [   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
> [   56.923006] Call Trace:
> [   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
> [   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
> [   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
> [   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
> [   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
> [   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
> [   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
> [   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
> [   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
> [   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
> [   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
> [   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
> [   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
> [   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
> [   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
> [   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
> [   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
> [   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
> [   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
> [   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
> [   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
> [   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
> [   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
> [   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70
> 
> cannot reuse jited filter memory, since it's readonly,
> so use original bpf insns memory to hold work_struct
> 
> defer kfree of sk_filter until jit completed freeing
> 
> tested on x86_64 and i386
> 
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  arch/arm/net/bpf_jit_32.c       |    1 +
>  arch/powerpc/net/bpf_jit_comp.c |    1 +
>  arch/s390/net/bpf_jit_comp.c    |    4 +++-
>  arch/sparc/net/bpf_jit_comp.c   |    1 +
>  arch/x86/net/bpf_jit_comp.c     |   20 +++++++++++++++-----
>  include/linux/filter.h          |   11 +++++++++--
>  net/core/filter.c               |   11 +++++++----
>  7 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index f50d223..99b44e0 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp)
>  {
>  	if (fp->bpf_func != sk_run_filter)
>  		module_free(NULL, fp->bpf_func);
> +	kfree(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index bf56e33..2345bdb 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp)
>  {
>  	if (fp->bpf_func != sk_run_filter)
>  		module_free(NULL, fp->bpf_func);
> +	kfree(fp);
>  }
> diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> index 7092392..a5df511 100644
> --- a/arch/s390/net/bpf_jit_comp.c
> +++ b/arch/s390/net/bpf_jit_comp.c
> @@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
>  	struct bpf_binary_header *header = (void *)addr;
>  
>  	if (fp->bpf_func == sk_run_filter)
> -		return;
> +		goto free_filter;
>  	set_memory_rw(addr, header->pages);
>  	module_free(NULL, header);
> +free_filter:
> +	kfree(fp);
>  }
> diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
> index 9c7be59..218b6b2 100644
> --- a/arch/sparc/net/bpf_jit_comp.c
> +++ b/arch/sparc/net/bpf_jit_comp.c
> @@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp)
>  {
>  	if (fp->bpf_func != sk_run_filter)
>  		module_free(NULL, fp->bpf_func);
> +	kfree(fp);
>  }
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 79c216a..1396a0a 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -772,13 +772,23 @@ out:
>  	return;
>  }
>  
> +static void bpf_jit_free_deferred(struct work_struct *work)
> +{
> +	struct sk_filter *fp = container_of((void *)work, struct sk_filter,
> +					    insns);
> +	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> +	struct bpf_binary_header *header = (void *)addr;
> +
> +	set_memory_rw(addr, header->pages);
> +	module_free(NULL, header);
> +	kfree(fp);
> +}

Using the data type suggestions I make further below, this could be 
written in a simpler form, as:

	struct sk_filter *fp = container_of(work, struct sk_filter, work);

Also, a question, why do you mask with PAGE_MASK here:

	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;

?

AFAICS bpf_func is the module_alloc() result - and module code is page 
aligned. So ->bpf_func is always page aligned here. (The sk_run_filter 
special case cannot happen here.)

> +
>  void bpf_jit_free(struct sk_filter *fp)
>  {
>  	if (fp->bpf_func != sk_run_filter) {
> -		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> -		struct bpf_binary_header *header = (void *)addr;
> -
> -		set_memory_rw(addr, header->pages);
> -		module_free(NULL, header);
> +		struct work_struct *work = (struct work_struct *)fp->insns;
> +		INIT_WORK(work, bpf_jit_free_deferred);

Missing newline between local variables and statements.

> +		schedule_work(work);
>  	}
>  }
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a6ac848..5d66cd9 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -25,15 +25,20 @@ struct sk_filter
>  {
>  	atomic_t		refcnt;
>  	unsigned int         	len;	/* Number of filter blocks */
> +	struct rcu_head		rcu;
>  	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>  					    const struct sock_filter *filter);
> -	struct rcu_head		rcu;
> +	/* insns start right after bpf_func, so that sk_run_filter() fetches
> +	 * first insn from the same cache line that was used to call into
> +	 * sk_run_filter()
> +	 */
>  	struct sock_filter     	insns[0];

Please use the customary (multi-line) comment style:

  /*
   * Comment .....
   * ...... goes here.
   */

specified in Documentation/CodingStyle.

>  };
>  
>  static inline unsigned int sk_filter_len(const struct sk_filter *fp)
>  {
> -	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
> +	return max(fp->len * sizeof(struct sock_filter),
> +		   sizeof(struct work_struct)) + sizeof(*fp);

So, "sizeof(struct work_struct)) + sizeof(*fp)" is a pattern that repeats 
a couple of times. Might make sense to stick that into a helper of its own 
- but in general this open coded overlay allocation method looks a bit 
fragile and not very obvious in isolation.

So it could be done a bit cleaner, using an anonymous union:

	/*
	 * These two overlay, the work struct is used during workqueue 
	 * driven teardown, when the instructions are not used anymore:
	 */
	union {
		struct sock_filter     	insns[0];
		struct work_struct	work;
	};

And then all the sizeof() calculations become obvious and sk_filter_len() 
could be eliminated - I've marked the conversions in the code further 
below.

>  extern int sk_filter(struct sock *sk, struct sk_buff *skb);
> @@ -67,11 +72,13 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
>  }
>  #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
>  #else
> +#include <linux/slab.h>

Inlines in the middle of header files are generally frowned upon.

The standard pattern is to put them at the top, that way it's easier to 
see the dependencies and there's also less .config dependent inclusion, 
which makes header hell cleanup work easier.

>  static inline void bpf_jit_compile(struct sk_filter *fp)
>  {
>  }
>  static inline void bpf_jit_free(struct sk_filter *fp)
>  {
> +	kfree(fp);
>  }
>  #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
>  #endif
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 6438f29..ad5eaba 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -644,7 +644,6 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
>  	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
>  
>  	bpf_jit_free(fp);
> -	kfree(fp);
>  }
>  EXPORT_SYMBOL(sk_filter_release_rcu);
>  
> @@ -677,13 +676,15 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
>  {
>  	struct sk_filter *fp;
>  	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
> +	unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
> +		+ sizeof(*fp);

Using the structure definition I suggested, this could be replaced with 
the more obvious:

	unsigned int sk_fsize = max(fsize, sizeof(*fp));

>  	int err;
>  
>  	/* Make sure new filter is there and in the right amounts. */
>  	if (fprog->filter == NULL)
>  		return -EINVAL;
>  
> -	fp = kmalloc(fsize + sizeof(*fp), GFP_KERNEL);
> +	fp = kmalloc(sk_fsize, GFP_KERNEL);
>  	if (!fp)
>  		return -ENOMEM;
>  	memcpy(fp->insns, fprog->filter, fsize);
> @@ -723,6 +724,8 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>  {
>  	struct sk_filter *fp, *old_fp;
>  	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
> +	unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
> +		+ sizeof(*fp);

This too could be written as:

	unsigned int sk_fsize = max(fsize, sizeof(*fp));

>  	int err;
>  
>  	if (sock_flag(sk, SOCK_FILTER_LOCKED))
> @@ -732,11 +735,11 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>  	if (fprog->filter == NULL)
>  		return -EINVAL;
>  
> -	fp = sock_kmalloc(sk, fsize+sizeof(*fp), GFP_KERNEL);
> +	fp = sock_kmalloc(sk, sk_fsize, GFP_KERNEL);
>  	if (!fp)
>  		return -ENOMEM;
>  	if (copy_from_user(fp->insns, fprog->filter, fsize)) {
> -		sock_kfree_s(sk, fp, fsize+sizeof(*fp));
> +		sock_kfree_s(sk, fp, sk_fsize);
>  		return -EFAULT;
>  	}

A couple of questions/suggestions:

1)

I took a brief look at arch/x86/net/bpf_jit_comp.c while reviewing this 
patch.

You need to split up bpf_jit_compile(), it's an obscenely large, ~600 
lines long function. We don't do that in modern, maintainable kernel code.

2)

This 128 bytes extra padding:

        /* Most of BPF filters are really small,
         * but if some of them fill a page, allow at least
         * 128 extra bytes to insert a random section of int3
         */
        sz = round_up(proglen + sizeof(*header) + 128, PAGE_SIZE);

why is it done? It's not clear to me from the comment.

3)

It's nice code altogether! Are there any plans to generalize its 
interfaces, to allow arbitrary bytecode to be used by other kernel 
subsystems as well? In particular tracing filters could make use of it, 
but it would also allow safe probe points.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Alexander Gordeev @ 2013-10-04  8:29 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, linux-s390,
	Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar, linux-pci,
	iss_storagedev, linux-driver, Tejun Heo, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Martin Schwidefsky,
	linux390, linuxppc-dev
In-Reply-To: <1380840585.3419.50.camel@bwh-desktop.uk.level5networks.com>

On Thu, Oct 03, 2013 at 11:49:45PM +0100, Ben Hutchings wrote:
> On Wed, 2013-10-02 at 12:48 +0200, Alexander Gordeev wrote:
> > This update converts pci_enable_msix() and pci_enable_msi_block()
> > interfaces to canonical kernel functions and makes them return a
> > error code in case of failure or 0 in case of success.
> [...]
> 
> I think this is fundamentally flawed: pci_msix_table_size() and
> pci_get_msi_cap() can only report the limits of the *device* (which the
> driver usually already knows), whereas MSI allocation can also be
> constrained due to *global* limits on the number of distinct IRQs.

Even the current implementation by no means addresses it. Although it
might seem a case for architectures to report the number of IRQs available
for a driver to retry, in fact they all just fail. The same applies to
*any* other type of resource involved: irq_desc's, CPU interrupt vector
space, msi_desc's etc. No platform cares about it and just bails out once
a constrain met (please correct me if I am wrong here). Given that Linux
has been doing well even on embedded I think we should not change it.

The only exception to the above is pSeries platform which takes advantage
of the current design (to implement MSI quota). There are indications we
can satisfy pSeries requirements, but the design proposed in this RFC
is not going to change drastically anyway. The start of the discusstion
is here: https://lkml.org/lkml/2013/9/5/293

> Currently pci_enable_msix() will report a positive value if it fails due
> to the global limit.  Your patch 7 removes that.  pci_enable_msi_block()
> unfortunately doesn't appear to do this.

pci_enable_msi_block() can do more than one MSI only on x86 (with IOMMU),
but it does not bother to return positive numbers, indeed.

> It seems to me that a more useful interface would take a minimum and
> maximum number of vectors from the driver.  This wouldn't allow the
> driver to specify that it could only accept, say, any even number within
> a certain range, but you could still leave the current functions
> available for any driver that needs that.

Mmmm.. I am not sure I am getting it. Could you please rephrase?

> Ben.

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* RE: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: David Laight @ 2013-10-04  8:31 UTC (permalink / raw)
  To: Alexander Gordeev, Ben Hutchings
  Cc: linux-mips, VMware, Inc., linux-pci, linux-nvme, linux-ide,
	Martin Schwidefsky, linux-s390, Andy King, linux-scsi,
	e1000-devel, x86, Ingo Molnar, iss_storagedev, Tejun Heo,
	Bjorn Helgaas, Dan Williams, Jon Mason,
	Solarflare linux maintainers, netdev, linux-kernel, Ralf Baechle,
	linux-rdma, linux-driver, linux390, linuxppc-dev
In-Reply-To: <20131004082920.GA4536@dhcp-26-207.brq.redhat.com>

> > It seems to me that a more useful interface would take a minimum and
> > maximum number of vectors from the driver.  This wouldn't allow the
> > driver to specify that it could only accept, say, any even number =
within
> > a certain range, but you could still leave the current functions
> > available for any driver that needs that.
>=20
> Mmmm.. I am not sure I am getting it. Could you please rephrase?

One possibility is for drivers than can use a lot of interrupts to
request a minimum number initially and then request the additional
ones much later on.
That would make it less likely that none will be available for
devices/drivers that need them but are initialised later.

	David

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Alexander Gordeev @ 2013-10-04  9:21 UTC (permalink / raw)
  To: David Laight
  Cc: linux-mips, VMware, Inc., linux-pci, linux-nvme, linux-ide,
	Martin Schwidefsky, linux-s390, Andy King, linux-scsi, linux-rdma,
	x86, Ingo Molnar, iss_storagedev, linuxppc-dev, linux-driver,
	Ben Hutchings, Dan Williams, Jon Mason,
	Solarflare linux maintainers, netdev, linux-kernel, Ralf Baechle,
	e1000-devel, Tejun Heo, linux390, Bjorn Helgaas
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B7371@saturn3.aculab.com>

On Fri, Oct 04, 2013 at 09:31:49AM +0100, David Laight wrote:
> > Mmmm.. I am not sure I am getting it. Could you please rephrase?
> 
> One possibility is for drivers than can use a lot of interrupts to
> request a minimum number initially and then request the additional
> ones much later on.
> That would make it less likely that none will be available for
> devices/drivers that need them but are initialised later.

It sounds as a whole new topic for me. Isn't it?

Anyway, what prevents the above from being done with pci_enable_msix(nvec1) -
pci_disable_msix() - pci_enable_msix(nvec2) where nvec1 < nvec2?

> 	David

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
From: Bhushan Bharat-R65777 @ 2013-10-04  9:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: agraf@suse.de, Wood Scott-B07421, linux-pci@vger.kernel.org,
	joro@8bytes.org, linux-kernel@vger.kernel.org,
	iommu@lists.linux-foundation.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1380127533.3030.319.camel@ul30vt.home>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogbGludXgtcGNpLW93bmVy
QHZnZXIua2VybmVsLm9yZyBbbWFpbHRvOmxpbnV4LXBjaS1vd25lckB2Z2VyLmtlcm5lbC5vcmdd
DQo+IE9uIEJlaGFsZiBPZiBBbGV4IFdpbGxpYW1zb24NCj4gU2VudDogV2VkbmVzZGF5LCBTZXB0
ZW1iZXIgMjUsIDIwMTMgMTA6MTYgUE0NCj4gVG86IEJodXNoYW4gQmhhcmF0LVI2NTc3Nw0KPiBD
Yzogam9yb0A4Ynl0ZXMub3JnOyBiZW5oQGtlcm5lbC5jcmFzaGluZy5vcmc7IGdhbGFrQGtlcm5l
bC5jcmFzaGluZy5vcmc7IGxpbnV4LQ0KPiBrZXJuZWxAdmdlci5rZXJuZWwub3JnOyBsaW51eHBw
Yy1kZXZAbGlzdHMub3psYWJzLm9yZzsgbGludXgtDQo+IHBjaUB2Z2VyLmtlcm5lbC5vcmc7IGFn
cmFmQHN1c2UuZGU7IFdvb2QgU2NvdHQtQjA3NDIxOyBpb21tdUBsaXN0cy5saW51eC0NCj4gZm91
bmRhdGlvbi5vcmc7IEJodXNoYW4gQmhhcmF0LVI2NTc3Nw0KPiBTdWJqZWN0OiBSZTogW1BBVENI
IDIvN10gaW9tbXU6IGFkZCBhcGkgdG8gZ2V0IGlvbW11X2RvbWFpbiBvZiBhIGRldmljZQ0KPiAN
Cj4gT24gVGh1LCAyMDEzLTA5LTE5IGF0IDEyOjU5ICswNTMwLCBCaGFyYXQgQmh1c2hhbiB3cm90
ZToNCj4gPiBUaGlzIGFwaSByZXR1cm4gdGhlIGlvbW11IGRvbWFpbiB0byB3aGljaCB0aGUgZGV2
aWNlIGlzIGF0dGFjaGVkLg0KPiA+IFRoZSBpb21tdV9kb21haW4gaXMgcmVxdWlyZWQgZm9yIG1h
a2luZyBBUEkgY2FsbHMgcmVsYXRlZCB0byBpb21tdS4NCj4gPiBGb2xsb3cgdXAgcGF0Y2hlcyB3
aGljaCB1c2UgdGhpcyBBUEkgdG8ga25vdyBpb21tdSBtYXBpbmcuDQo+ID4NCj4gPiBTaWduZWQt
b2ZmLWJ5OiBCaGFyYXQgQmh1c2hhbiA8YmhhcmF0LmJodXNoYW5AZnJlZXNjYWxlLmNvbT4NCj4g
PiAtLS0NCj4gPiAgZHJpdmVycy9pb21tdS9pb21tdS5jIHwgICAxMCArKysrKysrKysrDQo+ID4g
IGluY2x1ZGUvbGludXgvaW9tbXUuaCB8ICAgIDcgKysrKysrKw0KPiA+ICAyIGZpbGVzIGNoYW5n
ZWQsIDE3IGluc2VydGlvbnMoKyksIDAgZGVsZXRpb25zKC0pDQo+ID4NCj4gPiBkaWZmIC0tZ2l0
IGEvZHJpdmVycy9pb21tdS9pb21tdS5jIGIvZHJpdmVycy9pb21tdS9pb21tdS5jIGluZGV4DQo+
ID4gZmJlOWNhNy4uNmFjNWY1MCAxMDA2NDQNCj4gPiAtLS0gYS9kcml2ZXJzL2lvbW11L2lvbW11
LmMNCj4gPiArKysgYi9kcml2ZXJzL2lvbW11L2lvbW11LmMNCj4gPiBAQCAtNjk2LDYgKzY5Niwx
NiBAQCB2b2lkIGlvbW11X2RldGFjaF9kZXZpY2Uoc3RydWN0IGlvbW11X2RvbWFpbg0KPiA+ICpk
b21haW4sIHN0cnVjdCBkZXZpY2UgKmRldikgIH0NCj4gPiBFWFBPUlRfU1lNQk9MX0dQTChpb21t
dV9kZXRhY2hfZGV2aWNlKTsNCj4gPg0KPiA+ICtzdHJ1Y3QgaW9tbXVfZG9tYWluICppb21tdV9n
ZXRfZGV2X2RvbWFpbihzdHJ1Y3QgZGV2aWNlICpkZXYpIHsNCj4gPiArCXN0cnVjdCBpb21tdV9v
cHMgKm9wcyA9IGRldi0+YnVzLT5pb21tdV9vcHM7DQo+ID4gKw0KPiA+ICsJaWYgKHVubGlrZWx5
KG9wcyA9PSBOVUxMIHx8IG9wcy0+Z2V0X2Rldl9pb21tdV9kb21haW4gPT0gTlVMTCkpDQo+ID4g
KwkJcmV0dXJuIE5VTEw7DQo+ID4gKw0KPiA+ICsJcmV0dXJuIG9wcy0+Z2V0X2Rldl9pb21tdV9k
b21haW4oZGV2KTsgfQ0KPiA+ICtFWFBPUlRfU1lNQk9MX0dQTChpb21tdV9nZXRfZGV2X2RvbWFp
bik7DQo+IA0KPiBXaGF0IHByZXZlbnRzIHRoaXMgZnJvbSByYWNpbmcgaW9tbXVfZG9tYWluX2Zy
ZWUoKT8gIFRoZXJlJ3Mgbm8gcmVmZXJlbmNlcw0KPiBhY3F1aXJlZCwgc28gdGhlcmUncyBubyBy
ZWFzb24gZm9yIHRoZSBjYWxsZXIgdG8gYXNzdW1lIHRoZSBwb2ludGVyIGlzIHZhbGlkLg0KDQpT
b3JyeSBmb3IgbGF0ZSBxdWVyeSwgc29tZWhvdyB0aGlzIGVtYWlsIHdlbnQgaW50byBhIGZvbGRl
ciBhbmQgZXNjYXBlZDsNCg0KSnVzdCB0byBiZSBzdXJlLCB0aGVyZSBpcyBub3QgbG9jayBhdCBn
ZW5lcmljICJzdHJ1Y3QgaW9tbXVfZG9tYWluIiwgYnV0IElQIHNwZWNpZmljIHN0cnVjdHVyZSAo
bGluayBGU0wgZG9tYWluKSBsaW5rZWQgaW4gaW9tbXVfZG9tYWluLT5wcml2IGhhdmUgYSBsb2Nr
LCBzbyB3ZSBuZWVkIHRvIGVuc3VyZSB0aGlzIHJhY2UgaW4gRlNMIGlvbW11IGNvZGUgKHNheSBk
cml2ZXJzL2lvbW11L2ZzbF9wYW11X2RvbWFpbi5jKSwgcmlnaHQ/DQoNClRoYW5rcw0KLUJoYXJh
dA0KDQo+IA0KPiA+ICAvKg0KPiA+ICAgKiBJT01NVSBncm91cHMgYXJlIHJlYWxseSB0aGUgbmF0
cnVhbCB3b3JraW5nIHVuaXQgb2YgdGhlIElPTU1VLCBidXQNCj4gPiAgICogdGhlIElPTU1VIEFQ
SSB3b3JrcyBvbiBkb21haW5zIGFuZCBkZXZpY2VzLiAgQnJpZGdlIHRoYXQgZ2FwIGJ5DQo+ID4g
ZGlmZiAtLWdpdCBhL2luY2x1ZGUvbGludXgvaW9tbXUuaCBiL2luY2x1ZGUvbGludXgvaW9tbXUu
aCBpbmRleA0KPiA+IDdlYTMxOWUuLmZhMDQ2YmQgMTAwNjQ0DQo+ID4gLS0tIGEvaW5jbHVkZS9s
aW51eC9pb21tdS5oDQo+ID4gKysrIGIvaW5jbHVkZS9saW51eC9pb21tdS5oDQo+ID4gQEAgLTEy
Nyw2ICsxMjcsNyBAQCBzdHJ1Y3QgaW9tbXVfb3BzIHsNCj4gPiAgCWludCAoKmRvbWFpbl9zZXRf
d2luZG93cykoc3RydWN0IGlvbW11X2RvbWFpbiAqZG9tYWluLCB1MzIgd19jb3VudCk7DQo+ID4g
IAkvKiBHZXQgdGhlIG51bWVyIG9mIHdpbmRvdyBwZXIgZG9tYWluICovDQo+ID4gIAl1MzIgKCpk
b21haW5fZ2V0X3dpbmRvd3MpKHN0cnVjdCBpb21tdV9kb21haW4gKmRvbWFpbik7DQo+ID4gKwlz
dHJ1Y3QgaW9tbXVfZG9tYWluICooKmdldF9kZXZfaW9tbXVfZG9tYWluKShzdHJ1Y3QgZGV2aWNl
ICpkZXYpOw0KPiA+DQo+ID4gIAl1bnNpZ25lZCBsb25nIHBnc2l6ZV9iaXRtYXA7DQo+ID4gIH07
DQo+ID4gQEAgLTE5MCw2ICsxOTEsNyBAQCBleHRlcm4gaW50IGlvbW11X2RvbWFpbl93aW5kb3df
ZW5hYmxlKHN0cnVjdCBpb21tdV9kb21haW4NCj4gKmRvbWFpbiwgdTMyIHduZF9uciwNCj4gPiAg
CQkJCSAgICAgIHBoeXNfYWRkcl90IG9mZnNldCwgdTY0IHNpemUsDQo+ID4gIAkJCQkgICAgICBp
bnQgcHJvdCk7DQo+ID4gIGV4dGVybiB2b2lkIGlvbW11X2RvbWFpbl93aW5kb3dfZGlzYWJsZShz
dHJ1Y3QgaW9tbXVfZG9tYWluICpkb21haW4sDQo+ID4gdTMyIHduZF9ucik7DQo+ID4gK2V4dGVy
biBzdHJ1Y3QgaW9tbXVfZG9tYWluICppb21tdV9nZXRfZGV2X2RvbWFpbihzdHJ1Y3QgZGV2aWNl
ICpkZXYpOw0KPiA+ICAvKioNCj4gPiAgICogcmVwb3J0X2lvbW11X2ZhdWx0KCkgLSByZXBvcnQg
YWJvdXQgYW4gSU9NTVUgZmF1bHQgdG8gdGhlIElPTU1VIGZyYW1ld29yaw0KPiA+ICAgKiBAZG9t
YWluOiB0aGUgaW9tbXUgZG9tYWluIHdoZXJlIHRoZSBmYXVsdCBoYXMgaGFwcGVuZWQgQEAgLTI4
NCw2DQo+ID4gKzI4NiwxMSBAQCBzdGF0aWMgaW5saW5lIHZvaWQgaW9tbXVfZG9tYWluX3dpbmRv
d19kaXNhYmxlKHN0cnVjdA0KPiA+IGlvbW11X2RvbWFpbiAqZG9tYWluLCAgeyAgfQ0KPiA+DQo+
ID4gK3N0YXRpYyBpbmxpbmUgc3RydWN0IGlvbW11X2RvbWFpbiAqaW9tbXVfZ2V0X2Rldl9kb21h
aW4oc3RydWN0IGRldmljZQ0KPiA+ICsqZGV2KSB7DQo+ID4gKwlyZXR1cm4gTlVMTDsNCj4gPiAr
fQ0KPiA+ICsNCj4gPiAgc3RhdGljIGlubGluZSBwaHlzX2FkZHJfdCBpb21tdV9pb3ZhX3RvX3Bo
eXMoc3RydWN0IGlvbW11X2RvbWFpbg0KPiA+ICpkb21haW4sIGRtYV9hZGRyX3QgaW92YSkgIHsN
Cj4gPiAgCXJldHVybiAwOw0KPiANCj4gDQo+IA0KPiAtLQ0KPiBUbyB1bnN1YnNjcmliZSBmcm9t
IHRoaXMgbGlzdDogc2VuZCB0aGUgbGluZSAidW5zdWJzY3JpYmUgbGludXgtcGNpIiBpbiB0aGUg
Ym9keQ0KPiBvZiBhIG1lc3NhZ2UgdG8gbWFqb3Jkb21vQHZnZXIua2VybmVsLm9yZyBNb3JlIG1h
am9yZG9tbyBpbmZvIGF0DQo+IGh0dHA6Ly92Z2VyLmtlcm5lbC5vcmcvbWFqb3Jkb21vLWluZm8u
aHRtbA0KDQo=

^ permalink raw reply

* RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
From: Bhushan Bharat-R65777 @ 2013-10-04 10:42 UTC (permalink / raw)
  To: Bhushan Bharat-R65777, Alex Williamson
  Cc: agraf@suse.de, Wood Scott-B07421, linux-pci@vger.kernel.org,
	joro@8bytes.org, linux-kernel@vger.kernel.org,
	iommu@lists.linux-foundation.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1380127533.3030.319.camel@ul30vt.home>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogQmh1c2hhbiBCaGFyYXQt
UjY1Nzc3DQo+IFNlbnQ6IEZyaWRheSwgT2N0b2JlciAwNCwgMjAxMyAzOjI0IFBNDQo+IFRvOiAn
QWxleCBXaWxsaWFtc29uJw0KPiBDYzogam9yb0A4Ynl0ZXMub3JnOyBiZW5oQGtlcm5lbC5jcmFz
aGluZy5vcmc7IGdhbGFrQGtlcm5lbC5jcmFzaGluZy5vcmc7IGxpbnV4LQ0KPiBrZXJuZWxAdmdl
ci5rZXJuZWwub3JnOyBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZzsgbGludXgtDQo+IHBj
aUB2Z2VyLmtlcm5lbC5vcmc7IGFncmFmQHN1c2UuZGU7IFdvb2QgU2NvdHQtQjA3NDIxOyBpb21t
dUBsaXN0cy5saW51eC0NCj4gZm91bmRhdGlvbi5vcmcNCj4gU3ViamVjdDogUkU6IFtQQVRDSCAy
LzddIGlvbW11OiBhZGQgYXBpIHRvIGdldCBpb21tdV9kb21haW4gb2YgYSBkZXZpY2UNCj4gDQo+
IA0KPiANCj4gPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+IEZyb206IGxpbnV4LXBj
aS1vd25lckB2Z2VyLmtlcm5lbC5vcmcNCj4gPiBbbWFpbHRvOmxpbnV4LXBjaS1vd25lckB2Z2Vy
Lmtlcm5lbC5vcmddDQo+ID4gT24gQmVoYWxmIE9mIEFsZXggV2lsbGlhbXNvbg0KPiA+IFNlbnQ6
IFdlZG5lc2RheSwgU2VwdGVtYmVyIDI1LCAyMDEzIDEwOjE2IFBNDQo+ID4gVG86IEJodXNoYW4g
QmhhcmF0LVI2NTc3Nw0KPiA+IENjOiBqb3JvQDhieXRlcy5vcmc7IGJlbmhAa2VybmVsLmNyYXNo
aW5nLm9yZzsNCj4gPiBnYWxha0BrZXJuZWwuY3Jhc2hpbmcub3JnOyBsaW51eC0ga2VybmVsQHZn
ZXIua2VybmVsLm9yZzsNCj4gPiBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZzsgbGludXgt
IHBjaUB2Z2VyLmtlcm5lbC5vcmc7DQo+ID4gYWdyYWZAc3VzZS5kZTsgV29vZCBTY290dC1CMDc0
MjE7IGlvbW11QGxpc3RzLmxpbnV4LSBmb3VuZGF0aW9uLm9yZzsNCj4gPiBCaHVzaGFuIEJoYXJh
dC1SNjU3NzcNCj4gPiBTdWJqZWN0OiBSZTogW1BBVENIIDIvN10gaW9tbXU6IGFkZCBhcGkgdG8g
Z2V0IGlvbW11X2RvbWFpbiBvZiBhDQo+ID4gZGV2aWNlDQo+ID4NCj4gPiBPbiBUaHUsIDIwMTMt
MDktMTkgYXQgMTI6NTkgKzA1MzAsIEJoYXJhdCBCaHVzaGFuIHdyb3RlOg0KPiA+ID4gVGhpcyBh
cGkgcmV0dXJuIHRoZSBpb21tdSBkb21haW4gdG8gd2hpY2ggdGhlIGRldmljZSBpcyBhdHRhY2hl
ZC4NCj4gPiA+IFRoZSBpb21tdV9kb21haW4gaXMgcmVxdWlyZWQgZm9yIG1ha2luZyBBUEkgY2Fs
bHMgcmVsYXRlZCB0byBpb21tdS4NCj4gPiA+IEZvbGxvdyB1cCBwYXRjaGVzIHdoaWNoIHVzZSB0
aGlzIEFQSSB0byBrbm93IGlvbW11IG1hcGluZy4NCj4gPiA+DQo+ID4gPiBTaWduZWQtb2ZmLWJ5
OiBCaGFyYXQgQmh1c2hhbiA8YmhhcmF0LmJodXNoYW5AZnJlZXNjYWxlLmNvbT4NCj4gPiA+IC0t
LQ0KPiA+ID4gIGRyaXZlcnMvaW9tbXUvaW9tbXUuYyB8ICAgMTAgKysrKysrKysrKw0KPiA+ID4g
IGluY2x1ZGUvbGludXgvaW9tbXUuaCB8ICAgIDcgKysrKysrKw0KPiA+ID4gIDIgZmlsZXMgY2hh
bmdlZCwgMTcgaW5zZXJ0aW9ucygrKSwgMCBkZWxldGlvbnMoLSkNCj4gPiA+DQo+ID4gPiBkaWZm
IC0tZ2l0IGEvZHJpdmVycy9pb21tdS9pb21tdS5jIGIvZHJpdmVycy9pb21tdS9pb21tdS5jIGlu
ZGV4DQo+ID4gPiBmYmU5Y2E3Li42YWM1ZjUwIDEwMDY0NA0KPiA+ID4gLS0tIGEvZHJpdmVycy9p
b21tdS9pb21tdS5jDQo+ID4gPiArKysgYi9kcml2ZXJzL2lvbW11L2lvbW11LmMNCj4gPiA+IEBA
IC02OTYsNiArNjk2LDE2IEBAIHZvaWQgaW9tbXVfZGV0YWNoX2RldmljZShzdHJ1Y3QgaW9tbXVf
ZG9tYWluDQo+ID4gPiAqZG9tYWluLCBzdHJ1Y3QgZGV2aWNlICpkZXYpICB9DQo+ID4gPiBFWFBP
UlRfU1lNQk9MX0dQTChpb21tdV9kZXRhY2hfZGV2aWNlKTsNCj4gPiA+DQo+ID4gPiArc3RydWN0
IGlvbW11X2RvbWFpbiAqaW9tbXVfZ2V0X2Rldl9kb21haW4oc3RydWN0IGRldmljZSAqZGV2KSB7
DQo+ID4gPiArCXN0cnVjdCBpb21tdV9vcHMgKm9wcyA9IGRldi0+YnVzLT5pb21tdV9vcHM7DQo+
ID4gPiArDQo+ID4gPiArCWlmICh1bmxpa2VseShvcHMgPT0gTlVMTCB8fCBvcHMtPmdldF9kZXZf
aW9tbXVfZG9tYWluID09IE5VTEwpKQ0KPiA+ID4gKwkJcmV0dXJuIE5VTEw7DQo+ID4gPiArDQo+
ID4gPiArCXJldHVybiBvcHMtPmdldF9kZXZfaW9tbXVfZG9tYWluKGRldik7IH0NCj4gPiA+ICtF
WFBPUlRfU1lNQk9MX0dQTChpb21tdV9nZXRfZGV2X2RvbWFpbik7DQo+ID4NCj4gPiBXaGF0IHBy
ZXZlbnRzIHRoaXMgZnJvbSByYWNpbmcgaW9tbXVfZG9tYWluX2ZyZWUoKT8gIFRoZXJlJ3Mgbm8N
Cj4gPiByZWZlcmVuY2VzIGFjcXVpcmVkLCBzbyB0aGVyZSdzIG5vIHJlYXNvbiBmb3IgdGhlIGNh
bGxlciB0byBhc3N1bWUgdGhlIHBvaW50ZXINCj4gaXMgdmFsaWQuDQo+IA0KPiBTb3JyeSBmb3Ig
bGF0ZSBxdWVyeSwgc29tZWhvdyB0aGlzIGVtYWlsIHdlbnQgaW50byBhIGZvbGRlciBhbmQgZXNj
YXBlZDsNCj4gDQo+IEp1c3QgdG8gYmUgc3VyZSwgdGhlcmUgaXMgbm90IGxvY2sgYXQgZ2VuZXJp
YyAic3RydWN0IGlvbW11X2RvbWFpbiIsIGJ1dCBJUA0KPiBzcGVjaWZpYyBzdHJ1Y3R1cmUgKGxp
bmsgRlNMIGRvbWFpbikgbGlua2VkIGluIGlvbW11X2RvbWFpbi0+cHJpdiBoYXZlIGEgbG9jaywN
Cj4gc28gd2UgbmVlZCB0byBlbnN1cmUgdGhpcyByYWNlIGluIEZTTCBpb21tdSBjb2RlIChzYXkN
Cj4gZHJpdmVycy9pb21tdS9mc2xfcGFtdV9kb21haW4uYyksIHJpZ2h0Pw0KDQpGdXJ0aGVyIHRo
aW5raW5nIG9mIHRoaXMsIHRoZXJlIGFyZSBtb3JlIHByb2JsZW1zIGhlcmU6DQogLSBMaWtlIE1T
SSBzdWJzeXN0ZW0gd2lsbCBjYWxsIGlvbW11X2dldF9kZXZfZG9tYWluKCksIHdoaWNoIHdpbGwg
dGFrZSBhIGxvY2ssIGZpbmQgdGhlIGRvbWFpbiBwb2ludGVyLCByZWxlYXNlIHRoZSBsb2NrLCBh
bmQgcmV0dXJuIHRoZSBkb21haW4NCiAtIE5vdyBpZiBkb21haW4gaW4gZnJlZWQgdXANCiAtIFdo
aWxlIE1TSSBzdWJzeXN0ZW0gdHJpZXMgdG8gZG8gd29yayBvbiBkb21haW4gKGxpa2UgZ2V0X2F0
dHJpYnV0ZS9zZXRfYXR0cmlidXRlIGV0YykgPz8/DQoNClNvIGNhbiB3ZSBkbyBsaWtlIGlvbW11
X2dldF9kZXZfZG9tYWluKCkgd2lsbCByZXR1cm4gZG9tYWluIHdpdGggdGhlIGxvY2sgaGVsZCwg
YW5kIGlvbW11X3B1dF9kZXZfZG9tYWluKCkgd2lsbCByZWxlYXNlIHRoZSBsb2NrPyBBbmQgaW9t
bXVfZ2V0X2Rldl9kb21haW4oKSBtdXN0IGFsd2F5cyBiZSBmb2xsb3dlZCBieSBpb21tdV9nZXRf
ZGV2X2RvbWFpbigpDQoNClRoYW5rcw0KLUJoYXJhdA0KDQo+IA0KPiBUaGFua3MNCj4gLUJoYXJh
dA0KPiANCj4gPg0KPiA+ID4gIC8qDQo+ID4gPiAgICogSU9NTVUgZ3JvdXBzIGFyZSByZWFsbHkg
dGhlIG5hdHJ1YWwgd29ya2luZyB1bml0IG9mIHRoZSBJT01NVSwgYnV0DQo+ID4gPiAgICogdGhl
IElPTU1VIEFQSSB3b3JrcyBvbiBkb21haW5zIGFuZCBkZXZpY2VzLiAgQnJpZGdlIHRoYXQgZ2Fw
IGJ5DQo+ID4gPiBkaWZmIC0tZ2l0IGEvaW5jbHVkZS9saW51eC9pb21tdS5oIGIvaW5jbHVkZS9s
aW51eC9pb21tdS5oIGluZGV4DQo+ID4gPiA3ZWEzMTllLi5mYTA0NmJkIDEwMDY0NA0KPiA+ID4g
LS0tIGEvaW5jbHVkZS9saW51eC9pb21tdS5oDQo+ID4gPiArKysgYi9pbmNsdWRlL2xpbnV4L2lv
bW11LmgNCj4gPiA+IEBAIC0xMjcsNiArMTI3LDcgQEAgc3RydWN0IGlvbW11X29wcyB7DQo+ID4g
PiAgCWludCAoKmRvbWFpbl9zZXRfd2luZG93cykoc3RydWN0IGlvbW11X2RvbWFpbiAqZG9tYWlu
LCB1MzIgd19jb3VudCk7DQo+ID4gPiAgCS8qIEdldCB0aGUgbnVtZXIgb2Ygd2luZG93IHBlciBk
b21haW4gKi8NCj4gPiA+ICAJdTMyICgqZG9tYWluX2dldF93aW5kb3dzKShzdHJ1Y3QgaW9tbXVf
ZG9tYWluICpkb21haW4pOw0KPiA+ID4gKwlzdHJ1Y3QgaW9tbXVfZG9tYWluICooKmdldF9kZXZf
aW9tbXVfZG9tYWluKShzdHJ1Y3QgZGV2aWNlICpkZXYpOw0KPiA+ID4NCj4gPiA+ICAJdW5zaWdu
ZWQgbG9uZyBwZ3NpemVfYml0bWFwOw0KPiA+ID4gIH07DQo+ID4gPiBAQCAtMTkwLDYgKzE5MSw3
IEBAIGV4dGVybiBpbnQgaW9tbXVfZG9tYWluX3dpbmRvd19lbmFibGUoc3RydWN0DQo+ID4gPiBp
b21tdV9kb21haW4NCj4gPiAqZG9tYWluLCB1MzIgd25kX25yLA0KPiA+ID4gIAkJCQkgICAgICBw
aHlzX2FkZHJfdCBvZmZzZXQsIHU2NCBzaXplLA0KPiA+ID4gIAkJCQkgICAgICBpbnQgcHJvdCk7
DQo+ID4gPiAgZXh0ZXJuIHZvaWQgaW9tbXVfZG9tYWluX3dpbmRvd19kaXNhYmxlKHN0cnVjdCBp
b21tdV9kb21haW4NCj4gPiA+ICpkb21haW4sDQo+ID4gPiB1MzIgd25kX25yKTsNCj4gPiA+ICtl
eHRlcm4gc3RydWN0IGlvbW11X2RvbWFpbiAqaW9tbXVfZ2V0X2Rldl9kb21haW4oc3RydWN0IGRl
dmljZQ0KPiA+ID4gKypkZXYpOw0KPiA+ID4gIC8qKg0KPiA+ID4gICAqIHJlcG9ydF9pb21tdV9m
YXVsdCgpIC0gcmVwb3J0IGFib3V0IGFuIElPTU1VIGZhdWx0IHRvIHRoZSBJT01NVQ0KPiBmcmFt
ZXdvcmsNCj4gPiA+ICAgKiBAZG9tYWluOiB0aGUgaW9tbXUgZG9tYWluIHdoZXJlIHRoZSBmYXVs
dCBoYXMgaGFwcGVuZWQgQEAgLTI4NCw2DQo+ID4gPiArMjg2LDExIEBAIHN0YXRpYyBpbmxpbmUg
dm9pZCBpb21tdV9kb21haW5fd2luZG93X2Rpc2FibGUoc3RydWN0DQo+ID4gPiBpb21tdV9kb21h
aW4gKmRvbWFpbiwgIHsgIH0NCj4gPiA+DQo+ID4gPiArc3RhdGljIGlubGluZSBzdHJ1Y3QgaW9t
bXVfZG9tYWluICppb21tdV9nZXRfZGV2X2RvbWFpbihzdHJ1Y3QNCj4gPiA+ICtkZXZpY2UNCj4g
PiA+ICsqZGV2KSB7DQo+ID4gPiArCXJldHVybiBOVUxMOw0KPiA+ID4gK30NCj4gPiA+ICsNCj4g
PiA+ICBzdGF0aWMgaW5saW5lIHBoeXNfYWRkcl90IGlvbW11X2lvdmFfdG9fcGh5cyhzdHJ1Y3Qg
aW9tbXVfZG9tYWluDQo+ID4gPiAqZG9tYWluLCBkbWFfYWRkcl90IGlvdmEpICB7DQo+ID4gPiAg
CXJldHVybiAwOw0KPiA+DQo+ID4NCj4gPg0KPiA+IC0tDQo+ID4gVG8gdW5zdWJzY3JpYmUgZnJv
bSB0aGlzIGxpc3Q6IHNlbmQgdGhlIGxpbmUgInVuc3Vic2NyaWJlIGxpbnV4LXBjaSINCj4gPiBp
biB0aGUgYm9keSBvZiBhIG1lc3NhZ2UgdG8gbWFqb3Jkb21vQHZnZXIua2VybmVsLm9yZyBNb3Jl
IG1ham9yZG9tbw0KPiA+IGluZm8gYXQgaHR0cDovL3ZnZXIua2VybmVsLm9yZy9tYWpvcmRvbW8t
aW5mby5odG1sDQoNCg==

^ permalink raw reply

* Re: [PATCH 0/2] move of_find_next_cache_node to DT core
From: Sudeep KarkadaNagesha @ 2013-10-04 10:42 UTC (permalink / raw)
  To: Sudeep KarkadaNagesha, Grant Likely
  Cc: devicetree@vger.kernel.org, Rob Herring,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5239D26E.5090505@arm.com>

Hi Grant,

On 18/09/13 17:18, Sudeep KarkadaNagesha wrote:
> On 18/09/13 15:51, Grant Likely wrote:
>> On Wed, 18 Sep 2013 11:53:03 +0100, Sudeep KarkadaNagesha <Sudeep.Karkad=
aNagesha@arm.com> wrote:
>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>
>>> Hi,
>>>
>>> The cache bindings are generic and used by many other architectures
>>> apart from PPC. These patches fixes and move the existing definition
>>> of of_find_next_cache_node to DT common code.
>>>
>>> Regards,
>>> Sudeep
>>
>> Acked-by: Grant Likely <grant.likely@secretlab.ca>
>>
>> However, do you have a user for this function on other architectures
>> yet? I'd like to see a user for the function in the same patch series..
>>
>=20
> Yes I have posted an RFC[1] following this series implementing cacheinfo
> for ARM similar to x86 implementation. I was not sure if it's good idea
> to combine it as its still initial RFC version.
>=20

Do you prefer to have this a independent change or to go with the cache top=
ology
support patches[1] on ARM ?

Regards,
Sudeep

[1] https://lkml.org/lkml/2013/9/18/340

^ permalink raw reply

* Gianfar driver crashes in Kernel v3.10
From: Thomas Hühn @ 2013-10-04 12:03 UTC (permalink / raw)
  To: claudiu.manoil@freescale.com; +Cc: linuxppc-dev@lists.ozlabs.org

Hi all,

We are several Openwrt users based on the TPlink 4900 device and suffer fro=
m a crashing gianfar driver.
We troubleshooted the problem down to the fact, that a 3.8er Linux kernel i=
s working, and a v3.10 crashes, but there is
no reproducable case yet. The driver crashes after a couple of minutes but =
this can not be triggered by high network load, or routing traffic.
I recorded the crash via a serial line and did a gdb lookup in gainfar.c
All infos and logs we collected so far are here: https://forum.openwrt.org/=
viewtopic.php?pid=3D213901#p213901

I cc the linuxppc-dev mailing but not sure this is the rigth one.
Please let us know how we could help to find that bug within the gianfar NA=
PI.

Greetings Thomas




ps: here is my last troubleshooting log on the openwrt mailing list

I just hooked up a serial line to my tplinl4900. Used a recent trunk image =
and could catch the output of the crash.
The problem comes from the ethernet driver gfar

[code]
[ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2671.847141] Freescale P1014
[ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_c=
ommon pppox p
e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt=
_pkttype xt_o
mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt=
_LOG xt_IPMAR
ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat=
_sip nf_nat_r
ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_con=
ntrack_h323 n
 compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_ht=
b sch_gred sc
skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch=
_hfsc sch_ing
r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl=
_mph_dr_of gp
[ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
[ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
[ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
[ 2672.017125]=20
GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000 70000=
000=20
GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000 00000=
008=20
GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086 fffff=
800=20
GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000 00000=
001=20
[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
[ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
[ 2672.055962] Call Trace:
[ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliable)
[ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
[ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
[ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
[ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
[ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
[ 2672.090991] --- Exception: 501 at 0x48083c28
[ 2672.090991]     LR =3D 0x48083bf8
[ 2672.098378] Instruction dump:
[ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378 81c100=
50 7ffafb78=20
[ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004c 4800=
0060 41a2004c
[ 2672.117021] ---[ end trace 565fb54528d305fa ]---
[ 2672.121628]=20
[ 2673.103130] Kernel panic - not syncing: Fatal exception in interrupt
[ 2673.109474] Rebooting in 3 seconds..

U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
[/code]


A cross-gdb lookup to gianfar.o shows that the problem appier in function "=
gfar_poll"

[code]
./gdb ../../../target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3=
.10.12/drivers/net/ethernet/freescale/gianfar.o

This GDB was configured as "--host=3Dx86_64-linux-gnu --target=3Dpowerpc-op=
enwrt-linux-uclibcspe".
For bug reporting instructions, please see:
<[url]http://bugs.launchpad.net/gdb-linaro/[/url]>...
Reading symbols from /home/thomas/BB-evernet/build_dir/target-powerpc_uClib=
c-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freesca=
le/gianfar.o...done.
(gdb) l *gfar_poll+0x2f8/0x520
0x4538 is in gfar_poll (drivers/net/ethernet/freescale/gianfar.c:2829).
2824
2825            return howmany;
2826    }
2827
2828    static int gfar_poll(struct napi_struct *napi, int budget)
2829    {
2830            struct gfar_priv_grp *gfargrp =3D
2831                    container_of(napi, struct gfar_priv_grp, napi);
2832            struct gfar_private *priv =3D gfargrp->priv;
2833            struct gfar __iomem *regs =3D gfargrp->regs;
(gdb) q

[/code]


The changes from Linux kernel 3.8, which seems to have proper working ehter=
net, to the current 3.10 seem to intruduce a bug in the GIANFAR driver: dri=
vers/net/ethernet/freescale/gianfra.c
There were different changes in the NAPI of gianfar driver made between the=
 two kernel versions.=20
You can have a look at them by doin a "git whatchanged -p v3.8..v3.10 drive=
rs/net/ethernet/freescale/gianfar.c" in a recent Linux kernel verion.

[b]So let us all have a look to those changes to find the bug !!![/b]

Probably the maintainer of the gianfar driver should be included here. Clau=
diu Manoil <claudiu.manoil@freescale.com>


So far from troubleshooting.

Greetings Bluse=

^ permalink raw reply

* Re: [PATCH] kvm: powerpc: book3s: Fix build break for BOOK3S_32
From: Alexander Graf @ 2013-10-04 12:23 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <20131003041415.GA2801@iris.ozlabs.ibm.com>


On 03.10.2013, at 06:14, Paul Mackerras wrote:

> On Wed, Oct 02, 2013 at 08:08:44PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>=20
>> This was introduced by 85a0d845d8bb5df5d2669416212f56cbe1474c6b
>=20
> It's a good idea to give the headline of the commit as well as the ID.
> I also like to trim the ID to 10 characters or so.  So it should look
> like this:
>=20
> This was introduced by 85a0d845d8 ("KVM: PPC: Book3S PR: Allocate
> kvm_vcpu structs from kvm_vcpu_cache").
>=20
>> arch/powerpc/kvm/book3s_pr.c: In function 'kvmppc_core_vcpu_create':
>> arch/powerpc/kvm/book3s_pr.c:1182:30: error: 'struct =
kvmppc_vcpu_book3s' has no member named 'shadow_vcpu'
>> make[1]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
>>=20
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>=20
> Acked-by: Paul Mackerras <paulus@samba.org>

Would you guys mind if I merge this into the offending patch? It's not =
trickled into -next yet, so rebasing should work.

If not, please resend with the fixed commit message.


Alex

^ permalink raw reply

* [RFC PATCH] PPC: KVM: vfio kvm device: support spapr tce
From: Alexey Kardashevskiy @ 2013-10-04 12:24 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alex Williamson, kvm-ppc, kvm, linux-kernel
In-Reply-To: <20131001200757.31715.22994.stgit@bling.home>

This is a very rough change set required for ppc64 to use this KVM device.

vfio_rm.c is a piece of code which is going to be called from the realmode (MMU off),
and I will put everything spapr-related under #ifdef CONFIG_SPAPR_TCE_IOMMU,
it is just friday and I have to run :)

This is an RFC but it works.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/Kconfig  |  1 +
 arch/powerpc/kvm/Makefile |  4 ++++
 include/linux/kvm_host.h  |  8 ++++---
 include/linux/vfio.h      |  3 +++
 include/uapi/linux/kvm.h  |  1 +
 virt/kvm/vfio.c           | 46 ++++++++++++++++++++++++++++++++++++++++
 virt/kvm/vfio_rm.c        | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 virt/kvm/vfio_rm.c

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 61b3535..d1b7f64 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -60,6 +60,7 @@ config KVM_BOOK3S_64
 	select KVM_BOOK3S_64_HANDLER
 	select KVM
 	select SPAPR_TCE_IOMMU
+	select KVM_VFIO
 	---help---
 	  Support running unmodified book3s_64 and book3s_32 guest kernels
 	  in virtual machines on book3s_64 host processors.
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 6646c95..fc2878b 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -55,6 +55,8 @@ kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
 
 kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
 	$(KVM)/coalesced_mmio.o \
+	$(KVM)/vfio.o \
+	$(KVM)/vfio_rm.o \
 	fpu.o \
 	book3s_paired_singles.o \
 	book3s_pr.o \
@@ -76,6 +78,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
 kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
 	book3s_hv_rm_xics.o
 kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
+	$(KVM)/vfio_rm.o \
 	book3s_hv_rmhandlers.o \
 	book3s_hv_rm_mmu.o \
 	book3s_64_vio_hv.o \
@@ -89,6 +92,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
 
 kvm-book3s_64-module-objs := \
 	$(KVM)/kvm_main.o \
+	$(KVM)/vfio.o \
 	$(KVM)/eventfd.o \
 	powerpc.o \
 	emulate.o \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad2b581..43c0290 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -407,6 +407,8 @@ struct kvm {
 #endif
 	long tlbs_dirty;
 	struct list_head devices;
+
+	struct kvm_vfio *vfio;
 };
 
 #define kvm_err(fmt, ...) \
@@ -677,15 +679,15 @@ void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
 bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
 #else
-static inline void kvm_arch_register_noncoherent_dma(void)
+static inline void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
 {
 }
 
-static inline void kvm_arch_unregister_noncoherent_dma(void)
+static inline void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
 }
 
-static inline bool kvm_arch_has_noncoherent_dma(void)
+static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 {
 	return false;
 }
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 24579a0..681e19b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -97,4 +97,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 
+extern struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm,
+		unsigned long liobn);
+
 #endif /* VFIO_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7c1a349..a74ad16 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -847,6 +847,7 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_SPAPR_TCE_LIOBN		2
 
 /*
  * ioctls for VM fds
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 2e336a7..39dea9f 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -22,6 +22,7 @@
 struct kvm_vfio_group {
 	struct list_head node;
 	struct vfio_group *vfio_group;
+	uint64_t liobn; /* sPAPR */
 };
 
 struct kvm_vfio {
@@ -188,12 +189,52 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 	return -ENXIO;
 }
 
+static int kvm_vfio_set_spapr_tce_liobn(struct kvm_device *dev,
+		long attr, u64 arg)
+{
+	struct kvm_vfio *kv = dev->private;
+	struct vfio_group *vfio_group;
+	struct kvm_vfio_group *kvg;
+	void __user *argp = (void __user *)arg;
+	struct fd f;
+	int32_t fd;
+	uint64_t liobn = attr;
+
+	if (get_user(fd, (int32_t __user *)argp))
+		return -EFAULT;
+
+	f = fdget(fd);
+	if (!f.file)
+		return -EBADF;
+
+	vfio_group = kvm_vfio_group_get_external_user(f.file);
+	fdput(f);
+
+	list_for_each_entry(kvg, &kv->group_list, node) {
+		if (kvg->vfio_group == vfio_group) {
+			WARN_ON(kvg->liobn);
+			kvg->liobn = liobn;
+			kvm_vfio_group_put_external_user(vfio_group);
+			return 0;
+		}
+	}
+
+	kvm_vfio_group_put_external_user(vfio_group);
+
+	return -ENXIO;
+}
+
 static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
 		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
+		return kvm_vfio_set_spapr_tce_liobn(dev, attr->attr,
+				attr->addr);
+#endif
 	}
 
 	return -ENXIO;
@@ -211,6 +252,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 		}
 
 		break;
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
+		return 0;
+#endif
 	}
 
 	return -ENXIO;
@@ -250,6 +295,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
+	dev->kvm->vfio = kv;
 
 	return 0;
 }
diff --git a/virt/kvm/vfio_rm.c b/virt/kvm/vfio_rm.c
new file mode 100644
index 0000000..ee9fd96
--- /dev/null
+++ b/virt/kvm/vfio_rm.c
@@ -0,0 +1,54 @@
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+
+struct kvm_vfio_group {
+	struct list_head node;
+	struct vfio_group *vfio_group;
+	uint64_t liobn; /* sPAPR */
+};
+
+struct kvm_vfio {
+	struct list_head group_list;
+	struct mutex lock;
+	bool noncoherent;
+};
+
+struct vfio_group {
+	struct kref			kref;
+	int				minor;
+	atomic_t			container_users;
+	struct iommu_group		*iommu_group;
+	struct vfio_container		*container;
+	struct list_head		device_list;
+	struct mutex			device_lock;
+	struct device			*dev;
+	struct notifier_block		nb;
+	struct list_head		vfio_next;
+	struct list_head		container_next;
+	atomic_t			opened;
+};
+
+struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm, unsigned long liobn)
+{
+	struct kvm_vfio_group *kvg;
+
+	if (!kvm->vfio)
+		return NULL;
+
+	list_for_each_entry(kvg, &kvm->vfio->group_list, node) {
+		if (kvg->liobn == liobn)
+			return kvg->vfio_group->iommu_group;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_find_group_by_liobn);
+
+
-- 
1.8.4.rc4

^ permalink raw reply related

* Re: [PATCH] kvm: powerpc: book3s: Fix build break for BOOK3S_32
From: Alexander Graf @ 2013-10-04 12:27 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <EB193553-5B3C-4669-9C71-BACC6882D18B@suse.de>


On 04.10.2013, at 14:23, Alexander Graf wrote:

>=20
> On 03.10.2013, at 06:14, Paul Mackerras wrote:
>=20
>> On Wed, Oct 02, 2013 at 08:08:44PM +0530, Aneesh Kumar K.V wrote:
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>>=20
>>> This was introduced by 85a0d845d8bb5df5d2669416212f56cbe1474c6b
>>=20
>> It's a good idea to give the headline of the commit as well as the =
ID.
>> I also like to trim the ID to 10 characters or so.  So it should look
>> like this:
>>=20
>> This was introduced by 85a0d845d8 ("KVM: PPC: Book3S PR: Allocate
>> kvm_vcpu structs from kvm_vcpu_cache").
>>=20
>>> arch/powerpc/kvm/book3s_pr.c: In function 'kvmppc_core_vcpu_create':
>>> arch/powerpc/kvm/book3s_pr.c:1182:30: error: 'struct =
kvmppc_vcpu_book3s' has no member named 'shadow_vcpu'
>>> make[1]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
>>>=20
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>=20
>> Acked-by: Paul Mackerras <paulus@samba.org>
>=20
> Would you guys mind if I merge this into the offending patch? It's not =
trickled into -next yet, so rebasing should work.
>=20
> If not, please resend with the fixed commit message.

Eh - I must've missed v2 :). So that leaves only the question on whether =
you'd be ok to squash the patch instead. It'd help bisectability.


Alex

^ permalink raw reply

* Gianfar driver crashes in Kernel v3.10
From: Thomas Hühn @ 2013-10-04 12:28 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

[-- Attachment #1: Type: text/plain, Size: 4881 bytes --]

Hi all,

We are several Openwrt users based on the TPlink 4900 device and suffer from a crashing gianfar driver.
We troubleshooted the problem down to the fact, that a 3.8er Linux kernel is working, and a v3.10 crashes, but there is
no reproducable case yet. The driver crashes after a couple of minutes but this can not be triggered by high network load, or routing traffic.
I recorded the crash via a serial line and did a gdb lookup in gainfar.c
All infos and logs we collected so far in the OpenWRt forum:https://forum.openwrt.org/viewtopic.php?pid=213901#p213901

Here is my last troubleshooting log on the openwrt mailing list

I just hooked up a serial line to my tplinl4900. Used a recent trunk image and could catch the output of the crash.
The problem comes from the ethernet driver gfar

[code]
[ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2671.847141] Freescale P1014
[ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_common pppox p
e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_o
mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMAR
ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_r
ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 n
compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sc
skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ing
r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl_mph_dr_of gp
[ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
[ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
[ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
[ 2672.017125] 
GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000 70000000 
GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000 00000008 
GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086 fffff800 
GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000 00000001 
[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
[ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
[ 2672.055962] Call Trace:
[ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliable)
[ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
[ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
[ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
[ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
[ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
[ 2672.090991] --- Exception: 501 at 0x48083c28
[ 2672.090991]     LR = 0x48083bf8
[ 2672.098378] Instruction dump:
[ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378 81c10050 7ffafb78 
[ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004c 48000060 41a2004c
[ 2672.117021] ---[ end trace 565fb54528d305fa ]---
[ 2672.121628] 
[ 2673.103130] Kernel panic - not syncing: Fatal exception in interrupt
[ 2673.109474] Rebooting in 3 seconds..

U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
[/code]


A cross-gdb lookup to gianfar.o shows that the problem appier in function "gfar_poll"

[code]
./gdb ../../../target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o

This GDB was configured as "--host=x86_64-linux-gnu --target=powerpc-openwrt-linux-uclibcspe".
For bug reporting instructions, please see:
<[url]http://bugs.launchpad.net/gdb-linaro/[/url]>...
Reading symbols from /home/thomas/BB-evernet/build_dir/target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o...done.
(gdb) l *gfar_poll+0x2f8/0x520
0x4538 is in gfar_poll (drivers/net/ethernet/freescale/gianfar.c:2829).
2824
2825            return howmany;
2826    }
2827
2828    static int gfar_poll(struct napi_struct *napi, int budget)
2829    {
2830            struct gfar_priv_grp *gfargrp =
2831                    container_of(napi, struct gfar_priv_grp, napi);
2832            struct gfar_private *priv = gfargrp->priv;
2833            struct gfar __iomem *regs = gfargrp->regs;
(gdb) q

[/code]


The changes from Linux kernel 3.8, which seems to have proper working ehternet, to the current 3.10 seem to intruduce a bug in the GIANFAR driver: drivers/net/ethernet/freescale/gianfra.c
There were different changes in the NAPI of gianfar driver made between the two kernel versions. 
Please let us know which next troubleshooting step you would recommend to nail down the issue.

So far from troubleshooting.

Greetings Thomas

[-- Attachment #2: Type: text/html, Size: 5864 bytes --]

^ permalink raw reply

* Re: [PATCH] kvm: powerpc: book3s: Fix build break for BOOK3S_32
From: Paul Mackerras @ 2013-10-04 12:35 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <36F00CBE-E313-465A-BD93-A8E1B96FD2A9@suse.de>

On Fri, Oct 04, 2013 at 02:27:02PM +0200, Alexander Graf wrote:
> 
> On 04.10.2013, at 14:23, Alexander Graf wrote:
> 
> > 
> > On 03.10.2013, at 06:14, Paul Mackerras wrote:
> > 
> >> On Wed, Oct 02, 2013 at 08:08:44PM +0530, Aneesh Kumar K.V wrote:
> >>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >>> 
> >>> This was introduced by 85a0d845d8bb5df5d2669416212f56cbe1474c6b
> >> 
> >> It's a good idea to give the headline of the commit as well as the ID.
> >> I also like to trim the ID to 10 characters or so.  So it should look
> >> like this:
> >> 
> >> This was introduced by 85a0d845d8 ("KVM: PPC: Book3S PR: Allocate
> >> kvm_vcpu structs from kvm_vcpu_cache").
> >> 
> >>> arch/powerpc/kvm/book3s_pr.c: In function 'kvmppc_core_vcpu_create':
> >>> arch/powerpc/kvm/book3s_pr.c:1182:30: error: 'struct kvmppc_vcpu_book3s' has no member named 'shadow_vcpu'
> >>> make[1]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
> >>> 
> >>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> Acked-by: Paul Mackerras <paulus@samba.org>
> > 
> > Would you guys mind if I merge this into the offending patch? It's not trickled into -next yet, so rebasing should work.
> > 
> > If not, please resend with the fixed commit message.
> 
> Eh - I must've missed v2 :). So that leaves only the question on whether you'd be ok to squash the patch instead. It'd help bisectability.

I'm OK with that.  If you do, why don't you squash the first of the
two patches that I just sent into the commit it fixes as well?

Paul.

^ permalink raw reply

* Re: [PATCH] kvm: powerpc: book3s: Fix build break for BOOK3S_32
From: Alexander Graf @ 2013-10-04 13:00 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <20131004123545.GC26884@iris.ozlabs.ibm.com>


On 04.10.2013, at 14:35, Paul Mackerras wrote:

> On Fri, Oct 04, 2013 at 02:27:02PM +0200, Alexander Graf wrote:
>>=20
>> On 04.10.2013, at 14:23, Alexander Graf wrote:
>>=20
>>>=20
>>> On 03.10.2013, at 06:14, Paul Mackerras wrote:
>>>=20
>>>> On Wed, Oct 02, 2013 at 08:08:44PM +0530, Aneesh Kumar K.V wrote:
>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>>>>=20
>>>>> This was introduced by 85a0d845d8bb5df5d2669416212f56cbe1474c6b
>>>>=20
>>>> It's a good idea to give the headline of the commit as well as the =
ID.
>>>> I also like to trim the ID to 10 characters or so.  So it should =
look
>>>> like this:
>>>>=20
>>>> This was introduced by 85a0d845d8 ("KVM: PPC: Book3S PR: Allocate
>>>> kvm_vcpu structs from kvm_vcpu_cache").
>>>>=20
>>>>> arch/powerpc/kvm/book3s_pr.c: In function =
'kvmppc_core_vcpu_create':
>>>>> arch/powerpc/kvm/book3s_pr.c:1182:30: error: 'struct =
kvmppc_vcpu_book3s' has no member named 'shadow_vcpu'
>>>>> make[1]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
>>>>>=20
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>>=20
>>>> Acked-by: Paul Mackerras <paulus@samba.org>
>>>=20
>>> Would you guys mind if I merge this into the offending patch? It's =
not trickled into -next yet, so rebasing should work.
>>>=20
>>> If not, please resend with the fixed commit message.
>>=20
>> Eh - I must've missed v2 :). So that leaves only the question on =
whether you'd be ok to squash the patch instead. It'd help =
bisectability.
>=20
> I'm OK with that.  If you do, why don't you squash the first of the
> two patches that I just sent into the commit it fixes as well?

Because patch 1/2 spans two separate commits it would have to get =
squashed into (6aa82e, 70afec) and patch 2/2 doesn't make sense to get =
squashed anywhere :).


Alex

^ permalink raw reply

* Re: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in lookup_linux_pte
From: Alexander Graf @ 2013-10-04 13:27 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: kvm, kvm-ppc, Bharat Bhushan, paulus, scottwood, linuxppc-dev
In-Reply-To: <1379570566-3715-5-git-send-email-Bharat.Bhushan@freescale.com>


On 19.09.2013, at 08:02, Bharat Bhushan wrote:

> lookup_linux_pte() was searching for a pte and also sets access
> flags is writable. This function now searches only pte while
> access flag setting is done explicitly.
>=20
> This pte lookup is not kvm specific, so moved to common code =
(asm/pgtable.h)
> My Followup patch will use this on booke.
>=20
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> v4->v5
> - No change
>=20
> arch/powerpc/include/asm/pgtable.h  |   24 +++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv_rm_mmu.c |   36 =
+++++++++++-----------------------
> 2 files changed, 36 insertions(+), 24 deletions(-)
>=20
> diff --git a/arch/powerpc/include/asm/pgtable.h =
b/arch/powerpc/include/asm/pgtable.h
> index 7d6eacf..3a5de5c 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -223,6 +223,30 @@ extern int gup_hugepte(pte_t *ptep, unsigned long =
sz, unsigned long addr,
> #endif
> pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
> 				 unsigned *shift);
> +
> +static inline pte_t *lookup_linux_pte(pgd_t *pgdir, unsigned long =
hva,
> +				     unsigned long *pte_sizep)
> +{
> +	pte_t *ptep;
> +	unsigned long ps =3D *pte_sizep;
> +	unsigned int shift;
> +
> +	ptep =3D find_linux_pte_or_hugepte(pgdir, hva, &shift);
> +	if (!ptep)
> +		return __pte(0);

This returns a struct pte_t, but your return value of the function is a =
struct pte_t *. So this code will fail compiling with =
STRICT_MM_TYPECHECKS set. Any reason you don't just return NULL here?

That way callers could simply check on if (ptep) ... or you leave the =
return value as struct pte_t.


Alex

> +	if (shift)
> +		*pte_sizep =3D 1ul << shift;
> +	else
> +		*pte_sizep =3D PAGE_SIZE;
> +
> +	if (ps > *pte_sizep)
> +		return __pte(0);
> +
> +	if (!pte_present(*ptep))
> +		return __pte(0);

> +
> +	return ptep;
> +}
> #endif /* __ASSEMBLY__ */
>=20
> #endif /* __KERNEL__ */
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c =
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 45e30d6..74fa7f8 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -134,25 +134,6 @@ static void remove_revmap_chain(struct kvm *kvm, =
long pte_index,
> 	unlock_rmap(rmap);
> }
>=20
> -static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> -			      int writing, unsigned long *pte_sizep)
> -{
> -	pte_t *ptep;
> -	unsigned long ps =3D *pte_sizep;
> -	unsigned int hugepage_shift;
> -
> -	ptep =3D find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift);
> -	if (!ptep)
> -		return __pte(0);
> -	if (hugepage_shift)
> -		*pte_sizep =3D 1ul << hugepage_shift;
> -	else
> -		*pte_sizep =3D PAGE_SIZE;
> -	if (ps > *pte_sizep)
> -		return __pte(0);
> -	return kvmppc_read_update_linux_pte(ptep, writing, =
hugepage_shift);
> -}
> -
> static inline void unlock_hpte(unsigned long *hpte, unsigned long =
hpte_v)
> {
> 	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
> @@ -173,6 +154,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned =
long flags,
> 	unsigned long is_io;
> 	unsigned long *rmap;
> 	pte_t pte;
> +	pte_t *ptep;
> 	unsigned int writing;
> 	unsigned long mmu_seq;
> 	unsigned long rcbits;
> @@ -231,8 +213,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned =
long flags,
>=20
> 		/* Look up the Linux PTE for the backing page */
> 		pte_size =3D psize;
> -		pte =3D lookup_linux_pte(pgdir, hva, writing, =
&pte_size);
> -		if (pte_present(pte)) {
> +		ptep =3D lookup_linux_pte(pgdir, hva, &pte_size);
> +		if (pte_present(pte_val(*ptep))) {
> +			pte =3D kvmppc_read_update_linux_pte(ptep, =
writing);
> 			if (writing && !pte_write(pte))
> 				/* make the actual HPTE be read-only */
> 				ptel =3D hpte_make_readonly(ptel);
> @@ -661,15 +644,20 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, =
unsigned long flags,
> 			struct kvm_memory_slot *memslot;
> 			pgd_t *pgdir =3D vcpu->arch.pgdir;
> 			pte_t pte;
> +			pte_t *ptep;
>=20
> 			psize =3D hpte_page_size(v, r);
> 			gfn =3D ((r & HPTE_R_RPN) & ~(psize - 1)) >> =
PAGE_SHIFT;
> 			memslot =3D __gfn_to_memslot(kvm_memslots(kvm), =
gfn);
> 			if (memslot) {
> 				hva =3D __gfn_to_hva_memslot(memslot, =
gfn);
> -				pte =3D lookup_linux_pte(pgdir, hva, 1, =
&psize);
> -				if (pte_present(pte) && !pte_write(pte))
> -					r =3D hpte_make_readonly(r);
> +				ptep =3D lookup_linux_pte(pgdir, hva, =
&psize);
> +				if (pte_present(pte_val(*ptep))) {
> +					pte =3D =
kvmppc_read_update_linux_pte(ptep,
> +									 =
  1);
> +					if (pte_present(pte) && =
!pte_write(pte))
> +						r =3D =
hpte_make_readonly(r);
> +				}
> 			}
> 		}
> 	}
> --=20
> 1.7.0.4
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in lookup_linux_pte
From: Bhushan Bharat-R65777 @ 2013-10-04 13:44 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Wood Scott-B07421, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	paulus@samba.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <7DEA9422-69D4-4470-BE28-A3B0A623D000@suse.de>



> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, October 04, 2013 6:57 PM
> To: Bhushan Bharat-R65777
> Cc: benh@kernel.crashing.org; paulus@samba.org; kvm@vger.kernel.org; kvm-
> ppc@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Bh=
ushan
> Bharat-R65777
> Subject: Re: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in
> lookup_linux_pte
>=20
>=20
> On 19.09.2013, at 08:02, Bharat Bhushan wrote:
>=20
> > lookup_linux_pte() was searching for a pte and also sets access flags
> > is writable. This function now searches only pte while access flag
> > setting is done explicitly.
> >
> > This pte lookup is not kvm specific, so moved to common code
> > (asm/pgtable.h) My Followup patch will use this on booke.
> >
> > Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> > ---
> > v4->v5
> > - No change
> >
> > arch/powerpc/include/asm/pgtable.h  |   24 +++++++++++++++++++++++
> > arch/powerpc/kvm/book3s_hv_rm_mmu.c |   36 +++++++++++-----------------=
------
> > 2 files changed, 36 insertions(+), 24 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 7d6eacf..3a5de5c 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -223,6 +223,30 @@ extern int gup_hugepte(pte_t *ptep, unsigned long
> > sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t
> > *pgdir, unsigned long ea,
> > 				 unsigned *shift);
> > +
> > +static inline pte_t *lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> > +				     unsigned long *pte_sizep)
> > +{
> > +	pte_t *ptep;
> > +	unsigned long ps =3D *pte_sizep;
> > +	unsigned int shift;
> > +
> > +	ptep =3D find_linux_pte_or_hugepte(pgdir, hva, &shift);
> > +	if (!ptep)
> > +		return __pte(0);
>=20
> This returns a struct pte_t, but your return value of the function is a s=
truct
> pte_t *. So this code will fail compiling with STRICT_MM_TYPECHECKS set. =
Any
> reason you don't just return NULL here?

I want to return the ptep (pte pointer) , so yes this should be NULL.
Will correct this.

Thanks
-Bharat

>=20
> That way callers could simply check on if (ptep) ... or you leave the ret=
urn
> value as struct pte_t.
>=20
>=20
> Alex
>=20
> > +	if (shift)
> > +		*pte_sizep =3D 1ul << shift;
> > +	else
> > +		*pte_sizep =3D PAGE_SIZE;
> > +
> > +	if (ps > *pte_sizep)
> > +		return __pte(0);
> > +
> > +	if (!pte_present(*ptep))
> > +		return __pte(0);
>=20
> > +
> > +	return ptep;
> > +}
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* __KERNEL__ */
> > diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > index 45e30d6..74fa7f8 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > @@ -134,25 +134,6 @@ static void remove_revmap_chain(struct kvm *kvm, l=
ong
> pte_index,
> > 	unlock_rmap(rmap);
> > }
> >
> > -static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> > -			      int writing, unsigned long *pte_sizep)
> > -{
> > -	pte_t *ptep;
> > -	unsigned long ps =3D *pte_sizep;
> > -	unsigned int hugepage_shift;
> > -
> > -	ptep =3D find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift);
> > -	if (!ptep)
> > -		return __pte(0);
> > -	if (hugepage_shift)
> > -		*pte_sizep =3D 1ul << hugepage_shift;
> > -	else
> > -		*pte_sizep =3D PAGE_SIZE;
> > -	if (ps > *pte_sizep)
> > -		return __pte(0);
> > -	return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift);
> > -}
> > -
> > static inline void unlock_hpte(unsigned long *hpte, unsigned long
> > hpte_v) {
> > 	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory"); @@ -173,6 +154,7
> > @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
> > 	unsigned long is_io;
> > 	unsigned long *rmap;
> > 	pte_t pte;
> > +	pte_t *ptep;
> > 	unsigned int writing;
> > 	unsigned long mmu_seq;
> > 	unsigned long rcbits;
> > @@ -231,8 +213,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned
> > long flags,
> >
> > 		/* Look up the Linux PTE for the backing page */
> > 		pte_size =3D psize;
> > -		pte =3D lookup_linux_pte(pgdir, hva, writing, &pte_size);
> > -		if (pte_present(pte)) {
> > +		ptep =3D lookup_linux_pte(pgdir, hva, &pte_size);
> > +		if (pte_present(pte_val(*ptep))) {
> > +			pte =3D kvmppc_read_update_linux_pte(ptep, writing);
> > 			if (writing && !pte_write(pte))
> > 				/* make the actual HPTE be read-only */
> > 				ptel =3D hpte_make_readonly(ptel);
> > @@ -661,15 +644,20 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsi=
gned
> long flags,
> > 			struct kvm_memory_slot *memslot;
> > 			pgd_t *pgdir =3D vcpu->arch.pgdir;
> > 			pte_t pte;
> > +			pte_t *ptep;
> >
> > 			psize =3D hpte_page_size(v, r);
> > 			gfn =3D ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
> > 			memslot =3D __gfn_to_memslot(kvm_memslots(kvm), gfn);
> > 			if (memslot) {
> > 				hva =3D __gfn_to_hva_memslot(memslot, gfn);
> > -				pte =3D lookup_linux_pte(pgdir, hva, 1, &psize);
> > -				if (pte_present(pte) && !pte_write(pte))
> > -					r =3D hpte_make_readonly(r);
> > +				ptep =3D lookup_linux_pte(pgdir, hva, &psize);
> > +				if (pte_present(pte_val(*ptep))) {
> > +					pte =3D kvmppc_read_update_linux_pte(ptep,
> > +									   1);
> > +					if (pte_present(pte) && !pte_write(pte))
> > +						r =3D hpte_make_readonly(r);
> > +				}
> > 			}
> > 		}
> > 	}
> > --
> > 1.7.0.4
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> > the body of a message to majordomo@vger.kernel.org More majordomo info
> > at  http://vger.kernel.org/majordomo-info.html
>=20

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox