The Linux Kernel Mailing List

* Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked
From: Michal Hocko @ 2012-11-30 16:53 UTC
  To: azurIt
  Cc: linux-kernel, linux-mm, cgroups mailinglist, KAMEZAWA Hiroyuki,
	Johannes Weiner
In-Reply-To: <20121130172651.B6917602@pobox.sk>

On Fri 30-11-12 17:26:51, azurIt wrote:
> >Could you also post your complete containers configuration, maybe there
> >is something strange in there (basically grep . -r YOUR_CGROUP_MNT
> >except for tasks files which are of no use right now).
> 
> 
> Here it is:
> http://www.watchdog.sk/lkml/cgroups.gz

The only strange thing I noticed is that some groups have 0 limit. Is
this intentional?
grep memory.limit_in_bytes cgroups | grep -v uid | sed 's@.*/@@' | sort | uniq -c
      3 memory.limit_in_bytes:0
    254 memory.limit_in_bytes:104857600
    107 memory.limit_in_bytes:157286400
     68 memory.limit_in_bytes:209715200
     10 memory.limit_in_bytes:262144000
     28 memory.limit_in_bytes:314572800
      1 memory.limit_in_bytes:346030080
      1 memory.limit_in_bytes:524288000
      2 memory.limit_in_bytes:9223372036854775807
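
For readers comparing the raw byte counts above, here is a small C sketch
(not part of the original mail) of how those values map to round MiB
figures. The helper names are hypothetical, and treating
9223372036854775807 (LLONG_MAX) as "no limit set" is an assumption about
how this kernel reports an unconfigured limit.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names for illustration; not kernel or cgroup API. */
enum limit_kind { LIMIT_ZERO, LIMIT_FINITE, LIMIT_UNLIMITED };

/* Classify a memory.limit_in_bytes value as printed in the tally. */
static enum limit_kind classify_limit(unsigned long long bytes)
{
	if (bytes == 0)
		return LIMIT_ZERO;
	/* Assumption: LLONG_MAX (or more) means no limit was configured. */
	if (bytes >= (unsigned long long)LLONG_MAX)
		return LIMIT_UNLIMITED;
	return LIMIT_FINITE;
}

/* Convert a finite limit to whole MiB (1 MiB = 1048576 bytes). */
static unsigned long long limit_to_mib(unsigned long long bytes)
{
	assert(classify_limit(bytes) == LIMIT_FINITE);
	return bytes >> 20;
}
```

Under these assumptions the 254 groups at 104857600 bytes sit at 100 MiB,
and the three groups reporting 0 are the only ones that can hold no memory
at all.
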
-- 
Michal Hocko
SUSE Labs

* Re: FW: [PATCH v2] mmc: sdhci: apply voltage range check only for non-fixed regulators
From: Chris Ball @ 2012-11-30 16:48 UTC
  To: Marek Szyprowski
  Cc: Kevin Liu, linux-mmc, linux-kernel, kyungmin.park, Mark Brown,
	lrg, Philip Rakity
In-Reply-To: <50AB9B90.1030404@samsung.com>

Hi Marek,

On Tue, Nov 20 2012, Marek Szyprowski wrote:
> The problem with dummy regulator is the fact that it can be enabled only
> globally for all devices in the system. I think that the best solution
> would be to introduce regulator_can_change_voltage() as Mark suggested.
> I will post patches soon.

Does this mean that I shouldn't merge either yours or Kevin's patch for
3.8, while we wait for this?  Any ETA on it?

Thanks very much,

- Chris.
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

* Re: [PATCH] vfio powerpc: enabled on powernv platform
From: Alex Williamson @ 2012-11-30 16:48 UTC
  To: Alexey Kardashevskiy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	linux-kernel, David Gibson
In-Reply-To: <1354256043-24963-1-git-send-email-aik@ozlabs.ru>

On Fri, 2012-11-30 at 17:14 +1100, Alexey Kardashevskiy wrote:
> This patch initializes IOMMU groups based on the IOMMU
> configuration discovered during the PCI scan on POWERNV
> (POWER non virtualized) platform. The IOMMU groups are
> to be used later by VFIO driver (PCI pass through).
> 
> It also implements an API for mapping/unmapping pages for
> guest PCI drivers and providing DMA window properties.
> This API is going to be used later by QEMU-VFIO to handle
> h_put_tce hypercalls from the KVM guest.
> 
> Although this driver has been tested only on the POWERNV
> platform, it should work on any platform which supports
> TCE tables.
> 
> To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
> option and configure VFIO as required.
> 
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/iommu.h     |    9 ++
>  arch/powerpc/kernel/iommu.c          |  186 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/pci.c |  135 ++++++++++++++++++++++++
>  drivers/iommu/Kconfig                |    8 ++
>  4 files changed, 338 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index cbfe678..5c7087a 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -76,6 +76,9 @@ struct iommu_table {
>  	struct iommu_pool large_pool;
>  	struct iommu_pool pools[IOMMU_NR_POOLS];
>  	unsigned long *it_map;       /* A simple allocation bitmap for now */
> +#ifdef CONFIG_IOMMU_API
> +	struct iommu_group *it_group;
> +#endif
>  };
>  
>  struct scatterlist;
> @@ -147,5 +150,11 @@ static inline void iommu_restore(void)
>  }
>  #endif
>  
> +extern long iommu_clear_tces(struct iommu_table *tbl, unsigned long entry,
> +		unsigned long pages);
> +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry,
> +		uint64_t tce, enum dma_data_direction direction,
> +		unsigned long pages);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_IOMMU_H */
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ff5a6ce..0646c50 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -44,6 +44,7 @@
>  #include <asm/kdump.h>
>  #include <asm/fadump.h>
>  #include <asm/vio.h>
> +#include <asm/tce.h>
>  
>  #define DBG(...)
>  
> @@ -856,3 +857,188 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
>  		free_pages((unsigned long)vaddr, get_order(size));
>  	}
>  }
> +
> +#ifdef CONFIG_IOMMU_API
> +/*
> + * SPAPR TCE API
> + */
> +
> +/*
> + * Returns the number of used IOMMU pages (4K) within
> + * the same system page (4K or 64K).
> + * bitmap_weight is not used as it does not support bigendian maps.
> + */
> +static int syspage_weight(unsigned long *map, unsigned long entry)
> +{
> +	int ret = 0, nbits = PAGE_SIZE/IOMMU_PAGE_SIZE;
> +
> +	/* Aligns TCE entry number to system page boundary */
> +	entry &= PAGE_MASK >> IOMMU_PAGE_SHIFT;
> +
> +	/* Count used 4K pages */
> +	while (nbits--)
> +		ret += (test_bit(entry++, map) == 0) ? 0 : 1;

Ok, entry is the iova page number.  So presumably it's relative to the
start of dma32_window_start since you're unlikely to have a bitmap that
covers all of memory.  I hadn't realized that previously.  Doesn't that
mean that it's actually impossible to create an ioctl based interface to
the dma64_window since we're not going to know which window is the
target?  I know you're not planning on one, but it seems limiting.  We
at least need some documentation here, but I'm wondering if iova
shouldn't be zero based so we can determine which window it hits.  Also,
now that I look at it, I can't find any range checking on the iova.
Thanks,

Alex
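
The two points raised above (aligning an entry number to a system page,
and range checking the iova) can be sketched in user-space C. This is an
illustration only, not code from the patch: the sk_ names, the 4K/64K page
sizes and the window structure are all hypothetical, and the sketch assumes
zero-based entry numbers, which is the alternative suggested in the review
rather than what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants: 4K IOMMU pages inside 64K system pages. */
#define SK_IOMMU_PAGE_SHIFT 12
#define SK_PAGE_SHIFT       16
#define SK_PAGE_MASK        (~((1UL << SK_PAGE_SHIFT) - 1))

/* Hypothetical description of one DMA window, in IOMMU-page units. */
struct sk_window {
	unsigned long first_entry;	/* first valid entry number */
	unsigned long nr_entries;	/* number of entries in the window */
};

/* Round an entry number down to the first entry of its system page. */
static unsigned long sk_align_entry(unsigned long entry)
{
	return entry & (SK_PAGE_MASK >> SK_IOMMU_PAGE_SHIFT);
}

/* True if [entry, entry + pages) lies entirely inside the window. */
static bool sk_range_ok(const struct sk_window *w,
			unsigned long entry, unsigned long pages)
{
	assert(w != NULL);
	if (pages == 0 || entry < w->first_entry)
		return false;
	/* Compare against the remaining room so entry + pages cannot wrap. */
	entry -= w->first_entry;
	return entry < w->nr_entries && pages <= w->nr_entries - entry;
}
```

With zero-based entries a caller could test each window in turn and pick
the one that contains the request, which is what would let an interface
tell a 32-bit window from a 64-bit one.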

> +
> +	return ret;
> +}
> +
> +static void tce_flush(struct iommu_table *tbl)
> +{
> +	/* Flush/invalidate TLB caches if necessary */
> +	if (ppc_md.tce_flush)
> +		ppc_md.tce_flush(tbl);
> +
> +	/* Make sure updates are seen by hardware */
> +	mb();
> +}
> +
> +/*
> + * iommu_clear_tces clears tces and returned the number of system pages
> + * which it called put_page() on
> + */
> +static long clear_tces_nolock(struct iommu_table *tbl, unsigned long entry,
> +		unsigned long pages)
> +{
> +	int i, retpages = 0;
> +	unsigned long oldtce, oldweight;
> +	struct page *page;
> +
> +	for (i = 0; i < pages; ++i) {
> +		oldtce = ppc_md.tce_get(tbl, entry + i);
> +		ppc_md.tce_free(tbl, entry + i, 1);
> +
> +		oldweight = syspage_weight(tbl->it_map, entry);
> +		__clear_bit(entry, tbl->it_map);
> +
> +		if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
> +			continue;
> +
> +		page = pfn_to_page(oldtce >> PAGE_SHIFT);
> +
> +		WARN_ON(!page);
> +		if (!page)
> +			continue;
> +
> +		if (oldtce & TCE_PCI_WRITE)
> +			SetPageDirty(page);
> +
> +		put_page(page);
> +
> +		/* That was the last IOMMU page within the system page */
> +		if ((oldweight == 1) && !syspage_weight(tbl->it_map, entry))
> +			++retpages;
> +	}
> +
> +	return retpages;
> +}
> +
> +/*
> + * iommu_clear_tces clears tces and returns the number
> + * of released system pages
> + */
> +long iommu_clear_tces(struct iommu_table *tbl, unsigned long entry,
> +		unsigned long pages)
> +{
> +	int ret;
> +	struct iommu_pool *pool = get_pool(tbl, entry);
> +
> +	spin_lock(&(pool->lock));
> +	ret = clear_tces_nolock(tbl, entry, pages);
> +	tce_flush(tbl);
> +	spin_unlock(&(pool->lock));
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_clear_tces);
> +
> +static int put_tce(struct iommu_table *tbl, unsigned long entry,
> +		uint64_t tce, enum dma_data_direction direction)
> +{
> +	int ret;
> +	struct page *page = NULL;
> +	unsigned long kva, offset, oldweight;
> +
> +	/* Map new TCE */
> +	offset = (tce & IOMMU_PAGE_MASK) - (tce & PAGE_MASK);
> +	ret = get_user_pages_fast(tce & PAGE_MASK, 1,
> +			direction != DMA_TO_DEVICE, &page);
> +	if (ret < 1) {
> +		printk(KERN_ERR "tce_vfio: get_user_pages_fast failed tce=%llx ioba=%lx ret=%d\n",
> +				tce, entry << IOMMU_PAGE_SHIFT, ret);
> +		if (!ret || (ret > 1))
> +			ret = -EFAULT;
> +		return ret;
> +	}
> +
> +	kva = (unsigned long) page_address(page);
> +	kva += offset;
> +
> +	/* tce_build receives a virtual address */
> +	entry += tbl->it_offset; /* Offset into real TCE table */
> +	ret = ppc_md.tce_build(tbl, entry, 1, kva, direction, NULL);
> +
> +	/* tce_build() only returns non-zero for transient errors */
> +	if (unlikely(ret)) {
> +		printk(KERN_ERR "tce_vfio: tce_put failed on tce=%llx ioba=%lx kva=%lx ret=%d\n",
> +				tce, entry << IOMMU_PAGE_SHIFT, kva, ret);
> +		put_page(page);
> +		return -EIO;
> +	}
> +
> +	/* Calculate if new system page has been locked */
> +	oldweight = syspage_weight(tbl->it_map, entry);
> +	__set_bit(entry, tbl->it_map);
> +
> +	return (oldweight == 0) ? 1 : 0;
> +}
> +
> +/*
> + * iommu_put_tces builds tces and returned the number of actually
> + * locked system pages
> + */
> +long iommu_put_tces(struct iommu_table *tbl, unsigned long entry,
> +		uint64_t tce, enum dma_data_direction direction,
> +		unsigned long pages)
> +{
> +	int i, ret = 0, retpages = 0;
> +	struct iommu_pool *pool = get_pool(tbl, entry);
> +
> +	BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
> +	BUG_ON(direction == DMA_NONE);
> +
> +	spin_lock(&(pool->lock));
> +
> +	/* Check if any is in use */
> +	for (i = 0; i < pages; ++i) {
> +		unsigned long oldtce = ppc_md.tce_get(tbl, entry + i);
> +		if ((oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)) ||
> +				test_bit(entry + i, tbl->it_map)) {
> +			WARN_ON(test_bit(entry + i, tbl->it_map));
> +			spin_unlock(&(pool->lock));
> +			return -EBUSY;
> +		}
> +	}
> +
> +	/* Put tces to the table */
> +	for (i = 0; (i < pages) && (ret >= 0); ++i, tce += IOMMU_PAGE_SIZE) {
> +		ret = put_tce(tbl, entry + i, tce, direction);
> +		if (ret == 1)
> +			++retpages;
> +	}
> +
> +	/*
> +	 * If failed, release locked pages, otherwise return the number
> +	 * of locked system pages
> +	 */
> +	if (ret < 0)
> +		clear_tces_nolock(tbl, entry, i);
> +	else
> +		ret = retpages;
> +
> +	tce_flush(tbl);
> +	spin_unlock(&(pool->lock));
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_put_tces);
> +#endif /* CONFIG_IOMMU_API */
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 05205cf..21250ef 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -20,6 +20,7 @@
>  #include <linux/irq.h>
>  #include <linux/io.h>
>  #include <linux/msi.h>
> +#include <linux/iommu.h>
>  
>  #include <asm/sections.h>
>  #include <asm/io.h>
> @@ -613,3 +614,137 @@ void __init pnv_pci_init(void)
>  	ppc_md.teardown_msi_irqs = pnv_teardown_msi_irqs;
>  #endif
>  }
> +
> +#ifdef CONFIG_IOMMU_API
> +/*
> + * IOMMU groups support required by VFIO
> + */
> +static int add_device(struct device *dev)
> +{
> +	struct iommu_table *tbl;
> +	int ret = 0;
> +
> +	if (WARN_ON(dev->iommu_group)) {
> +		printk(KERN_WARNING "tce_vfio: device %s is already in iommu group %d, skipping\n",
> +				dev_name(dev),
> +				iommu_group_id(dev->iommu_group));
> +		return -EBUSY;
> +	}
> +
> +	tbl = get_iommu_table_base(dev);
> +	if (!tbl) {
> +		pr_debug("tce_vfio: skipping device %s with no tbl\n",
> +				dev_name(dev));
> +		return 0;
> +	}
> +
> +	pr_debug("tce_vfio: adding %s to iommu group %d\n",
> +			dev_name(dev), iommu_group_id(tbl->it_group));
> +
> +	ret = iommu_group_add_device(tbl->it_group, dev);
> +	if (ret < 0)
> +		printk(KERN_ERR "tce_vfio: %s has not been added, ret=%d\n",
> +				dev_name(dev), ret);
> +
> +	return ret;
> +}
> +
> +static void del_device(struct device *dev)
> +{
> +	iommu_group_remove_device(dev);
> +}
> +
> +static int iommu_bus_notifier(struct notifier_block *nb,
> +			      unsigned long action, void *data)
> +{
> +	struct device *dev = data;
> +
> +	switch (action) {
> +	case BUS_NOTIFY_ADD_DEVICE:
> +		return add_device(dev);
> +	case BUS_NOTIFY_DEL_DEVICE:
> +		del_device(dev);
> +		return 0;
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static struct notifier_block tce_iommu_bus_nb = {
> +	.notifier_call = iommu_bus_notifier,
> +};
> +
> +static void group_release(void *iommu_data)
> +{
> +	struct iommu_table *tbl = iommu_data;
> +	tbl->it_group = NULL;
> +}
> +
> +static int __init tce_iommu_init(void)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct iommu_table *tbl;
> +	struct iommu_group *grp;
> +
> +	/* Allocate and initialize IOMMU groups */
> +	for_each_pci_dev(pdev) {
> +		tbl = get_iommu_table_base(&pdev->dev);
> +		if (!tbl)
> +			continue;
> +
> +		/* Skip already initialized */
> +		if (tbl->it_group)
> +			continue;
> +
> +		grp = iommu_group_alloc();
> +		if (IS_ERR(grp)) {
> +			printk(KERN_INFO "tce_vfio: cannot create "
> +					"new IOMMU group, ret=%ld\n",
> +					PTR_ERR(grp));
> +			return PTR_ERR(grp);
> +		}
> +		tbl->it_group = grp;
> +		iommu_group_set_iommudata(grp, tbl, group_release);
> +	}
> +
> +	bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
> +
> +	/* Add PCI devices to VFIO groups */
> +	for_each_pci_dev(pdev)
> +		add_device(&pdev->dev);
> +
> +	return 0;
> +}
> +
> +static void __exit tce_iommu_cleanup(void)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct iommu_table *tbl;
> +	struct iommu_group *grp = NULL;
> +
> +	bus_unregister_notifier(&pci_bus_type, &tce_iommu_bus_nb);
> +
> +	/* Delete PCI devices from VFIO groups */
> +	for_each_pci_dev(pdev)
> +		del_device(&pdev->dev);
> +
> +	/* Release VFIO groups */
> +	for_each_pci_dev(pdev) {
> +		tbl = get_iommu_table_base(&pdev->dev);
> +		if (!tbl)
> +			continue;
> +		grp = tbl->it_group;
> +
> +		/* Skip (already) uninitialized */
> +		if (!grp)
> +			continue;
> +
> +		/* Do actual release, group_release() is expected to work */
> +		iommu_group_put(grp);
> +		BUG_ON(tbl->it_group);
> +	}
> +}
> +
> +module_init(tce_iommu_init);
> +module_exit(tce_iommu_cleanup);
> +#endif /* CONFIG_IOMMU_API */
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 9f69b56..29d11dc 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -187,4 +187,12 @@ config EXYNOS_IOMMU_DEBUG
>  
>  	  Say N unless you need kernel log message for IOMMU debugging
>  
> +config SPAPR_TCE_IOMMU
> +	bool "sPAPR TCE IOMMU Support"
> +	depends on PPC_POWERNV
> +	select IOMMU_API
> +	help
> +	  Enables bits of IOMMU API required by VFIO. The iommu_ops is
> +	  still not implemented.
> +
>  endif # IOMMU_SUPPORT




* Re: [PATCH 12/12] VMCI: Some header and config files.
From: Andy King @ 2012-11-30 16:47 UTC
  To: Greg KH
  Cc: George Zhang, linux-kernel, virtualization, pv-drivers,
	Dmitry Torokhov
In-Reply-To: <20121127002357.GA27683@core.coreip.homeip.net>

I didn't get the resend either, so it seems our corporate mail really is
eating messages.  Lovely.

> > > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
> > 
> > I don't recall ever getting a valid answer for this (if you did, my
> > apologies, can you repeat it).  What in the world are you talking
> > about here?  Why is your driver somehow special from the thousands
> > of other ones that use the in-kernel IO macros properly for an
> > ioctl?

Because we're morons.  And unfortunately, we've shipped our product
using those broken definitions: our VMX uses them to talk to the driver.
So here's what we'd like to do.  We will send out a patch soon that
fixes the other issues you mention and also adds IOCTL definitions the
proper way using _IOBLAH().  But we'd also like to retain these broken
definitions for a short period, commented as such, at least until we
can get out a patch release to Workstation 9, at which point we can
remove them.  Does that sound reasonable?

Thanks!
- Andy
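
As a rough picture of the plan described above (proper definitions via the
_IO*() macros, with the old ones kept for a while and marked as such), a
sketch might look like the following. Only the _IO*()/_IOC_*() macros and
the quoted IOCTLCMD() line are real; the magic number, command names,
structure and legacy value are hypothetical and are not taken from the VMCI
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>

/* Hypothetical magic number and argument type, for illustration only. */
#define SK_VMCI_IOC_MAGIC 0xA7

struct sk_vmci_version {
	unsigned int major;
	unsigned int minor;
};

/* New style: direction, magic, number and argument size are all encoded. */
#define SK_VMCI_IOC_GET_VERSION \
	_IOR(SK_VMCI_IOC_MAGIC, 1, struct sk_vmci_version)
#define SK_VMCI_IOC_SET_NOTIFY \
	_IOW(SK_VMCI_IOC_MAGIC, 2, unsigned long)

/* Legacy style, as quoted in the review: a bare number behind a macro. */
#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
#define IOCTL_VMCI_SK_VERSION 1951	/* deprecated; hypothetical value */

/* Returns 1 if cmd carries our magic in the new encoding, else 0. */
static int sk_vmci_is_new_style(unsigned int cmd)
{
	return _IOC_TYPE(cmd) == SK_VMCI_IOC_MAGIC;
}

/* Argument size encoded in a new-style command. */
static size_t sk_vmci_arg_size(unsigned int cmd)
{
	assert(sk_vmci_is_new_style(cmd));
	return _IOC_SIZE(cmd);
}
```

A dispatcher could accept both forms during the transition and drop the
legacy branch once the patched user-space release is out.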

* Re: SPARC and OF_GPIO
From: David Miller @ 2012-11-30 16:46 UTC
  To: grant.likely; +Cc: thierry.reding, sparclinux, linus.walleij, linux-kernel
In-Reply-To: <20121130093520.631503E070C@localhost>

From: Grant Likely <grant.likely@secretlab.ca>
Date: Fri, 30 Nov 2012 09:35:20 +0000

> On non-sparc I've actually been moving in the direction of resolving
> resources at .probe time to make it easier to handle deferred probing.
> So if, for example, a device irq line is routed to a GPIO instead of the
> core interrupt controller, then the irq number won't be known until
> after the gpio driver .probe occurs. For addresses, this situation is
> unlikely, but for all the other kinds of resources (gpios, regs, clocks, irqs,
> etc) it is a problem that we're actually seeing.

Every interrupt in the device tree is resolvable with, at worst, very
small bus drivers, and that's what we pack into the generic sparc OF
device creation layer.

Actually much of it is generic and not bus type specific at all, and
is simply a mask and match into an interrupt routing table property.
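
For readers unfamiliar with the "mask and match" step, the idea can be
sketched in plain C. This is a simplified illustration, not the sparc OF
code: the structure, the single-cell unit address and specifier, and all
sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-cell form of a row in an interrupt routing table. */
struct sk_imap_entry {
	uint32_t child_unit;	/* child unit address, already masked */
	uint32_t child_irq;	/* child interrupt specifier, already masked */
	uint32_t parent_irq;	/* interrupt number at the parent controller */
};

/*
 * Mask the child's unit address and interrupt specifier, then look for a
 * matching row. Returns 0 and stores the parent interrupt on success,
 * -1 if no row matches.
 */
static int sk_imap_lookup(const struct sk_imap_entry *map, size_t n,
			  uint32_t unit_mask, uint32_t irq_mask,
			  uint32_t unit, uint32_t irq, uint32_t *parent_irq)
{
	size_t i;

	assert(parent_irq != NULL);
	unit &= unit_mask;
	irq &= irq_mask;
	for (i = 0; i < n; i++) {
		if (map[i].child_unit == unit && map[i].child_irq == irq) {
			*parent_irq = map[i].parent_irq;
			return 0;
		}
	}
	return -1;
}
```

Nothing in the lookup depends on the bus type, only on the mask and the
table, which matches the point that much of this can live in generic code.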

* Re: [PATCH v2 RESEND] Add NumaChip remote PCI support
From: Bjorn Helgaas @ 2012-11-30 16:45 UTC
  To: Daniel J Blueman; +Cc: linux-pci, linux-kernel, Steffen Persvold
In-Reply-To: <50B84418.5070804@numascale-asia.com>

On Thu, Nov 29, 2012 at 10:28 PM, Daniel J Blueman
<daniel@numascale-asia.com> wrote:
> Hi Bjorn,
>
>
> On 29/11/2012 07:08, Bjorn Helgaas wrote:
>>
>> On Wed, Nov 21, 2012 at 1:39 AM, Daniel J Blueman
>> <daniel@numascale-asia.com> wrote:
>>>
>>> Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
>>> preventing access to AMD Northbridges which shouldn't respond.
>>>
>>> v2: Use PCI_DEVFN in precomputed constant limit; drop unneeded includes
>>>
>>> Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
>>> ---
>>>   arch/x86/include/asm/numachip/numachip.h |   20 +++++
>>>   arch/x86/kernel/apic/apic_numachip.c     |    2 +
>>>   arch/x86/pci/Makefile                    |    1 +
>>>   arch/x86/pci/numachip.c                  |  134
>>> ++++++++++++++++++++++++++++++
>>>   4 files changed, 157 insertions(+)
>>>   create mode 100644 arch/x86/include/asm/numachip/numachip.h
>>>   create mode 100644 arch/x86/pci/numachip.c
>>>
>>> diff --git a/arch/x86/include/asm/numachip/numachip.h
>>> b/arch/x86/include/asm/numachip/numachip.h
>>> new file mode 100644
>>> index 0000000..d35e71a
>>> --- /dev/null
>>> +++ b/arch/x86/include/asm/numachip/numachip.h
>>> @@ -0,0 +1,20 @@
>>> +/*
>>> + * This file is subject to the terms and conditions of the GNU General
>>> Public
>>> + * License.  See the file "COPYING" in the main directory of this
>>> archive
>>> + * for more details.
>>> + *
>>> + * Numascale NumaConnect-specific header file
>>> + *
>>> + * Copyright (C) 2012 Numascale AS. All rights reserved.
>>> + *
>>> + * Send feedback to <support@numascale.com>
>>> + *
>>> + */
>>> +
>>> +#ifndef _ASM_X86_NUMACHIP_NUMACHIP_H
>>> +#define _ASM_X86_NUMACHIP_NUMACHIP_H
>>> +
>>> +extern int __init pci_numachip_init(void);
>>> +
>>> +#endif /* _ASM_X86_NUMACHIP_NUMACHIP_H */
>>> +
>>> diff --git a/arch/x86/kernel/apic/apic_numachip.c
>>> b/arch/x86/kernel/apic/apic_numachip.c
>>> index a65829a..9c2aa89 100644
>>> --- a/arch/x86/kernel/apic/apic_numachip.c
>>> +++ b/arch/x86/kernel/apic/apic_numachip.c
>>> @@ -22,6 +22,7 @@
>>>   #include <linux/hardirq.h>
>>>   #include <linux/delay.h>
>>>
>>> +#include <asm/numachip/numachip.h>
>>>   #include <asm/numachip/numachip_csr.h>
>>>   #include <asm/smp.h>
>>>   #include <asm/apic.h>
>>> @@ -179,6 +180,7 @@ static int __init numachip_system_init(void)
>>>                  return 0;
>>>
>>>          x86_cpuinit.fixup_cpu_id = fixup_cpu_id;
>>> +       x86_init.pci.arch_init = pci_numachip_init;
>>>
>>>          map_csrs();
>>>
>>> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
>>> index 3af5a1e..ee0af58 100644
>>> --- a/arch/x86/pci/Makefile
>>> +++ b/arch/x86/pci/Makefile
>>> @@ -16,6 +16,7 @@ obj-$(CONFIG_STA2X11)           += sta2x11-fixup.o
>>>   obj-$(CONFIG_X86_VISWS)                += visws.o
>>>
>>>   obj-$(CONFIG_X86_NUMAQ)                += numaq_32.o
>>> +obj-$(CONFIG_X86_NUMACHIP)     += numachip.o
>>
>>
>> It looks like this depends on CONFIG_PCI_MMCONFIG for
>> pci_mmconfig_lookup().  Are there config constraints that force
>> CONFIG_PCI_MMCONFIG=y when CONFIG_X86_NUMACHIP=y?
>
>
> I'll revise the patch with this constraint after we work out the best
> approach for below.
>
>
>>>   obj-$(CONFIG_X86_INTEL_MID)    += mrst.o
>>>
>>> diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
>>> new file mode 100644
>>> index 0000000..3773e05
>>> --- /dev/null
>>> +++ b/arch/x86/pci/numachip.c
>>> @@ -0,0 +1,129 @@
>>> +/*
>>> + * This file is subject to the terms and conditions of the GNU General
>>> Public
>>> + * License.  See the file "COPYING" in the main directory of this
>>> archive
>>> + * for more details.
>>> + *
>>> + * Numascale NumaConnect-specific PCI code
>>> + *
>>> + * Copyright (C) 2012 Numascale AS. All rights reserved.
>>> + *
>>> + * Send feedback to <support@numascale.com>
>>> + *
>>> + * PCI accessor functions derived from mmconfig_64.c
>>> + *
>>> + */
>>> +
>>> +#include <linux/pci.h>
>>> +#include <asm/pci_x86.h>
>>> +
>>> +static u8 limit __read_mostly;
>>> +
>>> +static inline char __iomem *pci_dev_base(unsigned int seg, unsigned int
>>> bus, unsigned int devfn)
>>> +{
>>> +       struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);
>>> +
>>> +       if (cfg && cfg->virt)
>>> +               return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn <<
>>> 12));
>>> +       return NULL;
>>> +}
>>
>>
>> Most of this file is copied directly from mmconfig_64.c (as you
>> mentioned above).  I wonder if we could avoid the code duplication by
>> making the pci_dev_base() implementation in mmconfig_64.c a weak
>> definition.  Then you could just supply a non-weak pci_dev_base() here
>> that would override that default version.  Your version would look
>> something like:
>>
>>    char __iomem *pci_dev_base(unsigned int seg, unsigned int bus,
>> unsigned int devfn)
>>    {
>>        struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);
>>
>>        if (cfg && cfg->virt && devfn < limit)
>>            return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
>>        return NULL;
>>    }
>>
>> That would be different from what you have in this patch because reads
>> & writes to devices above "limit" would return -EINVAL rather than 0
>> as you do here.  Would that be a problem?
>
>
> That would work nicely (pointer lookup and inlining etc aside) if there was
> the runtime ability to override pci_dev_base only if the NumaChip signature
> was detected.
>
> We could expose pci_dev_base via struct x86_init_pci; the extra complexity
> and performance tradeoff may not be worth it for a single case perhaps?

Oh, right, I forgot that you can't decide this at build-time.  This is
PCI config access, which is not a performance path, so I'm not really
concerned about it from that angle, but you make a good point about
the complexity.

The reason I'm interested in this is because MMCONFIG is a generic
PCIe feature but is currently done via several arch-specific
implementations, so I'm starting to think about how we can make parts
of it more generic.  From that perspective, it's nicer to parameterize
an existing implementation than to clone it because it makes
refactoring opportunities more obvious.

Backing up a bit, I'm curious about exactly why you need to check for
the limit to begin with.  The comment says "Ensure AMD Northbridges
don't decode reads to other devices," but that doesn't seem strictly
accurate.  You're not changing anything in the hardware to prevent it
from *decoding* a read, so it seems like you're actually just
preventing the read in the first place.

What happens without the limit check?  Do you get a response timeout
and a machine check?  Read from the wrong device?

As far as I can tell, you still describe your MMCONFIG area with an
MCFG table (since you use pci_mmconfig_lookup() to find the region).
That table only includes the starting and ending bus numbers, so the
assumption is that the MMCONFIG space is valid for every possible
device on those buses.  So it seems like your system is not really
compatible with the spec here.

Because the MCFG table can't describe finer granularity than start/end
bus numbers, we manage MMCONFIG regions as (segment, start_bus,
end_bus, address) tuples.  Maybe if we tracked it with slightly finer
granularity, e.g., (segment, start_bus, end_bus, end_bus_device,
address), you could have some sort of MCFG-parsing quirk that reduces
the size of the MMCONFIG region you register for bus 0.

Just brainstorming here; it's not obvious to me yet what the best solution is.

Bjorn
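
To make the address arithmetic in this discussion concrete, here is a
user-space sketch of an MMCONFIG offset calculation with a per-function
cutoff, in the spirit of the pci_dev_base() variant quoted above. It is not
the NumaChip code: the sk_ names are hypothetical and the cutoff is simply a
parameter. Returning -1 stands in for the NULL return in the quoted version.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN(): 5-bit slot, 3-bit function. */
#define SK_DEVFN(slot, fn) ((((slot) & 0x1f) << 3) | ((fn) & 0x07))

/*
 * Offset of a config register inside an MMCONFIG region whose base
 * corresponds to bus 0: 1 MiB per bus, 4 KiB per function.
 * Returns -1 when devfn is at or above the (hypothetical) cutoff.
 */
static int64_t sk_mmcfg_offset(unsigned int bus, unsigned int devfn,
			       unsigned int reg, unsigned int devfn_limit)
{
	assert(bus < 256 && devfn < 256 && reg < 4096);
	if (devfn >= devfn_limit)
		return -1;
	return ((int64_t)bus << 20) | (devfn << 12) | reg;
}
```

Because the MCFG table only records start and end bus numbers, a cutoff
like devfn_limit has to come from somewhere else, which is the gap the
finer-grained tuple idea above is trying to close.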

* Re: [PATCH v2] Do a proper locking for mmap and block size change
From: Linus Torvalds @ 2012-11-30 16:42 UTC
  To: Chris Mason, Dave Chinner, Linus Torvalds, Chris Mason,
	Mikulas Patocka, Al Viro, Jens Axboe, Jeff Chua, Lai Jiangshan,
	Jan Kara, lkml, linux-fsdevel
In-Reply-To: <20121130143110.GD11004@shiny.int.fusionio.com>

On Fri, Nov 30, 2012 at 6:31 AM, Chris Mason <chris.mason@fusionio.com> wrote:
> On Thu, Nov 29, 2012 at 07:49:10PM -0700, Dave Chinner wrote:
>>
>> Same with mpage_readpages(), so it's not just direct IO that has
>> this problem....
>
> I guess the good news is that block devices don't have readpages.  The
> bad news would be that we can't put readpages in without much bigger
> changes.

Well, the new block-dev branch no longer cares. It basically says "ok,
we use inode->i_blkbits, but for raw device accesses we know it's
unstable, so we'll just not use it".

So both mpage_readpages and direct-IO should be happy. And it actually
removed redundant code, so it's all good.

                 Linus

* Re: [PATCH] perf tools: fix build for various architectures
From: Mark Rutland @ 2012-11-30 16:40 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel@vger.kernel.org, David Howells,
	Deng-Cheng Zhu, Ingo Molnar, Kyle McMartin, Martin Schwidefsky,
	Paul Mackerras, Peter Zijlstra, Tony Luck, Will Deacon
In-Reply-To: <20121127134116.GC18340@ghostprotocols.net>

On Tue, Nov 27, 2012 at 01:41:16PM +0000, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 27, 2012 at 12:16:31PM +0000, Mark Rutland escreveu:
> > The UAPI changes broke the perf tool, and as of 3.7-rc7, it
> > still won't build for arm:
> > 
> > 	util/../../../arch/arm/include/asm/unistd.h:16:29: fatal error: uapi/asm/unistd.h: No such file or directory
> > 	compilation terminated.
>  
> > I've tested this on arm, but I don't have the necessary toolchains to
> > check the other cases.
> 
> Can you try with my perf/urgent branch?
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/urgent

It builds and runs fine for me on my A9x4 coretile.

> I tested it with raspbian on a raspberry pi system and also with a cross
> compiler on a x86_64 workstation.
> 
> I already sent the pull request to Ingo, who should process it and push
> to Linus soon.

Let's hope it gets merged before v3.7 is tagged.

Thanks,
Mark


* Re: Results for balancenuma v8, autonuma-v28fast and numacore-20121126
From: Rik van Riel @ 2012-11-30 16:09 UTC
  To: Mel Gorman
  Cc: Peter Zijlstra, Andrea Arcangeli, Ingo Molnar, Johannes Weiner,
	Hugh Dickins, Thomas Gleixner, Paul Turner, Hillf Danton,
	Lee Schermerhorn, Alex Shi, Srikar Dronamraju, Aneesh Kumar,
	Linus Torvalds, Andrew Morton, Linux-MM, LKML
In-Reply-To: <20121130114145.GD20087@suse.de>

On 11/30/2012 06:41 AM, Mel Gorman wrote:
> This is another insanely long mail. Short summary, based on the results
> of what is in tip/master right now, I think if we're going to merge
> anything for v3.8 it should be the "Automatic NUMA Balancing V8". It does
> reasonably well for many of the workloads and AFAIK there is no reason why
> numacore or autonuma could not be rebased on top with the view to merging
> proper scheduling and placement policies in 3.9.

Given how minimalistic balancenuma is, and how there does not seem
to be anything significant in the way of performance regressions
with balancenuma, I have no objections to Linus merging all of
balancenuma for 3.8.

That could significantly reduce the amount of NUMA code we need
to "fight over" for the 3.9 kernel :)

-- 
All rights reversed

* Re: [PATCH v2] Do a proper locking for mmap and block size change
From: Christoph Hellwig @ 2012-11-30 16:36 UTC
  To: Dave Chinner
  Cc: Linus Torvalds, Chris Mason, Chris Mason, Mikulas Patocka,
	Al Viro, Jens Axboe, Jeff Chua, Lai Jiangshan, Jan Kara, lkml,
	linux-fsdevel
In-Reply-To: <20121130024910.GF6434@dastard>

On Fri, Nov 30, 2012 at 01:49:10PM +1100, Dave Chinner wrote:
> > Ugh. That's a big violation of how buffer-heads are supposed to work:
> > the block number is very much defined to be in multiples of b_size
> > (see for example "submit_bh()" that turns it into a sector number).
> > 
> > But you're right. The direct-IO code really *is* violating that, and
> > knows that get_block() ends up being defined in i_blkbits regardless
> > of b_size.
> 
> Same with mpage_readpages(), so it's not just direct IO that has
> this problem....

The mpage code may actually fall back to BHs.

I have a version of the direct I/O code that uses the iomap_ops from the
multi-page write code that you originally started.  It uses the new op
as primary interface for direct I/O and provides a helper for
filesystems that still use buffer heads internally.  I'll try to dust it
off and send out a version for the current kernel.
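
The unit mismatch discussed in the quoted exchange (block numbers counted
in b_size units versus i_blkbits units) is easy to see with a little
arithmetic. The sketch below is illustrative user-space C with hypothetical
sk_ names; it only mirrors the "block number times size in 512-byte
sectors" conversion that the quoted text attributes to submit_bh().

```c
#include <assert.h>
#include <stdint.h>

/* Convert a block number counted in block_size units to 512-byte sectors. */
static uint64_t sk_block_to_sector(uint64_t blocknr, unsigned int block_size)
{
	/* Block sizes are powers of two and at least one sector. */
	assert(block_size >= 512 && (block_size & (block_size - 1)) == 0);
	return blocknr * (block_size >> 9);
}

/* Same conversion when the block size is given as a shift count. */
static uint64_t sk_blkbits_to_sector(uint64_t blocknr, unsigned int blkbits)
{
	assert(blkbits >= 9 && blkbits < 32);
	return blocknr << (blkbits - 9);
}
```

Block 10 is sector 80 when counted in 4096-byte units but sector 20 in
1024-byte units, so code that mixes the two conventions ends up addressing
the wrong sectors.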


* Re: [PATCH v2 2/2] New driver: Xillybus generic interface for FPGA (programmable logic)
From: Greg KH @ 2012-11-30 16:35 UTC
  To: Eli Billauer; +Cc: linux-kernel, arnd
In-Reply-To: <50B8D069.8070608@gmail.com>

On Fri, Nov 30, 2012 at 05:27:37PM +0200, Eli Billauer wrote:
> I made changes in the code as a response to almost all your comments
> to my best understanding.
> 
> I also sent a separate mail responding to a couple of issues, which
> seems not to have reached you.

I now see it, sorry for the grumpy pre-coffee response I made earlier.

> But to put it short:
> 
> * The pci_ids: I wasn't sure if I should remove only my own product
> ID or all the vendor IDs, and this is clear now.

Good.

> * As for the documentation, I don't quite understand what I should
> add. There's a rather extensive documentation for download at the
> site. The docs for the host side mostly instruct common UNIX
> programming techniques: The device files are just data pipes to
> FIFOs in the FPGA, behaving like one would expect.

You need to document the user/kernel API that you have now created for
this driver, either in the Documentation directory, or at the very
least, the driver submission changelog entry.  Otherwise we don't know
where to look, nor if those docs are even correct anymore.

> * As for the special class issue: When Xillybus is used, the whole
> system's mission is usually around it (e.g. it's a computer doing
> data acquisition through the Xillybus pipes). So giving it a high
> profile makes sense, I believe. Besides, a dozen of device files are
> not rare. Needless to say, I'm not going to insist on this.

Good, please use misc device, for the reasons I suggested in my other
email.

> Other than that, it's all changes in the code. It's a major change
> there.

I don't understand what these sentences mean.

greg k-h

^ permalink raw reply

* Re: [PATCH 2/2] New driver: Xillybus generic interface for FPGA (programmable logic)
From: Greg KH @ 2012-11-30 16:32 UTC (permalink / raw)
  To: Eli Billauer; +Cc: linux-kernel, arnd
In-Reply-To: <50B8C7BF.4000004@gmail.com>

On Fri, Nov 30, 2012 at 04:50:39PM +0200, Eli Billauer wrote:
> Thanks for the remarks.
> 
> I'm sending the updated patches in a minute. Basically, I divided
> the module into three (one core, one for PCIe and one for OF) and
> made several corrections.
> 
> On 11/28/2012 06:57 PM, Greg KH wrote:
> >What is the user/kernel interface for this driver?  Is it documented
> >anywhere?
> There's a rather extensive documentation for download at the site.
> The docs for the host side mostly instruct common UNIX programming
> techniques: The device files are just data pipes to FIFOs in the
> FPGA, behaving like one would expect.

As we need to review the user/kernel api here, putting the docs as part
of the driver submission is a good idea :)

I didn't know, nor do I trust, that a random web site would have the
correct documentation for a kernel driver.

> >>+#if (PAGE_SIZE<  4096)
> >>+#error Your processor architecture has a page size smaller than 4096
> >>+#endif
> >That can never happen.  Even if it does, you don't care about that in
> >the driver.
> >
> I removed this check because it can't happen. But the driver *does*
> care about this, since it creates a lot of buffers with different
> alignments, hence depending on the pages' alignment.

Alignment is different than the size of a page.  What happens if your
driver runs on a machine with a page size bigger than 4K?  You need to
be able to handle that properly, so perhaps you should check that?
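
For illustration only, here is a minimal user-space sketch of treating the
page size as a parameter instead of assuming it equals 4096. The names
HW_BUF_ALIGN, bufs_per_page() and buf_offset() are hypothetical and are not
taken from the Xillybus driver; in kernel code the page size would come from
PAGE_SIZE rather than a function argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer granularity; an assumption, not a driver constant. */
#define HW_BUF_ALIGN 4096u

/*
 * Number of HW_BUF_ALIGN-sized, naturally aligned buffers that fit in one
 * page. Returns 0 when the page is too small or is not a multiple of the
 * alignment, so the caller can refuse to proceed instead of misaligning.
 */
static size_t bufs_per_page(size_t page_size)
{
	if (page_size < HW_BUF_ALIGN || page_size % HW_BUF_ALIGN != 0)
		return 0;
	return page_size / HW_BUF_ALIGN;
}

/* Byte offset of buffer `idx` inside a page-aligned allocation. */
static size_t buf_offset(size_t page_size, size_t idx)
{
	assert(idx < bufs_per_page(page_size));
	return idx * HW_BUF_ALIGN;
}
```

With a 64K page, bufs_per_page(65536) gives 16 buffers per page instead of
wasting 60K on each one, and a 2K page is rejected by returning 0.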

> >>+static struct class *xillybus_class;
> >Why not just use the misc interface instead of your own class?
> When Xillybus is used, the whole system's mission is usually around
> it (e.g. it's a computer doing data acquisition through the Xillybus
> pipes). So giving it a high profile makes sense, I believe. Besides,
> a dozen device files is not rare.

It is no problem to create dozens of misc devices.  It makes your driver
smaller, leaves less code that I have to audit and that you have to get
right, and it removes another user of 'struct class', which we
are trying to get rid of anyway.  So please, move to a misc device.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] add hardware I2C support for ARM IMX23
From: sander van ginkel @ 2012-11-30 16:32 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linux-kernel, kernel, shawn.guo, linux-arm-kernel, linux-i2c
In-Reply-To: <20121128170907.GC4659@pengutronix.de>

> You might want to try my for-next branch or wait for 3.8-rc1, maybe the
> DMA termination patch helps? Also enabling I2C debug messages is
> probably worth it.
>
> Regards,
>
>    Wolfram
>
> --
> Pengutronix e.K.                           | Wolfram Sang                |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>

I've tested your git tree with the same config I used for the
3.7.0-rc6 release (IMX i2c built in).
It's not a 100% fix.

If I run "i2cdetect -y -r 0" with 3.7.0-rc6, i2cdetect gets confused
and the kernel keeps dumping "mxs-i2c 80058000.i2c: Failed to get PIO
reg. write descriptor."
If I do the same with your git tree I get the same result, but only
while i2cdetect is running. Once it finishes or I terminate it, the
kernel messages stop as well.
So there is some improvement, but it's not 100% yet.

^ permalink raw reply

* Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked
From: azurIt @ 2012-11-30 16:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, cgroups mailinglist, KAMEZAWA Hiroyuki,
	Johannes Weiner
In-Reply-To: <20121130161923.GN29317@dhcp22.suse.cz>

>Could you also post your complete containers configuration, maybe there
>is something strange in there (basically grep . -r YOUR_CGROUP_MNT
>except for tasks files which are of no use right now).


Here it is:
http://www.watchdog.sk/lkml/cgroups.gz

^ permalink raw reply

* Re: [Suggestion] drivers/tty: drivers/char/:  for MAX_ASYNC_BUFFER_SIZE
From: Paul Fulghum @ 2012-11-30 16:24 UTC (permalink / raw)
  To: Chen Gang; +Cc: Greg KH, linux-kernel@vger.kernel.org, linux-serial, Alan Cox
In-Reply-To: <50B81F76.8020508@asianux.com>

On 11/29/2012 8:52 PM, Chen Gang wrote:
> On 2012-11-30 02:32, Greg KH wrote:
>> On Thu, Nov 29, 2012 at 01:57:59PM +0800, Chen Gang wrote:
>>>> And, I really don't understand here, why do you want to change this?
>>>> What is it going to change?  And why?
>>>
>>> Why:
>>>   for the context MGSLPC_INFO *info in drivers/char/pcmcia/synclink_cs.c
>>>     info->max_frame_size can be any value between 4096 and 65535 (it
>>> can be set by a module parameter)
>>>     info->flag_buf length is 4096 (MAX_ASYNC_BUFFER_SIZE)
>>>   in function rx_get_frame
>>>     framesize is limited by info->max_frame_size, but may still be
>>> larger than 4096.
>>>     when ldisc_receive_buf is called, info->flag_buf is 4096 bytes
>>> long, but framesize can be more than 4096. This will cause a memory overflow.

The confusion centers on calling the line discipline receive_buf
function with a data buffer larger than the flag buffer.

The synclink drivers support asynchronous and synchronous (HDLC)
serial communications.

In asynchronous mode, the tty flip buffer is used to feed
data to the line discipline. In this mode, the above argument
does not apply. The receive_buf function is not called directly.

In synchronous mode, the driver calls the line discipline
receive_buf function directly to feed one HDLC frame
of data per call. Maintaining frame boundaries is needed
in this mode. This is done only with the N_HDLC line
discipline which expects this format and ignores the flag buffer.
The flag buffer passed is just a place holder to meet the
calling conventions of the line discipline receive_buf function.

The only danger is if:
1. driver is configured for synchronous mode
2. driver is configured for frames > 4K
3. line discipline other than N_HDLC is selected

In this case the line discipline might try to access
beyond the end of the flag buffer. This is a non-functional
configuration that would not occur on purpose.

Increasing the flag buffer size would prevent a problem
in this degenerate case of purposeful misconfiguration.
This would be at the expense of larger allocations that are
not used.

I think the correct fix is for me to change the direct
calls to pass the same buffer for both data and flag and
add a comment describing the fact the flag buffer is ignored
when using N_HDLC. That way a misconfigured setup won't
cause problems and no unneeded allocations are made.

My suggestion is to leave it as is for now until I can make
those changes. I admit the current code is ugly enough to
cause confusion (sorry Chen Gang), but I don't see any immediate danger.
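
A rough user-space sketch of the idea, under stated assumptions: the names
flag_read_overruns(), count_flagged() and deliver_frame() are made up for
illustration and are not synclink or tty-layer functions. It models a line
discipline that, unlike N_HDLC, reads one flag byte per data byte, and shows
why passing the frame itself as the flag buffer keeps every access in bounds.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_BUF_SIZE 4096u	/* mirrors MAX_ASYNC_BUFFER_SIZE from this thread */

/* Nonzero if reading one flag byte per data byte would pass the flag buffer. */
static int flag_read_overruns(size_t frame_len, size_t flag_buf_len)
{
	return frame_len > flag_buf_len;
}

typedef size_t (*receive_fn)(const unsigned char *data,
			     const unsigned char *flags, size_t count);

/* Toy line discipline that inspects a flag byte for every data byte. */
static size_t count_flagged(const unsigned char *data,
			    const unsigned char *flags, size_t count)
{
	size_t i, n = 0;

	(void)data;
	for (i = 0; i < count; i++)
		if (flags[i])
			n++;
	return n;
}

/*
 * Deliver one frame, passing the data buffer as the flag buffer as well.
 * The flag values are meaningless, but any flag access stays inside the
 * frame's own storage regardless of the frame length.
 */
static size_t deliver_frame(receive_fn rx, const unsigned char *frame,
			    size_t len)
{
	assert(rx != NULL && frame != NULL);
	return rx(frame, frame, len);
}
```

The point is memory safety only: a misconfigured line discipline would still
see garbage flags, but it could no longer read past a 4K allocation.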

-- 
Paul Fulghum
MicroGate Systems, Ltd.
=Customer Driven, by Design=
(800)444-1982 (US Sales)
(512)345-7791 x102 (Direct)
(512)343-9046 (Fax)
Central Time Zone (GMT -6h)
www.microgate.com

^ permalink raw reply

* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: David Woodhouse @ 2012-11-30 16:23 UTC (permalink / raw)
  To: Krzysztof Mazur
  Cc: Chas Williams (CONTRACTOR), David Laight, davem, netdev,
	linux-kernel, nathan
In-Reply-To: <1354277415.21562.284.camel@shinybook.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 5095 bytes --]

On Fri, 2012-11-30 at 12:10 +0000, David Woodhouse wrote:
> In that case I think we're fine. I'll just do the same thing in
> br2684_push(), fix up the comment you just corrected, and we're all
> good.

OK, here's an update to my patch 8/17 'br2684: don't send frames on
not-ready vcc'. It takes the socket lock and does fairly much the same
thing as your pppoatm version. It returns NETDEV_TX_BUSY and stops the
queue if the socket is locked, and it gets woken from the ->release_cb
callback.

I've dropped your Acked-By: since it's mostly new, but feel free to give
me a fresh one. With this I think we're done.

Unless Chas has any objections, I'll ask Dave to pull it...
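
As a toy model of the flow described above (not the kernel code; struct
toy_vcc and the toy_* functions are hypothetical stand-ins), the transmit
path drops on a not-ready vcc, reports busy and stops the queue while the
socket is owned by user context, and the release callback restarts it. In
this sketch the callback also clears the ownership flag, which in the kernel
is done by the socket release path itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tx_result { TX_OK, TX_BUSY, TX_DROPPED };

struct toy_vcc {
	bool ready;		/* stands in for ATM_VF_READY */
	bool owned_by_user;	/* stands in for sock_owned_by_user() */
	bool queue_stopped;	/* stands in for the netdev queue state */
	int sent;		/* frames handed to the device */
};

/* Transmit path: drop if not ready, back off if the socket is locked. */
static enum tx_result toy_xmit(struct toy_vcc *v)
{
	assert(v != NULL);
	if (!v->ready)
		return TX_DROPPED;
	if (v->owned_by_user) {
		v->queue_stopped = true;	/* caller will retry later */
		return TX_BUSY;
	}
	v->sent++;
	return TX_OK;
}

/* Release callback: the lock owner is done, so wake the queue again. */
static void toy_release_cb(struct toy_vcc *v)
{
	assert(v != NULL);
	v->owned_by_user = false;
	v->queue_stopped = false;
}
```

The frame that got TX_BUSY is not freed, so the stack can resubmit it once
the queue has been woken.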


From 47d5ad4c98452bcddfd00da1c659dac85202f213 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw2@infradead.org>
Date: Tue, 27 Nov 2012 23:28:36 +0000
Subject: [PATCH] br2684: don't send frames on not-ready vcc

Avoid submitting packets to a vcc which is being closed. Things go badly
wrong when the ->pop method gets later called after everything's been
torn down.

Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
which clears the ATM_VF_READY bit under the same lock. Otherwise, we
could end up submitting a packet to the device driver even after its
->ops->close method has been called. And it could call the vcc's ->pop
method after the protocol has been shut down. Which leads to a panic.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 net/atm/br2684.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 8eb6fbe..5ff145f 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -68,6 +68,7 @@ struct br2684_vcc {
 	/* keep old push, pop functions for chaining */
 	void (*old_push)(struct atm_vcc *vcc, struct sk_buff *skb);
 	void (*old_pop)(struct atm_vcc *vcc, struct sk_buff *skb);
+	void (*old_release_cb)(struct atm_vcc *vcc);
 	enum br2684_encaps encaps;
 	struct list_head brvccs;
 #ifdef CONFIG_ATM_BR2684_IPFILTER
@@ -269,6 +270,22 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
 	return !atmvcc->send(atmvcc, skb);
 }
 
+static void br2684_release_cb(struct atm_vcc *atmvcc)
+{
+	struct br2684_vcc *brvcc = BR2684_VCC(atmvcc);
+
+	/*
+	 * A race with br2684_xmit_vcc() might cause a spurious wakeup just
+	 * after that function *stops* the queue, and qspace might actually
+	 * go negative before the queue stops again. We cope with that.
+	 */
+	if (atomic_read(&brvcc->qspace) > 0)
+		netif_wake_queue(brvcc->device);
+
+	if (brvcc->old_release_cb)
+		brvcc->old_release_cb(atmvcc);
+}
+
 static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb,
 						   const struct br2684_dev *brdev)
 {
@@ -280,6 +297,8 @@ static netdev_tx_t br2684_start_xmit(struct sk_buff *skb,
 {
 	struct br2684_dev *brdev = BRPRIV(dev);
 	struct br2684_vcc *brvcc;
+	struct atm_vcc *atmvcc;
+	netdev_tx_t ret = NETDEV_TX_OK;
 
 	pr_debug("skb_dst(skb)=%p\n", skb_dst(skb));
 	read_lock(&devs_lock);
@@ -290,9 +309,26 @@ static netdev_tx_t br2684_start_xmit(struct sk_buff *skb,
 		dev->stats.tx_carrier_errors++;
 		/* netif_stop_queue(dev); */
 		dev_kfree_skb(skb);
-		read_unlock(&devs_lock);
-		return NETDEV_TX_OK;
+		goto out_devs;
+	}
+	atmvcc = brvcc->atmvcc;
+
+	bh_lock_sock(sk_atm(atmvcc));
+
+	if (test_bit(ATM_VF_RELEASED, &atmvcc->flags) ||
+	    test_bit(ATM_VF_CLOSE, &atmvcc->flags) ||
+	    !test_bit(ATM_VF_READY, &atmvcc->flags)) {
+		dev->stats.tx_dropped++;
+		dev_kfree_skb(skb);
+		goto out;
 	}
+
+	if (sock_owned_by_user(sk_atm(atmvcc))) {
+		netif_stop_queue(brvcc->device);
+		ret = NETDEV_TX_BUSY;
+		goto out;
+	}
+
 	if (!br2684_xmit_vcc(skb, dev, brvcc)) {
 		/*
 		 * We should probably use netif_*_queue() here, but that
@@ -304,8 +340,11 @@ static netdev_tx_t br2684_start_xmit(struct sk_buff *skb,
 		dev->stats.tx_errors++;
 		dev->stats.tx_fifo_errors++;
 	}
+ out:
+	bh_unlock_sock(sk_atm(atmvcc));
+ out_devs:
 	read_unlock(&devs_lock);
-	return NETDEV_TX_OK;
+	return ret;
 }
 
 /*
@@ -378,6 +417,7 @@ static void br2684_close_vcc(struct br2684_vcc *brvcc)
 	list_del(&brvcc->brvccs);
 	write_unlock_irq(&devs_lock);
 	brvcc->atmvcc->user_back = NULL;	/* what about vcc->recvq ??? */
+	brvcc->atmvcc->release_cb = brvcc->old_release_cb;
 	brvcc->old_push(brvcc->atmvcc, NULL);	/* pass on the bad news */
 	kfree(brvcc);
 	module_put(THIS_MODULE);
@@ -554,9 +594,11 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
 	brvcc->encaps = (enum br2684_encaps)be.encaps;
 	brvcc->old_push = atmvcc->push;
 	brvcc->old_pop = atmvcc->pop;
+	brvcc->old_release_cb = atmvcc->release_cb;
 	barrier();
 	atmvcc->push = br2684_push;
 	atmvcc->pop = br2684_pop;
+	atmvcc->release_cb = br2684_release_cb;
 
 	/* initialize netdev carrier state */
 	if (atmvcc->dev->signal == ATM_PHY_SIG_LOST)
-- 
1.8.0





-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* Re: [PATCH 2/2] ring-buffer: Fix race between integrity check and readers
From: Steven Rostedt @ 2012-11-30 16:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker
In-Reply-To: <20121130161334.164485242@goodmis.org>

On Fri, 2012-11-30 at 11:12 -0500, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> The function rb_check_pages() was added to make sure the ring buffer's
> pages were sane. This check is done when the ring buffer size is modified
> as well as when the iterator is released (closing the "trace" file),
> as that was considered a non fast path and a good place to do a sanity
> check.
> 
> The problem is that the check does not have any locks around it.
> If one process were to read the trace file, and another were to read
> the raw binary file, the check could happen while the reader is reading
> the file.
> 
> The issue with this is that the check needs to clear the HEAD page
> before doing the full check, and it restores it afterward. But readers
> require the HEAD page to exist before they can read the buffer; otherwise
> they give a nasty warning and disable the buffer.
> 
> By adding the reader lock around the check, this keeps the race from
> happening.
> 
> Cc: stable@vger.kernel.org # 3.6

Again, quilt failed to Cc stable :-(

-- Steve

> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/ring_buffer.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index ec01803..4cb5e51 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -3783,12 +3783,17 @@ void
>  ring_buffer_read_finish(struct ring_buffer_iter *iter)
>  {
>  	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
> +	unsigned long flags;
>  
>  	/*
>  	 * Ring buffer is disabled from recording, here's a good place
> -	 * to check the integrity of the ring buffer. 
> +	 * to check the integrity of the ring buffer.
> +	 * Must prevent readers from trying to read, as the check
> +	 * clears the HEAD page and readers require it.
>  	 */
> +	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>  	rb_check_pages(cpu_buffer);
> +	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>  
>  	atomic_dec(&cpu_buffer->record_disabled);
>  	atomic_dec(&cpu_buffer->buffer->resize_disabled);



^ permalink raw reply

* Re: [PATCH 1/2] ring-buffer: Fix NULL pointer if rb_set_head_page() fails
From: Steven Rostedt @ 2012-11-30 16:20 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker
In-Reply-To: <20121130161333.983378567@goodmis.org>

On Fri, 2012-11-30 at 11:12 -0500, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> The function rb_set_head_page() searches the list of ring buffer
> pages for the page that has the HEAD page flag set. If it does
> not find it, it will do a WARN_ON(), disable the ring buffer and
> return NULL, as this should never happen.
> 
> But if this bug happens to happen, not all callers of this function
> can handle a NULL pointer being returned from it. That needs to be
> fixed.
> 
> Cc: stable@vger.kernel.org # 3.0+

Hmm, quilt didn't Cc. Grumble, I think a system update of quilt removed
my modification to not have quilt get confused by the hash symbol :-(

-- Steve

> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/ring_buffer.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index b979426..ec01803 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1396,6 +1396,8 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
>  		struct list_head *head_page_with_bit;
>  
>  		head_page = &rb_set_head_page(cpu_buffer)->list;
> +		if (!head_page)
> +			break;
>  		prev_page = head_page->prev;
>  
>  		first_page = pages->next;
> @@ -2934,7 +2936,7 @@ unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
>  	unsigned long flags;
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct buffer_page *bpage;
> -	unsigned long ret;
> +	unsigned long ret = 0;
>  
>  	if (!cpumask_test_cpu(cpu, buffer->cpumask))
>  		return 0;
> @@ -2949,7 +2951,8 @@ unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
>  		bpage = cpu_buffer->reader_page;
>  	else
>  		bpage = rb_set_head_page(cpu_buffer);
> -	ret = bpage->page->time_stamp;
> +	if (bpage)
> +		ret = bpage->page->time_stamp;
>  	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>  
>  	return ret;
> @@ -3260,6 +3263,8 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	 * Splice the empty reader page into the list around the head.
>  	 */
>  	reader = rb_set_head_page(cpu_buffer);
> +	if (!reader)
> +		goto out;
>  	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
>  	cpu_buffer->reader_page->list.prev = reader->list.prev;
>  
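
The pattern the quoted patch applies to ring_buffer_oldest_event_ts() can be
sketched in user space as follows. This is an illustration under assumptions:
struct toy_page, find_head() and oldest_ts() are hypothetical names, and the
real rb_set_head_page() walks a linked list rather than an array.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	int is_head;			/* stands in for the HEAD page flag */
	unsigned long time_stamp;
};

/* Like rb_set_head_page() in spirit: returns NULL if no page is flagged. */
static struct toy_page *find_head(struct toy_page *pages, size_t n)
{
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (pages[i].is_head)
			return &pages[i];
	return NULL;
}

/* Caller defaults the result to 0 and only dereferences a non-NULL page. */
static unsigned long oldest_ts(struct toy_page *pages, size_t n)
{
	unsigned long ret = 0;
	struct toy_page *head = find_head(pages, n);

	if (head)
		ret = head->time_stamp;
	return ret;
}
```

Initializing ret to 0 matters as much as the NULL test: without it the
failure path would return an uninitialized value.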



^ permalink raw reply

* Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked
From: Michal Hocko @ 2012-11-30 16:19 UTC (permalink / raw)
  To: azurIt
  Cc: linux-kernel, linux-mm, cgroups mailinglist, KAMEZAWA Hiroyuki,
	Johannes Weiner
In-Reply-To: <20121130165937.F9564EBE@pobox.sk>

On Fri 30-11-12 16:59:37, azurIt wrote:
> >> Here is the full boot log:
> >> www.watchdog.sk/lkml/kern.log
> >
> >The log is not complete. Could you paste the complete dmesg output? Or
> >even better, do you have logs from the previous run?
> 
> 
> What is missing there? All kernel messages are logged to
> /var/log/kern.log (it's the same as dmesg); dmesg itself was already
> overwritten by other messages. I think that's everything the kernel printed.

Early boot messages are missing - so exactly the BIOS memory map I was
asking for. As the NUMA has been excluded it is probably not that
relevant anymore.
The important question is why you see VM_FAULT_OOM and whether memcg
charging failure can trigger that. I do not see how this could happen
right now because __GFP_NORETRY is not used for user pages (except for
THP, which disables memcg OOM already), and file backed page faults (aka
__do_fault) use mem_cgroup_newpage_charge, which doesn't disable OOM.
This is a real head scratcher.

Could you also post your complete containers configuration, maybe there
is something strange in there (basically grep . -r YOUR_CGROUP_MNT
except for tasks files which are of no use right now).
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* [v3.2-v3.4 stable version][PATCH 1/2] ring-buffer: Fix NULL pointer if rb_set_head_page() fails
From: Steven Rostedt @ 2012-11-30 16:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker
In-Reply-To: <20121130161333.983378567@goodmis.org>

The function rb_set_head_page() searches the list of ring buffer
pages for the page that has the HEAD page flag set. If it does
not find it, it will do a WARN_ON(), disable the ring buffer and
return NULL, as this should never happen.

But if this bug happens to happen, not all callers of this function
can handle a NULL pointer being returned from it. That needs to be
fixed.

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

Index: linux-test.git/kernel/trace/ring_buffer.c
===================================================================
--- linux-test.git.orig/kernel/trace/ring_buffer.c
+++ linux-test.git/kernel/trace/ring_buffer.c
@@ -2683,7 +2683,7 @@ unsigned long ring_buffer_oldest_event_t
 	unsigned long flags;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
-	unsigned long ret;
+	unsigned long ret = 0;
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return 0;
@@ -2698,7 +2698,8 @@ unsigned long ring_buffer_oldest_event_t
 		bpage = cpu_buffer->reader_page;
 	else
 		bpage = rb_set_head_page(cpu_buffer);
-	ret = bpage->page->time_stamp;
+	if (bpage)
+		ret = bpage->page->time_stamp;
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	return ret;
@@ -3005,6 +3006,8 @@ rb_get_reader_page(struct ring_buffer_pe
 	 * Splice the empty reader page into the list around the head.
 	 */
 	reader = rb_set_head_page(cpu_buffer);
+	if (!reader)
+		goto out;
 	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
 	cpu_buffer->reader_page->list.prev = reader->list.prev;
 



^ permalink raw reply

* Re: [PATCH] perf tools: fix build for various architectures
From: Arnaldo Carvalho de Melo @ 2012-11-27 13:41 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ingo Molnar, linux-kernel, David Howells, Deng-Cheng Zhu,
	Ingo Molnar, Kyle McMartin, Martin Schwidefsky, Paul Mackerras,
	Peter Zijlstra, Tony Luck, Will Deacon
In-Reply-To: <1354018591-26656-1-git-send-email-mark.rutland@arm.com>

Em Tue, Nov 27, 2012 at 12:16:31PM +0000, Mark Rutland escreveu:
> The UAPI changes broke the perf tool, and as of 3.7-rc7, it
> still won't build for arm:
> 
> 	util/../../../arch/arm/include/asm/unistd.h:16:29: fatal error: uapi/asm/unistd.h: No such file or directory
> 	compilation terminated.
 
> I've tested this on arm, but I don't have the necessary toolchains to
> check the other cases.

Can you try with my perf/urgent branch?

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/urgent

I tested it with raspbian on a raspberry pi system and also with a cross
compiler on a x86_64 workstation.

I already sent the pull request to Ingo, who should process it and push
it to Linus soon.

- Arnaldo

^ permalink raw reply

* [v3.0 stable version][PATCH 1/2] ring-buffer: Fix NULL pointer if rb_set_head_page() fails
From: Steven Rostedt @ 2012-11-30 16:16 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker
In-Reply-To: <20121130161333.983378567@goodmis.org>

The function rb_set_head_page() searches the list of ring buffer
pages for the page that has the HEAD page flag set. If it does
not find it, it will do a WARN_ON(), disable the ring buffer and
return NULL, as this should never happen.

But if this bug happens to happen, not all callers of this function
can handle a NULL pointer being returned from it. That needs to be
fixed.

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

Index: linux-test.git/kernel/trace/ring_buffer.c
===================================================================
--- linux-test.git.orig/kernel/trace/ring_buffer.c
+++ linux-test.git/kernel/trace/ring_buffer.c
@@ -2926,6 +2926,8 @@ rb_get_reader_page(struct ring_buffer_pe
 	 * Splice the empty reader page into the list around the head.
 	 */
 	reader = rb_set_head_page(cpu_buffer);
+	if (!reader)
+		goto out;
 	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
 	cpu_buffer->reader_page->list.prev = reader->list.prev;
 



^ permalink raw reply

* [PATCH 2/2] ring-buffer: Fix race between integrity check and readers
From: Steven Rostedt @ 2012-11-30 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker
In-Reply-To: <20121130161238.909829067@goodmis.org>

[-- Attachment #1: Type: text/plain, Size: 2020 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The function rb_check_pages() was added to make sure the ring buffer's
pages were sane. This check is done when the ring buffer size is modified
as well as when the iterator is released (closing the "trace" file),
as that was considered a non fast path and a good place to do a sanity
check.

The problem is that the check does not have any locks around it.
If one process were to read the trace file, and another were to read
the raw binary file, the check could happen while the reader is reading
the file.

The issue with this is that the check needs to clear the HEAD page
before doing the full check, and it restores it afterward. But readers
require the HEAD page to exist before they can read the buffer; otherwise
they give a nasty warning and disable the buffer.

By adding the reader lock around the check, this keeps the race from
happening.

Cc: stable@vger.kernel.org # 3.6
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index ec01803..4cb5e51 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3783,12 +3783,17 @@ void
 ring_buffer_read_finish(struct ring_buffer_iter *iter)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+	unsigned long flags;

 	/*
 	 * Ring buffer is disabled from recording, here's a good place
-	 * to check the integrity of the ring buffer. 
+	 * to check the integrity of the ring buffer.
+	 * Must prevent readers from trying to read, as the check
+	 * clears the HEAD page and readers require it.
 	 */
+	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 	rb_check_pages(cpu_buffer);
+	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

 	atomic_dec(&cpu_buffer->record_disabled);
 	atomic_dec(&cpu_buffer->buffer->resize_disabled);
-- 
1.7.10.4

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related

* [PATCH 1/2] ring-buffer: Fix NULL pointer if rb_set_head_page() fails
From: Steven Rostedt @ 2012-11-30 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker
In-Reply-To: <20121130161238.909829067@goodmis.org>

[-- Attachment #1: Type: text/plain, Size: 2118 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The function rb_set_head_page() searches the list of ring buffer
pages for the page that has the HEAD page flag set. If it does
not find it, it will do a WARN_ON(), disable the ring buffer and
return NULL, as this should never happen.

But if this bug happens to happen, not all callers of this function
can handle a NULL pointer being returned from it. That needs to be
fixed.

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b979426..ec01803 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1396,6 +1396,8 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 		struct list_head *head_page_with_bit;
 
 		head_page = &rb_set_head_page(cpu_buffer)->list;
+		if (!head_page)
+			break;
 		prev_page = head_page->prev;
 
 		first_page = pages->next;
@@ -2934,7 +2936,7 @@ unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
 	unsigned long flags;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
-	unsigned long ret;
+	unsigned long ret = 0;
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return 0;
@@ -2949,7 +2951,8 @@ unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
 		bpage = cpu_buffer->reader_page;
 	else
 		bpage = rb_set_head_page(cpu_buffer);
-	ret = bpage->page->time_stamp;
+	if (bpage)
+		ret = bpage->page->time_stamp;
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	return ret;
@@ -3260,6 +3263,8 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	 * Splice the empty reader page into the list around the head.
 	 */
 	reader = rb_set_head_page(cpu_buffer);
+	if (!reader)
+		goto out;
 	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
 	cpu_buffer->reader_page->list.prev = reader->list.prev;
 
-- 
1.7.10.4



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related

* [PATCH 0/2] [GIT PULL][v3.7] ring-buffer: Bug fixes
From: Steven Rostedt @ 2012-11-30 16:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]


Ingo,

This is based off of my last urgent pull request.

Please pull the latest tip/perf/urgent-2 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/urgent-2

Head SHA1: 9366c1ba13fbc41bdb57702e75ca4382f209c82f


Steven Rostedt (2):
      ring-buffer: Fix NULL pointer if rb_set_head_page() fails
      ring-buffer: Fix race between integrity check and readers

----
 kernel/trace/ring_buffer.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply
