* [PATCH 0/4] ACPI: kill acpi_pci_root_start
From: Yinghai Lu @ 2012-10-03 23:00 UTC (permalink / raw)
  To: Len Brown, Bjorn Helgaas, Greg Kroah-Hartman
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel,
	linux-acpi, Yinghai Lu
In-Reply-To: <CAE9FiQWgvVqnTTeYFvxf1GxLcEEeRyN-uSKPdg3o69oCpm43sQ@mail.gmail.com>

Currently acpi_pci_root_driver has two ops, .add and .start, implemented by
acpi_pci_root_add() and acpi_pci_root_start().

The split exists for hotplug handling: .add needs to return early to make
sure all ACPI devices can be created and added first. .start can then
device_add the PCI devices that were found in
acpi_pci_root_add()/pci_acpi_scan_root().

This keeps the PCI devices out of the device list for a while.

We could use a bus notifier to handle the hotplug case instead;
CONFIG_HOTPLUG is always enabled now.
That needs a drivers_autoprobe bit in struct acpi_device to hold off
attaching drivers to acpi_devices, so that all acpi_devices get created first.
Then acpi_bus_attach(), which is called from acpi_bus_add(), attaches drivers
to all the acpi_devices that were just created.

That makes the logic simpler: the hotplug path is handled just like the boot
path, where drivers are attached after all ACPI devices have been created.

Finally, we can remove all of the acpi_bus_start workarounds.
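
The ordering described above can be sketched in plain user-space C. This is
only an illustration under stated assumptions, not the code from the patches:
struct sketch_device, sketch_add_all() and sketch_attach_all() are hypothetical
names, and only the drivers_autoprobe flag and the "create everything first,
attach drivers afterwards" order are taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct acpi_device; not the kernel structure. */
struct sketch_device {
	bool created;            /* device object has been registered */
	bool drivers_autoprobe;  /* false: do not bind a driver at creation time */
	bool driver_attached;    /* a driver has been bound */
};

/* Pass 1: create every device with autoprobe off, so no driver binds yet. */
static void sketch_add_all(struct sketch_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		devs[i].created = true;
		devs[i].drivers_autoprobe = false;
		devs[i].driver_attached = false;
	}
}

/*
 * Pass 2: once all devices exist, re-enable autoprobe and attach drivers.
 * Returns the number of devices that got a driver.
 */
static size_t sketch_attach_all(struct sketch_device *devs, size_t n)
{
	size_t attached = 0;

	for (size_t i = 0; i < n; i++) {
		assert(devs[i].created); /* every device must exist before binding */
		devs[i].drivers_autoprobe = true;
		devs[i].driver_attached = true;
		attached++;
	}
	return attached;
}
```

Calling sketch_add_all() and then sketch_attach_all() on the same array gives
the hotplug path the same shape as the boot path: no driver sees a device
until all of its siblings exist.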

Thanks

Yinghai

Yinghai Lu (4):
  ACPI: add drivers_autoprobe in struct acpi_device
  ACPI: use device drivers_autoprobe to delay loading acpi drivers
  PCI, ACPI: Remove not used acpi_pci_root_start()
  ACPI: remove acpi_op_start workaround

 drivers/acpi/pci_root.c |   27 +++------
 drivers/acpi/scan.c     |  145 ++++++++++++++++++++++-------------------------
 include/acpi/acpi_bus.h |    9 +---
 3 files changed, 77 insertions(+), 104 deletions(-)

-- 
1.7.7

* [U-Boot] [PATCH 1/2] ext4: Rename block group descriptor table from gd to bgd
From: Graeme Russ @ 2012-10-03 22:59 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <CALButCLC7yY2qLAegsed-3=C35+ynj0NYwxu4DRnAdQrGaPEiA@mail.gmail.com>

Hi Simon,

On Thu, Oct 4, 2012 at 7:44 AM, Graeme Russ <graeme.russ@gmail.com> wrote:
> Hi Simon,
>
> On Oct 4, 2012 6:58 AM, "Simon Glass" <sjg@chromium.org> wrote:
>>
>> Hi Graeme,
>>
>> On Wed, Oct 3, 2012 at 1:47 PM, Graeme Russ <graeme.russ@gmail.com> wrote:
>> > Hi Simon,
>> >
>> > On Oct 4, 2012 6:40 AM, "Simon Glass" <sjg@chromium.org> wrote:
>> >>
>> >> Hi Tom,
>> >>
>> >> On Wed, Oct 3, 2012 at 1:04 PM, Tom Rini <trini@ti.com> wrote:
>> >> > -----BEGIN PGP SIGNED MESSAGE-----
>> >> > Hash: SHA1
>> >> >
>> >> > On 10/03/12 12:53, Simon Glass wrote:
>> >> >
>> >> >> On x86 machines gd is unfortunately a #define, so we should avoid
>> >> >> using gd for anything. This patch changes uses of gd to bgd so that
>> >> >> ext4fs can be used on x86.
>> >> >>
>> >> >> Signed-off-by: Simon Glass <sjg@chromium.org>
>> >> >
>> >> > Is there any way to change x86 to not be using a #define for gd?
>> >>
>> >> I wasn't brave enough to look hard at that, although Graeme is on copy
>> >> and will know. It is actually using inline assembly to access this
>> >> special variable.
>> >
>> > Isn't 'gd' used by everyone (global data)? I fail to see how this ever
>> > worked.
>>
>> Well only x86 uses a #define for it, so other archs cause no problem.
>> It means that we can't use 'gd' as a symbol anywhere in U-Boot. I
>> suppose the only sensible use is a structure member, as here.
>
> Ah, I see - and I don't see a quick and easy way out. Let me look a bit
> deeper...

I remember now... commit 9e6c572ff03cda84c88663b23c7157d8b1f275ac
explains why the #define gd came about:

"Use the base address of the 'F' segment as a pointer to the global data
structure. By adding the linear address (i.e. the 'D' segment address) as
the first word of the global data structure, the address of the global data
relative to the 'D' segment can be found simply, for example, by:

fs movl 0, %eax

This makes the gd 'pointer' writable prior to relocation (by reloading the
Global Descriptor Table) which brings x86 into line with all other arches

NOTE: Writing to the gd 'pointer' is expensive (but we only do it
twice) but using it to access global data members (read and write) is
still fairly cheap"

The other option was rather ugly - create gd_get() and gd_set() inline
functions and replace all instances of gd-> in all U-Boot source with
gd_get()-> or gd_set(foo). I don't think it would have made any
difference to code size, but the amount of code touched would have
been massive.

The only other option I can think of is to change gd into something
much less likely to be used as a symbol (__gd for example), but again,
the patch to do so would be huge
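
The collision and the accessor-style alternative can be sketched in standalone
C. This is a minimal illustration under stated assumptions, not U-Boot code:
struct sketch_global_data, sketch_gd_storage, struct sketch_ext_filesystem and
sketch_read_gd_member() are hypothetical, and gd_get() is modelled only on the
name mentioned above.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for U-Boot's global data structure. */
struct sketch_global_data {
	unsigned long baudrate;
};

static struct sketch_global_data sketch_gd_storage;

/*
 * Accessor-style alternative: a real function instead of an object-like
 * macro, so the preprocessor never rewrites the token "gd" elsewhere.
 */
static inline struct sketch_global_data *gd_get(void)
{
	return &sketch_gd_storage;
}

/*
 * A structure with a member named "gd", as in the ext4 code. If an
 * object-like macro such as "#define gd some_function()" were in scope,
 * "fs->gd" below would expand to "fs->some_function()" and fail to compile.
 */
struct sketch_ext_filesystem {
	int gd;
};

static int sketch_read_gd_member(const struct sketch_ext_filesystem *fs)
{
	return fs->gd;
}
```

With the accessor, global data is reached as gd_get()->baudrate while the
member name gd stays usable; the cost, as noted above, is touching every
existing gd-> user.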

I'm open to alternatives

Regards,

Graeme

* Re: [PATCH v3 04/13] kmem accounting basic infrastructure
From: Tejun Heo @ 2012-10-03 22:59 UTC (permalink / raw)
  To: Glauber Costa
  Cc: James Bottomley, Mel Gorman, Michal Hocko, linux-kernel, cgroups,
	kamezawa.hiroyu, devel, linux-mm, Suleiman Souhlal,
	Frederic Weisbecker, David Rientjes, Johannes Weiner
In-Reply-To: <5069584A.8090809@parallels.com>

Hello, Glauber.

On Mon, Oct 01, 2012 at 12:46:02PM +0400, Glauber Costa wrote:
> > Yeah, it will need some hooks.  For dentry and inode, I think it would
> > be pretty well isolated tho.  Wasn't it?
> 
> We would still need something for the stack. For open files, and for
> everything that becomes a potential problem. We then end up with 35
> different knobs instead of one. One of the perceived advantages of this
> approach is that it condenses as much data into a single knob as
> possible, reducing complexity and over-flexibility.

Oh, I didn't mean to use object-specific counting for all of them.
Most resources don't have such a common misaccounting problem.  I mean,
for stack, it doesn't exist by definition (other than cgroup
migration).  There's no reason to use anything other than first-use
kmem based accounting for them.  My point was that for particularly
problematic ones like dentry/inode, it might be better to treat them
differently.

Thanks.

-- 
tejun


* Re: [GIT PULL] xtensa patchset for 3.7
From: Chris Zankel @ 2012-10-03 22:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton
In-Reply-To: <CA+55aFyva3aWXBOBwERPJMyX+2LqydYs4Bq4ZBUPXZFMtPqZOg@mail.gmail.com>

Hi Linus,

On 10/03/2012 03:41 PM, Linus Torvalds wrote:
> That should be
>
>     git://github.com/czankel/xtensa-linux tags/xtensa-next-20121003
>
> which is the public git address of that server.
Sorry about that. I mistakenly used 'origin' instead of the actual address.

> However, you now do have a signed tag, but the key you have used for
> it is not available on any of the regular keyservers. Neither
> pgp.mit.edu nor keys.gnupg.net know about that key A1F191F0, which I
> assume also means that it's not actually signed by anybody else
> either.
>
> I see that you're based in St Petersburg, is there anybody around you
> can get your key signed with?
I'm actually located in the Bay Area (Max, who's helping out, is in St.
Petersburg), and could stop by anywhere around here. I'll also try asking
some old friends whether they are still in a position to certify the signature.

Thanks,
-Chris


* [U-Boot] [PATCH v4 14/16] tegra: fdt: Add LCD definitions for Seaboard
From: Stephen Warren @ 2012-10-03 22:58 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <1348793077-10126-15-git-send-email-sjg@chromium.org>

On 09/27/2012 06:44 PM, Simon Glass wrote:
> The Seaboard has a 1366x768 16bpp LCD. The backlight is controlled
> by one of the PWMs.

> diff --git a/board/nvidia/dts/tegra20-seaboard.dts b/board/nvidia/dts/tegra20-seaboard.dts

> +	host1x {
> +		dc@54200000 {

So based on my previous comment, I think you want status="okay" at this
level too, perhaps even at the host1x level.

> +			rgb {
> +				status = "okay";
> +				nvidia,panel = <&lcd_panel>;
> +			};
> +		};
> +	};

* Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()
From: Linus Torvalds @ 2012-10-03 22:58 UTC (permalink / raw)
  To: Andy Walls
  Cc: Greg KH, Al Viro, Mauro Carvalho Chehab, Ming Lei, Kay Sievers,
	Lennart Poettering, Linux Kernel Mailing List, Kay Sievers,
	Linux Media Mailing List, Michael Krufky, Ivan Kalvachev
In-Reply-To: <3560b86d-e2ad-484d-ab6e-2b9048894a12@email.android.com>

On Wed, Oct 3, 2012 at 3:48 PM, Andy Walls <awalls@md.metrocast.net> wrote:
>
> I don't know if you can remove the /sys/.../firmware ABI altogether, because there is at least one, somewhat popular udev replacement that also uses it: mdev
>
> http://git.busybox.net/busybox/plain/docs/mdev.txt

Heh. That web doc documents /lib/firmware as being the place to be.

That said, there's clearly enough variation here that I think that for
now I won't take the step to disable the udev part. I'll do the patch
to support "direct filesystem firmware loading" using the udev default
paths, and that hopefully fixes the particular case people see with
media modules.

We definitely want to have configurable paths and a way to configure
udev entirely off for firmware (together with a lack of paths
configuring the direct filesystem loading off - that way people can
set things up just the way they like), but since I'm travelling
tomorrow and this clearly needs more work, I'll do the first step only
for now..
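
The "direct filesystem firmware loading" idea, trying a fixed list of
directories in order, can be sketched in user-space C. This is a hedged
illustration, not the kernel patch: sketch_find_firmware() is a hypothetical
helper, and the directory list is supplied by the caller; only /lib/firmware
is named in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Try each directory in order and report the first one that holds "name".
 * On success, returns the index into dirs[] and leaves the full path in out.
 * Returns -1 if the file is not found. Directories whose path would not
 * fit in out are skipped.
 */
static int sketch_find_firmware(const char *const *dirs, size_t ndirs,
				const char *name, char *out, size_t outlen)
{
	assert(dirs && name && out);

	for (size_t i = 0; i < ndirs; i++) {
		int len = snprintf(out, outlen, "%s/%s", dirs[i], name);
		FILE *f;

		if (len < 0 || (size_t)len >= outlen)
			continue;	/* path truncated, skip this directory */

		f = fopen(out, "rb");
		if (f) {
			fclose(f);
			return (int)i;
		}
	}
	return -1;
}
```

A caller would pass something like { "/lib/firmware" } as dirs; making that
list configurable, or empty to turn direct loading off, is the follow-up work
described above.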

                Linus

* [U-Boot] [PATCH v4 04/16] tegra: fdt: Add LCD definitions for Tegra
From: Stephen Warren @ 2012-10-03 22:58 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <1348793077-10126-5-git-send-email-sjg@chromium.org>

On 09/27/2012 06:44 PM, Simon Glass wrote:
> Add LCD definitions and also a proposed binding for LCD displays.
> 
> The PWM is as per what will likely be committed to linux-next soon.
> 
> The displaymode binding comes from a proposal here:
> 
> http://lists.freedesktop.org/archives/dri-devel/2012-July/024875.html
> 
> The panel binding is new, and fills a need to specify the panel
> timings and other tegra-specific information. Should a binding appear
> that allows the pwm to handle this automatically, we can revisit
> this.

> diff --git a/arch/arm/dts/tegra20.dtsi b/arch/arm/dts/tegra20.dtsi

> +	host1x {
> +		compatible = "nvidia,tegra20-host1x", "simple-bus";
> +		reg = <0x50000000 0x00024000>;
> +		interrupts = <0 65 0x04   /* mpcore syncpt */
> +			      0 67 0x04>; /* mpcore general */
> +
> +		#address-cells = <1>;
> +		#size-cells = <1>;
> +
> +		ranges = <0x54000000 0x54000000 0x04000000>;
> +
> +		/* video-encoding/decoding */
> +		mpe {
> +			reg = <0x54040000 0x00040000>;
> +			interrupts = <0 68 0x04>;
> +		};

Shouldn't all of these nodes have status="disabled", since in general
boards will want to opt in to these modules? In fact, many of these
nodes won't end up (ever?) being used in U-Boot; perhaps we should only
add the nodes we care about.

> +		/* display controllers */
> +		dc@54200000 {
> +			compatible = "nvidia,tegra20-dc";
> +			reg = <0x54200000 0x00040000>;
> +			interrupts = <0 73 0x04>;
> +
> +			rgb {
> +				status = "disabled";
> +			};
> +		};

I think we definitely want status="disabled" in the dc nodes themselves,
since we definitely want boards in U-Boot to control whether to enable
the dc.

* Re: The BitBake equivalent of "Hello, World!"
From: Patrick Turley @ 2012-10-03 22:56 UTC (permalink / raw)
  To: Patrick Turley, yocto@yoctoproject.org
In-Reply-To: <CC9226B2.365%PatrickTurley@gamestop.com>

In my previous message, some of the indentation in the representation of
my file tree was wrong (because we're using Outlook, which destroys all
indentation when you paste it into an e-mail message). The errors are
small, but I want to avoid annoying anyone who might think I don't even
have the file tree constructed correctly.

The following is accurate:

>/home/pturley/Workspace/woohoo
>    |
>    +-- build
>    |   |
>    |   +-- classes
>    |   |   |
>    |   |   +-- base.bbclass
>    |   |
>    |   |     +-------------------------------------------
>    |   |     | do_hello() {
>    |   |     |     echo Hello
>    |   |     | }
>    |   |     |
>    |   |     | addtask hello
>    |   |     +-------------------------------------------
>    |   |
>    |   +-- conf
>    |       |
>    |       +-- bblayers.conf
>    |       |
>    |       | +-------------------------------------------
>    |       | | BBLAYERS ?= " \
>    |       | |   /home/pturley/Workspace/woohoo/LayerA \
>    |       | |   "
>    |       | +-------------------------------------------
>    |       |
>    |       +-- bitbake.conf
>    |
>    |         +-------------------------------------------
>    |         | CACHE = "${TOPDIR}/cache"
>    |         +-------------------------------------------
>    |
>    +-- LayerA
>    |   |
>    |   +-- a.bb
>    |   |
>    |   |     +-------------------------------------------
>    |   |     | PN = 'a'
>    |   |     | PV = '1'
>    |   |     +-------------------------------------------
>    |   |
>    |   +-- conf
>    |       |
>    |       +-- layer.conf
>    |
>    |         +-------------------------------------------
>    |         | BBPATH .= ":${LAYERDIR}"
>    |         | BBFILES += "${LAYERDIR}/*.bb"
>    |         +-------------------------------------------
>    |
>    +-- BitBake ...
>
>    The BitBake directory origin is:
>
>        http://git.openembedded.org/bitbake/
>
>    I have the 1.15.2 tag checked out, which is what
>    Yocto denzil uses.



* Re: [PATCH v3 04/13] kmem accounting basic infrastructure
From: Tejun Heo @ 2012-10-03 22:54 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Mel Gorman, Michal Hocko, linux-kernel, cgroups, kamezawa.hiroyu,
	devel, linux-mm, Suleiman Souhlal, Frederic Weisbecker,
	David Rientjes, Johannes Weiner
In-Reply-To: <50695817.2030201@parallels.com>

Hello, Glauber.

On Mon, Oct 01, 2012 at 12:45:11PM +0400, Glauber Costa wrote:
> > where kmemcg_slab_idx is updated from sched notifier (or maybe add and
> > use current->kmemcg_slab_idx?).  You would still need __GFP_* and
> > in_interrupt() tests but current->mm and PF_KTHREAD tests can be
> > rolled into index selection.
> 
> How big would this array be? there can be a lot more kmem_caches than
> there are memcgs. That is why it is done from memcg side.

The total number of memcgs is pretty limited due to the ID thing,
right?  And kmemcg is only applied to a subset of caches.  I don't think
the array size would be a problem in terms of memory overhead, would
it?  If so, RCU synchronize and dynamically grow them?

Thanks.

-- 
tejun


* Re: [PATCH] revert "PCI: log vendor/device ID always"
From: Bjorn Helgaas @ 2012-10-03 22:54 UTC (permalink / raw)
  To: Nathan Zimmer; +Cc: linux-kernel, linux-pci, Jesse Barnes
In-Reply-To: <1349187780-25692-1-git-send-email-nzimmer@sgi.com>

On Tue, Oct 2, 2012 at 8:23 AM, Nathan Zimmer <nzimmer@sgi.com> wrote:
> Revert commit id 2c6413aee215a43b1f95e218067abcde50ccbc5e
> On larger systems (256 cores+) with significant IO attached this single message
> represents over 20% of the messages at boot.

Is this causing a problem?  The messages are at KERN_DEBUG, so they
shouldn't be going to the console by default anyway.

I/O devices normally have at least one BAR, as well as some PME
messages, so a change like this won't affect them too much.  My guess
is that the real problem is the large number of CPUs, which is where
we find all the uncore/memory controller/etc. devices.  Those devices
don't have BARs, so this line is probably the only information about
them in dmesg.
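
The point about KERN_DEBUG and the console can be sketched as a small
user-space model. This is an illustration, not kernel code: the SKETCH_* names
and sketch_reaches_console() are hypothetical, and the default console
loglevel of 7 is an assumption that boot options can change.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric severities behind the kernel's KERN_* prefixes (lower = more urgent). */
enum sketch_loglevel {
	SKETCH_KERN_WARNING = 4,
	SKETCH_KERN_INFO = 6,
	SKETCH_KERN_DEBUG = 7,
};

/* Assumed default console loglevel; "quiet" or "loglevel=" on the command line change it. */
#define SKETCH_DEFAULT_CONSOLE_LOGLEVEL 7

/* A message reaches the console only if it is more urgent than the console loglevel. */
static bool sketch_reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

Under that assumption a KERN_DEBUG message is still stored in the log buffer
and shows up in dmesg, but it is not printed on the console, which is why the
question above is whether the extra lines cause an actual problem.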

> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
>
> Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
> ---
>  drivers/pci/probe.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 9f8a6b7..a1add54 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1002,8 +1002,8 @@ int pci_setup_device(struct pci_dev *dev)
>         dev->revision = class & 0xff;
>         dev->class = class >> 8;                    /* upper 3 bytes */
>
> -       dev_printk(KERN_DEBUG, &dev->dev, "[%04x:%04x] type %02x class %#08x\n",
> -                  dev->vendor, dev->device, dev->hdr_type, dev->class);
> +       dev_dbg(&dev->dev, "[%04x:%04x] type %02x class %#08x\n",
> +               dev->vendor, dev->device, dev->hdr_type, dev->class);
>
>         /* need to have dev->class ready */
>         dev->cfg_size = pci_cfg_space_size(dev);
> --
> 1.6.0.2
>

* [U-Boot] [PATCH v4 04/16] tegra: fdt: Add LCD definitions for Tegra
From: Stephen Warren @ 2012-10-03 22:54 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <1348793077-10126-5-git-send-email-sjg@chromium.org>

On 09/27/2012 06:44 PM, Simon Glass wrote:
> Add LCD definitions and also a proposed binding for LCD displays.
> 
> The PWM is as per what will likely be committed to linux-next soon.
> 
> The displaymode binding comes from a proposal here:
> 
> http://lists.freedesktop.org/archives/dri-devel/2012-July/024875.html
> 
> The panel binding is new, and fills a need to specify the panel
> timings and other tegra-specific information. Should a binding appear
> that allows the pwm to handle this automatically, we can revisit
> this.

> diff --git a/doc/device-tree-bindings/video/displaymode.txt b/doc/device-tree-bindings/video/displaymode.txt

> +Example:
> +
> +	display@0 {

That should really be "mode" or "timing", since it's describing a
display mode/timing not a display. I believe the latest messages in that
thread indicate a similar change will be made to the proposed Linux binding.

> diff --git a/doc/device-tree-bindings/video/tegra20-dc.txt b/doc/device-tree-bindings/video/tegra20-dc.txt

> +Required properties (rgb) :
> + - nvidia,panel : phandle of LCD panel information
> +
> +
> +The panel node describes the panel itself. This has the properties listed in
> +displaymode.txt as well as:

It should really be a sub-node; like:

panel {
    ... all the panel-specific stuff from this binding ...
    modes {
        default {
            ... all the stuff from displaymode.txt ...
        };
    };
};

Since the panel itself and the (potentially, list of) modes it can
support are separate things.

* Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()
From: Stephen Rothwell @ 2012-10-03 22:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg KH, Al Viro, Mauro Carvalho Chehab, Ming Lei, Kay Sievers,
	Lennart Poettering, Linux Kernel Mailing List, Kay Sievers,
	Linux Media Mailing List, Michael Krufky, Ivan Kalvachev
In-Reply-To: <CA+55aFwjyABgr-nmsDb-184nQF7KfA8+5kbuBNwyQBHs671qQg@mail.gmail.com>

Hi Linus,

On Wed, 3 Oct 2012 13:39:23 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Ok, I wish this had been getting more testing in Linux-next or
> something

If you ever want a patch tested for a few days, just send it to me and I
will put it in my "fixes" tree which is merged into linux-next
immediately on top of your tree.  If nothing else, that will give it wide
build testing (see http://kisskb.ellerman.id.au/linux-next).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

* [refpolicy] [PATCH] Changes to the dbus policy and its dependencies
From: Dominick Grift @ 2012-10-03 22:49 UTC (permalink / raw)
  To: refpolicy
In-Reply-To: <1349302198.22995.25.camel@d30.localdomain>



On Thu, 2012-10-04 at 00:09 +0200, Dominick Grift wrote:
> 
> On Wed, 2012-10-03 at 23:54 +0200, Sven Vermeulen wrote:
> > Recently, the dbus policy has seen some changes. At least one of them
> > makes an interface incompatible with its earlier declaration.
> > 
> > dbus_session_bus_client() previously took its argument as being a
> > domain ($1) and now takes the argument to create its own domain
> > ($1_dbusd_t). As a result, modules that used to do something like
> > "dbus_session_bus_client(chromium_t)" are now broken.
> > 
> > I'm wondering, how are we supposed to work with these interfaces now?
> > Do we need to declare the subtype ourselves (I don't think the idea is
> > to use the dbus_role_template for non-user domains, but it seems that
> > this is the only interface that creates the specific type)?
> 
> You need dbus_all_session_bus_client(domain) now.
> 
> dbus_session_bus_client() is for prefixed domains, like gkeyringd for
> example.
> 
> It's finer grained: instead of allowing the caller to sendrecv to all
> session buses, you can define which session bus the caller can
> sendrecv to.
> 
> It takes two params: $1, role_prefix, and $2, domain.
> 
> allow $2 $1_dbusd_t:dbus send_msg;
> 
> vs.
> 
> allow $1 session_bus_type:dbus send_msg;
> 

I know I should probably have deprecated dbus_session_bus_client(),
but by doing so I would have made the interface name unavailable for its new
use.

That would have gotten really messy, so I decided to just overlook this
incident and go with this solution. There is one other similar case in
the dbus module, but I doubt anyone will hit that.

Some might also argue that the role-prefixed dbus_session_bus_client is
not needed or not wanted, but I beg to differ and wanted to at least
support it (it is already put to good use for wm as well as gkeyringd).

I am sorry for the inconvenience this has caused.

* Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()
From: Andy Walls @ 2012-10-03 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Greg KH
  Cc: Al Viro, Mauro Carvalho Chehab, Ming Lei, Kay Sievers,
	Lennart Poettering, Linux Kernel Mailing List, Kay Sievers,
	Linux Media Mailing List, Michael Krufky, Ivan Kalvachev
In-Reply-To: <CA+55aFwjyABgr-nmsDb-184nQF7KfA8+5kbuBNwyQBHs671qQg@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:

>On Wed, Oct 3, 2012 at 12:50 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>
>>> Ok, like this?
>>
>> This looks good to me.  Having udev do firmware loading and tying it to
>> the driver model may have not been such a good idea so many years ago.
>> Doing it this way makes more sense.
>
>Ok, I wish this had been getting more testing in Linux-next or
>something, but I suspect that what I'll do is to commit this patch
>asap, and then commit another patch that turns off udev firmware
>loading entirely for the synchronous firmware loading case.
>
>Why? Just to get more testing, and seeing if there are reports of
>breakage. Maybe some udev out there has a different search path (or
>because udev runs in a different filesystem namespace or whatever), in
>which case running udev as a fallback would otherwise hide the fact
>that the direct kernel firmware loading isn't working.
>
>We can (and will) revert things if that turns out to break things, but
>I'd like to make any failures of the firmware direct-load path be fast
>and hard, so that we can see when/what it breaks.
>
>Ok? Comments?
>
>              Linus

I don't know if you can remove the /sys/.../firmware ABI altogether, because there is at least one, somewhat popular udev replacement that also uses it: mdev

http://git.busybox.net/busybox/plain/docs/mdev.txt

Regards,
Andy

* Re: Inconsistency when mounting a directory that 'world' cannot access.
From: NeilBrown @ 2012-10-03 22:46 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Myklebust, Trond, NFS
In-Reply-To: <20121003162728.GE14313@fieldses.org>

On Wed, 3 Oct 2012 12:27:28 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Oct 03, 2012 at 03:48:43PM +0000, Myklebust, Trond wrote:
> > On Wed, 2012-10-03 at 11:13 -0400, J. Bruce Fields wrote:
> > > On Wed, Oct 03, 2012 at 01:46:29PM +1000, NeilBrown wrote:
> > > > On Tue, 2 Oct 2012 10:33:34 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > > > wrote:
> > > > 
> > > > > I guess you're right.  So it starts to sound more like: "you have a
> > > > > confusing setup.  Your export configuration says one thing, and your
> > > > > filesystem permissions say another.  Under NFSv3 the confusion didn't
> > > > > matter, but now it does--time to fix it."
> > > > > 
> > > > 
> > > > That's the best I could come to - I'm glad to have it confirmed.  Thanks!
> > > > 
> > > > It is unfortunate that Linux NFS uses an anon credential to mount when krb5
> > > > is in use, and uses 'root' when auth_sys is used (which might be anon if
> > > > "root_squash" is active, but might not).
> > > > I wonder if it would work to use auth_none for the mount-time lookup, just
> > > > for consistency..
> > > > 
> > > > Is the following appropriate?  Is there somewhere better to put this caveat?
> > > 
> > > Unfortunately, it's more complicated than this, as it depends on client
> > > implementation and configuration details.
> > > 
> > > Something like this would be more accurate but possibly too long:
> > > 
> > > 	Note that under NFSv2 and NFSv3, the mount path is traversed by
> > > 	mountd acting as root, but under NFSv4 the mount path is looked
> > > 	up using the client's credentials.  This means that, for
> > > 	example, if a client mounts using a krb5 credential that the
> > > 	server maps to an "anonmyous" user, then the mount will only
> > > 	succeed if that directory and all its parents allow eXecute
> > > 	permissions.
> > 
> > So you're listing this as a "feature" rather than a bug? There should be
> > no reason to constrain the pseudofs to use the permission checks from
> > the underlying filesystem.
> 
> I'd be fine with that.
> 
> (That still leaves some subtle v3/v4 difference in the case of mount
> paths underneath an export?
> 
> What *is* the existing mountd behavior there, exactly?  I'm inclined to
> think allowing mounts of arbitrary subdirectories is a bug, but maybe
> there's some historical reason for it or maybe someone already depends
> on it.)
> 
> --b.

The behaviour is simply that you mount a filehandle (typically belonging to a
directory) and that filehandle can be anything inside any exported filesystem.
Yes, people do depend on being able to mount filehandles that aren't the root
of a filesystem.

The case that brought this issue to my attention involved the server having
a directory containing hundreds of home directories.  This directory is
exported.

If they mount that top level directory they get horrible performance.  If
they use an automounter to just mount the homes that are accessed it works
better.  They weren't able to explain why but my guess is that some tools
(GUI filesystem browser) would occasionally do the equivalent of "ls  -l" of
the top level directory which would hammer nfs-idmapd and probably ldap....
though you would think that would get cached and not be a problem for long.
So maybe it is more subtle than that.

I've built similar setups before.  There is something attractive about
everyone's home directory being /home/$USERNAME even though they are on
different servers and different filesystems.

In the particular problem scenario, local policy requires that the 'staff'
directory on the server not be world-accessible, but they still want to
mount the individual home directories from there onto client machines as
required.
I cannot easily justify that policy, but the point is that it works with
NFSv3 and with AUTH_SYS/no_root_squash, but not with NFSv4/krb5.  I don't
think we can fix this inconsistency but maybe we can explain it.

I think your text is more accurate than mine, but also a little more vague so
the important point may not be immediately obvious.  That might be a price we have
to pay for accuracy.
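
To make the permission issue concrete, here is a minimal user-space
sketch, assuming the anonymous user is neither the owner nor in the group
of any directory on the path, so only the "other" execute (search) bit
matters.  first_unsearchable() is a hypothetical helper, not part of
nfs-utils or the kernel; it only approximates what a v4 lookup done with
an anonymous credential would run into.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: walk an absolute path from "/" downwards and
 * report the first component that lacks the "other" execute (search) bit.
 *
 * Returns 0 if every component is searchable by "other", 1 if one is not
 * (its path is copied into blocked), and -1 if stat() fails or the path
 * does not fit in the local buffer.
 */
static int first_unsearchable(const char *path, char *blocked, size_t len)
{
	char prefix[4096];
	struct stat st;
	size_t n = strlen(path);
	size_t i;

	assert(path[0] == '/');
	if (n >= sizeof(prefix))
		return -1;

	for (i = 1; i <= n; i++) {
		/* Only look at "/" itself and at each component boundary. */
		if (i != 1 && path[i] != '/' && path[i] != '\0')
			continue;

		memcpy(prefix, path, i);
		prefix[i] = '\0';
		if (stat(prefix, &st) != 0)
			return -1;
		if (!(st.st_mode & S_IXOTH)) {
			snprintf(blocked, len, "%s", prefix);
			return 1;
		}
	}
	return 0;
}
```

Run on the server against the full path of one of the home directories,
this would be expected to stop at the 'staff' directory in the scenario
above, which is the component the v3 mountd-as-root lookup never trips
over.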

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [RFC/PATCH 0/3] swap/frontswap: allow movement of zcache pages to swap
From: Dan Magenheimer @ 2012-10-03 22:43 UTC (permalink / raw)
  To: linux-kernel, linux-mm, ngupta, konrad.wilk, sjenning, minchan,
	hughd, akpm, riel, hannes, dan.magenheimer, aarcange, mgorman,
	gregkh

INTRODUCTION

This is an initial patchset attempting to address Andrea Arcangeli's concern
expressed in https://lwn.net/Articles/516538/ "Moving zcache towards the
mainline".  It works, but not well, so feedback/help/expertise is definitely
needed!  Note that the zcache code that invokes this already exists in
zcache2, currently housed in Linus' 3.7-rc0 tree in drivers/staging/ramster,
so is not included in this patchset.  All relevant code in zcache is
conveniently marked with "#ifdef FRONTSWAP_UNUSE" which this patchset enables.

This patchset currently applies cleanly in Linus' tree following commit
33c2a174120b2c1baec9d1dac513f9d4b761b26a

[PROMOTION NOTE: Due to the great zcache compromise, the author won't take a
position on whether this "unuse" capability is a new zcache feature or
whether this capability is necessary for production use of zcache.  This post
merely documents the motivation, mechanism, and policy of a possible solution
and provides a work-to-date patch, and requests discussion and feedback from
mm experts to ensure that, at some point in zcache's future, this important
capability can be completed and made available to zcache users.]

MOTIVATION

Several mm developers have noted that one of the key limiters to zcache
is that zcache may fill up with pages from frontswap and, once full,
there is no mechanism (or policy) to "evict" pages to make room for
newer pages.  This is problematic for some workloads, especially those
with a mix of long-lived-rarely-running threads (type A) such as system
services that run at boot and shutdown, and short-lived-always-running
threads (type B).  In a non-zcache system under memory pressure, type A
pages will be swapped to ensure that type B pages remain in RAM.  With
zcache, if even with compression there is insufficient RAM for both,
type A pages will fill zcache and type B pages will be swapped, resulting
in noticeably reduced performance for type B threads.  Not good.

So it would be good if zcache had some mechanism for saying "oops, this
frontswap page saved in zcache should go out to the swap device after all."
We will call this "eviction".  (Note that zcache already supports eviction
of cleancache pages because these need only be "reclaimed"... the data
contained in cleancache pages can be discarded.  For the remainder of this
discussion, "eviction" implies a frontswap page.)

This begs the question: What page should zcache choose to evict?  While it
is impossible to predict what pages will be used when, the memory management
subsystem normally employs some form of LRU queue to minimize the probability
that an actively used page will be evicted.  Note, however, that evicting
one zcache page ("zpage") may or may not free up a pageframe as two (or,
in the case of the zsmalloc, more than two) zpages may reside in the same
pageframe.  Since freeing up a part of a pageframe has little value and,
indeed, even swapping a page to a swap device requires a full pageframe,
the zcache eviction policy must evict all zpages in a pageframe.  We will
call this "unuse".  For the remainder of this discussion, we will assume
a zcache pageframe contains exactly two zpages (though this may change in
the future).

MECHANISM

Should zcache write entire pageframes to the swap device, with multiple
zpages contained intact?  This might be possible, and could be explored,
but raises some interesting metadata challenges;  a fair amount of
additional RAM would be necessary to track these sets of zpages.  It might
also be easier if each swap disk had a "shadow": one swap device for
uncompressed pages, and a shadow device for compressed pages, else we
must take great care to avoid overwriting one with the other, which would
likely require some fairly invasive changes to the swap subsystem, which
already has a number of interesting coherency problems solved.  A shadow
swap disk would also require additional devices and/or disk space, which
may not be available, plus userland/sysadmin changes, which would be
difficult to mainstream.

So, we'd like an implementation that has a low requirement for in-RAM
metadata and has no requirement for additional swap device/space or
additional userland changes.

To achieve this, we will rely heavily on the existing swapcache.  When
zcache wishes to unuse a zcache pageframe, it first allocates two pageframes,
one for each zpage, and decompresses each zpage into a pageframe.  It then
asks frontswap to mark a new bit (in a one-bit-per-swap-page array) called a
"denial" bit.  Next it puts the uncompressed pageframe back into the swap
cache, at the least-recently-used end of the anonymous-inactive queue which,
presumably, makes it a candidate for immediate swapwrite.  At this point,
zcache is now free to release the pageframe that contained the two zpages.
Soon, memory pressure causes kswapd to select some pages to write
to swap.  As with all swap attempts, frontswap will first be called,
but since the denial bit is set, frontswap will reject the page
and the swap subsystem will write the page to the true swapdisk.

There are a number of housekeeping details, but that's the proposed
mechanism, implemented in this patchset, in a nutshell.
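
The denial-bit handshake described above can be modelled in user space.
This is only a sketch under stated assumptions: struct swap_model,
model_store() and model_unuse() are made-up names, the bitmaps are tiny
fixed arrays, and no real page data, locking or swap cache is involved.
It shows just the one-shot behaviour: after an unuse, the next store for
that offset is refused (so the page goes to the real swap device) and
the bit is cleared again.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NPAGES        64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS        ((NPAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Toy stand-in for the two per-swap-device bitmaps. */
struct swap_model {
	unsigned long in_frontswap[NWORDS];	/* page is held by frontswap */
	unsigned long denial[NWORDS];		/* next store must be refused */
};

static int get_bit(const unsigned long *map, size_t n)
{
	return (map[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

static void put_bit(unsigned long *map, size_t n, int val)
{
	unsigned long mask = 1UL << (n % BITS_PER_WORD);

	if (val)
		map[n / BITS_PER_WORD] |= mask;
	else
		map[n / BITS_PER_WORD] &= ~mask;
}

/* Returns 0 if frontswap keeps the page, -1 if it must go to the swap device. */
static int model_store(struct swap_model *m, size_t off)
{
	assert(off < NPAGES);
	if (get_bit(m->denial, off) && !get_bit(m->in_frontswap, off)) {
		put_bit(m->denial, off, 0);	/* the denial is one-shot */
		return -1;
	}
	put_bit(m->in_frontswap, off, 1);
	return 0;
}

/* Push a page out of frontswap and arm the denial bit for it. */
static void model_unuse(struct swap_model *m, size_t off)
{
	assert(off < NPAGES && get_bit(m->in_frontswap, off));
	put_bit(m->denial, off, 1);
	put_bit(m->in_frontswap, off, 0);
}
```

Calling model_store(), then model_unuse(), then model_store() twice for
the same offset yields accept, refuse, accept: a page that is swapped in
and later swapped out again is eligible for frontswap once more.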

POLICY

There are some policy questions: How do we maintain an LRU queue
of zpages?  How do we know when zcache is "full"?  If zcache is
full, how do we ensure we can allocate two fresh pageframes for
decompression, and won't this potentially cause an OOM?  The
new zcache implementation (aka zcache2) attempts a trial policy
for each of these, but without a known working mechanism, the
policy may be irrelevant or wrong.  So let's focus first on
getting a working mechanism, OK?

CALL FOR INPUT/HELP

The patchset proposed does work and, in limited testing, does not OOM.
However, performance is much slower than expected and, with extensive
debug output, it appears that "immediate swapwrite" is not very immediate.
This may be related to the performance degradation.  Or there may be
a stupid bug lingering.  Or... the patchset may be completely braindead!
Any help or input in getting this working (or perhaps justifying why a
completely different mechanism might work better) would be appreciated!

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>

---
Diffstat:

 include/linux/frontswap.h |   57 ++++++++++++++++++++++++++++++++
 include/linux/swap.h      |   12 +++++++
 mm/frontswap.c            |   29 ++++++++++++++++
 mm/swap.c                 |   16 +++++++++
 mm/swap_state.c           |   80 +++++++++++++++++++++++++++++++++++++++++++++
 mm/swapfile.c             |   18 +++++++---
 6 files changed, 207 insertions(+), 5 deletions(-)

^ permalink raw reply

* [RFC/PATCH 3/3] frontswap/swap: allow frontswap "unuse" and add metadata for tracking it
From: Dan Magenheimer @ 2012-10-03 22:43 UTC (permalink / raw)
  To: linux-kernel, linux-mm, ngupta, konrad.wilk, sjenning, minchan,
	hughd, akpm, riel, hannes, dan.magenheimer, aarcange, mgorman,
	gregkh
In-Reply-To: <1349304234-19273-1-git-send-email-dan.magenheimer@oracle.com>

We wish for transcendent memory backends to be able to push
frontswap pages back into the swap cache and need to ensure
that such a page, once pushed back, doesn't get immediately
recaptured by frontswap.  We add frontswap_unuse to do the
pushing via the recently added read_frontswap_async.  We
also add metadata to track when a page has been pushed and
code to manage (and count with debugfs) this metadata.

The initialization/destruction code for the metadata (aka
frontswap_denial_map) is a bit clunky in swapfile.c but
cleanup can be addressed when all the unuse code is working.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
---
 include/linux/frontswap.h |   57 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/swap.h      |    1 +
 mm/frontswap.c            |   29 +++++++++++++++++++++++
 mm/swapfile.c             |   18 ++++++++++----
 4 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
index 3044254..f48bb34 100644
--- a/include/linux/frontswap.h
+++ b/include/linux/frontswap.h
@@ -21,6 +21,8 @@ extern unsigned long frontswap_curr_pages(void);
 extern void frontswap_writethrough(bool);
 #define FRONTSWAP_HAS_EXCLUSIVE_GETS
 extern void frontswap_tmem_exclusive_gets(bool);
+#define FRONTSWAP_HAS_UNUSE
+extern int frontswap_unuse(int, pgoff_t, struct page *, gfp_t);
 
 extern void __frontswap_init(unsigned type);
 extern int __frontswap_store(struct page *page);
@@ -61,6 +63,38 @@ static inline unsigned long *frontswap_map_get(struct swap_info_struct *p)
 {
 	return p->frontswap_map;
 }
+
+static inline int frontswap_test_denial(struct swap_info_struct *sis, pgoff_t offset)
+{
+	int ret = 0;
+
+	if (frontswap_enabled && sis->frontswap_denial_map)
+		ret = test_bit(offset, sis->frontswap_denial_map);
+	return ret;
+}
+
+static inline void frontswap_set_denial(struct swap_info_struct *sis, pgoff_t offset)
+{
+	if (frontswap_enabled && sis->frontswap_denial_map)
+		set_bit(offset, sis->frontswap_denial_map);
+}
+
+static inline void frontswap_clear_denial(struct swap_info_struct *sis, pgoff_t offset)
+{
+	if (frontswap_enabled && sis->frontswap_denial_map)
+		clear_bit(offset, sis->frontswap_denial_map);
+}
+
+static inline void frontswap_denial_map_set(struct swap_info_struct *p,
+				     unsigned long *map)
+{
+	p->frontswap_denial_map = map;
+}
+
+static inline unsigned long *frontswap_denial_map_get(struct swap_info_struct *p)
+{
+	return p->frontswap_denial_map;
+}
 #else
 /* all inline routines become no-ops and all externs are ignored */
 
@@ -88,6 +122,29 @@ static inline unsigned long *frontswap_map_get(struct swap_info_struct *p)
 {
 	return NULL;
 }
+
+static inline int frontswap_test_denial(struct swap_info_struct *sis, pgoff_t offset)
+{
+	return 0;
+}
+
+static inline void frontswap_set_denial(struct swap_info_struct *sis, pgoff_t offset)
+{
+}
+
+static inline void frontswap_clear_denial(struct swap_info_struct *sis, pgoff_t offset)
+{
+}
+
+static inline void frontswap_denial_map_set(struct swap_info_struct *p,
+				     unsigned long *map)
+{
+}
+
+static inline unsigned long *frontswap_denial_map_get(struct swap_info_struct *p)
+{
+	return NULL;
+}
 #endif
 
 static inline int frontswap_store(struct page *page)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8a59ddb..aef02bc 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -200,6 +200,7 @@ struct swap_info_struct {
 	unsigned int old_block_size;	/* seldom referenced */
 #ifdef CONFIG_FRONTSWAP
 	unsigned long *frontswap_map;	/* frontswap in-use, one bit per page */
+	unsigned long *frontswap_denial_map;	/* deny frontswap, 1bit/page */
 	atomic_t frontswap_pages;	/* frontswap pages in-use counter */
 #endif
 };
diff --git a/mm/frontswap.c b/mm/frontswap.c
index 2890e67..1af07d1 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -61,6 +61,8 @@ static u64 frontswap_loads;
 static u64 frontswap_succ_stores;
 static u64 frontswap_failed_stores;
 static u64 frontswap_invalidates;
+static u64 frontswap_unuses;
+static u64 frontswap_denials;
 
 static inline void inc_frontswap_loads(void) {
 	frontswap_loads++;
@@ -151,6 +153,11 @@ int __frontswap_store(struct page *page)
 	BUG_ON(sis == NULL);
 	if (frontswap_test(sis, offset))
 		dup = 1;
+	if (frontswap_test_denial(sis, offset) && (dup == 0)) {
+		frontswap_clear_denial(sis, offset);
+		frontswap_denials++;
+		goto out;
+	}
 	ret = frontswap_ops.store(type, offset, page);
 	if (ret == 0) {
 		frontswap_set(sis, offset);
@@ -169,6 +176,7 @@ int __frontswap_store(struct page *page)
 	if (frontswap_writethrough_enabled)
 		/* report failure so swap also writes to swap device */
 		ret = -1;
+out:
 	return ret;
 }
 EXPORT_SYMBOL(__frontswap_store);
@@ -213,6 +221,7 @@ void __frontswap_invalidate_page(unsigned type, pgoff_t offset)
 	if (frontswap_test(sis, offset)) {
 		frontswap_ops.invalidate_page(type, offset);
 		__frontswap_clear(sis, offset);
+		frontswap_clear_denial(sis, offset);
 		inc_frontswap_invalidates();
 	}
 }
@@ -351,6 +360,24 @@ unsigned long frontswap_curr_pages(void)
 }
 EXPORT_SYMBOL(frontswap_curr_pages);
 
+int frontswap_unuse(int type, pgoff_t offset,
+			struct page *newpage, gfp_t gfp_mask)
+{
+	struct swap_info_struct *sis = swap_info[type];
+	int ret = 0;
+
+	frontswap_set_denial(sis, offset);
+	ret = read_frontswap_async(type, offset, newpage, gfp_mask);
+	if (ret == 0 || ret == -EEXIST) {
+		(*frontswap_ops.invalidate_page)(type, offset);
+		atomic_dec(&sis->frontswap_pages);
+		frontswap_clear(sis, offset);
+		frontswap_unuses++;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(frontswap_unuse);
+
 static int __init init_frontswap(void)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -363,6 +390,8 @@ static int __init init_frontswap(void)
 				&frontswap_failed_stores);
 	debugfs_create_u64("invalidates", S_IRUGO,
 				root, &frontswap_invalidates);
+	debugfs_create_u64("unuses", S_IRUGO, root, &frontswap_unuses);
+	debugfs_create_u64("denials", S_IRUGO, root, &frontswap_denials);
 #endif
 	return 0;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 14e254c..b3d6266 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1445,7 +1445,8 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,
 				unsigned char *swap_map,
-				unsigned long *frontswap_map)
+				unsigned long *frontswap_map,
+				unsigned long *frontswap_denial_map)
 {
 	int i, prev;
 
@@ -1456,6 +1457,7 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
 		p->prio = --least_priority;
 	p->swap_map = swap_map;
 	frontswap_map_set(p, frontswap_map);
+	frontswap_denial_map_set(p, frontswap_denial_map);
 	p->flags |= SWP_WRITEOK;
 	nr_swap_pages += p->pages;
 	total_swap_pages += p->pages;
@@ -1557,7 +1559,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		 * sys_swapoff for this swap_info_struct at this point.
 		 */
 		/* re-insert swap space back into swap_list */
-		enable_swap_info(p, p->prio, p->swap_map, frontswap_map_get(p));
+		enable_swap_info(p, p->prio, p->swap_map,
+			frontswap_map_get(p), frontswap_denial_map_get(p));
 		goto out_dput;
 	}
 
@@ -1588,6 +1591,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	vfree(swap_map);
 	vfree(frontswap_map_get(p));
+	vfree(frontswap_denial_map_get(p));
 	/* Destroy swap account informatin */
 	swap_cgroup_swapoff(type);
 
@@ -1948,6 +1952,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	unsigned long maxpages;
 	unsigned char *swap_map = NULL;
 	unsigned long *frontswap_map = NULL;
+	unsigned long *frontswap_denial_map = NULL;
 	struct page *page = NULL;
 	struct inode *inode = NULL;
 
@@ -2032,8 +2037,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		goto bad_swap;
 	}
 	/* frontswap enabled? set up bit-per-page map for frontswap */
-	if (frontswap_enabled)
-		frontswap_map = vzalloc(maxpages / sizeof(long));
+	if (frontswap_enabled) {
+		frontswap_map = vzalloc(maxpages / sizeof(long));
+		frontswap_denial_map = vzalloc(maxpages / sizeof(long));
+	}
 
 	if (p->bdev) {
 		if (blk_queue_nonrot(bdev_get_queue(p->bdev))) {
@@ -2049,7 +2056,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (swap_flags & SWAP_FLAG_PREFER)
 		prio =
 		  (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
-	enable_swap_info(p, prio, swap_map, frontswap_map);
+	enable_swap_info(p, prio, swap_map,
+				frontswap_map, frontswap_denial_map);
 
 	printk(KERN_INFO "Adding %uk swap on %s.  "
 			"Priority:%d extents:%d across:%lluk %s%s%s\n",
-- 
1.7.1
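
A note on sizing the one-bit-per-page maps.  The sketch below is a
user-space illustration, not kernel code; bitmap_bytes(), patch_bytes()
and patch_size_covers() are hypothetical names.  It compares the
"maxpages / sizeof(long)" expression used for both maps with a size
rounded up to whole longs, which is the rounding the kernel's
BITS_TO_LONGS() helper performs.  On a 64-bit build the patch's
expression works out to one bit per page rounded down, so any shortfall
in the last byte is presumably covered by vzalloc() rounding the
allocation up to whole pages; on a 32-bit build it allocates about twice
what is needed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_ (sizeof(unsigned long) * CHAR_BIT)

/* Bytes for a bitmap with one bit per page, rounded up to whole longs. */
static size_t bitmap_bytes(unsigned long nbits)
{
	unsigned long nlongs = (nbits + BITS_PER_LONG_ - 1) / BITS_PER_LONG_;

	return nlongs * sizeof(unsigned long);
}

/* The size expression used in the patch, kept here for comparison. */
static size_t patch_bytes(unsigned long maxpages)
{
	return maxpages / sizeof(long);
}

/* True when the patch's expression yields at least one bit per page. */
static int patch_size_covers(unsigned long maxpages)
{
	assert(maxpages > 0);
	return patch_bytes(maxpages) * CHAR_BIT >= maxpages;
}
```

For realistic swap sizes the difference is a handful of bits at most, so
this is a cleanup candidate rather than a likely cause of the
performance problem.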


^ permalink raw reply related

* [RFC/PATCH 1/3] swap: allow adding of pages to tail of anonymous inactive queue
From: Dan Magenheimer @ 2012-10-03 22:43 UTC (permalink / raw)
  To: linux-kernel, linux-mm, ngupta, konrad.wilk, sjenning, minchan,
	hughd, akpm, riel, hannes, dan.magenheimer, aarcange, mgorman,
	gregkh
In-Reply-To: <1349304234-19273-1-git-send-email-dan.magenheimer@oracle.com>

When moving a page of anonymous data out of zcache and back
into swap cache, such pages are VERY inactive, and we want
them to be swapped to disk ASAP.  So we need to add them
at the tail of the proper queue.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
---
 include/linux/swap.h |   10 ++++++++++
 mm/swap.c            |   16 ++++++++++++++++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 388e706..d3c7281 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -225,6 +225,7 @@ extern unsigned int nr_free_pagecache_pages(void);
 
 /* linux/mm/swap.c */
 extern void __lru_cache_add(struct page *, enum lru_list lru);
+extern void __lru_cache_add_tail(struct page *, enum lru_list lru);
 extern void lru_cache_add_lru(struct page *, enum lru_list lru);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			      struct lruvec *lruvec);
@@ -247,6 +248,15 @@ static inline void lru_cache_add_anon(struct page *page)
 {
 	__lru_cache_add(page, LRU_INACTIVE_ANON);
 }
+
+/**
+ * lru_cache_add_anon_tail - add a page to the tail of the inactive anon LRU
+ * @page: the page to add
+ */
+static inline void lru_cache_add_anon_tail(struct page *page)
+{
+	__lru_cache_add_tail(page, LRU_INACTIVE_ANON);
+}
 
 static inline void lru_cache_add_file(struct page *page)
 {
diff --git a/mm/swap.c b/mm/swap.c
index 7782588..67216d8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -456,6 +456,22 @@ void __lru_cache_add(struct page *page, enum lru_list lru)
 	put_cpu_var(lru_add_pvecs);
 }
 EXPORT_SYMBOL(__lru_cache_add);
+
+void __lru_cache_add_tail(struct page *page, enum lru_list lru)
+{
+	struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
+	unsigned long flags;
+
+	page_cache_get(page);
+	if (!pagevec_add(pvec, page)) {
+		__pagevec_lru_add(pvec, lru);
+		local_irq_save(flags);
+		pagevec_move_tail(pvec);
+		local_irq_restore(flags);
+	}
+	put_cpu_var(lru_add_pvecs);
+}
+EXPORT_SYMBOL(__lru_cache_add_tail);
 
 /**
  * lru_cache_add_lru - add a page to a page list
-- 
1.7.1
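
Why the tail matters can be shown with a toy list.  This is a sketch
only: struct node and the lru_* helpers below are illustrative names and
do not reflect the kernel's pagevec or lruvec machinery.  The assumption
is simply that reclaim takes pages from the tail, so a page added at the
tail is the next one to be written out, while a page added at the head
waits behind everything else.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly-linked list; the head node carries no page. */
struct node {
	struct node *prev, *next;
	int id;
};

static void lru_init(struct node *head)
{
	head->prev = head->next = head;
}

static void insert_between(struct node *n, struct node *prev, struct node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* Normal insertion: newest page, furthest from reclaim. */
static void lru_add_head(struct node *head, struct node *n)
{
	assert(n != head);
	insert_between(n, head, head->next);
}

/* Tail insertion: page becomes the next reclaim candidate. */
static void lru_add_tail(struct node *head, struct node *n)
{
	assert(n != head);
	insert_between(n, head->prev, head);
}

/* Remove and return the page at the tail, or NULL if the list is empty. */
static struct node *lru_reclaim(struct node *head)
{
	struct node *n = head->prev;

	if (n == head)
		return NULL;
	n->prev->next = head;
	head->prev = n->prev;
	n->prev = n->next = NULL;
	return n;
}
```

With two pages added at the head and a third added at the tail, the
third is reclaimed first, which is the ordering wanted for pages coming
back out of zcache.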


^ permalink raw reply related

* [RFC/PATCH 2/3] swap: add read_frontswap_async to move a page from frontswap to swapcache
From: Dan Magenheimer @ 2012-10-03 22:43 UTC (permalink / raw)
  To: linux-kernel, linux-mm, ngupta, konrad.wilk, sjenning, minchan,
	hughd, akpm, riel, hannes, dan.magenheimer, aarcange, mgorman,
	gregkh
In-Reply-To: <1349304234-19273-1-git-send-email-dan.magenheimer@oracle.com>

We would like to move a "swap page" identified by swaptype/offset
out of frontswap and into swap cache.  Add read_frontswap_async
that, given an unused new page and a gfp_mask (for necessary radix
tree work), attempts to do that and communicates success, failure,
or the fact that a (possibly dirty) copy already exists in swap cache.
This new routine will be called from zcache (via frontswap) to
allow pages to be swapped to a true swap device when zcache gets "full".

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
---
 include/linux/swap.h |    1 +
 mm/swap_state.c      |   80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index d3c7281..8a59ddb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -352,6 +352,7 @@ extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t);
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr);
+extern int read_frontswap_async(int, pgoff_t, struct page *, gfp_t);
 extern struct page *swapin_readahead(swp_entry_t, gfp_t,
 			struct vm_area_struct *vma, unsigned long addr);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0cb36fb..ad790bf 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -18,6 +18,7 @@
 #include <linux/pagevec.h>
 #include <linux/migrate.h>
 #include <linux/page_cgroup.h>
+#include <linux/frontswap.h>
 
 #include <asm/pgtable.h>
 
@@ -351,6 +352,85 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	return found_page;
 }
 
+/*
+ * Similar to read_swap_cache_async except we know the page is in frontswap
+ * and we are trying to place it in swapcache so we can remove it from
+ * frontswap. Success means the data in frontswap can be thrown away,
+ * -ENOMEM means it cannot, and -EEXIST means a (possibly dirty) copy
+ * already exists in the swapcache.
+ */
+int read_frontswap_async(int type, pgoff_t offset, struct page *new_page,
+				gfp_t gfp_mask)
+{
+	struct page *found_page;
+	swp_entry_t entry;
+	int ret = 0;
+
+	entry = swp_entry(type, offset);
+	do {
+		/*
+		 * First check the swap cache.  Since this is normally
+		 * called after lookup_swap_cache() failed, re-calling
+		 * that would confuse statistics.
+		 */
+		found_page = find_get_page(&swapper_space, entry.val);
+		if (found_page) {
+			/* it's already in the swap cache */
+			ret = -EEXIST;
+			break;
+		}
+
+
+		/*
+		 * call radix_tree_preload() while we can wait.
+		 */
+		ret = radix_tree_preload(gfp_mask);
+		if (ret)
+			break;
+
+		/*
+		 * Swap entry may have been freed since our caller observed it.
+		 */
+		ret = swapcache_prepare(entry);
+		if (ret == -EEXIST) {	/* seems racy */
+			radix_tree_preload_end();
+			continue;
+		}
+		if (ret) {		/* swp entry is obsolete ? */
+			radix_tree_preload_end();
+			break;
+		}
+
+		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
+		__set_page_locked(new_page);
+		SetPageSwapBacked(new_page);
+		ret = __add_to_swap_cache(new_page, entry);
+		if (likely(!ret)) {
+			radix_tree_preload_end();
+			/* FIXME: how do I add this at tail of lru? */
+			SetPageDirty(new_page);
+			lru_cache_add_anon_tail(new_page);
+			/* Get page (from frontswap) and return */
+			if (frontswap_load(new_page) == 0)
+				SetPageUptodate(new_page);
+			unlock_page(new_page);
+			ret = 0;
+			goto out;
+		}
+		radix_tree_preload_end();
+		ClearPageSwapBacked(new_page);
+		__clear_page_locked(new_page);
+		/*
+		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
+		 * clear SWAP_HAS_CACHE flag.
+		 */
+		swapcache_free(entry, NULL);
+	} while (ret != -ENOMEM);
+
+out:
+	return ret;
+}
+
 /**
  * swapin_readahead - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
-- 
1.7.1
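
The return-value contract of read_frontswap_async() boils down to one
decision for the caller: may the frontswap copy be dropped?  A minimal
sketch follows, assuming the only meaningful results are 0, -EEXIST and
other negative errno values; frontswap_copy_droppable() is a
hypothetical name, and it mirrors the test that frontswap_unuse() in
patch 3/3 applies before invalidating the page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * 0        page is now in the swap cache, frontswap copy can go
 * -EEXIST  a (possibly dirty) copy was already in the swap cache,
 *          so the frontswap copy can go as well
 * other    (for example -ENOMEM) the frontswap copy must be kept
 */
static bool frontswap_copy_droppable(int ret)
{
	assert(ret <= 0);	/* results are 0 or a negative errno */
	return ret == 0 || ret == -EEXIST;
}
```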


^ permalink raw reply related

* [RFC/PATCH 0/3] swap/frontswap: allow movement of zcache pages to swap
From: Dan Magenheimer @ 2012-10-03 22:43 UTC (permalink / raw)
  To: linux-kernel, linux-mm, ngupta, konrad.wilk, sjenning, minchan,
	hughd, akpm, riel, hannes, dan.magenheimer, aarcange, mgorman,
	gregkh

INTRODUCTION

This is an initial patchset attempting to address Andrea Arcangeli's concern
expressed in https://lwn.net/Articles/516538/ "Moving zcache towards the
mainline".  It works, but not well, so feedback/help/expertise is definitely
needed!  Note that the zcache code that invokes this already exists in
zcache2, currently housed in Linus' 3.7-rc0 tree in drivers/staging/ramster,
so is not included in this patchset.  All relevant code in zcache is
conveniently marked with "#ifdef FRONTSWAP_UNUSE" which this patchset enables.

This patchset currently applies cleanly in Linus' tree following commit
33c2a174120b2c1baec9d1dac513f9d4b761b26a

[PROMOTION NOTE: Due to the great zcache compromise, the author won't take a
position on whether this "unuse" capability is a new zcache feature or
whether this capability is necessary for production use of zcache.  This post
merely documents the motivation, mechanism, and policy of a possible solution
and provides a work-to-date patch, and requests discussion and feedback from
mm experts to ensure that, at some point in zcache's future, this important
capability can be completed and made available to zcache users.]

MOTIVATION

Several mm developers have noted that one of the key limiters to zcache
is that zcache may fill up with pages from frontswap and, once full,
there is no mechanism (or policy) to "evict" pages to make room for
newer pages.  This is problematic for some workloads, especially those
with a mix of long-lived-rarely-running threads (type A) such as system
services that run at boot and shutdown, and short-lived-always-running
threads (type B).  In a non-zcache system under memory pressure, type A
pages will be swapped to ensure that type B pages remain in RAM.  With
zcache, if even with compression there is insufficient RAM for both,
type A pages will fill zcache and type B pages will be swapped, resulting
in noticeably reduced performance for type B threads.  Not good.

So it would be good if zcache had some mechanism for saying "oops, this
frontswap page saved in zcache should go out to the swap device after all."
We will call this "eviction".  (Note that zcache already supports eviction
of cleancache pages because these need only be "reclaimed"... the data
contained in cleancache pages can be discarded.  For the remainder of this
discussion, "eviction" implies a frontswap page.)

This begs the question: What page should zcache choose to evict?  While it
is impossible to predict what pages will be used when, the memory management
subsystem normally employs some form of LRU queue to minimize the probability
that an actively used page will be evicted.  Note, however, that evicting
one zcache page ("zpage") may or may not free up a pageframe as two (or,
in the case of the zsmalloc, more than two) zpages may reside in the same
pageframe.  Since freeing up a part of a pageframe has little value and,
indeed, even swapping a page to a swap device requires a full pageframe,
the zcache eviction policy must evict all zpages in a pageframe.  We will
call this "unuse".  For the remainder of this discussion, we will assume
a zcache pageframe contains exactly two zpages (though this may change in
the future).

MECHANISM

Should zcache write entire pageframes to the swap device, with multiple
zpages contained intact?  This might be possible, and could be explored,
but raises some interesting metadata challenges;  a fair amount of
additional RAM would be necessary to track these sets of zpages.  It might
also be easier if each swap disk had a "shadow": one swap device for
uncompressed pages, and a shadow device for compressed pages, else we
must take great care to avoid overwriting one with the other, which would
likely require some fairly invasive changes to the swap subsystem, which
already has a number of interesting coherency problems solved.  A shadow
swap disk would also require additional devices and/or disk space, which
may not be available, plus userland/sysadmin changes, which would be
difficult to mainstream.

So, we'd like an implementation that has a low requirement for in-RAM
metadata and has no requirement for additional swap device/space or
additional userland changes.

To achieve this, we will rely heavily on the existing swapcache.  When
zcache wishes to unuse a zcache pageframe, it first allocates two pageframes,
one for each zpage, and decompresses each zpage into a pageframe.  It then
asks frontswap to mark a new bit (in a one-bit-per-swap-page array) called a
"denial" bit.  Next it puts the uncompressed pageframe back into the swap
cache, at the least-recently-used end of the anonymous-inactive queue which,
presumably, makes it a candidate for immediate swapwrite.  At this point,
zcache is now free to release the pageframe that contained the two zpages.
Soon, memory pressure causes kswapd to select some pages to write
to swap.  As with all swap attempts, frontswap will first be called,
but since the denial bit is set, frontswap will reject the page
and the swap subsystem will write the page to the true swapdisk.
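
The denial-bit check described above can be sketched as a small userspace
model in C.  This is not code from the patchset: the toy_* names, the
fixed number of swap pages, and the two-valued result are all assumptions
made for illustration.  When the bit gets cleared again is one of the
housekeeping details and is not modeled here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size of the toy swap device, in pages. */
#define TOY_SWAP_PAGES 64
#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS ((TOY_SWAP_PAGES + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

/* One bit per swap page, as in the text: set means "frontswap must refuse". */
static unsigned long toy_denial_map[TOY_WORDS];

/* Called by the (modeled) zcache unuse path before re-queueing a page. */
static void toy_set_denial(size_t offset)
{
	assert(offset < TOY_SWAP_PAGES);
	toy_denial_map[offset / TOY_BITS_PER_WORD] |=
		1UL << (offset % TOY_BITS_PER_WORD);
}

static bool toy_test_denial(size_t offset)
{
	assert(offset < TOY_SWAP_PAGES);
	return (toy_denial_map[offset / TOY_BITS_PER_WORD] >>
		(offset % TOY_BITS_PER_WORD)) & 1UL;
}

/* Where a page being swapped out ends up in this model. */
enum toy_dest { TOY_TO_ZCACHE, TOY_TO_SWAPDISK };

/*
 * Model of a swap attempt: frontswap is consulted first; if the denial
 * bit is set it rejects the page, so the page goes to the real swap disk.
 */
static enum toy_dest toy_swap_writepage(size_t offset)
{
	if (toy_test_denial(offset))
		return TOY_TO_SWAPDISK;
	return TOY_TO_ZCACHE;
}
```

In this model a page whose denial bit has never been set is still accepted
by the compressed cache; only pages that went through unuse fall through
to the swap disk.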

There are a number of housekeeping details, but that's the proposed
mechanism, implemented in this patchset, in a nutshell.

POLICY

There are some policy questions: How do we maintain an LRU queue
of zpages?  How do we know when zcache is "full"?  If zcache is
full, how do we ensure we can allocate two fresh pageframes for
decompression, and won't this potentially cause an OOM?  The
new zcache implementation (aka zcache2) attempts a trial policy
for each of these, but without a known working mechanism, the
policy may be irrelevant or wrong.  So let's focus first on
getting a working mechanism, OK?

CALL FOR INPUT/HELP

The proposed patchset does work and, in limited testing, does not OOM.
However, performance is much slower than expected and, judging from extensive
debug output, it appears that "immediate swapwrite" is not very immediate.
This may be related to the performance degradation.  Or there may be
a stupid bug lingering.  Or... the patchset may be completely braindead!
Any help or input in getting this working (or perhaps justifying why a
completely different mechanism might work better) would be appreciated!

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>

---
Diffstat:

 include/linux/frontswap.h |   57 ++++++++++++++++++++++++++++++++
 include/linux/swap.h      |   12 +++++++
 mm/frontswap.c            |   29 ++++++++++++++++
 mm/swap.c                 |   16 +++++++++
 mm/swap_state.c           |   80 +++++++++++++++++++++++++++++++++++++++++++++
 mm/swapfile.c             |   18 +++++++---
 6 files changed, 207 insertions(+), 5 deletions(-)

