* Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Kirill A. Shutemov @ 2014-10-07 11:30 UTC
To: Dr. David Alan Gilbert
Cc: Robert Love, Dave Hansen, Jan Kara, kvm, Neil Brown,
Stefan Hajnoczi, qemu-devel, linux-mm, KOSAKI Motohiro,
Michel Lespinasse, Andrea Arcangeli, Taras Glek, Juan Quintela,
Hugh Dickins, Isaku Yamahata, Mel Gorman, Sasha Levin,
Android Kernel Team, Andrew Jones, Huangpeng (Peter),
Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
Paolo Bonzini, Keith Packard
In-Reply-To: <20141007110102.GJ2404@work-vm>
On Tue, Oct 07, 2014 at 12:01:02PM +0100, Dr. David Alan Gilbert wrote:
> * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > On Tue, Oct 07, 2014 at 11:46:04AM +0100, Dr. David Alan Gilbert wrote:
> > > * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > > > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > > > userland touches a still unmapped virtual address, a sigbus signal is
> > > > > sent instead of allocating a new page. The sigbus signal handler will
> > > > > then resolve the page fault in userland by calling the
> > > > > remap_anon_pages syscall.
> > > >
> > > > Hm. I wonder if this functionality really fits the madvise(2) interface: as
> > > > far as I understand it, it provides a way to give a *hint* to the kernel, which
> > > > may or may not trigger an action on the kernel side. I don't think an
> > > > application will behave reasonably if the kernel ignores the *advice* and does
> > > > not send SIGBUS, but allocates memory.
> > >
> > > Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are
> > > expected to do what they say?
> >
> > No. If the kernel ignored MADV_DONTNEED or MADV_DONTDUMP, it would not
> > affect correctness; behaviour would just be suboptimal: more memory used
> > than needed, or wasted space in the coredump.
>
> That's not how the manpage reads for DONTNEED; it calls it out as a special
> case near the top, and explicitly says what will happen if you read the
> area marked as DONTNEED.
You are right. MADV_DONTNEED doesn't fit the interface either. That's bad
and we can't fix it, but it's not a reason to make the same mistake again.
Read the next sentence: "The kernel is free to ignore the advice."
Note: POSIX_MADV_DONTNEED has totally different semantics.
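A minimal Linux userland sketch of the semantic difference being discussed: on a private anonymous mapping, madvise(MADV_DONTNEED) really discards the page, so the next read sees zero-fill (the behaviour the man page spells out), while posix_madvise(POSIX_MADV_DONTNEED) is advisory and, as far as we know, glibc and musl simply ignore it on Linux for exactly this reason. The helper name `read_after_advice` is hypothetical, and the expected results assume Linux with glibc or musl.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one private anonymous page, write 'A' to it,
 * apply either madvise(MADV_DONTNEED) or posix_madvise(POSIX_MADV_DONTNEED),
 * and return the first byte read back afterwards (-1 on error).
 */
static int read_after_advice(int use_posix)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	p[0] = 'A';

	/* Both calls return 0 on success (posix_madvise returns an error
	 * number on failure, madvise returns -1). */
	int rc = use_posix
		? posix_madvise(p, (size_t)page, POSIX_MADV_DONTNEED)
		: madvise(p, (size_t)page, MADV_DONTNEED);

	/* Read through a volatile pointer so the load really hits memory. */
	int value = (rc == 0) ? ((volatile unsigned char *)p)[0] : -1;

	munmap(p, (size_t)page);
	return value;
}
```

With MADV_DONTNEED the function returns 0 (the page was dropped and refaulted as zeroes); with the POSIX variant it returns 'A', since the data is left untouched.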
> It looks like there are openssl patches that use DONTDUMP to explicitly
> make sure keys etc don't land in cores.
That's nice to have. But openssl works on systems without the interface,
meaning it's not essential for functionality.
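A sketch of the "nice to have, not essential" pattern Kirill describes, loosely in the spirit of the openssl usage mentioned above: the buffer holding secret material is excluded from core dumps when the kernel supports MADV_DONTDUMP, and the program carries on unchanged when it does not (older kernels reject unknown advice, typically with EINVAL). The function name and the best-effort policy are illustrative assumptions, not openssl's actual code.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate page-aligned storage for secret material
 * and ask the kernel to leave it out of core dumps. Returns NULL only if
 * the allocation fails; *excluded reports whether MADV_DONTDUMP took effect.
 */
static void *alloc_secret(size_t len, int *excluded)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*excluded = 0;
#ifdef MADV_DONTDUMP
	if (madvise(p, len, MADV_DONTDUMP) == 0)
		*excluded = 1;
	/* On failure keep going: the buffer still works, it just may
	 * show up in a core dump. Correctness does not depend on it. */
#endif
	return p;
}
```

The caller uses the buffer the same way whether or not the advice was honoured, which is the property that makes MADV_DONTDUMP a genuine hint.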
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Kirill A. Shutemov @ 2014-10-07 11:10 UTC
To: Andrea Arcangeli
Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api, Robert Love,
Dave Hansen, Jan Kara, Neil Brown, Stefan Hajnoczi, Andrew Jones,
KOSAKI Motohiro, Michel Lespinasse, Taras Glek, Juan Quintela,
Hugh Dickins, Isaku Yamahata, Mel Gorman, Sasha Levin,
Android Kernel Team, "Dr. David Alan Gilbert",
Huangpeng (Peter), Andres Lagar-Cavilla, Christopher Covington,
Anthony Liguori
In-Reply-To: <1412356087-16115-11-git-send-email-aarcange@redhat.com>
On Fri, Oct 03, 2014 at 07:08:00PM +0200, Andrea Arcangeli wrote:
> There's one constraint enforced to allow this simplification: the
> source pages passed to remap_anon_pages must be mapped only in one
> vma, but this is not a limitation when used to handle userland page
> faults with MADV_USERFAULT. The source addresses passed to
> remap_anon_pages should be set as VM_DONTCOPY with MADV_DONTFORK to
> avoid any risk of the mapcount of the pages increasing, if fork runs
> in parallel in another thread, before or while remap_anon_pages runs.
Have you considered triggering COW instead of adding a limitation on the
pages' mapcount? The limitation looks artificial from the interface POV.
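The constraint Andrea describes relies on MADV_DONTFORK, which sets VM_DONTCOPY so that a child created by fork() does not inherit the range and therefore cannot raise the mapcount of those pages. A minimal Linux sketch of that effect; `alloc_staging` and `child_can_touch` are hypothetical names, and the check simply observes that a child faulting on the uninherited range dies with SIGSEGV.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical staging buffer for pages that will later be moved elsewhere. */
static char *alloc_staging(size_t len)
{
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	/* VM_DONTCOPY: fork() will not duplicate this vma into the child. */
	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return NULL;
	}
	p[0] = 1; /* populate the first page in the parent */
	return p;
}

/*
 * Returns 1 if a forked child could read p[0], 0 if the child was killed
 * by SIGSEGV (range not inherited), -1 on any other outcome or error.
 */
static int child_can_touch(const char *p)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		volatile char c = p[0]; /* faults if the vma was not copied */
		(void)c;
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status))
		return WTERMSIG(status) == SIGSEGV ? 0 : -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

A buffer mapped the same way but without MADV_DONTFORK is readable in the child (the pages become shared copy-on-write, which is precisely the mapcount increase the patch wants to rule out).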
--
Kirill A. Shutemov
* Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Dr. David Alan Gilbert @ 2014-10-07 11:01 UTC
To: Kirill A. Shutemov
Cc: Robert Love, Dave Hansen, Jan Kara, kvm, Neil Brown,
Stefan Hajnoczi, qemu-devel, linux-mm, KOSAKI Motohiro,
Michel Lespinasse, Andrea Arcangeli, Taras Glek, Andrew Jones,
Juan Quintela, Hugh Dickins, Isaku Yamahata, Mel Gorman,
Sasha Levin, Android Kernel Team, Huangpeng (Peter),
Andres Lagar-Cavilla, Christopher Covington, Antho
In-Reply-To: <20141007105245.GC30762@node.dhcp.inet.fi>
* Kirill A. Shutemov (kirill@shutemov.name) wrote:
> On Tue, Oct 07, 2014 at 11:46:04AM +0100, Dr. David Alan Gilbert wrote:
> > * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > > userland touches a still unmapped virtual address, a sigbus signal is
> > > > sent instead of allocating a new page. The sigbus signal handler will
> > > > then resolve the page fault in userland by calling the
> > > > remap_anon_pages syscall.
> > >
> > > Hm. I wonder if this functionality really fits the madvise(2) interface: as
> > > far as I understand it, it provides a way to give a *hint* to the kernel, which
> > > may or may not trigger an action on the kernel side. I don't think an
> > > application will behave reasonably if the kernel ignores the *advice* and does
> > > not send SIGBUS, but allocates memory.
> >
> > Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are
> > expected to do what they say?
>
> No. If the kernel ignored MADV_DONTNEED or MADV_DONTDUMP, it would not
> affect correctness; behaviour would just be suboptimal: more memory used
> than needed, or wasted space in the coredump.
That's not how the manpage reads for DONTNEED; it calls it out as a special
case near the top, and explicitly says what will happen if you read the
area marked as DONTNEED.
It looks like there are openssl patches that use DONTDUMP to explicitly
make sure keys etc don't land in cores.
Dave
>
> --
> Kirill A. Shutemov
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Kirill A. Shutemov @ 2014-10-07 10:52 UTC
To: Dr. David Alan Gilbert
Cc: Robert Love, Dave Hansen, Jan Kara, kvm, Neil Brown,
Stefan Hajnoczi, qemu-devel, linux-mm, KOSAKI Motohiro,
Michel Lespinasse, Andrea Arcangeli, Taras Glek, Andrew Jones,
Juan Quintela, Hugh Dickins, Isaku Yamahata, Mel Gorman,
Sasha Levin, Android Kernel Team, Huangpeng (Peter),
Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
Mike Hommey, Keith Packard
In-Reply-To: <20141007104603.GI2404@work-vm>
On Tue, Oct 07, 2014 at 11:46:04AM +0100, Dr. David Alan Gilbert wrote:
> * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > userland touches a still unmapped virtual address, a sigbus signal is
> > > sent instead of allocating a new page. The sigbus signal handler will
> > > then resolve the page fault in userland by calling the
> > > remap_anon_pages syscall.
> >
> > Hm. I wonder if this functionality really fits the madvise(2) interface: as
> > far as I understand it, it provides a way to give a *hint* to the kernel, which
> > may or may not trigger an action on the kernel side. I don't think an
> > application will behave reasonably if the kernel ignores the *advice* and does
> > not send SIGBUS, but allocates memory.
>
> Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are
> expected to do what they say?
No. If the kernel ignored MADV_DONTNEED or MADV_DONTDUMP, it would not
affect correctness; behaviour would just be suboptimal: more memory used
than needed, or wasted space in the coredump.
--
Kirill A. Shutemov
* Re: [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Dr. David Alan Gilbert @ 2014-10-07 10:46 UTC
To: Kirill A. Shutemov
Cc: Andrea Arcangeli, qemu-devel, kvm, linux-kernel, linux-mm,
linux-api, Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen,
Paolo Bonzini, Rik van Riel, Mel Gorman, Andy Lutomirski,
Andrew Morton, Sasha Levin, Hugh Dickins, Peter Feiner,
"Dr. David Alan Gilbert", Christopher Covington,
Johannes Weiner, Android Kernel Team, Robert Love
In-Reply-To: <20141007103645.GB30762@node.dhcp.inet.fi>
* Kirill A. Shutemov (kirill@shutemov.name) wrote:
> On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > userland touches a still unmapped virtual address, a sigbus signal is
> > sent instead of allocating a new page. The sigbus signal handler will
> > then resolve the page fault in userland by calling the
> > remap_anon_pages syscall.
>
> Hm. I wonder if this functionality really fits the madvise(2) interface: as
> far as I understand it, it provides a way to give a *hint* to the kernel, which
> may or may not trigger an action on the kernel side. I don't think an
> application will behave reasonably if the kernel ignores the *advice* and does
> not send SIGBUS, but allocates memory.
Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are
expected to do what they say?
> I would suggest considering some other interface for this
> functionality: a new syscall or, perhaps, mprotect().
Dave
> --
> Kirill A. Shutemov
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Kirill A. Shutemov @ 2014-10-07 10:36 UTC
To: Andrea Arcangeli
Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api,
Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
Sasha Levin, Hugh Dickins, Peter Feiner,
"Dr. David Alan Gilbert", Christopher Covington,
Johannes Weiner, Android Kernel Team, Robert Love,
Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek
In-Reply-To: <1412356087-16115-9-git-send-email-aarcange@redhat.com>
On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> userland touches a still unmapped virtual address, a sigbus signal is
> sent instead of allocating a new page. The sigbus signal handler will
> then resolve the page fault in userland by calling the
> remap_anon_pages syscall.
Hm. I wonder if this functionality really fits the madvise(2) interface: as
far as I understand it, it provides a way to give a *hint* to the kernel, which
may or may not trigger an action on the kernel side. I don't think an
application will behave reasonably if the kernel ignores the *advice* and does
not send SIGBUS, but allocates memory.
I would suggest considering some other interface for this
functionality: a new syscall or, perhaps, mprotect().
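To make the mprotect() alternative concrete, here is a minimal userland sketch of the flow the patch describes (touch an unpopulated address, receive a signal, resolve the fault in userland), built only on existing interfaces: the region starts as PROT_NONE, and a SIGSEGV handler makes the faulting page accessible and fills it before the access is retried. This swaps MADV_USERFAULT/SIGBUS and remap_anon_pages for mprotect()/SIGSEGV and a plain fill, so it is an illustration of the idea, not the proposed API. All names are hypothetical; it assumes Linux and a single-threaded program, and it relies on calling mprotect() from the signal handler, a common Linux practice that POSIX does not formally guarantee.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *lazy_base;                   /* hypothetical lazily populated region */
static size_t lazy_len;
static size_t lazy_page;
static volatile sig_atomic_t lazy_faults; /* faults resolved so far */

/* Resolve a fault inside the region: unprotect the page, then fill it. */
static void lazy_fault(int sig, siginfo_t *info, void *uctx)
{
	uintptr_t addr = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)lazy_base;
	(void)uctx;

	if (addr < base || addr >= base + lazy_len) {
		/* Not ours: restore the default action; the retried access
		 * then terminates the process as usual. */
		signal(sig, SIG_DFL);
		return;
	}

	char *page = (char *)(addr & ~(uintptr_t)(lazy_page - 1));
	if (mprotect(page, lazy_page, PROT_READ | PROT_WRITE) != 0)
		_exit(1);
	/* Stand-in for fetching the real contents, e.g. from a migration source. */
	for (size_t i = 0; i < lazy_page; i++)
		page[i] = 'x';
	lazy_faults++;
	/* Returning retries the faulting instruction, which now succeeds. */
}

/* Reserve npages inaccessible pages and install the fault handler. */
static int lazy_init(size_t npages)
{
	struct sigaction sa;

	lazy_page = (size_t)sysconf(_SC_PAGESIZE);
	lazy_len = npages * lazy_page;
	lazy_base = mmap(NULL, lazy_len, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (lazy_base == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = lazy_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

After lazy_init(4), the first read of lazy_base[0] triggers one fault and sees 'x'; a later write to the second page triggers a second fault, after which the written byte is kept and the rest of that page holds 'x'. Unlike the MADV_USERFAULT proposal, nothing here is advisory: mprotect() is a hard protection change, which is the property Kirill argues such an interface needs.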
--
Kirill A. Shutemov
* Re: [PATCH 1/7] mfd: max77693: Add defines for MAX77693 charger driver
From: Lee Jones @ 2014-10-07 9:12 UTC
To: Krzysztof Kozlowski
Cc: Sebastian Reichel, Dmitry Eremin-Solenikov, David Woodhouse,
Samuel Ortiz, linux-kernel, linux-pm, Rob Herring, Pawel Moll,
Mark Rutland, Ian Campbell, Kumar Gala, devicetree, linux-api,
Ben Dooks, Kukjin Kim, Russell King, Javier Martinez Canillas,
linux-arm-kernel, linux-samsung-soc, Kyungmin Park,
Marek Szyprowski, Bartlomiej Zolnierkiewicz, Tomasz Figa
In-Reply-To: <1412252528-30148-2-git-send-email-k.kozlowski@samsung.com>
On Thu, 02 Oct 2014, Krzysztof Kozlowski wrote:
> Prepare for adding support for Maxim 77693 charger by adding necessary
> new defines and structure for device tree parsed data.
>
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> ---
> include/linux/mfd/max77693-private.h | 108 +++++++++++++++++++++++++++++++++++
> include/linux/mfd/max77693.h | 8 +++
> 2 files changed, 116 insertions(+)
Acked-by: Lee Jones <lee.jones@linaro.org>
> diff --git a/include/linux/mfd/max77693-private.h b/include/linux/mfd/max77693-private.h
> index fc17d56581b2..e1b2b61285b9 100644
> --- a/include/linux/mfd/max77693-private.h
> +++ b/include/linux/mfd/max77693-private.h
> @@ -144,10 +144,118 @@ enum max77693_pmic_reg {
> #define FLASH_INT_FLED1_SHORT BIT(3)
> #define FLASH_INT_OVER_CURRENT BIT(4)
>
> +/* Fast charge timer in hours */
> +#define DEFAULT_FAST_CHARGE_TIMER 4
> +/* microamps */
> +#define DEFAULT_TOP_OFF_THRESHOLD_CURRENT 150000
> +/* minutes */
> +#define DEFAULT_TOP_OFF_TIMER 30
> +/* microvolts */
> +#define DEFAULT_CONSTANT_VOLT 4200000
> +/* microvolts */
> +#define DEFAULT_MIN_SYSTEM_VOLT 3600000
> +/* celsius */
> +#define DEFAULT_THERMAL_REGULATION_TEMP 100
> +/* microamps */
> +#define DEFAULT_BATTERY_OVERCURRENT 3500000
> +/* microvolts */
> +#define DEFAULT_CHARGER_INPUT_THRESHOLD_VOLT 4300000
> +
> +/* MAX77693_CHG_REG_CHG_INT_OK register */
> +#define CHG_INT_OK_BYP_SHIFT 0
> +#define CHG_INT_OK_BAT_SHIFT 3
> +#define CHG_INT_OK_CHG_SHIFT 4
> +#define CHG_INT_OK_CHGIN_SHIFT 6
> +#define CHG_INT_OK_DETBAT_SHIFT 7
> +#define CHG_INT_OK_BYP_MASK BIT(CHG_INT_OK_BYP_SHIFT)
> +#define CHG_INT_OK_BAT_MASK BIT(CHG_INT_OK_BAT_SHIFT)
> +#define CHG_INT_OK_CHG_MASK BIT(CHG_INT_OK_CHG_SHIFT)
> +#define CHG_INT_OK_CHGIN_MASK BIT(CHG_INT_OK_CHGIN_SHIFT)
> +#define CHG_INT_OK_DETBAT_MASK BIT(CHG_INT_OK_DETBAT_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_DETAILS_00 register */
> +#define CHG_DETAILS_00_CHGIN_SHIFT 5
> +#define CHG_DETAILS_00_CHGIN_MASK (0x3 << CHG_DETAILS_00_CHGIN_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_DETAILS_01 register */
> +#define CHG_DETAILS_01_CHG_SHIFT 0
> +#define CHG_DETAILS_01_BAT_SHIFT 4
> +#define CHG_DETAILS_01_TREG_SHIFT 7
> +#define CHG_DETAILS_01_CHG_MASK (0xf << CHG_DETAILS_01_CHG_SHIFT)
> +#define CHG_DETAILS_01_BAT_MASK (0x7 << CHG_DETAILS_01_BAT_SHIFT)
> +#define CHG_DETAILS_01_TREG_MASK BIT(7)
> +
> +/* MAX77693_CHG_REG_CHG_DETAILS_01/CHG field */
> +enum max77693_charger_charging_state {
> + MAX77693_CHARGING_PREQUALIFICATION = 0x0,
> + MAX77693_CHARGING_FAST_CONST_CURRENT,
> + MAX77693_CHARGING_FAST_CONST_VOLTAGE,
> + MAX77693_CHARGING_TOP_OFF,
> + MAX77693_CHARGING_DONE,
> + MAX77693_CHARGING_HIGH_TEMP,
> + MAX77693_CHARGING_TIMER_EXPIRED,
> + MAX77693_CHARGING_THERMISTOR_SUSPEND,
> + MAX77693_CHARGING_OFF,
> + MAX77693_CHARGING_RESERVED,
> + MAX77693_CHARGING_OVER_TEMP,
> + MAX77693_CHARGING_WATCHDOG_EXPIRED,
> +};
> +
> +/* MAX77693_CHG_REG_CHG_DETAILS_01/BAT field */
> +enum max77693_charger_battery_state {
> + MAX77693_BATTERY_NOBAT = 0x0,
> + /* Dead-battery or low-battery prequalification */
> + MAX77693_BATTERY_PREQUALIFICATION,
> + MAX77693_BATTERY_TIMER_EXPIRED,
> + MAX77693_BATTERY_GOOD,
> + MAX77693_BATTERY_LOWVOLTAGE,
> + MAX77693_BATTERY_OVERVOLTAGE,
> + MAX77693_BATTERY_OVERCURRENT,
> + MAX77693_BATTERY_RESERVED,
> +};
> +
> +/* MAX77693_CHG_REG_CHG_DETAILS_02 register */
> +#define CHG_DETAILS_02_BYP_SHIFT 0
> +#define CHG_DETAILS_02_BYP_MASK (0xf << CHG_DETAILS_02_BYP_SHIFT)
> +
> /* MAX77693 CHG_CNFG_00 register */
> #define CHG_CNFG_00_CHG_MASK 0x1
> #define CHG_CNFG_00_BUCK_MASK 0x4
>
> +/* MAX77693_CHG_REG_CHG_CNFG_01 register */
> +#define CHG_CNFG_01_FCHGTIME_SHIFT 0
> +#define CHG_CNFG_01_CHGRSTRT_SHIFT 4
> +#define CHG_CNFG_01_PQEN_SHIFT 7
> +#define CHG_CNFG_01_FCHGTIME_MASK (0x7 << CHG_CNFG_01_FCHGTIME_SHIFT)
> +#define CHG_CNFG_01_CHGRSTRT_MASK (0x3 << CHG_CNFG_01_CHGRSTRT_SHIFT)
> +#define CHG_CNFG_01_PQEN_MAKS BIT(CHG_CNFG_01_PQEN_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_CNFG_03 register */
> +#define CHG_CNFG_03_TOITH_SHIFT 0
> +#define CHG_CNFG_03_TOTIME_SHIFT 3
> +#define CHG_CNFG_03_TOITH_MASK (0x7 << CHG_CNFG_03_TOITH_SHIFT)
> +#define CHG_CNFG_03_TOTIME_MASK (0x7 << CHG_CNFG_03_TOTIME_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_CNFG_04 register */
> +#define CHG_CNFG_04_CHGCVPRM_SHIFT 0
> +#define CHG_CNFG_04_MINVSYS_SHIFT 5
> +#define CHG_CNFG_04_CHGCVPRM_MASK (0x1f << CHG_CNFG_04_CHGCVPRM_SHIFT)
> +#define CHG_CNFG_04_MINVSYS_MASK (0x7 << CHG_CNFG_04_MINVSYS_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_CNFG_06 register */
> +#define CHG_CNFG_06_CHGPROT_SHIFT 2
> +#define CHG_CNFG_06_CHGPROT_MASK (0x3 << CHG_CNFG_06_CHGPROT_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_CNFG_07 register */
> +#define CHG_CNFG_07_REGTEMP_SHIFT 5
> +#define CHG_CNFG_07_REGTEMP_MASK (0x3 << CHG_CNFG_07_REGTEMP_SHIFT)
> +
> +/* MAX77693_CHG_REG_CHG_CNFG_12 register */
> +#define CHG_CNFG_12_B2SOVRC_SHIFT 0
> +#define CHG_CNFG_12_VCHGINREG_SHIFT 3
> +#define CHG_CNFG_12_B2SOVRC_MASK (0x7 << CHG_CNFG_12_B2SOVRC_SHIFT)
> +#define CHG_CNFG_12_VCHGINREG_MASK (0x3 << CHG_CNFG_12_VCHGINREG_SHIFT)
> +
> /* MAX77693 CHG_CNFG_09 Register */
> #define CHG_CNFG_09_CHGIN_ILIM_MASK 0x7F
>
> diff --git a/include/linux/mfd/max77693.h b/include/linux/mfd/max77693.h
> index f0b6585cd874..88ef24b28294 100644
> --- a/include/linux/mfd/max77693.h
> +++ b/include/linux/mfd/max77693.h
> @@ -63,6 +63,14 @@ struct max77693_muic_platform_data {
> int path_uart;
> };
>
> +struct max77693_charger_platform_data {
> + u32 constant_volt;
> + u32 min_system_volt;
> + u32 thermal_regulation_temp;
> + u32 batttery_overcurrent;
> + u32 charge_input_threshold_volt;
> +};
> +
> /* MAX77693 led flash */
>
> /* triggers */
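As an illustration of how a charger driver might consume these definitions, the sketch below decodes a raw MAX77693_CHG_REG_CHG_DETAILS_01 value into its CHG, BAT and TREG fields using the shift/mask pairs from the patch. The masks and enums are copied from the diff (trimmed to the values used); BIT() is redefined locally because this is userland C, the TREG mask is written in terms of its shift (it equals the patch's BIT(7)), and the decode helper and its struct are hypothetical, not the eventual driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(nr) (1UL << (nr))

/* MAX77693_CHG_REG_CHG_DETAILS_01 register (from the patch) */
#define CHG_DETAILS_01_CHG_SHIFT	0
#define CHG_DETAILS_01_BAT_SHIFT	4
#define CHG_DETAILS_01_TREG_SHIFT	7
#define CHG_DETAILS_01_CHG_MASK		(0xf << CHG_DETAILS_01_CHG_SHIFT)
#define CHG_DETAILS_01_BAT_MASK		(0x7 << CHG_DETAILS_01_BAT_SHIFT)
#define CHG_DETAILS_01_TREG_MASK	BIT(CHG_DETAILS_01_TREG_SHIFT)

enum max77693_charger_charging_state {
	MAX77693_CHARGING_PREQUALIFICATION = 0x0,
	MAX77693_CHARGING_FAST_CONST_CURRENT,
	MAX77693_CHARGING_FAST_CONST_VOLTAGE,
	MAX77693_CHARGING_TOP_OFF,
	MAX77693_CHARGING_DONE,
	/* remaining states omitted */
};

enum max77693_charger_battery_state {
	MAX77693_BATTERY_NOBAT = 0x0,
	MAX77693_BATTERY_PREQUALIFICATION,
	MAX77693_BATTERY_TIMER_EXPIRED,
	MAX77693_BATTERY_GOOD,
	/* remaining states omitted */
};

/* Hypothetical decoded view of CHG_DETAILS_01. */
struct chg_details_01 {
	unsigned int chg;		/* enum max77693_charger_charging_state */
	unsigned int bat;		/* enum max77693_charger_battery_state */
	bool thermal_regulation;	/* TREG bit */
};

static struct chg_details_01 decode_chg_details_01(uint8_t reg)
{
	struct chg_details_01 d = {
		.chg = (reg & CHG_DETAILS_01_CHG_MASK) >> CHG_DETAILS_01_CHG_SHIFT,
		.bat = (reg & CHG_DETAILS_01_BAT_MASK) >> CHG_DETAILS_01_BAT_SHIFT,
		.thermal_regulation = reg & CHG_DETAILS_01_TREG_MASK,
	};
	return d;
}
```

For example, a raw value of 0x34 decodes to CHG = 4 (MAX77693_CHARGING_DONE) and BAT = 3 (MAX77693_BATTERY_GOOD) with thermal regulation inactive, while 0xB1 decodes to CHG = 1 (MAX77693_CHARGING_FAST_CONST_CURRENT), BAT = 3 and TREG set.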
--
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
* Re: [PATCH 07/17] mm: madvise MADV_USERFAULT: prepare vm_flags to allow more than 32bits
From: Kirill A. Shutemov @ 2014-10-07 9:03 UTC
To: Andrea Arcangeli
Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api,
Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
Sasha Levin, Hugh Dickins, Peter Feiner,
"Dr. David Alan Gilbert", Christopher Covington,
Johannes Weiner, Android Kernel Team, Robert Love,
Dmitry Adamushko, Neil Brown, Mike Hommey, Taras Glek
In-Reply-To: <1412356087-16115-8-git-send-email-aarcange@redhat.com>
On Fri, Oct 03, 2014 at 07:07:57PM +0200, Andrea Arcangeli wrote:
> We ran out of 32 bits in vm_flags; this is a no-op change for 64-bit archs.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> fs/proc/task_mmu.c | 4 ++--
> include/linux/huge_mm.h | 4 ++--
> include/linux/ksm.h | 4 ++--
> include/linux/mm_types.h | 2 +-
> mm/huge_memory.c | 2 +-
> mm/ksm.c | 2 +-
> mm/madvise.c | 2 +-
> mm/mremap.c | 2 +-
> 8 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index c341568..ee1c3a2 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -532,11 +532,11 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> /*
> * Don't forget to update Documentation/ on changes.
> */
> - static const char mnemonics[BITS_PER_LONG][2] = {
> + static const char mnemonics[BITS_PER_LONG+1][2] = {
I believe both here and below it should be BITS_PER_LONG_LONG instead: that
will catch unknown vm_flags. And the +1 is not needed on 64-bit systems.
> /*
> * In case if we meet a flag we don't know about.
> */
> - [0 ... (BITS_PER_LONG-1)] = "??",
> + [0 ... (BITS_PER_LONG)] = "??",
>
> [ilog2(VM_READ)] = "rd",
> [ilog2(VM_WRITE)] = "wr",
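A userland sketch of the sizing issue raised above: once vm_flags may carry bits above 31, a mnemonic table indexed by bit number must cover every bit of a 64-bit word, otherwise a high flag would index past the end of a BITS_PER_LONG-sized array on 32-bit builds. Here the table is sized for 64 bits (standing in for BITS_PER_LONG_LONG), and bits without a name print as "??", mirroring the kernel's default entry. The EX_VM_* flags, including the bit-32 flag, are illustrative assumptions rather than the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FLAG_BITS 64 /* stands in for BITS_PER_LONG_LONG */

/* Illustrative flags; EX_VM_HIGH is a hypothetical flag above bit 31. */
#define EX_VM_READ  (1ULL << 0)
#define EX_VM_WRITE (1ULL << 1)
#define EX_VM_EXEC  (1ULL << 2)
#define EX_VM_HIGH  (1ULL << 32)

/* One slot per possible bit; unnamed bits stay NULL and print as "??". */
static const char *const mnemonics[FLAG_BITS] = {
	[0]  = "rd",
	[1]  = "wr",
	[2]  = "ex",
	[32] = "hi",
};

/* Append "xx " for every set bit in flags; output is always NUL-terminated. */
static void format_vm_flags(uint64_t flags, char *out, size_t len)
{
	size_t pos = 0;

	if (len == 0)
		return;
	out[0] = '\0';
	for (unsigned int bit = 0; bit < FLAG_BITS; bit++) {
		if (!(flags & (1ULL << bit)))
			continue;
		const char *name = mnemonics[bit] ? mnemonics[bit] : "??";
		int n = snprintf(out + pos, len - pos, "%s ", name);
		if (n < 0 || (size_t)n >= len - pos)
			break; /* truncated: stop rather than overrun */
		pos += (size_t)n;
	}
}
```

For instance, EX_VM_READ | EX_VM_WRITE formats as "rd wr ", EX_VM_HIGH as "hi ", and an unnamed bit such as bit 40 as "?? ". Sizing the table by the width of the flag word rather than by BITS_PER_LONG is what keeps the lookup in bounds on every architecture.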
--
Kirill A. Shutemov
* Re: [PATCH v9 1/3] drm: rockchip: Add basic drm driver
From: Andrzej Hajda @ 2014-10-07 6:13 UTC
To: Mark yao, heiko, Boris BREZILLON, David Airlie, Rob Clark,
Daniel Vetter, Rob Herring, Pawel Moll, Mark Rutland,
Ian Campbell, Kumar Gala, Randy Dunlap, Grant Likely,
Greg Kroah-Hartman, John Stultz, Rom Lemarchand
Cc: devicetree, linux-doc, linux-kernel, dri-devel, linux-api,
linux-rockchip, dianders, marcheu, dbehr, olof, djkurtz, xjq, kfx,
cym, cf, zyw, xxm, huangtao, kever.yang, yxj, wxt, xw
In-Reply-To: <54336474.1010503@rock-chips.com>
On 10/07/2014 05:56 AM, Mark yao wrote:
> On 2014-09-30 21:31, Andrzej Hajda wrote:
>> Hi Mark,
> Hi Andrzej,
> Sorry for the late reply; I was on vacation.
> Thanks for your review.
>> On 09/30/2014 03:03 PM, Mark Yao wrote:
>>> From: Mark yao <mark.yao@rock-chips.com>
>>>
(...)
>>> +#ifdef CONFIG_PM_SLEEP
>>> +static int rockchip_drm_suspend(struct drm_device *dev, pm_message_t state)
>>> +{
>>> + struct drm_connector *connector;
>>> +
>>> + drm_modeset_lock_all(dev);
>>> + list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>>> + int old_dpms = connector->dpms;
>>> +
>>> + if (connector->funcs->dpms)
>>> + connector->funcs->dpms(connector, DRM_MODE_DPMS_OFF);
>>> +
>>> + /* Set the old mode back to the connector for resume */
>>> + connector->dpms = old_dpms;
>>> + }
>>> + drm_modeset_unlock_all(dev);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int rockchip_drm_resume(struct drm_device *dev)
>>> +{
>>> + struct drm_connector *connector;
>>> +
>>> + drm_modeset_lock_all(dev);
>>> + list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>>> + if (connector->funcs->dpms)
>>> + connector->funcs->dpms(connector, connector->dpms);
>>> + }
>>> + drm_modeset_unlock_all(dev);
>>> +
>>> + drm_helper_resume_force_mode(dev);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int rockchip_drm_sys_suspend(struct device *dev)
>>> +{
>>> + struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> + pm_message_t message;
>>> +
>>> + if (pm_runtime_suspended(dev))
>>> + return 0;
>>> +
>>> + message.event = PM_EVENT_SUSPEND;
>>> +
>>> + return rockchip_drm_suspend(drm_dev, message);
>> drm_dev can be NULL here; this can happen when the system is suspended
>> before all components are bound. It can also contain an invalid pointer
>> if de-initialization happens for some reason after successful drm
>> initialization.
>>
>> One workaround is to check for NULL here and set drvdata to NULL on
>> master unbind. But I guess it should be protected somehow to avoid races
>> when accessing drvdata.
> So, can I check for NULL here and set drvdata to NULL
> on master unbind?
> I don't know which way of protecting it is better.
It seems to be a core problem. I have proposed a solution using DRM
driver PM callbacks [1], but it appears these callbacks are obsolete,
so a different solution should be found. According to Russell, probably
some extension of the component framework.
As a temporary solution, I guess NULL checks should work in most cases.
Regards
Andrzej
[1]: https://lkml.org/lkml/2014/10/3/60
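A userland model of the temporary workaround suggested above: the system-sleep callback returns early when no DRM device is bound, and the master unbind path clears the driver data so that a later suspend cannot use a stale pointer. The `fake_*` and `model_*` names are hypothetical stand-ins for struct device, dev_get_drvdata()/dev_set_drvdata() and the Rockchip callbacks; the pm_runtime_suspended() check is omitted, and, as noted in the thread, real code would also need to serialize these paths against each other, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct drm_device and struct device. */
struct fake_drm_device {
	int suspend_calls;
};

struct fake_device {
	void *drvdata;
};

static void *fake_get_drvdata(struct fake_device *dev)
{
	return dev->drvdata;
}

static void fake_set_drvdata(struct fake_device *dev, void *data)
{
	dev->drvdata = data;
}

/* Models rockchip_drm_suspend(); the real one powers connectors off. */
static int model_drm_suspend(struct fake_drm_device *drm)
{
	drm->suspend_calls++;
	return 0;
}

/* Models rockchip_drm_sys_suspend() with the proposed NULL check. */
static int model_sys_suspend(struct fake_device *dev)
{
	struct fake_drm_device *drm = fake_get_drvdata(dev);

	if (!drm)	/* components not bound yet, or master already unbound */
		return 0;
	return model_drm_suspend(drm);
}

/* Models the master bind/unbind paths publishing and clearing drvdata. */
static void model_master_bind(struct fake_device *dev,
			      struct fake_drm_device *drm)
{
	fake_set_drvdata(dev, drm);
}

static void model_master_unbind(struct fake_device *dev)
{
	fake_set_drvdata(dev, NULL);
}
```

Suspend before bind and after unbind becomes a no-op, and only a bound device reaches the real suspend path; the same check applies symmetrically to rockchip_drm_sys_resume().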
>
> -Mark.
>>> +}
>>> +
>>> +static int rockchip_drm_sys_resume(struct device *dev)
>>> +{
>>> + struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> +
>>> + if (!pm_runtime_suspended(dev))
>>> + return 0;
>>> +
>>> + return rockchip_drm_resume(drm_dev);
>> Ditto.
>>
>> Regards
>> Andrzej
>>
>>
* Re: [PATCH v9 1/3] drm: rockchip: Add basic drm driver
From: Mark yao @ 2014-10-07 3:56 UTC
To: Andrzej Hajda, heiko, Boris BREZILLON, David Airlie, Rob Clark,
Daniel Vetter, Rob Herring, Pawel Moll, Mark Rutland,
Ian Campbell, Kumar Gala, Randy Dunlap, Grant Likely,
Greg Kroah-Hartman, John Stultz, Rom Lemarchand
Cc: linux-doc, kever.yang, dri-devel, dianders, xjq, zyw, cym,
linux-rockchip, kfx, wxt, huangtao, devicetree, yxj, marcheu, xxm,
xw, linux-api, linux-kernel, cf
In-Reply-To: <542AB0B2.6000207@samsung.com>
On 2014-09-30 21:31, Andrzej Hajda wrote:
> Hi Mark,
Hi Andrzej,
Sorry for the late reply; I was on vacation.
Thanks for your review.
> On 09/30/2014 03:03 PM, Mark Yao wrote:
>> From: Mark yao <mark.yao@rock-chips.com>
>>
>> This patch adds the basic structure of a DRM Driver for Rockchip Socs.
>>
>> Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
>> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
>> Acked-by: Daniel Vetter <daniel@ffwll.ch>
>> Reviewed-by: Rob Clark <robdclark@gmail.com>
>> ---
>> Changes in v2:
>> - use the component framework to defer main drm driver probe
>> until all VOP devices have been probed.
>> - use dma-mapping API with ARM_DMA_USE_IOMMU, create the dma mapping in the
>> master device so that each vop device can share the drm dma mapping.
>> - use drm_crtc_init_with_planes and drm_universal_plane_init.
>> - remove unnecessary middle layers.
>> - add cursor set, move funcs to rockchip drm crtc.
>> - use vop reset at first init
>> - reference framebuffer when used and unreference when swap out vop
>>
>> Changes in v3:
>> - change "crtc->fb" to "crtc->primary->fb"
>> Advised by Daniel Vetter
>> - init cursor plane with universal api, remove unnecessary cursor set/move
>>
>> Changes in v4:
>> Advised by David Herrmann
>> - remove drm_platform_*() usage, register the drm device directly.
>> Advised by Rob Clark
>> - remove special mmap ioctl, do userspace mmap with normal mmap() or mmap offset
>>
>> Changes in v5:
>> Advised by Arnd Bergmann
>> - use 32-bit DMA masks for dma_mask and coherent_dma_mask
>> - fix some incorrect dependencies.
>> Advised by Boris BREZILLON
>> - fix some mistakes and bugs.
>> Advised by Daniel Vetter
>> - drop all special ioctls and use generic kms ioctls instead.
>> Advised by Rob Clark
>> - use unlocked api for drm_fb_helper_restore_fbdev_mode.
>> - remove unused rockchip_gem_prime_import_sg_table.
>>
>> Changes in v6:
>> - align gem buffer pitch to 64 bytes, as needed by the Mali GPU.
>> Advised by Daniel Kurtz
>> - fix some mistakes and bugs, remove unused defines, improve code style etc.
>> - use clk_prepare()/unprepare() at probe()/remove() and clk_enable()/disable()
>> at runtime instead of clk_prepare_enable().
>> - provide a helper function from vop for the encoder to do mode config, instead
>> of using the drm_display_mode private method.
>> - change vop mode_set timing to make it safer.
>>
>> Changes in v7:
>> - fix memory leakage problem
>>
>> Changes in v8:
>> - fix iommu crash when using dual crtcs.
>> - use frame start interrupt for vsync instead of line flag interrupt,
>> because the win config takes effect at frame start time; with the line flag
>> interrupt, the address check often failed.
>> Advised by Daniel Kurtz
>> - fix some bugs and mistakes, remove unused functions
>> - keep clock and vop disabled when probe ends
>> - use drm_plane_helper_check_update to check whether update_plane is valid
>>
>> Changes in v9:
>> - fix suspend and resume bug, make iommu attach and detach safe.
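The v6 change aligns the GEM buffer pitch to 64 bytes for the Mali GPU. A small sketch of that arithmetic, assuming the pitch is width times bytes per pixel rounded up to the next multiple of 64 (a power of two, so a mask works, as with the kernel's ALIGN() macro). The helper and macro names are hypothetical; the driver's actual computation is not shown in this excerpt.

```c
#include <stdint.h>

/* Round x up to a multiple of a, where a must be a power of two. */
#define ALIGN_POW2(x, a) (((x) + ((a) - 1)) & ~((uint32_t)(a) - 1))

/* Hypothetical helper: byte pitch of one scanline, padded to 64 bytes. */
static uint32_t gem_pitch(uint32_t width, uint32_t bpp)
{
	uint32_t min_pitch = width * ((bpp + 7) / 8);

	return ALIGN_POW2(min_pitch, 64);
}
```

For a 1366-pixel-wide 32 bpp scanline, 1366 × 4 = 5464 bytes, which rounds up to 5504 (86 × 64); a 1920-pixel line at 32 bpp is already 7680 bytes (120 × 64) and needs no padding.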
>>
>> drivers/gpu/drm/Kconfig | 2 +
>> drivers/gpu/drm/Makefile | 1 +
>> drivers/gpu/drm/rockchip/Kconfig | 17 +
>> drivers/gpu/drm/rockchip/Makefile | 8 +
>> drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 509 +++++++++
>> drivers/gpu/drm/rockchip/rockchip_drm_drv.h | 65 ++
>> drivers/gpu/drm/rockchip/rockchip_drm_fb.c | 200 ++++
>> drivers/gpu/drm/rockchip/rockchip_drm_fb.h | 28 +
>> drivers/gpu/drm/rockchip/rockchip_drm_fbdev.c | 209 ++++
>> drivers/gpu/drm/rockchip/rockchip_drm_fbdev.h | 20 +
>> drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 294 +++++
>> drivers/gpu/drm/rockchip/rockchip_drm_gem.h | 54 +
>> drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 1423 +++++++++++++++++++++++++
>> drivers/gpu/drm/rockchip/rockchip_drm_vop.h | 196 ++++
>> 14 files changed, 3026 insertions(+)
>> create mode 100644 drivers/gpu/drm/rockchip/Kconfig
>> create mode 100644 drivers/gpu/drm/rockchip/Makefile
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_drv.c
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_drv.h
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_fb.c
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_fb.h
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_fbdev.c
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_fbdev.h
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_gem.c
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_gem.h
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_vop.c
>> create mode 100644 drivers/gpu/drm/rockchip/rockchip_drm_vop.h
>>
>> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index b066bb3..7c4c3c6 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -171,6 +171,8 @@ config DRM_SAVAGE
>>
>> source "drivers/gpu/drm/exynos/Kconfig"
>>
>> +source "drivers/gpu/drm/rockchip/Kconfig"
>> +
>> source "drivers/gpu/drm/vmwgfx/Kconfig"
>>
>> source "drivers/gpu/drm/gma500/Kconfig"
>> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
>> index 4a55d59..d03387a 100644
>> --- a/drivers/gpu/drm/Makefile
>> +++ b/drivers/gpu/drm/Makefile
>> @@ -52,6 +52,7 @@ obj-$(CONFIG_DRM_VMWGFX)+= vmwgfx/
>> obj-$(CONFIG_DRM_VIA) +=via/
>> obj-$(CONFIG_DRM_NOUVEAU) +=nouveau/
>> obj-$(CONFIG_DRM_EXYNOS) +=exynos/
>> +obj-$(CONFIG_DRM_ROCKCHIP) +=rockchip/
>> obj-$(CONFIG_DRM_GMA500) += gma500/
>> obj-$(CONFIG_DRM_UDL) += udl/
>> obj-$(CONFIG_DRM_AST) += ast/
>> diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
>> new file mode 100644
>> index 0000000..87255f7
>> --- /dev/null
>> +++ b/drivers/gpu/drm/rockchip/Kconfig
>> @@ -0,0 +1,17 @@
>> +config DRM_ROCKCHIP
>> + tristate "DRM Support for Rockchip"
>> + depends on DRM && ROCKCHIP_IOMMU && ARM_DMA_USE_IOMMU && IOMMU_API
>> + select DRM_KMS_HELPER
>> + select DRM_KMS_FB_HELPER
>> + select DRM_PANEL
>> + select FB_CFB_FILLRECT
>> + select FB_CFB_COPYAREA
>> + select FB_CFB_IMAGEBLIT
>> + select VT_HW_CONSOLE_BINDING if FRAMEBUFFER_CONSOLE
>> + select VIDEOMODE_HELPERS
>> + help
>> + Choose this option if you have a Rockchip soc chipset.
>> + This driver provides kernel mode setting and buffer
>> + management to userspace. This driver does not provide
>> + 2D or 3D acceleration; acceleration is performed by other
>> + IP found on the SoC.
>> diff --git a/drivers/gpu/drm/rockchip/Makefile b/drivers/gpu/drm/rockchip/Makefile
>> new file mode 100644
>> index 0000000..b3a5193
>> --- /dev/null
>> +++ b/drivers/gpu/drm/rockchip/Makefile
>> @@ -0,0 +1,8 @@
>> +#
>> +# Makefile for the drm device driver. This driver provides support for the
>> +# Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
>> +
>> +rockchipdrm-y := rockchip_drm_drv.o rockchip_drm_fb.o rockchip_drm_fbdev.o \
>> + rockchip_drm_gem.o rockchip_drm_vop.o
>> +
>> +obj-$(CONFIG_DRM_ROCKCHIP) += rockchipdrm.o
>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
>> new file mode 100644
>> index 0000000..879b2e0
>> --- /dev/null
>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
>> @@ -0,0 +1,509 @@
>> +/*
>> + * Copyright (C) Fuzhou Rockchip Electronics Co.Ltd
>> + * Author:Mark Yao <mark.yao@rock-chips.com>
>> + *
>> + * based on exynos_drm_drv.c
>> + *
>> + * This software is licensed under the terms of the GNU General Public
>> + * License version 2, as published by the Free Software Foundation, and
>> + * may be copied, distributed, and modified under those terms.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + */
>> +
>> +#include <asm/dma-iommu.h>
>> +
>> +#include <drm/drmP.h>
>> +#include <drm/drm_crtc_helper.h>
>> +#include <drm/drm_fb_helper.h>
>> +#include <linux/dma-mapping.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/of_graph.h>
>> +#include <linux/component.h>
>> +
>> +#include "rockchip_drm_drv.h"
>> +#include "rockchip_drm_fb.h"
>> +#include "rockchip_drm_fbdev.h"
>> +#include "rockchip_drm_gem.h"
>> +
>> +#define DRIVER_NAME "rockchip"
>> +#define DRIVER_DESC "RockChip Soc DRM"
>> +#define DRIVER_DATE "20140818"
>> +#define DRIVER_MAJOR 1
>> +#define DRIVER_MINOR 0
>> +
>> +/*
>> + * Attach a (component) device to the shared drm dma mapping from master drm
>> + * device. This is used by the VOPs to map GEM buffers to a common DMA
>> + * mapping.
>> + */
>> +int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
>> + struct device *dev)
>> +{
>> + struct dma_iommu_mapping *mapping = drm_dev->dev->archdata.mapping;
>> + int ret;
>> +
>> + ret = dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
>> + if (ret)
>> + return ret;
>> +
>> + dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
>> +
>> + return arm_iommu_attach_device(dev, mapping);
>> +}
>> +
>> +void rockchip_drm_dma_detach_device(struct drm_device *drm_dev,
>> + struct device *dev)
>> +{
>> + arm_iommu_detach_device(dev);
>> +}
>> +
>> +static int rockchip_drm_load(struct drm_device *drm_dev, unsigned long flags)
>> +{
>> + struct rockchip_drm_private *private;
>> + struct dma_iommu_mapping *mapping;
>> + struct device *dev = drm_dev->dev;
>> + int ret;
>> +
>> + private = devm_kzalloc(drm_dev->dev, sizeof(*private), GFP_KERNEL);
>> + if (!private)
>> + return -ENOMEM;
>> +
>> + drm_dev->dev_private = private;
>> +
>> + drm_mode_config_init(drm_dev);
>> +
>> + rockchip_drm_mode_config_init(drm_dev);
>> +
>> + dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms),
>> + GFP_KERNEL);
>> + if (!dev->dma_parms) {
>> + ret = -ENOMEM;
>> + goto err_config_cleanup;
>> + }
>> +
>> + /* TODO(djkurtz): fetch the mapping start/size from somewhere */
>> + mapping = arm_iommu_create_mapping(&platform_bus_type, 0x00000000,
>> + SZ_2G);
>> + if (IS_ERR(mapping)) {
>> + ret = PTR_ERR(mapping);
>> + goto err_config_cleanup;
>> + }
>> +
>> + ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>> + if (ret)
>> + goto err_release_mapping;
>> +
>> + dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
>> +
>> + ret = arm_iommu_attach_device(dev, mapping);
>> + if (ret)
>> + goto err_release_mapping;
>> +
>> + /* Try to bind all sub drivers. */
>> + ret = component_bind_all(dev, drm_dev);
>> + if (ret)
>> + goto err_detach_device;
>> +
>> + /* init kms poll for handling hpd */
>> + drm_kms_helper_poll_init(drm_dev);
>> +
>> + /*
>> + * enable drm irq mode.
>> + * - with irq_enabled = true, we can use the vblank feature.
>> + */
>> + drm_dev->irq_enabled = true;
>> +
>> + /*
>> + * with vblank_disable_allowed = true, vblank interrupt will be disabled
>> + * by drm timer once a current process gives up ownership of
>> + * vblank event.(after drm_vblank_put function is called)
>> + */
>> + drm_dev->vblank_disable_allowed = true;
>> +
>> + ret = drm_vblank_init(drm_dev, ROCKCHIP_MAX_CRTC);
>> + if (ret)
>> + goto err_kms_helper_poll_fini;
>> +
>> + rockchip_drm_fbdev_init(drm_dev);
>> +
>> + /* force connectors detection */
>> + drm_helper_hpd_irq_event(drm_dev);
>> +
>> + return 0;
>> +
>> +err_kms_helper_poll_fini:
>> + drm_kms_helper_poll_fini(drm_dev);
>> + component_unbind_all(dev, drm_dev);
>> +err_detach_device:
>> + arm_iommu_detach_device(dev);
>> +err_release_mapping:
>> + arm_iommu_release_mapping(dev->archdata.mapping);
>> +err_config_cleanup:
>> + drm_mode_config_cleanup(drm_dev);
>> + drm_dev->dev_private = NULL;
>> + return ret;
>> +}
>> +
>> +static int rockchip_drm_unload(struct drm_device *drm_dev)
>> +{
>> + struct device *dev = drm_dev->dev;
>> +
>> + drm_kms_helper_poll_fini(drm_dev);
>> + component_unbind_all(dev, drm_dev);
>> + arm_iommu_detach_device(dev);
>> + arm_iommu_release_mapping(dev->archdata.mapping);
>> + drm_mode_config_cleanup(drm_dev);
>> + drm_dev->dev_private = NULL;
>> +
>> + return 0;
>> +}
>> +
>> +void rockchip_drm_lastclose(struct drm_device *dev)
>> +{
>> + struct rockchip_drm_private *priv = dev->dev_private;
>> +
>> + drm_fb_helper_restore_fbdev_mode_unlocked(&priv->fbdev_helper);
>> +}
>> +
>> +static const struct file_operations rockchip_drm_driver_fops = {
>> + .owner = THIS_MODULE,
>> + .open = drm_open,
>> + .mmap = rockchip_gem_mmap,
>> + .poll = drm_poll,
>> + .read = drm_read,
>> + .unlocked_ioctl = drm_ioctl,
>> +#ifdef CONFIG_COMPAT
>> + .compat_ioctl = drm_compat_ioctl,
>> +#endif
>> + .release = drm_release,
>> +};
>> +
>> +const struct vm_operations_struct rockchip_drm_vm_ops = {
>> + .open = drm_gem_vm_open,
>> + .close = drm_gem_vm_close,
>> +};
>> +
>> +static struct drm_driver rockchip_drm_driver = {
>> + .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_PRIME,
>> + .load = rockchip_drm_load,
>> + .unload = rockchip_drm_unload,
>> + .lastclose = rockchip_drm_lastclose,
>> + .get_vblank_counter = drm_vblank_count,
>> + .enable_vblank = rockchip_drm_crtc_enable_vblank,
>> + .disable_vblank = rockchip_drm_crtc_disable_vblank,
>> + .gem_vm_ops = &rockchip_drm_vm_ops,
>> + .gem_free_object = rockchip_gem_free_object,
>> + .dumb_create = rockchip_gem_dumb_create,
>> + .dumb_map_offset = rockchip_gem_dumb_map_offset,
>> + .dumb_destroy = drm_gem_dumb_destroy,
>> + .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>> + .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
>> + .gem_prime_import = drm_gem_prime_import,
>> + .gem_prime_export = drm_gem_prime_export,
>> + .gem_prime_get_sg_table = rockchip_gem_prime_get_sg_table,
>> + .gem_prime_vmap = rockchip_gem_prime_vmap,
>> + .gem_prime_vunmap = rockchip_gem_prime_vunmap,
>> + .fops = &rockchip_drm_driver_fops,
>> + .name = DRIVER_NAME,
>> + .desc = DRIVER_DESC,
>> + .date = DRIVER_DATE,
>> + .major = DRIVER_MAJOR,
>> + .minor = DRIVER_MINOR,
>> +};
>> +
>> +#ifdef CONFIG_PM_SLEEP
>> +static int rockchip_drm_suspend(struct drm_device *dev, pm_message_t state)
>> +{
>> + struct drm_connector *connector;
>> +
>> + drm_modeset_lock_all(dev);
>> + list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>> + int old_dpms = connector->dpms;
>> +
>> + if (connector->funcs->dpms)
>> + connector->funcs->dpms(connector, DRM_MODE_DPMS_OFF);
>> +
>> + /* Set the old mode back to the connector for resume */
>> + connector->dpms = old_dpms;
>> + }
>> + drm_modeset_unlock_all(dev);
>> +
>> + return 0;
>> +}
>> +
>> +static int rockchip_drm_resume(struct drm_device *dev)
>> +{
>> + struct drm_connector *connector;
>> +
>> + drm_modeset_lock_all(dev);
>> + list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>> + if (connector->funcs->dpms)
>> + connector->funcs->dpms(connector, connector->dpms);
>> + }
>> + drm_modeset_unlock_all(dev);
>> +
>> + drm_helper_resume_force_mode(dev);
>> +
>> + return 0;
>> +}
>> +
>> +static int rockchip_drm_sys_suspend(struct device *dev)
>> +{
>> + struct drm_device *drm_dev = dev_get_drvdata(dev);
>> + pm_message_t message;
>> +
>> + if (pm_runtime_suspended(dev))
>> + return 0;
>> +
>> + message.event = PM_EVENT_SUSPEND;
>> +
>> + return rockchip_drm_suspend(drm_dev, message);
> drm_dev can be NULL here; this can happen when the system is suspended
> before all components are bound. It can also contain an invalid pointer
> if, after successful drm initialization, de-initialization happens for
> some reason.
>
> One workaround is to check for NULL here and set drvdata to NULL on
> master unbind. But I guess it should be protected somehow to avoid races
> in accessing drvdata.
So, can I check for NULL here and set drvdata to NULL
on master unbind?
I don't know which way of protecting it is better.
-Mark.
>
>> +}
>> +
>> +static int rockchip_drm_sys_resume(struct device *dev)
>> +{
>> + struct drm_device *drm_dev = dev_get_drvdata(dev);
>> +
>> + if (!pm_runtime_suspended(dev))
>> + return 0;
>> +
>> + return rockchip_drm_resume(drm_dev);
> Ditto.
>
> Regards
> Andrzej
>
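A minimal userspace model of the workaround discussed above, assuming a mutex is an acceptable way to serialize drvdata access (the review leaves the locking scheme open). model_master, model_drm and the model_* functions are hypothetical stand-ins for the master struct device, struct drm_device, master bind/unbind and rockchip_drm_sys_suspend(); this is not kernel code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_device. */
struct model_drm {
	int suspended;
};

/* Hypothetical stand-in for the master struct device and its drvdata. */
struct model_master {
	pthread_mutex_t lock;      /* serializes every access to drvdata */
	struct model_drm *drvdata; /* NULL until bind, cleared on unbind */
};

/* Master bind: publish drvdata only after drm initialization succeeded. */
static void model_master_bind(struct model_master *m, struct model_drm *drm)
{
	pthread_mutex_lock(&m->lock);
	m->drvdata = drm;
	pthread_mutex_unlock(&m->lock);
}

/* Master unbind: clear drvdata before the drm device goes away. */
static void model_master_unbind(struct model_master *m)
{
	pthread_mutex_lock(&m->lock);
	m->drvdata = NULL;
	pthread_mutex_unlock(&m->lock);
}

/* System suspend: nothing to do if the components are not (or no longer) bound. */
static int model_sys_suspend(struct model_master *m)
{
	pthread_mutex_lock(&m->lock);
	if (m->drvdata)
		m->drvdata->suspended = 1;
	pthread_mutex_unlock(&m->lock);
	return 0;
}
```

Because the lock is held across both the NULL test and the use, an unbind cannot clear drvdata in between. On older glibc, link with -pthread.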
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: [PATCH RFC] introduce ioctl to completely invalidate page cache
From: Dave Chinner @ 2014-10-07 1:30 UTC (permalink / raw)
To: Jan Kara
Cc: Thanos Makatos, Jens Axboe, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
jlayton@poochiereds.net, bfields@fieldses.org
In-Reply-To: <20141006143019.GG7526@quack.suse.cz>
On Mon, Oct 06, 2014 at 04:30:19PM +0200, Jan Kara wrote:
> On Mon 06-10-14 11:33:23, Thanos Makatos wrote:
> > > > Trond also had a comment that if we extended the ioctl to work for all
> > > > inodes (not just blkdev) and allowed some additional flags of what
> > > > needs to be invalidated, the new ioctl would be also useful to NFS
> > > > userspace - see Trond's email at
> > > >
> > > > http://www.spinics.net/lists/linux-fsdevel/msg78917.html
> > > >
> > > > and the following thread. I would prefer to cover that usecase when we
> > > > are introducing new invalidation ioctl. Have you considered that Thanos?
> > >
> > > Sure, though I don't really know how to do it. I'll start by looking at the code
> > > flow when someone does " echo 3 > /proc/sys/vm/drop_caches", unless you
> > > already have a rough idea how to do that.
> >
> > I realise I haven't clearly understood what the semantics of this new ioctl
> > should be.
> >
> > My initial goal was to implement an ioctl that would _completely_ invalidate
> > the buffer cache of a block device when there is no file-system involved.
> > Unless I'm mistaken the patch I posted achieves this goal.
> Yes.
>
> > We now want to extend this patch to take care of cached metadata, which seems
> > to be of particular importance for NFS, and I suspect that this piece of
> > functionality will still be applicable to any kind of file-system, correct?
> So most notably they want the ioctl to work not only for block devices
> but also for any regular file. That's easily doable - you just call
> filemap_write_and_wait() and invalidate_inode_pages2() in the ioctl handler
> for regular files.
>
> Also they wanted to be able to specify a range of a mapping to invalidate -
> that's easily doable as well. Finally they wanted a 'flags' argument so you
> can additionally ask fs to invalidate also some metadata. How invalidation
> is done will be a fs specific thing and for now I guess we don't need to go
> into details. NFS guys can sort that out when they decide to implement it.
> So in the beginning we can just have u64 flags argument and in
> it a single 'INVAL_DATA' flag meaning that invalidation of data in a given
> range is requested. Later NFS guys can add further flags.
Why do we need a new ioctl to do this? fadvise64() seems like it's
the exact fit for "FADV_INVALIDATE_[META]DATA" flags...
And before anyone shouts "posix_fadvise sucks!" note that I'm
talking about adding flags to the syscall that the kernel defines,
not the glibc posix wrapper...
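For comparison, the existing fadvise interface already takes an (offset, len) range. Here is a minimal sketch that uses only the existing POSIX_FADV_DONTNEED advice; the FADV_INVALIDATE_* flags above are only a proposal and are not used. DONTNEED is a hint that can only drop pages that are clean and not mapped, so the sketch writes dirty data back with fdatasync() first. drop_cached_range() is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700  /* posix_fadvise(), fdatasync(), pread(), mkstemp() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write back, then drop, the cached pages of fd in
 * [offset, offset + len). Returns 0 on success or an errno value.
 */
static int drop_cached_range(int fd, off_t offset, off_t len)
{
	/* POSIX_FADV_DONTNEED cannot drop dirty pages, so flush them first. */
	if (fdatasync(fd) != 0)
		return errno;
	/* posix_fadvise() returns the error number directly; errno is not set. */
	return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```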
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Andrea Arcangeli @ 2014-10-06 17:24 UTC (permalink / raw)
To: Mike Hommey
Cc: qemu-devel, kvm, linux-kernel, linux-mm, linux-api,
Linus Torvalds, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
Sasha Levin, Hugh Dickins, Peter Feiner,
"Dr. David Alan Gilbert", Christopher Covington,
Johannes Weiner, Android Kernel Team, Robert Love,
Dmitry Adamushko <dmitry.adamu>
In-Reply-To: <20141003231336.GA13528@glandium.org>
Hi,
On Sat, Oct 04, 2014 at 08:13:36AM +0900, Mike Hommey wrote:
> On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > userland touches a still unmapped virtual address, a sigbus signal is
> > sent instead of allocating a new page. The sigbus signal handler will
> > then resolve the page fault in userland by calling the
> > remap_anon_pages syscall.
>
> What does "unmapped virtual address" mean in this context?
To clarify this, I added a second sentence to the commit
header:
"still unmapped virtual address" in the previous sentence means, in
this context, that the pte/trans_huge_pmd is null. It means it's a
hole inside the anonymous vma (the kind of hole that doesn't count
toward RSS, only toward the virtual size of the process). It is the
state all anonymous virtual memory is in right after mmap: the state
in which a read maps the zero page at the faulting virtual address.
If the page is swapped out, it will not trigger userfaults.
If something isn't clear, let me know.
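The "hole" state can be observed from userspace with standard calls only; MADV_USERFAULT itself is the flag proposed by this patchset and is not used here. A minimal sketch, assuming Linux and glibc: it maps a fresh anonymous page, asks mincore() whether it is resident, reads it, and asks again. page_resident() and read_from_hole() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS, mincore() */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page containing addr is resident, 0 if not, -1 on error. */
static int page_resident(void *addr)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char vec;
	/* mincore() requires a page-aligned start address. */
	void *page = (void *)((uintptr_t)addr & ~(uintptr_t)(pagesz - 1));

	if (mincore(page, (size_t)pagesz, &vec) != 0)
		return -1;
	return vec & 1;
}

/* Read one byte from a freshly mmap'd anonymous page, i.e. from a hole. */
static int read_from_hole(unsigned char *out, int *before, int *after)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p = mmap(NULL, (size_t)pagesz,
					 PROT_READ | PROT_WRITE,
					 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	*before = page_resident((void *)p); /* no pte yet */
	*out = p[0];                        /* read fault on the hole */
	*after = page_resident((void *)p);
	munmap((void *)p, (size_t)pagesz);
	return 0;
}
```

The read always returns 0. On Linux one would expect mincore() to report 0 before the read (the pte is none) and 1 after it (the zero page is now mapped), but the test below does not rely on that.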
Thanks,
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 12/17] mm: sys_remap_anon_pages
From: Andrea Arcangeli @ 2014-10-06 17:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, linux-mm, linux-api, Linus Torvalds
In-Reply-To: <87iok0q8p4.fsf@tassilo.jf.intel.com>
Hi,
On Sat, Oct 04, 2014 at 06:13:27AM -0700, Andi Kleen wrote:
> Andrea Arcangeli <aarcange@redhat.com> writes:
>
> > This new syscall will move anon pages across vmas, atomically and
> > without touching the vmas.
> >
> > It only works on non shared anonymous pages because those can be
> > relocated without generating non linear anon_vmas in the rmap code.
>
> ...
>
> > It is an alternative to mremap.
>
> Why a new syscall? Couldn't mremap do this transparently?
The difference between remap_anon_pages and mremap is that mremap
fundamentally moves vmas, not pages (the pages move too only because
they remain attached to their respective vmas), while
remap_anon_pages moves anonymous pages zerocopy across vmas and
never touches any vma.
mremap, for example, would also nuke the source vma; remap_anon_pages
instead just moves the pages within the existing vmas, so it doesn't
need to allocate new vmas in the area that receives the data.
We could certainly change mremap to try to detect when the page_mapcount
of an anonymous page is 1, downgrade the mmap_sem to down_read, and then
behave like remap_anon_pages internally by updating page->index if
all pages in the range can be updated. However, to provide the same
strict checks that remap_anon_pages does and to leave the source vma
intact, mremap would need new flags altering the normal mremap
semantics (which silently wipe out the destination range and get rid
of the source range), and it would need to run a
remap_anon_pages-detection routine that isn't zero cost.
Unless we add even more flags to mremap, we wouldn't have the absolute
guarantee that the vma tree is not altered in case userland doesn't
do everything right (for example, if userland forgot MADV_DONTFORK).
Separating the two looked better: mremap was never meant to be
efficient at moving one page at a time (or one THP at a time).
Embedding remap_anon_pages inside mremap didn't look worthwhile,
considering that as a result mremap would run slower when it cannot
behave like remap_anon_pages, and it would also run slower than
remap_anon_pages when it could.
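To illustrate the vma-moving semantics described above, here is a minimal sketch with plain mremap(2) (Linux-specific, needs _GNU_SOURCE). With MREMAP_MAYMOVE | MREMAP_FIXED, the old mapping at dst is silently replaced and the source range becomes unmapped, which is the behavior remap_anon_pages avoids. move_page_with_mremap() and page_unmapped() are hypothetical helper names.

```c
#define _GNU_SOURCE  /* mremap(), MREMAP_MAYMOVE, MREMAP_FIXED */
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: move the page at src onto dst with mremap().
 * Whatever was mapped at dst is silently replaced, and src is left
 * unmapped, because mremap moves the vma, not just the page.
 * Returns 0 on success, -1 on failure.
 */
static int move_page_with_mremap(void *src, void *dst, size_t pagesz)
{
	void *ret = mremap(src, pagesz, pagesz, MREMAP_MAYMOVE | MREMAP_FIXED, dst);

	return ret == dst ? 0 : -1;
}

/* Hypothetical helper: 1 if the page at addr is unmapped (mincore fails with ENOMEM). */
static int page_unmapped(void *addr)
{
	unsigned char vec;
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

	return mincore(addr, pagesz, &vec) == -1 && errno == ENOMEM;
}
```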
Thanks,
Andrea
* [GIT PULL] kselftest 3.18-updates-1
From: Shuah Khan @ 2014-10-06 16:47 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, open list:KERNEL SELFTEST F...,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Shuah Khan
Hi Linus,
Here are the updates for kselftest for 3.18. Please see
details in the signed tag.
thanks,
-- Shuah
The following changes since commit 69e273c0b0a3c337a521d083374c918dc52c666f:
Linux 3.17-rc3 (2014-08-31 18:23:04 -0700)
are available in the git repository at:
git-OoYKEaZ2EDaWaY/ihj7yzEB+6BGkLq7r@public.gmane.org:/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/kselftest-3.18-updates-1
for you to fetch changes up to ce6a144a0d01c6628496e4c0d18fbf3a0362cc67:
selftests/memfd: Run test on all architectures (2014-09-17 08:00:16 -0600)
----------------------------------------------------------------
kselftest Updates for 3.18
- Fix for missing arguments to printf
- Fix for build failures on 32-bit systems
- Enhancement to run memfd_test on all architectures,
  as most architectures support __NR_memfd_create
----------------------------------------------------------------
Pranith Kumar (3):
memfd_test: Make it work on 32-bit systems
memfd_test: Add missing argument to printf()
selftests/memfd: Run test on all architectures
tools/testing/selftests/memfd/Makefile     | 21 -----------------
tools/testing/selftests/memfd/memfd_test.c | 36 ++++++++++++++----------------
2 files changed, 17 insertions(+), 40 deletions(-)
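Here is a minimal sketch of calling memfd_create(2) through its raw syscall number, which is the kind of call the "run test on all architectures" change relies on. It only assumes the libc headers define __NR_memfd_create; glibc added a memfd_create() wrapper only later, in 2.27. sys_memfd_create() is a hypothetical helper name, not necessarily the selftest's own.

```c
#define _GNU_SOURCE  /* syscall() */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: memfd_create(2) via its raw syscall number.
 * Returns a file descriptor, or -1 with errno set.
 */
static int sys_memfd_create(const char *name, unsigned int flags)
{
#ifdef __NR_memfd_create
	return (int)syscall(__NR_memfd_create, name, flags);
#else
	/* These headers predate memfd_create (added in Linux 3.17). */
	(void)name;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```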
--
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org | (970) 217-8978
* Re: [PATCH] x86,seccomp,prctl: Remove PR_TSC_SIGSEGV and seccomp TSC filtering
From: Andy Lutomirski @ 2014-10-06 16:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Kees Cook,
Andrea Arcangeli, Erik Bosman, H. Peter Anvin, Linux API,
Michael Kerrisk-manpages, Paul Mackerras,
Arnaldo Carvalho de Melo, X86 ML
In-Reply-To: <20141004081324.GR10583@worktop.programming.kicks-ass.net>
On Sat, Oct 4, 2014 at 1:13 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Oct 03, 2014 at 02:15:24PM -0700, Andy Lutomirski wrote:
>> On Fri, Oct 3, 2014 at 2:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Fri, Oct 03, 2014 at 02:04:53PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Oct 3, 2014 at 2:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >
>> >> > Something like so.. slightly less ugly and possibly with more
>> >> > complicated conditions setting the cr4 if you want to fix tsc vs seccomp
>> >> > as well.
>> >>
>> >> This will crash anything that tries rdpmc in an allow-everything
>> >> seccomp sandbox. It's also not very compatible with my grand scheme
>> >> of allowing rdtsc to be turned off without breaking clock_gettime. :)
>> >
>> > Well, we clear cap_user_rdpmc, so everybody who still tries it gets what
>> > he deserves, no problem there.
>>
>> Oh, interesting.
>>
>> To continue playing devil's advocate, what if you do perf_event_open,
>> then mmap it, then start the seccomp sandbox?
>
> We update that cap bit on every update to the self-monitor state, and in
> a perfect world people would also check the cap bit every time they try
> and read it, and fall back to the syscall. So we could just clear it..
> but I can imagine reality ruining things here.
If nothing else, the fact that rdpmc fails with SIGSEGV instead of
with some nonsense value means that this will always be racy.
>
>> My draft patches are currently tracking the number of perf_event mmaps
>> per mm. I'm not thrilled with it, but it's straightforward. And I
>> still need to benchmark cr4 writes, which is tedious, because I can't
>> do it from user code.
>
> Should be fairly straight fwd from kernel space, get a tsc stamp,
> read+write cr4 1000 times, get another tsc read, and maybe do that
> several times. No?
I tried it. Rough numbers on my 2.7 GHz Sandy Bridge laptop:
writing to cr4 in VMX non-root (changing PCE) takes ~48ns, and an RMW
of cr4 takes roughly 51ns. IMO neither of these is enough to be worth
worrying *that* much about when switching into or out of a perf-using
task. But you might disagree with me.
Changing TSD takes 700ns, because KVM has the VMCS programmed wrong.
I'll send a patch.
I suspect that the same experiment on bare metal would run faster.
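Peter's stamp, loop, stamp, divide recipe can be sketched as a user-space harness. Two substitutions are assumptions: clock_gettime(CLOCK_MONOTONIC) stands in for the TSC, and placeholder_op() stands in for the cr4 read+write, which can only be done in kernel space. ns_to_cycles() converts a per-operation time into cycles at a given clock rate.

```c
#define _POSIX_C_SOURCE 199309L  /* clock_gettime() */
#include <time.h>

/* Convert nanoseconds to CPU cycles at a fixed clock rate in GHz. */
static double ns_to_cycles(double ns, double ghz)
{
	return ns * ghz;  /* at 1 GHz, one cycle takes one nanosecond */
}

/* Monotonic timestamp in nanoseconds (stands in for rdtsc). */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Placeholder for the operation being timed (the cr4 read+write in-kernel). */
static volatile unsigned long sink;
static void placeholder_op(void)
{
	sink++;
}

/* Stamp, run op() iters times, stamp again; return average ns per op. */
static double avg_ns_per_op(void (*op)(void), int iters)
{
	long long start = now_ns();

	for (int i = 0; i < iters; i++)
		op();
	return (double)(now_ns() - start) / iters;
}
```

With the numbers above at 2.7 GHz, 48 ns is about 130 cycles, 51 ns about 138 cycles, and 700 ns about 1890 cycles.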
--Andy
* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Andrea Arcangeli @ 2014-10-06 16:41 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Linus Torvalds, qemu-devel, KVM list, Linux Kernel Mailing List,
linux-mm, Linux API, Andres Lagar-Cavilla, Dave Hansen,
Paolo Bonzini, Rik van Riel, Mel Gorman, Andy Lutomirski,
Andrew Morton, Sasha Levin, Hugh Dickins, Peter Feiner,
Christopher Covington, Johannes Weiner, Android Kernel Team,
Robert Love, Dmitry Adamushko <dmitry>
In-Reply-To: <20141006085540.GD2336@work-vm>
Hello,
On Mon, Oct 06, 2014 at 09:55:41AM +0100, Dr. David Alan Gilbert wrote:
> * Linus Torvalds (torvalds@linux-foundation.org) wrote:
> > On Fri, Oct 3, 2014 at 10:08 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > >
> > > Overall this looks a fairly small change to the rmap code, notably
> > > less intrusive than the nonlinear vmas created by remap_file_pages.
> >
> > Considering that remap_file_pages() was an unmitigated disaster, and
> > -mm has a patch to remove it entirely, I'm not at all convinced this
> > is a good argument.
> >
> > We thought remap_file_pages() was a good idea, and it really really
> > really wasn't. Almost nobody used it, why would the anonymous page
> > case be any different?
>
> I've posted code that uses this interface to qemu-devel and it works nicely;
> so chalk up at least one user.
>
> For the postcopy case I'm using it for, we need to place a page, atomically
> some thread might try and access it, and must either
> 1) get caught by userfault etc or
> 2) must succeed in its access
>
> and we'll have that happening somewhere between thousands and millions of times
> to pages in no particular order, so we need to avoid creating millions of mappings.
Yes, that's our current use case.
Of course, if somebody has better ideas on how to resolve an anonymous
userfault, they're welcome.
How to resolve a userfault is orthogonal to how to detect it, how to
notify userland about it, and how to be notified when the userfault has
been resolved. The latter is what userfault and userfaultfd
do. The former is what remap_anon_pages is used for, but we could use
something else too if there are better ways. mremap would clearly work
too, but it would be less strict (it could lead to silent data
corruption if there are bugs in the userland code), it would be slower,
and it would eventually hit an -ENOMEM failure because there would be
too many vmas.
I could in theory drop remap_anon_pages from this patchset, but
without an optimal way to resolve a userfault, the rest isn't so
useful.
In fact, we're currently discussing what would be the best way to
resolve a MAP_SHARED userfault on tmpfs (that's not sorted yet), but so
far remap_anon_pages seems to fit the bill for anonymous memory.
remap_anon_pages is not as problematic to maintain as remap_file_pages,
for the reason explained in the commit header, but there are other
reasons too: it doesn't require a special pte_file and it changes
nothing about how anonymous page faults work. All it requires is a loop
to catch a changed page->index (previously page->index couldn't change,
now it can; that's the only thing it changes).
The complexity of remap_file_pages derives from not being allowed to
change page->index during a move because the page_mapcount may be
bigger than 1, while changing page->index is precisely what
remap_anon_pages does.
As long as this "rmap preparation" is the only constraint that
remap_anon_pages introduces in terms of rmap, it looks like a nice
not-too-intrusive solution to resolve anonymous userfaults
efficiently.
Introducing remap_anon_pages in fact doesn't reduce the
simplification derived from the removal of remap_file_pages.
Conversely, removing remap_anon_pages later would only have the benefit
of removing this very patch 10/17, and no other benefit.
In short, remap_anon_pages does this (heavily simplified):
pte = *src_pte;
*src_pte = 0;
pte_page(pte)->index = adjusted according to src_vma/dst_vma->vm_pgoff
*dst_pte = pte;
It guarantees not to modify the vmas, so it doesn't need to
take the mmap_sem for writing.
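A toy model of the simplified steps above, not the patch's code: a per-vma array stands in for the page table, NULL stands for a none pte, and toy_linear_index() computes the destination index the way the kernel's linear_page_index() does for non-hugetlb vmas ((addr - vm_start) >> PAGE_SHIFT, plus vm_pgoff). There is no locking, atomicity or rmap handling; 4 KiB pages and all toy_* names are assumptions.

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define TOY_NR_PTES    16  /* toy page table: 16 slots per vma */

struct toy_page {
	unsigned long index;  /* plays the role of page->index */
};

struct toy_vma {
	unsigned long vm_start;            /* first virtual address */
	unsigned long vm_pgoff;            /* page offset of vm_start */
	struct toy_page *pte[TOY_NR_PTES]; /* NULL means pte_none (a hole) */
};

/* Index of addr inside vma, like linear_page_index(). */
static unsigned long toy_linear_index(const struct toy_vma *vma, unsigned long addr)
{
	return ((addr - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff;
}

static size_t toy_slot(const struct toy_vma *vma, unsigned long addr)
{
	return (addr - vma->vm_start) >> TOY_PAGE_SHIFT;
}

/*
 * Move the page at src_addr to dst_addr without touching either vma.
 * Returns 0, or -1 if an address is out of range, the source is a hole,
 * or the destination is already populated (toy checks only).
 */
static int toy_remap_anon_page(struct toy_vma *src_vma, unsigned long src_addr,
			       struct toy_vma *dst_vma, unsigned long dst_addr)
{
	size_t s = toy_slot(src_vma, src_addr);
	size_t d = toy_slot(dst_vma, dst_addr);
	struct toy_page *page;

	if (s >= TOY_NR_PTES || d >= TOY_NR_PTES)
		return -1;
	page = src_vma->pte[s];                            /* pte = *src_pte */
	if (!page || dst_vma->pte[d])
		return -1;
	src_vma->pte[s] = NULL;                            /* *src_pte = 0 */
	page->index = toy_linear_index(dst_vma, dst_addr); /* adjust index */
	dst_vma->pte[d] = page;                            /* *dst_pte = pte */
	return 0;
}
```

Neither vma's start or pgoff is modified; only the two slots and the page's index change.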
To use remap_anon_pages, each thread has to create its own temporary
vma, with MADV_DONTFORK set on it, as the source region into which it
receives data from the network. (MADV_DONTFORK is not formally required
by the syscall's strict checks, but without it the application must
never fork, or remap_anon_pages could return -EBUSY; there's no risk of
silent data corruption even if the thread forks without setting
MADV_DONTFORK.) Then, after the data is fully received,
remap_anon_pages moves the page atomically from the temporary vma to
the address where the userfault triggered. Other threads may be
attempting to access the userfault address at the same time, but thanks
to the atomic behavior of remap_anon_pages they will never see partial
data coming from the network.
As a side effect, remap_anon_pages creates a hole in the temporary
(source) vma, so the next recv() syscall receiving data from the
network will fault in a new anonymous page without requiring any
further malloc/free or other kind of vma mangling.
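A minimal sketch of the per-thread receive area described above, using only mainline calls: an anonymous mapping with MADV_DONTFORK set, filled with recv(). The final remap_anon_pages() step belongs to this patchset, not to a mainline syscall, so it is only marked by a comment. alloc_recv_area() and recv_page() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS, MADV_DONTFORK */
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Temporary receive area: an anonymous mapping with MADV_DONTFORK, so the
 * range is not inherited by a child after fork(). Returns NULL on failure.
 */
static void *alloc_recv_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}

/* Receive one full page into the area; returns bytes received or -1. */
static ssize_t recv_page(int sock, void *area, size_t pagesz)
{
	size_t got = 0;

	while (got < pagesz) {
		ssize_t n = recv(sock, (char *)area + got, pagesz - got, 0);

		if (n <= 0)
			return n < 0 ? -1 : (ssize_t)got; /* error or peer closed */
		got += (size_t)n;
	}
	/* Hypothetical: remap_anon_pages() would move the page here (not called). */
	return (ssize_t)got;
}
```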
Thanks,
Andrea
* RE: [PATCH RFC] introduce ioctl to completely invalidate page cache
From: Thanos Makatos @ 2014-10-06 15:21 UTC (permalink / raw)
To: 'Jan Kara'
Cc: Jens Axboe, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
jlayton@poochiereds.net, bfields@fieldses.org
In-Reply-To: <20141006143019.GG7526@quack.suse.cz>
Thanks, Jan, it's much clearer now what I need to do.
> So most notably they want the ioctl to work not only for block devices but also for any regular file. That's easily doable - you just call
> filemap_write_and_wait() and invalidate_inode_pages2() in the ioctl handler for regular files.
All right, so I need to find out how I can direct the new ioctl to
file-systems as well. I thought that I could get away with it by simply looking
at which file-systems have the same block device as the one to which the ioctl
is directed, but IIUC this doesn't make sense as NFS doesn't use a block device
at all.
> Also they wanted to be able to specify a range of a mapping to invalidate -
> that's easily doable as well. Finally they wanted a 'flags' argument so you can
> additionally ask fs to invalidate also some metadata. How invalidation is done
> will be a fs specific thing and for now I guess we don't need to go into
> details. NFS guys can sort that out when they decide to implement it.
So after I've figured out how to direct this new ioctl to a file-system, I need
to understand how to invalidate a specific range of data. I will gracefully
fail metadata invalidation operations with EOPNOTSUPP or ENOSYS.
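Here is a small sketch of turning a user-supplied (offset, len) into the bounds a ranged handler would need, assuming 4 KiB pages. The kernel's ranged helpers filemap_write_and_wait_range() and invalidate_inode_pages2_range() take an inclusive end byte and an inclusive end page index, respectively. inval_range_bounds() and struct inval_bounds are hypothetical. Rejecting len == 0 is a design choice here; fadvise instead treats it as "to end of file".

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

struct inval_bounds {
	uint64_t first_byte;  /* start byte for ranged writeback */
	uint64_t last_byte;   /* end byte, inclusive */
	uint64_t first_page;  /* start page index, inclusive */
	uint64_t last_page;   /* end page index, inclusive */
};

/*
 * Compute inclusive byte and page-index bounds for [offset, offset + len).
 * Returns 0 on success, -1 if len is 0 or the end does not fit in 64 bits.
 */
static int inval_range_bounds(uint64_t offset, uint64_t len, struct inval_bounds *b)
{
	if (len == 0 || offset + len < offset)
		return -1;
	b->first_byte = offset;
	b->last_byte = offset + len - 1;
	b->first_page = offset >> SKETCH_PAGE_SHIFT;
	b->last_page = b->last_byte >> SKETCH_PAGE_SHIFT;
	return 0;
}
```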
> So in the beginning we can just have u64 flags argument and in it a single
> 'INVAL_DATA' flag meaning that invalidation of data in a given range is
> requested. Later NFS guys can add further flags.
OK, that I can do.
I suppose we'll always fail metadata invalidation operations when the target
of the ioctl is a block device.
Thanks
* Re: [PATCH RFC] introduce ioctl to completely invalidate page cache
From: Jan Kara @ 2014-10-06 14:30 UTC (permalink / raw)
To: Thanos Makatos
Cc: 'Jan Kara', Jens Axboe, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
jlayton@poochiereds.net, bfields@fieldses.org
In-Reply-To: <2368A3FCF9F7214298E53C823B0A48EC0424106C@AMSPEX01CL02.citrite.net>
On Mon 06-10-14 11:33:23, Thanos Makatos wrote:
> > > Trond also had a comment that if we extended the ioctl to work for all
> > > inodes (not just blkdev) and allowed some additional flags of what
> > > needs to be invalidated, the new ioctl would be also useful to NFS
> > > userspace - see Trond's email at
> > >
> > > http://www.spinics.net/lists/linux-fsdevel/msg78917.html
> > >
> > > and the following thread. I would prefer to cover that usecase when we
> > > are introducing new invalidation ioctl. Have you considered that Thanos?
> >
> > Sure, though I don't really know how to do it. I'll start by looking at the code
> > flow when someone does " echo 3 > /proc/sys/vm/drop_caches", unless you
> > already have a rough idea how to do that.
>
> I realise I haven't clearly understood what the semantics of this new ioctl
> should be.
>
> My initial goal was to implement an ioctl that would _completely_ invalidate
> the buffer cache of a block device when there is no file-system involved.
> Unless I'm mistaken the patch I posted achieves this goal.
Yes.
> We now want to extend this patch to take care of cached metadata, which seems
> to be of particular importance for NFS, and I suspect that this piece of
> functionality will still be applicable to any kind of file-system, correct?
So most notably they want the ioctl to work not only for block devices
but also for any regular file. That's easily doable - you just call
filemap_write_and_wait() and invalidate_inode_pages2() in the ioctl handler
for regular files.
Also they wanted to be able to specify a range of a mapping to invalidate -
that's easily doable as well. Finally they wanted a 'flags' argument so you
can additionally ask fs to invalidate also some metadata. How invalidation
is done will be a fs specific thing and for now I guess we don't need to go
into details. NFS guys can sort that out when they decide to implement it.
So in the beginning we can just have u64 flags argument and in
it a single 'INVAL_DATA' flag meaning that invalidation of data in a given
range is requested. Later NFS guys can add further flags.
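Here is a sketch of the flags argument described above. INVAL_DATA, INVAL_KNOWN_FLAGS, struct inval_args and inval_check_flags() are hypothetical names, not an existing ABI. The sketch rejects unknown bits up front, so flags that NFS adds later cannot be silently ignored by kernels that predate them.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical ABI for the proposed ioctl; none of these names exist upstream. */
#define INVAL_DATA        (UINT64_C(1) << 0)  /* invalidate cached data in the range */
#define INVAL_KNOWN_FLAGS (INVAL_DATA)        /* later flags get OR'ed in here */

struct inval_args {
	uint64_t offset;  /* start of the range, in bytes */
	uint64_t len;     /* length of the range, in bytes */
	uint64_t flags;   /* INVAL_* bits */
};

/*
 * Check the flags before doing any work. Returns 0, -EOPNOTSUPP for bits
 * this kernel does not know, or -EINVAL if nothing was requested.
 */
static int inval_check_flags(const struct inval_args *a)
{
	if (a->flags & ~INVAL_KNOWN_FLAGS)
		return -EOPNOTSUPP;
	if (a->flags == 0)
		return -EINVAL;
	return 0;
}
```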
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH 04/17] mm: gup: make get_user_pages_fast and __get_user_pages_fast latency conscious
From: Andrea Arcangeli @ 2014-10-06 14:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: qemu-devel, KVM list, Linux Kernel Mailing List, linux-mm,
Linux API, Andres Lagar-Cavilla, Dave Hansen, Paolo Bonzini,
Rik van Riel, Mel Gorman, Andy Lutomirski, Andrew Morton,
Sasha Levin, Hugh Dickins, Peter Feiner, "Dr. David Alan Gilbert",
Christopher Covington, Johannes Weiner, Android Kernel Team,
Robert Love, Dmitry Adamushko <dm>
In-Reply-To: <CA+55aFyuYRuY9fiJQKL=XJ0-BKhGsZbo1HGkGUOJ6DbbxdA-dQ@mail.gmail.com>
Hello,
On Fri, Oct 03, 2014 at 11:23:53AM -0700, Linus Torvalds wrote:
> On Fri, Oct 3, 2014 at 10:07 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > This teaches gup_fast and __gup_fast to re-enable irqs and
> > cond_resched() if possible every BATCH_PAGES.
>
> This is disgusting.
>
> Many (most?) __gup_fast() users just want a single page, and the
> stupid overhead of the multi-page version is already unnecessary.
> This just makes things much worse.
>
> Quite frankly, we should make a single-page version of __gup_fast(),
> and convert existing users to use that. After that, the few multi-page
> users could have this extra latency control stuff.
OK. I couldn't think of a better way to add the latency control other
than reducing nr_pages in an outer loop instead of altering the inner
calls, but this is what I got after implementing it... If somebody has
a cleaner way to implement the latency control stuff, that's welcome
and I'd be glad to replace it.
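The "reduce nr_pages in an outer loop" approach can be modelled roughly as below. This is a userspace sketch, not the patch: `pin_cb` stands in for the gup_fast page-table walk, `yield_cb` stands in for re-enabling irqs and calling cond_resched(), and the BATCH_PAGES value of 64 is an assumption. The `demo_*` callbacks are hypothetical helpers that exist only to exercise the loop.

```c
#include <stddef.h>

/* Assumed batch size; the value used by the patch is not shown here. */
#define BATCH_PAGES 64

/* Inner walker: tries to pin nr pages starting at page index start and
 * returns how many it actually pinned (stand-in for the gup_fast walk). */
typedef size_t (*pin_fn)(size_t start, size_t nr, void *ctx);

/* Called between batches: stand-in for local_irq_enable() + cond_resched(). */
typedef void (*yield_fn)(void *ctx);

/* Outer loop that caps each inner call at BATCH_PAGES, leaving the inner
 * walker itself untouched. Returns the number of pages pinned. */
static size_t pin_pages_batched(size_t nr_pages, pin_fn pin_cb,
				yield_fn yield_cb, void *ctx)
{
	size_t done = 0;

	while (done < nr_pages) {
		size_t chunk = nr_pages - done;
		size_t got;

		if (chunk > BATCH_PAGES)
			chunk = BATCH_PAGES;
		got = pin_cb(done, chunk, ctx);
		done += got;
		if (got < chunk)	/* walker stopped early: partial count */
			break;
		if (done < nr_pages)	/* more to do: let irqs/scheduler in */
			yield_cb(ctx);
	}
	return done;
}

/* Demo callbacks for illustration only: pages below `limit` can be pinned. */
struct demo_state {
	size_t limit;
	size_t calls;
	size_t yields;
};

static size_t demo_pin(size_t start, size_t nr, void *ctx)
{
	struct demo_state *s = ctx;

	s->calls++;
	if (start >= s->limit)
		return 0;
	return (s->limit - start < nr) ? s->limit - start : nr;
}

static void demo_yield(void *ctx)
{
	((struct demo_state *)ctx)->yields++;
}
```

With nr_pages == 1 the loop makes exactly one inner call and never yields, but it still goes through the outer-loop bookkeeping, which is the kind of overhead Linus objects to for single-page callers.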
> And yes, the single-page version of get_user_pages_fast() is actually
> latency-critical. shared futexes hit it hard, and yes, I've seen this
> in profiles.
KVM would save a few cycles from a single-page version too. I just
thought further optimizations could be added later and this was better
than nothing.
Considering I've no better idea how to implement the latency control
stuff, for now I'll just drop this controversial patch, and I'll
convert those get_user_pages to gup_unlocked instead of converting
them to gup_fast, which is more than enough to obtain the mmap_sem
holding scalability improvement (that also solves the mmap_sem trouble
for the userfaultfd). gup_unlocked isn't as good as gup_fast but it's
at least better than the current get_user_pages().
I got into this gup_fast latency control stuff purely because there
were a few get_user_pages that could have been converted to
get_user_pages_fast as they were using "current" and "current->mm" as the
first two parameters, except for the risk of disabling irqs for
a long time. So I tried to do the right thing and fix gup_fast, but I'll leave
this further optimization queued for later.
About the missing commit header for the other patch: Paolo already
replied to it. To clarify this a bit further, in short I expect the
FOLL_TRIED flag to be merged through the KVM git tree, which already
contains it. I'll add a comment to the commit header to specify
it. Sorry for the confusion about that patch.
Thanks,
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* RE: [PATCH RFC] introduce ioctl to completely invalidate page cache
From: Thanos Makatos @ 2014-10-06 11:33 UTC
To: Thanos Makatos, 'Jan Kara', Jens Axboe
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, jlayton@poochiereds.net,
bfields@fieldses.org
In-Reply-To: <2368A3FCF9F7214298E53C823B0A48EC042405BC@AMSPEX01CL02.citrite.net>
> > Trond also had a comment that if we extended the ioctl to work for all
> > inodes (not just blkdev) and allowed some additional flags of what
> > needs to be invalidated, the new ioctl would be also useful to NFS
> > userspace - see Trond's email at
> >
> > http://www.spinics.net/lists/linux-fsdevel/msg78917.html
> >
> > and the following thread. I would prefer to cover that usecase when we
> > are introducing new invalidation ioctl. Have you considered that Thanos?
>
> Sure, though I don't really know how to do it. I'll start by looking at the code
> flow when someone does "echo 3 > /proc/sys/vm/drop_caches", unless you
> already have a rough idea how to do that.
I realise I haven't clearly understood what the semantics of this new ioctl
should be.
My initial goal was to implement an ioctl that would _completely_ invalidate
the buffer cache of a block device when there is no file-system involved.
Unless I'm mistaken the patch I posted achieves this goal.
We now want to extend this patch to take care of cached metadata, which seems
to be of particular importance for NFS, and I suspect that this piece of
functionality will still be applicable to any kind of file-system, correct?
Do we want this new ioctl to do what "echo 3 > /proc/sys/vm/drop_caches" does
(which, IIUC, drops whatever can be dropped but may not drop everything), but
on a more selective basis? If so, then we should define this ioctl more
precisely as "drop *all* cached data and as much metadata as you can, which may
not be all of it". If this is the case, would adding a call to
drop_pagecache_sb() on all super blocks whose .s_bdev equals the block device
we're interested in suffice?
* RE: [PATCH RFC] introduce ioctl to completely invalidate page cache
From: Thanos Makatos @ 2014-10-06 9:21 UTC
To: 'Jan Kara', Jens Axboe
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, jlayton@poochiereds.net,
bfields@fieldses.org
In-Reply-To: <20141006080659.GA7526@quack.suse.cz>
> > We're currently ignoring the buffer cache sync and invalidation (which
> > is odd), but at least being consistent would be good.
> Well, invalidate_bdev() doesn't return anything. And
> invalidate_mapping_pages() inside invalidate_bdev() returns only number of
> invalidated pages. I don't think there's any value in returning that.
>
> OTOH invalidate_inode_pages2() returns 0 / -EBUSY / other error when
> invalidation of some page fails so returning that seems useful.
>
> > Might also need a filemap_write_and_wait() to sync before invalidation.
> That's what fsync_bdev() is doing under the hood. Sometimes I'm not sure
> whether all these wrappers are useful...
Indeed, fsync_bdev() does call filemap_write_and_wait() so I don't need to explicitly do that.
>
> Trond also had a comment that if we extended the ioctl to work for all inodes
> (not just blkdev) and allowed some additional flags of what needs to be
> invalidated, the new ioctl would be also useful to NFS userspace - see Trond's
> email at
>
> http://www.spinics.net/lists/linux-fsdevel/msg78917.html
>
> and the following thread. I would prefer to cover that usecase when we are
> introducing new invalidation ioctl. Have you considered that Thanos?
Sure, though I don't really know how to do it. I'll start by looking at the code
flow when someone does "echo 3 > /proc/sys/vm/drop_caches", unless you already
have a rough idea how to do that.
Thanks
* Re: [PATCH 10/17] mm: rmap preparation for remap_anon_pages
From: Dr. David Alan Gilbert @ 2014-10-06 8:55 UTC
To: Linus Torvalds
Cc: Andrea Arcangeli, qemu-devel, KVM list, Linux Kernel Mailing List,
linux-mm, Linux API, Andres Lagar-Cavilla, Dave Hansen,
Paolo Bonzini, Rik van Riel, Mel Gorman, Andy Lutomirski,
Andrew Morton, Sasha Levin, Hugh Dickins, Peter Feiner,
Dr. David Alan Gilbert, Christopher Covington, Johannes Weiner,
Android Kernel Team, Robert
In-Reply-To: <CA+55aFx++R42L75ooE=Fmaem73=V=q7f6pYTcALxgrA1y98G-A@mail.gmail.com>
* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> On Fri, Oct 3, 2014 at 10:08 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > Overall this looks a fairly small change to the rmap code, notably
> > less intrusive than the nonlinear vmas created by remap_file_pages.
>
> Considering that remap_file_pages() was an unmitigated disaster, and
> -mm has a patch to remove it entirely, I'm not at all convinced this
> is a good argument.
>
> We thought remap_file_pages() was a good idea, and it really really
> really wasn't. Almost nobody used it, why would the anonymous page
> case be any different?
I've posted code that uses this interface to qemu-devel and it works nicely;
so chalk up at least one user.
For the postcopy case I'm using it for, we need to place a page atomically,
because some thread might try to access it and must either
1) get caught by userfault etc., or
2) succeed in its access,
and we'll have that happening somewhere between thousands and millions of times
to pages in no particular order, so we need to avoid creating millions of mappings.
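The requirement Dave describes (a concurrent access either faults or sees the complete page, never a half-filled one) is the usual fill-privately-then-publish-atomically pattern. The toy sketch below models it in userspace with C11 atomics; the `toy_pt` array standing in for page tables and both helper names are hypothetical, and it does not call remap_anon_pages.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_PAGES  16

/* Toy "page table": NULL means not present (an access would fault).
 * Static storage zero-initializes every slot to NULL. */
static _Atomic(unsigned char *) toy_pt[SKETCH_NR_PAGES];

/* Reader side: returns NULL (take the userfault path) or a complete page. */
static const unsigned char *toy_lookup(size_t idx)
{
	if (idx >= SKETCH_NR_PAGES)
		return NULL;
	return atomic_load_explicit(&toy_pt[idx], memory_order_acquire);
}

/* Filler side: build the page privately, then publish it with one atomic
 * compare-and-swap. Returns 0 if placed, 1 if already present, -1 on error. */
static int toy_place_page(size_t idx, const unsigned char *src)
{
	unsigned char *page;
	unsigned char *expected = NULL;

	if (idx >= SKETCH_NR_PAGES)
		return -1;
	page = malloc(SKETCH_PAGE_SIZE);
	if (!page)
		return -1;
	memcpy(page, src, SKETCH_PAGE_SIZE);	/* nobody can see it yet */

	/* Release ordering makes the memcpy visible before the pointer. */
	if (!atomic_compare_exchange_strong_explicit(&toy_pt[idx], &expected,
						     page, memory_order_release,
						     memory_order_relaxed)) {
		free(page);	/* another filler placed it first */
		return 1;
	}
	return 0;
}
```

A reader that gets NULL would take the userfault path; the compare-and-swap also covers two fillers racing on the same page.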
Dave
>
> Linus
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH RFC] introduce ioctl to completely invalidate page cache
From: Jan Kara @ 2014-10-06 8:06 UTC
To: Jens Axboe
Cc: Thanos Makatos, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, jlayton@poochiereds.net,
bfields@fieldses.org, jack@suse.cz
In-Reply-To: <542DAEAC.8010203-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
On Thu 02-10-14 13:59:40, Jens Axboe wrote:
> On 10/02/2014 10:09 AM, Thanos Makatos wrote:
> > This patch introduces a new ioctl called BLKFLSBUF2, which is pretty
> > similar to BLKFLSBUF except that it also invalidates the page cache.
> > This allows for a complete invalidation of the cached data of a
> > particular block device, which might be useful for cases like
> > synchronising the caches of an iSCSI block device used by multiple
> > hosts.
> >
> > Signed-off-by: Thanos Makatos <thanos.makatos-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
> > ---
> > block/compat_ioctl.c | 1 +
> > block/ioctl.c | 13 +++++++++++--
> > include/uapi/linux/fs.h | 1 +
> > 3 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
> > index 18b282c..672388ab 100644
> > --- a/block/compat_ioctl.c
> > +++ b/block/compat_ioctl.c
> > @@ -688,6 +688,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> > case BLKDISCARDZEROES:
> > return compat_put_uint(arg, bdev_discard_zeroes_data(bdev));
> > case BLKFLSBUF:
> > + case BLKFLSBUF2:
> > case BLKROSET:
> > case BLKDISCARD:
> > case BLKSECDISCARD:
> > diff --git a/block/ioctl.c b/block/ioctl.c
> > index d6cda81..0c427a7 100644
> > --- a/block/ioctl.c
> > +++ b/block/ioctl.c
> > @@ -268,6 +268,12 @@ static inline int is_unrecognized_ioctl(int ret)
> > ret == -ENOIOCTLCMD;
> > }
> >
> > +static void flush_buffer_cache(struct block_device *bdev)
> > +{
> > + fsync_bdev(bdev);
> > + invalidate_bdev(bdev);
> > +}
> > +
> > /*
> > * always keep this in sync with compat_blkdev_ioctl()
> > */
> > @@ -282,6 +288,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
> >
> > switch(cmd) {
> > case BLKFLSBUF:
> > + case BLKFLSBUF2:
> > if (!capable(CAP_SYS_ADMIN))
> > return -EACCES;
> >
> > @@ -289,8 +296,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
> > if (!is_unrecognized_ioctl(ret))
> > return ret;
> >
> > - fsync_bdev(bdev);
> > - invalidate_bdev(bdev);
> > + flush_buffer_cache(bdev);
> > + if (BLKFLSBUF2 == cmd)
> > + return invalidate_inode_pages2(
> > + bdev->bd_inode->i_mapping);
> > return 0;
>
> We're currently ignoring the buffer cache sync and invalidation (which
> is odd), but at least being consistent would be good.
Well, invalidate_bdev() doesn't return anything. And
invalidate_mapping_pages() inside invalidate_bdev() returns only the number of
invalidated pages. I don't think there's any value in returning that.
OTOH invalidate_inode_pages2() returns 0 / -EBUSY / other error when
invalidation of some page fails so returning that seems useful.
> Might also need a filemap_write_and_wait() to sync before invalidation.
That's what fsync_bdev() is doing under the hood. Sometimes I'm not
sure whether all these wrappers are useful...
Trond also had a comment that if we extended the ioctl to work for all
inodes (not just blkdev) and allowed some additional flags of what needs to
be invalidated, the new ioctl would be also useful to NFS userspace - see
Trond's email at
http://www.spinics.net/lists/linux-fsdevel/msg78917.html
and the following thread. I would prefer to cover that usecase when we are
introducing new invalidation ioctl. Have you considered that Thanos?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH 12/17] mm: sys_remap_anon_pages
From: Andi Kleen @ 2014-10-04 13:13 UTC
To: Andrea Arcangeli; +Cc: linux-kernel, linux-mm, linux-api, Linus Torvalds
In-Reply-To: <1412356087-16115-13-git-send-email-aarcange@redhat.com>
Andrea Arcangeli <aarcange@redhat.com> writes:
> This new syscall will move anon pages across vmas, atomically and
> without touching the vmas.
>
> It only works on non-shared anonymous pages because those can be
> relocated without generating nonlinear anon_vmas in the rmap code.
...
> It is an alternative to mremap.
Why a new syscall? Couldn't mremap do this transparently?
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: [PATCH] x86,seccomp,prctl: Remove PR_TSC_SIGSEGV and seccomp TSC filtering
From: Peter Zijlstra @ 2014-10-04 8:13 UTC
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Kees Cook,
Andrea Arcangeli, Erik Bosman, H. Peter Anvin, Linux API,
Michael Kerrisk-manpages, Paul Mackerras,
Arnaldo Carvalho de Melo, X86 ML
In-Reply-To: <CALCETrVtK6w4smnRCTED=csAyt3WNNOaZE_WRzvECuSx260X3w@mail.gmail.com>
On Fri, Oct 03, 2014 at 02:15:24PM -0700, Andy Lutomirski wrote:
> On Fri, Oct 3, 2014 at 2:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Oct 03, 2014 at 02:04:53PM -0700, Andy Lutomirski wrote:
> >> On Fri, Oct 3, 2014 at 2:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> > Something like so.. slightly less ugly and possibly with more
> >> > complicated conditions setting the cr4 if you want to fix tsc vs seccomp
> >> > as well.
> >>
> >> This will crash anything that tries rdpmc in an allow-everything
> >> seccomp sandbox. It's also not very compatible with my grand scheme
> >> of allowing rdtsc to be turned off without breaking clock_gettime. :)
> >
> > Well, we clear cap_user_rdpmc, so everybody who still tries it gets what
> > he deserves, no problem there.
>
> Oh, interesting.
>
> To continue playing devil's advocate, what if you do perf_event_open,
> then mmap it, then start the seccomp sandbox?
We update that cap bit on every update to the self-monitor state, and in
a perfect world people would also check the cap bit every time they try
to read it, and fall back to the syscall. So we could just clear it...
but I can imagine reality ruining things here.
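The check-every-time-and-fall-back behaviour Peter describes can be sketched as a seqcount-style read loop. `struct toy_mmap_page` is a toy stand-in that only borrows field names from `struct perf_event_mmap_page` (its layout is not the real one), and `rdpmc`/`slow_read` are callbacks so the sketch runs anywhere; on x86 they would presumably be the rdpmc instruction and read() on the perf fd. The `demo_*` callbacks are hypothetical test helpers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for a few fields of struct perf_event_mmap_page. The names
 * echo the real fields, but this layout is NOT the uapi layout. */
struct toy_mmap_page {
	_Atomic uint32_t lock;		/* sequence count, bumped on update */
	uint32_t index;			/* hw counter index + 1; 0 = none */
	int64_t offset;			/* added to the raw counter value */
	uint32_t cap_user_rdpmc;	/* may be cleared at any time */
};

typedef uint64_t (*rdpmc_fn)(uint32_t counter);
typedef uint64_t (*slow_read_fn)(void);

/* Re-check the capability on every read and fall back to the syscall path
 * when it is clear. Single-threaded toy: a real reader must follow the
 * barrier rules documented in include/uapi/linux/perf_event.h. */
static uint64_t toy_read_counter(struct toy_mmap_page *pc, rdpmc_fn rdpmc,
				 slow_read_fn slow_read)
{
	uint32_t seq, idx;
	uint64_t val;

	do {
		seq = atomic_load_explicit(&pc->lock, memory_order_acquire);
		idx = pc->index;
		if (!pc->cap_user_rdpmc || !idx)
			return slow_read();	/* e.g. read() on the perf fd */
		val = (uint64_t)pc->offset + rdpmc(idx - 1);
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&pc->lock, memory_order_relaxed) != seq);

	return val;
}

/* Demo callbacks for illustration only. */
static uint64_t demo_rdpmc(uint32_t counter) { return 1000 + counter; }
static uint64_t demo_slow_read(void) { return 42; }
```

Because the bit is consulted inside the loop on every read, clearing it (for example when a seccomp sandbox starts) only costs well-behaved readers the slower syscall path, which is the "perfect world" behaviour described above.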
> My draft patches are currently tracking the number of perf_event mmaps
> per mm. I'm not thrilled with it, but it's straightforward. And I
> still need to benchmark cr4 writes, which is tedious, because I can't
> do it from user code.
Should be fairly straightforward from kernel space: get a tsc stamp,
read+write cr4 1000 times, get another tsc stamp, and maybe do that
several times. No?
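Peter's measurement recipe, sketched with the same loop structure in userspace: cr4 writes need ring 0 and the TSC is x86-only, so `now_ns()` uses C11 `timespec_get()` and `demo_op()` is a hypothetical placeholder for the cr4 read+write pair. Keeping the fastest round is a design choice of this sketch, meant to filter out interrupts and preemption; the thread does not specify how the repeated runs should be combined.

```c
#include <stdint.h>
#include <time.h>

typedef void (*bench_op)(void *ctx);

/* Wall-clock nanoseconds via C11 timespec_get(); a kernel-side version
 * would read the TSC instead. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time `iters` calls of op per round for `rounds` rounds and return the
 * lowest per-call cost in ns, or -1.0 for invalid arguments. */
static double bench_min_ns_per_op(bench_op op, void *ctx, unsigned iters,
				  unsigned rounds)
{
	double best = -1.0;

	if (iters == 0 || rounds == 0)
		return -1.0;
	for (unsigned r = 0; r < rounds; r++) {
		uint64_t t0 = now_ns();
		for (unsigned i = 0; i < iters; i++)
			op(ctx);
		uint64_t t1 = now_ns();
		if (t1 < t0)
			continue;	/* wall clock stepped back: discard round */
		double per_op = (double)(t1 - t0) / iters;
		if (best < 0.0 || per_op < best)
			best = per_op;
	}
	return best;
}

/* Placeholder op: stands in for a cr4 read+write, which needs ring 0. */
static void demo_op(void *ctx)
{
	volatile unsigned *counter = ctx;

	*counter = *counter + 1;
}
```

For example, `bench_min_ns_per_op(demo_op, &counter, 1000, 5)` runs five rounds of 1000 calls each, matching the "1000 times, several times" shape of the suggestion.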