* Executable mapping of on-chip registers through /dev/mem? @ 2015-11-18 18:21 Florian Fainelli 2015-11-19 15:17 ` Russell King - ARM Linux 0 siblings, 1 reply; 3+ messages in thread From: Florian Fainelli @ 2015-11-18 18:21 UTC (permalink / raw) To: linux-arm-kernel Hi, On the brcmstb platform, we have a special piece of hardware which tries to be smart and checks whether a virtual mapping to the on-chip register range (called GISB), PCIE inbound windows, or other memory-mapped chip-selects region has the executable bit set, and if it does, it typically issues an error condition. It turns out, that we can re-create that condition just that by opening /dev/mem and calling mmap() with PROT_EXEC, giving the physical base address of the register range (0xF000_0000 typically on these platforms), and a mapping size which spans the entire register range (32MB), although smaller mapping size also exhibit the problem, just a little slower. In these two conditions, we end-up with the CPU speculatively trying to fetch instruction streams from this range, and eventually itching this sensitive piece of hardware and causing the error condition to occur. Tracing through the calls from drivers/char/mem.c, we have this: drivers/char/mem.c: mmap_mem() ARM does define __HAVE_PHYS_MEM_ACCESS_PROT and we have CONFIG_MEM_DMA_BUFFERABLE=y for our V7 builds here arch/arm/mm/mmu.c: -> phys_mem_access_prot() -> !pfn_valid(pfn) is true -> pgprot_uncached() If I do change the pgprot value to also include the XN bit, this problem never occurs, because we satisfy the piece of hardware checking for the executable bit (or lack, thereof) in the mapping. What is is not really clear to me, is whether we are creating a new mapping of this 32MB register range on this SoC, with an uncached mapping + executable bit set, or we are modifying the existing mapping in that case? I cooked up a local patch which allows a machine to define a phys_mem_access_prot-like callback, which can then look at the calling parameters and change the pgprot_t values accordingly if the range falls in this problematic space. I do not like that, because this currently forces my machine to have knowledge about where this register range is, so I am wondering if there are better solutions like: - making phys_mem_access_prot set L_PTE_XN unconditionally for the !pfn_valid case, instead of pgprot_uncached(), but who am I going to break by doing so, what if people want to execute code from a memory mapped, like flash, FPGA, anything? - having a better way to determine if the pfn falls within existing register mappings? But without a map_io() or putting that information in Device Tree, how am I sure this is an exhaustive range? Thanks! -- Florian ^ permalink raw reply [flat|nested] 3+ messages in thread
* Executable mapping of on-chip registers through /dev/mem? 2015-11-18 18:21 Executable mapping of on-chip registers through /dev/mem? Florian Fainelli @ 2015-11-19 15:17 ` Russell King - ARM Linux 2015-11-19 19:36 ` Florian Fainelli 0 siblings, 1 reply; 3+ messages in thread From: Russell King - ARM Linux @ 2015-11-19 15:17 UTC (permalink / raw) To: linux-arm-kernel On Wed, Nov 18, 2015 at 10:21:06AM -0800, Florian Fainelli wrote: > It turns out, that we can re-create that condition just that by opening > /dev/mem and calling mmap() with PROT_EXEC, giving the physical base > address of the register range (0xF000_0000 typically on these > platforms), and a mapping size which spans the entire register range > (32MB), although smaller mapping size also exhibit the problem, just a > little slower. There's two ways of looking at this: 1. the kernel should protect against it and prevent you creating an executable /dev/mem mapping 2. only root can create these mappings (provided the permissions are set sanely on /dev/mem) and this is just another way for root to hang themselves. Someone may have valid reasons to be able to open /dev/mem, map it executable, and execute code there - for example, an expansion ROM, some platform specific code, etc. So I'd be very nervous about changing the behaviour and causing userspace regressions. In my mind, it's a case of "if it hurts when I do X then don't do X". > Tracing through the calls from drivers/char/mem.c, we have this: > > drivers/char/mem.c: > mmap_mem() > ARM does define __HAVE_PHYS_MEM_ACCESS_PROT and we have > CONFIG_MEM_DMA_BUFFERABLE=y for our V7 builds here > > arch/arm/mm/mmu.c: > -> phys_mem_access_prot() > -> !pfn_valid(pfn) is true > -> pgprot_uncached() > > If I do change the pgprot value to also include the XN bit, this problem > never occurs, because we satisfy the piece of hardware checking for the > executable bit (or lack, thereof) in the mapping. Yes, but you're changing the permissions that _any_ pgprot_uncached() mapping gets to be different from what the user requested. At best, if we're saying we want to deny executable mappings of /dev/mem, we should return an error if userspace tries to request such a mapping. However, there's an issue here which is not obvious: when you don't have an XN bit, then the kernel has assumptions that when you request read but without execute permission, you'll end up with read _and_ execute permission. In other words, on older CPUs, even if you request in userspace a PROT_READ mapping, the kernel will silently translate that to PROT_READ | PROT_EXEC internally. So denying a PROT_EXEC mapping will result in /dev/mem being useless on older CPUs as you'd never be able to create any mapping of it. > What is is not really clear to me, is whether we are creating a new > mapping of this 32MB register range on this SoC, with an uncached > mapping + executable bit set, or we are modifying the existing mapping > in that case? You're creating a new mapping in the userspace address range, it's not a new mapping in kernel space. > - having a better way to determine if the pfn falls within existing > register mappings? But without a map_io() or putting that information in > Device Tree, how am I sure this is an exhaustive range? Is there really some problem with having userspace avoid using PROT_EXEC when mapping /dev/mem, which all round seems to be the correct solution here ? -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 3+ messages in thread
* Executable mapping of on-chip registers through /dev/mem? 2015-11-19 15:17 ` Russell King - ARM Linux @ 2015-11-19 19:36 ` Florian Fainelli 0 siblings, 0 replies; 3+ messages in thread From: Florian Fainelli @ 2015-11-19 19:36 UTC (permalink / raw) To: linux-arm-kernel On 19/11/15 07:17, Russell King - ARM Linux wrote: > On Wed, Nov 18, 2015 at 10:21:06AM -0800, Florian Fainelli wrote: >> It turns out, that we can re-create that condition just that by opening >> /dev/mem and calling mmap() with PROT_EXEC, giving the physical base >> address of the register range (0xF000_0000 typically on these >> platforms), and a mapping size which spans the entire register range >> (32MB), although smaller mapping size also exhibit the problem, just a >> little slower. > > There's two ways of looking at this: > 1. the kernel should protect against it and prevent you creating an > executable /dev/mem mapping > 2. only root can create these mappings (provided the permissions are > set sanely on /dev/mem) and this is just another way for root to > hang themselves. > > Someone may have valid reasons to be able to open /dev/mem, map it > executable, and execute code there - for example, an expansion ROM, > some platform specific code, etc. So I'd be very nervous about > changing the behaviour and causing userspace regressions. > > In my mind, it's a case of "if it hurts when I do X then don't do X". > >> Tracing through the calls from drivers/char/mem.c, we have this: >> >> drivers/char/mem.c: >> mmap_mem() >> ARM does define __HAVE_PHYS_MEM_ACCESS_PROT and we have >> CONFIG_MEM_DMA_BUFFERABLE=y for our V7 builds here >> >> arch/arm/mm/mmu.c: >> -> phys_mem_access_prot() >> -> !pfn_valid(pfn) is true >> -> pgprot_uncached() >> >> If I do change the pgprot value to also include the XN bit, this problem >> never occurs, because we satisfy the piece of hardware checking for the >> executable bit (or lack, thereof) in the mapping. > > Yes, but you're changing the permissions that _any_ pgprot_uncached() > mapping gets to be different from what the user requested. At best, > if we're saying we want to deny executable mappings of /dev/mem, we > should return an error if userspace tries to request such a mapping. Well, what I ended-up doing was directly using __pgprot_modify() locally in phys_mem_access_prot() when !pfn_valid() is true, but that is still too broad, not changing the definition of pgprot_uncached(), precisely as this would break other areas. We do not want to deny executable mappings through /dev/mem for the entire range the mapping is established, just if it falls within a platform-specific problematic region. The more I think about it, and the more I think we might be able to solve this by simply documenting this as a caveat/feature of the platform rather than looking for something overly complicated involving adding machine-specific register space knowledge and acting upon this. > > However, there's an issue here which is not obvious: when you don't > have an XN bit, then the kernel has assumptions that when you request > read but without execute permission, you'll end up with read _and_ > execute permission. In other words, on older CPUs, even if you > request in userspace a PROT_READ mapping, the kernel will silently > translate that to PROT_READ | PROT_EXEC internally. So denying a > PROT_EXEC mapping will result in /dev/mem being useless on older CPUs > as you'd never be able to create any mapping of it. I see, we clearly do not want that, I agree. > >> What is is not really clear to me, is whether we are creating a new >> mapping of this 32MB register range on this SoC, with an uncached >> mapping + executable bit set, or we are modifying the existing mapping >> in that case? > > You're creating a new mapping in the userspace address range, it's not > a new mapping in kernel space. > >> - having a better way to determine if the pfn falls within existing >> register mappings? But without a map_io() or putting that information in >> Device Tree, how am I sure this is an exhaustive range? > > Is there really some problem with having userspace avoid using > PROT_EXEC when mapping /dev/mem, which all round seems to be the > correct solution here ? Well, no and yes. No because, this clearly is a simple fix, that can be documented as such. Yes in the sense that this particular platform is sensitive to such a thing, and can lead to particularly difficult debugging exercises with the platform users. As I just learned, some other people decided that open-coding a /dev/mem like was a brilliant idea, which sort of motivated me for looking into a kernel-level solution to stop that from happening. Appreciate your answer, thanks! -- Florian ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-11-19 19:36 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-11-18 18:21 Executable mapping of on-chip registers through /dev/mem? Florian Fainelli 2015-11-19 15:17 ` Russell King - ARM Linux 2015-11-19 19:36 ` Florian Fainelli
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.