[RFC] prevent "dd if=/dev/mem" crash

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] prevent "dd if=/dev/mem" crash
@ 2003-10-17 22:10 Bjorn Helgaas
  2003-10-17 22:23 ` Matt Mackall
  2003-10-17 22:50 ` Andrew Morton
  0 siblings, 2 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2003-10-17 22:10 UTC (permalink / raw)
  To: linux-ia64, linux-kernel

This is the generic part of a change to prevent "dd if=/dev/mem"
from causing a machine check on ia64.

read_mem() and write_mem() already check the requested address
against "high_memory", but that is only a complete check if
everything from 0 to high_memory is valid, readable/writable
memory.  Obviously that's not the case for architectures with
discontiguous memory, like ia64.

Old behavior:

    # dd if=/dev/mem of=/dev/null
    <unrecoverable machine check>

New behavior (this system has a hole from 0-16MB, then memory
from 16MB-1GB):

    # dd if=/dev/mem of=/dev/null
    0+0 records in
    0+0 records out
    0 bytes transferred in 0.000282 seconds (0 bytes/sec)

    # dd if=/dev/mem of=/dev/null bs=1M skip=16 
    1004+10 records in
    1004+10 records out
    1056964608 bytes transferred in 1.629262 seconds (648738280 bytes/sec)

I expect there are probably different opinions about the idea
that "dd if=/dev/mem" exits without doing anything.  Sparc and
68K have nearby code that bit-buckets writes and returns zeroes
for reads of page zero.  We could do that, too, but it seems like
kind of a hack, and holes on ia64 can be BIG (on the order of
256GB for one box).

So flame away :-)

The patch below is mangled so it won't apply easily.  If this
seems a reasonable approach, I'll submit the ia64 piece first,
then repost this.

Bjorn

===== drivers/char/mem.c 1.44 vs edited =====
--- 1.44/ drivers/char/mem.c	Sun Sep 21 15:50:34 2003
+++ edited/ drivers/char/mem.c	Fri Oct 17 15:37:47 2003
@@ -79,6 +79,24 @@
 #endif
 }
 
+static inline int valid_mem_range(unsigned long addr, size_t *count)
+{
+#if defined(CONFIG_IA64)
+	return efi_valid_mem_range(addr, count);
+#else
+	unsigned long end_mem;
+
+	end_mem = __pa(high_memory);
+	if (addr >= end_mem)
+		return 0;
+
+	if (*count > end_mem - addr)
+		*count = end_mem - addr;
+
+	return 1;
+#endif
+}
+
 static ssize_t do_write_mem(struct file * file, void *p, unsigned long realp,
 			    const char * buf, size_t count, loff_t *ppos)
 {
@@ -113,14 +131,10 @@
 			size_t count, loff_t *ppos)
 {
 	unsigned long p = *ppos;
-	unsigned long end_mem;
 	ssize_t read;
 
-	end_mem = __pa(high_memory);
-	if (p >= end_mem)
+	if (!valid_mem_range(p, &count))
 		return 0;
-	if (count > end_mem - p)
-		count = end_mem - p;
 	read = 0;
 #if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
 	/* we don't have page 0 mapped on sparc and m68k.. */
@@ -149,13 +163,9 @@
 			 size_t count, loff_t *ppos)
 {
 	unsigned long p = *ppos;
-	unsigned long end_mem;
 
-	end_mem = __pa(high_memory);
-	if (p >= end_mem)
+	if (!valid_mem_range(p, &count))
 		return 0;
-	if (count > end_mem - p)
-		count = end_mem - p;
 	return do_write_mem(file, __va(p), p, buf, count, ppos);
 }
 
===== include/linux/efi.h 1.3 vs edited =====
--- 1.3/ include/linux/efi.h	Thu Aug  7 14:01:48 2003
+++ edited/ include/linux/efi.h	Thu Oct 16 16:54:52 2003
@@ -266,6 +266,7 @@
 extern u64 efi_get_iobase (void);
 extern u32 efi_mem_type (unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
+extern int efi_valid_mem_range (unsigned long phys_addr, unsigned long *count);
 
 /*
  * Variable Attributes


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 22:10 [RFC] prevent "dd if=/dev/mem" crash Bjorn Helgaas
@ 2003-10-17 22:23 ` Matt Mackall
  2003-10-17 22:50 ` Andrew Morton
  1 sibling, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2003-10-17 22:23 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-ia64, linux-kernel

On Fri, Oct 17, 2003 at 04:10:36PM -0600, Bjorn Helgaas wrote:
> I expect there are probably different opinions about the idea
> that "dd if=/dev/mem" exits without doing anything.  Sparc and
> 68K have nearby code that bit-buckets writes and returns zeroes
> for reads of page zero.  We could do that, too, but it seems like
> kind of a hack, and holes on ia64 can be BIG (on the order of
> 256GB for one box).

I don't see any reason not to returns zeros. A hole in a file does the
same thing, after all. The fact that the hole is big makes no difference.

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 22:10 [RFC] prevent "dd if=/dev/mem" crash Bjorn Helgaas
  2003-10-17 22:23 ` Matt Mackall
@ 2003-10-17 22:50 ` Andrew Morton
  2003-10-17 23:25   ` Bjorn Helgaas
  2003-10-19 18:17   ` Pavel Machek
  1 sibling, 2 replies; 23+ messages in thread
From: Andrew Morton @ 2003-10-17 22:50 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-ia64, linux-kernel

Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>
> Old behavior:
> 
>     # dd if=/dev/mem of=/dev/null
>     <unrecoverable machine check>

I recently fixed this for ia32 by changing copy_to_user() to not oops if
the source address generated a fault.  Similarly copy_from_user() returns
an error if the destination generates a fault.

In other words: drivers/char/mem.c requires that the architecture's
copy_*_user() functions correctly handle faults on either the source or
dest of the copy.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 22:50 ` Andrew Morton
@ 2003-10-17 23:25   ` Bjorn Helgaas
  2003-10-17 23:55     ` Andrew Morton
  2003-10-19 18:17   ` Pavel Machek
  1 sibling, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2003-10-17 23:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-ia64, linux-kernel

On Friday 17 October 2003 4:50 pm, Andrew Morton wrote:
> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> >
> > Old behavior:
> > 
> >     # dd if=/dev/mem of=/dev/null
> >     <unrecoverable machine check>
> 
> I recently fixed this for ia32 by changing copy_to_user() to not oops if
> the source address generated a fault.  Similarly copy_from_user() returns
> an error if the destination generates a fault.
> 
> In other words: drivers/char/mem.c requires that the architecture's
> copy_*_user() functions correctly handle faults on either the source or
> dest of the copy.

If we really believe copy_*_user() must correctly handle *all* faults,
isn't the "p >= __pa(high_memory)" test superfluous?

I don't know how ia32 handles a read to non-existent physical memory.
Are you saying that copy_*_user() can deal with that just like it does
a garden-variety TLB fault?

On ia64, a read to non-existent physical memory causes the processor
to time out and take a machine check.  I'm not sure it's even possible
to recover from that.

Bjorn

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 23:25   ` Bjorn Helgaas
@ 2003-10-17 23:55     ` Andrew Morton
  2003-10-18  0:15       ` William Lee Irwin III
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Andrew Morton @ 2003-10-17 23:55 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-ia64, linux-kernel

Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>
> On Friday 17 October 2003 4:50 pm, Andrew Morton wrote:
> > Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> > >
> > > Old behavior:
> > > 
> > >     # dd if=/dev/mem of=/dev/null
> > >     <unrecoverable machine check>
> > 
> > I recently fixed this for ia32 by changing copy_to_user() to not oops if
> > the source address generated a fault.  Similarly copy_from_user() returns
> > an error if the destination generates a fault.
> > 
> > In other words: drivers/char/mem.c requires that the architecture's
> > copy_*_user() functions correctly handle faults on either the source or
> > dest of the copy.
> 
> If we really believe copy_*_user() must correctly handle *all* faults,
> isn't the "p >= __pa(high_memory)" test superfluous?

This code was conceived before my time and I don't recall seeing much
discussion, so this is all guesswork..

I'd say that the high_memory test _is_ superfluous and that if anyone
cared, we would remove it and establish a temporary pte against the address if
it was outside the direct-mapped area.  But nobody cares enough to have
done anything about it.

> I don't know how ia32 handles a read to non-existent physical memory.
> Are you saying that copy_*_user() can deal with that just like it does
> a garden-variety TLB fault?

I don't know, and I suspect it depends on the off-CPU hardware
implementation anyway.  But the access will either generate a fault or it
won't and in either case we're OK, yes?

> On ia64, a read to non-existent physical memory causes the processor
> to time out and take a machine check.  I'm not sure it's even possible
> to recover from that.

ick.  That would be very poor form.  What about things like probing for
memory, device hot-unplug, memory hot unplug etc?

Still, the code you have is quite reasonable.  But please structure it
thusly:

#include <asm/io.h>	/* valid_phys_addr_range */

#ifndef ARCH_HAS_VALID_PHYS_ADDR_RANGE
static inline int valid_phys_addr_range(unsigned long addr, size_t *count)
{
	unsigned long end_mem;

	end_mem = __pa(high_memory);
	if (addr >= end_mem)
		return 0;

	if (*count > end_mem - addr)
		*count = end_mem - addr;

	return 1;
}
#endif

or whatever.  It's a bit more conventional this way and allows other
architectures to do appropriate things.

As for return values: if the requested read or write starts at a
not-present address it should probably return -EFAULT.  This is what ia32
will do.  Arguably this is indistinguishable from a bad address on the
userspace side and we should return -EINVAL but whatever.

If the request starts at a valid phys address but covers a not-present
address it should return a short read or write (returns something less than
`count').

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 23:55     ` Andrew Morton
@ 2003-10-18  0:15       ` William Lee Irwin III
  2003-10-18  0:21       ` David Mosberger
  2003-10-20 17:42       ` [RFC] " Bjorn Helgaas
  2 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-10-18  0:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bjorn Helgaas, linux-ia64, linux-kernel

On Fri, Oct 17, 2003 at 04:55:43PM -0700, Andrew Morton wrote:
> This code was conceived before my time and I don't recall seeing much
> discussion, so this is all guesswork..
> I'd say that the high_memory test _is_ superfluous and that if anyone
> cared, we would remove it and establish a temporary pte against the address if
> it was outside the direct-mapped area.  But nobody cares enough to have
> done anything about it.

I've heard crash dump people mention it and am trying to convince them
to post their fix(es) here. =)


-- wli

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 23:55     ` Andrew Morton
  2003-10-18  0:15       ` William Lee Irwin III
@ 2003-10-18  0:21       ` David Mosberger
  2003-10-18  0:49         ` Andrew Morton
                           ` (2 more replies)
  2003-10-20 17:42       ` [RFC] " Bjorn Helgaas
  2 siblings, 3 replies; 23+ messages in thread
From: David Mosberger @ 2003-10-18  0:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bjorn Helgaas, linux-ia64, linux-kernel

>>>>> On Fri, 17 Oct 2003 16:55:43 -0700, Andrew Morton <akpm@osdl.org> said:

  >> If we really believe copy_*_user() must correctly handle *all* faults,
  >> isn't the "p >= __pa(high_memory)" test superfluous?

  Andrew> This code was conceived before my time and I don't recall seeing much
  Andrew> discussion, so this is all guesswork..

  Andrew> I'd say that the high_memory test _is_ superfluous and that
  Andrew> if anyone cared, we would remove it and establish a
  Andrew> temporary pte against the address if it was outside the
  Andrew> direct-mapped area.  But nobody cares enough to have done
  Andrew> anything about it.

What about memory-mapped device registers?  Isn't all memory
physically contiguous on x86 and that's why the "p >=
__pa(high_memory)" test saves you from that?

  >> On ia64, a read to non-existent physical memory causes the processor
  >> to time out and take a machine check.  I'm not sure it's even possible
  >> to recover from that.

  Andrew> ick.  That would be very poor form.

Reasonable people can disagree on that.  One philosophy states that if
your kernel touches random addresses, it's better to signal a visible
error (machine-check) than to risk silent data corruption.

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  0:21       ` David Mosberger
@ 2003-10-18  0:49         ` Andrew Morton
  2003-10-18  1:31           ` Matt Chapman
                             ` (2 more replies)
  2003-10-20 15:17         ` Bjorn Helgaas
  2003-10-20 15:58         ` fielding PCI bus timeouts - was: " Rich Altmaier
  2 siblings, 3 replies; 23+ messages in thread
From: Andrew Morton @ 2003-10-18  0:49 UTC (permalink / raw)
  To: davidm; +Cc: bjorn.helgaas, linux-ia64, linux-kernel

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> >>>>> On Fri, 17 Oct 2003 16:55:43 -0700, Andrew Morton <akpm@osdl.org> said:
> 
>   >> If we really believe copy_*_user() must correctly handle *all* faults,
>   >> isn't the "p >= __pa(high_memory)" test superfluous?
> 
>   Andrew> This code was conceived before my time and I don't recall seeing much
>   Andrew> discussion, so this is all guesswork..
> 
>   Andrew> I'd say that the high_memory test _is_ superfluous and that
>   Andrew> if anyone cared, we would remove it and establish a
>   Andrew> temporary pte against the address if it was outside the
>   Andrew> direct-mapped area.  But nobody cares enough to have done
>   Andrew> anything about it.
> 
> What about memory-mapped device registers?  Isn't all memory
> physically contiguous on x86 and that's why the "p >=
> __pa(high_memory)" test saves you from that?

We _want_ to be able to read mmio ranges via /dev/mem, don't we?  I guess
it has never come up because everyone uses kmem.

>   >> On ia64, a read to non-existent physical memory causes the processor
>   >> to time out and take a machine check.  I'm not sure it's even possible
>   >> to recover from that.
> 
>   Andrew> ick.  That would be very poor form.
> 
> Reasonable people can disagree on that.

nah ;)

> One philosophy states that if
> your kernel touches random addresses, it's better to signal a visible
> error (machine-check) than to risk silent data corruption.

An access to an illegal address should generate a fault, period.  This puts
the processing into the hands of software.  If software chooses to silently
ignore the fault (ie: "silent data corruption") then it is poorly designed.

If the hardware doesn't give the system programmer a choice then the
hardware is poorly designed, surely?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  0:49         ` Andrew Morton
@ 2003-10-18  1:31           ` Matt Chapman
  2003-10-18  1:41             ` Andrew Morton
  2003-10-18  1:48           ` David Mosberger
  2003-10-19 11:25           ` Eric W. Biederman
  2 siblings, 1 reply; 23+ messages in thread
From: Matt Chapman @ 2003-10-18  1:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, bjorn.helgaas, linux-ia64, linux-kernel

On Fri, Oct 17, 2003 at 05:49:55PM -0700, Andrew Morton wrote:
> David Mosberger <davidm@napali.hpl.hp.com> wrote:
> >
> > One philosophy states that if
> > your kernel touches random addresses, it's better to signal a visible
> > error (machine-check) than to risk silent data corruption.
> 
> An access to an illegal address should generate a fault, period.  This puts
> the processing into the hands of software.  If software chooses to silently
> ignore the fault (ie: "silent data corruption") then it is poorly designed.

It *does* signal a fault, in the form of a machine check.  On other
architectures I'm familiar with this is usually implemented as an
interrupt, but the idea is similar - when the system bus controller
detects a bad address on the bus, it returns all 1s (for a read) and
signals an interrupt.  Usually you can turn this interrupt off (and
most likely you can on Itanium chipsets too) but that is not a good
idea.

The problem is that this interrupt is not synchronous with respect to
the instruction stream, and this makes it difficult for software to
recover from, particularly in a monolithic system like Linux where you
can't just terminate the faulting driver.  The best you can usually do
is to print the details and hope that it's a once-off.  It is not
something that you can sensibly use to abort copy_*_user.

In any case touching random addresses is just plain bad.  What if
there's a device mapped there which happens to have read side effects
like clearing the interrupt cause, so e.g. every time you read /dev/mem
you cause a timeout on your SCSI bus :)

Matt

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  1:31           ` Matt Chapman
@ 2003-10-18  1:41             ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2003-10-18  1:41 UTC (permalink / raw)
  To: Matt Chapman; +Cc: davidm, bjorn.helgaas, linux-ia64, linux-kernel

Matt Chapman <matthewc@cse.unsw.edu.au> wrote:
>
> It *does* signal a fault, in the form of a machine check.  On other
>  architectures I'm familiar with this is usually implemented as an
>  interrupt, but the idea is similar - when the system bus controller
>  detects a bad address on the bus, it returns all 1s (for a read) and
>  signals an interrupt.

I see.  That seems OK.  At least it means that software can implement a
probing function and that the time interval between hot unplug and handling
the unplug interrupt can be managed appropriately.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  0:49         ` Andrew Morton
  2003-10-18  1:31           ` Matt Chapman
@ 2003-10-18  1:48           ` David Mosberger
  2003-10-18  2:01             ` Andrew Morton
  2003-10-18  2:01             ` Matt Chapman
  2003-10-19 11:25           ` Eric W. Biederman
  2 siblings, 2 replies; 23+ messages in thread
From: David Mosberger @ 2003-10-18  1:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, bjorn.helgaas, linux-ia64, linux-kernel

>>>>> On Fri, 17 Oct 2003 17:49:55 -0700, Andrew Morton <akpm@osdl.org> said:

  Andrew> We _want_ to be able to read mmio ranges via /dev/mem, don't
  Andrew> we?  I guess it has never come up because everyone uses
  Andrew> kmem.

I just don't see how making a "dd if=/dev/mem" safe and allowing
access to arbitrary physical memory can go to together.  Given that
/dev/mem _is_ being used for accessing mmio space, is it really worth
bothering trying to make such a "dd" safe?

  Andrew> If the hardware doesn't give the system programmer a choice
  Andrew> then the hardware is poorly designed, surely?

Emh, we're talking about _physical_ memory accesses here.  AFAIK,
failures on physical memory accesses are never signaled with
synchronous faults (not on any reasonably modern high performance
architecture, at least).  Loads probably _could_ be signalled
synchronously, but consider stores: would you really want to wait with
retiring a store until it has made it all the way to some slow ISA
device?  I think not (IN/OUT do that).  No, modern CPUs check the
TLB/page-table and if that check passes, they'll _assume_ the memory
access will complete without errors.  If it doesn't, they signal an
asynchronous failure (e.g., via an MCA).

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  1:48           ` David Mosberger
@ 2003-10-18  2:01             ` Andrew Morton
  2003-10-18  2:01             ` Matt Chapman
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2003-10-18  2:01 UTC (permalink / raw)
  To: davidm; +Cc: bjorn.helgaas, linux-ia64, linux-kernel

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> >>>>> On Fri, 17 Oct 2003 17:49:55 -0700, Andrew Morton <akpm@osdl.org> said:
> 
>   Andrew> We _want_ to be able to read mmio ranges via /dev/mem, don't
>   Andrew> we?  I guess it has never come up because everyone uses
>   Andrew> kmem.
> 
> I just don't see how making a "dd if=/dev/mem" safe and allowing
> access to arbitrary physical memory can go to together.  Given that
> /dev/mem _is_ being used for accessing mmio space, is it really worth
> bothering trying to make such a "dd" safe?

Possibly not.  I thought that simply oopsing the kernel was a bit rude, and
fixing ia32 to not do that was relatively simple.

We should, within reason, handle it as gracefully as possible, yes?

>   Andrew> If the hardware doesn't give the system programmer a choice
>   Andrew> then the hardware is poorly designed, surely?
> 
> Emh, we're talking about _physical_ memory accesses here.  AFAIK,
> failures on physical memory accesses are never signaled with
> synchronous faults (not on any reasonably modern high performance
> architecture, at least).  Loads probably _could_ be signalled
> synchronously, but consider stores: would you really want to wait with
> retiring a store until it has made it all the way to some slow ISA
> device?  I think not (IN/OUT do that).  No, modern CPUs check the
> TLB/page-table and if that check passes, they'll _assume_ the memory
> access will complete without errors.  If it doesn't, they signal an
> asynchronous failure (e.g., via an MCA).

If the not-present memory is marked cacheable and/or writeback then yes,
but that would be an odd thing to do, wouldn't it?

It the memory is mapped noncacheable then a synchronous error on a read
sounds reasonable.  A synchronous error on a write would assume that the
noncacheability affects the write buffers and IIRC that usually doesn't
happen(?).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  1:48           ` David Mosberger
  2003-10-18  2:01             ` Andrew Morton
@ 2003-10-18  2:01             ` Matt Chapman
  1 sibling, 0 replies; 23+ messages in thread
From: Matt Chapman @ 2003-10-18  2:01 UTC (permalink / raw)
  To: davidm; +Cc: Andrew Morton, bjorn.helgaas, linux-ia64, linux-kernel

On Fri, Oct 17, 2003 at 06:48:33PM -0700, David Mosberger wrote:
> >>>>> On Fri, 17 Oct 2003 17:49:55 -0700, Andrew Morton <akpm@osdl.org> said:
> 
>   Andrew> We _want_ to be able to read mmio ranges via /dev/mem, don't
>   Andrew> we?  I guess it has never come up because everyone uses
>   Andrew> kmem.
> 
> I just don't see how making a "dd if=/dev/mem" safe and allowing
> access to arbitrary physical memory can go to together.  Given that
> /dev/mem _is_ being used for accessing mmio space, is it really worth
> bothering trying to make such a "dd" safe?

Usually people who want to access MMIO devices would use mmap rather
than read/write.

But I do agree that the current semantics are not unreasonable and the
user should think about what they're doing before accessing random parts
of /dev/mem.

Matt


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  0:49         ` Andrew Morton
  2003-10-18  1:31           ` Matt Chapman
  2003-10-18  1:48           ` David Mosberger
@ 2003-10-19 11:25           ` Eric W. Biederman
  2003-10-19 19:01             ` William Lee Irwin III
  2 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2003-10-19 11:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davidm, bjorn.helgaas, linux-ia64, linux-kernel

Andrew Morton <akpm@osdl.org> writes:

> David Mosberger <davidm@napali.hpl.hp.com> wrote:
> >
> > >>>>> On Fri, 17 Oct 2003 16:55:43 -0700, Andrew Morton <akpm@osdl.org> said:
> > 
> >   >> If we really believe copy_*_user() must correctly handle *all* faults,
> >   >> isn't the "p >= __pa(high_memory)" test superfluous?
> > 
> > Andrew> This code was conceived before my time and I don't recall seeing much
> 
> >   Andrew> discussion, so this is all guesswork..
> > 
> >   Andrew> I'd say that the high_memory test _is_ superfluous and that
> >   Andrew> if anyone cared, we would remove it and establish a
> >   Andrew> temporary pte against the address if it was outside the
> >   Andrew> direct-mapped area.  But nobody cares enough to have done
> >   Andrew> anything about it.

This can be useful for the case of > 4GB on a 32bit box.  But for mmio
it is useless.

> > What about memory-mapped device registers?  Isn't all memory
> > physically contiguous on x86 and that's why the "p >=
> > __pa(high_memory)" test saves you from that?
> 
> We _want_ to be able to read mmio ranges via /dev/mem, don't we?  I guess
> it has never come up because everyone uses kmem.

No.

sys_read/sys_write does not give you enough control to write a device driver
from user space if you want to access something other than RAM you need to
mmap /dev/mem and issue the appropriate read/write commands.  

On ia64 you can send the variant that won't generate a machine check if it
fails.  You can get the alignment right etc, etc.  

And even if you can get the interface right from user space to make mmio
work through sys_read/sys_write it will still be slow and silly to use.
And since it is useless it will go unused and then regress in
strange peculiar unexpected ways.

We do have all of the information we need in struct page to see if a
page address is valid, so checking that is reasonable.  I suspect it
will require some interesting variant of pfn_to_page to handle of the
weird sparse memory locations properly.

Of course it might just be reasonable to require knowing and
responsible use of /dev/mem.  Whatever you are doing.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 22:50 ` Andrew Morton
  2003-10-17 23:25   ` Bjorn Helgaas
@ 2003-10-19 18:17   ` Pavel Machek
  2003-10-23  8:33     ` Martin Pool
  1 sibling, 1 reply; 23+ messages in thread
From: Pavel Machek @ 2003-10-19 18:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bjorn Helgaas, linux-ia64, linux-kernel

Hi!

> > Old behavior:
> > 
> >     # dd if=/dev/mem of=/dev/null
> >     <unrecoverable machine check>
> 
> I recently fixed this for ia32 by changing copy_to_user() to not oops if
> the source address generated a fault.  Similarly copy_from_user() returns
> an error if the destination generates a fault.

Are you sure this is not hiding real errors? If you pass wrong
kernel ptr to copy_*_user, it should oops, not mask error with
-EFAULT.
Maybe another copy_user_unsafe should be created?
				Pavel
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-19 11:25           ` Eric W. Biederman
@ 2003-10-19 19:01             ` William Lee Irwin III
  0 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-10-19 19:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, davidm, bjorn.helgaas, linux-ia64, linux-kernel

On Sun, Oct 19, 2003 at 05:25:37AM -0600, Eric W. Biederman wrote:
> We do have all of the information we need in struct page to see if a
> page address is valid, so checking that is reasonable.  I suspect it
> will require some interesting variant of pfn_to_page to handle of the
> weird sparse memory locations properly.

It would be best to check the pfn before attempting to convert it to a
struct page. The struct page * returned by arch code will be garbage in
most instances, as none of the routines actually check validity
internally. pfn_valid() is even bogus on most of them, so you'll have
to walk pgdats by hand for this. The pfn_valid() checks work most of the
time on PC's, but the first time someone runs X on a box with discontig
and a bogus pfn_valid() they'll get fireworks (and in fact, it's already
happened, but wasn't posted to lkml).

-- wli

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-18  0:21       ` David Mosberger
  2003-10-18  0:49         ` Andrew Morton
@ 2003-10-20 15:17         ` Bjorn Helgaas
  2003-10-20 18:48           ` David Mosberger
  2003-10-20 15:58         ` fielding PCI bus timeouts - was: " Rich Altmaier
  2 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2003-10-20 15:17 UTC (permalink / raw)
  To: davidm, David Mosberger, Andrew Morton; +Cc: linux-ia64, linux-kernel

On Friday 17 October 2003 6:21 pm, David Mosberger wrote:
> What about memory-mapped device registers?  Isn't all memory
> physically contiguous on x86 and that's why the "p >=
> __pa(high_memory)" test saves you from that?

As others have mentioned, using read/write on /dev/mem to get at
memory-mapped registers is unlikely to work on ia64 anyway, because
read/write use cacheable mappings.  Using mmap does work (using
uncacheable mappings), and my patch doesn't change that path.

>   >> On ia64, a read to non-existent physical memory causes the processor
>   >> to time out and take a machine check.  I'm not sure it's even possible
>   >> to recover from that.
> 
>   Andrew> ick.  That would be very poor form.
> 
> Reasonable people can disagree on that.  One philosophy states that if
> your kernel touches random addresses, it's better to signal a visible
> error (machine-check) than to risk silent data corruption.

It occurred to me over the weekend that part of this confusion is
related to the fact that ia64 doesn't have page tables for the
kernel identity-mapped segments.  (We're talking about reading
physical memory, but read/write_mem() actually convert the address
using __va() before doing the copy.)

I bet that ia32 does have page tables for this case, and that
an attempt to read non-existent physical memory will cause a TLB
miss from which copy_*_user() can easily recover.

On ia64, the same TLB miss would occur, but since there are no page
tables, the miss handler assumes the kernel knows what it is doing
and happily synthesizes a mapping that points nowhere.

Bjorn

^ permalink raw reply	[flat|nested] 23+ messages in thread

* fielding PCI bus timeouts - was: prevent "dd if=/dev/mem" crash
  2003-10-18  0:21       ` David Mosberger
  2003-10-18  0:49         ` Andrew Morton
  2003-10-20 15:17         ` Bjorn Helgaas
@ 2003-10-20 15:58         ` Rich Altmaier
  2 siblings, 0 replies; 23+ messages in thread
From: Rich Altmaier @ 2003-10-20 15:58 UTC (permalink / raw)
  To: linux-ia64, linux-kernel

Just a note to mention experience with handling hardware failures.

For the case of user mappings to IO buses there are important
classes of aps, usually realtime or data acquisition, that
benefit from nice error handling of bus timeouts at user level.

These aps tend to be using old or prototype hardware, which can
fail (cause a bus timeout) during "normal" operation.
Or at least the user view is that the machine should not crash
due to "one flaky board".   Hence there is merit in being
able to translate a PIO-read bus timeout to say a SIGBUS.

More interesting is the case of PIO-write failure, as the writes
can be asynchronous.  Meaning by the time the hardware recognizes
a failure, the CPU's store instruction has graduated and the
CPU has moved on.  Perhaps gone through a context switch or
even exitted the user process.  In this case something more
than a SIGBUS is needed (IRIX has several options to deal with
this).  

On the thread about dd if=/dev/mem, I don't know of any legitmate
reason that user code needs to successfully recover from reading
non-existant phys memory.  I would suggest the princple that
bad user code should not cause a crash, and bad user code that
does a lot of reads would be thought harmless by many dangerous
users.  So some kind of error to the user process is probably
reasonable, perhaps SIGBUS.
Silently returning 0 doesn't sound right, as if there were any 
legitimate reason for this code in the first place, it probably
relates to some search of the physical address space.  Perhaps
a diagnostic.

FYI, Rich

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-17 23:55     ` Andrew Morton
  2003-10-18  0:15       ` William Lee Irwin III
  2003-10-18  0:21       ` David Mosberger
@ 2003-10-20 17:42       ` Bjorn Helgaas
  2003-10-23 21:05         ` Bjorn Helgaas
  2 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2003-10-20 17:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-ia64, linux-kernel

On Friday 17 October 2003 5:55 pm, Andrew Morton wrote:
> Still, the code you have is quite reasonable.  But please structure it
> thusly:
> ...

Here's a patch structured that way.

> As for return values: if the requested read or write starts at a
> not-present address it should probably return -EFAULT.  This is what ia32
> will do.  Arguably this is indistinguishable from a bad address on the
> userspace side and we should return -EINVAL but whatever.

I made it return -EFAULT.  I worry a little bit because ia32
returned 0 (short read) when (addr >= high_memory) before,
but I don't have a strong opinion one way or the other.

===== drivers/char/mem.c 1.44 vs edited =====
--- 1.44/drivers/char/mem.c	Sun Sep 21 15:50:34 2003
+++ edited/drivers/char/mem.c	Mon Oct 20 10:43:08 2003
@@ -79,6 +79,22 @@
 #endif
 }
 
+#ifndef ARCH_HAS_VALID_PHYS_ADDR_RANGE
+static inline int valid_phys_addr_range(unsigned long addr, size_t *count)
+{
+	unsigned long end_mem;
+
+	end_mem = __pa(high_memory);
+	if (addr >= end_mem)
+		return 0;
+
+	if (*count > end_mem - addr)
+		*count = end_mem - addr;
+
+	return 1;
+}
+#endif
+
 static ssize_t do_write_mem(struct file * file, void *p, unsigned long realp,
 			    const char * buf, size_t count, loff_t *ppos)
 {
@@ -113,14 +129,10 @@
 			size_t count, loff_t *ppos)
 {
 	unsigned long p = *ppos;
-	unsigned long end_mem;
 	ssize_t read;
 
-	end_mem = __pa(high_memory);
-	if (p >= end_mem)
-		return 0;
-	if (count > end_mem - p)
-		count = end_mem - p;
+	if (!valid_phys_addr_range(p, &count))
+		return -EFAULT;
 	read = 0;
 #if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
 	/* we don't have page 0 mapped on sparc and m68k.. */
@@ -149,13 +161,9 @@
 			 size_t count, loff_t *ppos)
 {
 	unsigned long p = *ppos;
-	unsigned long end_mem;
 
-	end_mem = __pa(high_memory);
-	if (p >= end_mem)
-		return 0;
-	if (count > end_mem - p)
-		count = end_mem - p;
+	if (!valid_phys_addr_range(p, &count))
+		return -EFAULT;
 	return do_write_mem(file, __va(p), p, buf, count, ppos);
 }
 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-20 15:17         ` Bjorn Helgaas
@ 2003-10-20 18:48           ` David Mosberger
  0 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2003-10-20 18:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: davidm, David Mosberger, Andrew Morton, linux-ia64, linux-kernel

>>>>> On Mon, 20 Oct 2003 09:17:10 -0600, Bjorn Helgaas <bjorn.helgaas@hp.com> said:

  Bjorn> On Friday 17 October 2003 6:21 pm, David Mosberger wrote:
  >> What about memory-mapped device registers?  Isn't all memory
  >> physically contiguous on x86 and that's why the "p >=
  >> __pa(high_memory)" test saves you from that?

  Bjorn> As others have mentioned, using read/write on /dev/mem to get
  Bjorn> at memory-mapped registers is unlikely to work on ia64
  Bjorn> anyway, because read/write use cacheable mappings.  Using
  Bjorn> mmap does work (using uncacheable mappings), and my patch
  Bjorn> doesn't change that path.

True, I just find this whole thing rather disgusting: different
behavior for read/write vs. mmap; but you're right, it's nothing
new and probably the most pragmatic "solution".

  Bjorn> I bet that ia32 does have page tables for this case, and that
  Bjorn> an attempt to read non-existent physical memory will cause a
  Bjorn> TLB miss from which copy_*_user() can easily recover.

True, but even this doesn't help for MMIO space, where a device may
decode less than a full page.  So you'd still end up accessing address
holes.  Just that x86 returns garbage in that case (I think).

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-19 18:17   ` Pavel Machek
@ 2003-10-23  8:33     ` Martin Pool
  2003-10-23  9:31       ` Zoltan Menyhart
  0 siblings, 1 reply; 23+ messages in thread
From: Martin Pool @ 2003-10-23  8:33 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrew Morton, Bjorn Helgaas, linux-ia64, linux-kernel

On 19 Oct 2003, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
> 
> > > Old behavior:
> > > 
> > >     # dd if=/dev/mem of=/dev/null
> > >     <unrecoverable machine check>
> > 
> > I recently fixed this for ia32 by changing copy_to_user() to not oops if
> > the source address generated a fault.  Similarly copy_from_user() returns
> > an error if the destination generates a fault.
> 
> Are you sure this is not hiding real errors? If you pass wrong
> kernel ptr to copy_*_user, it should oops, not mask error with
> -EFAULT.
> Maybe another copy_user_unsafe should be created?

I think the problem is that reading memory that is mapped but doesn't
physically exist causes a Machine Check Assertion (like an NMI) rather
than a regular fault.

--
Martin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-23  8:33     ` Martin Pool
@ 2003-10-23  9:31       ` Zoltan Menyhart
  0 siblings, 0 replies; 23+ messages in thread
From: Zoltan Menyhart @ 2003-10-23  9:31 UTC (permalink / raw)
  To: linux-ia64, linux-kernel

Some machines may require special memory zones, e.g. for ia64
architectures you need to keep the "minimal state save" area
for the Processor Abstraction Layer in un-cached memory.
If you read the memory "in the usual way" then you access the
memory through the HW caches.
The ia64 architecture forbids to have both cached and un-cached
access to the same memory location (by any of the CPUs, DMAs),
otherwise you create a cache paradox => machine check.
Think twice before even trying a "dd if=/dev/mem"...

Zoltan Menyhart

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] prevent "dd if=/dev/mem" crash
  2003-10-20 17:42       ` [RFC] " Bjorn Helgaas
@ 2003-10-23 21:05         ` Bjorn Helgaas
  0 siblings, 0 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2003-10-23 21:05 UTC (permalink / raw)
  To: David Mosberger; +Cc: linux-ia64, linux-kernel

The drivers/char/mem.c change was accepted, so here's the
ia64-specific part.

===== arch/ia64/kernel/efi.c 1.26 vs edited =====
--- 1.26/arch/ia64/kernel/efi.c	Tue Oct 21 18:52:48 2003
+++ edited/arch/ia64/kernel/efi.c	Thu Oct 23 14:35:08 2003
@@ -711,6 +711,32 @@
 	return 0;
 }
 
+int
+valid_phys_addr_range (unsigned long phys_addr, unsigned long *size)
+{
+	void *efi_map_start, *efi_map_end, *p;
+	efi_memory_desc_t *md;
+	u64 efi_desc_size;
+
+	efi_map_start = __va(ia64_boot_param->efi_memmap);
+	efi_map_end   = efi_map_start + ia64_boot_param->efi_memmap_size;
+	efi_desc_size = ia64_boot_param->efi_memdesc_size;
+
+	for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
+		md = p;
+
+		if (phys_addr - md->phys_addr < (md->num_pages << EFI_PAGE_SHIFT)) {
+			if (!(md->attribute & EFI_MEMORY_WB))
+				return 0;
+
+			if (*size > md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - phys_addr)
+				*size = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - phys_addr;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 static void __exit
 efivars_exit (void)
 {
===== include/asm-ia64/io.h 1.17 vs edited =====
--- 1.17/include/asm-ia64/io.h	Wed Aug 20 00:13:39 2003
+++ edited/include/asm-ia64/io.h	Thu Oct 23 14:32:42 2003
@@ -72,6 +72,9 @@
 	return (void *) (address + PAGE_OFFSET);
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range (unsigned long addr, size_t *count); /* efi.c */
+
 /*
  * The following two macros are deprecated and scheduled for removal.
  * Please use the PCI-DMA interface defined in <asm/pci.h> instead.


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2003-10-23 21:12 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-17 22:10 [RFC] prevent "dd if=/dev/mem" crash Bjorn Helgaas
2003-10-17 22:23 ` Matt Mackall
2003-10-17 22:50 ` Andrew Morton
2003-10-17 23:25   ` Bjorn Helgaas
2003-10-17 23:55     ` Andrew Morton
2003-10-18  0:15       ` William Lee Irwin III
2003-10-18  0:21       ` David Mosberger
2003-10-18  0:49         ` Andrew Morton
2003-10-18  1:31           ` Matt Chapman
2003-10-18  1:41             ` Andrew Morton
2003-10-18  1:48           ` David Mosberger
2003-10-18  2:01             ` Andrew Morton
2003-10-18  2:01             ` Matt Chapman
2003-10-19 11:25           ` Eric W. Biederman
2003-10-19 19:01             ` William Lee Irwin III
2003-10-20 15:17         ` Bjorn Helgaas
2003-10-20 18:48           ` David Mosberger
2003-10-20 15:58         ` fielding PCI bus timeouts - was: " Rich Altmaier
2003-10-20 17:42       ` [RFC] " Bjorn Helgaas
2003-10-23 21:05         ` Bjorn Helgaas
2003-10-19 18:17   ` Pavel Machek
2003-10-23  8:33     ` Martin Pool
2003-10-23  9:31       ` Zoltan Menyhart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).