public inbox for linux-kernel@vger.kernel.org
* Re: __get_free_pages(): is the MEM really mine?
@ 2001-09-27 10:06 Bernd Harries
  2001-09-27 13:00 ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Harries @ 2001-09-27 10:06 UTC
  To: linux-kernel; +Cc: mingo

Thanks, Ingo.

> perfectly legal - but there is no guarantee you will succeed getting two
> nearby 2 MB pages. You will get it if your driver initializes during

Yes, I try until I have enough contiguous chunks (2 in 2.4.x, 32 in 2.2.x) and
free the rest again. But statistically I quite often get 2 contiguous chunks
immediately if X11 is not already running. The box has 256 MB of RAM.
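
As a rough illustration of the "try until contiguous" search described above, the following userspace sketch checks a list of chunk addresses for two 2 MB chunks that sit next to each other. The helper name find_adjacent_pair and the constant CHUNK_SIZE are made up for this example; it is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 0x200000u  /* assumed: one order-9 block of 4 KB pages = 2 MB */

/*
 * Hypothetical helper: look for two chunks that are adjacent in memory.
 * Returns the lower address of the first adjacent pair found, or 0 if
 * no two chunks in the list are adjacent.
 */
uint32_t find_adjacent_pair(const uint32_t *chunks, size_t n)
{
    assert(chunks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            /* chunk j starts exactly where chunk i ends */
            if (i != j && chunks[j] == chunks[i] + CHUNK_SIZE)
                return chunks[i];
        }
    }
    return 0;
}
```

Fed the two kernel virtual addresses from the log further down ($CE800000 and $CE600000), this returns $CE600000, which matches the start of the collected 4 MB buffer reported there.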

> > When I run the user appl. again after short time I mostly get the same
> > chunk of physical memory (virt_to_bus is identical!)
> 
> have you perhaps freed that page?

Yes, as expected.

> printk every occasion of
> allocating/freeing a 2 MB buffer and i'm sure you'll see the problem.
> (Perhaps it's the close() implicitly done by exit() that frees the
> buffer?)

Yes, that is definitely the case and I expect it. 

> >  Now close '/dev/aprsc027' fd = 3 ...

The test program does a close, and on the console the driver then logs the
close and the freeing of both DMA blocks.

But from getting the same physical address again after some time, I tend to
conclude that no one else uses much memory in between. Plus, the first page of
the area stays zero all the time, while the higher pages seem to be used by
someone. I know this is no proof that the 1st page was really not used
otherwise, but... And I know it is also legal for other processes to use the
very same RAM.
But the problem is that the system gets unstable. And it doesn't get unstable
if order count is 0, which I use in minors 0..23 to allocate a small kernel
buffer. Only minors 26 and 27 allocate a 4 MB contiguous buffer in open() and
mmap that buffer to user space, while minors 28 and 29 only allocate a small
buffer to write the FIFOs and mmap the 32 MB PCI area of the card.

The impression I have is that only large allocations behave strangely. But
the instability is not visible immediately. Too bad. Only after some time do I
see strange behaviour of the system. But I think I don't see them if I only
use the functionality of the minors with smaller buffers.

Could my nopage() method be implemented wrongly?
I learned how to implement it from Alessandro Rubini's book:

  chn_ptr = (struct RSC_SOFTC *)vma_ptr->vm_private_data;
  card_ptr = chn_ptr->card_ptr;
  minor = chn_ptr->minor;
  card_chn = minor & APRSC_CARD_CHNS_MASK;

  page_ptr = NOPAGE_SIGBUS;
  
  Iprintf(" address=$%08lX ad - vm_start=$%08lX VMA_OFFSET=$%08lX \n",
    address,
    address - vma_ptr->vm_start,
    vma_ptr->vm_pgoff << PAGE_SHIFT);
  
  offset = address - vma_ptr->vm_start + (vma_ptr->vm_pgoff << PAGE_SHIFT);

  if(card_chn == APRSC_DEV_PER_CARD - 6)  /* image 1, ch 26 of this card */
  {
    if(offset > card_ptr->contig_len0)
    {
      return(page_ptr);
    }
    /*endif()*/
    start = (ULONG)card_ptr->dma_mem0;
  }
  else if(card_chn == APRSC_DEV_PER_CARD - 5)  /* image 2, ch 27 of this card */
  {
    if(offset > card_ptr->contig_len1)
    {
      return(page_ptr);
    }
    /*endif()*/
    start = (ULONG)card_ptr->dma_mem1;
  }
  else
  {
    return(page_ptr);
  }
  /*endif(card_chn == APRSC_DEV_PER_CARD - [(>=7), (<=4)] etc.)*/
  page_ptr = virt_to_page(start + offset);
  Iprintf(" start+off=$%08lX page_ptr=$%8p \n",
    start + offset,
    page_ptr);
  get_page(page_ptr);
  
  return(page_ptr);
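
One detail in the handler above is worth isolating: the offset computation and the bounds check. The sketch below reproduces both in plain userspace C, under the assumption of 4 KB pages; the function names are invented for the example. Note that the handler tests `offset > contig_len`, which lets an offset equal to the buffer length through, whereas a page is only inside the buffer if its offset is strictly less than the length.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed: 4 KB pages, as on i386 */

/* Offset into the buffer for a faulting address, computed as in the handler. */
uint32_t fault_offset(uint32_t address, uint32_t vm_start, uint32_t vm_pgoff)
{
    assert(address >= vm_start);  /* a fault always lies inside its VMA */
    return address - vm_start + (vm_pgoff << PAGE_SHIFT);
}

/* Returns 1 if a page starting at this offset begins inside the buffer. */
int offset_in_buffer(uint32_t offset, uint32_t buf_len)
{
    return offset < buf_len;  /* strict: offset == buf_len is one past the end */
}
```

For the second fault in the log below (address $40134000, vm_start $40132000, vm_pgoff 0) this gives offset $2000, well inside the $00400000-byte buffer. An offset of exactly $00400000 would be rejected here but accepted by the `>` test in the handler.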



Here is the console output of my driver during the test program:

Sep 27 11:43:28 pcma73 kernel: rsc_open() minor=$1B 
Sep 27 11:43:28 pcma73 kernel:  DMA blk 0 at KV:$CE800000 BUS:$0E800000 
Sep 27 11:43:28 pcma73 kernel:  DMA blk 1 at KV:$CE600000 BUS:$0E600000 contig <
Sep 27 11:43:28 pcma73 kernel:  Max Buffer Frag at BUS:$0E600000 len $00400000 bytes
Sep 27 11:43:28 pcma73 kernel:  Collected DMA Buffer1 at KS:$0000CE600000 BUS:$0E600000 len $00400000 bytes
Sep 27 11:43:28 pcma73 kernel: rsc_ioctl()
Sep 27 11:43:28 pcma73 kernel:  RSC_IOC_GET_FIX: copy_to_user() returned $0 
Sep 27 11:43:28 pcma73 kernel: rsc_ioctl()
Sep 27 11:43:28 pcma73 kernel: rsc_mmap()  minor=$1B  offset=$00000000 
Sep 27 11:43:28 pcma73 kernel: rsc_vma_open()
Sep 27 11:43:28 pcma73 kernel: rsc_nopage()
Sep 27 11:43:28 pcma73 kernel:  address=$40132000 ad - vm_start=$00000000 VMA_OFFSET=$00000000
Sep 27 11:43:28 pcma73 kernel:  start+off=$CE600000 page_ptr=$c1398000 
Sep 27 11:43:28 pcma73 kernel: rsc_nopage()
Sep 27 11:43:28 pcma73 kernel:  address=$40134000 ad - vm_start=$00002000 VMA_OFFSET=$00000000
Sep 27 11:43:28 pcma73 kernel:  start+off=$CE602000 page_ptr=$c1398080 
Sep 27 11:43:28 pcma73 kernel: rsc_ioctl()
Sep 27 11:43:28 pcma73 kernel:  RSC_IOC_DMA_OUT
Sep 27 11:43:28 pcma73 kernel: rsc_vma_close()
Sep 27 11:43:28 pcma73 kernel: rsc_close()
Sep 27 11:43:28 pcma73 kernel:  PCIRSC: DMA0CSR=$10 ok.
Sep 27 11:43:28 pcma73 kernel:  PCIRSC: DMA1CSR=$10 ok.
Sep 27 11:43:28 pcma73 kernel:  PCIRSC: PCISR=$0290 ok.
Sep 27 11:43:28 pcma73 kernel:  Free DMA blk 0 at KS:$CE800000 
Sep 27 11:43:28 pcma73 kernel:  Free DMA blk 1 at KS:$CE600000 
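
The addresses in this log can be cross-checked with a little arithmetic. The sketch below is a userspace illustration only. It assumes the default i386 layout (PAGE_OFFSET of 0xC0000000, 4 KB pages) and that virt_to_page() indexes a flat array of page descriptors; the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12           /* assumed: 4 KB pages */
#define PAGE_OFFSET 0xC0000000u  /* assumed: default i386 kernel mapping base */

/* Kernel virtual address -> bus address, under the assumptions above. */
uint32_t virt_to_bus_sketch(uint32_t kv)
{
    return kv - PAGE_OFFSET;
}

/*
 * Bytes per page descriptor, inferred from two logged page pointers
 * and the buffer offsets they belong to.
 */
uint32_t page_struct_size(uint32_t page_ptr_a, uint32_t page_ptr_b,
                          uint32_t offset_a, uint32_t offset_b)
{
    assert(offset_b > offset_a);  /* also keeps the divisor non-zero */
    uint32_t pages_apart = (offset_b - offset_a) >> PAGE_SHIFT;
    return (page_ptr_b - page_ptr_a) / pages_apart;
}
```

With the logged values, KV $CE600000 maps to BUS $0E600000, and the page_ptr values $c1398000 and $c1398080 for offsets 0 and $2000 imply $40 bytes per page descriptor. So the pages handed out by the handler are at least consistent with the buffer addresses printed in open().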


Thank you very much for your help!

-- 
Bernd Harries

bha@gmx.de            http://bharries.freeyellow.com
bharries@web.de       Tel. +49 421 809 7343 priv.  | MSB First!
harries@stn-atlas.de       +49 421 457 3966 offi.  | Linux-m68k
bernd@linux-m68k.org       +49 172 139 6054 handy  | Medusa T40

GMX - The communications platform on the Internet.
http://www.gmx.net


* Re: __get_free_pages(): is the MEM really mine?
@ 2001-10-01 11:33 Bernd Harries
  2001-10-05 12:55 ` Hugh Dickins
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Harries @ 2001-10-01 11:33 UTC
  To: linux-kernel; +Cc: mingo

Ingo Molnar wrote:

> > Is there a guarantee that the n - 1 pages above the 1st one are not
> > donated to other programs while my driver uses them?
> 
> yes. The 2MB block of 512 x 4k pages (we should perhaps call it a 'order 9
> page') is yours.

I think I have to demonstrate to you how my driver behaves in reality.

Unfortunately the driver currently does not allow any open() without at least
a PLX RDK Lite evaluation board. It would be possible to modify it to allow
opens even if there is no card, or to allocate a 4 MB buffer for the minor 31
device as well, which is my dummy test minor that needs no hardware.

Of course you couldn't use the PLX DMA engine then. But you could still mmap
the RAM to user space.

An alternative to sending you a driver (which could make your box temporarily
unstable) is to let you use my Linux box at home. Damn, why didn't I let you
log in from Oldenburg... I forgot about that possibility. I already took a PLX
eval board home with me on Friday, because here I already have the real RSC
cards.

What do you think?


> > I'll move the code to init_module later once it is stable.
> 
> even init_module() can be executed much later: eg. kmod removes the module
> because it's unused, and it's reinserted later. So generally it's really
> unrobust to expect a 9th order allocation to succeed at module_init()
> time.

For our application (a dedicated system) I could even guarantee that.

> the fundamental issue is not the laziness of Linux VM developers. 99.9% of
> all allocations are order 0. 99.9% of the remaining allocations are order
> 1 or 2.

I wonder why only I see problems so far. Maybe it's because I also mmap()
that RAM to user space?



> (later on we could even add support to grow and shrink the size of the
> physical memory pool (within certain boundaries), so it could be sized
> boot-time.)
> 
> would anything like this be useful? Since it's a completely separate pool
> (in fact it won't even show up in the normal memory statistics), it does
> not disturb the existing VM in any way.

It wouldn't even be needed at the moment. The order-9 get_free_pages() does
not explicitly fail, not even during later open()s. If it did, I would simply
add more RAM (well, let the company pay for it). 256 MB are installed, and that
is enough so far.

Later I will load the module explicitly right after boot and then it's
almost sure I will get the RAM.

Well, as I said, get_free_pages doesn't even fail! It just seems to allow
others to use the RAM before I free it again... Or it corrupts some kernel
structs during munmap(), which certainly decrements the usage counter of the
upper pages to 0 again.
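
The suspicion in the last paragraph can be made concrete with a toy model. The sketch below is plain userspace C, not kernel code. It ASSUMES that a multi-page allocation marks only its first page as in use, which the thread does not confirm; all names are hypothetical. Under that assumption, a get_page() in the fault handler followed by the matching put at munmap() would leave the first page alone but hand every touched upper page back to the allocator.

```c
#include <assert.h>

#define NPAGES 4  /* toy block; the real order-9 block has 512 pages */

struct toy_page {
    int count;         /* usage counter */
    int on_free_list;  /* 1 once the page was given back to the allocator */
};

/* ASSUMPTION: only the first page of a multi-page block gets a reference. */
void toy_alloc_block(struct toy_page *pg)
{
    for (int i = 0; i < NPAGES; i++) {
        pg[i].count = (i == 0) ? 1 : 0;
        pg[i].on_free_list = 0;
    }
}

/* Take a reference, as the fault handler does for the page it returns. */
void toy_get_page(struct toy_page *p)
{
    p->count++;
}

/* Drop a reference, as unmapping does; a count of zero frees the page. */
void toy_put_page(struct toy_page *p)
{
    assert(p->count > 0);
    if (--p->count == 0)
        p->on_free_list = 1;
}
```

In this model, faulting in and unmapping page 0 takes its count from 1 to 2 and back to 1, so it stays allocated. Doing the same with page 2 takes its count from 0 to 1 and back to 0, so it ends up free while the driver still believes it owns it. That would fit the observation that the first page stays zero while higher pages fill with foreign data, but it remains a model of a hypothesis.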

For now I'll try to reproduce the instability without using DMA hardware.

Thanks,

-- 
Bernd Harries

* Re: __get_free_pages(): is the MEM really mine?
@ 2001-09-27 14:19 Bernd Harries
  0 siblings, 0 replies; 17+ messages in thread
From: Bernd Harries @ 2001-09-27 14:19 UTC
  To: linux-kernel; +Cc: mingo

Ingo Molnar wrote:

> well - what did you expect to happen? A freed page is going to be reused
> for other purposes. A big 2MB allocation can be reused in part, once
> memory usage grows.

From what I know, I expected exactly that.

> So you should not expect the device to be able to DMA
> into a page that got freed, unpunished.

I am not. The DMA ioctl() finishes before the close(), so the free happens
after the hexdump and the DMA. The buffer is allocated in open(). The fact that
I get the same buffer again next time shows that the free is successful and
effective, right?

Sep 27 11:43:28 pcma73 kernel: rsc_open() minor=$1B 
Sep 27 11:43:28 pcma73 kernel:  DMA blk 0 at KV:$CE800000 BUS:$0E800000 
Sep 27 11:43:28 pcma73 kernel:  DMA blk 1 at KV:$CE600000 BUS:$0E600000 contig <

Sep 27 11:43:28 pcma73 kernel:  Collected DMA Buffer1 at KS:$0000CE600000

Sep 27 11:43:28 pcma73 kernel: rsc_ioctl()
Sep 27 11:43:28 pcma73 kernel:  RSC_IOC_DMA_OUT

Sep 27 11:43:28 pcma73 kernel: rsc_close()

Sep 27 11:43:28 pcma73 kernel:  Free DMA blk 0 at KS:$CE800000 
Sep 27 11:43:28 pcma73 kernel:  Free DMA blk 1 at KS:$CE600000 

> Perhaps i'm misunderstanding the problem.

My problem is that I'm out of ideas. All I can think of is to describe, as
fully as possible, the relevant things that I do and the things that occur.
Maybe someone more experienced will recognize a fundamental flaw in the concept.

> Plus, if you allocate a 2MB
> physically continuous chunk then the likelihood is high that there were
> fragmented pages skipped during the initial search for a 2MB block - so
> you still have a fair likelihood to reallocate it after some time, if
> memory usage is light. But this likelihood nears zero once RAM usage gets
> near 100%.

And I can rely on the fact that all of the 2 MB is contiguous memory without
holes, right? It's completely mine, isn't it?
Or is it perhaps illegal to let the memory usage pump up and down like this?
Would it be better to allocate the memory in init_module() instead of rsc_open()?
Page tables are probably more likely to get corrupted this way than they would
be if I allocated only once. Or do I have to use a spin_lock somewhere in the
nopage method?


From my tests I'm ready to believe the 1st page really _is_ mine, but now
I'm not so sure all the (1 << 9) pages really are.

If I don't access the pages, and just allocate them and free them after some
time, I never see any instabilities. But it seems that as soon as I access pages
above the 1st in the buffer, something gets corrupted. So maybe today it's
only legal to allocate 1 page at a time and I have to do that (1 << 10) times...

Or maybe some of the VM trouble I read about recently would also cover my
problems?

Thanks,

-- 
Bernd Harries

* __get_free_pages(): is the MEM really mine?
@ 2001-09-27  8:56 Bernd Harries
  2001-09-27  9:15 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Bernd Harries @ 2001-09-27  8:56 UTC
  To: linux-kernel

Hi all,

this is my 4th try to post to the list. I didn't see any echo, so 
I try again. Sorry if you did see the msg earlier (yesterday)..

Is __get_free_pages() not enough to allocate memory in the kernel?
Seems like something else is using the same memory. Do I have to lock the
pages I allocated? 

I began with 2.4.6 on a dual CPU x86 box with 256 MB RAM and when I saw
probs I upgraded to 2.4.10. Still unstable.

In a driver I'm writing, in the open() method, I use multiple 
__get_free_pages() to allocate a 4 MB kernel (image)buffer for DMA purposes.
The buffer I get is contiguous (I try until it is) and is freed in
close(). Order count is 9.
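
For reference, the sizes involved work out as follows. This is a small userspace sketch assuming 4 KB pages (as on the x86 box described above); the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed: 4 KB pages */

/* Size in bytes of one allocation of the given order (2^order pages). */
uint32_t order_to_bytes(unsigned order)
{
    assert(order <= 19);  /* keeps the result within 32 bits */
    return PAGE_SIZE << order;
}

/* Number of pages covered by a buffer of the given size. */
uint32_t bytes_to_pages(uint32_t bytes)
{
    return bytes / PAGE_SIZE;
}
```

An order-9 allocation is 512 pages, or 2 MB, so the 4 MB buffer needs two such chunks that happen to be adjacent, and it spans 1024 pages in total.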

When I run the user application again after a short time, I mostly get the
same chunk of physical memory (virt_to_bus is identical!)

For access from userspace I implemented mmap(), which uses the nopage()
method of the VMA. The user program hexdumps 256 bytes at the beginning
of the 4 MB buffer and 256 bytes at offset 0x2000 from the beginning.

After the hexdump from userspace I trigger a DMA engine to copy
0x8000 bytes (4 * the offset of the 2nd hexdump) from my kernel buffer to a
'Local RAM' on a PCI card. (For now I only copy out, to be sure the
buffer is not modified.)

I see mostly zeroes in both of the 2 hexdumps.

If I repeat the user program within seconds, suddenly the 2nd 
256 byte dump starts to change. Sometimes I see filenames of my harddisk
within the hexdump, looking like some directory listing. (e.g.
"/etc/ppp/options" ) Sometimes I see the contents of the printk buffer of
the kernel, sometimes stuff I cannot identify.

The dump from the first page seems to stay zero all the time.
The bus address of the Buffer is the same each time.

I wouldn't care about the changes if the system didn't seem
to become compromised by the tests. Sometimes a dump occurs on the console
when I try to build a new version of my driver module.
Sometimes the shell in which I started the test program gets logged out.

I have a feeling that the effect only occurs if the 2nd dump is beyond the
1st page of my kernel buffer.



Here is the output of my test program:

pcma73:/home/bharries/c/apr/>aprdma_shmw 0x8000 0 1
 open('/dev/aprsc027', ) seems ok! fd = 3 
 Get fix par 
 mmio: start=$DC800000 off=$00000000 len=$00001000 
 mem1: start=$E0000000 off=$00000000 len=$02000000 
 mem2: start=$DA000000 off=$00000000 len=$02000000 

 colcon_offs=$00000000 
 fifo1_offs =$01000000 
 fifo2_offs =$01100000 
 shm_offs   =$01400000 shm_ram_size=$00400000 
 hwcsr_offs =$01A00000 

 Get var par 
 rx_pmd_adr  =$00000000 rx_msg_typ =$00000000 
 tx_pmd_adr  =$00000000 tx_msg_typ =$00000000 
 dma_bus_adr0=$00000000 contig_len0=$00000000 
 dma_bus_adr1=$03800000 contig_len1=$00400000    <-- BUS Addr

 dma0=$00000000 len=$00000000 
 dma1=$40132000 len=$00400000           <-- mmapped User Addr

Diagnose Dump Adr=$40132000

:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                  
  
Diagnose Dump Adr=$40134000

:3D 24 30 30 30 30 30 30 30 30 20 0A 53 65 70 20  =$00000000 *Sep 
:32 36 20 31 32 3A 31 35 3A 30 34 20 70 63 6D 61  26 12:15:04 pcma
:37 33 20 6B 65 72 6E 65 6C 3A 20 20 73 74 61 72  73 kernel:  star
:74 2B 6F 66 66 3D 24 43 33 38 30 30 30 30 30 20  t+off=$C3800000 
:70 61 67 65 5F 70 74 72 3D 24 63 31 30 65 30 30  page_ptr=$c10e00
:30 30 20 0A 53 65 70 20 32 36 20 31 32 3A 31 35  00 *Sep 26 12:15
:3A 30 34 20 70 63 6D 61 37 33 20 6B 65 72 6E 65  :04 pcma73 kerne
:6C 3A 20 20 61 64 64 72 65 73 73 3D 24 34 30 31  l:  address=$401
:33 34 30 30 30 20 61 64 20 2D 20 76 6D 5F 73 74  34000 ad - vm_st
:61 72 74 3D 24 30 30 30 30 32 30 30 30 20 56 4D  art=$00002000 VM
:41 5F 4F 46 46 53 45 54 3D 24 30 30 30 30 30 30  A_OFFSET=$000000
:30 30 20 0A 53 65 70 20 32 36 20 31 32 3A 31 35  00 *Sep 26 12:15
:3A 30 34 20 70 63 6D 61 37 33 20 6B 65 72 6E 65  :04 pcma73 kerne
:6C 3A 20 20 73 74 61 72 74 2B 6F 66 66 3D 24 43  l:  start+off=$C
:33 38 30 32 30 30 30 20 70 61 67 65 5F 70 74 72  3802000 page_ptr
:3D 24 63 31 30 65 30 30 38 30 20 0A 00 00 00 00  =$c10e0080 *    
   Fill DMA ioctl struct 
 Local RAM write triggered. 
 Local RAM write end. 

 Now close '/dev/aprsc027' fd = 3 ...




-- 
Bernd Harries


end of thread, other threads:[~2001-10-05 15:25 UTC | newest]

Thread overview: 17+ messages
2001-09-27 10:06 __get_free_pages(): is the MEM really mine? Bernd Harries
2001-09-27 13:00 ` Ingo Molnar
2001-09-29 17:15   ` Bernd Harries
2001-09-30  7:27     ` Ingo Molnar
2001-09-30 12:59       ` Bernd Harries
2001-10-01  5:55         ` Ingo Molnar
2001-10-05  8:49           ` Bernd Harries
  -- strict thread matches above, loose matches on Subject: below --
2001-10-01 11:33 Bernd Harries
2001-10-05 12:55 ` Hugh Dickins
2001-10-05 13:32   ` Bernd Harries
2001-10-05 15:27     ` Hugh Dickins
2001-09-27 14:19 Bernd Harries
2001-09-27  8:56 Bernd Harries
2001-09-27  9:15 ` Ingo Molnar
2001-09-27  9:20 ` Ingo Molnar
2001-09-27 14:38 ` Eric W. Biederman
2001-09-29  7:32   ` Bernd Harries
