public inbox for linux-kernel@vger.kernel.org
* [PATCH] Documenation/vm/numa
From: Mel @ 2002-04-19  3:56 UTC
  To: linux-kernel

Below is a small extension of the numa file in the vm Documentation
directory which tries to give a brief explanation of the pg_data_t and
zone_t structs. The patch is against 2.4.19pre7, but I think it will apply
to any 2.4.x or 2.5.x kernel. It changes documentation only, no code.
Comments, corrections and opinions are very welcome.

			Mel



--- linux-2.4.19pre7.orig/Documentation/vm/numa	Fri Aug  4 19:23:37 2000
+++ linux-2.4.19pre7.mel/Documentation/vm/numa	Fri Apr 19 04:46:23 2002
@@ -1,4 +1,5 @@
 Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>
+Appended Apr 2002 by Mel Gorman   <melcsn.ul.ie>

 The intent of this file is to have an uptodate, running commentary
 from different people about NUMA specific code in the Linux vm.
@@ -39,3 +40,123 @@
 NUMA port achieves more maturity. The call alloc_pages_node has been
 added, so that drivers can make the call and not worry about whether
 it is running on a NUMA or UMA platform.
+
+
+Nodes
+=====
+
+A node is described by the pg_data_t struct. Each node can have one or
+more of the three zone types ZONE_HIGHMEM, ZONE_NORMAL and ZONE_DMA, but
+only one zone of each type. It is the responsibility of the buddy
+allocator to make sure pages are allocated from the proper nodes.
+
+It is declared as
+
+typedef struct pglist_data {
+        zone_t node_zones[MAX_NR_ZONES];
+        zonelist_t node_zonelists[GFP_ZONEMASK+1];
+        int nr_zones;
+        struct page *node_mem_map;
+        unsigned long *valid_addr_bitmap;
+        struct bootmem_data *bdata;
+        unsigned long node_start_paddr;
+        unsigned long node_start_mapnr;
+        unsigned long node_size;
+        int node_id;
+        struct pglist_data *node_next;
+} pg_data_t;
+
+ node_zones        The zones for this node. Currently ZONE_HIGHMEM,
+                   ZONE_NORMAL, ZONE_DMA.
+
+ node_zonelists    This is the order of zones that allocations are
+                   preferred from. build_zonelists() in page_alloc.c does
+                   the work when called by free_area_init_core(). So a failed
+                   allocation from ZONE_HIGHMEM may fall back to ZONE_NORMAL
+                   or back to ZONE_DMA. See the buddy algorithm for details.
+
+ nr_zones          Number of zones in this node, between 1 and 3
+
+ node_mem_map      The first page of the physical block this node represents
+
+ valid_addr_bitmap Not positive, but this looks like a bitmap of the holes in memory
+
+ bdata             Used only when starting up the node. Mainly confined
+                   to bootmem.c
+
+ node_start_paddr  The starting physical address of the node
+
+ node_start_mapnr  This appears to be a "nice" place to put the node inside
+                   the larger mem_map. It's set during
+                   free_area_init_core(). Presumably there is some
+                   architecture dependent way of defining nice.
+
+ node_size         The total number of pages in this node
+
+ node_id           The ID of the node, starts at 0
+
+ node_next         Pointer to next node in a linear list. NULL terminated
+
+Zones
+=====
+  Each pg_data_t node will be aware of one or more zones that it can
+allocate pages from. The possible zones are ZONE_HIGHMEM, ZONE_NORMAL
+and ZONE_DMA. There can only be one zone of each type per pg_data_t.
+Each zone is suitable for a particular use, but using the wrong zone does
+not necessarily carry a penalty the way allocating from the wrong
+pg_data_t does.
+
+  typedef struct zone_struct {
+          /*
+           * Commonly accessed fields:
+           */
+          spinlock_t              lock;
+          unsigned long           free_pages;
+          unsigned long           pages_min, pages_low, pages_high;
+          int                     need_balance;
+
+          /*
+           * free areas of different sizes
+           */
+          free_area_t             free_area[MAX_ORDER];
+
+          /*
+           * Discontig memory support fields.
+           */
+          struct pglist_data      *zone_pgdat;
+          struct page             *zone_mem_map;
+          unsigned long           zone_start_paddr;
+          unsigned long           zone_start_mapnr;
+
+          /*
+           * rarely used fields:
+           */
+          char                    *name;
+          unsigned long           size;
+  } zone_t;
+
+ lock             A lock to protect the zone
+ free_pages       Total number of free pages in the zone
+ pages_min        When reached, the allocator will do the kswapd work in
+                  a synchronous fashion
+ pages_low        When pages_low is reached, kswapd is woken up
+ pages_high       Once kswapd is woken, it won't sleep until pages_high pages
+                  are free
+ need_balance     A flag kswapd uses to determine if it needs to balance
+ free_area        Used by the buddy algorithm
+ zone_pgdat       Points to the parent pg_data_t
+ zone_mem_map     The first page in mem_map this zone refers to
+ zone_start_paddr Starting physical address of the zone
+ zone_start_mapnr Index of the zone's first page inside mem_map
+ name             The string name of the zone
+ size             Number of pages in the zone
+
+Relationship
+============
+
+        pg_data_t ------->  pg_data_t ------->     pg_data_t ------->NULL
+           / | \               / | \                 / | \
+      -----  |  -----     -----  |  -----       -----  |  -----
+      |      |      |     |      |      |       |      |      |
+  zone_t  zone_t  zone_t zone_t zone_t zone_t zone_t zone_t zone_t
+



