From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hongyang Yang Subject: Re: [PATCH v10 3/5] remus: introduce remus device Date: Fri, 6 Jun 2014 09:54:03 +0800 Message-ID: <53911F3B.9060003@cn.fujitsu.com> References: <1401932069-16460-1-git-send-email-yanghy@cn.fujitsu.com> <1401932069-16460-4-git-send-email-yanghy@cn.fujitsu.com> <21392.41897.449550.60294@mariner.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <21392.41897.449550.60294@mariner.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Jackson Cc: ian.campbell@citrix.com, wency@cn.fujitsu.com, stefano.stabellini@eu.citrix.com, andrew.cooper3@citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org, rshriram@cs.ubc.ca, roger.pau@citrix.com, laijs@cn.fujitsu.com List-Id: xen-devel@lists.xenproject.org On 06/06/2014 01:06 AM, Ian Jackson wrote: > Yang Hongyang writes ("[PATCH v10 3/5] remus: introduce remus device"): >> introduce remus device, an abstract layer of remus devices(nic, disk, >> etc).It provide the following APIs for libxl: > > Thanks. > >> >libxl__remus_device_setup >> setup remus devices, like attach qdisc, enable disk buffering, etc >> >libxl__remus_device_teardown >> teardown devices >> >libxl__remus_device_postsuspend >> >libxl__remus_device_preresume >> >libxl__remus_device_commit >> above three are for checkpoint. > > I started reviewing this patch by reading the commit message and the > changes to libxl_internal.h. > > As far as I can tell what's going on, it mostly looks plausible. But > the new parts of libxl_internal.h are missing important information > about the new interfaces. > > I'd like the documentation in libxl_internal.h to be sufficient to > read, understand, and check the code on one side of those interfaces, > without having to go and read the code on the other side. Thanks, I will improve the doc in the next version. > > I'll go into more detail about this below. > > Because of this, I haven't read the actual implementation code yet. > >> through remus device layer, the remus execution flow will be like >> this: >> xl remus -> remus device setup >> |-> remus checkpoint(postsuspend, commit, preresume) >> ... >> |-> remus device teardown,failover or abort > > This diagram could usefully be transferred into a comment in the code, > probably in libxl_internal.h. > >> the remus device layer provide an interface >> libxl__remus_device_ops >> which a remus device must implement.the whole remus structure: >> |remus| >> | >> |remus device| >> | >> |nic| |drbd disks| |qemu disks| ... >> a device(nic, drbd disks, qemu disks, etc) must implement >> libxl__remus_device_ops to support remus. > > Again, this diagram too. > >> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h >> index 2b46121..20601b2 100644 >> --- a/tools/libxl/libxl_internal.h >> +++ b/tools/libxl/libxl_internal.h > ... >> +/*----- remus device related state structure -----*/ >> >> +typedef enum libxl__remus_device_kind { >> + LIBXL__REMUS_DEVICE_NIC, >> + LIBXL__REMUS_DEVICE_DISK, >> +} libxl__remus_device_kind; >> + >> +typedef struct libxl__remus_state libxl__remus_state; >> +typedef struct libxl__remus_device libxl__remus_device; >> +typedef struct libxl__remus_device_state libxl__remus_device_state; >> +typedef struct libxl__remus_device_ops libxl__remus_device_ops; > > All fine. > >> +struct libxl__remus_device_ops { > > Who produces and consumes this ? I _think_ from your diagram above > that this is produced by a device type and consumed by the main remus > code. Is that right ? I think this should be documented. > >> + /* >> + * init device ops private data, etc. must implenment >> + */ > > What does "must implement" mean ? Do you mean that some of these > function pointers can be 0 ? If so, you should say this explicitly > somewhere and say what 0 means. (No-op?) > >> + int (*init)(libxl__remus_device_ops *self, >> + libxl__remus_state *rs); >> + /* >> + * free device ops private data, etc. must implenment >> + */ >> + void (*destroy)(libxl__remus_device_ops *self); >> + /* device ops's private data */ >> + void *data; > > In libxl we often embed structs inside other structs, rather than > chaining them with void*s. Is tehre some reason why that's not a good > idea here ? > > Also, I assume this should be "const void*", since I think this can > only refer to static data ? > >> + /* >> + * checkpoint callbacks, async ops. may not implemented >> + */ > > If these are async ops, what happens when they complete ? > > I see a libxl__remus_device_callback typedef below, and a callback > field in libxl__remus_device. Is that it ? > > If so this should be written down. > >> + /* >> + * check whether device ops match the device, async op. must implement >> + */ >> + void (*match)(libxl__remus_device_ops *self, >> + libxl__remus_device *dev); > > I don't understand the purpose of this, but perhaps this is because I > don't understand the lifecycle of a libxl__remus_device and what > fields in it are for which bits of code. > >> + /* >> + * setup the remus device, async op. must implement >> + */ >> + void (*setup)(libxl__remus_device *dev); >> + >> + /* >> + * teardown the remus device, async op. must implement >> + */ >> + void (*teardown)(libxl__remus_device *dev); > > When we say "setup" and "teardown" we refer to the actual device, not > merely the libxl data structures, which are setup and torn down with > "init" and "destroy" ? This could be clearer. > >> +struct libxl__remus_device_state { >> + libxl__ao *ao; >> + libxl__egc *egc; >> + >> + /* devices that have been setuped */ >> + libxl__remus_device **dev; >> + >> + int num_nics; >> + int num_disks; >> + >> + /* for counting devices that have been handled */ >> + int num_devices; >> + /* for counting devices that matched and setuped */ >> + int num_setuped; >> +}; > > We need to know who owns which fields in this structure. Or who is > supposed to set them. Also, I'm not sure what this structure is for. > It appears only to be in libxl__remus_state. What is the reason for > it being separated out ? > >> +struct libxl__remus_device { > > Again, we need to know who owns/sets which fields in this structure. > (If the struct is shared between different layers of code, it is > normally easiest to do this by reordering the fields into groups > according to their ownership.) > > If the whole struct is owned by the same set of code, then we need to > know which set of code that is. Perhaps by specifying a pattern on > the function name (libxl__remus_device_*?) > >> + int devid; >> + /* libxl__device_* which this remus device related to */ >> + const void *backend_dev; >> + libxl__remus_device_kind kind; >> + int ops_index; > > What is ops_index ? > >> + libxl__remus_device_ops *ops; > > I think these ops structs are vtables so should be const. > >> + /* for calling scripts */ >> + libxl__async_exec_state aes; > > Conversely, I'm not sure that particular comments add anything. > libxl__async_exec_state is always for executing scripts. > >> + /* for async func calls */ >> + libxl__ev_child child; > > Is this an ownership comment ? If so are you sure that the ownership > isn't "this is owned by device-specific ops methods" ? (It is obvious > that only an /asynchronous/ device-specific ops method would be able > to make use of it.) > >> +typedef void libxl__remus_callback(libxl__egc *, >> + libxl__remus_state *, int rc); >> + >> +struct libxl__remus_state { >> + libxl__ao *ao; >> + uint32_t domid; >> + libxl__remus_callback *callback; > > You should say that these must be set by the caller. (Looking at the > API I assume that's the case.) > >> + /* private */ >> + int saved_rc; >> + /* context containing device related stuff */ >> + libxl__remus_device_state dev_state; >> + >> + libxl__ev_time timeout; /* used for checkpoint */ >> +}; >> + >> +_hidden void libxl__remus_device_setup(libxl__egc *egc, >> + libxl__remus_state *rs); >> +_hidden void libxl__remus_device_teardown(libxl__egc *egc, >> + libxl__remus_state *rs); >> +_hidden void libxl__remus_device_postsuspend(libxl__egc *egc, >> + libxl__remus_state *rs); >> +_hidden void libxl__remus_device_preresume(libxl__egc *egc, >> + libxl__remus_state *rs); >> +_hidden void libxl__remus_device_commit(libxl__egc *egc, >> + libxl__remus_state *rs); >> _hidden int libxl__netbuffer_enabled(libxl__gc *gc); > > These functions all call rs->callback() when done ? If so there > should be a comment to say so. > >> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl >> index 52f1aa9..4278a6b 100644 >> --- a/tools/libxl/libxl_types.idl >> +++ b/tools/libxl/libxl_types.idl >> @@ -43,6 +43,7 @@ libxl_error = Enumeration("error", [ >> (-12, "OSEVENT_REG_FAIL"), >> (-13, "BUFFERFULL"), >> (-14, "UNKNOWN_CHILD"), >> + (-15, "NOT_MATCH"), >> ], value_namespace = "") > > It is good that you introduce a new error code for your new error > case. But I think it needs to have a better name. What does it > mean ? > > In fact, I grepped the whole of this patch for NOT_MATCH and this > new error status is checked for somewhere but never generated! > > If it forms a part of the new internal API then that should be > documented. > >> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl >> index 745e2ac..36bae04 100755 >> --- a/tools/libxl/libxl_save_msgs_gen.pl >> +++ b/tools/libxl/libxl_save_msgs_gen.pl >> @@ -24,7 +24,7 @@ our @msgs = ( >> 'unsigned long', 'done', >> 'unsigned long', 'total'] ], >> [ 3, 'scxA', "suspend", [] ], >> - [ 4, 'scxW', "postcopy", [] ], >> + [ 4, 'scxA', "postcopy", [] ], >> [ 5, 'scxA', "checkpoint", [] ], >> [ 6, 'scxA', "switch_qemu_logdirty", [qw(int domid >> unsigned enable)] ], > > I think this change (and its consequential changes to the handwritten > parts) should be split out into a "no functional change" pre-patch. > > Thanks, > Ian. > . > -- Thanks, Yang.