From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hongyang Yang Subject: Re: [PATCH v15 2/7] remus: introduce remus device Date: Fri, 11 Jul 2014 16:18:12 +0800 Message-ID: <53BF9DC4.7000406@cn.fujitsu.com> References: <1404721584-17862-1-git-send-email-yanghy@cn.fujitsu.com> <1404721584-17862-3-git-send-email-yanghy@cn.fujitsu.com> <21437.31542.967350.687109@mariner.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <21437.31542.967350.687109@mariner.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Jackson Cc: laijs@cn.fujitsu.com, wency@cn.fujitsu.com, andrew.cooper3@citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org, rshriram@cs.ubc.ca, ian.campbell@citrix.com List-Id: xen-devel@lists.xenproject.org On 07/10/2014 01:26 AM, Ian Jackson wrote: > Yang Hongyang writes ("[PATCH v15 2/7] remus: introduce remus device"): >> introduce remus device, an abstract layer of remus devices(nic, disk, >> etc).It provides the following APIs for libxl: >> >libxl__remus_device_setup >> setup remus devices, like attach qdisc, enable disk buffering, etc >> >libxl__remus_device_teardown >> teardown devices >> >libxl__remus_device_postsuspend >> >libxl__remus_device_preresume >> >libxl__remus_device_commit >> above three are for checkpoint. > > Sorry to mention this now, but I think it would be clearer if all > these libxl__remus_device_* functions which manipulate _all_ the > devices for a domain used the plural of "device", ie > libxl__remus_devices_setup, etc. > >> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c >> index 39f1c28..bcbd02b 100644 >> --- a/tools/libxl/libxl.c >> +++ b/tools/libxl/libxl.c >> @@ -733,6 +733,31 @@ out: >> static void remus_failover_cb(libxl__egc *egc, >> libxl__domain_suspend_state *dss, int rc); >> >> +static void libxl__remus_setup_failed(libxl__egc *egc, >> + libxl__remus_state *rs, int rc) >> +{ >> + STATE_AO_GC(rs->ao); >> + libxl__ao_complete(egc, ao, rc); >> +} >> + >> +static void libxl__remus_setup_done(libxl__egc *egc, >> + libxl__remus_state *rs, int rc) > > This callback appears after the code which sets it up, which naturally > executes first. Things should be in chronological order, as I said. > >> +{ >> + libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs); >> + STATE_AO_GC(rs->ao); >> + >> + if (!rc) { >> + libxl__domain_suspend(egc, dss); >> + return; >> + } >> + >> + LOG(ERROR, "Remus: failed to setup device for guest with domid %u", >> + dss->domid); > > This log message should report the rc. Ok, thanks. > >> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c >> index 83eb29a..8cc90dc 100644 >> --- a/tools/libxl/libxl_dom.c >> +++ b/tools/libxl/libxl_dom.c >> @@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data) >> domain_suspend_callback_common(egc, dss); >> } >> >> +static void remus_device_postsuspend_cb(libxl__egc *egc, >> + libxl__remus_state *rs, int rc) >> +{ > > I can't quite tell, but isn't this code not in chronological order ? > >> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h >> index f8441f6..8ef20a0 100644 >> --- a/tools/libxl/libxl_internal.h >> +++ b/tools/libxl/libxl_internal.h > ... >> +/* >> + * This structure is for remus device layer, it records remus devices >> + * that have been setuped. >> + */ > > Still lots of uses of "setuped". Correct would be "devices that have > been set up". > >> +/*----- checkpointing APIs -----*/ >> + >> +/* callbacks */ >> + >> +static void device_common_cb(libxl__egc *egc, >> + libxl__remus_device *dev, >> + int rc) >> +{ > > I asked for things to be in chronological order in the file. I think > this function therefore ought to be below these checkpoint functions. > > Also its name isn't very suggestive. Perhaps it should have > "checkpoint" in the name ? Yes, I shall put "checkpoint" in the name. > >> +/* API implementations */ >> + >> +void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs) > ... >> +void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs) > ... >> +void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs) > > These three functions are nearly identical. They should be combined > somehow. Perhaps there should be a single function which takes the > offset of the per-device operation vtable member as a parameter. Or > perhaps the three functions should be generated by a macro. Currently > I'm leaning towards the former. > >> + if(rds->num_set_up == 0) > ^ > Coding style. > >> + for (i = 0; i < rds->num_set_up; i++) { >> + dev = rds->dev[i]; >> + dev->callback = device_common_cb; >> + if (dev->ops->commit) { >> + dev->ops->commit(dev); >> + } else { >> + rds->num_devices++; >> + if (rds->num_devices == rds->num_set_up) >> + rs->callback(egc, rs, rs->saved_rc); > > Why not have this call device_common_cb rather than open-coding it ? > >> +/*----- main flow of control -----*/ > > This is for setup, right ? Or teardown ? Or both ? setup and teardown, will change this comment. > >> +/* callbacks */ >> + >> +static void device_setup_cb(libxl__egc *egc, >> + libxl__remus_device *dev, >> + int rc); >> +static void device_match_cb(libxl__egc *egc, >> + libxl__remus_device *dev, >> + int rc) >> +{ > > This is still not in chronological order. You must put callbacks > after the things that set them up, since obviously the callback > happens after it is set up. > > Seeing things in the wrong order makes it hard to see what's going on. > In the interests of trying to reduce the number of iterations of this > series I'm going to do a kind of peephole review of what I see here; > you should bear in mind that I'm not always trying to understand what > the purpose of these functions is so I may have structural comments on > your next version. > >> + libxl__remus_device_state *const rds = dev->rds; >> + libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state); >> + >> + STATE_AO_GC(rs->ao); >> + >> + if (rc) { >> + if (++dev->ops_index >= ARRAY_SIZE(dev_ops) || > > This use of ++ looks confusing to me. > >> + rc != ERROR_REMUS_DEVOPS_NOT_MATCH) { >> + /* the device can not be matched */ >> + rds->num_devices++; >> + rs->saved_rc = ERROR_FAIL; > > Please can we not return more ERROR_FAIL. Instead, why not return > ERROR_REMUS_DEVICE_NOT_SUPPORTED as I suggested before ? > >> + rds->num_devices++; >> + /* >> + * we add devices that have been setuped to the array no matter >> + * the setup process succeed or failed because we need to ensure >> + * the device been teardown while setup failed. If any of the >> + * device setup failed, we will quit remus, but before we exit, >> + * we will teardown the devices that have been added to **dev >> + */ >> + rds->dev[rds->num_set_up++] = dev; >> + if (rc) { >> + /* setup failed */ >> + rs->saved_rc = ERROR_FAIL; >> + } > > This doesn't look right. If the setup fails, presumably we shouldn't > put the device in the array ? If we do it will presumably be torn > down later. > >> +static void destroy_device_ops(libxl__remus_state *rs); >> +static void device_teardown_cb(libxl__egc *egc, >> + libxl__remus_device *dev, >> + int rc) >> +{ >> + libxl__remus_device_state *const rds = dev->rds; >> + libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state); >> + >> + STATE_AO_GC(rs->ao); >> + >> + /* ignore teardown errors to teardown as many devs as possible*/ >> + rds->num_set_up--; > > We should at least preserve the rc so that we don't pretend that > everything worked if it didn't. > > You may need to invent a new teardown rc variable, and/or add logging, > so that you get the right error code out, and the right log messages, > if a setup fails and causes a teardown which also fails. > >> +static int init_device_ops(libxl__remus_state *rs) >> +{ >> + int i, rc; >> + const libxl__remus_device_ops *ops; >> + >> + for (i = 0; i < ARRAY_SIZE(dev_ops); i++) { >> + ops = dev_ops[i]; >> + if (ops->init(ops, rs)) { >> + rc = ERROR_FAIL; > > In general, you should preserve the rc, not turn everything into > ERROR_FAIL. > > I still don't understand why libxl__remus_device_state is not part of > libxl__remus_state. > > I wonder if you need a different name for "the abstract thing which is > represented by a libxl__remus_device_ops". Since each ops only deals > with one kind (but perhaps a kind may have several ops), I suggest > calling them "subkind"s or "devsubkinds" or something. At the moment > you seem to be calling them "ops". > > As a result I think this function is poorly named. It doesn't > initialise any ops. It initialises each subkind. Perhaps it should > be called init_device_subkind. > > You might consider whether libxl__remus_device_ops should be > libxl__remus_subkind_ops perhaps or libxl__remus_devsubkind_ops or > libxl__remus_device_subkind_ops. > >> +void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs) >> +{ >> + STATE_AO_GC(rs->ao); >> + >> + /* Convenience aliases */ >> + libxl__remus_device_state *const rds = &rs->dev_state; >> + >> + if (ARRAY_SIZE(dev_ops) == 0) >> + goto out; > > Why is this special case necessary ? This means there're no ops present, the following process of setup are unnecessary, so we just goto out and return. > >> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl >> index de25f42..e56567f 100644 >> --- a/tools/libxl/libxl_types.idl >> +++ b/tools/libxl/libxl_types.idl >> @@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [ >> (-12, "OSEVENT_REG_FAIL"), >> (-13, "BUFFERFULL"), >> (-14, "UNKNOWN_CHILD"), >> + (-15, "REMUS_DEVOPS_NOT_MATCH"), > > I don't object to this name for this exact case and it's fine to have > a specific error code. > > But if you are going to make the whole machinery return > ERROR_REMUS_DEVICE_NOT_SUPPORTED if no subkind matches, then you can > reuse that error code for the subkind methods. > > Thanks, > Ian. > . > -- Thanks, Yang.