From: Hongyang Yang <yanghy@cn.fujitsu.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: laijs@cn.fujitsu.com, wency@cn.fujitsu.com,
andrew.cooper3@citrix.com, yunhong.jiang@intel.com,
eddie.dong@intel.com, xen-devel@lists.xen.org,
rshriram@cs.ubc.ca, ian.campbell@citrix.com
Subject: Re: [PATCH v15 2/7] remus: introduce remus device
Date: Thu, 10 Jul 2014 10:18:55 +0800 [thread overview]
Message-ID: <53BDF80F.7030307@cn.fujitsu.com> (raw)
In-Reply-To: <21437.31542.967350.687109@mariner.uk.xensource.com>
On 07/10/2014 01:26 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v15 2/7] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provides the following APIs for libxl:
>> >libxl__remus_device_setup
>> setup remus devices, like attach qdisc, enable disk buffering, etc
>> >libxl__remus_device_teardown
>> teardown devices
>> >libxl__remus_device_postsuspend
>> >libxl__remus_device_preresume
>> >libxl__remus_device_commit
>> above three are for checkpoint.
>
> Sorry to mention this now, but I think it would be clearer if all
> these libxl__remus_device_* functions which manipulate _all_ the
> devices for a domain used the plural of "device", ie
> libxl__remus_devices_setup, etc.
Ok
>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 39f1c28..bcbd02b 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -733,6 +733,31 @@ out:
>> static void remus_failover_cb(libxl__egc *egc,
>> libxl__domain_suspend_state *dss, int rc);
>>
>> +static void libxl__remus_setup_failed(libxl__egc *egc,
>> + libxl__remus_state *rs, int rc)
>> +{
>> + STATE_AO_GC(rs->ao);
>> + libxl__ao_complete(egc, ao, rc);
>> +}
>> +
>> +static void libxl__remus_setup_done(libxl__egc *egc,
>> + libxl__remus_state *rs, int rc)
>
> This callback appears after the code which sets it up, which naturally
> executes first. Things should be in chronological order, as I said.
>
>> +{
>> + libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
>> + STATE_AO_GC(rs->ao);
>> +
>> + if (!rc) {
>> + libxl__domain_suspend(egc, dss);
>> + return;
>> + }
>> +
>> + LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
>> + dss->domid);
>
> This log message should report the rc.
>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 83eb29a..8cc90dc 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data)
>> domain_suspend_callback_common(egc, dss);
>> }
>>
>> +static void remus_device_postsuspend_cb(libxl__egc *egc,
>> + libxl__remus_state *rs, int rc)
>> +{
>
> I can't quite tell, but isn't this code not in chronological order ?
>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index f8441f6..8ef20a0 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
> ...
>> +/*
>> + * This structure is for remus device layer, it records remus devices
>> + * that have been setuped.
>> + */
>
> Still lots of uses of "setuped". Correct would be "devices that have
> been set up".
>
>> +/*----- checkpointing APIs -----*/
>> +
>> +/* callbacks */
>> +
>> +static void device_common_cb(libxl__egc *egc,
>> + libxl__remus_device *dev,
>> + int rc)
>> +{
>
> I asked for things to be in chronological order in the file. I think
> this function therefore ought to be below these checkpoint functions.
I used to thought that the definitions of callbacks should be before
the functions that set them, sorry for the misunderstanding, I will
correct them.
>
> Also its name isn't very suggestive. Perhaps it should have
> "checkpoint" in the name ?
>
>> +/* API implementations */
>> +
>> +void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
> ...
>> +void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
> ...
>> +void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)
>
> These three functions are nearly identical. They should be combined
> somehow. Perhaps there should be a single function which takes the
> offset of the per-device operation vtable member as a parameter. Or
> perhaps the three functions should be generated by a macro. Currently
> I'm leaning towards the former.
Good suggestions
>
>> + if(rds->num_set_up == 0)
> ^
> Coding style.
>
>> + for (i = 0; i < rds->num_set_up; i++) {
>> + dev = rds->dev[i];
>> + dev->callback = device_common_cb;
>> + if (dev->ops->commit) {
>> + dev->ops->commit(dev);
>> + } else {
>> + rds->num_devices++;
>> + if (rds->num_devices == rds->num_set_up)
>> + rs->callback(egc, rs, rs->saved_rc);
>
> Why not have this call device_common_cb rather than open-coding it ?
Sorry for the poor named function, device_common_cb should be used at
checkpoint. will add checkpoint to the function name.
>
>> +/*----- main flow of control -----*/
>
> This is for setup, right ? Or teardown ? Or both ?
>
>> +/* callbacks */
>> +
>> +static void device_setup_cb(libxl__egc *egc,
>> + libxl__remus_device *dev,
>> + int rc);
>> +static void device_match_cb(libxl__egc *egc,
>> + libxl__remus_device *dev,
>> + int rc)
>> +{
>
> This is still not in chronological order. You must put callbacks
> after the things that set them up, since obviously the callback
> happens after it is set up.
>
> Seeing things in the wrong order makes it hard to see what's going on.
> In the interests of trying to reduce the number of iterations of this
> series I'm going to do a kind of peephole review of what I see here;
> you should bear in mind that I'm not always trying to understand what
> the purpose of these functions is so I may have structural comments on
> your next version.
Ok, just don't quite follow the meaning of chronological order, but it's
more clearer for me now.
>
>> + libxl__remus_device_state *const rds = dev->rds;
>> + libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
>> +
>> + STATE_AO_GC(rs->ao);
>> +
>> + if (rc) {
>> + if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||
>
> This use of ++ looks confusing to me.
>
>> + rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
>> + /* the device can not be matched */
>> + rds->num_devices++;
>> + rs->saved_rc = ERROR_FAIL;
>
> Please can we not return more ERROR_FAIL. Instead, why not return
> ERROR_REMUS_DEVICE_NOT_SUPPORTED as I suggested before ?
I will add this error code.
>
>> + rds->num_devices++;
>> + /*
>> + * we add devices that have been setuped to the array no matter
>> + * the setup process succeed or failed because we need to ensure
>> + * the device been teardown while setup failed. If any of the
>> + * device setup failed, we will quit remus, but before we exit,
>> + * we will teardown the devices that have been added to **dev
>> + */
>> + rds->dev[rds->num_set_up++] = dev;
>> + if (rc) {
>> + /* setup failed */
>> + rs->saved_rc = ERROR_FAIL;
>> + }
>
> This doesn't look right. If the setup fails, presumably we shouldn't
> put the device in the array ? If we do it will presumably be torn
> down later.
the netbuf script was designed as below:
1. when setup failed, the script won't teardown the device itself.
2. the teardown op is ok to be excute many times.
In the remus device layer, if one device setup failed(whether script
exit or the script get killed or something), we will quit
remus, but before that, we will teardown all device that has been set
up, whether it's correctly set up or not. So if we don't put the
device in the array, we will get a leaked device, that has not been
teardown.
>
>> +static void destroy_device_ops(libxl__remus_state *rs);
>> +static void device_teardown_cb(libxl__egc *egc,
>> + libxl__remus_device *dev,
>> + int rc)
>> +{
>> + libxl__remus_device_state *const rds = dev->rds;
>> + libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
>> +
>> + STATE_AO_GC(rs->ao);
>> +
>> + /* ignore teardown errors to teardown as many devs as possible*/
>> + rds->num_set_up--;
>
> We should at least preserve the rc so that we don't pretend that
> everything worked if it didn't.
>
> You may need to invent a new teardown rc variable, and/or add logging,
> so that you get the right error code out, and the right log messages,
> if a setup fails and causes a teardown which also fails.
Ok, thanks.
>
>> +static int init_device_ops(libxl__remus_state *rs)
>> +{
>> + int i, rc;
>> + const libxl__remus_device_ops *ops;
>> +
>> + for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>> + ops = dev_ops[i];
>> + if (ops->init(ops, rs)) {
>> + rc = ERROR_FAIL;
>
> In general, you should preserve the rc, not turn everything into
> ERROR_FAIL.
Ok, thanks.
>
> I still don't understand why libxl__remus_device_state is not part of
> libxl__remus_state.
libxl__remus_device_state is part of libxl__remus_state:
+struct libxl__remus_state {
+ /* must set by caller of libxl__remus_device_(setup|teardown) */
+ libxl__ao *ao;
+ uint32_t domid;
+ libxl__remus_callback *callback;
+
+ /* private */
+ int saved_rc;
+ /* context containing device related stuff */
+ libxl__remus_device_state dev_state;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ libxl__ev_time timeout; /* used for checkpoint */
+};
>
> I wonder if you need a different name for "the abstract thing which is
> represented by a libxl__remus_device_ops". Since each ops only deals
> with one kind (but perhaps a kind may have several ops), I suggest
> calling them "subkind"s or "devsubkinds" or something. At the moment
> you seem to be calling them "ops".
>
> As a result I think this function is poorly named. It doesn't
> initialise any ops. It initialises each subkind. Perhaps it should
> be called init_device_subkind.
>
> You might consider whether libxl__remus_device_ops should be
> libxl__remus_subkind_ops perhaps or libxl__remus_devsubkind_ops or
> libxl__remus_device_subkind_ops.
Ok, thanks.
>
>> +void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
>> +{
>> + STATE_AO_GC(rs->ao);
>> +
>> + /* Convenience aliases */
>> + libxl__remus_device_state *const rds = &rs->dev_state;
>> +
>> + if (ARRAY_SIZE(dev_ops) == 0)
>> + goto out;
>
> Why is this special case necessary ?
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index de25f42..e56567f 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [
>> (-12, "OSEVENT_REG_FAIL"),
>> (-13, "BUFFERFULL"),
>> (-14, "UNKNOWN_CHILD"),
>> + (-15, "REMUS_DEVOPS_NOT_MATCH"),
>
> I don't object to this name for this exact case and it's fine to have
> a specific error code.
>
> But if you are going to make the whole machinery return
> ERROR_REMUS_DEVICE_NOT_SUPPORTED if no subkind matches, then you can
> reuse that error code for the subkind methods.
Will add ERROR_REMUS_DEVICE_NOT_SUPPORTED error code.
>
> Thanks,
> Ian.
> .
>
--
Thanks,
Yang.
next prev parent reply other threads:[~2014-07-10 2:18 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-07 8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
2014-07-07 8:26 ` [PATCH v15 1/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
2014-07-07 8:26 ` [PATCH v15 2/7] remus: introduce remus device Yang Hongyang
2014-07-09 17:26 ` Ian Jackson
2014-07-10 2:18 ` Hongyang Yang [this message]
2014-07-10 17:34 ` Ian Jackson
2014-07-11 2:12 ` Hongyang Yang
2014-07-11 6:16 ` Hongyang Yang
2014-07-11 8:18 ` Hongyang Yang
2014-07-07 8:26 ` [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
2014-07-09 23:15 ` Ian Jackson
2014-07-10 3:03 ` Hongyang Yang
2014-07-10 17:43 ` Ian Jackson
2014-07-11 2:18 ` Hongyang Yang
2014-07-10 12:53 ` Ian Campbell
2014-07-07 8:26 ` [PATCH v15 4/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
2014-07-07 8:26 ` [PATCH v15 5/7] libxl: network buffering cmdline switch Yang Hongyang
2014-07-07 8:26 ` [PATCH v15 6/7] libxl: disk " Yang Hongyang
2014-07-07 8:26 ` [PATCH v15 7/7] MAINTAINERS: Update maintained files of REMUS Yang Hongyang
-- strict thread matches above, loose matches on Subject: below --
2014-07-10 17:35 [PATCH v15 2/7] remus: introduce remus device Ian Jackson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53BDF80F.7030307@cn.fujitsu.com \
--to=yanghy@cn.fujitsu.com \
--cc=Ian.Jackson@eu.citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=eddie.dong@intel.com \
--cc=ian.campbell@citrix.com \
--cc=laijs@cn.fujitsu.com \
--cc=rshriram@cs.ubc.ca \
--cc=wency@cn.fujitsu.com \
--cc=xen-devel@lists.xen.org \
--cc=yunhong.jiang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.