From: Chuck Lever <chuck.lever@oracle.com>
To: Benny Halevy <bhalevy@panasas.com>, Boaz Harrosh <bharrosh@panasas.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH 15/15] NFS: Fix some minor problems with nfs4_deviceids
Date: Fri, 2 Mar 2012 13:10:59 -0500 [thread overview]
Message-ID: <EA56F79D-3608-443B-99DA-0D5E46C678B9@oracle.com> (raw)
In-Reply-To: <4D0F430A-73F2-4AAE-A087-302EE08D29C0@oracle.com>
On Mar 1, 2012, at 5:55 PM, Chuck Lever wrote:
>
> On Mar 1, 2012, at 5:39 PM, Myklebust, Trond wrote:
>
>> On Thu, 2012-03-01 at 17:02 -0500, Chuck Lever wrote:
>>> Clean up due to code review.
>>>
>>> sizeof(struct nfs4_deviceid) is the size of the in-core device ID data
>>> structure, but NFS4_DEVICEID4_SIZE is the number of octets in an XDR'd
>>> device ID. The two are not interchangeable, even if they happen to
>>> have the same value. If struct nfs4_deviceid is padded by the
>>> compiler, sizeof(struct nfs4_deviceid) may not be the same as the
>>> XDR'd size. Not likely, but still.
>>>
>>> Ensure that the data field is aligned to the largest pointer type that
>>> is used to access it: in this case, that's u64. Type-punning among
>>> types with different alignment has been discouraged in user space and
>>> the kernel, to avoid unneeded pointer aliasing. The use of a union
>>> is preferred instead.
>>>
>>> Ensure that XDR'ing and comparing is done via one of the union's data
>>> fields, and not via the whole struct, again in case the compiler pads
>>> the struct.
>>>
>>> As a micro-optimization, this change also ensures that the compiler
>>> may perform memcpy() and memcmp() operations on these fields in
>>> larger, more efficient, chunks.
>>>
>>> This patch needs review and testing by pnfs layout specialists.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>> fs/nfs/blocklayout/blocklayout.c | 2 +-
>>> fs/nfs/blocklayout/blocklayoutdev.c | 8 ++++----
>>> fs/nfs/blocklayout/extents.c | 5 +++--
>>> fs/nfs/callback_xdr.c | 2 +-
>>> fs/nfs/nfs4filelayout.c | 3 +--
>>> fs/nfs/nfs4xdr.c | 4 ++--
>>> fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 8 ++++----
>>> fs/nfs/pnfs_dev.c | 7 ++++---
>>> include/linux/nfs4.h | 7 ++++++-
>>> include/linux/pnfs_osd_xdr.h | 4 ++--
>>> 10 files changed, 28 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>>> index 783ebd5..3b730bf 100644
>>> --- a/fs/nfs/blocklayout/blocklayout.c
>>> +++ b/fs/nfs/blocklayout/blocklayout.c
>>> @@ -901,7 +901,7 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
>>> dev->pglen = PAGE_SIZE * max_pages;
>>> dev->mincount = 0;
>>>
>>> - dprintk("%s: dev_id: %s\n", __func__, dev->dev_id.data);
>>> + dprintk("%s: dev_id: %s\n", __func__, dev->dev_id.u.data);
>>> rc = nfs4_proc_getdeviceinfo(server, dev);
>>> dprintk("%s getdevice info returns %d\n", __func__, rc);
>>> if (rc) {
>>> diff --git a/fs/nfs/blocklayout/blocklayoutdev.c b/fs/nfs/blocklayout/blocklayoutdev.c
>>> index b48f782..2dafca5 100644
>>> --- a/fs/nfs/blocklayout/blocklayoutdev.c
>>> +++ b/fs/nfs/blocklayout/blocklayoutdev.c
>>> @@ -124,8 +124,8 @@ nfs4_blk_decode_device(struct nfs_server *server,
>>> struct nfs_net *nn = net_generic(net, nfs_net_id);
>>>
>>> dprintk("%s CREATING PIPEFS MESSAGE\n", __func__);
>>> - dprintk("%s: deviceid: %s, mincount: %d\n", __func__, dev->dev_id.data,
>>> - dev->mincount);
>>> + dprintk("%s: deviceid: %s, mincount: %d\n", __func__,
>>> + dev->dev_id.u.data, dev->mincount);
>>>
>>> memset(&msg, 0, sizeof(msg));
>>> msg.data = kzalloc(sizeof(bl_msg) + dev->mincount, GFP_NOFS);
>>> @@ -206,7 +206,7 @@ static struct block_device *translate_devid(struct pnfs_layout_hdr *lo,
>>> mid = BLK_ID(lo);
>>> spin_lock(&mid->bm_lock);
>>> list_for_each_entry(dev, &mid->bm_devlist, bm_node) {
>>> - if (memcmp(id->data, dev->bm_mdevid.data,
>>> + if (memcmp(id->u.data, dev->bm_mdevid.u.data,
>>> NFS4_DEVICEID4_SIZE) == 0) {
>>> rv = dev->bm_mdev;
>>> goto out;
>>> @@ -323,7 +323,7 @@ nfs4_blk_process_layoutget(struct pnfs_layout_hdr *lo,
>>> status = -ENOMEM;
>>> goto out_err;
>>> }
>>> - memcpy(&be->be_devid, p, NFS4_DEVICEID4_SIZE);
>>> + memcpy(&be->be_devid.u.data, p, NFS4_DEVICEID4_SIZE);
>>> p += XDR_QUADLEN(NFS4_DEVICEID4_SIZE);
>>> be->be_mdev = translate_devid(lo, &be->be_devid);
>>> if (!be->be_mdev)
>>> diff --git a/fs/nfs/blocklayout/extents.c b/fs/nfs/blocklayout/extents.c
>>> index 1f9a603..52b747c 100644
>>> --- a/fs/nfs/blocklayout/extents.c
>>> +++ b/fs/nfs/blocklayout/extents.c
>>> @@ -675,10 +675,11 @@ encode_pnfs_block_layoutupdate(struct pnfs_block_layout *bl,
>>> if (!xdr_start)
>>> goto out;
>>> list_for_each_entry_safe(lce, save, &bl->bl_commit, bse_node) {
>>> - p = xdr_reserve_space(xdr, 7 * 4 + sizeof(lce->bse_devid.data));
>>> + p = xdr_reserve_space(xdr, 7 * 4 + NFS4_DEVICEID4_SIZE);
>>> if (!p)
>>> break;
>>> - p = xdr_encode_opaque_fixed(p, lce->bse_devid.data, NFS4_DEVICEID4_SIZE);
>>> + p = xdr_encode_opaque_fixed(p, lce->bse_devid.u.data,
>>> + NFS4_DEVICEID4_SIZE);
>>> p = xdr_encode_hyper(p, lce->bse_f_offset << SECTOR_SHIFT);
>>> p = xdr_encode_hyper(p, lce->bse_length << SECTOR_SHIFT);
>>> p = xdr_encode_hyper(p, 0LL);
>>> diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
>>> index 5466829..6061322 100644
>>> --- a/fs/nfs/callback_xdr.c
>>> +++ b/fs/nfs/callback_xdr.c
>>> @@ -347,7 +347,7 @@ __be32 decode_devicenotify_args(struct svc_rqst *rqstp,
>>> goto err;
>>> }
>>> dev->cbd_layout_type = ntohl(*p++);
>>> - memcpy(dev->cbd_dev_id.data, p, NFS4_DEVICEID4_SIZE);
>>> + memcpy(dev->cbd_dev_id.u.data, p, NFS4_DEVICEID4_SIZE);
>>> p += XDR_QUADLEN(NFS4_DEVICEID4_SIZE);
>>>
>>> if (dev->cbd_layout_type == NOTIFY_DEVICEID4_CHANGE) {
>>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>>> index 47e8f34..f0c6ba0 100644
>>> --- a/fs/nfs/nfs4filelayout.c
>>> +++ b/fs/nfs/nfs4filelayout.c
>>> @@ -550,8 +550,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
>>> if (unlikely(!p))
>>> goto out_err;
>>>
>>> - memcpy(id, p, sizeof(*id));
>>> - p += XDR_QUADLEN(NFS4_DEVICEID4_SIZE);
>>> + p = xdr_decode_opaque_fixed(p, id->u.data, NFS4_DEVICEID4_SIZE);
>>> nfs4_print_deviceid(id);
>>>
>>> nfl_util = be32_to_cpup(p++);
>>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
>>> index 80ba010..91020ea 100644
>>> --- a/fs/nfs/nfs4xdr.c
>>> +++ b/fs/nfs/nfs4xdr.c
>>> @@ -1946,7 +1946,7 @@ encode_getdeviceinfo(struct xdr_stream *xdr,
>>>
>>> p = reserve_space(xdr, 16 + NFS4_DEVICEID4_SIZE);
>>> *p++ = cpu_to_be32(OP_GETDEVICEINFO);
>>> - p = xdr_encode_opaque_fixed(p, args->pdev->dev_id.data,
>>> + p = xdr_encode_opaque_fixed(p, args->pdev->dev_id.u.data,
>>> NFS4_DEVICEID4_SIZE);
>>> *p++ = cpu_to_be32(args->pdev->layout_type);
>>> *p++ = cpu_to_be32(args->pdev->pglen); /* gdia_maxcount */
>>> @@ -5492,7 +5492,7 @@ static int decode_getdevicelist(struct xdr_stream *xdr,
>>> if (unlikely(!p))
>>> goto out_overflow;
>>> for (i = 0; i < res->num_devs; i++)
>>> - p = xdr_decode_opaque_fixed(p, res->dev_id[i].data,
>>> + p = xdr_decode_opaque_fixed(p, res->dev_id[i].u.data,
>>> NFS4_DEVICEID4_SIZE);
>>> res->eof = be32_to_cpup(p);
>>> return 0;
>>> diff --git a/fs/nfs/objlayout/pnfs_osd_xdr_cli.c b/fs/nfs/objlayout/pnfs_osd_xdr_cli.c
>>> index b3918f7..99a1e89 100644
>>> --- a/fs/nfs/objlayout/pnfs_osd_xdr_cli.c
>>> +++ b/fs/nfs/objlayout/pnfs_osd_xdr_cli.c
>>> @@ -55,8 +55,8 @@
>>> static __be32 *
>>> _osd_xdr_decode_objid(__be32 *p, struct pnfs_osd_objid *objid)
>>> {
>>> - p = xdr_decode_opaque_fixed(p, objid->oid_device_id.data,
>>> - sizeof(objid->oid_device_id.data));
>>> + p = xdr_decode_opaque_fixed(p, objid->oid_device_id.u.data,
>>> + NFS4_DEVICEID4_SIZE);
>>>
>>> p = xdr_decode_hyper(p, &objid->oid_partition_id);
>>> p = xdr_decode_hyper(p, &objid->oid_object_id);
>>> @@ -377,8 +377,8 @@ pnfs_osd_xdr_encode_layoutupdate(struct xdr_stream *xdr,
>>> static inline __be32 *
>>> pnfs_osd_xdr_encode_objid(__be32 *p, struct pnfs_osd_objid *object_id)
>>> {
>>> - p = xdr_encode_opaque_fixed(p, &object_id->oid_device_id.data,
>>> - sizeof(object_id->oid_device_id.data));
>>> + p = xdr_encode_opaque_fixed(p, &object_id->oid_device_id.u.data,
>>> + NFS4_DEVICEID4_SIZE);
>>> p = xdr_encode_hyper(p, object_id->oid_partition_id);
>>> p = xdr_encode_hyper(p, object_id->oid_object_id);
>>>
>>> diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
>>> index 4f359d2..6c6ab81 100644
>>> --- a/fs/nfs/pnfs_dev.c
>>> +++ b/fs/nfs/pnfs_dev.c
>>> @@ -46,7 +46,7 @@ static DEFINE_SPINLOCK(nfs4_deviceid_lock);
>>> void
>>> nfs4_print_deviceid(const struct nfs4_deviceid *id)
>>> {
>>> - u32 *p = (u32 *)id;
>>> + const u32 *p = id->u.data32;
>>>
>>> dprintk("%s: device id= [%x%x%x%x]\n", __func__,
>>> p[0], p[1], p[2], p[3]);
>>> @@ -56,7 +56,7 @@ EXPORT_SYMBOL_GPL(nfs4_print_deviceid);
>>> static inline u32
>>> nfs4_deviceid_hash(const struct nfs4_deviceid *id)
>>> {
>>> - unsigned char *cptr = (unsigned char *)id->data;
>>> + const unsigned char *cptr = id->u.data;
>>> unsigned int nbytes = NFS4_DEVICEID4_SIZE;
>>> u32 x = 0;
>>>
>>> @@ -77,7 +77,8 @@ _lookup_deviceid(const struct pnfs_layoutdriver_type *ld,
>>>
>>> hlist_for_each_entry_rcu(d, n, &nfs4_deviceid_cache[hash], node)
>>> if (d->ld == ld && d->nfs_client == clp &&
>>> - !memcmp(&d->deviceid, id, sizeof(*id))) {
>>> + !memcmp(&d->deviceid.u.data, id->u.data,
>>> + NFS4_DEVICEID4_SIZE)) {
>>> if (atomic_read(&d->ref))
>>> return d;
>>> else
>>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>>> index b1e6b64..979f607 100644
>>> --- a/include/linux/nfs4.h
>>> +++ b/include/linux/nfs4.h
>>> @@ -649,9 +649,14 @@ enum filelayout_hint_care4 {
>>> #define NFS4_DEVICEID4_SIZE 16
>>>
>>> struct nfs4_deviceid {
>>> - char data[NFS4_DEVICEID4_SIZE];
>>> + union {
>>> + unsigned char data[NFS4_DEVICEID4_SIZE];
>>> + u32 data32[NFS4_DEVICEID4_SIZE / sizeof(u32)];
>>> + u64 data64[NFS4_DEVICEID4_SIZE / sizeof(u64)];
>>> + } u;
>>> };
>>>
>>
>> Ugh... This just looks worse and worse.... Let's rather just keep these
>> as unsigned char. I don't care if OSDs have special secret meanings and
>> all that crap. They can cast the damned thing if they need to. The rest
>> of us should be considering this to be opaque data.
>
> One possibility is that OSD and block layout drivers can construct their device ID in a u64 array on the stack, and then memcpy() the array to the nfs4_deviceid.
>
> But we can't cast pointers to an "unsigned char" field, since it's not guaranteed to be aligned. That's a real bug, one that's hit us before.
Accessing a "char x[16]" field by casting x to a (u64 *) is considered harmful. It will work when x happens to be aligned to 64-bits, but otherwise accessing a field that way will cause an oops.
It's not clear why the OSD layout driver needs to print device IDs as a pair of unsigned long longs. Why can't nfs4_print_deviceid() be used ?
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
next prev parent reply other threads:[~2012-03-02 18:11 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-01 22:00 [PATCH 00/15] For 3.4 (2nd take) Chuck Lever
2012-03-01 22:00 ` [PATCH 01/15] NFS: Make nfs_cache_array.size a signed integer Chuck Lever
2012-03-01 22:00 ` [PATCH 02/15] NFS: Clean up debugging in decode_pathname() Chuck Lever
2012-03-01 22:00 ` [PATCH 03/15] NFS: Add debugging messages to NFSv4's CLOSE procedure Chuck Lever
2012-03-01 22:00 ` [PATCH 04/15] NFS: Reduce debugging noise from encode_compound_hdr Chuck Lever
2012-03-01 22:15 ` Adamson, Dros
2012-03-01 22:19 ` Myklebust, Trond
2012-03-01 22:00 ` [PATCH 05/15] SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field Chuck Lever
2012-03-01 22:01 ` [PATCH 06/15] SUNRPC: Move clnt->cl_server into struct rpc_xprt Chuck Lever
2012-03-01 22:01 ` [PATCH 07/15] SUNRPC: Add API to acquire source address Chuck Lever
2012-03-01 22:09 ` Jim Rees
2012-03-01 22:27 ` Chuck Lever
2012-03-01 22:01 ` [PATCH 08/15] commit 6f38b4ba433ac6494f83cb73dd07dcbde797e1e0 Chuck Lever
2012-03-01 22:01 ` [PATCH 09/15] NFS: Add a client-side function to display NFS file handles Chuck Lever
2012-03-01 22:28 ` Myklebust, Trond
2012-03-01 22:32 ` Chuck Lever
2012-03-02 17:17 ` Steve Dickson
2012-03-02 17:19 ` Chuck Lever
2012-03-02 18:50 ` Steve Dickson
2012-03-06 16:55 ` Adamson, Dros
2012-03-01 22:01 ` [PATCH 10/15] NFS: Save root file handle in nfs_server Chuck Lever
2012-03-01 22:30 ` Myklebust, Trond
2012-03-01 22:01 ` [PATCH 11/15] NFS: Simplify arguments of encode_renew() Chuck Lever
2012-03-01 22:01 ` [PATCH 12/15] NFS: Introduce NFS_ATTR_FATTR_V4_LOCATIONS Chuck Lever
2012-03-01 22:02 ` [PATCH 13/15] NFS: Request fh_expire_type attribute in "server caps" operation Chuck Lever
2012-03-01 22:02 ` [PATCH 14/15] NFS: Fix some minor problems with nfs4_verifiers Chuck Lever
2012-03-01 22:35 ` Myklebust, Trond
2012-03-01 22:43 ` Myklebust, Trond
2012-03-01 22:02 ` [PATCH 15/15] NFS: Fix some minor problems with nfs4_deviceids Chuck Lever
2012-03-01 22:39 ` Myklebust, Trond
2012-03-01 22:55 ` Chuck Lever
2012-03-02 18:10 ` Chuck Lever [this message]
2012-03-02 21:58 ` Boaz Harrosh
2012-03-02 22:00 ` Boaz Harrosh
2012-03-02 22:03 ` Chuck Lever
2012-03-02 22:06 ` Boaz Harrosh
2012-03-02 22:11 ` Chuck Lever
2012-03-02 22:52 ` Boaz Harrosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=EA56F79D-3608-443B-99DA-0D5E46C678B9@oracle.com \
--to=chuck.lever@oracle.com \
--cc=Trond.Myklebust@netapp.com \
--cc=bhalevy@panasas.com \
--cc=bharrosh@panasas.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).