From mboxrd@z Thu Jan 1 00:00:00 1970
From: david m. richter
Date: Fri, 5 Dec 2008 12:35:06 -0500
Subject: [Cluster-devel] gfs uevent and sysfs changes
In-Reply-To: <1228470705.3579.12.camel@localhost.localdomain>
References: <20081201173137.GA25171@redhat.com>
	<1d07ca700812041032o6f82fecew3fb93545fe64ed2d@mail.gmail.com>
	<20081204210754.GA19571@redhat.com>
	<1d07ca700812041359i2fe5443by7ac229485ec36f71@mail.gmail.com>
	<20081204223820.GB19571@redhat.com>
	<1228470705.3579.12.camel@localhost.localdomain>
Message-ID: <1d07ca700812050935q2d37c53lda8ad5ce4af6459a@mail.gmail.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Fri, Dec 5, 2008 at 4:51 AM, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2008-12-04 at 16:38 -0600, David Teigland wrote:
>> On Thu, Dec 04, 2008 at 04:59:23PM -0500, david m. richter wrote:
>> > ah, so just to make sure i'm with you here: (1) gfs_controld is
>> > generating this "id"-which-is-the-mountgroup-id, and (2) gfs_kernel
>> > will no longer receive this in the hostdata string, so (3) i can just
>> > rip out my in-kernel hostdata-parsing gunk and instead send in the
>> > mountgroup id on my own (i have my own up/downcall channel)?  if i've
>> > got it right, then everything's a cinch and i'll shut up :)
>>
>> Yep.  Generally, the best way to uniquely identify and refer to a gfs
>> filesystem is using the fsname string (specified during mkfs with -t
>> and saved in the superblock).  But sometimes it's just a lot easier to
>> have a numerical identifier instead.  I expect this is why you're
>> using the id, and it's why we were using it for communicating about
>> plocks.
>>
>> In cluster1 and cluster2 the cluster infrastructure dynamically
>> selected a unique id when needed, and it never worked great.  In
>> cluster3 the id is just a crc of the fsname string.
>>
>> Now that I think about this a bit more, there may be a reason to keep
>> the id in the string.  There was some interest on linux-kernel about
>> better using the statfs fsid field, and this id is what gfs should be
>> putting there.
>>
> In that case gfs2 should be able to generate the id itself from the
> fsname, and it still doesn't need it passed in, even if it continues to
> expose the id in sysfs.
>
> Perhaps better still, it should be possible for David to generate the
> id directly from the fsname if he really needs it.
>
> Since we also have a UUID now, for recently created filesystems, it
> might be worth exposing that via sysfs and/or uevents too.
>
>> > say, one tangential question (i won't be offended if you skip it -
>> > heh): is there a particular reason that you folks went with the
>> > uevent mechanism for doing upcalls?  i'm just curious, given the
>> > seeming complexity and possible overhead of using the whole layered
>> > netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
>> > (don't let the "rpc" fool you; it's a barebones, dead-simple pipe).
>> > -- and no, i'm not selling anything :)  my boss was asking for a
>> > list of differences between rpc_pipefs and uevents, and the best i
>> > could come up with is that the former is bidirectional.  Trond
>> > mentioned the netlink overhead and i wondered if that was actually
>> > a significant factor or just lost in the noise in most cases.
>>
>> The uevents looked pretty simple when I was initially designing how
>> the kernel/user interactions would work, and they fit well with the
>> sysfs files which I was using too.  I don't think the overhead of
>> using uevents is too bad.  Sysfs files and uevents definitely don't
>> work great if you need any kind of sophisticated bi-directional
>> interface.
>>
>> Dave
>>
> I think uevents are a reasonable choice, as they are easy enough to
> parse that it could be done by scripts, etc., and easy to extend as
> well.  We do intend to use netlink in the future (bz #337691) for
> quota messages, but in that case we would be using an existing
> standard for sending those messages.
>
> Netlink can be extended fairly easily, but you do have to be careful
> when designing the message format.  I've not come across rpc_pipefs
> before, so I can't comment on that yet.  I don't think we need to
> worry about the overhead of sending the messages (if you have so much
> recovery message traffic that it's a problem, you probably have bigger
> things to worry about!), and I don't see that netlink should have any
> more overhead than any other method of sending messages.

thanks!  again, i appreciate learning from other people's experiences.

fyi, the rpc_pipefs stuff is only currently used in two places, i
believe -- by rpc.idmapd and rpc.gssd; just another of a surprisingly
wide variety of ways to do kernel<->userland communication.

thanks again,

d.

>
> Steve.
>
>
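p.s. since Dave mentioned that in cluster3 the id is just a crc of the
fsname string, below is the sort of thing i had in mind for generating
it myself in userspace.  it's a completely untested sketch -- i haven't
checked which crc32 variant (polynomial/seed) gfs_controld actually
uses, or whether it hashes the whole "cluster:fsname" table name or
just the fs part, so treat all of that as guesses rather than a
description of the real code:

/* id_from_fsname.c -- hypothetical sketch, not the real gfs_controld
 * code.  Derives a numeric filesystem id by taking a CRC32 of the
 * fsname string, on the assumption that this matches what cluster3
 * does.  Build with: gcc id_from_fsname.c -lz
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <zlib.h>		/* crc32() */

static uint32_t id_from_fsname(const char *fsname)
{
	/* zlib's crc32: start from crc32(0, Z_NULL, 0), then feed the
	 * bytes.  Whether gfs_controld uses this exact variant is an
	 * assumption; the real code may differ in polynomial or seed. */
	uLong crc = crc32(0L, Z_NULL, 0);

	crc = crc32(crc, (const Bytef *)fsname, strlen(fsname));
	return (uint32_t)crc;
}

int main(int argc, char **argv)
{
	const char *fsname = argc > 1 ? argv[1] : "mycluster:myfs";

	printf("%s -> 0x%08x\n", fsname, id_from_fsname(fsname));
	return 0;
}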
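also, for anyone else weighing uevents against rpc_pipefs: this is
roughly what the userland end of the uevent path looks like -- a
minimal, generic listener on a NETLINK_KOBJECT_UEVENT socket (the same
stream udev reads).  it isn't gfs-specific and i'm not claiming it
matches what gfs_controld does internally; it just shows that the
messages are plain NUL-separated KEY=VALUE strings and pretty trivial
to parse:

/* uevent_listen.c -- minimal sketch of reading kernel uevents from
 * userspace.  Prints whatever arrives; any gfs-specific variable names
 * are whatever the kernel side chooses to put in the event.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
	struct sockaddr_nl nls;
	char buf[4096];
	ssize_t len;
	int fd;

	memset(&nls, 0, sizeof(nls));
	nls.nl_family = AF_NETLINK;
	nls.nl_pid = getpid();
	nls.nl_groups = 1;	/* kernel uevent multicast group */

	fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
	if (fd < 0 || bind(fd, (struct sockaddr *)&nls, sizeof(nls)) < 0) {
		perror("uevent socket");
		return 1;
	}

	while ((len = recv(fd, buf, sizeof(buf) - 1, 0)) > 0) {
		/* each message is "ACTION@devpath" followed by
		 * NUL-separated KEY=VALUE strings */
		ssize_t i = 0;

		buf[len] = '\0';
		while (i < len) {
			printf("%s\n", buf + i);
			i += strlen(buf + i) + 1;
		}
		printf("--\n");
	}
	return 0;
}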