The above description of loop avoidance assumed two things:
This parent discovery and selection is an important component of YTMP, though somewhat independent in the sense that different parent discovery and selection techniques can work with the YTMP tree management algorithms. This section outlines some of the issues and approaches to parent discovery and selection.
A key attributes of YTMP is that it scales to very large groups (thousands of members). Parent discovery and selection must likewise be scalable, which means that, except for small trees, members can't be expected to know of all other members, much less the distance to those members. Second, there are a number of factors, some of them orthogonal or even contradictory, that a member must consider in picking a parent.
I don't have any silver bullet approach to parent discovery and selection. Rather I have in mind a number of heuristics. This is an area where we need a lot of experimentation, both to find out how important good parent selection is, and to find good ways to do it.
When Parent Discovery and Selection is not Necessary
Just to be clear, I want to make sure the reader understands that all of this applies only to the situation where a member cannot join a (local IP multicast) cluster. The first thing a newly joining member (newcomer) always does is to check to see if it can join a cluster. It can even do this without contacting the rendezvous, if there are no other reasons, such as security or accounting, to contact the rendezvous.
All clusters for any given tree have a deterministically selectable IP multicast address created by hashing the Group ID. If a cluster has been established, which would be the case where other members already exist within the tight scoping, the newcomer simply joins the cluster (under most but not all circumstances, see below). If a cluster has not been established, the newcomer must then find a parent in the tree at large. In any event, however, the newcomer continues to listen on the cluster IP multicast address for other newcomers. (There is an algorithm, not described in this document, for how members discover other members on the cluster and elect a cluster head.)
Parent Discovery
Each member must maintain a list of potential parents. A potential parent at a minimum must not have the member in its root path, and must have capacity for accepting new children. The member does not in general maintain a connection with each potential parent, so information in the list is always to some extent out of date. The member does, however, try to maintain a connection with the most promising two or three stand-by potential parents and keep up to date with their status so that it can obtain a new parent quickly should it need to.
There are several means by which members can learn about potential parents. Certainly a newcomer learns of some when it contacts the rendezvous. In fact, for small groups, the rendezvous can keep a full list of the group membership and pass this onto newcomers. A member can also learn of potential parents from information contained in various control messages, such as other members' root paths or intent to join paths.
Otherwise, there are two primary means of discovering potential parents--one relaxed and one rushed. The rushed method is used when a member does not have enough stand-by potential parents (or worse, has no stand-by and no parent). This would be the case where the former stand-by potential parents either left the tree, or obtained a full complement of children.
What the member does is query the former stand-by, or any other potential parent in its list, to learn of its children. It then queries one or more of the children to see if they are appropriate stand-bys. If not (they too have full complements of children), it in turn queries their children, working its way leafward depth first. In this way a valid stand-by is found relatively quickly.
The relaxed method is used when a member has a parent and enough stand-bys, and is simply looking to keep its list of potential parents more-or-less up to date. To do this scalably, each member transmits a member discovery message to its parent using YDP tree-anycast. Frames sent via tree-anycast are pseudo-randomly sent neighbor to neighbor along the tree, and pseudo-randomly stop at some member. This member answers the message, thus getting discovered. It is then added to the list (or refreshed if it is already there), and if the list if full, the oldest entry may be taken out.
The pseudo-random neighbor selection we use favors going leafwards over going rootwards, making discovery of nodes closer in the tree more frequent, though by no means exclusive. By initially sending the discovery message to its parent, the member insures with high probability that the discovered member won't be an offspring (and therefore invalid as a parent).
Partition Discovery
As stated earlier, when a member cannot find a valid parent at all, it assumes it is the root of the tree. Of course it is possible that multiple members may fail to find a parent and all consider themselves the root of the tree. This results in a partitioned tree--one partition for each root.
When a member becomes a root, it both
If multiple members are doing this, then unless the mesh is also partitioned, and the rendezvous are down, the members will discover each other. At that point, they add each other to their potential parent lists and continue to follow the parent discovery algorithm, with the result that a single root will emerge.
Parent Selection
Having built up a list of potential parents, each member must determine which would be the best parent and stand-bys. The basic approach is for each member, over time, to query entries in their list and determine the appropriateness of each entry. How aggressively this is done depends on the volume of content and importance of a good-quality tree. The order in which it is done may be based on domain name commonality.
I assume too that any information learned about another member that is likely to remain stable over time (such as the distance to the member or the member's transmission capacity) can be stored locally in a database and used with multiple groups. This would be particularly useful in an enterprise environment where a member may encounter the same other members over and over.
All other things being equal, members should pick parents that are nearby, by which I mainly mean a small latency. If each member choses, among the other members that may validly be a parent, a nearby one, then the overall cost of the tree is kept low. This improves performance for the members while reducing load on the internet.
(Having said this, I acknowledge that there are types of applications, particularly those that have a hard upper-bound on end-to-end latency, that may require some kind of tree-wide optimization. This sort of tree is an issue for further study.)
There may also be other factors to consider in choosing a parent than just distance. Foremost among these is whether the parent has or is actively receiving content needed by the member. By way of explanation, content can be divided into three types, what I call buckets, spigots, and kegs. A bucket is a finite object, such as a file, whereby it doesn't matter to the application the order in which the contents of the object are received. The application does nothing until the entire object is received.
A spigot is a content ``stream'' with no specified beginning or end. A member joining the tree receives the contents of the spigot at whatever point it is being transmitted. The contents are received in the order they are sent (with the caveat that some applications only require best effort delivery).
A keg, like a bucket, is a finite object. Like a spigot, however, the object should be received in order from beginning to end. An example of a keg is a realNetworks file, whereby some portion of the file from the beginning is buffered, after which the file ``plays'' from the beginning while more is received.
Streams are not much of an issue when considering which parent to join. Every member will be receiving and transmitting from roughly the same place in the stream. With some care, we can make this happen with buckets as well. Because the application doesn't care in what order the content arrives, each member can arrange to transmit from roughly the same point in the bucket (file) it is actively receiving. This allows members to move anywhere in the tree and be able to pick up roughly where they left off (see Figure /refbucketEx1).
[IMAGE ]
Figure 9: Example of Bucket Distribution Topology Flexibility
Kegs are more difficult. Because each member must receive from the beginning, members that have received more of the keg cannot select as parent members that have received less. It can easily be the case that there are no such members with spare capacity for new children in a tree.
To see this, consider a tree where each member may have no more than two children. Consider a parent member P and its children C1 and C2 that high up in the tree (and therefore far along in the keg relative to lower members). Assume further that all other members in the tree that are as far or further along as C1 and C2 have no spare capacity for new children.
Now assume that P quits the tree. This leaves one ``vacancy'' in the tree because P's former parent now has capacity for one child. Assume that C1 attaches to P's former parent. C2 has no free member to join.
There is no real easy solution to this. It seems that C2 must find a parent/child pair of members such that the parent has received as much or more, and the child has received as much or less. C2 must displace the child of this pair by joining the parent and having the parent reject the child. The child, then, must do the same thing as C2, leading to a kind of domino effect as a series of members, each holding less of the keg, displace other members. (None of the details of how this would really go have been worked out.)
(Note that buckets can be distributed one bucket per group or multiple buckets per group. A good example of the latter would be netnews, with one yoid group for one netnews group, and with each message being a single bucket. Here even the buckets could be transmitted out of order. This could be called a bucket brigade. A group with multiple kegs is called, or at least it was back in my undergraduate days, a kegger.)