Using libstree is hopefully really quite simple. As an example, let's walk through one of the test programs found in the test directory of the source distribution: lcshex.c, which calculates the longest common byte sequence of the programs you pass it as command line arguments [1].
Since libstree operates on arbitrary strings, it needs a string model that leaves the implementation of the common string operations up to the user. String families that use a given set of such operations is called a string class (called LST_StringClass). Strings are instances of the LST_String type, and suffix trees are instances of LST_STree type. Each string belongs to a string class. Since it will often be the case that an application uses only a single class of strings (e.g., byte strings), there is a default string class, whose properties you can specify once and then forget about it.
libstree currently uses three string-class-specific operations:
A function to compare string items (LST_StringItemCmpFunc)
A function to copy string items (LST_StringItemCopyFunc)
A function to convert a single string into an ASCII string (LST_StringPrintFunc)
Our little test program works on binary files, so we can use the compare and copy function, but for output, we prefer hexadecimal strings. libstree already provides a function that does this, and we use it to override the default.
Let's look at some code. We will need a suffix tree and a few strings; also, we override the output method as describes above.
LST_STree *tree; LST_StringSet *set, *result; LST_String *string; /* Create a string class with a special print method, otherwise * we can use the defaults: */ lst_stringclass_set_defaults(NULL, NULL, lst_string_print_hex); |
Since most applications will have to handle multiple strings at a time, a container for a number of strings is convenient. libstree provides the type LST_StringSet that can hold a set of strings. Let's allocate such a set:
/* Create a string set to conveniently hold all our strings: */ set = lst_stringset_new(); |
Now, we check for each of the program names that the user passed in, whether the file exists and if we can read it. We then allocate a chunk of memory and read the entire program code into it:
FILE *f; u_char *data; struct stat st; /* stat() the file, open file handle etc, now read content: */ if (fread(data, 1, st.st_size, f) != (size_t) st.st_size) { printf("Error reading %s -- skipping.\n", argv[i]); free(data); continue; } |
We're ready to create a string object from this bytesequence and add it to our string set.
string = lst_string_new(data, 1, st.st_size); lst_stringset_add(set, string); |
Now, let's build a suffix tree for all the strings we just put into our string set:
/* Create a suffix tree for all strings in the set: */ tree = lst_stree_new(set); |
Now find the longest common substring of all those strings.
/* Find longest common substring(s) */ result = lst_alg_longest_common_substring(tree, 0, 0); |
Note that the result is also a string set -- there may be multiple longest strings, of the same length. You can limit the results by requesting only substrings of a given minimum and maximum length, see lst_alg_longest_common_substring(). Finally, let's print each of the strings out to the console. The string set contents can be iterated using lst_stringset_foreach():
/* Print them out: */ lst_stringset_foreach(result, string_cb, NULL); |
That's it. Please browse the other test programs to get a better feeling for the API.
[1] | What do you mean, "What is this good for!?" :)" |