BU CAS CS 113
Introduction to Computer Science II with Intensive C
Fall 1997


Assignment 8: 2-3 Trees


Last Modified: Fri Dec 5 14:41:37 1997

Deadline

December 12, 1997. NO LATE WORK WILL BE ACCEPTED. Be sure to turn in everything you have by the deadline.

What to Submit

Be sure the files you submit have exactly the following names: hw8-ttt.c, hw8-symtab.c, hw8-inv.c, and hw8-Makefile.

These programs are to be electronically submitted by using the submit program on csa. The code you submit should conform with the program assignment guidelines.

Implementing a 2-3 Tree

Implement hw8-ttt.h, which contains the following interface for a 2-3 tree.

/*
 * File: ttt.h
 * -----------
 * This file provides an interface for a general 2-3 search
 * tree facility that allows the client to maintain control of
 * the structure of the node.
 */

#ifndef _ttt_h
#define _ttt_h

#include "genlib.h"

/* Types: clientDataT, keyT, clientBlockT
 * --------------------------------------
 * These are all void*, but separate names are used to emphasize
 * what each void* is for.  keyT is used for keys.
 * clientBlockT is used for the blocks of memory
 * (allocated by the client) stored in the tree.  These blocks must
 * include both a key and the data associated with the key.
 * clientDataT is used for any additional piece of data which the client
 * might pass along to callback functions.
 */
typedef void *clientDataT, *keyT, *clientBlockT;


/*
 * Type: tttADT
 * ------------
 * This is the abstract type for a 2-3 search tree.
 */

typedef struct tttCDT *tttADT;


/*
 * Type: compFnT
 * ------------
 * This type defines the type space of comparison functions.
 * Both arguments to compare functions are void*, but the following
 * convention will be used: the first argument is always an "isolated" key,
 * the second takes the address of a client-allocated block of memory
 * which contains a key.  The two keys are compared and an integer
 * is returned which is negative, 0, or positive depending on
 * whether the first key is less than, equal to, or
 * greater than the second.
 *
 */

typedef int (*compFnT)(const keyT p1, const clientBlockT p2);


/*
 * Type: nodeFnT
 * -------------
 * This type defines the class of callback functions for nodes.
 */

typedef void (*nodeFnT)(clientBlockT blockPtr, clientDataT clientData);


/*
 * Function: NewTTT
 * Usage: ttt = NewTTT(compFn);
 * -------------------------------------------------------
 * This function allocates and returns a new empty 2-3 search
 * tree.  The argument is a comparison function.
 */

tttADT NewTTT(compFnT compFn);

/*
 * Function: FindTTTNode
 * Usage: np = FindTTTNode(ttt, key);
 * -----------------------------------
 * This function applies the 2-3 search algorithm to find a
 * particular key in the tree represented by ttt.  The second
 * argument represents the address of the key in the client
 * space rather than the key itself, which makes it possible to
 * use this package for keys that are not pointer types.  If a
 * node matching the key appears in the tree, FindTTTNode
 * returns a pointer to it; if not, FindTTTNode returns NULL.
 */

clientBlockT FindTTTNode(tttADT ttt, keyT kp);

/*
 * Function: InsertTTTNode
 * Usage: InsertTTTNode(ttt, key, clientBlock);
 * --------------------------------------------
 * This function is used to insert a new client block into a 2-3
 * tree.  The ttt and key arguments are interpreted as they are
 * in FindTTTNode.  If the key already exists, the associated client
 * block is overwritten.  If the key is not found, then the clientBlock 
 * is added to the tree.  
 */

void InsertTTTNode(tttADT ttt, keyT kp, clientBlockT clientBlock);


/*
 * Function: MapTTT
 * Usage: MapTTT(fn, ttt, order, clientData);
 * ------------------------------------------
 * This function calls fn on every node in the 2-3 tree,
 * passing it a pointer to a node and the clientData pointer.  The
 * type of traversal is given by the order argument, which must
 * be one of the constants InOrder, PreOrder, or PostOrder.
 */

typedef enum { InOrder, PreOrder, PostOrder } traversalOrderT;

void MapTTT(nodeFnT fn, tttADT ttt, traversalOrderT order,
            clientDataT clientData);


/*
 * Note: it is possible to implement functions for deleting and freeing,
 * but you are not required to do so.  Also, note that this implementation
 * may leak memory when an item is added which has a key which is already
 * in the tree.
 */

#endif
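
As an illustration of the compFnT convention, here is a minimal sketch of a comparison function a client might write. The entryT structure and the name CompareIdToEntry are hypothetical (they are not part of the assignment), and the typedefs are repeated only so that the fragment compiles on its own. It assumes the key is an int stored inside the client block, and that the "isolated" key is passed by address.

```c
#include <assert.h>
#include <stddef.h>

/* Same typedefs as in the interface, repeated so this sketch stands alone. */
typedef void *clientDataT, *keyT, *clientBlockT;

/* Hypothetical client block: the key is stored inside the block
 * together with the data associated with it. */
typedef struct {
    int id;              /* the key */
    const char *name;    /* data associated with the key */
} entryT;

/* Has the shape required by compFnT: p1 is the address of an isolated
 * key, p2 is the address of a whole client block containing a key.
 * Returns a negative, zero, or positive value. */
int CompareIdToEntry(const keyT p1, const clientBlockT p2)
{
    const int *key = p1;
    const entryT *entry = p2;

    assert(key != NULL && entry != NULL);
    return (*key > entry->id) - (*key < entry->id);
}
```

The subtraction of two comparisons is used instead of `*key - entry->id` so that the result cannot overflow for large values.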

Searching in Two-three Trees

Recall from lecture that a two-three tree is similar to a binary search tree except that the tree has two kinds of nodes: nodes with one data value and two subtrees, and nodes with two data values and three subtrees. It is possible to insert into a 2-3 tree in such a way that all leaves are on the same level, thus avoiding the inefficiencies which can occur in a binary search tree which becomes unbalanced.
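
To make the search concrete, here is a rough sketch of one possible node layout and a search loop over it. The names nodeT and Find23 are hypothetical, and the keys are plain ints rather than client blocks compared through a compFnT, so this is only an outline of the idea and not the required implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: nkeys is 1 or 2, and a node with nkeys
 * values uses child[0..nkeys].  In a leaf every child is NULL. */
typedef struct nodeT {
    int nkeys;
    int key[2];
    struct nodeT *child[3];
} nodeT;

/* Returns the address of the matching value, or NULL if it is absent. */
const int *Find23(const nodeT *t, int key)
{
    while (t != NULL) {
        int i = 0;

        assert(t->nkeys == 1 || t->nkeys == 2);
        /* Skip the values smaller than key; i ends up as the index of
         * the subtree that could contain key. */
        while (i < t->nkeys && key > t->key[i]) i++;
        if (i < t->nkeys && key == t->key[i]) return &t->key[i];
        t = t->child[i];
    }
    return NULL;
}
```

Since all leaves are on the same level, the loop visits at most one node per level of the tree.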

Inserting into 2-3 Trees

The algorithm for inserting into a 2-3 tree is not particularly hard, once you understand the process and see how to view it recursively. First, consider the process holistically, as if you could see the entire tree. Suppose it looked like:

         (h|n)
       /   |   \
    (c)   (j)   (q|v)

Inserting f would be easy: it can join c and form a node with 2 data values. The result is:

         (h|n)
       /   |   \
  (c|f)   (j)   (q|v)

But if we try to insert z, things are more interesting. First, we try to insert into a leaf, but (q|v) is "full". So we split the node into two with one data value each, promoting the "middle" data value. In this case, (q) and (z) are the new leaves, and (v) gets promoted to (h|n), which is also "full", so again we split and promote the middle (this time n). Now the result is:

           (n)
         /     \
      (h)       (v)
     /   \     /   \
 (c|f)   (j) (q)   (z)

The key to writing code for this is to understand a more local perspective, namely the view from a single node of the tree. Suppose in the preceding example you take the view of the node (h|n) while we insert z. The process goes something like this:
  1. Since z > n we insert z in the right subtree. (Recursive step: the right subtree is a smaller tree.)
  2. After the insertion of z into the right subtree, we find out that there has been a promotion. The return value of the recursive function will have to tell us what we need to know about the promotion. In this case we need to know that the promoted data value is v, and we need the trees to the left and right of (v). That is, the return value is

           (v)
          /   \
       (q)     (z)
    
  3. Since there is a promotion, we have to put it in our node.

So the process really boils down to two phases: down and up. On the way down, all we do is continue the search in the appropriate subtree; on the way up, we deal with promotion.
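
The heart of the "up" phase is deciding which of three values gets promoted when a full node receives one more. Here is a small sketch of just that decision, using ints in place of client blocks; splitT and SplitValues are hypothetical names, and the subtree links are left out entirely.

```c
#include <assert.h>

/* Hypothetical result of splitting: the two values that stay behind in
 * one-value nodes, and the middle value that is promoted. */
typedef struct {
    int left, middle, right;
} splitT;

/* lo < hi are the two values already in the full node; k is the value
 * arriving (either newly inserted or promoted from a subtree). */
splitT SplitValues(int lo, int hi, int k)
{
    splitT s;

    assert(lo < hi && k != lo && k != hi);
    if (k < lo) {
        s.left = k;  s.middle = lo; s.right = hi;
    } else if (k < hi) {
        s.left = lo; s.middle = k;  s.right = hi;
    } else {
        s.left = lo; s.middle = hi; s.right = k;
    }
    return s;
}
```

In the example, splitting (q|v) with z arriving leaves q and z behind and promotes v; splitting (h|n) with v arriving leaves h and v behind and promotes n.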

Well, almost. Unfortunately, we have ignored two extreme cases. Fortunately, they are not too bad to deal with. The first case should be obvious: we have no base case to the recursion! Clearly the base case has something to do with the leaves. It turns out to be easiest to consider the base case to be an empty tree (represented in C by a NULL). That is, we will not check to see if a node is a leaf; rather, we will deal with the situation when a NULL is being handled by the code.

What do we do with an empty tree? If the entire tree were empty, we would build a new root node. Even if the tree is not entirely empty, the idea is the same: we build a new node holding the new data value, with empty subtrees, and return it as a promotion.

The parent node, which will receive the return value, will have to deal with the promotion as described above.

The other extreme case is at the root. What if the root node returns something to promote? This is the case in the example above. The root node (h|n) inserts into its right subtree and receives a promotion in return. The root is already full, so this produces another promotion which results in a new root for the tree. This will cause the CDT to need a new value for its root, namely the promoted tree.

Putting this all together, we see that the recursive insertion function will look something like:

treeT RecInsert( ..., treeT tree, ...)
{
    if (tree == NULL) { return NewPromotedTree (...); }

    /* figure out which tree to search using key comparisons */

    promotedNodePtr = RecInsert(...);

    if (promotedNodePtr == NULL) { return NULL; }

    /* deal with promotion (set links/data in this node and promoted node) */

    return promotedNodePtr;  /* will be NULL if previously had 1 data */
}

This will be called from a wrapper which will have something like the following in it:

    promotedNodePtr = RecInsert(..., ttt->root, ...);
    if (promotedNodePtr != NULL) { ttt->root = promotedNodePtr; }

The pseudo-code above is, of course, very sketchy and leaves out a lot of details, but it does demonstrate the algorithm and give a good outline from which to start. Be sure to look at the information on the lab web pages as well. Rob has put together some code for 2-3 trees which should be very helpful in completing this assignment.
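
For comparison, here is one way the outline could be fleshed out, offered as a sketch under simplifying assumptions and not as the required solution. The keys are plain ints instead of client blocks compared through a compFnT, a duplicate key is simply ignored rather than overwritten, and the names nodeT, NewNode, and Insert23 are hypothetical. A promotion is represented as an ordinary one-value node, so the base case and the promotion case share NewNode.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout: nkeys is 1 or 2; child[0..nkeys] are in use. */
typedef struct nodeT {
    int nkeys;
    int key[2];
    struct nodeT *child[3];
} nodeT;

/* Builds a one-value node; a promotion is represented the same way. */
static nodeT *NewNode(int key, nodeT *left, nodeT *right)
{
    nodeT *n = malloc(sizeof *n);

    if (n == NULL) abort();
    n->nkeys = 1;
    n->key[0] = key;
    n->key[1] = 0;
    n->child[0] = left;
    n->child[1] = right;
    n->child[2] = NULL;
    return n;
}

/* Returns a promoted one-value node, or NULL if nothing was promoted. */
static nodeT *RecInsert(nodeT *t, int key)
{
    nodeT *p;
    int i = 0;

    if (t == NULL) return NewNode(key, NULL, NULL);       /* base case */
    assert(t->nkeys == 1 || t->nkeys == 2);
    while (i < t->nkeys && key > t->key[i]) i++;          /* pick subtree i */
    if (i < t->nkeys && key == t->key[i]) return NULL;    /* already there */

    p = RecInsert(t->child[i], key);                      /* the way down */
    if (p == NULL) return NULL;

    if (t->nkeys == 1) {                 /* room here: absorb the promotion */
        if (i == 0) {
            t->key[1] = t->key[0];
            t->child[2] = t->child[1];
            t->key[0] = p->key[0];
        } else {
            t->key[1] = p->key[0];
        }
        t->child[i] = p->child[0];
        t->child[i + 1] = p->child[1];
        t->nkeys = 2;
        free(p);
        return NULL;
    }
    if (i == 0) {                        /* full: old key[0] is the middle */
        int mid = t->key[0];
        t->key[0] = t->key[1];
        t->child[0] = t->child[1];
        t->child[1] = t->child[2];
        t->child[2] = NULL;
        t->nkeys = 1;                    /* t is now the right half */
        return NewNode(mid, p, t);
    }
    if (i == 1) {                        /* full: promoted key is the middle */
        nodeT *right = NewNode(t->key[1], p->child[1], t->child[2]);
        t->child[1] = p->child[0];
        t->child[2] = NULL;
        t->nkeys = 1;                    /* t is now the left half */
        p->child[0] = t;
        p->child[1] = right;
        return p;
    }
    t->child[2] = NULL;                  /* full: old key[1] is the middle */
    t->nkeys = 1;                        /* t is now the left half */
    return NewNode(t->key[1], t, p);
}

/* Wrapper: a promotion out of the root becomes the new root. */
nodeT *Insert23(nodeT *root, int key)
{
    nodeT *p = RecInsert(root, key);
    return (p != NULL) ? p : root;
}
```

Inserting c, h, n, j, q, v in that order into an empty tree should produce the (h|n) tree from the example, and then inserting f and z should produce the tree with root (n). The wrapper returns the root instead of storing it in a CDT only to keep the sketch short.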

Using a Symbol Table

Note that the 2-3 tree is not quite the same as a symbol table, but it can be used to implement a symbol table. Implement hw8-symtab.h (a symbol table) using hw8-ttt.h. Note that you are the implementor of the symbol table and the client of the 2-3 tree simultaneously.

Your program hw7-inv.c should still work with this new implementation of a symbol table, except that there is no delete now, since I am not requiring you to delete from 2-3 trees. In fact, it will have the nice property that the lists will now be in alphabetical order instead of "random". Make a copy of hw7-inv.c and call it hw8-inv.c. Modify it to work with hw8-symtab.h. (If it didn't work quite right last time, fix it.)

Makefiles

I have been providing you with Makefiles for each assignment. For this assignment, you must submit hw8-Makefile which can compile your assignment. Your Makefile should produce the following files:

Academic Honesty and Collaboration

It is reasonable to discuss with others possible general approaches to problems. It is unreasonable to work together on a detailed solution, to copy a solution, or to give away a solution. If your common discussion can be detected by looking at the solutions, then there is too much collaboration. Such instances of academic dishonesty may result in a course grade of F or expulsion from Boston University.

Do not allow your work to be used by others:

Warning: If someone cheats by using your work, you will also be penalized.