XML schema

Computer/Terms 2008. 4. 21. 18:44

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntax constraints imposed by XML itself. An XML schema provides a view of the document type at a relatively high level of abstraction.
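To make "constraints above and beyond the basic syntax" concrete, here is a minimal sketch in Python. It does not use a real schema language: the `note` document type, its rules, and the `check_note` function are hypothetical, hand-written stand-ins for what a schema would declare.

```python
import xml.etree.ElementTree as ET


def check_note(xml_text):
    """Return a list of violations of a hypothetical 'note' document type."""
    # ET.fromstring enforces only XML's own syntax rules (well-formedness).
    root = ET.fromstring(xml_text)
    problems = []
    # Structural constraint: the root element must be <note>.
    if root.tag != "note":
        problems.append("root element must be <note>")
    # Structural constraint: exactly one <to> followed by one <body>.
    if [child.tag for child in root] != ["to", "body"]:
        problems.append("<note> must contain exactly <to> then <body>")
    # Content constraint: <to> must carry some text.
    for child in root:
        if child.tag == "to" and not (child.text or "").strip():
            problems.append("<to> must not be empty")
    return problems


valid = "<note><to>Ann</to><body>Hi</body></note>"
well_formed_but_invalid = "<note><body>Hi</body></note>"
print(check_note(valid))
print(check_note(well_formed_but_invalid))
```

Both documents parse, because both are well-formed XML; only the first satisfies the extra rules that a schema for this document type would express.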

There are languages developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language that is of relatively limited capability, but that also has other uses in XML aside from the expression of schemas. Two other very popular, more expressive XML schema languages are XML Schema (W3C) and RELAX NG.

The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means.

Reference:
http://en.wikipedia.org/wiki/XML_schema

Posted by unknown user

Binary search tree

Computer/Terms 2008. 4. 18. 15:46

In computer science, a binary search tree (BST) is a binary tree data structure which has the following properties:

- each node (item in the tree) has a value;
- a total order (linear order) is defined on these values;
- the left subtree of a node contains only values less than the node's value;
- the right subtree of a node contains only values greater than or equal to the node's value.
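The properties above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class with `value`, `left`, and `right` attributes (these names are not from the original text). It passes down the range of values each subtree is allowed to hold.

```python
class Node:
    """Hypothetical node type used only for this illustration."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def is_bst(node, low=None, high=None):
    """Check the BST properties for the subtree rooted at node.

    low is an inclusive lower bound and high an exclusive upper bound,
    matching "left: less than" and "right: greater than or equal to".
    """
    if node is None:
        return True
    if low is not None and node.value < low:
        return False
    if high is not None and node.value >= high:
        return False
    # Everything on the left must be < node.value,
    # everything on the right must be >= node.value.
    return (is_bst(node.left, low, node.value)
            and is_bst(node.right, node.value, high))


good = Node(5, Node(3), Node(8))
bad = Node(5, Node(3, None, Node(7)), None)  # 7 sits in the left subtree of 5
print(is_bst(good), is_bst(bad))
```

Checking only a node against its direct children would wrongly accept the second tree, which is why the bounds are carried down the recursion.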

The major advantage of binary search trees over other data structures is that searching, and the related sorting algorithms based on in-order traversal, can be very efficient.

Binary search trees can choose to allow or disallow duplicate values, depending on the implementation.

Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

Operations
Operations on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine that computes the total order (linear order) on any two values. This comparator can be explicitly or implicitly defined, depending on the language in which the BST is implemented.
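As an illustration of an explicitly defined comparator, here is a small Python sketch. The function names `compare_ci` and `contains`, and the `(left, value, right)` tuple representation of nodes, are assumptions made for this example. The comparator returns a negative number, zero, or a positive number, and the lookup never compares values directly.

```python
def compare_ci(a, b):
    """Case-insensitive three-way comparison of two strings."""
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)  # -1, 0, or 1


def contains(tree, target, compare):
    """Look up target in a tree of (left, value, right) tuples."""
    while tree is not None:
        left, value, right = tree
        order = compare(target, value)
        if order == 0:
            return True
        tree = left if order < 0 else right
    return False


# The tree is ordered according to compare_ci, not the default string order.
tree = ((None, "apple", None), "Mango", (None, "pear", None))
print(contains(tree, "MANGO", compare_ci))
print(contains(tree, "kiwi", compare_ci))
```

The tree must be built with the same comparator that is later used to search it; mixing orderings would send the search down the wrong branch.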

Searching
Searching a binary search tree for a specific value can be done recursively or iteratively. This explanation covers the recursive method.

We begin by examining the root node. If the value we are searching for equals the root, the value exists in the tree. If the value we are searching for is less than the root, it must be in the left subtree. Similarly, if it is greater than the root, then it must be in the right subtree.

This process is repeated on each subsequent node until the value is found or we reach an empty subtree, that is, we step past a leaf node. If that happens, the value is not present in the tree.

Here is the search algorithm in the Python programming language:

def search_binary_tree(node, key):
     if node is None:
         return None  # key not found
     if key < node.key:
         return search_binary_tree(node.left, key)
     elif key > node.key:
         return search_binary_tree(node.right, key)
     else:  # key is equal to node key
         return node.value  # found key

This operation requires O(log n) time in the average case, but needs O(n) time in the worst case, when the unbalanced tree resembles a linked list (a degenerate tree).
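The iterative alternative mentioned earlier can be sketched as follows. The `TreeNode` class here is an assumption: the text never defines it, so this version simply stores `left`, `key`, `value`, and `right`, and the function name `search_iterative` is likewise made up for this example.

```python
class TreeNode:
    """Assumed node layout: left subtree, key, value, right subtree."""

    def __init__(self, left, key, value, right):
        self.left = left
        self.key = key
        self.value = value
        self.right = right


def search_iterative(node, key):
    """Follow a single path from the root; return the value or None."""
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node.value  # found key
    return None  # walked off the tree: key not found


root = TreeNode(TreeNode(None, 2, "two", None), 5, "five",
                TreeNode(None, 9, "nine", None))
print(search_iterative(root, 9))
print(search_iterative(root, 4))
```

The loop visits the same nodes as the recursive version but needs no call stack, which matters for degenerate trees whose depth approaches n.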

Insertion
Insertion begins as a search would begin; if the root is not equal to the value, we search the left or right subtrees as before. Eventually, we will reach an external node and add the value as its right or left child, depending on the node's value. In other words, we examine the root and recursively insert the new node to the left subtree if the new value is less than the root, or the right subtree if the new value is greater than or equal to the root.

Here's how a typical binary search tree insertion might be performed in C++:

/* Inserts the node pointed to by "newNode" into the subtree rooted at "treeNode" */
 void InsertNode(struct node *&treeNode, struct node *newNode)
 {
     if (treeNode == NULL)
       treeNode = newNode;
     else if (newNode->value < treeNode->value)
       InsertNode(treeNode->left, newNode);
     else
       InsertNode(treeNode->right, newNode);
 }

The above "destructive" procedural variant modifies the tree in place. It allocates no new nodes (although the recursion, as written, uses stack space proportional to the depth reached), but the previous version of the tree is lost. Alternatively, as in the following Python example, we can reconstruct all ancestors of the inserted node; any reference to the original tree root remains valid, making the tree a persistent data structure:

def binary_tree_insert(node, key, value):
     if node is None:
         return TreeNode(None, key, value, None)
 
     if key == node.key:
         return TreeNode(node.left, key, value, node.right)
     if key < node.key:
         return TreeNode(binary_tree_insert(node.left, key, value), node.key, node.value, node.right)
     else:
         return TreeNode(node.left, node.key, node.value, binary_tree_insert(node.right, key, value))

The part that is rebuilt uses Θ(log n) space in the average case and O(n) space in the worst case (see big-O notation).
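The persistence property can be observed directly. This sketch repeats the insertion function and adds a `TreeNode` class, which the text uses but never defines; its field layout here is an assumption.

```python
class TreeNode:
    """Assumed node layout: left subtree, key, value, right subtree."""

    def __init__(self, left, key, value, right):
        self.left = left
        self.key = key
        self.value = value
        self.right = right


def binary_tree_insert(node, key, value):
    if node is None:
        return TreeNode(None, key, value, None)
    if key == node.key:
        return TreeNode(node.left, key, value, node.right)
    if key < node.key:
        return TreeNode(binary_tree_insert(node.left, key, value),
                        node.key, node.value, node.right)
    return TreeNode(node.left, node.key, node.value,
                    binary_tree_insert(node.right, key, value))


old = None
for k in (5, 3, 8):
    old = binary_tree_insert(old, k, str(k))

new = binary_tree_insert(old, 9, "9")

print(old.right.right)        # the old version has no node for 9
print(new.right.right.key)    # the new version does
print(new.left is old.left)   # the untouched left subtree is shared
```

Only the nodes on the path from the root to the insertion point are copied; every other subtree is shared between the old and new versions.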

In either version, this operation requires time proportional to the height of the tree in the worst case. The height is O(log n) in the average case over all trees, but O(n) in the worst case.
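To see how much the height depends on insertion order, the following sketch builds two trees from the same 15 keys. The `Node`, `insert`, `height`, and `build` names are made up for this example, and height is counted in nodes along the longest path.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key in place (iteratively) and return the root."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_keys = list(range(1, 16))
level_order = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
print(height(build(sorted_keys)))   # degenerate: one node per level
print(height(build(level_order)))   # perfectly balanced
```

The sorted input produces a chain of height 15, while the level-by-level order produces the minimum possible height of 4 for 15 keys.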

Another way to explain insertion is that in order to insert a new node in the tree, its value is first compared with the value of the root. If its value is less than the root's, it is then compared with the value of the root's left child. If its value is greater, it is compared with the root's right child. This process continues, until the new node is compared with a leaf node, and then it is added as this node's right or left child, depending on its value.

There are other ways of inserting nodes into a binary tree, but this is the only way of inserting nodes at the leaves and at the same time preserving the BST structure.

Deletion
There are several cases to be considered:

- Deleting a leaf: Deleting a node with no children is easy, as we can simply remove it from the tree.
- Deleting a node with one child: Delete it and replace it with its child.
- Deleting a node with two children: Suppose the node to be deleted is called N. We replace the value of N with the value of either its in-order successor (the left-most node of the right subtree) or its in-order predecessor (the right-most node of the left subtree).

Once we have found the in-order successor or predecessor, we copy its value into N and then delete the successor or predecessor node itself. Since both the successor and the predecessor must have fewer than two children, either one can be deleted using the previous two cases. A good implementation avoids consistently using one of these nodes, however, because this can unbalance the tree.
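The three cases can be sketched compactly in Python as a function that returns the new root of the subtree it was given. This is only an illustration under stated assumptions: the `Node` class and the `delete` and `in_order` names are hypothetical, keys are assumed to be distinct, and the successor is always used for the two-children case.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def delete(node, key):
    """Remove key from the subtree rooted at node; return the new root."""
    if node is None:
        return None  # key not present: nothing to do
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:
        return node.right  # leaf, or only a right child
    elif node.right is None:
        return node.left   # only a left child
    else:
        # Two children: copy the in-order successor's key here,
        # then delete the successor from the right subtree.
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return node


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


tree = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(7), Node(9)))
tree = delete(tree, 2)   # leaf
tree = delete(tree, 8)   # two children
tree = delete(tree, 5)   # the root, two children
print(in_order(tree))
```

Returning the subtree root from each call avoids parent pointers and handles deletion of the root without a special case.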

Here is C++ sample code for a destructive version of deletion. (We assume the node to be deleted has already been located using search.)

void DeleteNode(struct node * & node) {
     if (node->left == NULL) {
         struct node *temp = node;
         node = node->right;
         delete temp;
     } else if (node->right == NULL) {
         struct node *temp = node;
         node = node->left;
         delete temp;
     } else {
         // In-order predecessor (rightmost child of left subtree)
         // Node has two children - get max of left subtree
         struct node **temp = &node->left; // get left node of the original node
 
         // find the rightmost child of the subtree of the left node
         while ((*temp)->right != NULL) {
             temp = &(*temp)->right;
         }
 
         // copy the value from the in-order predecessor to the original node
         node->value = (*temp)->value;
 
         // then delete the predecessor
         DeleteNode(*temp);
     }
}

Although this operation does not always traverse the tree down to a leaf, this is always a possibility; thus in the worst case it requires time proportional to the height of the tree. It does not require more even when the node has two children, since it still follows a single path and does not visit any node twice.

Here is a version in Python, written as methods of a node class whose nodes keep a reference to their parent:

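def findSuccessor(self):
    # Return the node with the next-larger key, or None if there is none.
    succ = None
    if self.rightChild:
        succ = self.rightChild.findMin()
    elif self.parent:
        if self.parent.leftChild == self:
            succ = self.parent
        else:
            # Temporarily detach self so the parent looks upward, not back down.
            self.parent.rightChild = None
            succ = self.parent.findSuccessor()
            self.parent.rightChild = self
    return succ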
 
def findMin(self):
    # Walk left until there is no smaller key.
    n = self
    while n.leftChild:
        n = n.leftChild
    return n
 
def spliceOut(self):
    # Unlink a node that has at most one child; assumes self.parent exists.
    child = self.leftChild or self.rightChild  # None for a leaf
    if self is self.parent.leftChild:
        self.parent.leftChild = child
    else:
        self.parent.rightChild = child
    if child:
        child.parent = self.parent  # keep the parent reference consistent
 
def binary_tree_delete(self, key):
    # Assumes the node being removed is not the root (it must have a parent).
    if self.key == key:
        if self.leftChild and self.rightChild:
            # Two children: unlink the in-order successor, then move it here.
            succ = self.rightChild.findMin()
            succ.spliceOut()
            if self is self.parent.leftChild:
                self.parent.leftChild = succ
            else:
                self.parent.rightChild = succ
            succ.parent = self.parent
            succ.leftChild = self.leftChild
            succ.rightChild = self.rightChild
            if succ.leftChild:
                succ.leftChild.parent = succ
            if succ.rightChild:
                succ.rightChild.parent = succ
        else:
            # Leaf or single child: link the parent to the child, if any.
            self.spliceOut()
    elif key < self.key:
        if self.leftChild:
            self.leftChild.binary_tree_delete(key)
        else:
            print("trying to remove a nonexistent node")
    else:
        if self.rightChild:
            self.rightChild.binary_tree_delete(key)
        else:
            print("trying to remove a nonexistent node")

Traversal
Once the binary search tree has been created, its elements can be retrieved in order by recursively traversing the left subtree of the root node, accessing the node itself, then recursively traversing the right subtree of the node, continuing this pattern with each node in the tree as it's recursively accessed. The tree may also be traversed in pre-order or post-order traversals.

def traverse_binary_tree(treenode):
     if treenode is None: return
     left, nodevalue, right = treenode
     traverse_binary_tree(left)
     visit(nodevalue)
     traverse_binary_tree(right)

Traversal requires Ω(n) time, since it must visit every node. This algorithm is also O(n), and so it is asymptotically optimal.
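The three traversal orders differ only in where the node itself is visited relative to its subtrees. A minimal sketch using Python generators, assuming the same `(left, value, right)` tuple representation as the traversal code above (the function names are made up for this example):

```python
def in_order(tree):
    """Left subtree, node, right subtree: yields values in sorted order."""
    if tree is not None:
        left, value, right = tree
        yield from in_order(left)
        yield value
        yield from in_order(right)


def pre_order(tree):
    """Node first, then left and right subtrees."""
    if tree is not None:
        left, value, right = tree
        yield value
        yield from pre_order(left)
        yield from pre_order(right)


def post_order(tree):
    """Both subtrees first, then the node."""
    if tree is not None:
        left, value, right = tree
        yield from post_order(left)
        yield from post_order(right)
        yield value


tree = ((None, 1, None), 2, (None, 3, None))
print(list(in_order(tree)))
print(list(pre_order(tree)))
print(list(post_order(tree)))
```

Generators let the caller consume values lazily instead of supplying a `visit` callback; each traversal still touches every node exactly once.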

Sort
A binary search tree can be used to implement a simple but inefficient sorting algorithm. Similar to heapsort, we insert all the values we wish to sort into a new ordered data structure — in this case a binary search tree — and then traverse it in order, building our result:

def build_binary_tree(values):
     tree = None
     for v in values:
         # assumes a key-only binary_tree_insert that builds (left, value, right) tuples
         tree = binary_tree_insert(tree, v)
     return tree

def traverse_binary_tree(treenode):
     if treenode is None: return []
     else:
         left, value, right = treenode
         # concatenate the pieces into one flat, sorted list
         return traverse_binary_tree(left) + [value] + traverse_binary_tree(right)

The worst-case time of build_binary_tree is Θ(n²): if you feed it a sorted list of values, it chains them into a linked list with no left subtrees. For example, build_binary_tree([1, 2, 3, 4, 5]) yields the tree (None, 1, (None, 2, (None, 3, (None, 4, (None, 5, None))))).
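A self-contained version of this sort, including a tuple-based insertion function, might look like the sketch below. The names `insert`, `build`, `flatten`, and `tree_sort` are chosen for this example only; equal values go to the right subtree, as in the definition at the top of this article.

```python
def insert(tree, value):
    """Return a new (left, value, right) tuple tree containing value."""
    if tree is None:
        return (None, value, None)
    left, root, right = tree
    if value < root:
        return (insert(left, value), root, right)
    return (left, root, insert(right, value))  # >= goes right


def build(values):
    tree = None
    for v in values:
        tree = insert(tree, v)
    return tree


def flatten(tree):
    """In-order traversal that returns one flat list."""
    if tree is None:
        return []
    left, value, right = tree
    return flatten(left) + [value] + flatten(right)


def tree_sort(values):
    return flatten(build(values))


print(tree_sort([3, 1, 2, 3, 0]))
print(build([1, 2, 3, 4, 5]))  # sorted input: every left subtree is None
```

Because both functions are recursive, a long already-sorted input can also exceed Python's default recursion limit, which is another practical consequence of the degenerate shape.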

There are several schemes for overcoming this flaw with simple binary trees; the most common is the self-balancing binary search tree. If this same procedure is done using such a tree, the overall worst-case time is O(n log n), which is asymptotically optimal for a comparison sort. In practice, the poor cache performance and added overhead in time and space for a tree-based sort (particularly for node allocation) make it inferior to other O(n log n) sorts such as heapsort, or quicksort in its average case, for static list sorting. On the other hand, it is one of the most efficient methods of incremental sorting, adding items to a list over time while keeping the list sorted at all times.

Reference:
http://en.wikipedia.org/wiki/Binary_search_tree

Posted by unknown user

The Smart Common Input Method platform (SCIM) is an input method (IM) platform containing support for more than thirty languages (CJK and many European languages) for POSIX-style operating systems including Linux and BSD.

SCIM is a development platform intended to reduce the time needed to develop input methods. It uses a clear architecture and provides a simple and powerful programming interface.

SCIM is a common IM platform written in the C++ language. It abstracts the input method interface into several classes and attempts to make these classes simpler and more independent of each other. With the simpler and more independent interfaces, developers can write their own input methods in fewer lines of code.

SCIM is a modularized IM platform: components can be implemented as dynamically loadable modules and loaded at runtime as needed. For example, input methods written for SCIM can be IMEngine modules, and users can combine such IMEngine modules with different interface modules (FrontEnd) in different environments without rewriting or recompiling the IMEngine modules, which reduces the compile time and development time of the project.

SCIM is a high-level library, similar to XIM or IIIMF; however, SCIM claims to be simpler than either of those IM platforms. SCIM also claims that it can be used alongside XIM or IIIMF. SCIM can also be used to extend the input method interface of existing application toolkits, such as GTK+2 and Qt, via IMmodules.

Reference:
http://en.wikipedia.org/wiki/Scim

Posted by unknown user