Video segmentation is the most fundamental process for appropriate indexing and retrieval of video intervals. In general, video streams are composed of shots delimited by physical shot boundaries. Substantial work has been done on detecting such shot boundaries automatically (Arman et al., 1993) (Zhang et al., 1993) (Zhang et al., 1995) (Kobla et al., 1997). Through the integration of technologies such as image processing, speech/character recognition, and natural language understanding, keywords can be extracted and associated with these shots for indexing (Wactlar et al., 1996). A single shot, however, rarely carries enough information to be meaningful by itself. Usually, it is a semantically meaningful interval that most users are interested in retrieving, and such intervals generally span several consecutive shots. There is hardly any efficient and reliable technique, either automatic or manual, for identifying all semantically meaningful intervals within a video stream. Works by (Smith and Davenport, 1992) (Oomoto and Tanaka, 1993) (Weiss et al., 1995) (Hjelsvold et al., 1996) suggest manually defining all such intervals in the database in advance. However, even an hour-long video may contain an indefinite number of meaningful intervals. Moreover, video data is open to multiple interpretations: given a query, an interval that is meaningful to an annotator may not be meaningful to the user who issues the query. In practice, manual indexing of meaningful intervals is labour intensive and inadequate.
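The shot-boundary detection cited above is commonly based on comparing low-level frame features between consecutive frames, such as colour or grey-level histograms. The following is a minimal sketch of this idea; the histogram bin count, the L1 distance measure, and the cut threshold are illustrative assumptions, not details taken from the cited works:

```python
import numpy as np

def histogram(frame, bins=16):
    # Grey-level histogram, normalised so the frame size does not matter.
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def shot_boundaries(frames, threshold=0.4):
    """Return indices i where a cut is declared between frames[i-1] and frames[i].

    A hard cut is declared when the L1 distance between consecutive
    frame histograms exceeds the (illustrative) threshold.
    """
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]

# Synthetic example: a dark "shot" of 3 frames followed by a bright one.
dark = [np.full((8, 8), 10, dtype=np.uint8)] * 3
bright = [np.full((8, 8), 240, dtype=np.uint8)] * 3
print(shot_boundaries(dark + bright))  # → [3]
```

Such frame-difference methods find physical shot boundaries only; as the paragraph above argues, grouping the resulting shots into semantically meaningful intervals remains the harder, largely unsolved problem.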