**Camilo RostokerNovember 4, 2005**

Financial markets are a fruitful area for data exploration, and complementary to data exploration is the need for an efficient and effective visualization tool. This project aims at developing a visualization tool to aid in the exploration of financial market data. We propose an extension to H3, a navigation and visualization tool for hypberbolic spaces. The financial market data is transformed into a graph which is then partioned into cliques, quasi-cliques and independent sets. This type of partioning leads to sparse, dense clusters of vertices, an ideal candidate for a hypberbolic layout. We also introduce a novel characterization of glyphs within H3 that are used to represent the underlying graph structure without revealing the actual vertices and edges. We test our approach on a dataset consisting of 6556 stocks measurements taken every day over a 500-day period.

Recently, several researchers have proposed novel graph-theoretic approaches to clustering. In [5], researchers have shown an interesting connection between clustering and finding all maximal cliques in a given undirected graph. Other similar approaches generalize the notion of defining clusters as maximal cliques to a definition that allows clusters to be identified as highly-dense, but not necessarily complete, subgraphs[1]. Such dense subgraphs are often referred to as quasi-cliques -- more formally, as γ-quasi-cliques, which specifies that the subgraph must have an edge density of γ (0 < γ < 1). The system we have developed is in need of a visualization tool, and works as follows. First, we perform a robust correlation computation, using the Marona[4] method, to construct a similarity matrix for the data items being explored. We then threshold the resulting correlation matrix to produce a graph, where nodes represent stocks and an edge indicates that the similarity (correlation in this cass) between two stocks exceeds the threshold. We perform Phased Local Search[7], a state-of-the-art Stochastic Local Search algorithm, on the graph to find cliques and independent sets (and soon to include quasi-cliques). Our motivation for this project is recent results showing that, in the context of financial market data, cliques represent groups of stocks that display similar trading patterns, while independent sets represent groups of stocks that are completely diversified from all others in the given dataset[3]. Once the clusters of vertices have been found in the graph (i.e. cliques, quasi-cliques and independents), an appropriate visualization tool must be employed. Depending on the context in which this technique is being used, the accompanying visualization tool will inevitably need to support application-specific constraints and requirements. As mentioned, in our proposed project, we will be looking at financial market data, which corresponds to having stocks as vertices in the graph, and an edge (or lack thereof) representing some relationship between the two stocks. To display the results, we have chosen H3, a tool for layout and interactive navigation of node-link graphs in 3D hyperbolic space[6].

There is a plethora of graph drawing and visualization tools available for both academic research and commercial uses. Most of the publicly available tools are general in the types of graphs they were designed for, but all offer some advantages and disadvantages over others. For example, Tulip[2] is a graph drawing tool with excellent scalability and powerful drawing routines, but the complexity/learning curve lends itself to more advanced users. Graphviz is a toolkit designed for static layouts of small data, is easy to use and has been around for several years, but lacks the interaction and scalability of other graph drawing toolkits. Other systems offer specialised graph drawing features for the target application domain. For example, Cytoscape is a tool that allows complex networks of genes and proteins to be visualized, and includes support for various other domain-specific tasks. H3 is a library that provides layout and interactive navigation of node-link graphs in 3D hyperbolic space. The key features of this system is that the hyperbolic view allows the user to see a great deal of the context around the current focus node.

In our proposed project, the graphs to be visualized will be dense clusters of vertices with edges connecting the clusters. The goal of the visualization will be to identify stocks within clusters (intra-cluster relationships), but also see how stocks within clusters are related to stocks from other clusters (inter-cluster relationships). We therefore believe that the H3 Viewer is an ideal visualization approach for the problem at hand as it will allow the user to focus in on vertices within clusters while keeping the global structure within context (focus+context), and will provide an effective navigational tool for efficiently exploration the market graph. Moreover, clusters outside of the current focus area do not need to be explicitly drawn. A user may want to know about clusters connected to the one in context, and may want to know some basic properties about it, but does not necessarily need to see every single vertex explicitly drawn (especially if they are far from the current focus and would be drawn too small to see anyways). We propose an extension to H3 that will display a summary of a cluster using a glyph which represents the useful properties of the underlying graph structure (i.e. the max-clique, quasi-clique or indendepent set). The 2 summary, which will be an extra vertex in the graph, will then be connected to the actual vertices forming the cluster which it represents. Furthermore, we can extend the keyboard commands to allow a user to explicitly control the appearance of the graph, such as which clusters are drawn, which edges within the clusters to draw, etc.

November 16 | Finished specifying concepts to be implemented Starting implementation |

November 30 | Prototype of concepts implemented |

December 10 | Working implementation of all concepts Start writeup |

December 16 | Project write complete Start presentation |

November 18 | Presentation complete |

We have proposed H3 as a visualization solution to the specific problem of exploring clusters within the market graph. We chose H3 because it allows a focus+context view of the overall graph structure, with an intuitive and simple interface for interactively navigating through the graph. We also have identified possible extensions to H3 particularly well-suited for viewing large graphs with highly dense but sparse clusters in the graph.

- Javed A. Aslam, Ekaterina Pelekhov, and Daniela Rus. The star clustering algorithm for
static and dynamic information organization. Journal of Graph Algorithms and Applications,
8(1):95-121, 2004.

- D. Auber. Tulip. In P. Mutzel, M. Junger, and S. Leipert, editors, 9th Symp. Graph Drawing,
volume 2265 of Lecture Notes in Computer Science, pages 335-337. Springer-Verlag, 2001.

- Vladimir Boginski, Sergiy Butenko, and Panos M. Pardalos. Mining market data: A network
approach.

- James Chilson, Raymond Ng, Alan Wagner, and Ruben Zamar. Parallel computation of
high dimensional robust correlation and covariance matrices. In KDD 04: Proceedings of
the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining,
pages 533-538, New York, NY, USA, 2004. ACM Press.

- F. Glover, B. Alidaee, and H. Wang. Clustering of microarray data via clique partitioning.
Journal of Combinatorial Optimization, 2005.

- Tamara Munzner. H3: Laying out large directed graphs in 3d hyperbolic space. In Proceedings
of the 1997 IEEE Symposium on Information Visualization, pages 2-10, 1997.

- Wayne Pullan. Phased local search. TBA.