We define the document set exploration task as the production of an application-specific categorization. Computers can help by producing visualizations of the semantic relationships in the corpus, but the approach of directly visualizing the vector space representation of the document set via multidimensional scaling (MDS) algorithms fails to reveal most of the structure because such datasets are mostly disconnected, that is, the preponderance of inter-item distances are large and roughly equal. Interpreting these large distances as disconnection between items yields a decomposition of the dataset into distinct components with small inter-item distances, that is, clusters. We propose the Disconnected Component Tree (DiscoTree) as a data structure that encapsulates the hierarchical relationship between components as the disconnection threshold changes, and present a sampling-based algorithm for efficiently computing the DiscoTree in O(N) time where N is the number of items. We present the MoDiscoTag application which combines the DiscoTree with an MDS view and a tagging system, and show that it succeeds in resolving structure that is not visible using previous dimensionality reduction methods. We validate our approach with real-world datasets of WikiLeaks Cables and War Logs from the journalism domain.
Abstract: In sampled data systems the controller receives periodically sampled state feedback about the evolution of a continuous time plant, and must choose a constant control signal to apply between these updates; however, unlike purely discrete time models the evolution of the plant between updates is important. In contrast, for systems with nonlinear dynamics existing reachability algorithms---based on Hamilton-Jacobi equations or viability theory---assume continuous time state feedback and the ability to instantaneously adjust the input signal. In this paper we describe an algorithm for determining an implicit surface representation of minimal backwards reach tubes for nonlinear sampled data systems, and then construct switched, set-valued feedback controllers which are permissive but ensure safety for such systems. The reachability algorithm is adapted from the Hamilton-Jacobi formulation proposed in Ding & Tomlin . We show that this formulation is conservative for sampled data systems. We implement the algorithm using approximation schemes from level set methods, and demonstrate it on a modified double integrator.
Despite an abundance of technical literature on dimension reduction (DR), our understanding of how real data analysts are using DR techniques and what problems they face remains largely incomplete. In this paper, we contribute the first systematic and broad analysis of DR usage by a sample of real data analysts, along with their needs and problems. We present the results of a two-year qualitative research endeavor, in which we iteratively collected and analyzed a rich corpus of data in the spirit of grounded theory. We interviewed 24 data analysts from different domains and surveyed papers depicting applications of DR. The result is a descriptive taxonomy of DR usage, and concrete real-world usage examples summarized in terms of this taxonomy. We also identify seven gaps where user DR needs are unfulfilled by currently available techniques, and three mismatches where the users do not need offered techniques. At the heart of our taxonomy is a task classification that differentiates between abstract tasks related to point clusters and those related to dimensions. The taxonomy and usage examples are intended to provide a better descriptive understanding of real data analystsí practices and needs with regards to DR. The gaps are intended as prescriptive pointers to future research directions, with the most important gaps being a lack of support for users without expertise in the mathematics of DR, and an absence of DR techniques for comparing explicit groups of dimensions or for relating non-linear embeddings to original dimensions.
Extracting ontological relationships (e.g., isa and hasa) from free-text repositories (e.g., engineering documents and in- struction manuals) can improve usersí queries, as well as benefit applications built for these domains. Current methods to extract ontologies from text usually miss many meaningful relationships because they either con- centrate on single-word terms and short phrases or neglect syntactic relationships between concepts in sentences. We propose a novel pattern-based algorithm to find onto- logical relationships between complex concepts by exploit- ing parsing information to extract multi-word concepts and nested concepts. Our procedure is iterative: we tailor the constrained sequential pattern mining framework to discover new patterns. Our experiments on three real data sets show that our algorithm consistently and significantly outperforms previous representative ontology extraction algorithms.
We introduce a method for learning low-dimensional linear feedback strategies for the control of physics-based animated characters. Once learned, these allow simulated characters to respond to changes in the environment and changes in goals. The approach is based on policy search in the space of reduced-order linear output feedback matrices. We show that these can be used to replace or further reduce manually-designed state and action abstractions. The approach is sufficiently general to allow for the development of unconventional feedback loops, such as feedback based on ground reaction forces to achieve robust in-place balancing and robust walking. Results are demonstrated for a mix of 2D and 3D systems, including tilting-platform balancing, walking, running, rolling, targeted kicks, and several types of ball-hitting tasks.
If you have any questions or comments regarding this page please send mail to email@example.com.