Manufacturing and design processes often generate large-scale data sets with many numeric and nominal attributes. Discovering and predicting the hidden patterns and relationships among these attributes is pivotal for identifying the factors that most affect the manufacturing process and for improving production quality accordingly. This ongoing project targets this issue by building a robust data mining system that can efficiently handle high-dimensional, large data sets. The techniques we have investigated include decision trees, decision rules, neural networks, and other traditional machine learning methods. Our current interest is in applying adaptive evolutionary computational methods to solve complex problems and evolve complex systems. One promising approach we are pursuing uses the Gene Expression Programming (GEP) algorithm. A member of the family of genetic algorithms (GAs), GEP is a recently developed evolutionary algorithm capable of evolving computer programs and predicting mathematical functions from experimental data. Because of its linear chromosome representation and its separation of the solution and search spaces, GEP dramatically improves upon traditional genetic programming in complexity and time efficiency, and can solve various types of modeling and optimization problems. Our experiments on multi-category pattern classification problems have demonstrated GEP's capability to mine classification rules that are as accurate as, but more compact than, those produced by traditional machine learning algorithms. Further efforts include adding incremental learning to the algorithm to improve its performance and applying our data mining tools to more practical manufacturing problems.
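To make the mechanism concrete, here is a minimal Python sketch of how a GEP chromosome, a linear string in Karva notation, is decoded breadth-first into an expression tree and evaluated. The gene string, function set, and input values are illustrative assumptions, not taken from our system.

    import operator

    # Function set with arities; terminals are the variables 'a' and 'b'.
    FUNCS = {'+': (operator.add, 2), '-': (operator.sub, 2),
             '*': (operator.mul, 2)}

    def decode(gene):
        """Decode a K-expression breadth-first into [symbol, children] nodes."""
        nodes = [[sym, []] for sym in gene]
        queue, i = [nodes[0]], 1          # the first symbol is the root
        while queue:
            node = queue.pop(0)
            arity = FUNCS[node[0]][1] if node[0] in FUNCS else 0
            for _ in range(arity):        # children are taken left to right
                node[1].append(nodes[i])
                queue.append(nodes[i])
                i += 1
        return nodes[0]

    def evaluate(node, env):
        sym, children = node
        if sym in FUNCS:
            fn, _ = FUNCS[sym]
            return fn(*(evaluate(c, env) for c in children))
        return env[sym]                   # terminal: variable lookup

    # The gene's tail guarantees that any head decodes to a valid program --
    # the property that keeps GEP's search space well-formed.
    gene = '*+a-abba'                     # hypothetical chromosome
    print(evaluate(decode(gene), {'a': 3.0, 'b': 1.0}))  # ((b-b)+a)*a = 9.0

Because the chromosome stays a fixed-length string while the decoded tree varies in shape, genetic operators such as mutation and crossover always yield syntactically valid programs.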
Our interest in natural language processing centers on using machine learning methods to accomplish tasks that today depend mainly on human effort. Summarization is one such task, and it is hard to imagine everyday life without it: morning and evening traffic reports are summaries, news headlines are summaries, a movie trailer can be regarded as the summary of a film, and the abstract of a scientific article is perhaps the best representative. Email may be one of the greatest inventions of the 20th century, yet some of us receive hundreds of messages per day and cannot read them all, since we have other things to take care of; still, we do not want to miss important information. The same dilemma occurs in reviewing scientific articles, browsing web pages, and reading newspapers and magazines. Today's advanced information technology often gives us not a life of ease but more tension. Automated text summarization, our current focus, aims to alleviate this information overload: it condenses the content of a document and presents the most salient parts to users, who can then decide whether to read the whole document.
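As a simple illustration of the extractive approach, the sketch below scores sentences by the average frequency of their content words and surfaces the top-scoring ones. It is a toy example rather than our actual summarizer; the stop-word list and test text are invented.

    import re
    from collections import Counter

    STOP = {'the', 'a', 'an', 'of', 'to', 'and', 'in', 'is', 'it',
            'are', 'we', 'can', 'so'}

    def summarize(text, k=2):
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        words = [w for w in re.findall(r'[a-z]+', text.lower())
                 if w not in STOP]
        freq = Counter(words)

        def score(s):
            toks = [w for w in re.findall(r'[a-z]+', s.lower())
                    if w not in STOP]
            return sum(freq[w] for w in toks) / max(len(toks), 1)

        best = sorted(sentences, key=score, reverse=True)[:k]
        # Emit the chosen sentences in their original order.
        return ' '.join(s for s in sentences if s in best)

    doc = ("Traffic reports are summaries. News headlines are summaries. "
           "An abstract condenses a scientific article so readers can decide "
           "whether to read the whole paper.")
    print(summarize(doc, k=1))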
The design of routing schemes has dominated discussions in the computer networking community for a long time, which underlines the importance of the topic. Although almost all routing protocols have been formulated from detailed functional parameters of the system, there is no clear definition of optimality; evaluation is typically done by simulating a network and testing the protocol on it. In this work, we present a decision-theoretic approach to modeling a network routing protocol that would function optimally within the framework. This formal definition of optimality would let us compute ε-optimal routing algorithms, with the various existing routing protocols falling out as different policies in the framework. Exchanging information is an essential part of any routing protocol; the objective is to let the system achieve what we term "sufficient global knowledge" to operate optimally. Network routing is not an easy problem: the system is highly dynamic, and the decision-making component must be highly responsive. This study describes how both static and dynamic routing protocols can be modeled using the Markov decision process framework. We tackle the problem in stages, modeling the perfect-information scenario first, proceeding to an uncertain model, and then searching for optimal information-sharing strategies that improve the performance of the underlying data routing mechanism.
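As an illustration, the sketch below casts routing to a fixed destination as an MDP: states are routers, actions are next hops, costs are assumed link delays, and value iteration produces an optimal policy, i.e., a routing table. For brevity the delays are deterministic; a stochastic model would replace each cost term with an expectation over transition probabilities. The topology is hypothetical.

    LINKS = {  # node -> {neighbor: expected link delay}
        'A': {'B': 2.0, 'C': 5.0},
        'B': {'A': 2.0, 'C': 1.0, 'D': 4.0},
        'C': {'A': 5.0, 'B': 1.0, 'D': 2.0},
        'D': {},                               # destination (absorbing)
    }

    def value_iteration(dest='D', eps=1e-6):
        V = {n: 0.0 for n in LINKS}            # expected delay to dest
        while True:
            delta, new = 0.0, {}
            for n in LINKS:
                if n == dest or not LINKS[n]:
                    new[n] = 0.0
                    continue
                new[n] = min(d + V[m] for m, d in LINKS[n].items())
                delta = max(delta, abs(new[n] - V[n]))
            V = new
            if delta < eps:
                break
        # The greedy next hop under V is the routing-table entry.
        policy = {n: min(LINKS[n], key=lambda m: LINKS[n][m] + V[m])
                  for n in LINKS if LINKS[n] and n != dest}
        return V, policy

    V, policy = value_iteration()
    print(policy)   # {'A': 'B', 'B': 'C', 'C': 'D'}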
Partially Observable Markov Decision Processes (POMDPs) have proven very useful for the decision making of single agents whose environment can be modeled as a stochastic process. An additional complication arises when many intelligent, self-interested agents work in the same environment: how should an agent model another intelligent agent? This work aims to address that question by proposing a new general framework in which an agent can model another "intelligent" agent working in the same environment.
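Central to any such framework is the POMDP belief update: after taking action a and observing o, the agent revises its belief over hidden states by Bayes' rule. The sketch below shows this update; the two-state transition and observation matrices are hypothetical.

    def belief_update(b, a, o, T, O):
        """b'(s2) is proportional to O[a][s2][o] * sum_s1 T[a][s1][s2] * b[s1]."""
        states = range(len(b))
        new = [O[a][s2][o] * sum(T[a][s1][s2] * b[s1] for s1 in states)
               for s2 in states]
        z = sum(new)                      # normalizer: probability of seeing o
        return [p / z for p in new]

    # Hypothetical 2-state, 1-action, 2-observation problem.
    T = [[[0.9, 0.1], [0.2, 0.8]]]        # T[a][s1][s2]
    O = [[[0.7, 0.3], [0.1, 0.9]]]        # O[a][s2][o]
    b = [0.5, 0.5]
    print(belief_update(b, a=0, o=1, T=T, O=O))   # belief shifts toward state 1

In a multi-agent setting, the hidden state can be extended to include the other agent's model, so the same update tracks beliefs about what the other agent believes and intends.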
We have developed an intelligent automated DNA restriction mapping tool useful in problems related to the worldwide Human Genome Project. The purpose of the system is to map restriction enzyme cutting sites back onto the DNA molecule by determining the original order of the digest segments. This is difficult because segment lengths are not known exactly. The tool, which is available for public download and use, combines Pratt's separation theory, Dempster-Shafer evidential reasoning, and heuristic search to find the proper arrangement of digest segments.
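The combinatorial core of the task is the double digest problem: find orderings of the enzyme-A and enzyme-B fragments whose combined cut sites reproduce the measured double-digest fragments within a length tolerance (since segment lengths are not known exactly). The sketch below brute-forces a tiny invented instance; our tool layers evidential reasoning and heuristics over far smarter search than this enumeration.

    from itertools import permutations

    def cuts(frags):
        """Cumulative cut positions implied by an ordering of fragments."""
        pos, out = 0, []
        for f in frags[:-1]:
            pos += f
            out.append(pos)
        return out

    def consistent(a, b, ab, tol=1.0):
        """Do the merged cut sites of orderings a and b explain digest ab?"""
        sites = sorted(set(cuts(a)) | set(cuts(b)))
        bounds = [0] + sites + [sum(a)]
        implied = sorted(bounds[i + 1] - bounds[i]
                         for i in range(len(bounds) - 1))
        return (len(implied) == len(ab) and
                all(abs(x - y) <= tol for x, y in zip(implied, sorted(ab))))

    A  = [3.0, 6.0, 8.0]           # enzyme A digest fragment lengths
    B  = [5.0, 12.0]               # enzyme B digest
    AB = [3.0, 2.0, 4.0, 8.0]      # double (A+B) digest

    # Mirror-image orderings also match, as expected for a linear molecule.
    for pa in permutations(A):
        for pb in permutations(B):
            if consistent(list(pa), list(pb), AB):
                print('A order:', pa, ' B order:', pb)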
Generating small models for which very little training data may exist presents a crucial problem in computational biology: the trade-off between model specificity and under-fitting the data. There is a bevy of powerful modeling techniques; however, certain domain-specific problems, such as modeling the regulatory regions in intergenic DNA, impose constraints on the modeling process due to the lack of sufficient data. We have developed a general motif modeling system (named "hendrix") whose purpose is to model short, gapless motifs. Once trained, these motifs can be used to search new data.
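A standard representation for a short gapless motif is a position weight matrix (PWM). The sketch below trains one from a handful of aligned sites, using pseudocounts to guard against under-fitting when data are scarce, and then slides it along new sequence to score candidate sites. The sites and background model are invented; hendrix itself may differ in detail.

    import math

    ALPHA = 'ACGT'

    def train_pwm(sites, pseudo=0.5):
        """One probability column per motif position, with pseudocounts."""
        pwm = []
        for i in range(len(sites[0])):
            col = [s[i] for s in sites]
            total = len(col) + 4 * pseudo
            pwm.append({b: (col.count(b) + pseudo) / total for b in ALPHA})
        return pwm

    def scan(seq, pwm, background=0.25):
        """Log-odds score of every window of the motif's width in seq."""
        w = len(pwm)
        for i in range(len(seq) - w + 1):
            window = seq[i:i + w]
            score = sum(math.log2(pwm[j][b] / background)
                        for j, b in enumerate(window))
            yield i, window, score

    sites = ['TATAAT', 'TATGAT', 'TACAAT', 'TATAAT']  # toy aligned instances
    pwm = train_pwm(sites)
    for i, win, s in scan('GCGTATAATGC', pwm):
        print(i, win, round(s, 2))     # the true site scores highest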
This project involves developing an intelligent system that chooses a set of near-optimal setup parameters for an electronic assembly machine to achieve maximum production throughput. This ongoing project explores the simulation and use of various search techniques for finding these parameters. The techniques we have investigated to date include the use of genetic algorithms, local search, tabu search and expert systems.
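As a minimal example of the search component, the sketch below runs greedy local search over a setup-parameter vector: start from a random configuration, try small perturbations, and keep any that improve throughput. The quadratic throughput function here is a stand-in; in the project the objective comes from simulating the assembly machine.

    import random

    def throughput(params):            # hypothetical objective to maximize
        feed, speed, head = params
        return -(feed - 4.2) ** 2 - (speed - 7.7) ** 2 - (head - 2.0) ** 2

    def local_search(steps=5000, step_size=0.1):
        best = [random.uniform(0, 10) for _ in range(3)]
        best_val = throughput(best)
        for _ in range(steps):
            cand = [p + random.gauss(0, step_size) for p in best]
            val = throughput(cand)
            if val > best_val:          # greedy: accept only improvements
                best, best_val = cand, val
        return best, best_val

    random.seed(0)
    print(local_search())               # approaches (4.2, 7.7, 2.0)

Tabu search extends this loop with a memory of recently visited configurations to escape local optima, while a genetic algorithm maintains a population of such parameter vectors instead of a single one.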
The aim of our research is to understand and automate the mechanisms by which language can emerge among artificial, knowledge-based, rational agents. Our ultimate goal is to design and implement agents that, upon encountering other agents with which they do not share an agent communication language, are able to initiate the creation of, and subsequently evolve and enrich, a mutually understandable agent communication language (ACL).
We have investigated the application of machine learning tools and techniques for inducing large-scale probabilistic models from raw real-world medical databases. Such probabilistic models are typically employed in medical decision support systems, but the real-world datasets used to build their knowledge bases tend to be dirty, containing a substantial number of erroneous values. We have developed two novel context-based error detection and correction techniques for cleansing dirty datasets. We also present a probabilistic error model built on a context-driven structure, together with a pattern-based approach for cleaning temporal data and grouping records that exhibit similarity in their contextual ordering. Finally, we have studied the effectiveness of Bayesian network construction techniques by constructing and testing three different types of Bayesian networks.
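To illustrate the idea behind context-based error detection, the sketch below estimates how often each attribute value occurs within a given context and flags values whose conditional probability falls below a threshold. The records, attributes, and threshold are invented; our actual error model is considerably richer.

    from collections import Counter, defaultdict

    def flag_errors(records, context_attr, target_attr, threshold=0.05):
        # Conditional frequency of target values within each context value.
        counts = defaultdict(Counter)
        for r in records:
            counts[r[context_attr]][r[target_attr]] += 1
        flagged = []
        for i, r in enumerate(records):
            ctx = counts[r[context_attr]]
            p = ctx[r[target_attr]] / sum(ctx.values())
            if p < threshold:           # rare given its context -> suspect
                flagged.append((i, r, round(p, 3)))
        return flagged

    records = (
        [{'diagnosis': 'diabetes', 'glucose': 'high'}] * 48 +
        [{'diagnosis': 'diabetes', 'glucose': 'low'}] * 2 +   # suspicious
        [{'diagnosis': 'healthy', 'glucose': 'normal'}] * 50
    )
    print(flag_errors(records, 'diagnosis', 'glucose'))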
ITS (Intelligent Transportation Systems) involves improving our existing roadway transportation system through the use of information technology. The AI Lab's interests are in how best to gather, process and disseminate this information to the public's greatest benefit. So far, our efforts have concentrated on three main areas: ADVANCE, GCM, and Data Fusion.
ADVANCE: (Sponsors - Illinois Department of Transportation and US Department of Transportation (FHWA))
Traffic congestion wastes 2 billion gallons of fuel per year in the US alone; 135 million US drivers spend 2 billion hours trapped in traffic per year, and the total cost to Americans due to traffic exceeds $100 billion per year. Given these staggering numbers, we would like to ease traffic burdens through route planning based on the optimization of various criteria. Using both static and dynamic data gathered from current traffic conditions, we would like to answer questions such as: how do I get to X, what is the fastest way to get to X, and where is the nearest X? To answer these questions, we have investigated or are investigating path planning, data fusion from multiple sources, automatic accident detection, short-term travel prediction, and better man-machine interfaces.
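The path-planning core can be illustrated with Dijkstra's algorithm over a road graph whose edge weights are current travel times, directly answering "what is the fastest way to get to X?". The toy road network below is invented; in practice the weights are refreshed from live traffic data and routes are re-planned as conditions change.

    import heapq

    def fastest_route(graph, src, dst):
        dist = {src: 0.0}
        prev = {}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                break
            if d > dist.get(u, float('inf')):
                continue                       # stale queue entry
            for v, t in graph[u].items():      # t = current travel time (min)
                nd = d + t
                if nd < dist.get(v, float('inf')):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        path, node = [dst], dst                # walk back from destination
        while node != src:
            node = prev[node]
            path.append(node)
        return list(reversed(path)), dist[dst]

    roads = {'home':       {'ramp': 5.0, 'arterial': 9.0},
             'ramp':       {'expressway': 3.0},
             'arterial':   {'mall': 7.0},
             'expressway': {'mall': 4.0},
             'mall':       {}}
    print(fastest_route(roads, 'home', 'mall'))
    # (['home', 'ramp', 'expressway', 'mall'], 12.0)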
GCM: (Sponsors - Illinois DOT, Indiana DOT, and Wisconsin DOT)
The GCM Corridor is one of four "Priority Corridors" in the country. It includes the greater metropolitan areas of Gary, Chicago, and Milwaukee, as well as portions of southeastern Wisconsin, northeastern Illinois, and northwestern Indiana, and was defined to allow a wide range of solutions for movement throughout the corridor. The intent of ITS is to improve mobility by better managing the existing transportation system rather than simply building new roads. The goals of the priority corridors are to provide an operational testbed for ITS projects where they have the greatest opportunity to improve regional traffic, to provide "showcases" that help maintain public awareness and support for ITS development, to create strong institutional relationships that lead to greater regional cooperation, and to provide an opportunity for effectively testing new technologies.
Data Fusion: (Sponsor - National Academy of Sciences)
The purpose of the data fusion subproject is to assist the user in routing, travel planning, and other travel-related activities by disseminating various traffic-related information to travelers on-line. Once received, this data is combined in a meaningful way to answer the travel-related questions a user may have. We are currently investigating the capacity of neural networks in this realm.
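As a toy illustration of learned fusion, the sketch below trains a single neuron by gradient descent to combine two noisy travel-time estimates, say loop detectors and probe vehicles, into one prediction. The data are synthetic and the model is deliberately tiny; the networks we study are larger.

    import random

    random.seed(1)
    # Synthetic training data: two unbiased sensors with different noise.
    data = []
    for _ in range(200):
        true_t = random.uniform(10, 40)              # true travel time (min)
        detector = true_t + random.gauss(0, 2.0)     # less noisy source
        probe = true_t + random.gauss(0, 6.0)        # noisier source
        data.append((detector, probe, true_t))

    w, b, lr = [0.5, 0.5], 0.0, 0.0001
    for _ in range(500):                             # SGD on squared error
        for det, prb, y in data:
            pred = w[0] * det + w[1] * prb + b
            err = pred - y
            w[0] -= lr * err * det
            w[1] -= lr * err * prb
            b -= lr * err
    # Inverse-variance fusion predicts the detector weight near 0.9.
    print([round(x, 2) for x in w], round(b, 2))

The learned weights approach the statistically optimal inverse-variance combination, i.e., the less noisy source dominates, which is exactly the behavior one wants from a fusion module fed by heterogeneous traffic sensors.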