|
1. Introduction
Information Systems researchers and technologists have built
and investigated Decision Support Systems (DSS) for more than 35 years.
Here we list the developments in DSS, beginning with the building of model-oriented DSS in the late 1960s,
theory developments in the 1970s, and the implementation of financial planning systems and
Group DSS in the early and mid-1980s. During the mid-1980s we proposed and implemented
Intelligent DSS by combining knowledge systems with DSS. We then document the origins of Executive Information
Systems, OLAP and Business Intelligence. The implementation
of Web-based DSS became an active topic in the mid-1990s and had wide influence.
2. Data-Driven DSS
Data-driven DSS is a type of DSS that emphasizes access to and manipulation of a time-series
of internal company data and sometimes external data. Simple file systems accessed by query
and retrieval tools provide the most elementary level of functionality. Data warehouse systems
that allow the manipulation of data by computerized tools tailored to a specific task and setting
or by more general tools and operators provide additional functionality. Data-driven DSS with
On-line Analytical Processing (OLAP) provides the highest level of functionality and decision
support that is linked to analysis of large collections of historical data. Executive Information
Systems (EIS) and Geographic Information Systems (GIS) are special purpose Data-Driven DSS.
Metadata is data about data, which describes the content, quality,
condition, and other characteristics of data. It plays an important
role not only in the design, implementation and maintenance of the
data warehouse, but also in organizing data, querying information and
understanding results [3]. It usually records the location and
description of warehouse system components. Here, we expand the scope
of the metadata and use it to describe and manage the data and
environment of the whole system, which includes not only the data in
the data warehouse platform but also the task models and algorithms
(or functions) in ETL and data mining.
Metadata occupies a core position in the whole system since it
integrates the ETL, data warehouse, and data mining tools. It controls
the whole flow from ETL through the data warehouse to data mining, so
we can define and execute ETL and data mining tasks more conveniently
and effectively.
In MSMiner, the contents of the metadata are as follows:
(1) Description of the external data source. The external data source
can be a relational database or another kind of data, such as Excel data,
plain text or XML text. The metadata records the location and
environment information of the external data source, its data
structure and a description of its contents.
(2) Description of the subject, including the name and remark of the
subject and when the subject was created and updated.
(3) Description of the databases under a subject, including the name, type
and remark of each database, the login information and other information.
(4) Description of the tables in a database, including fact tables,
dimension tables and temporary tables. It contains information about
the tables and their fields.
(5) Description of the ETL task, containing the organization and steps of
the task, the data source, the selection of the transformation functions,
the parameter settings, the creation and execution history of the
task, and so on.
(6) Description of the data mining task, containing the organization and
steps of the task, the data source, the selection of the mining algorithms,
the parameter settings, the evaluation and output of the results,
the creation and execution history of the task, and so on.
(7) Description of the data cube, containing the dimensions and measures of
the extracted information, the building information of the star structure, and
so on.
(8) Management of the algorithm base for data mining, containing the
registration and management of the mining algorithms.
(9) Management of the functions for ETL, containing the registration
and management of the functions.
(10) User information, containing the user's basic information,
authority, operational history, and so on.
We build the corresponding metadata classes using an object-oriented
method. The system uses a three-tier architecture, with the metadata
management subsystem placed at the middle tier, where it can be
regarded as a metadata management server. The upper tier accesses and
manages metadata through the middle tier.
Metadata is generated automatically whenever a component of the
system is created, and it changes during the daily maintenance of the
system. MSMiner provides a dedicated metadata manager subsystem that
maintains the metadata directly, so that the whole system is managed
effectively.
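As a minimal sketch of what object-oriented metadata classes behind a middle-tier server might look like, the Python below models two of the ten metadata kinds (subjects and tables). The class names `TableMeta`, `SubjectMeta` and `MetadataServer`, their fields, and the sample "sales" subject are all hypothetical and are not taken from MSMiner.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class TableMeta:
    """Metadata for one table: its kind and its fields (hypothetical layout)."""
    name: str
    kind: str  # "fact", "dimension", or "temporary"
    fields: List[str] = field(default_factory=list)


@dataclass
class SubjectMeta:
    """Metadata for a subject: name, remark, timestamps, and its tables."""
    name: str
    remark: str = ""
    created: datetime = field(default_factory=datetime.now)
    updated: datetime = field(default_factory=datetime.now)
    tables: Dict[str, TableMeta] = field(default_factory=dict)


class MetadataServer:
    """Hypothetical middle-tier registry; upper tiers only go through it."""

    def __init__(self) -> None:
        self._subjects: Dict[str, SubjectMeta] = {}

    def create_subject(self, name: str, remark: str = "") -> SubjectMeta:
        # Metadata is generated at the moment the component is created.
        subject = SubjectMeta(name=name, remark=remark)
        self._subjects[name] = subject
        return subject

    def add_table(self, subject_name: str, table: TableMeta) -> None:
        subject = self._subjects[subject_name]
        subject.tables[table.name] = table
        subject.updated = datetime.now()  # maintenance changes the metadata

    def describe(self, subject_name: str) -> List[str]:
        subject = self._subjects[subject_name]
        return [f"{t.name} ({t.kind})" for t in subject.tables.values()]


server = MetadataServer()
server.create_subject("sales", remark="monthly sales analysis")
server.add_table("sales", TableMeta("sales_fact", "fact",
                                    ["time_id", "product_id", "amount"]))
server.add_table("sales", TableMeta("time_dim", "dimension",
                                    ["time_id", "year", "season", "month"]))
print(server.describe("sales"))
```

Keeping all access behind one server object mirrors the idea that the upper tier never touches metadata storage directly.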
The ETL subsystem is an important subsystem of MSMiner. The main purpose of
the ETL function module is to transform the operational data in the source
databases into analytical data in the data warehouse. The data in a data
warehouse is integrated and extracted from dispersed databases (for example
Oracle, SQL Server, Access, FoxPro, Excel, DB2, etc.), and there are many
differences between the operational data in the source databases and the
analytical data in the data warehouse, so loading the data from the various
data sources into the data warehouse directly is not a good approach. To
obtain clean data for the data warehouse, the data from the source databases
must be cleaned, collected and transformed before being integrated into the
data warehouse. This is a key and complex step in building a data warehouse.
Generally speaking, the ETL subsystem needs to carry out the following work:
(1) Because the source data from dispersed databases contains repetition and
conflicts, the subsystem should reconcile the conflicting data.
(2) To obtain comprehensive data in the data warehouse, the subsystem should
transform the original data structure from an application-oriented one to a
subject-oriented one and perform some derivation and computation.

The basic architecture of the ETL subsystem is shown in Figure 2. As the
figure shows, there are four modules in the ETL subsystem:
(1) The friendly user interface
Users can perform any ETL operation conveniently through this interface,
such as designing ETL tasks, registering new ETL DLL functions,
scheduling the execution of ETL tasks and viewing the results of ETL tasks.
(2) The integrated ETL function management and ETL task management
This module includes registering new ETL DLL functions, building new
ETL tasks, scheduling and processing ETL tasks, etc.
(3) The uniform metadata management
The whole subsystem is developed in a metadata-oriented way. Namely, all
information in this subsystem, including data sources, algorithms and
results, is managed by metadata.
(4) The database server
The ETL subsystem supports dispersed and varied databases (for example
Oracle, SQL Server, Access, FoxPro, Excel, DB2, etc.).
The subsystem supports an expandable ETL function base. The main
algorithms for ETL functions are implemented as dynamic link
libraries (DLLs) with uniform interfaces. Users can design an ETL task
according to their needs by choosing the relevant ETL DLLs. At present
the subsystem provides about 30 kinds of ETL DLLs. In addition, users
can develop new ETL DLLs that follow the uniform interfaces
and add them to the ETL function base. To improve
efficiency, ETL tasks can be scheduled at a designated time and
processed concurrently.
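The idea of an expandable function base with a uniform interface can be sketched in Python as follows. This is an illustration only: MSMiner implements its ETL functions as DLLs, whereas here plain Python callables stand in for them, and the names `register`, `ETL_FUNCTIONS`, `run_task`, `run_concurrently` and the two sample functions are assumptions made for the example. Scheduling at a designated time is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Assumed uniform interface: every ETL function maps rows to rows.
EtlFunction = Callable[[List[Row]], List[Row]]

ETL_FUNCTIONS: Dict[str, EtlFunction] = {}  # the expandable function base


def register(name: str) -> Callable[[EtlFunction], EtlFunction]:
    """Add a function to the function base under the given name."""
    def decorator(func: EtlFunction) -> EtlFunction:
        ETL_FUNCTIONS[name] = func
        return func
    return decorator


@register("drop_duplicates")
def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Remove repeated rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


@register("uppercase_region")
def uppercase_region(rows: List[Row]) -> List[Row]:
    """Unify conflicting spellings of the region code."""
    return [{**row, "region": str(row["region"]).upper()} for row in rows]


def run_task(steps: List[str], rows: List[Row]) -> List[Row]:
    """An ETL task is an ordered list of registered function names."""
    for step in steps:
        rows = ETL_FUNCTIONS[step](rows)
    return rows


def run_concurrently(tasks: List[Tuple[List[str], List[Row]]]) -> List[List[Row]]:
    """Process several ETL tasks concurrently; results keep task order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_task, steps, rows) for steps, rows in tasks]
        return [f.result() for f in futures]


source = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
]
results = run_concurrently([
    (["drop_duplicates", "uppercase_region"], source),
    (["uppercase_region"], source),
])
```

Because every function shares one signature, a new transformation only needs to be registered before tasks can refer to it by name.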
A data warehouse is “a subject-oriented, integrated, time-variant,
nonvolatile collection of data in support of management decisions”.
The function of the data warehouse is to provide a general data
warehouse environment in which users can create and maintain their
data warehouse according to their needs, carry out data
analysis and processing, and prepare for data mining tasks.
The data warehouse in MSMiner consists of many subjects. When a data
warehouse is created, users establish several subject fields according
to the application needs, and the system helps users extract the data for
each subject and model it with a star schema. On this basis, the data
warehouse realizes the multi-dimensional data cube and
OLAP, and provides a valid data source for data mining and
decision-making. The final results may be shown by visualization
tools.
The data warehouse in MSMiner is modeled with a star schema. The system
extracts the data from source tables or views and builds multiple fact
tables through data extraction, transformation and loading, as the
subject requires. A star schema is made of one fact table
and several dimension tables related to it, where the fact
table includes multiple dimensions and measures. A dimension stands
for a particular perspective for viewing data, such as a time dimension,
distribution dimension, product dimension and so on. A measure
carries the actual meaning of the data and describes what the data is. Each dimension
table describes a certain dimension and its values, and each dimension
consists of several levels. For example, a time dimension may be
divided into three levels: year, season, and month, each of which describes
a different query level. One or several star-schema structures form a
subject, which is the basic unit of the data warehouse.
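A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table and column names (`sales_fact`, `time_dim`, `product_dim`, `amount`) and the sample rows are illustrative assumptions, not MSMiner's actual schema; the time dimension carries the three levels named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table with three levels: year, season, month.
CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY,
    year    INTEGER,
    season  INTEGER,
    month   INTEGER
);
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY,
    name       TEXT
);
-- Fact table: one key per dimension plus the measure.
CREATE TABLE sales_fact (
    time_id    INTEGER REFERENCES time_dim(time_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    amount     REAL
);
""")

conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)",
                 [(1, 2002, 1, 1), (2, 2002, 1, 2), (3, 2002, 2, 4)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0)])

# Column names of the fact table, read back from SQLite's catalog.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
print(fact_columns)
```

The fact table holds only keys and measures; descriptive attributes and level values live in the dimension tables.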
OLAP can be realized in two ways: by creating a special multi-dimensional
database system (MOLAP), or by simulating multi-dimensional data
using a relational database (ROLAP). MSMiner supports ROLAP, which
is based on the star schema. The star structure with its multiple
dimension tables simulates the multi-dimensional data cube, where the
dimensions and measures in the data cube come from the dimensions and
measures in the star schema. When OLAP operations are executed on the
data cube, multi-dimensional analysis translates the request into SQL
statements, queries the fact tables, and then shows the results in
multi-dimensional form.
At present the system supports the standard OLAP operations, such as
slice, dice, roll up, drill down and pivot. The results may be
displayed in many forms, such as cross-tabulation tables, bar charts,
pie charts or other forms of graphical output.
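The translation of OLAP requests into SQL can be sketched as follows, again with `sqlite3`. This is a simplified illustration under assumed names (`sales_fact`, `time_dim`, `aggregate`, `TIME_LEVELS`), not MSMiner's query generator: moving to a coarser level is a roll up, moving to a finer level is a drill down, and fixing a level value acts as a slice.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY,
                       year INTEGER, season INTEGER, month INTEGER);
CREATE TABLE sales_fact (time_id INTEGER, amount REAL);
INSERT INTO time_dim VALUES (1, 2002, 1, 1), (2, 2002, 1, 2),
                            (3, 2002, 2, 4), (4, 2003, 1, 1);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 50.0), (3, 70.0), (4, 30.0);
""")

TIME_LEVELS = ["year", "season", "month"]  # coarse to fine


def aggregate(level: str, where: Optional[Dict[str, int]] = None) -> List[Tuple]:
    """Sum the measure at a time level, optionally slicing on level values."""
    where = where or {}
    # Only known column names are ever placed into the SQL text.
    for name in [level, *where]:
        if name not in TIME_LEVELS:
            raise ValueError(f"unknown level: {name}")
    group_cols = TIME_LEVELS[: TIME_LEVELS.index(level) + 1]
    cols = ", ".join(f"t.{c}" for c in group_cols)
    sql = (f"SELECT {cols}, SUM(f.amount) FROM sales_fact f "
           "JOIN time_dim t ON f.time_id = t.time_id")
    if where:
        sql += " WHERE " + " AND ".join(f"t.{c} = ?" for c in where)
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql, list(where.values())).fetchall()


by_month = aggregate("month")                     # finest level (drill down)
by_year = aggregate("year")                       # roll up to year
seasons_2002 = aggregate("season", {"year": 2002})  # slice on year = 2002
print(by_year)
```

Each cube operation becomes a different GROUP BY or WHERE clause over the same fact and dimension tables, which is the essence of the ROLAP approach.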
The results of OLAP operations and the data in fact tables may serve
as the data source for the data mining subsystem, and they may help
with some of the preparation work for data mining.
3. Model-Driven DSS
Model-Driven DSS emphasize access to and manipulation of a model, for example, statistical,
financial, optimization and/or simulation models. Simple statistical and analytical tools
provide the most elementary level of functionality. Some OLAP systems that allow complex analysis
of data may be classified as hybrid DSS systems providing both modeling and data retrieval and
data summarization functionality. In general, model-driven DSS use complex financial, simulation,
optimization or multi-criteria models to provide decision support. Model-driven DSS use data and
parameters provided by decision makers to aid decision makers in analyzing a situation, but they
are not usually data intensive; that is, very large databases are usually not needed for model-driven DSS.
Early versions of Model-Driven DSS were called Computationally Oriented DSS by Bonczek, Holsapple
and Whinston (1981). Such systems have also been called model-oriented or model-based decision
support systems.
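As a small illustration of a model-driven tool, the Python sketch below evaluates a net present value (NPV) model for several discount rates supplied by a decision maker. The choice of NPV, the function names `npv` and `what_if`, and the cash-flow figures are assumptions made for the example; the point is that the model needs only a handful of parameters rather than a large database.

```python
from typing import Dict, List


def npv(rate: float, cash_flows: List[float]) -> float:
    """Net present value; cash_flows[0] occurs now (period 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def what_if(cash_flows: List[float], rates: List[float]) -> Dict[float, float]:
    """Evaluate the same model under each candidate discount rate."""
    return {rate: round(npv(rate, cash_flows), 2) for rate in rates}


# Hypothetical project: invest 1000 now, receive 500 in each of three periods.
flows = [-1000.0, 500.0, 500.0, 500.0]
table = what_if(flows, [0.05, 0.10, 0.25])
for rate, value in table.items():
    print(f"rate={rate:.2f}  npv={value:.2f}")
```

Changing a parameter and re-running the model is the basic what-if analysis that such systems support.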
4. Knowledge-Driven DSS
Knowledge-Driven DSS can suggest or recommend actions to managers. These DSS are person-computer
systems with specialized problem-solving expertise. The "expertise" consists of knowledge about
a particular domain, understanding of problems within that domain, and "skill" at solving some
of these problems. A related concept is Data Mining. It refers to a class of analytical applications
that search for hidden patterns in a database. Data mining is the process of sifting through large
amounts of data to produce data content relationships. Tools used for building Knowledge-Driven DSS
are sometimes called Intelligent Decision Support methods (cf. Zhongzhi Shi, 1988; Dhar and Stein, 1997).
5. Web-Based DSS
Web-Based DSS deliver decision support information or decision support tools to a manager
or business analyst using a "thin-client" Web browser like Netscape Navigator or Internet
Explorer that is accessing the Global Internet or a corporate intranet. The computer server
that is hosting the DSS application is linked to the user's computer by a network with the
TCP/IP protocol. Web-Based DSS can be communications-driven, data-driven, document-driven,
knowledge-driven, model-driven or a hybrid. Web technologies can be used to implement any
category or type of DSS. Web-based means the entire application is implemented using Web
technologies; Web-enabled means key parts of an application like a database remain on a
legacy system, but the application can be accessed from a Web-based component and displayed
in a browser.
6. Simulation-Based DSS
Simulation-Based DSS deliver decision support information or decision support tools to
help managers analyze semi-structured problems through simulation. These diverse systems were all called
Decision Support Systems. DSS could support operations, financial management and strategic decision-making.
A variety of models were used in DSS including optimization and simulation. Also,
7. GIS-Based DSS
GIS-Based DSS deliver decision support information or decision support tools to a manager
or business analyst using GIS. General-purpose GIS tools are programs such as ARC/INFO, MapInfo
and ArcView that have extensive functionality and can be difficult for users unfamiliar with
GIS and cartographic principles to learn. Specific-purpose GIS tools are programs that are
written by a GIS programmer to provide a user group with specific functions in an easy-to-use package.
In the past, specific-purpose GIS tools were written primarily using a macro language.
This method of delivering specific-purpose GIS tools requires that each user have a copy of
the host program (ARC/INFO or ArcView) to run the macro language application.
GIS programmers now have a far richer set of tools for application development.
Programming libraries with classes for interactive mapping and spatial analysis
functions have made it possible to develop specific-purpose GIS tools using industry-standard
programming languages that can be compiled and run without a host program (stand-alone).
Internet development tools have matured as well, making it possible to develop fairly complex
GIS-based programs that users can use through the World Wide Web.
8. Communications-Driven DSS
Communications-Driven DSS is a type of DSS that emphasizes communications, collaboration
and shared decision-making support. A simple bulletin board or threaded email is the most
elementary level of functionality. The comp.groupware FAQ defines groupware as "software
and hardware for shared interactive environments" intended to support and augment group
activity. Groupware is a subset of a broader concept called Collaborative Computing.
Communications-Driven DSS enable two or more people to communicate with each other,
share information and co-ordinate their activities. Group Decision Support Systems
(GDSS) are a hybrid type of DSS that allow multiple users to work collaboratively
using various software tools. Examples of group support tools are audio
conferencing, bulletin boards and web-conferencing, document sharing, electronic mail,
computer-supported face-to-face meeting software, and interactive video.
9. Application Example
We have applied IDSS to many application areas such as tax deviation,
analysis of fishery information, analysis of VIPs (very important
persons) for a telecom corporation, and so on. For example, the fishing
ground prediction system is a good application example of case-based
reasoning (CBR). This system has been applied to the East China Sea
fishing center prediction. In 2002, it was awarded the second
grade of the National Science and Technology Progress Award.
|