Data Integration Tips

Data is the foundation on which every business operates. Here, data means the body of information, real-time or historical, that is available to a company. The company uses that data for whatever it needs to do, whether data modeling, sampling, or straightforward analysis.

However, raw data is usually voluminous and hard to manage. For effective analysis, it needs to be pooled, both logically and in terms of volume. This process of pooling raw data so that it is ready for analysis is broadly called data integration (DI).
Data integration systems are formally defined as a triple ⟨G, S, M⟩, where G is the global (or mediated) schema, S is the heterogeneous set of source schemas, and M is the mapping that maps queries between the source and the global schemas.
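To make the triple concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration rather than a real system: the source names ("crm", "billing"), the attribute names, and the function `query_global` are all assumptions. It shows a query written against the global schema G being answered from two differently shaped sources S through a mapping M.

```python
# Global schema G: the attributes every integrated record exposes.
GLOBAL_SCHEMA = ("customer_id", "email")

# Source data S (hypothetical): two sources that name the same facts differently.
SOURCES = {
    "crm": [{"id": 1, "mail": "ann@example.com"}],
    "billing": [{"cust_no": 2, "email_addr": "bob@example.com"}],
}

# Mapping M: for each source, global attribute -> source attribute.
MAPPING = {
    "crm": {"customer_id": "id", "email": "mail"},
    "billing": {"customer_id": "cust_no", "email": "email_addr"},
}


def query_global(attributes):
    """Answer a query posed over G by rewriting it against each source via M."""
    unknown = [a for a in attributes if a not in GLOBAL_SCHEMA]
    if unknown:
        raise KeyError(f"not in global schema: {unknown}")
    rows = []
    for source_name, records in SOURCES.items():
        source_mapping = MAPPING[source_name]
        for record in records:
            # Translate each global attribute to the source's own name.
            rows.append({a: record[source_mapping[a]] for a in attributes})
    return rows


result = query_global(["customer_id", "email"])
```

The caller only ever sees the global attribute names; adding a third source would mean adding one entry to the sources and one to the mapping, without touching the query.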
The data that companies get access to generally comes from different sources and needs to be unified in order to give end-users a comprehensive picture. Achieving this is the main function of data integration. The need for data integration arises with increasing frequency as the volume of data, and the need to share existing data, grows. It has been the focus of extensive theoretical work, and numerous open problems remain to be solved.
Data integration is an important technique that has been put to successful use for commercial, scientific and management purposes; in the enterprise setting it is often known as Enterprise Information Integration. When research results from different bioinformatics repositories need to be combined, or two similar companies need to merge their databases, data integration is the process experts turn to.
The data integration procedure has to integrate data models and establish common terms of reference. For this, integration techniques need to go beyond simple consolidation of application databases.
Data integration comprises three separate layers: the data transport interface, the data exchange services, and the user/application interface.
These components need to be managed well for integration to be effective. Data integration can also be of two types: internal integration, which pools data available within the databases of a single company, and external integration, which takes place with external constituencies in a highly controlled environment. External integration generally occurs among parties that have strong, well-established partnerships, when one or more of the parties impose standards for data exchange, or when data is exchanged frequently and in very high volume.
Certain basic requirements need to be fulfilled for the data integration process to succeed. The major factors that influence internal data integration are listed below:
  1. Relative centralization or distribution of internal business processes,
  2. Location of data stores within the organization,
  3. Current application deployment on centralized or distributed systems, and
  4. Changes in business conditions such as acquisition of a company or business unit.
The requirements for external data integration are slightly different, so the factors that influence it also differ from those that affect internal data integration. These factors are as follows:
  1. Nature of the business relationship with the external constituency,
  2. Types of shared business processes,
  3. Data exchange standards imposed by the external organization(s), and
  4. Technology available at the external site.
Apart from taking care of these factors, which are crucial to the performance of internal and external data integration respectively, all integration procedures must comply with the organization's security policies.
Data integration also comes at a cost, which should be estimated in advance. Company heads need to consider the expense of both the initial setup and the subsequent long-run deployment.
Integration has, in recent years, become an immensely useful technique for managing data in the corporate framework. Physical integration of data (involving tools and technology) has often taken center stage, and a good volume of theoretical study has been done on this aspect of data integration. However, to integrate the available data effectively, a comprehensive study of the strategies, designs, standards and governance policies needs to be undertaken. The nature of the data that flows through the company's information system also has to be understood thoroughly.
For the purpose of clarification and understanding, the data integration procedure can be explained in terms of a simple framework. This framework comprises:
  • Integration structure (standards, guidelines, processes, policies, DI “rules”, integration patterns),
  • Integration decisions, arrived at by resolving DI issues and through data stewardship (quality standards, compliance, security),
  • The plan for DI design (research of the source data, business design, design of the target integration and rules of transformation), and
  • Governance and strategy that direct ongoing maintenance (data quality program and change management).
Hence, as businesses grow, the sheer volume of data that needs to be handled and analyzed grows enormously. Data integration answers this requirement by setting out procedures for pooling the data effectively. Integration of data needs to be comprehensive too, accommodating both internal and external needs.
Posted in Basics | Leave a comment

Data maintenance and administration

Data is very important for any business, big or small. By storing, managing, retrieving and analyzing its data, a business is able to arrive at crucial decisions on matters such as product design and placement, product pricing, customer behavior and preferences, market opportunities and competitor activity. Hence the importance of, and need for, data maintenance and administration.

Because data is crucial, the business has to decide who in the organization has access to it. Not everyone may have access to all data, except of course the owner of the company. In really big companies, certain groups of people might have access to the data, with the group head passing information on to subordinates as needed. This is why data needs to be secured and administered properly, so that it can be retrieved quickly by the person who needs it and kept away from those who should not have access to it. Yes, the issue of data security goes hand in hand with data maintenance and administration.

In computing, data maintenance and administration refers to the management and running of the organization’s data, which is generally stored in a database under a chosen database management system or in an alternative system such as electronic spreadsheets. In smaller outfits running a single-user system, this is done by the person who owns the organization, while in bigger organizations administrators (who are often specialists and are responsible for various functions of the business, including data analysis) are deputed to do the job.

Alternatively, data administration may be split between the end-users, who are made responsible for maintaining their own client accounts, and a person designated as data administrator, who is responsible for security-specific information because it is shared by all the users. The best way to decide how to split responsibilities is to work out which data maintenance tasks need to be performed, many of which rely on downloadable tools. These tools are extremely handy for those responsible for doing the job.

Security price quotes and financial information are retrieved using the net-based tool's scheduled automatic download feature, and the data administrator is the person responsible for configuring the download parameters. These include the time of the download, the securities to be downloaded, and so on. The data administrator is also responsible for reviewing the download logs for key data changes.

Account portfolio information, such as account record updates and transaction and position reconciliation data, is available (with the help of the tool) in the form of data files received from the custodian or the clearing firm. These files are brought in via the tool/net’s institutional import interface. Whether these imports are best performed by the data administrator or by each end-user depends on several factors. When all or most of the advisor/rep accounts are kept within a single file, the import is best performed by the data administrator. New accounts usually do not require manual intervention by the data administrator for assigning the proper Rep ID.

When the custodian or clearing firm provides separate data files for each advisor or rep, it becomes comparatively easy for each of them to import their own data files. Nevertheless, an organization may still insist on assigning the task to a data administrator just to relieve the advisor or the rep. No matter who performs this task, or whether it has been configured to run automatically, someone should review the data for accuracy. Under normal circumstances, this should be the duty of each advisor or rep, as each of them is best placed to know what trades or transactions were placed for their clients. A data reconciliation tool often comes in handy in this review.
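The review step can be pictured with a small reconciliation sketch in Python. It is only an illustration under stated assumptions: the account IDs, symbols and quantities are invented, and `reconcile_positions` is a hypothetical helper, not a feature of any particular reconciliation tool. It compares the positions reported in a custodian file with the positions held in the firm's own records and lists every mismatch for a person to check.

```python
def reconcile_positions(custodian, internal):
    """Return a list of (key, custodian_qty, internal_qty) for every mismatch.

    Both arguments map (account, symbol) -> quantity. A quantity of None in
    the result means the position is missing on that side.
    """
    discrepancies = []
    for key in sorted(set(custodian) | set(internal)):
        custodian_qty = custodian.get(key)
        internal_qty = internal.get(key)
        if custodian_qty != internal_qty:
            discrepancies.append((key, custodian_qty, internal_qty))
    return discrepancies


# Made-up sample data standing in for an imported custodian file
# and the positions already on record.
custodian_file = {("ACC-1", "ABC"): 100, ("ACC-1", "XYZ"): 50, ("ACC-2", "ABC"): 10}
internal_records = {("ACC-1", "ABC"): 100, ("ACC-1", "XYZ"): 40}

issues = reconcile_positions(custodian_file, internal_records)
for (account, symbol), custodian_qty, internal_qty in issues:
    print(f"{account} {symbol}: custodian={custodian_qty} internal={internal_qty}")
```

A list of mismatches like this does not decide which side is right; it only tells the advisor or rep where to look.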

Some users do not have access to custodian or clearing firm data for some or all of their accounts and must enter the data manually from commonly available sources. For them, the decision of whether this job falls to the administrator or to each advisor or rep is best left to the user.

As far as the analysis and administration of data are concerned, there are a good many net-based solutions that not only evaluate the data but also indicate the various possibilities for data administration. Some of their data administration services include:

  • Generation of addressing
  • Separation and correction of fields
  • First name review
  • Existence check
  • Dead list/ negative comparison
  • Credit assessment
  • Risk index relationship
  • Splitting of consumer and business address
  • Relocation address comparison
  • Tracking of returns

Data mining: is it new business?

Data mining – is it new business? The answer is ‘far from it’. Actually, data mining is no business at all – it is just a business process that helps businesses enhance their performance.

But you can start a new business if you can successfully mine the right kind of data from your well-organized data warehouse, though the job resembles searching for a needle in the backyard haystack. The needle in this instance is that small but essential piece of intelligence, or whatever you may call it, that is necessary to develop your business, while the haystack represents the database or data warehouse that you have built up over time. Data mining is also the latest business buzzword, drawing business people ranging from greenhorns to great gamblers like never before.

Using automated methods of statistical analysis, which may be termed a form of data mining, business people are now finding new trends in business-related behavior that were earlier ignored. Once such a trend is found, it can be used predictively for newer business ventures, and sometimes for the present business itself, to fine-tune it and enhance its productivity.

But the first step towards the goal, as you may appreciate, is to gather the right kind of data, the information that is relevant to the business. Although it may sound difficult at first, it is not that hard to get started. After all, it is your business and you are the best judge of what data is pertinent to it. If you are selling shoes, for instance, you would know who makes them best, which community prefers them more and which is the best selling season. Once this information is gathered, use your spare time to refine and edit it until you reach the gist. If you are already tracking your customers' data in a modern DBMS, you have probably done this step. That means you have already created your own data warehouse.

Now is the time for experimentation. Select one or more algorithms to match your problem with the data in hand. An algorithm, as you may know, is a step-by-step procedure that is applied, often repeatedly, until it reaches a solution. Since you are experimenting with several methods, chances are that one of them will click. Or you may go for two of the most common families of algorithms, namely Regression and Classification. Regression is the most common statistical technique used in data mining the world over. It involves taking a numerical dataset and developing a mathematical formula that fits the data. Once you feel the results are ready to be used for forecasting, take your current data, plug it into the formula, and you have reached your goal! However, one major drawback of this method is that it works well only with continuous, quantitative data. When you are working with data that is categorical and the order is not crucial, you would do better with other techniques.
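As a rough illustration of the regression idea, the Python sketch below fits a straight line to a handful of numbers by ordinary least squares and then plugs a new value into the fitted formula. The figures (monthly pairs of shoes sold) are invented for the example, and `fit_line` is a hypothetical helper written for this post; a real project would more likely use a statistics package.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: month number and pairs of shoes sold in that month.
months = [1, 2, 3, 4, 5]
pairs_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(months, pairs_sold)

# "Plug in" a new value: forecast sales for month 6.
forecast = slope * 6 + intercept
print(f"pairs_sold ~ {slope:.1f} * month + {intercept:.1f}; month 6 forecast: {forecast:.1f}")
```

Note that both inputs are numbers on a continuous scale, which is exactly the situation in which regression works well.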

When you are using categorical data, or a combination of categorical and continuous numeric data, Classification will suit you fine. It can process a much wider variety of data than Regression and so is becoming more popular with the new breed of business people. Instead of producing a complex mathematical formula, it provides you with a decision tree, which reaches a decision through a series of binary (yes/no) choices.
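To show what such a tree looks like in use, here is a small hand-written example in Python. It is a sketch only: the tree is written by hand rather than learned from data by a classification algorithm, and the field names, thresholds and labels are all invented. The point is simply that each step is a yes/no question, and that categorical and numeric fields can be mixed freely.

```python
def classify_customer(customer):
    """Walk a fixed tree of binary decisions and return a class label."""
    if customer["age_group"] == "under_30":      # categorical test
        if customer["visits_per_year"] > 4:      # numeric test
            return "likely_buyer"
        return "occasional_buyer"
    if customer["prefers_online"]:               # yes/no test
        return "occasional_buyer"
    return "unlikely_buyer"


# Made-up customers for illustration.
customers = [
    {"age_group": "under_30", "visits_per_year": 6, "prefers_online": False},
    {"age_group": "under_30", "visits_per_year": 2, "prefers_online": True},
    {"age_group": "30_plus", "visits_per_year": 1, "prefers_online": True},
    {"age_group": "30_plus", "visits_per_year": 9, "prefers_online": False},
]

labels = [classify_customer(c) for c in customers]
print(labels)
```

A classification algorithm would choose the questions and thresholds automatically from historical data; the resulting tree is then applied to new customers in the same way as this one.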

Data mining, as said earlier, is a hot topic right now. Apart from Classification and Regression, many other data mining algorithms have burst onto the market.

Data mining products are the big thing today. Most database vendors have taken steps to ensure that their platforms support data mining techniques.

Oracle’s Data Mining Suite (Darwin) already includes Neural Networks, Classification and Regression trees, Regression Analysis, k-nearest neighbors and detailed Clustering Algorithms. Microsoft SQL Server allows data mining using clustering algorithms and classification trees. Data mining algorithms are also offered by many statistical packages such as S-Plus, SAS and SPSS.

So, you can see that data mining is not a business in itself but a great deal more than that. In today’s competitive world, data mining has become very important, as it enables a business to perform better and gives it an edge over the competition.
