Process Optimization Using Big Data in Practice (NEW!)
You have a certain amount of process data available, perhaps even "big data", or you could easily obtain it. However, you are not sure how to make use of the information contained in this data.
This course provides hands-on instruction on how to get from raw process data to relevant information, which you can then use for root-cause analysis or process optimisation. The machine learning (ML) methods employed, with an emphasis on CART (classification and regression trees) and random forests, are explained in an easily understandable way. Prior knowledge of statistics is not required.
What are you going to learn?
This course provides hands-on instruction on how to deal with large to big data sets. A strong emphasis is put on the applicability of the proposed methods in your own context.
First, we cover methods that can be used to get an overview of the process data (visualisation, correlation analysis, outlier treatment). This is the basis for choosing suitable parameters for your analysis and then preparing the data.
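As a rough illustration of what such an overview step can look like, the following sketch flags outliers with the box-plot (interquartile range) rule and compares the correlation before and after removing them. It is only a minimal sketch under stated assumptions: the course itself works with R scripts and commercial tools, Python is used here purely for illustration, and the batch data, the variable names, and the helper functions `pearson` and `iqr_outliers` are hypothetical.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical batch records (assumption, not course data):
# reactor temperature in degrees Celsius and yield in percent.
temperature = [70.1, 70.5, 71.0, 71.4, 72.0, 72.3, 72.9, 73.5]
yield_pct = [88.0, 88.4, 89.1, 89.3, 90.2, 90.1, 91.0, 60.0]

outliers = iqr_outliers(yield_pct)
r_all = pearson(temperature, yield_pct)

# Drop the flagged batches and recompute the correlation.
clean = [(t, y) for t, y in zip(temperature, yield_pct) if y not in outliers]
r_clean = pearson([t for t, _ in clean], [y for _, y in clean])

print("flagged outliers:", outliers)
print("correlation with / without outliers:", round(r_all, 2), round(r_clean, 2))
```

A single implausible batch can hide an otherwise strong relationship, which is one reason to inspect and treat outliers before choosing parameters for the analysis. Whether a flagged value should actually be removed remains a process decision, not a purely statistical one.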
Subsequently, we introduce big data methods (machine learning) for root-cause analysis and process optimization. The main focus here is on classification and regression trees (CART) and random forests, both of which are intuitive to understand.
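To give an idea of why regression trees are intuitive, the sketch below finds a single best split for one process parameter, which is the basic building block of a regression tree. This is a simplified illustration and not the course material: it uses Python rather than the R scripts and commercial tools of the course, and the pH and yield values as well as the helper functions `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Find the threshold on xs that minimises the summed SSE of ys on both sides."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical x values
        # Candidate threshold: midpoint between two neighbouring x values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Hypothetical batch data (assumption): yield drops once the pH exceeds about 7.0.
ph = [6.5, 6.6, 6.8, 6.9, 7.1, 7.2, 7.4, 7.5]
yield_pct = [90.0, 91.0, 90.5, 90.8, 84.0, 83.5, 84.2, 83.8]

threshold, cost = best_split(ph, yield_pct)
print("best split: pH <=", round(threshold, 2))
```

A full regression tree repeats this search over all parameters and then recursively within each resulting group, so the result reads as a set of simple rules. A random forest builds many such trees on resampled data and averages their predictions.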
All methods are illustrated by demonstrations using modern analysis tools and are practised in applied exercises on the PC. After the training, you will be able to perform similar analyses of your own data independently. In the course, we will use R scripts in addition to commercial tools. R is a free and very powerful open-source programming language for statistics, so you can also perform your analyses independently of commercial tools.
Please note that this training assumes your data is already available in a "flat" file, e.g. in Excel. Due to the limited time, we will not cover techniques for automated data extraction, for instance from databases.
Who should attend?
- Anybody who wants to extract more information from their available data (from approx. 80 batch records up to real "big data")
- Elementary knowledge of statistics is not required, but recommended (e.g. as provided in the training "Visualisation of Lab Data")
Which topics are covered?
- Graphical methods for gaining an overview
  - Box plots
  - Scatter plot / correlogram
  - Outlier treatment
  - Dealing with missing values
- Machine learning (ML) methods
  - Linear models
  - Outlook on neural networks (NN)
  - Classification and regression trees (CART)
  - SPM (Salford Predictive Modeler / Minitab)
Any questions?
- Course duration: 2 days
- Participants: max. 12 (one PC per participant)
- Costs (including complete course documentation, coffee and lunch): see registration form
- Dates: see registration form
- Further information: see contact form