The Machine Learning (Python-based) functions are FOCUS functions implemented as Python scripts. These scripts rely on Python packages such as scipy, numpy, scikit-learn, and pandas, which extend Python's capabilities to machine learning. The Machine Learning (Python-based) functions perform regression, classification, extreme gradient boosting, and outlier detection using a variety of machine learning methods. The Python scripts perform a sequence of conventional machine learning tasks, including scaling the data where appropriate. They are built around a grid search with cross-validation. That is, a set of hyper-parameters (parameters that influence the learning process but are not model parameters) is identified, and models are built using several candidate values for each hyper-parameter in order to determine the optimal values. Optimality is determined by cross-validation, which ensures that performance is measured on a validation subset of the data that is distinct from the training subset. Rows with missing values in the target column are not used for training or validation, but the trained model still computes a predicted value for them. You can generate the .csv file and accompanying Master File used in the examples by running the WebFOCUS - Machine Learning Demo tutorial in the server Web Console.
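
The workflow described above can be illustrated with a minimal scikit-learn sketch. It is not the code of the FOCUS Python scripts, which is not shown here. The synthetic data, the column names (x1, x2, y, y_predicted), the choice of a k-nearest-neighbors regressor, and the candidate hyper-parameter values are all assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two predictors and a target with a few missing values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=60), "x2": rng.normal(size=60)})
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.1, size=60)
df.loc[df.index[:5], "y"] = np.nan  # rows with a missing target

predictors = ["x1", "x2"]
known = df["y"].notna()  # only these rows are used for training and validation

# Scaling and the model are combined so that scaling is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", KNeighborsRegressor()),
])

# Candidate values for one hyper-parameter (assumed for this sketch).
param_grid = {"model__n_neighbors": [1, 3, 5, 7]}

# 4-fold cross-validation: each candidate is scored on held-out folds.
search = GridSearchCV(pipeline, param_grid, cv=4)
search.fit(df.loc[known, predictors], df.loc[known, "y"])

best_k = search.best_params_["model__n_neighbors"]

# The refit best model predicts every row, including those with a missing target.
df["y_predicted"] = search.predict(df[predictors])
```

In this sketch the scaler sits inside the pipeline, so each cross-validation fold is scaled using statistics from its own training portion only. The rows whose target is missing are left out of the fit but still receive a value in y_predicted.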