Tabular data
------------

We tested FEDOT on the AMLB benchmark [1]. We used the framework setup from ``frameworks.yaml`` as of the date the experiments started.
The following stable versions were therefore used: AutoGluon 0.7.0, TPOT 0.11.7, LightAutoML 0.3.7.3, H2OAutoML 3.40.0.2, FEDOT 0.7.2.
Some AutoGluon runs failed due to errors (also described in Appendix D of the AMLB paper [1]).

The visualizations were obtained with the built-in critical difference (CD) plots from AutoMLBenchmark [1].
A CD diagram displays each framework's average rank and highlights which ranks differ from one another with statistical significance.
To determine the average rank, we first replace any missing values with the constant predictor's result, then rank the represented AutoML solutions and the constant predictor on each dataset, and finally average the ranks of each solution across all datasets.
We assess the statistical significance of the rank differences using a non-parametric Friedman test with a threshold of p < 0.05 (resulting in p ≈ 0 for all diagrams) and apply a Nemenyi post-hoc test to identify which framework pairs differ significantly.

The time budget for all experiments is 1 hour, and 10 folds are used (the 1h8c setup for AMLB).
The results were obtained on a server based on a Xeon Cascade Lake (2900 MHz) with 12 cores and 16 GB of memory.

CD for all datasets (ROC AUC and negative log loss):

.. image:: ./img_benchmarks/cd-all-1h8c-constantpredictor.png

The CD diagram for all datasets (ROC AUC and negative log loss) shows that all AutoML frameworks (LightAutoML, H2OAutoML, TPOT, AutoGluon, FEDOT) perform statistically significantly better than the constant predictor.

CD for binary classification (ROC AUC):

.. image:: ./img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png

The CD diagram for binary classification (ROC AUC) shows that all AutoML frameworks (LightAutoML, H2OAutoML, TPOT, AutoGluon, FEDOT) perform similarly, falling within the same CD interval, and significantly outperform the constant predictor.

CD for multiclass classification (negative log loss):

.. image:: ./img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png

The CD diagram for multiclass classification (negative log loss) shows that TPOT and FEDOT demonstrate intermediate performance: they lie on the border between the CD interval shared with the constant predictor and the CD interval shared with H2OAutoML.

We can conclude that FEDOT achieves performance comparable with its competitors on tabular tasks.

The ranks of the frameworks are provided below:

.. csv-table:: Ranks on Binary Tasks
   :header-rows: 1
   :widths: 20,6,6,6,6,6,6

   Task, FEDOT, AutoGluon(B), H2OAutoML, LightAutoML, TPOT, ConstantPredictor
   adult, 4, 5, 2, 1, 3, 6
   airlines, , 2, 1, , 3, 4
   amazon_employee_access, 4, 5, 2, 1, 3, 6
   apsfailure, 3, 4, 1, 2, 5, 6
   australian, 3, 2, 4, 1, 5, 6
   bank-marketing, 3, 5, 2, 1, 4, 6
   blood-transfusion, 1, 5, 2, 3, 4, 6
   christine, 3, 5, 2, 1, 4, 6
   credit-g, 5, 3, 1, 2, 4, 6
   guillermo, 3, 2, , 1, 4, 5
   jasmine, 3, 4, 2, 5, 1, 6
   kc1, 2, 4, , 3, 1, 5
   kddcup09_appetency, 4, 5, 2, 1, 3, 6
   kr-vs-kp, 3, 5, 2, 4, 1, 6
   miniboone, 4, 3, , 1, 2, 5
   nomao, 5, 4, 2, 1, 3, 6
   numerai28_6, 3, 4, 2, 1, , 5
   phoneme, 5, 3, 2, 4, 1, 6
   riccardo, , 1, , 2, , 3
   sylvine, 4, 5, 2, 3, 1, 6

.. csv-table:: Ranks on MultiClass Tasks
   :header-rows: 1
   :widths: 20,6,6,6,6,6,6

   Task, FEDOT, AutoGluon(B), H2OAutoML, LightAutoML, TPOT, ConstantPredictor
   car, 3, 4, 2, 1, 5, 6
   cnae-9, 3, 5, 4, 2, 1, 6
   connect-4, 4, 5, 2, 1, 3, 6
   covertype, 2, 1, 3, , , 4
   dilbert, 4, 3, 2, 1, 5, 6
   fabert, 5, 2, 3, 1, 4, 6
   fashion-mnist, 3, 2, 4, 1, 5, 6
   helena, , 2, 3, 1, 4, 5
   jannis, 5, 4, 2, 1, 3, 6
   jungle_chess_2pcs_raw_end, 2, 5, 4, 1, 3, 6
   mfeat-factors, 3, 5, 2, 1, 4, 6
   robert, , 2, , 1, , 3
   segment, 3, 5, 1, 2, 4, 6
   shuttle, 4, 3, 1, 2, , 5
   vehicle, 2, 5, 1, 4, 3, 6
   volkert, 4, 2, 3, 1, , 5

The raw metrics (ROC AUC for binary and log loss for multiclass tasks) of the frameworks are provided below:

.. csv-table:: Binary (ROC AUC)
   :header-rows: 1
   :widths: 20,6,6,6,6,6,6

   Task, FEDOT, AutoGluon(B), H2OAutoML, LightAutoML, TPOT, ConstantPredictor
   adult, 0.929, 0.910, 0.931, **0.932**, 0.927, 0.500
   airlines, , 0.725, **0.730**, , 0.694, 0.500
   amazon_employee_access, 0.863, 0.857, 0.873, **0.879**, 0.866, 0.500
   apsfailure, 0.992, 0.991, **0.993**, 0.992, 0.990, 0.500
   australian, 0.939, 0.940, 0.938, **0.945**, 0.936, 0.500
   bank-marketing, 0.936, 0.931, 0.939, **0.940**, 0.935, 0.500
   blood-transfusion, **0.759**, 0.690, 0.754, 0.750, 0.740, 0.500
   christine, 0.817, 0.804, 0.815, **0.830**, 0.807, 0.500
   credit-g, 0.778, 0.795, **0.798**, 0.796, 0.794, 0.500
   guillermo, 0.891, 0.900, , **0.926**, 0.783, 0.500
   jasmine, 0.888, 0.883, 0.888, 0.880, **0.890**, 0.500
   kc1, 0.843, 0.822, , 0.831, **0.845**, 0.500
   kddcup09_appetency, 0.819, 0.804, 0.829, **0.850**, 0.826, 0.500
   kr-vs-kp, **1.000**, 0.999, **1.000**, **1.000**, **1.000**, 0.500
   miniboone, 0.981, 0.982, , **0.988**, 0.983, 0.500
   nomao, 0.994, 0.995, 0.996, **0.997**, 0.995, 0.500
   numerai28_6, **0.531**, 0.517, **0.531**, **0.531**, , 0.500
   phoneme, 0.965, 0.965, 0.968, 0.965, **0.971**, 0.500
   riccardo, , **1.000**, , **1.000**, , 0.500
   sylvine, 0.988, 0.985, 0.989, 0.988, **0.993**, 0.500

.. csv-table:: MultiClass (LogLoss)
   :header-rows: 1
   :widths: 20,6,6,6,6,6,6

   Task, FEDOT, AutoGluon(B), H2OAutoML, LightAutoML, TPOT, ConstantPredictor
   car, 0.011, 0.117, 0.003, **0.002**, 0.643, 0.840
   cnae-9, 0.211, 0.332, 0.262, 0.156, **0.154**, 2.200
   connect-4, 0.404, 0.502, **0.338**, **0.337**, 0.373, 0.840
   covertype, 0.164, **0.071**, 0.264, , , 1.210
   dilbert, 0.040, 0.148, 0.103, **0.033**, 0.168, 1.610
   fabert, 0.859, 0.788, 0.792, **0.766**, 0.892, 1.870
   fashion-mnist, 0.388, 0.333, 0.383, **0.252**, 0.535, 2.300
   helena, , 2.785, 2.980, **2.537**, 2.982, 4.140
   jannis, 0.753, 0.728, 0.691, **0.664**, 0.703, 1.110
   jungle_chess_2pcs_raw_end, 0.349, 0.431, 0.240, **0.149**, 0.219, 0.940
   mfeat-factors, 0.089, 0.161, 0.093, **0.082**, 0.107, 2.300
   robert, , 1.684, , **1.318**, , 2.300
   segment, 0.062, 0.094, **0.060**, 0.061, 0.077, 1.950
   shuttle, 0.001, 0.001, **0.000**, 0.001, , 0.670
   vehicle, 0.354, 0.515, **0.331**, 0.404, 0.392, 1.390
   volkert, 1.040, 0.920, 0.978, **0.812**, , 2.050

The comparison with [1] shows that AutoGluon underperforms in our hardware setup, while the TPOT and H2O results are quite close in both setups.
To avoid any confusion, we provide below an additional comparison of the FEDOT metrics with the metrics from [1].
Note, however, that the conditions differ, as do the exact versions of the frameworks.

.. csv-table:: AutoML Benchmark Results (AUC)
   :header-rows: 1
   :widths: 20,6,6,6,6,6,6,6,6

   Task, FEDOT, H2O, TPOT, AutoGluon(B), LightAutoML, GAMA(B), MLJAR(P), FLAML
   adult, 0.929, 0.931, 0.927, 0.932, 0.932, 0.929, 0.931, 0.932
   airlines, 0.716, 0.731, 0.722, 0.732, 0.727, 0.717, 0.730, 0.731
   albert, 0.749, 0.761, 0.718, 0.782, 0.780, 0.726, 0.765, 0.770
   amazon_employee_access, 0.863, 0.877, 0.864, 0.902, 0.879, 0.867, 0.903, 0.876
   apsfailure, 0.992, 0.993, 0.989, 0.993, 0.993, 0.990, 0.992, 0.992
   australian, 0.939, 0.935, 0.939, 0.941, 0.946, 0.941, 0.944, 0.938
   bank-marketing, 0.936, 0.938, 0.935, 0.941, 0.940, 0.936, 0.940, 0.937
   blood-transfusion, 0.759, 0.764, 0.724, 0.758, 0.753, 0.753, 0.753, 0.730
   christine, 0.817, 0.825, 0.811, 0.826, 0.831, 0.828, 0.823, 0.824
   credit-g, 0.778, 0.779, 0.791, 0.796, 0.796, 0.794, 0.785, 0.788
   guillermo, 0.891, 0.897, 0.826, 0.914, 0.932, 0.865, 0.912, 0.919
   jasmine, 0.888, 0.887, 0.886, 0.886, 0.880, 0.891, 0.886, 0.887
   kc1, 0.843, 0.829, 0.844, 0.840, 0.831, 0.852, 0.824, 0.841
   kddcup09_appetency, 0.753, 0.837, 0.831, 0.849, 0.851, 0.818, 0.837, 0.825
   kr-vs-kp, 1.000, 1.000, 0.999, 1.000, 1.000, 1.000, 1.000, 0.961
   miniboone, 0.981, 0.987, 0.982, 0.989, 0.988, 0.982, 0.987, 0.987
   nomao, 0.994, 0.996, 0.995, 0.997, 0.997, 0.995, 0.997, 0.997
   numerai28_6, 0.531, 0.531, 0.528, 0.531, 0.531, 0.530, 0.531, 0.528
   phoneme, 0.965, 0.968, 0.969, 0.969, 0.966, 0.971, 0.967, 0.972
   riccardo, 0.998, 1.000, 0.998, 1.000, 1.000, 0.999, 1.000, 1.000
   sylvine, 0.988, 0.990, 0.992, 0.990, 0.988, 0.993, 0.992, 0.991

.. csv-table:: AutoML Benchmark Results (Logloss)
   :header-rows: 1
   :widths: 20,6,6,6,6,6,6,6,6

   Task, FEDOT, H2O, TPOT, AutoGluon(B), LightAutoML, GAMA(B), MLJAR(P), FLAML
   car, 0.011, 0.001, 0.788, 0.002, 0.001, 0.022, 0.010, 0.002
   cnae-9, 0.211, 0.200, 0.146, 0.126, 0.152, 0.126, 0.323, 0.164
   connect-4, 0.404, 0.311, 0.392, 0.295, 0.335, 0.417, 0.342, 0.340
   covertype, 0.164, 0.253, 0.696, 0.057, 0.082, 0.526, 0.105, 0.068
   dilbert, 0.040, 0.065, 0.150, 0.014, 0.033, 0.176, 0.030, 0.024
   fabert, 0.859, 0.746, 0.886, 0.683, 0.768, 0.763, 0.771, 0.766
   fashion-mnist, 0.388, 0.283, 0.431, 0.221, 0.248, 0.439, 0.259, 0.253
   helena, 2.963, 2.791, 2.951, 2.467, 2.555, 2.802, 2.653, 2.617
   jannis, 0.753, 0.669, 0.734, 0.650, 0.666, 0.732, 0.672, 0.674
   jungle_chess_2pcs_raw, 0.349, 0.136, 1.766, 0.012, 0.145, 0.243, 0.198, 0.210
   mfeat-factors, 0.089, 0.096, 0.135, 0.071, 0.080, 0.077, 0.096, 0.092
   robert, 1.745, 1.423, 1.956, 1.304, 1.283, 1.710, 1.417, 1.382
   segment, 0.062, 0.061, 0.075, 0.052, 0.061, 0.067, 0.059, 0.067
   shuttle, 0.001, 0.000, 0.001, 0.000, 0.001, 0.001, 0.000, 0.000
   vehicle, 0.354, 0.351, 0.417, 0.312, 0.389, 0.378, 0.349, 0.439
   volkert, 1.040, 0.844, 1.013, 0.672, 0.815, 1.102, 0.808, 0.795

[1] Gijsbers P. et al. AMLB: an AutoML benchmark // Journal of Machine Learning Research. 2024. Vol. 25, No. 101, pp. 1-65.
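
The rank averaging behind the CD diagrams (impute missing results with the constant predictor, rank per dataset, average across datasets) can be illustrated with a minimal sketch. This is not the AutoMLBenchmark implementation. The function names, the toy framework names and scores, and the choice to give tied results their average rank are all assumptions made for illustration.

```python
from statistics import mean


def rank_row(scores, higher_is_better=True):
    """Rank one dataset's scores (1 = best); ties share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        tied_rank = ((i + 1) + (j + 1)) / 2
        for name in ordered[i:j + 1]:
            ranks[name] = tied_rank
        i = j + 1
    return ranks


def average_ranks(results, baseline="ConstantPredictor", higher_is_better=True):
    """Impute missing scores (None) with the baseline, then average per-task ranks."""
    per_task = []
    for scores in results.values():
        filled = {
            name: scores[baseline] if value is None else value
            for name, value in scores.items()
        }
        per_task.append(rank_row(filled, higher_is_better))
    return {name: mean(r[name] for r in per_task) for name in per_task[0]}


# Hypothetical toy scores (ROC AUC, higher is better); None marks a failed run.
toy_results = {
    "task_a": {"FrameworkA": 0.9, "FrameworkB": 0.8, "ConstantPredictor": 0.5},
    "task_b": {"FrameworkA": None, "FrameworkB": 0.7, "ConstantPredictor": 0.5},
}

print(average_ranks(toy_results))
```

For log loss, where lower is better, the same sketch would be called with ``higher_is_better=False``. The resulting average ranks are the quantities that a Friedman test and a Nemenyi post-hoc test would then compare.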