Abstract
Fermentation biomasses can be defined as complex mixtures of different natural components and microbes, whose primary source is biodegradable organic waste. Their correct characterization is crucial for proper processing in fermentative units. Proximate analysis is usually performed first to determine the content of specific compound classes in the mixture, such as fats, proteins, and carbohydrates. However, this is often not precise enough, since some low-concentration species (e.g., sulfate compounds, ethanol, caproic acid) are not easily detected with this method. Ultimate analysis is therefore performed to determine the amount of each element in the mixture. For biomass-based compounds, the elemental composition can be summarized in terms of carbon, hydrogen, oxygen, nitrogen, and sulfur, collectively known as CHONS. From these values, the amounts of the corresponding species in the biomass can be derived. However, the experimental determination of CHONS is time-consuming and expensive. On the other hand, the data collected in the literature, from both experimental and industrial analyses, can be exploited to build a numerical model, based on multivariate statistical analysis and machine learning principles, that predicts the CHONS content of any type of biomass. In this work, a data-driven model has been developed for this purpose, taking a set of relevant variables as input. To this end, a dataset has been built to gather the required data. The multivariate statistical technique of Canonical Correlation Analysis (CCA) is used to find 'hidden' correlations and to predict the CHON content of 27 different biomass types. In future research, machine learning techniques will be applied and their results compared with those obtained here.
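The abstract does not say how CCA is turned into a predictor or which input variables are used. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a numeric input matrix `X` (n samples by p descriptors, with p at least q) and a target matrix `Y` (n samples by q elemental fractions, e.g. C, H, O, N). It computes the canonical correlations as the singular values of the whitened cross-covariance, then predicts `Y` by scaling each x-side canonical variate by its canonical correlation and mapping the result back to target units. The function names `fit_cca` and `predict_cca` are hypothetical.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def fit_cca(X, Y):
    """Fit linear CCA between inputs X (n x p) and targets Y (n x q).

    Hypothetical helper. Returns the canonical correlations rho (length
    k = min(p, q)), the weight matrices A (p x k) and B (q x k), and the
    column means used for centering.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # SVD of the whitened cross-covariance; the singular values are the
    # canonical correlations, already sorted in descending order.
    P, rho, Qt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    A = Kx @ P      # x-side canonical weights
    B = Ky @ Qt.T   # y-side canonical weights (square when p >= q)
    return rho, A, B, x_mean, y_mean


def predict_cca(X, rho, A, B, x_mean, y_mean, n_components=None):
    """Predict Y from X through the first n_components canonical pairs.

    Assumes p >= q, so that B is square and invertible.
    """
    k = len(rho) if n_components is None else n_components
    U = (X - x_mean) @ A[:, :k]   # x canonical variates (unit variance)
    V_hat = U * rho[:k]           # best linear prediction of the y variates
    # Y_centered = V @ inv(B); dropped components are treated as zero.
    return V_hat @ np.linalg.inv(B)[:k, :] + y_mean
```

When all canonical pairs are kept, this prediction coincides with ordinary least squares with an intercept. Keeping fewer pairs gives a reduced-rank predictor that relies only on the most strongly correlated directions between inputs and elemental composition.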