Polymetrics
Introduction
Polymetrics is a Python-based computational framework for polymers tailored for industrial research.
The idea behind Polymetrics is to organize the extensive experimental data available in industrial settings into internally consistent datasets. The linkages within a dataset are established using machine learning (ML) techniques.
Potential applications of Polymetrics include screening property spaces, conducting competitive analysis, and facilitating data exchange across projects and business verticals.
Presently, Polymetrics provides a framework for similarity searches, feature building, and ML methods. The unified approach makes the library scalable and applicable to a variety of problems.
A Polymetrics dataset consists of polymer objects such as resins, their blends, and articles produced from the blends.
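As a rough illustration of how resins, blends, and articles relate to one another, the sketch below models them with plain Python dataclasses. The class names, fields, and values are hypothetical and are not taken from the Polymetrics API; they only show the nesting described above.

```python
from dataclasses import dataclass, field


@dataclass
class Resin:
    """A single polymer grade (hypothetical fields)."""
    name: str
    polymer_type: str                      # e.g. "PP", "HDPE"
    properties: dict = field(default_factory=dict)


@dataclass
class Blend:
    """A mixture of resins given as (Resin, weight fraction) pairs."""
    name: str
    components: list

    def __post_init__(self):
        # Weight fractions of a blend must add up to 1.
        total = sum(fraction for _, fraction in self.components)
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"weight fractions sum to {total}, expected 1.0")


@dataclass
class Article:
    """A finished part produced from a blend."""
    name: str
    blend: Blend
    process: str


# Illustrative values only.
pp = Resin("PP-A", "PP", {"mfr_g_per_10min": 12.0})
pe = Resin("PE-B", "HDPE", {"mfr_g_per_10min": 0.8})
blend = Blend("Blend-1", [(pp, 0.7), (pe, 0.3)])
article = Article("Cap-1", blend, "injection molding")
```

Keeping the resin objects inside the blend, and the blend inside the article, means every article can be traced back to the resins it was made from.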
Polymetrics uses the pandas library, which provides native support for importing data from spreadsheets and various database formats.
The diverse information associated with polymers is stored in multiple data types for accurate representation and is tagged with labels for improved traceability.
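The following is a minimal sketch of what importing and typing such data can look like with pandas alone. The column names and values are made up for illustration, and the snippet does not show how Polymetrics itself stores or tags data.

```python
import io

import pandas as pd

# Illustrative data; in practice this would come from a spreadsheet or database.
csv_text = """resin_id,polymer_type,mfr_g_per_10min,density_g_cm3,test_date
R001,PP,12.0,0.905,2023-01-15
R002,HDPE,0.8,0.952,2023-02-03
R003,PP,25.0,0.900,2023-02-20
"""

# Parse dates on import so they are stored as timestamps, not strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["test_date"])

# Store the polymer family as a categorical type rather than free text.
df["polymer_type"] = df["polymer_type"].astype("category")

# Use the resin identifier as the row label so each value stays traceable.
df = df.set_index("resin_id")

print(df.dtypes)
```

Column names that carry the unit, together with an identifier index, are one simple way to keep each measurement labeled.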
Similarities between polymers can be identified by examining properties beyond those listed in specifications. The definition of similarity can differ with the project scope: for example, identifying products with similar characteristics for application development, or avoiding over-representation of specific classes in the data.
In Polymetrics, similarity is determined through user-specified features and is reported in graphical and tabulated forms for analysis.
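One common way to express similarity over user-specified features is a distance between standardized feature vectors. The sketch below assumes Euclidean distance on z-scored features, which is a stand-in and not necessarily the metric Polymetrics uses; the function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev

# Illustrative property values for four resins.
resins = {
    "R001": {"mfr": 12.0, "density": 0.905, "flex_modulus": 1500.0},
    "R002": {"mfr": 0.8, "density": 0.952, "flex_modulus": 1100.0},
    "R003": {"mfr": 11.0, "density": 0.904, "flex_modulus": 1450.0},
    "R004": {"mfr": 25.0, "density": 0.900, "flex_modulus": 1700.0},
}


def standardize(records, features):
    """Z-score each selected feature so that no unit dominates the distance."""
    scaled = {name: {} for name in records}
    for f in features:
        values = [rec[f] for rec in records.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, rec in records.items():
            scaled[name][f] = (rec[f] - mu) / sigma if sigma > 0 else 0.0
    return scaled


def rank_by_similarity(records, query, features):
    """Return (name, distance) pairs sorted from most to least similar."""
    scaled = standardize(records, features)
    q = scaled[query]
    distances = []
    for name, vec in scaled.items():
        if name == query:
            continue
        d = math.sqrt(sum((vec[f] - q[f]) ** 2 for f in features))
        distances.append((name, d))
    return sorted(distances, key=lambda pair: pair[1])


# The user decides which features define "similar" for this project.
ranking = rank_by_similarity(resins, "R001", ["mfr", "density"])
for name, d in ranking:
    print(f"{name:<6}{d:8.3f}")
```

Changing the feature list changes the ranking, which is the point: the same dataset can answer different similarity questions depending on the project scope.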
In Polymetrics, features are measurable properties or attributes that represent the structure of polymers. An accurate representation of polymers needs high-dimensional data because of their diverse configurations and broad spectrum of properties.
Polymetrics tackles this problem by combining (A) feature building, which develops new features by extracting relevant characteristics from analytical data, and (B) dimensionality reduction, in which scientists have full control to retain the key features relevant to the problem and drop those showing spurious correlations.
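To make the two steps concrete, the sketch below derives three features from a molar mass distribution (number-average molar mass Mn, weight-average molar mass Mw, and dispersity Mw/Mn) and then keeps only the features a user selects. The formulas are the standard definitions; the function names and the input data are hypothetical and not part of Polymetrics.

```python
def molar_mass_averages(molar_masses, counts):
    """Build features from a distribution of chain counts N_i at molar mass M_i.

    Mn = sum(N_i * M_i) / sum(N_i)
    Mw = sum(N_i * M_i**2) / sum(N_i * M_i)
    """
    total_n = sum(counts)
    total_nm = sum(n * m for m, n in zip(molar_masses, counts))
    total_nm2 = sum(n * m * m for m, n in zip(molar_masses, counts))
    mn = total_nm / total_n
    mw = total_nm2 / total_nm
    return {"Mn": mn, "Mw": mw, "dispersity": mw / mn}


def retain_features(record, keep):
    """Manual dimensionality reduction: keep only the user-selected features."""
    missing = [k for k in keep if k not in record]
    if missing:
        raise KeyError(f"unknown features: {missing}")
    return {k: record[k] for k in keep}


# Illustrative distribution: molar masses in g/mol and relative chain counts.
features = molar_mass_averages([10_000, 50_000, 100_000], [1, 2, 1])

# Add a directly measured property, then keep only what the problem needs.
features["density"] = 0.905
reduced = retain_features(features, ["Mw", "dispersity"])
print(reduced)
```

Here the reduction is an explicit choice by the scientist rather than an automated projection, in line with the control over retained features described above.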
The framework can be applied to various learning problems and is demonstrated on an example classification problem.
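For readers unfamiliar with classification, the sketch below shows the general shape of such a problem using a nearest-centroid classifier written in plain Python. This is a deliberately simple stand-in, not the method or the example shipped with Polymetrics, and the feature values are illustrative.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


# Illustrative features: [melting temperature in degC, crystallinity in %].
X_train = [[162.0, 45.0], [165.0, 50.0], [131.0, 70.0], [134.0, 75.0]]
y_train = ["PP", "PP", "HDPE", "HDPE"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [160.0, 48.0]))
print(predict(centroids, [130.0, 68.0]))
```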
Upcoming additions
unsupervised methods (outliers, clustering)
feature building for blends
support for 'nested blends' (recyclates, masterbatches)
About me
application and product development in polyolefins | electronic structure calculation and chemical kinetics | programming enthusiast and a fan of Archillect
Queries about the code can be sent to polymetricsforpolymers@gmail.com
Kaustubh Gupte: linkedin.com/in/ksgupte