Key insights from past projects

Fintech

Matogen Applied Insights is partnering with ConfirmU, a global, cutting-edge player in the financial inclusion space.

Matogen Applied Insights employed machine learning to build a predictive model to identify individuals who were likely to churn, as well as anticipate customer spending.

Matogen Applied Insights analysed the results of a telecoms client’s gamification marketing exercise.

Matogen Applied Insights assisted a large corporate client to improve its debt collection strategy using advanced analytics.

Our co-founder and CEO Jacobus Eksteen was one of 20 esteemed speakers at BankTech 2021, sharing their insights on technology in banking, at this year’s hybrid event – presenters and attendees could participate virtually or in person at the Indaba Hotel, Johannesburg.

Engineering credit data has become imperative considering the increasingly risky and competitive lending environment.

The COVID-19 pandemic constitutes a Black Swan event, of which the economic consequences have only started to unfold.

Right party contact is an integral component of strategic business decision-making, necessitated by increasingly difficult business trading conditions, and is critical for optimised marketing, customer service and collections efforts.

Matogen Applied Insights used alternative data for credit scoring to create new scorecards for a financial services client.

Cryptocurrency trading may be all the rage at the time of writing, but Matogen Applied Insights had already delivered a bespoke automated cryptocurrency trading system some time ago for an international client, a veritable “early adopter” in this dynamic sector.

The telecommunications industry is a dynamic, competitive and very strategic environment, but is prone to a particular phenomenon.

Agritech

Matogen Applied Insights has successfully completed a wide range of innovative data science projects in the agriculture sector.

This article showcases Matogen Applied Insights’ analytics capabilities within the agricultural sector, examining how deep learning for plant disease image recognition can be used to identify crop disease.

Creating an irrigation model was one of Matogen Applied Insights’ earliest projects within the agricultural sector.

Modelling agricultural data for harvest date prediction is one of many projects showcasing Matogen Applied Insights’ analytics capabilities within the agricultural sector.

In the spirit of Agri 3.0, Matogen Applied Insights created a sophisticated weather and plant disease risk alert system, using data and technology to mitigate the potentially dire repercussions of crop failure.

In one of Matogen Applied Insights’ flagship agricultural analytics projects, the data science team examined how modelling leaf analysis data can enable crop nutrient deficiency prediction.

Healthcare

What is Big Data in healthcare? Big Data is commonly and academically defined to have three significant characteristics: volume, variety and velocity.

Matogen Applied Insights assisted with automating parameters for an extensive COVID modelling endeavour.

Matogen Applied Insights assisted a US client in the health sector to perform advanced analytics on data pertaining to orthopaedic procedures.

Matogen Applied Insights created an online medical edutainment platform for a client in a neighbouring country.

We find neuropsychology fascinating at Matogen Applied Insights, so have made a short study of it here. With Black Friday ad campaigns circulating, it seemed appropriate to relate it to consumer behaviour.

In one of Matogen Applied Insights’ earliest health sector data science projects, a US-based medical company approached the team to create a tuberculosis monitoring dashboard using clinical trial data, including an automated ETL (Extract, Transform, Load) and wrangling process.

What is “resilience”? The dictionary defines “resilience” as “an ability to recover from, or adjust easily to, misfortune or change”.

Matogen Applied Insights was approached by the director of a European medical company to provide a business case by modelling projections on the predicted return on investment for a new online platform aimed at connecting people and healthcare.

In one of many health sector Big Data projects, Matogen Applied Insights performed oncology data modelling for an international client.

Gamification of Credit Scoring

Matogen Applied Insights is partnering with ConfirmU, a global, cutting-edge player in the financial inclusion space, in the gamification of credit scoring by using psychometric traits.

Repayment behaviour

The word “credit” originates from French, via Italian from the Latin credere, meaning “believe” or “trust”. In other words, credit is extended to people who can be believed or trusted to pay it back. In traditional financial services, credit scores represent the likelihood of a person repaying a loan and are compiled from information about an individual’s repayment behaviour, i.e. how they have historically managed previous loans or other financial products. It is their past behaviour that determines whether they should be trusted or not.

Thin files

According to the World Bank, there are 2 billion people around the world, mostly in developing countries, who lack access to even the most basic banking services. Even in advanced economies, up to 20% of consumers have a credit history that is either completely lacking or too thin to generate a traditional credit score. Given that the word credit relates to a person’s trustworthiness, it stands to reason that psychology may be a useful tool to assess a person’s inclination towards repaying debt according to their personality. This could be very useful, especially where information about repayment behaviour is absent.

Theoretical underpinnings

Psychometric instruments are most commonly used within the human resources realm for candidate hiring and conducting assessments at different points of the employee cycle. In order to measure the psychometric traits that are linked to financial behaviour and credit risk, four models (MBTI, OCEAN/Big Five, PCL-R and the Hogan Personality Inventory) were used to cover the traits which characterise a good borrower. Intent to repay and financial conscientiousness were the main characteristics to be measured. The selected traits are supported by extensive primary research and pilot studies. For example, inhabitants of four different towns in Uttar Pradesh, rural India, provided their credit scores, played the game and underwent the MBTI test. The findings were very positive.

Case study

The idea originated from a chatbot, using NLP mapped to psychological elements (based on the Big Five model in psychology) to screen prospective rental tenants in the property sector. This prototype was changed to image-based selection in order to be scalable across multiple languages (for example, 207 dialects in India!) and to actual gamification, which is much more engaging. ConfirmU subsequently developed a short game, comprising psychometric tests, that delivers an immediate credit score.

Matogen Applied Insights modelling case study

The case study was performed on data pertaining to loans to small-scale dairy farmers in Kenya. The farmers played the game and the Matogen Applied Insights data science team was able to achieve a high Gini coefficient (a measure of predictive strength) overall, as well as very high predictive values on out-of-time testing, indicating that the model generalises well. The strength of this psychometric model was comparable to models built with credit bureau data within the unsecured space.
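For readers unfamiliar with the metric, the Gini coefficient of a scorecard is commonly computed as 2 × AUC − 1. The sketch below is illustrative only: the function name, the toy data and the convention that a higher score means higher risk are assumptions, not details of the Matogen Applied Insights model.

```python
def gini_coefficient(risk_scores, defaulted):
    """Gini = 2 * AUC - 1.

    AUC is the share of (defaulter, non-defaulter) pairs in which the
    defaulter received the higher risk score; ties count as half.
    """
    bad = [s for s, d in zip(risk_scores, defaulted) if d]
    good = [s for s, d in zip(risk_scores, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    auc = wins / (len(bad) * len(good))
    return 2 * auc - 1


# Toy data (hypothetical): five borrowers, three of whom defaulted.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_defaults = [1, 1, 0, 1, 0]
toy_gini = gini_coefficient(toy_scores, toy_defaults)
```

A Gini of 0 corresponds to a random ranking and 1 to perfect separation of defaulters from non-defaulters.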

Given the transparent modelling technique and the low level of correlation with other sources, the scorecard the Matogen Applied Insights team built off psychometric data is a powerful tool for both thin and thick file customers. One of the weaknesses of a traditional “thin file” or “new to credit” scorecard is the tendency to lump many individuals of the same age or income together, whereas the additional psychometric lens considers them differently. The Matogen Applied Insights scorecard displayed even deciles in the scored population, and the bad rate of the riskiest decile was three times that of the lowest-risk decile, with a steady change in between. In practice, this could translate into giving someone with a lower score a smaller amount, instead of just declining them, maximising not only profit but also impact and sustainability.
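The decile comparison described above can be sketched as follows. This is a minimal illustration with made-up data and three bands instead of ten; the function name and the assumption that a low score means high risk are hypothetical.

```python
def bad_rate_by_band(credit_scores, defaulted, bands=10):
    """Sort borrowers by score (lowest, i.e. riskiest, first), split them
    into equally sized bands and return the default rate of each band."""
    ranked = sorted(zip(credit_scores, defaulted), key=lambda pair: pair[0])
    n = len(ranked)
    if n < bands:
        raise ValueError("need at least one borrower per band")
    rates = []
    for i in range(bands):
        chunk = ranked[i * n // bands:(i + 1) * n // bands]
        rates.append(sum(flag for _, flag in chunk) / len(chunk))
    return rates


# Hypothetical book of 15 borrowers with scores 1..15; these ones defaulted.
scores = list(range(1, 16))
defaults = [1 if s in {1, 2, 3, 6, 7, 11} else 0 for s in scores]
rates = bad_rate_by_band(scores, defaults, bands=3)
risk_ratio = rates[0] / rates[-1]  # riskiest band versus safest band
```

A steadily falling list of bad rates indicates that the score ranks risk consistently across the whole population, not only at the extremes.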

Localisation

Traditional credit scorecards are adjusted for different locations because even financial data needs to be interpreted within a local context. It could be argued that one’s personality profile is strongly influenced by one’s culture. Therefore, a major unique selling point of the game is that it is customisable – it can be adapted to local culture to produce valid results, whether it is played in India, Kenya or Vietnam. A pilot study is conducted and a bespoke model is built for each new location and potential lenders. This is based on an understanding of the practicalities and the characteristics of the audience. A link is sent and a consumer plays the game, after which a model is constructed based on data from the game combined with loan repayment information. It would then be very easy to scale the model to other clients in the same market.

Credit and non-credit applications

The first pilot occurred among rural, micro-entrepreneurial women, who are representative of the microfinance industry. ConfirmU is currently penetrating the unique segment of agri-lending in Kenya amidst a booming digital landscape. In addition, localisation is underway to target a metropolitan, young, tech-savvy population without a credit history. Non-credit applications potentially include buy-now-pay-later platforms, a flourishing segment worldwide that is gaining more and more traction in emerging markets, as well as insurance and wealth tech. The largest credit bureau in the world aims to apply this solution in the property tech arena. There is also room for credit scoring in the blockchain DeFi (decentralised finance) space where collateral is required on lending platforms.

Using gamification for credit scoring, ConfirmU’s overall vision is to become a global alternative credit bureau. Traditional models are good, but they don’t actually evaluate the person’s intent to repay based on personality.

However, even in highly industrialised countries, advanced analytics of adverse events surrounding orthopaedic procedures are in their infancy. Despite the increasing availability of national healthcare databases, their complexity cannot be handled effectively by routine analytical methods.

Advanced analytics for orthopaedic procedures

Given their advanced analytics capabilities within the health sector, Matogen Applied Insights was able to thoroughly investigate the outcomes and risk factors of patients with dismineral diseases undergoing orthopaedic procedures such as hip and knee arthroplasty, as well as spinal fusion.

The Matogen Applied Insights data scientist examined millions of data records spanning 2008 to 2014 from a national database. ICD-9 codes were used to filter the data and extract records related to the relevant diseases and orthopaedic procedures.

SAS software was used to wrangle the colossal dataset into a frequency table listing unique combinations of variable values and how often each occurs in the dataset. Continuous variables such as length-of-stay and total cost were aggregated. Many cross-tabulations were generated from the frequency table to glean insights and trends from the data.
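The project itself used SAS. Purely as an illustration of the same idea, the sketch below filters records by code and builds a frequency table in Python. The field names, the placeholder codes ("HIP", "KNEE") and the sample records are all hypothetical and do not come from the national database.

```python
from collections import defaultdict


def frequency_table(records, wanted_codes, keys, totals):
    """Count each unique combination of `keys` among records whose
    procedure code is in `wanted_codes`, summing the `totals` fields."""
    table = defaultdict(lambda: {"count": 0, **{t: 0.0 for t in totals}})
    for rec in records:
        if rec["procedure_code"] not in wanted_codes:
            continue  # not one of the procedures of interest
        row = table[tuple(rec[k] for k in keys)]
        row["count"] += 1
        for t in totals:
            row[t] += rec[t]
    return dict(table)


# Hypothetical records; real data would carry ICD-9 codes instead.
records = [
    {"procedure_code": "HIP", "sex": "F", "age_band": "65+",
     "length_of_stay": 4, "total_cost": 1000.0},
    {"procedure_code": "HIP", "sex": "F", "age_band": "65+",
     "length_of_stay": 6, "total_cost": 1500.0},
    {"procedure_code": "KNEE", "sex": "M", "age_band": "45-64",
     "length_of_stay": 3, "total_cost": 900.0},
    {"procedure_code": "OTHER", "sex": "M", "age_band": "45-64",
     "length_of_stay": 1, "total_cost": 100.0},
]
table = frequency_table(
    records,
    wanted_codes={"HIP", "KNEE"},
    keys=("procedure_code", "sex", "age_band"),
    totals=("length_of_stay", "total_cost"),
)
```

Cross-tabulations can then be derived from `table` without rescanning the raw records, which is the main benefit of collapsing millions of rows into unique combinations first.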

The outcome variable of most interest in this project was the incidence of mortality. Among other results, Matogen Applied Insights was able to pinpoint the precise combinations of demographics, mineral deficiency, and operative procedures that most highly correlated with post-procedural mortality.

Machine Learning to Predict Churn and Spend

Matogen Applied Insights employed machine learning to build a predictive model to identify individuals who were likely to churn, as well as anticipate customer spending.

Churn

Churn is defined as the percentage of customers who moved to a different service provider within a given time period. It is a particular feature of the telecommunications industry that is of concern to service providers, as it has a profound impact on income and profitability.

A Matogen Applied Insights client in the telecoms sector saw considerable potential in leveraging their warehoused data to manage voluntary churn. Initially, the project aimed to discover and examine the patterns of behaviour associated with churn. The Matogen Applied Insights data scientists used weights of evidence, logistic regression and random forests to predict which customers were highly likely to churn, and in conjunction with an expert-based approach, alerts were triggered to highlight particular accounts. These accounts were forwarded into a retention queue for proactive attempts to retain them.
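Weight of evidence (WoE), the first technique named above, scores each bin of a variable by how its share of retained customers compares with its share of churners. The sketch below is a minimal illustration; the bin labels, the toy data and the sign convention (positive means fewer churners than average) are assumptions, not details of the client project.

```python
import math
from collections import Counter


def weight_of_evidence(bins, churned):
    """WoE per bin = ln(share of all stayers in bin / share of all churners in bin).

    Assumes every bin contains at least one stayer and one churner;
    real pipelines usually merge or smooth sparse bins first.
    """
    stayed_in, churned_in = Counter(), Counter()
    for label, flag in zip(bins, churned):
        (churned_in if flag else stayed_in)[label] += 1
    total_stayed = sum(stayed_in.values())
    total_churned = sum(churned_in.values())
    return {
        label: math.log((stayed_in[label] / total_stayed)
                        / (churned_in[label] / total_churned))
        for label in set(bins)
    }


# Hypothetical tenure bins for ten customers (1 = churned).
tenure_bin = ["new"] * 4 + ["loyal"] * 6
churn_flag = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
woe = weight_of_evidence(tenure_bin, churn_flag)
```

In this toy example the "new" bin gets a negative WoE and the "loyal" bin a positive one. WoE-encoded variables are often used as inputs to a logistic regression, because the encoding puts every variable on the same log-odds scale.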

Spend prediction

The machine learning techniques employed by the Matogen Applied Insights team not only helped to manage voluntary churn; the improved profiling of customers also enables more targeted and relevant product and service offerings, especially in the prepaid customer base. This, in turn, results in an even better understanding of the customer due to a longer and deeper relationship.

After profiling customers in new and insightful ways, the project expects to start seeing tangible value from the three selected initial use cases. The next steps will be to increase the sophistication of the modelling and automate the process to a larger extent.

Gamification Marketing “Spin & Win” Analysis

Matogen Applied Insights analysed the results of a telecoms client’s gamification marketing exercise.

Marketing using play

Play is an innate component of the human experience and from an early age children acquire cognitive and social skills through playing games. The same innate psychological triggers that make games enjoyable have been shown to trigger consumer engagement and retention in the field of marketing.

“Gamification marketing” is defined as harnessing gaming elements in a non-gaming context as part of marketing campaigns to increase the level of participation and influence consumer behaviour.

Rewards programmes are a traditional, tried-and-tested and widely used type of gamified marketing that rewards brand loyalty. Customers earn points by spending more on a particular brand, and perks and rewards are allocated based on a tiered system.

In online language learning, gamification is used to set incremental goals to motivate people in the arduous journey of acquiring a new language. Controversially, online gamified simulations have been used to gauge interest in, and encourage, enlistment in the US military.

Marketers within the sport and exercise realms have included gamification to unleash consumers’ innate competitiveness while promoting their products, and FMCG companies often use contests with entry contingent upon goods purchases.

Spin & Win Analysis

A client in the telecoms sector is very active in a consumer sector that is noteworthy for its price-sensitivity, i.e. the extent to which the product price affects consumers’ purchasing behaviour. The company had previously embarked on a gamified marketing campaign to foster brand loyalty in this volatile space.

Matogen Applied Insights was contracted to analyse data on three gamified marketing campaigns to establish whether they were successful.

Data was analysed for almost 2 million subscribers who played the game during the three campaigns, scrutinising trends and drawing comparisons with subscribers who did not participate in the game. Significant trends and insights were uncovered relating to recharge frequency, additional revenue generated and active days.

The Matogen Applied Insights team was able to pinpoint and quantify the differences between game-playing and non-game-playing cohorts, assisting the client in planning future strategic marketing efforts.

Debt Collection Analytics

Matogen Applied Insights helped a large corporate client to improve its debt collection strategy using advanced analytics.

When creditors struggle to collect overdue payments, it usually signifies the beginning-of-the-end of their relationship with the customer. Apart from improving collection rates on outstanding amounts, Matogen Applied Insights’ client in the telecoms sector desired stronger customer retention, and subsequently, growth. They joined forces to embark on a programme of intelligent customer insights to manage the customer journey.

Debt collection advanced analytics

The Matogen Applied Insights data science team undertook a comprehensive collections diagnostic, in addition to supporting the client’s collections team with managing the ‘run down’ book. Matogen Applied Insights harnessed two foundational components: Right Party Contact (RPC), gleaned from advanced data wrangling and modelling, and client insights enriched by alternative data sources and by new variables created from credit bureau data. Right Party Contact enables the client to get hold of delinquent customers, overcoming the challenges of contactability and network dormancy, whilst the advanced customer analytics made it possible to adjust the service component of the customer’s package downwards to match their current affordability profile.

This innovative and flexible approach to debt collection analytics increases both collection and customer retention rates.

BankTech Conference 2021

Our co-founder and CEO Jacobus Eksteen was one of 20 esteemed speakers at BankTech 2021, sharing their insights on technology in banking, at this year’s hybrid event – presenters and attendees could participate virtually or in person at the Indaba Hotel, Johannesburg.

The conference focused on challenges faced by financial and banking institutions, and included sessions from experts in artificial intelligence, blockchain, open banking, data analytics and cybersecurity. The programme also addressed the industry’s radical shift driven by disruptive technologies, changing business models, compliance and mounting regulation in the banking and financial industry.

Matogen Applied Insights is well-versed in using the latest techniques to optimise financial institutions’ data.

Here’s Jacobus’ full BankTech presentation:

Part 1
Part 2

Engineering Credit Data

Engineering credit data has become imperative considering the increasingly risky and competitive lending environment. In order to improve decisioning, credit providers often leverage bureau summary variables which are generally calculated from credit bureau data sources. This includes credit account data, deeds office information, collections and microlending data.

Bureau information gaps

However, there are gaps in the current set of most bureau summary variables. Certain types of behaviour are not captured and this results in improper risk management. A “risk paradox” can result, for example a reverse ranking in which the observed risk is not in line with the expected risk. Consequently, credit risk models are of limited use due to this deficiency in the variables.

This project aimed to understand and fill the gaps in bureau summary variables. The Matogen Applied Insights team reprocessed and imputed the raw payment strings received from the credit bureaus to create new variables that examine consumer behaviour at different overlapping and non-overlapping periods relative to each other. More than a hundred new variables were scoped with a focus on delinquency. The data scientists examined and visualised variable lists focusing on utilisation, payments, balances and instalments to illustrate a customer’s trajectory over time. Traditionally, customers displaying the same delinquency at a specific point would have been assigned the same risk. With additional insights, it is clear that while one might be improving, the other could be deteriorating. This process of engineering credit data makes it possible to assign different risk categories to customers.
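To illustrate how a payment string can separate an improving customer from a deteriorating one, the sketch below compares the worst arrears in two adjacent, non-overlapping windows. The string format (one digit per month, oldest first, each digit giving months in arrears), the three-month windows and the feature names are assumptions made for the example; actual bureau formats differ.

```python
def arrears_trend(payment_string, window=3):
    """Derive simple trajectory features from a payment string.

    Assumed format: one digit per month, oldest month first, where each
    digit is the number of months in arrears at that point.
    """
    months = [int(ch) for ch in payment_string]
    if len(months) < 2 * window:
        raise ValueError("payment string too short for two windows")
    worst_recent = max(months[-window:])
    worst_prior = max(months[-2 * window:-window])
    return {
        "current_arrears": months[-1],
        "worst_recent": worst_recent,
        "worst_prior": worst_prior,
        "trend": worst_recent - worst_prior,  # > 0 means getting worse
    }


# Two hypothetical customers, both two months in arrears today.
deteriorating = arrears_trend("000012")
improving = arrears_trend("344322")
```

Both customers show the same delinquency at the latest point, yet the `trend` feature has opposite signs, which is the kind of distinction a point-in-time summary variable cannot make.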

Insights from engineered variables

As such, Matogen Applied Insights was able to compensate for the shortcomings in traditional credit bureau data and identify payment behaviour trends over different periods. Engineering new bureau summary variables in this way is an innovative use of credit bureau data, which is generally not used to scrutinise historic data from more than one point in time.

Commercial Lending Risk Modelling

The COVID-19 pandemic constitutes a Black Swan event, of which the economic consequences have only started to unfold. According to a McKinsey report, the sudden and simultaneous depression in supply and demand resulted in typical credit-risk data becoming “obsolete overnight”. Traditional 6- and 12-month-old data were no longer suitable for identifying the resilience of individual borrowers. During the latter half of 2020, Matogen Applied Insights joined forces with a commercial lending company to conduct internal risk modelling for approving loans for new and existing clients.

Data engineering

Due to a shortage of behavioural data, demographic metrics were a significant input into filling information gaps. In collaboration with the client, the Matogen Applied Insights data scientist created variables which proved to be innovative business metrics in that they were unrelated to other variables and provided significant lift. Logistic regression was applied to the augmented dataset and a model was constructed using credit bureau modelling techniques, facilitating the credit approval process.

Commercial lending risk modelling

During the second phase of the project, the Matogen Applied Insights data scientist added macroeconomic data using the one-factor Merton model to adapt the credit scorecard for the pandemic-and-beyond trading environment. Linear regression was performed with macroeconomic indicators, including interest rates, unemployment figures, inflation indices, GDP and exchange rates. Model performance was checked with cross-validation and used to establish the solvency of clients. The portfolio’s time variation or systematic risk could then be explained and forecast, allowing better business decisions.
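In its textbook form, the one-factor Merton model turns a long-run probability of default (PD) into a PD conditional on a single systematic factor Z: PD(Z) = Φ((Φ⁻¹(PD) − √ρ·Z) / √(1 − ρ)). The sketch below implements only that formula. The parameter values are hypothetical, and how the project linked Z to the macroeconomic indicators is not described in the source.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def conditional_pd(long_run_pd, asset_correlation, systematic_factor):
    """Default probability given the state of the economy.

    systematic_factor < 0 represents a downturn, > 0 an upturn.
    """
    threshold = STD_NORMAL.inv_cdf(long_run_pd)
    shifted = (threshold - asset_correlation ** 0.5 * systematic_factor) \
        / (1 - asset_correlation) ** 0.5
    return STD_NORMAL.cdf(shifted)


# Hypothetical portfolio: 5% long-run PD, asset correlation of 0.15.
pd_downturn = conditional_pd(0.05, 0.15, -2.0)
pd_upturn = conditional_pd(0.05, 0.15, 2.0)
```

With a correlation of zero the conditional PD collapses back to the long-run PD, which is a quick sanity check on any implementation.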

Monitoring and machine learning segmentation

Matogen Applied Insights delivered a model that predicts the likelihood of default, given current macroeconomic conditions. Model performance continues to be monitored and has been found to be stable. Significantly, the model has resulted in an increase in loan approvals, despite exceedingly challenging market circumstances.

Currently, segmentation of the client base is being explored using a series of decision trees, seeking an improvement of the credit approval process.

Right Party Contact in SA Telecom

Right party contact is an integral component of strategic business decision-making, necessitated by increasingly difficult business trading conditions, and is critical for optimised marketing, customer service and collections efforts.

Customer identity resolution

A major South African telecoms company found that its marketing and sales endeavours were hampered by the high velocity in customer contact information changes, particularly within their specific target market. To improve the situation, Matogen Applied Insights offered guidance on how to combine data assets from across the client company’s subsidiaries to provide a comprehensive corpus of accurate and verifiable customer information. Applying data analytics and artificial intelligence, this project’s core deliverable was a singular, accurate “View of Client”, of which “Right Party Contact” information is the critical element.

Data engineering and modelling

The client provided Matogen Applied Insights with data from various internal sources across the group; the information was analysed and used to build an expert model. After examining a variety of attributes, the model ranked contact details based on recency and frequency logic. Right party contact (RPC) was validated by comparing the data to trusted external data sources. Matogen Applied Insights achieved an average reliability of 80% that contact information was indeed current and accurate. This is much higher than the industry average.
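The exact rules of the expert model are not published, but a recency-and-frequency ranking can be sketched as below. The tie-breaking order (most recently seen first, then most frequently seen), the function name and the sample sightings are assumptions made for illustration.

```python
from datetime import date


def rank_contacts(sightings):
    """Rank contact values for one customer.

    `sightings` is a list of (contact_value, date_seen) pairs pooled
    from several source systems. More recent wins; frequency breaks ties.
    """
    stats = {}
    for value, seen in sightings:
        count, latest = stats.get(value, (0, date.min))
        stats[value] = (count + 1, max(latest, seen))
    return sorted(stats, key=lambda v: (stats[v][1], stats[v][0]), reverse=True)


# Hypothetical sightings of three phone numbers for the same customer.
sightings = [
    ("number_a", date(2021, 1, 5)),
    ("number_b", date(2021, 6, 1)),
    ("number_a", date(2021, 3, 9)),
    ("number_c", date(2021, 6, 1)),
    ("number_c", date(2021, 2, 2)),
]
ranked = rank_contacts(sightings)
```

Here `number_c` and `number_b` were both seen on the latest date, and `number_c` ranks first because it appears more often across sources.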

The optimised customer contact information will serve as the foundation for a subsequent series of projects aimed at retaining customers, unlocking opportunities and mitigating risk.

Alternative Data Credit Scoring

Matogen Applied Insights used alternative data for credit scoring to create new scorecards for a financial services client.

Alternative data

Traditionally, when assessing a potential customer’s credit risk, financial services providers have had to rely on information pertaining to the individual’s past credit activities, such as type of credit obtained, and utilisation, generally obtained from credit bureaus. In contrast, “alternative data” is defined as any data that is not related to a client’s credit activity, yet provides crucial information about their habits, preferences, behaviour, and character. It is important that alternative data originates from a source where it cannot be manipulated by the data subject. According to Experian, 65% of lenders already use alternative data to make lending decisions.

Substitute scorecards

Credit bureau data proved limited for the customers of one of Matogen Applied Insights’ financial services clients, especially in several key portfolios. However, Matogen Applied Insights was able to build substitute scorecards using “alternative customer data” from other purchasing data accessible within the client’s data warehouse.

The Matogen Applied Insights data scientist performed creative indirect matching to compensate for the large number of missing SA ID numbers of clients. In addition, purchasing data was analysed to reveal a plethora of attributes with good predictive strength. Matogen Applied Insights created a framework using this novel data to generate new descriptive features and predictive models, which delivered additional insights into customers and their behaviour. These substitute scorecards were shown to perform very well compared to those produced by traditional credit bureaus.

Marketing and compliance

This alternative data credit scoring framework will be used internally on an ongoing basis to guide decision-making on marketing ethics and strategies to unlock new opportunities. For example, it would mitigate risk to the end-consumer due to over-indebtedness and incorrect product offerings. In general, the aim is to increase knowledge of South African consumers, and improve the predictive power of models, as well as boost compliance.

Automated Crypto Trading System

Cryptocurrency trading may be all the rage at the time of writing, but Matogen Applied Insights had already delivered a bespoke automated cryptocurrency trading system some time ago for an international client, a veritable “early adopter” in this dynamic sector.

Modelling big crypto data

In the first phase of the project, large datasets on concluded crypto transactions were examined and analysed. Applying advanced financial modelling, the team generated an algorithm that identifies optimal opportunities for trades on different crypto exchanges, on both long and short positions. At the same time, risk is minimised by taking counterpositions where there is an opportunity for arbitrage.
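The client’s algorithm is proprietary and not described in detail. As a generic illustration of cross-exchange arbitrage only, the sketch below looks for the exchange pair where buying at one ask price and selling at another bid price leaves a profit after fees. The exchange names, the quotes and the flat fee rate are hypothetical.

```python
def best_arbitrage(quotes, fee_rate=0.001):
    """Find the most profitable buy-here, sell-there pair.

    `quotes` maps exchange name -> (bid, ask) for the same asset.
    Returns (buy_exchange, sell_exchange, profit_per_unit) or None.
    """
    best = None
    for buy_ex, (_, ask) in quotes.items():
        for sell_ex, (bid, _) in quotes.items():
            if buy_ex == sell_ex:
                continue
            # Fees reduce the sale proceeds and increase the purchase cost.
            profit = bid * (1 - fee_rate) - ask * (1 + fee_rate)
            if profit > 0 and (best is None or profit > best[2]):
                best = (buy_ex, sell_ex, profit)
    return best


# Hypothetical quotes for one coin on two exchanges.
opportunity = best_arbitrage({"exchange_a": (100.0, 101.0),
                              "exchange_b": (103.0, 104.0)})
no_opportunity = best_arbitrage({"exchange_a": (100.0, 101.0),
                                 "exchange_b": (100.5, 101.5)})
```

A production system would also have to account for order-book depth, transfer times and execution risk, none of which this sketch models.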

Using Python, the Matogen Applied Insights data engineer built an automated crypto trading system that executes trades based on these algorithms.

Crypto exchange expansion

Subsequent to the successful implementation and operation of the automated trading system over a significant period of time, phase 2 of the project entails expanding the scope of the exchanges on which to execute trades, in addition to harnessing opportunities for leverage, whilst managing the risk of overexposure.

Cannibalisation in the Telecoms Sector

The telecommunications industry is a dynamic, competitive and very strategic environment, but is prone to a particular phenomenon. Not only do telecoms companies need to be concerned about competitors encroaching on their market share, their very own products may be eating into their profits! This so-called “market cannibalisation” happens when the launch of a new product displaces demand for older products, resulting in a loss of income.

A major listed South African telecoms company faced exactly this conundrum. They wanted to know how to leverage their extensive data repository to glean insights and make better decisions based on their customers’ profile and financial behaviour, in order to mitigate this situation.

Data wrangling, analytics, engineering

The highly skilled Matogen Applied Insights team wrangled, analysed and engineered the client’s customer data. Large volumes of sales data were scrutinised, after some significant wrangling. The Matogen Applied Insights data scientists closely examined trends to gain insights into customers’ spend, reactivation and churn behaviour. The team constructed sophisticated predictive models for campaigning, specifically for client offer targeting. This enables the company to strategically structure product launches in a way that optimises profitability and lessens the impact of market cannibalisation in their telecoms operations.

In this ongoing project, the team is actively involved on an operational basis in the execution of campaigns, after which performance tracking and reporting are performed. The Matogen Applied Insights data scientists have been able to implement many improvements to current systems and have automated a significant proportion of data pipelines.

Client feedback

The CEO of the listed telecoms company had this to say:
“The Matogen Applied Insights team is really easy to work with and it feels like the consultants are alongside my staff, rather than separate from them. This makes it easy to bring together a business need with super-smart data scientists, which is typically where things fall apart.”

Agri 3.0 – Data Science in Agriculture

Matogen Applied Insights has successfully completed a wide range of innovative data science projects in the agriculture sector.

What is Agri 3.0?

An interesting outcome of the COVID-19 pandemic and the resultant lockdown in South Africa is that it brought into relief the notion of “essential services”. Suddenly, it was emphasised in staggered form which activities and enterprises were absolutely critical for human survival. Health services, transport and communication were highlighted, while societal extras like entertainment and leisure were quickly shut down to curb the spread of infections. Panic buying at supermarkets was an expression of people’s fear that the supply of food, the most essential of essentials, would be disrupted. However, to everyone’s great relief, the agricultural sector (as well as the other links in the food supply chain) kept chugging along more or less as usual. Nevertheless, this scenario served as a stark reminder of the importance of the agricultural sector.

Agriculture 3.0 is the term given to describe the “Third Agricultural Revolution”, i.e. data science in agriculture, also known as “digital farming”. It is characterised by the large-scale use of data to drive food production to levels that far outperform the two previous agricultural revolutions: mechanisation and genetic modification, respectively. Data produced by customised farming software products, weather stations, satellite imagery, leaf and soil samples, as well as irrigation systems, is measured and analysed to optimise yield, efficiency and profitability.

Data Science in Agriculture

To date, Matogen Applied Insights has collaborated on a variety of data science projects within the agriculture sector. These range from building customised systems, data wrangling and analysis, to the construction of predictive models and computer vision. Matogen Applied Insights comprises a team of highly skilled individuals from diverse backgrounds. It leverages international best practice from the fields of statistics and machine learning in synergy with clients’ domain expertise to examine trends and impart data-driven insights. This empowers clients to make better decisions towards more efficient farming practices, saving time and money, as well as limiting wastage.

Undertakings within the farming industry include systems such as dashboards, bots and electronic alerts for weather risk and crop protection spraying conditions, as well as plant disease image recognition. In addition, the analysis of agricultural data allowed for the identification of trends with regard to growth phases, nutrient deficiencies and irrigation. However, it is within the realm of predictive analytics that Matogen Applied Insights’ unique approach and cooperative capabilities position it favourably to contribute towards the Third Agricultural Revolution. Completed forecasting projects range from the prediction of crop disease risk and plant nutrient deficiency to optimal harvest date and irrigation practices.

Remarkably, Matogen Applied Insights has already produced cutting-edge revelations that could potentially transform data-driven decision-making within farming as the Third Agricultural Revolution progresses.

Agri 3.0 – Plant Disease Image Recognition

This article showcases Matogen Applied Insights’ analytics capabilities within the agricultural sector, examining how deep learning for plant disease image recognition can be used to identify crop disease.

Plant pathology

Even in the modern era, plant disease still poses significant risks to crops, despite the widespread use of crop protection products. In fact, misdiagnosing plant disease can result in the misuse of chemicals, leading to the emergence of resistant pathogen strains. This, in turn, results in increased input costs and more outbreaks, impacting profitability, as well as the environment. Currently, disease diagnosis is conducted by humans, which is both labour- and cost-intensive. Thanks to advances in computer vision technology, there is an opportunity to employ deep-learning based modelling to enable automated plant disease image recognition in order to increase efficiency in this process.

University of Stellenbosch Plant Disease Clinic

The Department of Plant Pathology at the University of Stellenbosch utilizes the latest technology as part of its holistic and interdisciplinary approach towards limiting the impact of plant diseases. In its research and training, conventional and molecular techniques are combined to control plant pathogens and increase plant resistance in a sustainable and economic manner for the benefit of both local and export markets. The department features research programmes in grape, deciduous fruit, citrus, vegetable and cereal crop disease.

The department also contains the Plant Disease Clinic, a service laboratory where specialists in the field of plant pathology diagnose problems on received samples. Diagnosis is conducted on bacterial, fungal and viral disease. The clinic also offers an insect identification service in cooperation with the university’s entomology department. Digital images, either via a conventional camera or smartphone, or through a microscope lens, have become a useful diagnostic aid, enabling verification by experts worldwide.

Plant disease image recognition

Building on this methodology, Matogen Applied Insights has been providing ongoing technical support to the Plant Disease Clinic by engineering a comprehensive platform and workflow system. The platform includes a database incorporating migrated historic institutional data, reports and images. Preventing loss of historic information is crucial, especially in the case of rare plant disease.

The plant disease data warehouse contained within this platform enables detailed trend analysis and cutting-edge modelling. A machine learning technique, the Convolutional Neural Network (CNN), was applied to the vast repository of plant disease images in order to construct a model that would identify a specific plant disease on a previously unseen image (in theory, sent from the field on a smartphone, for example) with high levels of accuracy.

The model is able to classify a given image into “diseased” or “healthy” categories with great accuracy. Subsequently, distinctions are made between many different types of diseases. Rare cases and novel symptoms are also accommodated and variations in depth perception (angle, light, shade, leaf age) are addressed. In addition, the platform allows for the incorporation of expert knowledge for identifying, annotating, quantifying and guiding the computer vision search for relevant features.

Computer vision use case

Efficiency increases gained by the type of modelling outlined above pose unparalleled opportunities to save time and money in combating plant disease, ultimately improving food security.

Agri 3.0 – Irrigation Model

Creating an irrigation model was one of Matogen Applied Insights’ earliest projects within the agricultural sector. It examined how data-driven farm management can optimise irrigation practices.

Crop irrigation optimisation

Due to population growth, industrialisation and contamination, there is increasing pressure on the amount of water available for irrigating agricultural crops. It has become critical to optimise crop irrigation to provide the most beneficial amount of water at the correct time, in order to maximise crop yields while limiting wastage. Too little water puts crops in excessive drought stress, whereas too much water results in waste and could potentially flood the nutrients away from the roots, hampering effective absorption, and therefore, yield.

Crop irrigation scheduling has traditionally been based on theoretical crop coefficient values, which are primarily based on annual seasonal changes. These values are used as a reference point when determining total water, crop and season allocations. However, these coefficients still have a coarse, monthly resolution at best and true water needs of crops deviate strongly, especially given increasingly erratic weather conditions.

Evaporation, Transpiration and Evapotranspiration

Evaporation is the process whereby liquid water is converted to water vapour, “vaporisation”, and removed from the evaporating surface, “vapour removal”. Water evaporates from a variety of surfaces, such as soil and wet vegetation, as well as lakes, rivers and pavements. The evaporation process is influenced by climatological parameters such as solar radiation, air temperature, air humidity and wind speed.

Transpiration denotes the vaporisation of liquid water contained in plant tissues and the vapour removal to the atmosphere. Crops predominantly lose their water through stomata, small openings on the plant leaf through which gases and water vapour pass. Transpiration, like direct evaporation, is affected by weather conditions.

Evaporation and transpiration occur simultaneously and the processes are difficult to distinguish. Evaporation from cropped soil is mainly determined by the fraction of solar radiation reaching the soil surface, which decreases as crops develop and more canopy increases shade. Initially, water loss is largely due to evaporation, but at later crop growth stages, transpiration becomes the main process.

Evapotranspiration is the combined name for the processes of evaporation and transpiration. It is abbreviated as ETc and is used to express crop water usage. It varies from region to region depending on crop type, stage of growth, soil, and climate conditions. Different evapotranspiration rates may be observed even in different parts of the same region. Considering global warming and climate change in recent years, it is clear that the predicted ET values of a year cannot be used safely for upcoming years.

Crop irrigation model

An international client in the agriculture sector contracted Matogen Applied Insights to analyse data derived from telemetry readings from on-site probes dedicated to measuring soil moisture at various depths. This data was combined with climate data, specifically temperature, to enhance crop water requirement predictions.

Exploring the data revealed a significant deviation (30% to 50%) between expected and actual crop water consumption. An algorithm was created to derive a more refined crop coefficient curve from probe readings. In addition, these new crop coefficients were linked to heat readings from annual temperature data to produce more nuanced predictions that accommodate temperature fluctuations within seasons, instead of only using the broader time definition, “season”. A feedback loop was created to update and improve the crop coefficient in an iterative fashion, continually improving model performance.
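The idea of deriving a crop coefficient curve from probe readings can be illustrated with a minimal Python sketch. It assumes the standard relationship ETc = Kc × ET0, where ET0 is a reference evapotranspiration value. The function name, the daily figures and the smoothing window are hypothetical and are not taken from the client project.

```python
def derive_kc_curve(water_use_mm, et0_mm, window=3):
    """Estimate a smoothed crop coefficient (Kc) curve.

    water_use_mm: daily crop water use inferred from soil probes (mm)
    et0_mm:       daily reference evapotranspiration (mm)
    window:       trailing moving-average window in days (illustrative)
    """
    if len(water_use_mm) != len(et0_mm):
        raise ValueError("inputs must have the same length")

    # Kc = ETc / ET0; days with zero reference ET are skipped
    raw = [u / e if e > 0 else None for u, e in zip(water_use_mm, et0_mm)]

    smoothed = []
    for i in range(len(raw)):
        start = max(0, i - window + 1)
        values = [v for v in raw[start:i + 1] if v is not None]
        smoothed.append(sum(values) / len(values) if values else None)
    return smoothed


# Hypothetical readings for five consecutive days
probe_use = [2.0, 2.4, 3.0, 3.6, 4.0]
reference_et0 = [4.0, 4.0, 4.0, 4.0, 4.0]
kc_curve = derive_kc_curve(probe_use, reference_et0)
```

In a feedback loop of the kind described above, a curve like this would be recomputed as new probe readings arrive, so that the coefficients track the crop rather than a fixed seasonal table.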

Agricultural software

Matogen Applied Insights’ predictive irrigation model was incorporated into a multifunctional agricultural software product developed by its sister company, Matogen Corporate Software Development. It delivers an irrigation scheduling tool that can be used with a wide range of telemetry devices and along with the other product components, offers actionable information on a variety of agricultural aspects, including weather and spray condition forecasts and soil classification — all of which are critical for efficient and sustainable farm management.

Agri 3.0 – Harvest Date Prediction

Modelling agricultural data for harvest date prediction is one of many projects showcasing Matogen Applied Insights’ analytics capabilities within the agricultural sector.

Smart farming

“Precision farming”, also known as “smart farming”, is an integral part of the Third Agricultural Revolution and pertains to the use of digital technologies as a crop management tool to monitor and optimise agricultural production. Vast amounts of farm data are collected via a plethora of devices, for example sensors, satellites and probes, as well as data from leaf, soil and fruit sampling. The data is combined, analysed and subjected to machine learning methodologies in order to derive insights that aim to guide decision-making — the ultimate objective being to optimise food production efficiency through increased yields at lower costs.

Vegetation indices

Matogen Applied Insights concluded a project for a major client in the South African agriculture sector which entailed using several sources of data to predict harvest dates where fruit would be at its “peak vegetative stage”. Significant inputs into this process were readings for Normalised Difference Vegetation Index (NDVI), as well as Enhanced Vegetation Index (EVI) data.

The Normalised Difference Vegetation Index (NDVI) is calculated based on data obtained from satellite imagery. It measures the relationship between the amount of near-infrared light reflected by plants and the amount of reflected visible light. Healthy plants reflect more near-infrared light and absorb more red light in comparison to other wavelengths.

The Enhanced Vegetation Index (EVI) is also derived from satellite image calculations based on how plants reflect light, as well as aerosol resistance. As such, EVI conveys more information about canopy structure, whereas NDVI indicates the presence of chlorophyll. These NDVI and EVI indices are complementary in assessing vegetation based on image data from space.
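Both indices are simple band ratios, which a short Python sketch can make concrete. NDVI is (NIR - Red) / (NIR + Red). The EVI coefficients used here are the commonly cited MODIS values, which is an assumption, as the project's satellite source is not stated. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index from reflectances (0 to 1)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index.

    c1 and c2 weight the red and blue bands for aerosol resistance;
    `soil` is the canopy background adjustment.
    """
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + soil)


# Hypothetical reflectances for one pixel of dense, healthy canopy
nir, red, blue = 0.50, 0.08, 0.04
pixel_ndvi = ndvi(nir, red)
pixel_evi = evi(nir, red, blue)
```

Healthy vegetation pushes both values up, because near-infrared reflectance is high while red reflectance is low.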

Vegetation Indices are an important indicator of “peak vegetative stage” and therefore very useful in the prediction of optimal dates for harvesting to maximise crop yield.

Data wrangling, feature engineering and modelling

The client provided Matogen Applied Insights with data detailing both the NDVI and EVI for more than a dozen grape cultivars. In addition, almost a decade’s worth of growth stage data was supplied for a selection of farm blocks dissimilar in age and in different locations. Specifically, the time series data for the growth stages for the “bud”, “bloom”, “veraison” (when grapes turn from green to yellow or red) and “harvest”, as well as the “initial planting date”, was included.

The different types of data were combined and new variables created in order to produce time series plots to identify trends pertaining to the crop growth stages and examine differences in respect to farm block age and location.

After exploratory data analysis, correlations between the variables were examined to determine which relationships were most significant with regard to the outcome variable “days_until_harvest”. Subsequently, a variety of techniques were applied, including Multiple Linear Regression, XGBoost Regressor and Random Forest Regressor. Multiple Linear Regression is an extension of Simple Linear Regression and is a technique used to predict the outcome variable, using two or more other variables. XGBoost Regression and Random Forest Regression have the same objective but are built on ensembles of decision trees.

This process produced a model that achieved an R² score of 0.981 (98.1%) on test set data when predicting optimal harvesting dates for maximum crop yield. In other words, the model explained roughly 98% of the variance in “days_until_harvest” on data withheld from training, for which the actual harvest dates were known.
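For readers unfamiliar with the metric, R² is the proportion of variance in the outcome that the predictions explain, rather than a classification accuracy. The following Python sketch shows the calculation on hypothetical values. The numbers are illustrative and are not the project's data.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Hypothetical "days_until_harvest" values for five test observations
actual_days = [10, 20, 30, 40, 50]
predicted_days = [12, 18, 31, 39, 52]
score = r2_score(actual_days, predicted_days)  # 1 - 14 / 1000 = 0.986
```

A score of 1.0 means the predictions match the known values exactly, while a score of 0 means the model does no better than always predicting the mean.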

Harvest date prediction use case

The insights gained from applying machine learning techniques to a combination of disparate types of agricultural data enable improved planning and management of the planting and harvesting of crops. Better harvest scheduling would result in lower crop losses, as well as optimise the application of crop protection products, i.e. at the most impactful phase, whilst limiting residues at the point of export or retail. As more data is obtained, algorithms can be fine-tuned and improved to such an extent that future labour costs and time can be decreased, ultimately increasing efficiency in farming.

Agri 3.0 – Weather and Plant Disease Risk Alert System

In the spirit of Agri 3.0, Matogen Applied Insights created a sophisticated weather and plant disease risk alert system, using data and technology to mitigate the potentially dire repercussions of crop failure.

Weather-related crop disease

Throughout the centuries, crop diseases have often obliterated entire harvests. A famous example is the Irish Potato Famine of the 1840s, when Potato Late Blight led to mass starvation and emigration. The disease in question, Potato Blight (Phytophthora infestans), is a fungus-like (oomycete) infection that is caused by wind-borne spores and develops on susceptible plants when weather conditions are favourable, i.e. warm and humid.

To this day, crop losses due to Potato Blight amount to billions of US dollars worldwide, despite the scientific advances in chemical crop protection. This could be attributed to ever-increasing temperature and humidity levels effectively expanding the areas at risk, as well as the existence of a wider variety of strains of fungal infections, most likely due to mutation.

Smith and Hutton Periods

Agricultural scientists have formally defined the weather conditions conducive to Potato Blight. Initially, a “Smith Period” referred to a 48-hour period during which the minimum temperature is 10°C or higher and the relative humidity exceeds 90% for at least 11 hours during the first 24-hour period, and then again for at least 11 hours during the final 24-hour period. A more recent, and simpler, definition is the “Hutton Period”, which denotes two consecutive days, each with a minimum air temperature of at least 10°C and relative humidity of more than 90% for at least 6 hours.
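These definitions translate directly into a rule over hourly weather readings. The Python sketch below is a minimal illustration that follows the wording above. The function name, the data layout (a list of hourly temperature and humidity pairs per day) and the sample readings are hypothetical.

```python
def is_blight_period(day1, day2, min_temp_c=10.0, rh_threshold=90.0, min_hours=6):
    """Check two consecutive days against the Hutton criteria.

    Each day is a list of (temperature_c, relative_humidity_pct) hourly
    readings. Passing min_hours=11 gives the older Smith criteria.
    """
    def day_qualifies(hourly):
        coldest = min(temp for temp, _ in hourly)
        # "more than 90%" is read as a strict comparison
        humid_hours = sum(1 for _, rh in hourly if rh > rh_threshold)
        return coldest >= min_temp_c and humid_hours >= min_hours

    return day_qualifies(day1) and day_qualifies(day2)


# Hypothetical hourly readings
humid_day = [(12.0, 95.0)] * 8 + [(15.0, 80.0)] * 16
dry_day = [(15.0, 80.0)] * 24

hutton_met = is_blight_period(humid_day, humid_day)               # True
hutton_not_met = is_blight_period(humid_day, dry_day)             # False
smith_met = is_blight_period(humid_day, humid_day, min_hours=11)  # False
```

The same pair of humid days satisfies the Hutton criteria but not the Smith criteria, which shows how the choice of definition changes when a risk period is flagged.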

Plant disease risk prediction

Increasingly, using agricultural data to predict plant disease has become an essential tool in crop management. This type of forecasting is necessary to justify each instance of fungicide application and optimise the timing, targeting, and dosage. Several factors influence the risk for plant disease, including the crop flowering stage, petal or airborne spore inoculum levels and weather conditions. A Risk Score is calculated based on these variables.

Weather risk alert system

Matogen Applied Insights was contracted to automate and improve the client’s system. Our automated system applies cloud-based algorithms to weather data to calculate indices for disease risk for several diseases. When specified risk thresholds are reached, email and text messages are sent out automatically to users to alert them of heightened disease risk. The system also provides a high-resolution risk map and various relevant weather metrics. Crucially, the system also includes recommendations relating to risk mitigation, for example, which protection products would be required and when spraying conditions would be ideal.

In addition, Matogen Applied Insights provides service support to ensure that the risk alert system functions optimally. The risk alert data generated by the system is used to analyse trends in crop risks so that the models can be improved. User behaviour is also tracked for product improvement and business intelligence purposes. The tailor-made product can easily be adapted to include a greater variety of crops and diseases.

Agri 3.0 – Plant Nutrient Deficiency Prediction

In one of Matogen Applied Insights’ flagship agricultural analytics projects, the data science team examined how modelling leaf analysis data can enable crop nutrient deficiency prediction.

Plant leaf analysis

Plant leaf analysis is the only truly accurate means to determine whether plants are suffering from nutrient deficiencies and what the consequent nutritional requirements are. It is a critical component of the farm management process to inform decision-making surrounding the dosage and frequency of fertiliser application, in order to optimise efficiency and environmental sustainability.

The leaf analysis process is labour- and time-intensive, mainly due to the physical leaf sampling process as well as transportation to laboratories. Once the leaf sample arrives at the laboratory, a variety of tests are conducted to obtain as much information as possible from the sample. A wide range of nutrients can be measured including macronutrients such as nitrogen, potassium and phosphate, major nutrients such as calcium, sulphur and magnesium, as well as heavy metals such as lead, nickel, arsenic, cadmium, chromium, mercury, copper and zinc.

Leaf sample analysis and modelling

A client, a major player in the South African agricultural sector, approached Matogen Applied Insights to conduct a pilot study to analyse a vast amount of leaf analysis data to identify trends and examine the feasibility of “virtual sampling”. The primary objective of analysing the data was to determine in which areas sufficient numbers of samples were required to conduct the next phase of the exercise, “virtual sampling”.

For the initial stage of the project the focus was on soybeans and maize, as well as winter wheat crops in KwaZulu-Natal.

Data analysis and modelling

Analysing the data revealed that the number of samples taken varied greatly across provinces and growth stages. Matogen Applied Insights developed a simple model based on the frequency of the nutrient levels of interest and the number of samples for specific areas to produce a matrix highlighting sampling priority per crop, location and growth phase.

In areas where sufficient numbers of samples were taken, leaf sample data was combined with NDVI and weather station data to examine correlations. New variables were created to measure specific relationships. Statistical modelling techniques were applied to predict likelihood of nutrient deficiencies at specific locations. The most effective method produced a model for plant nutrient deficiency prediction with 95% precision on test data.
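Precision measures how many of the cases flagged as deficient were truly deficient. A minimal Python sketch of the calculation follows. The labels are hypothetical and do not come from the project data.

```python
def precision(actual, predicted):
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_pos = sum(1 for a, p in zip(actual, predicted) if not a and p)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions to evaluate")
    return true_pos / (true_pos + false_pos)


# Hypothetical labels: 1 = nutrient deficiency, 0 = no deficiency
actual_labels = [1, 1, 1, 0, 0, 1, 0, 1]
predicted_labels = [1, 1, 0, 0, 1, 1, 0, 1]
deficiency_precision = precision(actual_labels, predicted_labels)  # 4 / 5
```

High precision matters in this setting because each false alarm could trigger an unnecessary fertiliser application or physical sample.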

Virtual sampling

Virtual sampling involves making inferences about a sample using historical leaf samples and other data. The model produced by Matogen Applied Insights sets the stage to explore the extent to which virtual sampling can be used to decrease the number of physical plant leaf samples required. As this aspect of leaf analysis is the most costly, as mentioned above, reducing the number of samples required without decreasing information derived would lead to significant cost and efficiency gains in this crucial aspect of agriculture management. Prioritising physical leaf sample-taking at the locations and during the growth stages highlighted in the data analysis component of the project would enable the implementation of “virtual sampling”.

Big Data in Healthcare

What is Big Data in healthcare?

Big data is commonly and academically defined to have three significant characteristics: volume, variety and velocity. One of the key industries where generated data adheres to these definitions to a large extent is the healthcare industry. Tellingly, at the onset of the COVID-19 pandemic, big data originating from the health sector was instrumental in developing models used as critical inputs for governments worldwide to devise their response strategy.

Even under non-pandemic conditions, the medical industry produces a myriad of health-related data, including electronic health records, scanned imagery, sensor data and financial information. Analysing these types of data allows for a plethora of enhanced services such as disease prediction and prevention, optimised treatment plans, the development of new drugs, telemedicine (remote diagnosis of disease), real-time alerting, insurance fraud detection, as well as human resource and supply chain management.

Health data science

However, despite the increasing availability of massive health data sets, even on national and international levels, the complexity of Big Data analysis arises from combining these different information types. Traditional health sector analytics tools have proven incapable of collectively analysing data from electronic healthcare records, pharmaceutical data, clinical trials, patient summaries, genomic data, telemedicine, mobile applications and sensor data, as well as from social media and information on well-being, behaviour and socio-economic indicators.

Health data science is an emerging discipline which combines mathematics, statistics, epidemiology and informatics in order to mitigate these challenges and effectively harness the opportunities presented by “Big Data” in healthcare. The Matogen Applied Insights data science team has successfully completed a variety of projects within the healthcare sector, ranging from oncology to orthopaedics.

Non-communicable and communicable disease

Advances in Big Data analytics are providing researchers with powerful new ways to extract value from diverse sources of data in the fight against cancer in particular, one of the most impactful non-communicable diseases worldwide, especially as the world population grows and ages. Matogen Applied Insights completed a project for a US client to model, monitor and visualise cancer data in order to recommend optimal treatment regimens. Similarly, the team also modelled orthopaedic data to predict adverse events related to surgical procedures for patients suffering from dismineral disease. Mental health disorders constitute another related major non-communicable category for which Matogen Applied Insights is collaborating with a team of neuroscientists as part of an ongoing project. On the infectious disease front, Matogen Applied Insights has participated in advanced modelling of the COVID-19 pandemic for a major international mining company, as well as building a pipeline to wrangle and visualise tuberculosis patient monitoring data for a US client using data from a large national database consisting of millions of records.

COVID Modelling Parameter Automation

Matogen Applied Insights assisted with automating parameters for an extensive COVID modelling endeavour.

The pandemic’s impact on mining

The commodities market is no stranger to external shocks and, early on in the pandemic, analysts were already scrambling to quantify how COVID-19 would affect mining operations globally. A data science company specialising in healthcare modelling was contracted by an international mining conglomerate to construct a compartment model to track the COVID-19 epidemic’s impact on their operations worldwide.

COVID modelling parameter automation

The compartment model’s parameters were fitted by choosing reasonable values and running an extensive grid search around these values on a cluster. However, given that the model could accommodate up to 60 parameters, the computationally costly process proved problematic to repeat as new data became available. Matogen Applied Insights was approached to help automate and simplify the fitting of the model. Parameter fitting was made less complex by implementing gradient descent with multiple live points; after a few rounds of descent, a Markov Chain Monte Carlo (MCMC) sampler was run on each live point. The best results were used as summary statistics for the parameter space. The algorithms were implemented in an Approximate Bayesian Computation framework and parallelised in an R package to run on an Azure server. This removed the need to manually fit the model and run an expensive grid search, saving many person-hours and giving better results.
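The general shape of this approach can be sketched in Python on a toy problem. This is an illustration under stated assumptions, not the project's R implementation: the compartment model is a basic one-parameter SIR model, the "observed" data is synthetic, a simple sign-based finite-difference descent stands in for the gradient descent step, and all function names, starting points and tolerances are hypothetical.

```python
import random


def simulate_sir(beta, gamma=0.1, population=1_000_000, infected0=10, days=15):
    """Deterministic discrete-time SIR model; returns daily infected counts."""
    s, i = float(population - infected0), float(infected0)
    infected = []
    for _ in range(days):
        new_infections = beta * s * i / population
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        infected.append(i)
    return infected


def distance(beta, observed):
    """Root-mean-square gap between simulated and observed curves."""
    simulated = simulate_sir(beta)
    total = sum((a - b) ** 2 for a, b in zip(simulated, observed))
    return (total / len(observed)) ** 0.5


def descend(beta, observed, rounds=40, step=0.05, eps=1e-4):
    """Sign-based finite-difference descent; halves the step on overshoot."""
    best = distance(beta, observed)
    for _ in range(rounds):
        slope = distance(beta + eps, observed) - distance(beta - eps, observed)
        candidate = beta - step if slope > 0 else beta + step
        candidate = min(max(candidate, 0.01), 0.99)
        candidate_distance = distance(candidate, observed)
        if candidate_distance < best:
            beta, best = candidate, candidate_distance
        else:
            step /= 2
    return beta


def abc_mcmc(start, observed, rng, steps=500, step_size=0.01, tolerance=10.0):
    """ABC-MCMC with a flat prior on (0, 1) and a symmetric proposal."""
    current, chain = start, []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        # With a flat prior and symmetric proposal, a move is accepted
        # whenever the simulated curve falls within the tolerance.
        if 0.0 < proposal < 1.0 and distance(proposal, observed) <= tolerance:
            current = proposal
        chain.append(current)
    return chain


# Synthetic "observed" data generated with a known transmission rate
true_beta = 0.3
observed = simulate_sir(true_beta)

rng = random.Random(42)
live_points = [0.1, 0.5, 0.9]  # hypothetical starting guesses
refined = [descend(b, observed) for b in live_points]
samples = [b for start in refined for b in abc_mcmc(start, observed, rng)]
posterior_mean = sum(samples) / len(samples)
```

The descent step moves each live point into a plausible region cheaply, so the sampler does not waste simulations far from the data. Each live point is independent, which is what makes the procedure straightforward to parallelise.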

Client’s feedback

“Matogen Applied Insights’ interdisciplinary team has a lot of experience applying advanced statistics and machine learning for predictive modelling, but they also recognise the power of human intelligence so they are about the synergy of human and machine intelligence. I have been working very closely with the team over the past six months and they are truly an amazing group of people. Not only do they produce top quality results in a very time efficient way, they also operate from the core values of integrity, respect, sustainability and social impact.”

Advanced Analytics for Orthopaedic Procedures

Matogen Applied Insights assisted a US client in the health sector to perform advanced analytics on data pertaining to dismineral disease, orthopaedic procedures and adverse events.

The impact of musculoskeletal disorders

In industrialised countries, musculoskeletal disorders pose a significant health and economic burden and account for more than half of all chronic conditions in people aged 50 and older. In the United States, almost 6% of GDP is spent annually on healthcare for patients with musculoskeletal diagnoses.

In addition, “adverse events” are considered common after orthopaedic procedures. “Adverse events” are defined as any suboptimal outcome experienced by a patient following medical treatment, such as infection, unfavourable reaction to medication, injury and even death.

However, even in highly industrialised countries, advanced analytics of adverse events surrounding orthopaedic procedures are in their infancy. Despite the increasing availability of national healthcare databases, their complexity cannot be handled effectively by routine analytical methods.

Advanced analytics for orthopaedic procedures

Given its advanced analytics capabilities within the health sector, Matogen Applied Insights was able to thoroughly investigate the outcomes and risk factors of patients with dismineral diseases undergoing orthopaedic procedures such as hip and knee arthroplasty, as well as spinal fusion.

The Matogen Applied Insights data scientist examined millions of data records ranging from 2008 to 2014 from a national database. ICD-9 codes were used to filter the data and extract records related to the relevant diseases and orthopaedic procedures.

SAS software was used to wrangle the colossal dataset into a chart tabulating unique combinations of variable values and their frequency in the dataset. Continuous variables such as length-of-stay and total cost were aggregated. Many cross-tabulations were generated from the frequency table to glean insights and trends from the data.
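The frequency-table step can be illustrated with a small Python sketch, used here as a stand-in for the SAS workflow. The records, field names and values are hypothetical and are not drawn from the national database.

```python
from collections import Counter, defaultdict

# Hypothetical patient records; fields and values are illustrative only
records = [
    {"age_group": "50-64", "procedure": "hip arthroplasty", "length_of_stay": 3},
    {"age_group": "65+", "procedure": "hip arthroplasty", "length_of_stay": 5},
    {"age_group": "65+", "procedure": "hip arthroplasty", "length_of_stay": 7},
    {"age_group": "65+", "procedure": "spinal fusion", "length_of_stay": 4},
    {"age_group": "50-64", "procedure": "knee arthroplasty", "length_of_stay": 2},
]


def frequency_table(rows, keys):
    """Count how often each unique combination of `keys` occurs."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def mean_by_group(rows, keys, value):
    """Aggregate a continuous variable as a mean per combination of `keys`."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group][0] += row[value]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


group_keys = ("age_group", "procedure")
counts = frequency_table(records, group_keys)
mean_stay = mean_by_group(records, group_keys, "length_of_stay")
```

Collapsing millions of rows into unique combinations with counts and aggregates is what makes later cross-tabulations cheap, since they can be computed from the compact table rather than the raw records.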

The outcome variable of most interest in this project was the incidence of mortality. Among other results, Matogen Applied Insights was able to pinpoint the precise combinations of demographics, mineral deficiency, and operative procedures that most highly correlated with post-procedural mortality.

Medical Edutainment Platform

Matogen Applied Insights created an online medical edutainment platform for a client in a neighbouring country.

Healthcare in Botswana

Botswana is an upper-middle-income country that has taken great strides towards improving access to healthcare. It offers universal healthcare to its citizens through its public healthcare system, free for children under five years of age. In keeping with objectives to improve child health in Africa, Matogen Applied Insights collaborated in a programme aimed at primary and pre-primary school children.

Children’s experience of medical treatment

Medical procedures can often be daunting, if not outright traumatic, for children. The extracurricular medical edutainment programme is designed to make children more comfortable with doctors, nurses and dentists. Through classes and holiday programmes, children are exposed to various medical topics, including personal safety, hygiene, nutrition, anatomy and doctors’ tools. In addition to alleviating any anxiety associated with seeking medical treatment, the programme ultimately aims to entice children to pursue careers within the world of medicine.

Junior Medics online platform

Owing to the social distancing effects of COVID-19 and the barriers it brought to the Junior Medics team, Matogen Applied Insights helped create an online learning platform that allows children to have the whole Junior Medics experience and more, despite the pandemic. Matogen Applied Insights’ software developers were able to provide a customised environment with programme-specific content and appropriate accessibility. The platform enables Junior Medics teachers, usually junior doctors, to upload any content in the form of PDFs, Word documents, videos, images, etc. In addition, facilitators are able to set up worksheets in the form of an online quiz, which can be monitored by time slot, enabling them to observe how students engage with the different content.

Thanks to the medical edutainment platform created by Matogen Applied Insights, the Junior Medics team can be one step ahead in improving child health and training future doctors!

Behavioural Data Science

We find neuropsychology fascinating at Matogen Applied Insights, so have made a short study of it here. With Black Friday ad campaigns circulating, it seemed appropriate to relate it to consumer behaviour. Behavioural Data Science, in a nutshell, is using computers to understand humans – and influence their behaviour.

For decades, “behavioural analytics” has been the term used to describe the process of using theory from non-financial fields such as psychology, neuroscience, economics, sociology and anthropology to understand consumer purchasing behaviour. Within the South African market research landscape, the “Conversion model”, a seminal psychological model to measure the strength of the relationship between consumers and brands, was developed from theory relating to religious conversion. For many years, it was used with great success to predict churn and quantify customer loyalty.

Social science theory: ‘The why behind the panic buy’

Recently, many retailers (and consumers!) were caught unawares by the waves of “panic buying” at the onset of the COVID-19 pandemic. Traditional time-series purchasing data was of little use in predicting this highly disruptive phenomenon, whereas theories from the realms of economics and psychology could have proven useful in predicting and mitigating these occurrences.

For example, Nobel prize-winning economist Daniel Kahneman’s concept of “loss aversion” states that humans experience financial loss more profoundly than an equivalent financial gain. In the context of panic buying, people were alarmed at the anticipated pain of not having access to products they normally would. This aversion caused people to buy anything they set their eyes on for fear of not being able to do so later. Another Economics Nobel prize winner, Richard Thaler, wrote extensively about behaviour-nudging heuristics such as “copycat behaviour”. During the early stages of the pandemic, consumers who observed others buying large amounts of a particular item were inclined to jump on the same bandwagon. No doubt this phenomenon was exacerbated and entrenched by the constant stream of social media and news reports showcasing this herd-like behaviour.

In addition, psychologists have contributed theories regarding the human need for control during periods of stress. As such, stockpiling could be considered an amplified form of retail therapy in an attempt to assert control. Social scientists have even highlighted the soothing effects of buying utilitarian products specifically, i.e. items that “give you something to do”, which coincidentally tied in very well with actual types of products (food and cleaning products) that were being stockpiled!

“Game theory” is another field recognised with the Nobel prize in economics: John Nash, whose story famously entered the public lexicon through the Oscar-winning film “A Beautiful Mind”, shared the prize for his work on equilibrium in games. It offers rational (and self-preserving!) reasons that could explain panic buying: given two competing strategies, act normally or panic buy, normal purchasing behaviour by everyone results in a state of equilibrium. However, once panic buying has already started, the optimal strategy is to do the same.
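The two-strategy reasoning above can be sketched as a small payoff table. The example below is a minimal illustration only: the payoff numbers and the names STRATEGIES, PAYOFFS and pure_nash_equilibria are assumptions made up for this sketch, not figures from any study. It checks each pair of strategies for whether either shopper could do better by changing strategy alone.

```python
from itertools import product

STRATEGIES = ("normal", "panic")

# Hypothetical payoffs (first shopper, second shopper); higher is better.
PAYOFFS = {
    ("normal", "normal"): (3, 3),  # shelves stay stocked for everyone
    ("normal", "panic"): (0, 2),   # the normal shopper finds empty shelves
    ("panic", "normal"): (2, 0),
    ("panic", "panic"): (1, 1),    # everyone overspends and queues
}


def pure_nash_equilibria(payoffs, strategies):
    """Return strategy pairs where neither shopper gains by deviating alone."""
    equilibria = []
    for first, second in product(strategies, repeat=2):
        first_payoff, second_payoff = payoffs[(first, second)]
        # Could the first shopper do better by switching, holding the second fixed?
        first_is_best = all(
            payoffs[(alt, second)][0] <= first_payoff for alt in strategies
        )
        # Could the second shopper do better by switching, holding the first fixed?
        second_is_best = all(
            payoffs[(first, alt)][1] <= second_payoff for alt in strategies
        )
        if first_is_best and second_is_best:
            equilibria.append((first, second))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))
```

Under these assumed payoffs, both “everyone shops normally” and “everyone panic buys” are stable outcomes, which mirrors the argument that panic buying becomes the individually sensible response once others have started.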

Enter algorithms and big data

With the emergence of AI technologies, the insights provided by behavioural theory can now be leveraged further using a range of methodological tools from statistics, computer science and engineering. As described by the Alan Turing Institute, “Human behaviour is a major source of data in the current digital economy. At the same time, it is also one of the main ‘objects’ of data science in the sense that many data science and artificial intelligence models are aimed at influencing human behaviour (e.g. through ‘nudging’, personalisation, and behavioural segmentation).”

Applying behavioural science to increasingly large datasets, human behaviour can now, better than ever, be understood and predicted across a variety of applications ranging from health, telecommunications, finance and development. Behavioural data science can help cities, businesses, governments and individuals understand why and how human decisions are made and inform how to optimise behaviour to achieve better outcomes.

Tuberculosis Monitoring Dashboard

In one of Matogen Applied Insights’ earliest health sector data science projects, a US-based medical company approached the team to create a tuberculosis monitoring dashboard using clinical trial data, including an automated ETL (Extract, Transform, Load) and wrangling process.

Monitoring tuberculosis patients

Keeping track of tuberculosis patients is critical as there are various exogenous and endogenous risk factors that influence the likelihood of developing active disease following exposure to tuberculosis bacilli. In addition, traditional monitoring approaches to track HIV or TNF (Tumour Necrosis Factor), for example, rely primarily on costly and time-intensive on-site visits and source data verification. In order to mitigate these difficulties, regulatory agencies encourage adopting a risk-based monitoring (RBM) approach that identifies and tracks critical data and procedures regarding the overall impact on trial integrity and subject safety.

Data wrangling and visualisation

As the first step in the ETL process, the Matogen Applied Insights data scientist coded a SAS script to extract the source information from the clinical database and then transformed the aggregate patient attributes into a usable format. The output data was then connected and loaded into Power BI, creating a dashboard that visualises critical data elements for all the participant sites involved. It used the R-script functionality within Power BI to deliver detailed visualisations, including risk factor diagrams, individual site performance analysis featuring interactive plots as well as drill-down capabilities.

This tuberculosis monitoring dashboard ensures that critical information reaches the right people in a timeous manner, in a format that is easy to interpret even for the non-statistical end-user. Power BI’s automated “refresh” functionality keeps the dashboard up to date, thereby ensuring that the provision of tuberculosis risk data is consistent, actionable and readily available to all clinical project management team members.

Neuroscience Expert System

What is “resilience”?

The dictionary defines “resilience” as “an ability to recover from, or adjust easily to, misfortune or change.” Based on many years of clinical research, a team of neuroscientists developed an innovative code to deliver insights into human resilience and how it can drive high performance teams within the corporate realm. The client approached Matogen Applied Insights to engineer an integrated data system to process detailed information about subjects’ behaviours, attitudes, emotional and cognitive states. This data was previously captured and analysed in a disconnected and manual fashion.  

Analysis and recommendations

Matogen Applied Insights analysed historical data and devised a data-driven, expert-validated model utilising the client’s domain expertise, i.e. “the neuroscience expert system”. This model not only measures resilience, but also identifies the most impactful actions towards improvement. The model was used to build a software application that provides users with commendations on areas in which they are already performing well and recommendations for the most impactful improvements.

Model stability and segmentation

In order to ensure that the population on which the model was built matches the observed population, the monitoring phase of the project includes calculating the Population Stability Index (PSI) which measures the discrepancy between the expected and actual populations.
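As a rough illustration of how a PSI can be computed, the sketch below bins a reference sample of scores, compares the share of a newly observed sample in each bin, and sums (actual − expected) × ln(actual / expected) over the bins. The function name, the equal-width binning and the small floor eps used to avoid division by zero are assumptions of this sketch, not details of the project.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a newly observed sample of scores."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range (assumed binning scheme).
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the end bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        total = len(values)
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / total, eps) for c in counts]

    expected_shares = bin_shares(expected)
    actual_shares = bin_shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )


reference = [i / 100 for i in range(100)]
shifted = [min(x + 0.3, 0.99) for x in reference]
print(population_stability_index(reference, reference))
print(population_stability_index(reference, shifted))
```

A PSI of zero means the two distributions match bin for bin. A commonly quoted rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, although suitable thresholds depend on the application.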

In addition, the Matogen Applied Insights data scientist applied collaborative filtering to examine the relationships between the variables, including the Resilience Index. Collaborative filtering is an advanced statistical technique that simultaneously assesses similarities between respondents and between constructs to detect relationships. It is a method often used in recommender systems such as Netflix’s. Matogen Applied Insights has also successfully applied this method to the campaigns of a major listed South African telecoms company. In this project, collaborative filtering identified four clear segments within the data. Finally, a linear regression delivered the outstanding insight that five constructs could explain almost 70% of the variation in the outcome under scrutiny!
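To give a feel for the idea, here is a minimal sketch of respondent-based collaborative filtering. It is not the project’s implementation: the respondent IDs, construct names and scores in RATINGS are invented for illustration, and cosine similarity with a similarity-weighted average is just one simple variant of the technique.

```python
import math

# Hypothetical respondent -> construct scores (1 to 5); "r2" has no optimism score.
RATINGS = {
    "r1": {"sleep": 4, "focus": 5, "optimism": 4},
    "r2": {"sleep": 4, "focus": 5},
    "r3": {"sleep": 1, "focus": 2, "optimism": 1},
}


def cosine_similarity(u, v):
    """Cosine similarity over the constructs both respondents have scored."""
    shared = [k for k in u if k in v]
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in shared))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_score(ratings, respondent, construct):
    """Similarity-weighted average of other respondents' scores on a construct."""
    weighted, total = 0.0, 0.0
    for other, scores in ratings.items():
        if other == respondent or construct not in scores:
            continue
        sim = cosine_similarity(ratings[respondent], scores)
        weighted += sim * scores[construct]
        total += abs(sim)
    # None signals that no other respondent has scored this construct.
    return weighted / total if total else None


print(predict_score(RATINGS, "r2", "optimism"))
```

The estimate for a missing score leans towards the respondents whose known scores look most alike. In practice, scores are often mean-centred first so that similarity reflects patterns rather than overall level.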

Model monitoring

In this ongoing project, the information captured in the neuroscience expert system and analytics engine is updated according to the ever-widening academic knowledge base, which, in addition to on-going monitoring, ensures the accuracy of the predictive resilience model.

Health Industry Business Case Modelling

Matogen Applied Insights was approached by the director of a European medical company to provide a business case by modelling projections on the predicted return on investment for a new online platform aimed at connecting people and healthcare.

The prospective healthcare services site includes listings of medical professionals across a broad spectrum of specialisations, in addition to detailed information on a variety of illnesses, medication and symptoms. The site also features playlists highlighting specific procedures and conditions, medical stories featuring patient experiences, as well as community groups for particular ailments.

Business case modelling

The platform generates revenue through booking appointments with health care professionals, as well as ad clicks and sponsored content. Given the price sensitivity surrounding health care seeking behaviour, the client wanted to explore different revenue scenarios that would impact on its EV (Enterprise Value) and EBITDA (Earnings before Interest, Taxes, Depreciation and Amortisation).

The experienced Matogen Applied Insights data science team conducted extensive research and considered a plethora of variables to construct a model representing high and low revenue scenarios. Matogen Applied Insights was able to illustrate various projections for the next five years. The top-down and bottom-up approaches produced the same result (cash flows, IRR), which empowered the entrepreneur to have informed discussions with potential investors.
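For readers unfamiliar with the metric, the internal rate of return (IRR) is the discount rate at which the net present value (NPV) of a series of cash flows is zero. The sketch below shows one way to compute it by bisection and to compare two scenarios. The cash-flow figures are invented placeholders, not the client’s projections, and the function names are assumptions of this example.

```python
def npv(rate, cashflows):
    """Net present value; cashflows[0] occurs today, cashflows[t] after t years."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """IRR by bisection; assumes NPV changes sign exactly once on [lo, hi]."""
    f_lo, f_hi = npv(lo, cashflows), npv(hi, cashflows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cashflows)
        if f_lo * f_mid <= 0:
            hi = mid  # the root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # the root lies in the upper half
    return (lo + hi) / 2


# Hypothetical five-year scenarios: initial outlay followed by yearly net inflows.
HIGH_REVENUE = [-1000, 200, 400, 600, 800, 1000]
LOW_REVENUE = [-1000, 100, 200, 300, 400, 500]

print(f"High scenario IRR: {irr(HIGH_REVENUE):.1%}")
print(f"Low scenario IRR: {irr(LOW_REVENUE):.1%}")
```

Bisection is used here because it is simple and robust for conventional cash flows (one outlay followed by inflows). Cash flows that change sign more than once can have several IRRs and need more care.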

Oncology Data Modelling

In one of many health sector Big Data projects, Matogen Applied Insights performed oncology data modelling for an international client.

Big Data in healthcare

The benefits and prevalence of Big Data in the healthcare sector are well known, but it has been said that Big Data analytics can be considered a potential game-changer in the fight against cancer in particular. Previously, medical researchers had to rely on small sample sizes, case studies and relatively sparse tumour DNA or genetic analyses.

In contrast, the current availability of enormous public databases containing a mind-boggling range of data on different types of cancers, across a spectrum of demographics, regions and genetic profiles, has rendered traditional analytics tools inadequate. The advanced computer modelling and wrangling provided by data scientists make it possible to construct better predictive models for improved diagnosis and tailor-made treatment.

Oncological data modelling

In this project, oncological data was extracted from a large, publicly available, all-payer, inpatient healthcare database designed to produce regional and national estimates of inpatient utilisation, access, charges, quality, and outcomes for the United States.

The oncological data modelling determined which outcome was most probable for patients with breast cancer (based on their attributes) and what route of intervention to follow based on the result. For example, this aided oncologists in deciding whether to administer neoadjuvant chemotherapy or to perform invasive surgery on the patient before administering chemotherapy.

This study also interrogated the possibility of predicting the downstaging of a tumour and the level of lymphovascular invasion after neoadjuvant chemotherapy.
