Data analytics

Note: This article has been adapted from an Advanced Performance Management article. Whilst the article does not use examples related to sustainability, it does provide key information in relation to types and methods of data analytics and the ethical considerations around data. Data analytics is widely used in sustainability matters, so understanding these approaches is useful, and they can then be applied to scenarios with a sustainability focus.

Types of data analytics

The amount of data available to organisations has increased significantly in recent years. However, for this data to be valuable to organisations, they need to be able to analyse it, to identify patterns and trends in it. Data analytics allows organisations to do this.

This article discusses the types of data analytics organisations could use, and the benefits from using them.

As the notion of ‘big data’ highlights, the amount of data available to businesses (volume; variety) is greater than ever before, and the speed with which it is available (velocity) is faster than ever before. Consequently, business decisions are increasingly data-driven, as organisations can use data from a greater variety of sources, and dive deeper into analysis than they have historically been able to.

Data analytics is the process of collecting and examining data, in order to extract meaningful business insights, which can be used to inform decision-making to improve performance. This can be done through a variety of methods, such as statistical analysis (e.g. regression analysis) and machine learning (1), and organisations are likely to use data analytics software (such as Tableau; Microsoft Power BI; or Qlik Sense) to facilitate their analysis. Although the detailed analysis and modelling of data takes place in the software, accountants need to be able to interpret the information provided by the software; for example, assessing whether patterns or trends identified by the software seem realistic or plausible.

Descriptive, diagnostic, and predictive analytics

Each type of analytics serves a different purpose but can be used in conjunction with the others to help an organisation gain a full understanding of the story its data tells. Descriptive analytics is typically carried out first.

Descriptive analytics

Descriptive analytics uses current and historical data to identify trends and relationships; in effect, to identify what has happened. Descriptive analytics is relatively accessible, and basic statistical software – such as Microsoft Excel – can be used to highlight trends in data and to produce visualisations, such as line graphs, bar charts or pie charts.

Nonetheless, descriptive analytics can be very useful in communicating changes over time, and identifying patterns and trends, which can then be analysed further.
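As a minimal sketch of descriptive analytics, the following Python snippet summarises what has happened in a set of monthly sales figures. The figures and the function name `month_on_month_change` are hypothetical, chosen only for illustration; in practice this kind of summary would more likely be produced in a spreadsheet or analytics package.

```python
# Hypothetical monthly sales figures (illustrative data only).
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 140, "May": 165, "Jun": 180}

def month_on_month_change(sales):
    """Return the percentage change between each consecutive pair of months."""
    months = list(sales)
    changes = {}
    for previous, current in zip(months, months[1:]):
        change = (sales[current] - sales[previous]) / sales[previous] * 100
        changes[current] = round(change, 1)
    return changes

changes = month_on_month_change(monthly_sales)
average_sales = sum(monthly_sales.values()) / len(monthly_sales)

# Describe what happened: the movement each month and the overall average.
for month, change in changes.items():
    print(f"{month}: {change:+.1f}% vs previous month")
print(f"Average monthly sales: {average_sales:.1f}")
```

The output only describes the trend (here, growth in most months with a dip in April); it does not explain why the dip occurred.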

ILLUSTRATIVE EXAMPLE 1 – Demand trends
The streaming provider, Netflix, gathers data on users’ behaviour, for example, what programmes they are watching, or what types of programmes they are watching. This allows Netflix’s team to analyse the data to determine which TV programmes and movies are trending at any given time, and Netflix then shares these trends with users.

Not only does this allow users to see what is popular – and to get suggestions about what they might enjoy watching – it also allows the Netflix team to know what types of programmes, themes, or actors are particularly popular at a given time. This knowledge can then help to shape decision-making about future programmes to commission and can also be used in re-targeting campaigns (recommendations of programmes for viewers to watch).

Once an organisation understands what has happened (descriptive analytics), it will want to understand why. That is where diagnostic analytics comes in.

Diagnostic analytics

Diagnostic analytics is the process of using data to determine the causes of trends. Understanding why a trend is developing, or why a problem occurred, is very important when making decisions. However, there is often more than one contributing factor to any given trend. Diagnostic analytics helps organisations understand the range of factors – internal and external – which affect outcomes, and which have the greatest impact, so that managers can focus on these when developing initiatives to improve performance.

It often involves the use of statistical software tools and can involve a variety of techniques including data drilling and data mining, regression analysis and time series analysis. To investigate the root cause of trends, organisations may need to examine a wider range of data sources than they have historically examined, including non-financial as well as financial, and external as well as internal.

Data drilling: The most common type of data drilling is drilling down. Drilling down into summary information can reveal more detail about the data which is driving trends at the summary level. For example, a consolidated sales report may show that overall sales have increased but drilling down into this could reveal that sales of some products are rising rapidly while others are falling. Similarly, the sales team could drill down to get a more detailed view of sales in individual regions, or across different sales channels. This analysis could then help the sales team to decide where to focus its resources to maximise growth going forwards.
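The idea of drilling down can be sketched in a few lines of Python. The sales records and the helper `totals_by` below are hypothetical and exist only to show how a rising summary total can conceal products moving in opposite directions.

```python
from collections import defaultdict

# Hypothetical sales records: (year, product, region, revenue).
sales = [
    ("2023", "Product A", "North", 100), ("2023", "Product A", "South", 80),
    ("2023", "Product B", "North", 120), ("2023", "Product B", "South", 100),
    ("2024", "Product A", "North", 160), ("2024", "Product A", "South", 130),
    ("2024", "Product B", "North", 90), ("2024", "Product B", "South", 70),
]

def totals_by(records, *key_indexes):
    """Sum revenue, grouped by the chosen fields of each record."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[i] for i in key_indexes)
        totals[key] += record[3]
    return dict(totals)

summary = totals_by(sales, 0)        # summary level: total revenue per year
by_product = totals_by(sales, 0, 1)  # drill down: revenue per year and product

print("Summary:", summary)
for (year, product), revenue in sorted(by_product.items()):
    print(year, product, revenue)
```

In this made-up data, total revenue rises from 400 to 450, but drilling down shows Product A rising from 180 to 290 while Product B falls from 220 to 160. Passing a further index (for example `totals_by(sales, 0, 1, 2)`) would drill down again to regional level.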

ILLUSTRATIVE EXAMPLE 1 – Analysing staff turnover
A company’s human resources information showed that one department was hiring significantly more people than any other department, but there was no net increase in the department’s head count, because its staff turnover rate was also much higher than the other departments. Drilling down into the data revealed that many of the positions were for a specific team, which paid its staff less than the industry average. Having identified this, the company reviewed its pay scales for that team, and took other measures to improve retention in the team.

ILLUSTRATIVE EXAMPLE 2 – Understanding customer demand
HelloFresh’s business model centres around selling and delivering pre-packaged meal boxes, each containing the exact ingredients required for a particular meal, including fresh produce, meat, dairy, and seasonings, on a subscription basis.

The perishable nature of the fresh food in the meal kits presents HelloFresh with a daily supply chain challenge: ordering the right amounts of products so that customers receive the ingredients they want, when they want them, whilst avoiding food waste because of over-ordering.

As with any subscription-based business, customer retention is vital, because customer churn erodes revenue and profits, and replacing lost customers adds the cost of attracting and acquiring new ones. Keeping customers satisfied requires an understanding of their preferences, including what recipes, ingredients, and meals each household favours.

By collecting data from customers (for example, when they favour certain types of food over others, when they want to receive their orders) HelloFresh has developed algorithms which it can use to help forecast demand more effectively. At an individual level, if particular customers habitually eat fish on a Friday, this enables HelloFresh to target tailored seafood offers in the latter part of the week. At a wider level, analysing food trends across different regions can inform the marketing of special offers to customers in those regions.

Having an improved understanding of the customer allows the company to optimise product selections to them, limiting the amounts of products needed to meet customer requirements without generating waste.

If customers want to cancel their subscription, during the cancellation process they are asked to provide their reason for cancelling. By also gathering this data, HelloFresh can analyse the most frequent reasons for losing customers – in different regions, or among different demographic groups, as well as at an overall level. In turn, understanding why people are cancelling their subscriptions can help HelloFresh to improve its product and user experience, to help it retain more customers.

Overall, the insights gained from analysing customer demand and customer feedback have helped HelloFresh increase profit margins, improve order volume forecasts, and increase customer retention.

Diagnostic analytics is not only about statistics, though. It also involves thinking laterally, considering external factors that might be impacting the patterns in data, and finding additional sources of data to help build a broader picture. For example, a clothing brand may see an unexpected surge in sales if one of its products is worn by a high-profile celebrity or promoted by a celebrity influencer.

However, it is also important to use professional scepticism when looking at data; for example, asking whether the results and analysis fit with your understanding of a situation. One of the limitations of any kind of data analytics is the quality of the underlying data. If that data is not accurate or current, then the resulting analysis cannot be relied on either.

More widely, we also need a note of caution here, and to be aware of the potential limitations of diagnostic analytics, in particular:

  • It relies on past data, which limits its ability to draw conclusions about possible future events; past performance is not indicative of future results.
  • Regression analysis and correlation analysis examine how strongly different variables are linked to each other. However, correlation does not necessarily imply causality.

ILLUSTRATIVE EXAMPLE – Correlation not causality
Monthly data for ice cream sales and the number of shark attacks around the United States shows the two variables are highly correlated, increasing in the summer months and falling in the winter months.

However, this does not mean that eating ice cream causes shark attacks. Rather, people consume more ice cream when it is warmer, and also swim in the ocean more when it is warmer, which explains why the two variables are correlated. But although they are highly correlated, one does not cause the other.
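The point can be illustrated with a small calculation. The sketch below computes the Pearson correlation coefficient for three sets of monthly figures; all of the numbers are invented for illustration and are not real sales or attack statistics.

```python
from math import sqrt

# Hypothetical monthly figures, January to December (illustrative only).
temperature = [5, 7, 11, 15, 20, 25, 28, 27, 22, 16, 10, 6]
ice_cream_sales = [20, 24, 35, 50, 70, 95, 110, 105, 80, 55, 32, 22]
shark_attacks = [0, 0, 1, 2, 4, 6, 8, 7, 5, 2, 1, 0]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

print(f"Ice cream vs shark attacks: {pearson(ice_cream_sales, shark_attacks):.2f}")
print(f"Temperature vs ice cream:   {pearson(temperature, ice_cream_sales):.2f}")
print(f"Temperature vs shark attacks: {pearson(temperature, shark_attacks):.2f}")
```

All three coefficients come out close to 1. The calculation cannot tell us that temperature is the common driver; that conclusion comes from understanding the situation, not from the statistic itself.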

Predictive analytics

Once you know what happened in the past (descriptive analytics), and you understand why it happened (diagnostic analytics) you can begin to try to predict what is likely to happen in the future based on that information.

Predictive analytics is the process of using data to help understand how trends might unfold in the future, and to help predict future trends and events. It uses statistics, computer modelling and machine learning to determine the probability of various outcomes. Predictive analytics still uses historical data but does so to help predict what is likely to happen in the future.

For example, let us say a mobile phone provider has experienced an increase in customer churn. Diagnostic analytics has revealed this is because the organisation’s promotional deals were not incentivising customers to renew their phone contracts. Predictive analytics could then be used to help predict what kind of promotional deals will result in more renewals. For example, if the diagnostic analytics revealed that the upfront cost was a key factor in influencing customers’ decisions, but customers were less concerned about monthly fees, the company could then use predictive analytics to forecast the impact of different combinations of upfront costs and monthly fees, and identify which are likely to be more attractive to customers.

Equally, in the retail industry, applying predictive analytics to historical data, market data, demographic data, behavioural data, browsing trends, and more, can help retailers make more accurate demand forecasts. In turn, these forecasts can help inform inventory management, staffing decisions, and advertising campaigns.

Predictive analytics not only forecasts possible future outcomes but also identifies the likelihood of those events happening. In doing so, it helps organisations with better planning and realistic target setting, as well as avoiding unnecessary risk. One of the most valuable forms of predictive analytics is ‘what-if’ analysis, which involves changing variables or factors to see how those changes will affect the outcome.

For example, key factors that could affect a sales forecast are economic conditions and the industry environment. ‘What-if’ analysis could help an organisation to understand the potential impact of different scenarios on its sales forecasts:

  • How fast is the economy growing (or shrinking)? What impact will an increase (or decrease) in economic growth have on sales?
  • Are new competitors entering the market? If so, how likely are they to take some of the organisation’s market share? And how much might they capture?
  • Are there any opportunities to gain new customers (e.g. from launching a new product, or moving into new markets)?
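A ‘what-if’ analysis of this kind can be sketched as a simple model whose inputs are varied one scenario at a time. The model below is a deliberately simplified assumption (each factor scales baseline sales multiplicatively), and the scenario names and percentages are hypothetical.

```python
def forecast_sales(base_sales, economic_growth, market_share_lost, new_customer_uplift):
    """Simplified sales model: each factor scales the baseline multiplicatively."""
    return (base_sales
            * (1 + economic_growth)
            * (1 - market_share_lost)
            * (1 + new_customer_uplift))

BASE_SALES = 1_000_000  # hypothetical current annual sales

# Hypothetical scenarios reflecting the three questions above.
scenarios = {
    "Base case": {"economic_growth": 0.02, "market_share_lost": 0.00, "new_customer_uplift": 0.00},
    "Recession": {"economic_growth": -0.03, "market_share_lost": 0.00, "new_customer_uplift": 0.00},
    "New competitor": {"economic_growth": 0.02, "market_share_lost": 0.10, "new_customer_uplift": 0.00},
    "New product launch": {"economic_growth": 0.02, "market_share_lost": 0.00, "new_customer_uplift": 0.08},
}

results = {name: forecast_sales(BASE_SALES, **inputs) for name, inputs in scenarios.items()}

for name, forecast in results.items():
    print(f"{name}: {forecast:,.0f}")
```

Changing one input at a time, as here, shows how sensitive the forecast is to each factor; a real model would also need to attach a likelihood to each scenario.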

ILLUSTRATIVE EXAMPLE – Predictive analytics and dynamic pricing in hotels
The aim of revenue management (RM) in the hotel industry is to have the right room for the right person at the right time. RM uses analytics and customer data to help predict customer behaviour, thereby helping hotels to forecast demand and optimise prices.

Hotel chains have information systems which integrate data such as historical and current reservations, occupancy, and daily rates to forecast demand. Additionally, these systems incorporate external data – such as the weather – and analyse competitor pricing, the presence of major events (music or sports events) held in local areas, and booking patterns on other sources, to suggest optimal room rates.

As a result, the hotels can implement dynamic pricing strategies, automatically adjusting room rates so that they increase at times when demand is forecast to be high but reducing them when demand is expected to be lower (to try to increase occupancy rates during these quieter periods).
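The pricing rule described above might be sketched as follows. This is a minimal illustration only: the occupancy thresholds, the multipliers and the function name `suggest_room_rate` are all assumptions, and real revenue management systems use far richer forecasting models.

```python
def suggest_room_rate(base_rate, forecast_occupancy, local_event=False):
    """Suggest a nightly rate from forecast demand (thresholds are hypothetical)."""
    if forecast_occupancy >= 0.85:
        multiplier = 1.25   # high forecast demand: raise the rate
    elif forecast_occupancy <= 0.50:
        multiplier = 0.85   # low forecast demand: discount to lift occupancy
    else:
        multiplier = 1.00   # normal demand: keep the standard rate
    if local_event:
        multiplier += 0.10  # major local event: assume extra demand
    return round(base_rate * multiplier, 2)

# Hypothetical forecasts for three nights: (occupancy, major local event?).
forecasts = {"Tuesday": (0.40, False), "Friday": (0.70, False), "Saturday": (0.92, True)}

for night, (occupancy, event) in forecasts.items():
    print(night, suggest_room_rate(100.0, occupancy, event))
```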

However, as with diagnostic analytics, we still need to exercise caution here about the potential limitations of predictive analytics. Although it is forward looking, it still relies on past data, which could limit its ability to draw conclusions about possible future events, as past performance is not necessarily indicative of future results.

Methods of data analytics, and ethical issues in data analysis

This article will now look at methods of analytics which can help to identify patterns and trends in different types of data. It will also highlight the potential ethical issues around capturing and analysing personal information, and the need for organisations to behave ethically when doing so.

One of the defining characteristics of big data is that it comes from a great variety of sources, both internal and external, and structured and unstructured. The variety of types of data requires distinct processing capabilities and specialist algorithms in analysing them. However, such analysis could help organisations identify performance issues or trends which they would historically not have been able to identify.

These are some methods of analytics:

  • Text (e.g. emails; social media posts)
  • Image
  • Video
  • Voice (e.g. customers’ conversations with a customer support centre)
  • Sentiment analysis  

Note: Sentiment analysis can be used within text analytics and voice analytics, so it will be discussed in the context of each of them. However, we will now consider the other four methods of data analytics in turn.

Text analytics

Text analytics involves large volumes of text (like emails, social media posts, customer support tickets) being translated into quantitative data to uncover trends or insights in the text.

Having tagged responses (according to the key words in them, and whether their tone is positive, neutral, or negative), text analytics can uncover patterns and insights across a dataset and create charts or reports to display the results.

For example, text analytics tools can be used to identify the main topics or issues being discussed in product reviews (‘topic detection’), or to identify people’s attitudes to a brand or product on social media (‘sentiment analysis’).

Sentiment analysis is a natural language processing technique used to determine whether data is positive, negative, or neutral. Sentiment analysis is often performed on textual data, and by tracking the tone, intent, and emotion behind messages it can reveal how positive or negative customers feel about a business, its products and services, or what customers feel about a business’ competitors, and their products and services.

Consider the following illustration: ‘I needed to go into the bank branch today, because I could not complete the transaction online. There was a long queue, but there were only two cashiers working, so it took forever to get served.’  

The emotion behind the comment that ‘it took forever to get served’ is one of dissatisfaction and frustration, so would be identified as negative within sentiment analysis software. However, it could also provide a useful insight to the bank, about the need to monitor the number of cashiers working at different times of day, to reduce the length of time customers typically have to wait before they are served.

Text analytics can also be useful to identify patterns in the content of the text, or topics which are being discussed most frequently. For example, if there has been a sudden increase in negative feedback about a product, text analytics can be used to help understand the reasons behind this, by identifying key words or phrases which recur most frequently in the customer feedback. Having identified this, a business can take action to improve the aspects of the product which are causing the complaints.

ILLUSTRATIVE EXAMPLE – Phone retailer
CallHi, a company which manufactures and retails smartphones, has noticed that its revenue has fallen recently. The company has used text analytics to try to help identify the reasons for this, analysing customers’ comments on social media.

The company’s analytics software uses key words or phrases to categorise comments, according to the nature of their content, and whether they were positive, negative, or neutral. For example, a post saying, ‘Battery life in the new CallHi8 is very poor’ is tagged to ‘Product performance’ and ‘Negative’; while a post saying, ‘The advisor I spoke to was well informed and helpful’ is tagged to ‘Customer support’ and ‘Positive’.

The text analytics results for the last month are shown in the graph below.

[Graph: CallHi text analytics results for the last month]

You have been asked to advise CallHi how it could use the analytics results to help improve performance.
The analysis shows that the areas customers comment on most frequently are product performance and customer support. This suggests that these areas are likely to be key to the company’s success, so should be covered in CallHi’s critical success factors (CSFs).

The analysis suggests that the poor product performance is a key factor in the recent fall in revenue. For example, customers will choose not to buy new CallHi8 phones because of their poor battery life, potentially buying phones from rival manufacturers instead. This suggests that CallHi needs to look at ways to improve product performance (for example, improving battery life) to address the issues causing dissatisfaction among customers.

By contrast, customers appear positive about customer service, so this could be helping to retain customers who might otherwise leave, thereby preventing the decline in revenue becoming even worse. As such, it will be important for the company to maintain its high levels of customer service.
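The tagging step in the CallHi example could be sketched as a simple keyword match. The keyword lists below are hypothetical, and commercial text analytics software would typically use natural language processing rather than fixed word lists, so this is an illustration of the principle only.

```python
import re
from collections import Counter

# Hypothetical keyword lists used to categorise comments.
TOPIC_KEYWORDS = {
    "Product performance": {"battery", "screen", "camera"},
    "Customer support": {"advisor", "support", "helpline"},
}
POSITIVE_WORDS = {"helpful", "informed", "great", "excellent"}
NEGATIVE_WORDS = {"poor", "slow", "terrible", "broken"}

def tag_comment(comment):
    """Return a (topic, sentiment) pair for one comment."""
    words = set(re.findall(r"[a-z0-9']+", comment.lower()))
    topic = next((name for name, keywords in TOPIC_KEYWORDS.items() if words & keywords), "Other")
    positive = len(words & POSITIVE_WORDS)
    negative = len(words & NEGATIVE_WORDS)
    if positive > negative:
        sentiment = "Positive"
    elif negative > positive:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    return topic, sentiment

comments = [
    "Battery life in the new CallHi8 is very poor",
    "The advisor I spoke to was well informed and helpful",
]

# Count how many comments fall under each (topic, sentiment) tag.
tag_counts = Counter(tag_comment(comment) for comment in comments)
for (topic, sentiment), count in tag_counts.items():
    print(topic, sentiment, count)
```

Counting the tags across a month of comments would give the kind of summary described in the example, showing which topics attract the most comments and whether they are mainly positive or negative.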

Text analytics could also be applied in customer service. For example, an organisation could use text analytics to analyse the content of emails sent to customer support to help understand customers’ needs, and to help identify (and then address) issues which are the cause of the most frequent problems or queries. 

Using text analytics can help a business:

  • Improve customer satisfaction, by learning what their customers like and dislike about their products (and looking to enhance the things customers like, while addressing things they dislike)
  • Detect product or service issues, and help a business become more responsive to customer feedback or other negative sentiment
  • Monitor brand reputation

ILLUSTRATIVE EXAMPLE – Banks and customer experience
Banks use text analytics to analyse customer complaints. The banks’ analytics software uses natural language processing to analyse emails, survey responses and transcripts of customer calls to understand why a customer is complaining. They then use this insight to make changes to improve the customer experience going forwards.

Image analytics and video analytics

Image analytics uses algorithmic extraction and logical analysis to interpret information from images and graphics. In simple terms, image analytics is a computer’s ability to recognise elements pictured in an image. For example, analytics software identifies whether the elements in an image depict physical features, objects, or movement, and these can then be logically analysed by a computer.

Videos are a series of images, so similar principles apply for video analytics. The ability to interpret and understand images and videos could have important implications for business.

ILLUSTRATIVE EXAMPLE – Video analytics in retail
For example, retailers have traditionally used point-of-sale data to learn about customer behaviour, but the insights from that are restricted to transaction statistics: what products customers are buying, how much they are spending, how frequent their purchases are. However, retailers can use video analytics to get more insight into customers’ behaviour through their whole shopping experience: how long customers spend in the store, or how long they spend in different sections of the store; which areas of the store are visited most; what proportion of visitors enter a store and leave without making a purchase; which products customers are looking at but not buying.

The store’s management could then use these insights to try to help improve performance: for example, by introducing deals on products which are frequently browsed but not purchased, to convert customers’ interest into an actual sale.

Video analytics could also be useful in not-for-profit organisations, such as hospitals. For example, analytics systems could monitor patient rooms to keep track of the nursing staff. If a nurse has not visited a patient within a certain amount of time, the system can notify the nursing team to check on a patient in a particular room.

Voice analytics and speech analytics

We have already noted that text analytics can be used to analyse written text. However, organisations also have access to potentially valuable data from conversations with customers (for example, through contact centres). And organisations can use speech analytics and voice analytics to help analyse this data.

Speech analytics software focuses only on what was said (that is, recording and transcribing the words used in a conversation), to identify and tag key words across conversations; for example, between customers and an organisation’s agents in a contact centre (call centre).

Organisations can use speech analytics to identify customer experience trends. For example, late deliveries could be a key area of performance monitored by logistics companies, so could be tagged as key words in calls. Tracking the numbers of calls where customers are complaining about late deliveries could help the logistics companies monitor performance in this area and identify the need to improve performance if the number of complaints is increasing.

Equally, having a better understanding of the topics which are important to customers can help organisations build stronger relationships with their customers – for example, by briefing agents in contact centres on ‘hot topics’ to help ensure they can deal with customer queries as efficiently and effectively as possible.

Moreover, some analytics software systems have real-time assistance functions, which offer recommendations for agents, when certain keyword triggers occur during a call. For example, ‘refund’ could be identified as a key word. Therefore, if a customer asks a customer service agent about a refund during a call, the system could display real-time advice to the agent about the company’s policy on refunds or anything the customer needs to do to claim/receive a refund. 
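Both uses of speech analytics described above, counting calls that mention a key phrase and prompting agents in real time, can be sketched with simple keyword matching on transcribed text. The advice wording, the keywords and the function names are hypothetical, and the sketch assumes the call has already been transcribed.

```python
# Hypothetical advice shown to an agent when a keyword is heard.
ADVICE_BY_KEYWORD = {
    "refund": "Show refund policy and the steps the customer must take to claim.",
    "late delivery": "Check the order's tracking status and offer an updated delivery date.",
}

def real_time_prompts(utterance):
    """Return the advice triggered by keywords in one transcribed utterance."""
    text = utterance.lower()
    return [advice for keyword, advice in ADVICE_BY_KEYWORD.items() if keyword in text]

def count_keyword_calls(transcripts, keyword):
    """Count the calls whose transcript mentions the keyword at least once."""
    return sum(keyword in transcript.lower() for transcript in transcripts)

# Hypothetical call transcripts.
transcripts = [
    "I am calling about a late delivery, and I would like a refund.",
    "Can you tell me your opening hours?",
    "This is the second late delivery this month.",
]

print(real_time_prompts(transcripts[0]))
print(count_keyword_calls(transcripts, "late delivery"))
```

Tracking the count over successive weeks or months would show whether complaints about late deliveries are increasing.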

Voice analytics

While speech analytics focuses on what was said in a conversation, voice analytics focuses not only on what was said, but also on how it was said. Voice analytics software analyses the patterns of the conversations themselves to identify different features, such as tone of voice, volume, pitch and speed of a customer or agent’s speech. As such, voice analytics reveals the emotions within the content of the call. (This is another manifestation of sentiment analysis, which we mentioned earlier in relation to text analytics).

ILLUSTRATIVE EXAMPLE – Calls to a contact centre
Using voice analytics to analyse customers’ calls to a contact centre can provide a more accurate reflection of a customer’s mood than could be obtained by simply analysing the transcript of the conversation (as would be the case in speech analytics).

For example, an extract of a conversation may be:

Agent: I am afraid your order has been delayed, so you will not receive it until next week now.

Customer: Great, so now I will not be able to do anything until next week.

The word ‘great’ normally denotes a positive sentiment, but in this case, it is being used sarcastically, which changes the meaning of the word. Voice analytics solutions include features that identify the context and emotions of conversations.

As such, voice analytics is often used to improve the customer experience, rather than relying on speech analytics alone.

We mentioned real-time assistance in relation to speech analytics, but this could also be an important feature in voice analytics. For example, if the software detects that a customer is becoming increasingly angry, or a customer service agent is struggling to answer a query, the software could highlight this on a supervisor’s dashboard, so that the supervisor can then intervene to help the agent with the call.

Ethical issues around capturing and processing data

Although using analytics software can help entities gather more information about their customers to understand customer behaviour more precisely, and to help drive decision-making, gathering this data (for example, by recording and transcribing phone conversations) could also raise significant ethical and privacy issues.

A key challenge for organisations to address is how they can collect, store, and use data ethically, and what rights they need to uphold. Data ethics encompasses the moral obligations of gathering, protecting, and using personally identifiable information, and how it affects individuals.

The following are key issues to consider when collecting and analysing data:

  • Ownership: Individuals (‘data subjects’) have ownership over their personal information. Therefore, it is unlawful and unethical to collect someone’s personal data without their consent. As such, if an organisation is going to collect data about any individuals (e.g. customers, employees) it needs to ask their permission to do so (for example, through digital privacy policies that ask users to agree to a company’s terms and conditions, or pop-ups with checkboxes that permit websites to track users’ online behaviour with cookies).

  • Transparency: In addition to owning their personal information, data subjects have a right to know how organisations plan to collect, store, and use that information. For example, if a company decides to implement an algorithm to personalise its website experience based on individuals’ buying habits and site behaviour, it should write a policy explaining that cookies are used to track users’ behaviour and that the data collected will be stored in a secure database and used to train an algorithm that provides a personalised website experience. It is a user’s right to have access to this information so they can decide whether to accept the site’s cookies or decline them.

  • Privacy: Another key ethical responsibility that comes with handling data is the data subjects’ privacy. Although an individual may have consented for an organisation to collect and store data, the organisation still has a responsibility to ensure that personal information is held securely to protect the individual’s privacy (for example, by storing data in a secure database, or by password protecting files containing personal information, or encrypting them).

  • Intention: Before collecting data, an organisation also needs to question why it needs that data, what it will gain from it, and what changes it will be able to make after analysing the data. An important issue, related to this, is that collecting and storing data when it is not necessary to do so, is unethical. Therefore, organisations should strive to collect the minimum viable amount of data, so they take as little as possible from data subjects while optimising the overall service they offer to them. This presents an inherent conundrum in big data analytics though. On the one hand – as we have said previously – organisations are gathering more data about individuals than ever before. On the other hand, organisations should only be collecting data when it is necessary and should be looking to collect the minimum viable amount of data. 

  • Outcomes: Even when intentions are good, the outcome of data analysis can cause inadvertent harm to individuals or groups of people.

This point about outcomes is particularly significant in relation to the use of algorithms in data analytics.

Ethical use of algorithms

Analytics software uses algorithms to sift through data and recognise patterns. Algorithms are sets of instructions that computers use for solving a problem or completing a task. However, although they are used by computers, algorithms are initially written by humans. Therefore, for the algorithm to work properly, the human programmer who writes it must include all the necessary rules and regulations. However, because algorithms are written by humans, bias may be present in them – intentionally or unintentionally.

Biased algorithms can cause serious harm to people, in particular by introducing prejudice against certain socio-economic or demographic groups. Two key ways that bias can creep into algorithms are:

  • Training: Machine-learning algorithms learn based on the data they are trained with. Therefore, an unrepresentative data set can cause an algorithm to favour some outcomes over others. As such, organisations need to ensure that training data is properly representative of the populations who will be affected by the algorithm.

  • Feedback: Algorithms also learn from users’ feedback. As such, they can be influenced by biased feedback. For instance, a job search platform may use an algorithm to recommend roles to candidates. If hiring managers consistently select candidates from one demographic group for specific roles, the algorithm will ‘learn’ and adjust and only provide job listings to candidates in that group in the future. The algorithm learns that when it provides the listing to people with certain attributes, it is ‘correct’ more often, which leads to an increase in that behaviour.
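One simple check on the training point above is to compare the make-up of a training data set with the make-up of the population the algorithm will affect. The sketch below does this for a single attribute; the group labels, the population shares and the function name `representation_gaps` are hypothetical, and a real review of bias would need to go well beyond a comparison of proportions.

```python
from collections import Counter

def representation_gaps(training_groups, population_shares):
    """Share of each group in the training data minus its share of the population."""
    counts = Counter(training_groups)
    total = len(training_groups)
    return {
        group: round(counts.get(group, 0) / total - share, 3)
        for group, share in population_shares.items()
    }

# Hypothetical training set: 80 records from Group A, 20 from Group B.
training_groups = ["Group A"] * 80 + ["Group B"] * 20
# Hypothetical population: the two groups are equally sized.
population_shares = {"Group A": 0.5, "Group B": 0.5}

gaps = representation_gaps(training_groups, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1%} relative to population share")
```

A large positive or negative gap, as here, suggests the training data is unrepresentative and that the algorithm may favour the over-represented group.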

The inputs and operations of a ‘black box’ algorithm are not visible to users or people affected by its decisions. The algorithm takes a number of data points as inputs and correlates specific data features to produce an output.

However, because the workings of the software cannot easily be viewed or understood, errors can go unnoticed until they cause problems so large that it becomes necessary to investigate them. This is particularly true in relation to bias.

ILLUSTRATIVE EXAMPLE – Predictive policing
PredPol (which is short for ‘Predictive Policing’) is an artificial intelligence algorithm, used by police departments in the USA, which aims to predict where crimes will occur in the future, based on crime data collected by the police (e.g. arrest counts, number of calls made to police).

PredPol aims to reduce the human bias in police departments, by leaving crime prediction to artificial intelligence. However, researchers discovered that PredPol itself was biased, and it repeatedly sent police officers to particular neighbourhoods that contained a large number of minority groups, regardless of the level of crime in those areas relative to other areas.

Arrest data (which is one of the data sets used by the algorithm) biases predictive tools because, overall, it appears that police arrest more people in neighbourhoods with large numbers of minority groups, compared to neighbourhoods with fewer minority groups. As a result, the PredPol algorithm directs more police to the areas with large numbers of minority groups, which in turn leads to more arrests being made in them, compared to areas where there are fewer police, even though crimes may be being committed in these areas as well.

However, one of the reasons for the higher numbers of arrests being made in the neighbourhoods with large numbers of minority groups was the higher police concentration in them, compared to other neighbourhoods. Consequently, this led to a self-reinforcing bias in the algorithm, which continued to send more police to all regions with a large number of minority groups and meant that police departments were over-policing areas which did not actually need a higher police presence.
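The self-reinforcing nature of this kind of bias can be shown with a toy simulation. It is not a model of PredPol itself: the allocation rule (patrols in proportion to recorded arrests), the starting figures and the function name `simulate_patrols` are all assumptions made for illustration. Both areas are given the same underlying crime rate, but the historical arrest record is skewed towards Area A.

```python
def simulate_patrols(initial_arrests, true_crime_rate, patrols_per_round=100, rounds=10):
    """Return each area's share of recorded arrests after repeated allocation rounds."""
    arrests = dict(initial_arrests)
    for _ in range(rounds):
        total = sum(arrests.values())
        # Allocate patrols in proportion to the arrests recorded so far.
        patrols = {area: patrols_per_round * count / total for area, count in arrests.items()}
        # Recorded arrests depend on police presence as well as on actual crime.
        for area in arrests:
            arrests[area] += patrols[area] * true_crime_rate[area]
    total = sum(arrests.values())
    return {area: count / total for area, count in arrests.items()}

# Hypothetical starting point: a skewed arrest history, identical true crime rates.
history = {"Area A": 60, "Area B": 40}
crime_rate = {"Area A": 0.5, "Area B": 0.5}

shares = simulate_patrols(history, crime_rate)
for area, share in shares.items():
    print(f"{area}: {share:.0%} of recorded arrests")
```

Even though crime is identical in the two areas, Area A still accounts for 60% of recorded arrests after ten rounds. The data generated by the algorithm's own allocations keeps confirming the original skew, so the bias is never corrected.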

The ethical issues around collecting and analysing data, and the use of algorithms could also be key issues to consider in the context of an exam scenario. If the scenario identifies that an organisation is looking to introduce an analytics system, it is important to consider whether there are sufficient controls over the way data is captured, stored, and used to ensure the process is ethical, and that it avoids bias as far as possible.

Reference
(1). Machine learning is a branch of artificial intelligence in which algorithms can ‘learn’ from data, detect patterns, and make decisions without receiving explicit instructions.

Adapted from an article written by a member of the Advanced Performance Management examining team