Introduction to business intelligence with R & R Studio
Written by: Luca Ferrari
Introduction to R for Business Intelligence
By Luca Ferrari & Cyril Adler
Introduction to business intelligence with R.
We live in a time when data is constantly being gathered, passively and actively, on everyone, including your customers. Every company, no matter how small, collects a wide range of data; this includes, but is not limited to, customer addresses, habits, purchases, stock, and traffic on your web page. Moreover, an enormous amount of public data is waiting to be used on the internet, either in the form of social media posts or on websites such as www.kaggle.com, which hosts many data sets that might be relevant to your business domain (Salcedo & McCormick, 2017).
The term “business intelligence” refers to the transformation of data into information, and the use of this information to help a company make better and faster corporate and business decisions. When well exploited, data can provide a company’s management with meaningful insight to help it grow and develop. Luckily for us, data collection and analysis have never been so easy for small and medium businesses, thanks to free and easy-to-use tools such as R (Gendron, 2016).
Why do we process data?
As the saying goes, “knowledge is power”, and that is certainly true in a business environment. Knowledge offers a competitive advantage because with it comes the ability to see the big picture and make the right decision at the right time for the company.
As stated before, a lot of data is easily accessible, but a lot of work is required to transform data into information, and further work is then needed to turn information into business insights. For example, if you want to know how well your product is doing against a competitor worldwide, you first need to know your sales from every point of sale, in every geographical area. You can then do the same for your competitor and plot the results on a graph. Analyse this graph to understand how well you are doing: look for unusual alignments, gaps and outliers, and identify the reasons for them. This might offer you some great insights into the best way to conduct your sales (Salcedo & McCormick, 2017).
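The comparison described above can be sketched in a few lines of base R. This is only a minimal illustration: the `sales` table, its column names and all the figures are invented for the example, and real data would come from your own points of sale.

```r
# Made-up sales figures, for illustration only
sales <- data.frame(
  region  = c("Europe", "Europe", "Asia", "Asia", "Americas", "Americas"),
  company = c("Us", "Competitor", "Us", "Competitor", "Us", "Competitor"),
  units   = c(120, 95, 80, 140, 60, 65)
)

# Build a company-by-region table of total units sold
sales_matrix <- xtabs(units ~ company + region, data = sales)

# Grouped bar chart: one pair of bars per region
barplot(sales_matrix, beside = TRUE, legend.text = TRUE,
        main = "Units sold by region", ylab = "Units")

# Gap between us and the competitor in each region
# (negative values mean the competitor sells more there)
gap <- sales_matrix["Us", ] - sales_matrix["Competitor", ]
gap
```

A large negative gap in one region is the kind of outlier worth investigating further.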
Exploiting data properly can avoid common misunderstandings between team members. By shaping existing data into quickly readable graphs and accessible visuals, you can provide an easy-to-read version of the company’s results that gets everyone on the same page. This can help measure and understand the company’s performance. It is thus possible to set Key Performance Indicators based on concrete evidence and solid projections.
Finally, good data processing followed by an in-depth data analysis might allow your company to identify business trends early, thus providing a competitive advantage by anticipating those trends instead of reacting to them.
To summarize, data processing allows your company to know, to make decisions, to avoid mistakes and to prepare for change more easily.
What is R?
R is a free programming language and software environment for statistical computing and graphics. RStudio is an integrated development environment that makes working with R more convenient. R is widely used among data miners and statisticians for its efficiency, ease of use and the large number of packages that can be added to make it even more useful. Over time, R has established itself as one of the “gold standards” of the data processing industry.
All the following data processing steps can be carried out with R and RStudio, with only a little prior practice. To get the most out of your data, you should follow these nine steps:
1. Strategy
As you know, you should have a strategy for your business. Philip Evans explains the history of business strategy in his TED Talk “How data will transform business”. He says that it was on “Henderson’s idea of increasing returns to scale and experience, and Porter’s idea of the value chain, encompassing heterogeneous elements, that the whole edifice of business strategy was subsequently erected”. Evans thus shows the importance of defining a good strategy before starting the process. The same goes for data: if you want to use data, you must define which data you want and why, how it will be useful for your company, and which elements you want to improve with it. Without a strategy, you would just waste your time as well as your money.
For example, you can use data to design a service that builds customer loyalty. It is mainly a matter of deciding which economic angle you want to take (Gendron, 2016). Or you could use data to improve the quality of your services and increase your sales. The possibilities for using data are endless, so define your strategy clearly.
2. Type of data:
Once you know your strategy and why you are going to use data, you should decide whether you need quantitative or qualitative data for your business. This will help you determine how to collect your data.
Should you use a questionnaire, interviews, a focus group, a world café or something else entirely? It always depends on your strategy and the information you need.
The nature and the use of the data determine the kind of data you need. Data that is not used is useless. When you ask the right question and collect the right data (data that really answers your question), the data is valuable (The World Datanomic Forum, 2018). That is why you must collect data that really means something for your business. If you operate a bike-sharing company, it would be useful to see whether the weather has a positive or a negative impact on your sales. You could cross the weather data with the utilisation rate to obtain potentially insightful information.
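As a rough sketch of the bike-sharing example, the following base R code crosses a weather table with a usage table and measures how strongly the two move together. The data frames `weather` and `usage`, their columns and all the values are hypothetical and only serve the illustration.

```r
# Made-up daily figures for a hypothetical bike-sharing company
weather <- data.frame(
  date    = as.Date("2018-06-01") + 0:5,
  rain_mm = c(0, 12, 3, 0, 20, 1)
)
usage <- data.frame(
  date  = as.Date("2018-06-01") + 0:5,
  rides = c(410, 180, 350, 430, 120, 390)
)

# Cross the two sources on their shared date column
daily <- merge(weather, usage, by = "date")

# Pearson correlation: a value close to -1 suggests fewer rides on rainy days
rain_effect <- cor(daily$rain_mm, daily$rides)
rain_effect

# Visual check of the relationship
plot(daily$rain_mm, daily$rides, xlab = "Rain (mm)", ylab = "Rides per day")
```

A correlation alone does not prove that rain causes the drop, but it tells you whether the question is worth a closer look.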
3. Value/cost ratio
Different ways to collect data exist:
- Online platforms, which can be free or paid,
- Market studies.
Once you have decided how you are going to collect the data, you must work out the cost, because each collection method is priced differently. For example, a focus group is expensive, because you need a team before, during and after the session to collect the information; in return, the information is complete and reliable. Alternatively, you can buy a database from another company and save a lot of time, but this is also expensive because you pay per item.
4. Data collection
As explained before, there are different methods for collecting data. If you want to start a business and need concrete data to add value to your business idea, you can collect it in different ways:
- FSO (Switzerland) / Opendata.swiss (Switzerland) / avoindata.fi (Finland) / vm.fi (Finland) / stat.fi (Finland)
- Going directly to companies to negotiate face to face
- Buying data, making comparisons and analysis and reselling the conclusions
These are good ways to start collecting data.
5. Data cleaning
Once you have collected some data, you should clean it, in order to remove mistakes and obtain the most accurate results possible:
- Make sure that all the information in your data sets and across your different files uses the same format and the same layout, for optimal analysis.
- Decide what to do with missing data: leave it blank, or fill in the median, the mean or another value. It is a judgment call. Also pay attention to dates: in some tools a missing value is stored as 0, and a date of 0 can show up as 01.01.1900. So be sure to check your dates with a plot.
- During cleaning, basic statistics help you clean as thoroughly as possible. They can show you whether some values differ greatly from the rest, which could mean that something is wrong.
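The cleaning steps above could look roughly like this in base R. It is a sketch under stated assumptions: the `orders` table and its values are invented, filling gaps with the median is only one possible choice, and the 1.5 × interquartile range rule is a common convention for spotting outliers rather than a method prescribed by this article.

```r
# Made-up customer orders with typical problems
orders <- data.frame(
  order_date = c("2018-03-01", "2018-03-02", "1900-01-01",
                 "2018-03-04", "2018-03-05"),
  amount     = c(25, NA, 30, 28, 950),
  stringsAsFactors = FALSE
)

# Same format everywhere: turn text into real Date values
orders$order_date <- as.Date(orders$order_date, format = "%Y-%m-%d")

# A placeholder date such as 1900-01-01 usually means "missing": mark it as NA
orders$order_date[which(orders$order_date == as.Date("1900-01-01"))] <- NA

# One possible choice for missing amounts: fill in the median of the known values
median_amount <- median(orders$amount, na.rm = TRUE)
orders$amount[is.na(orders$amount)] <- median_amount

# Flag values far outside the interquartile range as possible errors
q   <- unname(quantile(orders$amount, c(0.25, 0.75)))
iqr <- q[2] - q[1]
orders$suspect <- orders$amount < q[1] - 1.5 * iqr |
                  orders$amount > q[2] + 1.5 * iqr
orders
```

A flagged row is not necessarily wrong; it simply deserves a manual check before the analysis.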
(Keep in mind that parts 4 and 5 account for more than 80% of the work.)
6. Data analysis
You should carry out an exploratory analysis of the data, looking at it variable by variable, in order to produce graphs that give you information. Use these graphs to show the results to your team or your manager, and be ready to show your calculations to back up your explanations.
To do that, you can take two variables and cross them to see the relationship between them. Often there is no link, but sometimes one variable influences another. Then cross the data to obtain valuable information.
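A minimal exploratory analysis in base R might look like the sketch below. The `shops` table, its column names and its numbers are made up for the example; with real data you would load your own cleaned file instead.

```r
# Made-up shop data, for illustration only
shops <- data.frame(
  ad_spend = c(10, 20, 30, 40, 50, 60),
  staff    = c(3, 5, 2, 6, 4, 5),
  sales    = c(105, 198, 310, 395, 502, 610)
)

# Variable by variable: minimum, quartiles, mean and maximum of each column
summary(shops)
hist(shops$sales, main = "Distribution of sales", xlab = "Sales")

# Crossing variables: pairwise correlations between all columns
# (values near 1 or -1 suggest a strong link, values near 0 suggest none)
cor_matrix <- round(cor(shops), 2)
cor_matrix

# Scatter plots of every pair of variables, useful to show to a team
pairs(shops)
```

The correlation matrix tells you which pairs of variables are worth plotting and discussing in more detail.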
You can also use your data for geo-mapping: “Geo-mapping allows for more than just identifying locations. Think of how one of your favorite mobile apps utilizes a map view. The map most likely includes layers of detailed information and popups with data, such as business description, hours of operation, or customer ratings.” (Gendron, 2016). With RStudio, you can create a map-based visualisation to show your results.
7. Data protection
Take care with the data you use, because there are many laws to respect, particularly in Finland. You can find information about data protection in Europe here: https://edpb.europa.eu/.
Here are some important points, which we have encountered in various Swiss projects:
- All servers must be encrypted and must not be accessible to external persons,
- For sensitive personal data, and for data about vulnerable people, encryption must be even stronger,
- Clients should be informed about how their data is used,
- The user may at any time request that their data be changed or that it be processed by other companies,
- The company providing the IT services is not responsible in case of any problem.
We hope that this article will help you with your business and your data analysis. You can find more information about using R and RStudio here: https://fr.scribd.com/book/365185281/Introduction-to-R-for-Business-Intelligence
Gendron, J. (2016). Introduction to R for Business Intelligence. Packt Publishing.
Salcedo, J., & McCormick, K. (2017). IBM SPSS Modeler Essentials. Birmingham: Packt Publishing Ltd.
The World Datanomic Forum. (2018). The Datanomic Manual: A Practical Guide To Basic Datanomics. Kindle Edition.