Description of Dataset

Main dataset

The dataset “Energy consumption for selected Bristol buildings from smart meters by half hour” was obtained online. The electricity data was sampled from 9th March 2013 until 8th March 2014 (one full year, spanning 13 calendar months), 24 hours a day in 30-minute intervals. The total uncompressed dataset size is 687 KB.

Location and Subjects: Five unnamed buildings in Bristol, England, UK

Dataset characteristics:  Multivariate, Time-Series, Smart Meters

Number of instances:  2457

Number of attributes:  53 in total

Number of attributes after data modifications:
*Building 1 (Consists of 3 smart meters)
*Building 2 (Consists of 1 smart meter)
*Building 3 (Consists of 1 smart meter)
*Building 4 (Consists of 1 smart meter)
*Building 5 (Consists of 1 smart meter)


First, it is worth exploring how the electricity data relates to a second dataset: the weather in Bristol over the same period in which the building data was collected (monthly weather data for 2013 and 2014). This supplementary dataset could reveal how electricity usage patterns in the buildings compare with the monthly average temperature recorded in Bristol, United Kingdom.
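As a rough illustration of how the two datasets could be connected, the sketch below sums daily usage into monthly totals and joins them with monthly average temperatures on the (year, month) key. The function names and all sample values are hypothetical and made up for illustration; they are not taken from the Bristol data or from the SPSS Modeler workflow.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_kwh):
    """Sum daily usage (dict of date -> kWh) into (year, month) totals."""
    totals = defaultdict(float)
    for day, kwh in daily_kwh.items():
        totals[(day.year, day.month)] += kwh
    return dict(totals)


def join_with_weather(monthly_kwh, monthly_temp_c):
    """Keep only months present in both datasets (an inner join)."""
    return {
        month: (monthly_kwh[month], monthly_temp_c[month])
        for month in sorted(monthly_kwh)
        if month in monthly_temp_c
    }


# Hypothetical sample values, for illustration only.
daily = {
    date(2013, 3, 9): 100.0,
    date(2013, 3, 10): 120.0,
    date(2013, 4, 1): 90.0,
}
temps = {(2013, 3): 3.0, (2013, 4): 7.5}

joined = join_with_weather(monthly_totals(daily), temps)
for month, (kwh, temp_c) in joined.items():
    print(month, kwh, temp_c)
```

Joining on the month key means that months missing from either dataset are simply dropped rather than filled in.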

Additionally, the United Kingdom calendar for 2013 and 2014 was used to identify weekdays, weekends and bank holidays, so that the data could be filtered by whether a building was occupied or unoccupied.
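The calendar-based filter can be sketched as follows. The holiday list is an assumption (the England bank holidays we believe fall inside the recording period) and should be checked against an official calendar; the names `day_type` and `is_occupied` are hypothetical, and treating every ordinary weekday as "occupied" is a simplification.

```python
from datetime import date

# Assumed England bank holidays between 9 March 2013 and 8 March 2014.
# Verify against an official calendar before relying on this list.
BANK_HOLIDAYS = {
    date(2013, 3, 29), date(2013, 4, 1), date(2013, 5, 6), date(2013, 5, 27),
    date(2013, 8, 26), date(2013, 12, 25), date(2013, 12, 26), date(2014, 1, 1),
}


def day_type(day):
    """Classify a date as 'bank holiday', 'weekend' or 'weekday'."""
    if day in BANK_HOLIDAYS:
        return "bank holiday"
    if day.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "weekday"


def is_occupied(day):
    """Simplifying assumption: buildings are occupied on ordinary weekdays only."""
    return day_type(day) == "weekday"


print(day_type(date(2013, 3, 9)), is_occupied(date(2013, 3, 11)))
```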

Hypothesis and general assumptions

·      The data contains no information on the number of occupants in the buildings. Hence, we do not take occupancy into account and assume the buildings operated at full capacity during the reported period.

·      There is no information on what the electricity was used for (lighting, machinery, small appliances, etc.). Hence, we do not take the purpose of use into account either.

·      There is no information on the sizes (floor area) of the buildings. We simply assume that all five buildings belong to the same company and are located within 50 metres of one another.

·      In a real-life project, these information gaps could be addressed by requesting the appropriate information and datasets from the client.

Methodology and Tools

The electricity usage of each building is analysed separately using the following data analytics software:

·      IBM SPSS Modeler – A data analytics tool used to visualise data and build models that turn raw data into predictive intelligence, helping decision makers make more effective decisions.

·      IBM Watson Analytics – A cloud-based engine (it runs without any software installed on the local computer) that enables data discovery and offers powerful predictive capability.

The data is processed using the Cross Industry Standard Process for Data Mining (CRISP-DM), a widely accepted methodology for big data analytics, data science and analytics-related projects (IBM, 2011). The CRISP-DM methodology is based on six phases:

1)    Business Understanding: Gather the business requirements and translate them into a data mining/data science perspective.

2)    Data Understanding: Acquire a better perspective of data at hand. Expectations of the client are carefully analysed and the additional data (if needed) is requested from the client.

3)    Data Preparation: Prepare the data based on the requirements for the construction of a good machine-learning model. Data manipulation and other required measurements are made here. If the quality of the data is unsatisfactory, the client is re-consulted to obtain better quality data.

4)    Modelling: Suitable machine-learning models are applied and tested for accuracy of prediction. If no proper result is obtained here, an earlier phase (Data Preparation or Data Understanding) is revisited.

5)    Evaluation: After a data model has been built in the previous phase, it is evaluated for effectiveness and for whether it meets the client’s expectations. The steps taken to reach this point are reviewed to ensure that the client’s issues and challenges have been fully addressed.

6)    Deployment: Based on the requirements of the client, the data models developed during the Modelling phase are deployed as (1) automated scoring systems, (2) business reports or (3) proofs of concept (presentations).

This data analytics exercise follows the CRISP-DM standard as closely as possible, although some phases, such as Data Preparation, may not be fully realised for reasons such as the absence of a real client on site.


Table 1. Data sorting/modification: Table generated on SPSS Modeler for Building 1

Table 1 above shows the data sorting process for the Building 1 dataset in SPSS Modeler. The other four buildings comprise similar data and fields. The most important fields are the date and the time slots (00:30 until 24:00), which record how much electricity (KWh) was consumed over each 24-hour day at 30-minute intervals.

Figure 1. Electricity usage in all buildings

Figure 1 above shows a visualisation of the electricity usage in Buildings 1, 2, 3, 4 and 5 in KWh. The total energy usage of Building 1 comprises all values collected by its three smart meters, while the energy usage of each of Buildings 2 to 5 comprises the values collected by its single smart meter. It can be seen that Building 2 has the lowest electricity usage and Building 5 the highest. Three smart meters belong to Building 1, compared with one in every other building, which might indicate that Building 1 is the largest building and/or that it has a higher energy demand due to heavier machinery usage and/or higher occupancy. None of these hypotheses can be confirmed, as the original dataset offers no further evidence.

Figure 2. Daily electricity usage versus local daily average temperature (Building 1)

Figure 2 above shows daily electricity usage (KWh) in Building 1 against the local daily average temperature. The graph uses the total electricity usage measured by all three smart meters in Building 1. On preliminary analysis, there is a pattern (observation 1 and observation 2) in which electricity usage increases whenever the daily temperature drops (winter), and vice versa. This correlation is logical: in winter, the boilers, steamers and heating appliances in buildings are in active use.
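One way to put a number on this visual pattern is the Pearson correlation coefficient between daily temperature and daily usage, where a value close to -1 would support the "colder days, higher usage" reading. This is a minimal sketch of that idea, not the analysis performed in SPSS Modeler; the sample values are made up for illustration and do not come from the Bristol dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily averages (degrees C) and daily usage (kWh), illustration only.
temps_c = [2.0, 5.0, 9.0, 14.0, 18.0]
usage_kwh = [1500.0, 1400.0, 1250.0, 1100.0, 1000.0]

r = pearson(temps_c, usage_kwh)
print(f"correlation between temperature and usage: {r:.3f}")
```

A strongly negative coefficient would only describe a linear association; it would not by itself show that heating is the cause.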

Figure 3. 24-hour electricity demand (KWh) (Building 1, smart meter 1)

Figure 3 above shows how electricity usage, measured by smart meter 1 in Building 1, fluctuates throughout the day (24 hours a day at 30-minute intervals). The purpose of the plot is to understand how electricity demand in the buildings changes over the course of a day. The values used for the graph comprise all the data collected over the 13 months. The visualisation shows that electricity usage starts picking up from 6am and peaks between 8am and 5pm. The logical explanation is that demand for electricity is higher during office hours. Electricity is still used before and after office hours, and we could attribute this usage to other features of the building, such as lighting and electronic equipment that cannot be switched off and must run 24 hours a day, 7 days a week. However, energy-demanding appliances, such as air conditioning or office printers, remain switched off outside office hours, resulting in lower electricity usage during these times.
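A daily demand profile of this kind can be summarised by averaging all readings that fall into the same half-hour slot. The sketch below assumes readings arrive as (timestamp, kWh) pairs and labels each slot by its start time, whereas the dataset labels slots from 00:30 to 24:00; the function names and sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def half_hour_profile(readings):
    """Average kWh per half-hour slot (0 = 00:00-00:30, ..., 47 = 23:30-24:00)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, kwh in readings:
        slot = timestamp.hour * 2 + timestamp.minute // 30
        totals[slot] += kwh
        counts[slot] += 1
    return {slot: totals[slot] / counts[slot] for slot in sorted(totals)}


def slot_label(slot):
    """Start time of a slot as HH:MM."""
    return f"{slot // 2:02d}:{(slot % 2) * 30:02d}"


# Hypothetical readings over two days, for illustration only.
readings = [
    (datetime(2013, 3, 11, 6, 0), 10.0),
    (datetime(2013, 3, 11, 9, 0), 30.0),
    (datetime(2013, 3, 12, 6, 0), 14.0),
    (datetime(2013, 3, 12, 9, 0), 34.0),
]

profile = half_hour_profile(readings)
peak_slot = max(profile, key=profile.get)
print("peak slot starts at", slot_label(peak_slot))
```

Averaging hides the spread within each slot, so a scatter plot such as Figure 3 remains useful for spotting outliers.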

Figure 4. 24-hour electricity demand (KWh) (Building 1, smart meter 2)

Figure 4 above shows how electricity usage, measured by smart meter 2 in Building 1, fluctuates throughout the day (24 hours a day at 30-minute intervals). The energy pattern differs little from that in Figure 3 (electricity usage in Building 1, measured by smart meter 1), in the sense that usage rises noticeably from 6am onwards and starts to drop around 5.30pm.

Figure 5. 24-hour electricity demand (KWh) (Building 1, smart meter 3)

Figure 5 above shows how electricity usage, measured by smart meter 3 in Building 1, fluctuates throughout the day (24 hours a day at 30-minute intervals). The visualisation exhibits characteristics similar to the energy patterns measured by the first two smart meters in Building 1; however, the energy peak is less pronounced. In other words, energy demand is not much higher after 6am, and high peaks only appear from 7.30am. Nevertheless, the drop in energy demand still occurs after 5pm.

Figure 6. Daily electricity usage versus daily temperature (Building 2)

Figure 6 above shows daily electricity usage (KWh) in Building 2 against the daily average temperature. The circled observations show a few energy peaks appearing when the daily average temperature is high. However, no clear trend in energy usage can be observed in Building 2, and its energy pattern does not coincide with that of Building 1.

Figure 7. 24-hour electricity demand (KWh) (Building 2)

Figure 7 above shows how the electricity usage of Building 2, measured by a single smart meter, fluctuates throughout the day (24 hours a day at 30-minute intervals). This visualisation is comparable to the graph of the 24-hour electricity demand in Building 1, measured by smart meter 1 (Figure 3). The peaks in energy usage start appearing at the beginning of office hours, around 7am, and usage begins to drop around 5pm.

Figure 8. Daily electricity usage versus daily temperature (Building 3)

Figure 8 above shows daily electricity usage (KWh) against daily average temperature in Building 3. Apart from a few outlined energy peaks, no clear pattern can be observed in the energy usage or in its relationship with the average daily temperature.

Figure 9. 24-hour electricity demand (KWh) (Building 3)

Figure 9 above shows how electricity usage fluctuates throughout the day (24 hours a day at 30-minute intervals) in Building 3, measured by a single smart meter. The visualisation shows a uniform pattern with a clear peak and trough and little noise in the data. A few outliers are observed between 9am and 2.30pm, with a few points recorded as high as 100KWh. This possibly indicates that, during these times, a few appliances and/or pieces of equipment in the building have a very high demand for electricity.

Figure 10. Daily electricity usage versus daily temperature (Building 4)

Figure 10 above shows daily electricity usage (KWh) against daily average temperature in Building 4. The electricity usage is uniformly distributed; this trend is probably closely related to the way electricity is consumed in Building 4.

Figure 11. 24-hour electricity demand (KWh) (Building 4)

Figure 11 above shows how electricity usage fluctuates throughout the day (24 hours per day at 30-minute intervals) in Building 4, measured by a single smart meter. The graph is noticeably noisy. No uniform distribution clearly indicates the relationship between time and electricity usage, and there is also an observed split in the plot. The split, which begins at 4pm, is most probably due to different electrical systems within Building 4 running at different energy levels at the same time. One part of the building records higher electricity usage, between 25KWh and 50KWh, while another section of the building consumes only between 5KWh and 20KWh.

Figure 12. Daily electricity usage versus daily temperature (Building 5)

Figure 12 above shows daily electricity usage (KWh) against daily average temperature in Building 5. A few interesting trends can be seen in the circled observations. Observation 1 shows high temperature coinciding with lower energy usage, while observation 2 shows high energy usage coinciding with lower temperature. This trend repeats for the two later observations circled on the graph.

Figure 13. 24-hour electricity demand (KWh) (Building 5)

Figure 13 above shows how the electricity usage of Building 5, measured by its only smart meter, fluctuates throughout the day (24 hours a day at 30-minute intervals). The visualisation depicts an energy pattern similar to that of Building 1: usage starts picking up around 6am and drops from 5pm. The energy usage of Building 5 is the highest amongst all the buildings measured, with a minimum level of 25KWh and a maximum level of 125KWh.

Figure 14. Daily sun hours in Bristol, United Kingdom, from March 2013 to March 2014

Figure 14 above shows the recorded daily sun hours in Bristol, United Kingdom, from March 2013 until March 2014, the period in which the smart meters in the five buildings recorded the electricity data. The sun hours peak during summer and drop to lower values towards the end of 2013 (winter). The graph visualises how sun hours change with the seasons in Bristol. Values of ‘0’ sun hours appear in the data because the weather data collected from the Horfield & Filton weather station includes dates with 00.0 recorded sun hours. The maximum recorded sun hours lie in the range of 12.5 to 13 hours per day, between the end of May 2013 and 13th July 2013. The daily sun hours graph helps determine the efficiency of solar energy as an alternative energy solution. Daylight hours are important in determining the viability of a photovoltaic renewable energy system; however, sun hours are the time when solar radiation is converted into usable electrical energy most efficiently.

Justification for solar energy in Building 4

Steps and calculations

1.     Average daily electricity usage?

From the selected dataset, Building 4 is chosen to illustrate this example. The total electricity usage in Building 4 between March 2013 and March 2014 was 272,283.1 KWh.

This sum is divided by 241 days (instead of 365, because data was obtained from Building 4 on only 241 days):

272,283.1 KWh ÷ 241 days = 1130 KWh/day (rounded up)

2.     Average daily sun hours?

The total sun hours between March 2013 and March 2014 (during the recorded 241 days) is 1179 hours (rounded up). This is divided by 241 to obtain the average daily sun hours:

1179 hours ÷ 241 days = 4.9 hours/day (rounded up)

3.     Power output (provided by the solar energy system) needed to operate without fail?

Output needed to cover the average daily electricity usage of Building 4:

1130 KWh/day ÷ 4.9 hours/day = 231 KW AC (rounded up)

4.     How many solar panels does the client need to install on their building?

Solar panels produce only DC current, while the electricity required in buildings is AC. Hence, a conversion is required. Due to physical losses, the conversion of DC to AC power results in around 80% efficiency (University of Colorado Colorado Springs, n.d.). Hence, the 231 KW AC output must first be divided by this efficiency:

231 KW AC output ÷ 0.8  =  289 KW DC = 289,000 W DC (rounded up)

The number of solar panels needed depends on the choice of solar panel (for example 250, 270 or 435 Watt peak capacity per panel):

289,000 W  ÷ 250 Wp/panel  =  1156 panels needed

289,000 W  ÷ 270 Wp/panel  =  1070.3 (1071 panels needed)

289,000 W  ÷ 435 Wp/panel  =  664.3 (665 panels needed)

5.     Roof/land area needed to install a 435 Wp solar power system? (assume the area of 1 panel is 1.5 m2)

Area required for solar panels:

665 x 1.5 m2 = 998 m2 (rounded up)

Area required for the solar panel system (10% of the total area is required for wiring and the operational system):

998 x 10/9 = 1109 m2 (rounded up)

Since this is an office building, we assume that there is enough space on the rooftop and/or on unused land where these solar panels can be laid out to capture light.

6.     How much does the client save in electricity bills? (alongside the zero-pollution advantage)

Output of the solar energy system during the recorded period (AC power output multiplied by the average daily sun hours and the number of recorded days):

231 KW AC x 4.9 hours/day x 241 days = 272,788 KWh (rounded up)

Savings over the recorded period using the rate of SSE (a major electricity provider in the United Kingdom, currently charging 10.815 pence/KWh):

272,788 KWh x 0.10815 £/KWh = £29,503 (rounded up)

Table 2. Justification for solar energy in Building 4
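The sizing arithmetic in steps 1 to 5 of Table 2 can be reproduced with a short script. This is a sketch under the table's own assumptions (80% DC-to-AC efficiency, 1.5 m2 per panel, 10% of the total area reserved for wiring and operations) and it rounds up at every step, as the table does; the function and parameter names are hypothetical.

```python
from math import ceil


def size_solar_system(total_kwh, days, total_sun_hours, panel_wp,
                      conversion_efficiency=0.8, panel_area_m2=1.5,
                      overhead_share=0.1):
    """Follow steps 1-5 of Table 2, rounding up after each step."""
    daily_kwh = ceil(total_kwh / days)                   # step 1: KWh per day
    sun_hours = ceil(total_sun_hours / days * 10) / 10   # step 2: one decimal place
    ac_kw = ceil(daily_kwh / sun_hours)                  # step 3: AC power needed
    dc_kw = ceil(ac_kw / conversion_efficiency)          # step 4: DC power before losses
    panels = ceil(dc_kw * 1000 / panel_wp)               # step 4: whole panels
    panel_area = ceil(panels * panel_area_m2)            # step 5: panels only
    total_area = ceil(panel_area / (1 - overhead_share)) # step 5: plus wiring share
    return {
        "daily_kwh": daily_kwh, "sun_hours": sun_hours, "ac_kw": ac_kw,
        "dc_kw": dc_kw, "panels": panels,
        "panel_area_m2": panel_area, "total_area_m2": total_area,
    }


# Figures for Building 4 as given in Table 2.
for wp in (250, 270, 435):
    result = size_solar_system(272283.1, 241, 1179, panel_wp=wp)
    print(wp, "Wp:", result["panels"], "panels,", result["total_area_m2"], "m2")
```

Rounding up at each step keeps the sizing conservative, at the cost of a slightly larger system than an unrounded calculation would give.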


This Proof of Concept has demonstrated parts of the strategic sustainability consulting process and the application of big data to the analysis of operational data, in this case identifying patterns in the energy consumption of buildings. Within this limited exercise, the predictive functionality of big data has not been demonstrated. Predicting future energy demand, for example, would help clients understand how they can be better prepared with alternative renewable energy solutions. Predictive models are built on historical data and then applied to, or “scored” against, future data. The results can be used to indicate the likely energy demand for subsequent years (Table 3). In the context of a bank, a similar exercise could be a significant move in greening the bank’s operations and an important part of the organisation-wide sustainability strategy.
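The build-then-score idea can be illustrated with the simplest possible model, a least-squares straight line fitted to historical pairs and then applied to new inputs. This is a swapped-in toy technique for illustration, not the model behind the predictions in this exercise; the names `fit_line` and `score` and all sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model, xs):
    """Apply a fitted (slope, intercept) model to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical history: monthly average temperature (degrees C) vs demand (kWh).
history_temp = [4.0, 8.0, 12.0, 16.0]
history_kwh = [42000.0, 38000.0, 34000.0, 30000.0]

model = fit_line(history_temp, history_kwh)       # build on historical data
predicted = score(model, [6.0, 14.0])             # score future months
print([round(p) for p in predicted])
```

A real forecast would need more than one explanatory variable and a held-out period to check accuracy before the results are trusted.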

Table 3. Actual and Predicted monthly electricity demand (Building 1, smart meter 1)

Head of Data Analytics
