Establishment of Gas Energy Consumption basic unit by Building Using Cluster Analysis
ⓒCopyright Korea Institute of Ecological Architecture and Environment
Abstract
This study uses energy information made public under the building energy information disclosure policy to analyze the patterns of end-use gas energy consumption in buildings and to establish the basic unit of gas energy consumption by building use.
Monthly gas energy consumption data for buildings at various addresses, provided by the Korean government, were used. The data were grouped by mutual similarity using cluster analysis, which can categorize large volumes of data with relative ease and speed. Specifically, a hierarchical clustering technique based on dynamic time warping was used, as it supports the analysis of time-series data.
A cluster analysis of gas energy consumption in 291 buildings in the greater Seoul area, covering 10 detailed end-use purposes, yielded 25 clusters: 76 buildings were in Cluster-1, 142 in Cluster-2, and 36 in Cluster-3, forming the bulk of the sample. When the energy use intensity (EUI) of gas consumption was established by building use, commercial-residential buildings had the highest consumption at 10.096 kWh/㎡, followed in descending order by preschools, apartments, offices, religious facilities, stores, elementary schools, middle schools, and high schools.
Keywords:
Cluster Analysis, Dynamic Time Warping, Hierarchical Clustering, Building Gas Energy Consumption, Data Mining
키워드 (Keywords in Korean):
군집분석, 동적시간정합, 계층 클러스터링, 건물 가스사용량, 데이터마이닝
1. Introduction
1.1. Study Background and Purpose
Energy consumption in buildings has been evaluated in various ways.
Representative methods include estimating energy consumption by deriving the factors that influence it in buildings of a particular purpose, estimating it through surveys of actual conditions, and estimating it by calculating the cooling and heating loads of a building with a simulation program. However, these methods produce a considerable gap between estimated and actual energy consumption. Unless a particular building is studied intensively, reliability is low, and it is hard to secure data for many buildings.
Therefore, the government has adopted a policy of making public the actual energy consumption (gas and electricity) of most buildings in the nation, through the Green Building Promotion Act (enforced Mar. 23, 2013) and the Public Data Provision and Use Activation Act (enforced Oct. 31, 2013). Under this building energy information disclosure policy, users can access energy information easily.
However, in such an information environment, without a proper method for analyzing the growing volume of temporal and spatial data, data can be rich while information remains poor.
To solve this problem, data mining, which extracts valuable patterns and rules from large data sets, is increasingly used to analyze highly complex data. This study therefore used cluster analysis, a main analysis technique of data mining, to analyze gas (LNG) consumption patterns, and established the basic unit of gas consumption for clusters with similar gas consumption patterns.
1.2. Study Method and Scope
For this study, monthly gas consumption data were collected for many buildings of various types. The data were analyzed with cluster analysis, a main analysis technique of data mining, to identify the energy consumption patterns of the different building types.
The cluster analysis method was a hierarchical clustering technique based on dynamic time warping, which supports time-series data analysis. The algorithm was implemented in R, open-source software for statistical data analysis.
2. Theoretical Analysis
2.1. Hierarchical Clustering Technique
Hierarchical clustering is used to classify large data sets on the basis of similarity, relatively quickly and easily. If a proper number of clusters is set, similar data are grouped in the same cluster.1)
Hierarchical clustering builds a hierarchy of clusters step by step from similarity information between items (documents, in the information retrieval setting). Each item starts as its own cluster, and the two most similar clusters are then merged into one. This process is repeated until a single cluster is left. The resulting set of clusters does not depend on the order in which similarity pairs are processed, so the cluster hierarchy is stable; starting from an unclustered data set of N items, N-1 merges are performed. Depending on the criterion for selecting the closest clusters, hierarchical clustering is divided into the single link, complete link, group average link, and Ward's techniques.
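The merging procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study (which was written in R): the function name `agglomerate`, the stopping rule at `k` clusters, and the sample points are assumptions made for the example. Ward's technique is omitted because it is usually defined on coordinates rather than on a precomputed distance matrix.

```python
from itertools import combinations


def agglomerate(dist, k, linkage="average"):
    """Merge the two closest clusters repeatedly until only k clusters remain.

    dist    : symmetric matrix (list of lists) of pairwise distances
    k       : number of clusters at which merging stops
    linkage : "single", "complete", or "average" (group average link)
    """
    clusters = [[i] for i in range(len(dist))]  # every item starts as its own cluster

    def cluster_distance(a, b):
        pair_d = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_d)
        if linkage == "complete":
            return max(pair_d)
        if linkage == "average":
            return sum(pair_d) / len(pair_d)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > k:
        # Find the pair of clusters that is currently closest (a < b always holds).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 1-D points, used only to build a small distance matrix.
points = [0.0, 0.2, 0.3, 5.0, 5.1, 9.0]
dist = [[abs(p - q) for q in points] for p in points]
print(agglomerate(dist, k=3))  # prints [[0, 1, 2], [3, 4], [5]]
```

Calling the function with `k=1` performs all N-1 merges and returns a single cluster containing every item.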
2.2. Dynamic Time Warping Technique
Dynamic time warping is an algorithm for measuring the similarity of two data series that vary non-linearly along the time dimension. The technique is mainly applied to speech waveform analysis, energy consumption analysis, and other pattern analyses.2)
The dynamic time warping (DTW) distance was studied in depth by Kruskal (1983), and Berndt and Clifford (1994) proposed using it to find patterns in time series. Like the Fréchet distance, the dynamic time warping distance matches the observations of two time series in order and minimizes a particular distance over the matched pairs of observed values (Xai, Ybi).
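The dynamic time warping distance is defined as in formula (1).
(1) $d_{DTW}(X, Y) = \min_{r \in M} \sum_{i=1}^{m} \left| X_{a_i} - Y_{b_i} \right|$, where $M$ is the set of all order-preserving sequences $r$ of $m$ matched pairs $(X_{a_i}, Y_{b_i})$.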
To measure the similarity of time-series changes, dynamic time warping aligns each time point with similar points of the other series so that the two time-series graphs take a similar shape. For example, assume that two time series A and B are expressed as the vectors A = (a1, a2, a3, a4, …, ai) and B = (b1, b2, b3, b4, …, bj). When dynamic time warping is applied to calculate the distance between these vectors, one point of a time-series graph may correspond to more than one point of the other. The local distance can then be seen as the value of a transformation function that maps time series A onto time series B. The cost of allocating a point of A to a point of B is usually calculated from the difference between the two values.3)
(2) $d(a_i, b_j) = \left| a_i - b_j \right|$
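In sum, the dynamic time warping distance between A and B is the minimal total cost of allocation, as shown in formula (3).
(3) $DTW(A, B) = \min_{W} \sum_{k=1}^{K} d(a_{i_k}, b_{j_k})$, where $W = ((i_1, j_1), \ldots, (i_K, j_K))$ is an order-preserving allocation of the points of A to the points of B and $d$ is the local cost of formula (2).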
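The minimal allocation cost can be computed with the usual dynamic programming recurrence. The sketch below assumes the absolute difference of two values as the local cost and places no constraint on the warping window; both are assumptions for illustration, and the function name `dtw_distance` is hypothetical rather than taken from the study's R code.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with local cost |a_i - b_j| (assumed)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # the same point of b is matched to one more point of a
                cost[i][j - 1],      # the same point of a is matched to one more point of b
                cost[i - 1][j - 1],  # both series advance by one point
            )
    return cost[n][m]


# One point of the first series may correspond to several points of the second.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # prints 0.0
print(dtw_distance([0, 1, 2], [0, 2]))        # prints 1.0
```

Applying this function to every pair of buildings yields the distance matrix on which hierarchical clustering operates.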
Fig. 1 illustrates the diagram for the hierarchical clustering of dynamic time warping.
3. Time-Series Cluster Analysis
3.1. Selection of analysis targets
Monthly gas consumption data for buildings were collected from the open building energy information system4) of the Ministry of Land, Infrastructure, and Transport. The data include only the lot number and purpose of each building, so the building register was searched5) to collect additional building information. Five housing land development districts without district heating, located in Gyeonggi-do, were selected as the targets of the building gas consumption analysis.
Table 1 shows the selected districts.
In a housing land development plan, sites are divided into housing construction, commercial and business, and public facility types depending on building purpose. Building types were classified in line with these categories, as shown in Table 2.
3.2. Overview of cluster analysis
Buildings are first classified by purpose, but buildings with the same purpose can still have different energy consumption characteristics. It is therefore necessary to analyze gas consumption patterns effectively within this classification.
Cluster analysis of gas consumption was conducted on a total of 291 buildings in the selected districts, using monthly gas consumption data for 2012-2016 (5 years).
R, open-source statistical software, was used as the analysis program. The cluster analysis was repeated while the number of clusters was increased from 5 up to 25.
Table 3 presents the buildings finally selected as the targets of the cluster analysis, by the building types in Table 2.
As shown in Table 3, the 291 buildings were classified into housing, commercial, and public facilities depending on building purpose. Housing facilities were subdivided into apartment houses and commercial houses; commercial facilities into sales facilities and business facilities; and public facilities into religious facilities, educational facilities (kindergarten, elementary, secondary, and high school), and public welfare facilities (library, senior citizen center, etc.). In addition, an alphabetic code was assigned to each detailed type for the cluster analysis, so each building has a code consisting of a letter followed by a number.
3.3. Results of cluster analysis
Table 4 presents the results of the cluster analysis conducted with the method described above.
In Table 4, k is the number of clusters; k=5 means that all the data are classified into five clusters. Clusters 1-25 are the individual clusters produced by the analysis, and cluster size is the number of buildings in each cluster. The average distance between objects is the average distance between the center case and the other cases in each cluster.
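As a rough illustration of this average distance, the sketch below assumes that the center case is the cluster member with the smallest total distance to the other members (the medoid); the text does not state how the center case was chosen, so this choice, the function names, and the sample distance matrix are all assumptions.

```python
def center_case(members, dist):
    """Return the member with the smallest total distance to the others (assumed medoid)."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def average_distance_to_center(members, dist):
    """Average distance between the center case and the other cases of one cluster."""
    if len(members) < 2:
        return 0.0  # a single-case cluster has no other cases
    c = center_case(members, dist)
    others = [j for j in members if j != c]
    return sum(dist[c][j] for j in others) / len(others)


# Hypothetical pairwise distances between four cases.
dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.0, 8.0],
    [2.0, 1.0, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
]
print(center_case([0, 1, 2], dist))                 # prints 1
print(average_distance_to_center([0, 1, 2], dist))  # prints 1.0
print(average_distance_to_center([3], dist))        # prints 0.0
```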
According to the cluster analysis in Table 4, when the number of clusters (k) is more than 15, there are more than two representative cluster patterns. When k=5 or k=10, cluster 1 was very large, so the cases in clusters other than cluster 1 were classified as specific (abnormal) cases.
The larger the number of clusters k, the smaller the average distance between the center case and the other cases in each cluster, which indicates a more reliable classification. At k=5, the center distance of the 283 cases in cluster 1 was 4.2, and it decreased as k increased. At k=25, cluster 1 contained 76 cases and the center distance fell to 0.8. The reliability of the pattern classification was therefore judged to have improved.
A cluster size of 1 means that the cluster contains a single case: the center distance is 0, and the case forms a cluster of its own. Such a case is judged to be a specific or abnormal case and is excluded from the basic unit analysis.
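The exclusion rule can be expressed as a simple filter on cluster labels. The building codes and cluster assignments below are hypothetical and only follow the letter-plus-number coding described earlier; the function name `drop_singletons` is likewise an assumption.

```python
from collections import Counter


def drop_singletons(labels):
    """Keep only the cases whose cluster contains more than one case."""
    sizes = Counter(labels.values())
    return {case: cluster for case, cluster in labels.items() if sizes[cluster] > 1}


# Hypothetical building codes mapped to cluster numbers.
labels = {"A1": 1, "A2": 1, "B1": 2, "C1": 3, "C2": 2}
kept = drop_singletons(labels)
print(sorted(kept))  # prints ['A1', 'A2', 'B1', 'C2']
```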
Table 5 shows the buildings in each cluster classified by use at k=25. To find the share of each building use in each cluster, the analysis at k=25 was used, where there are more than four representative cluster patterns.
As shown in the table, 52 apartment buildings, or about 82.54% of the apartment cases, fell in cluster 1 and cluster 2. In particular, of a total of 63 apartment cases, 29 cases (46.03%) were classified in cluster 2.
For commercial houses, 16 of a total of 29 cases (55.17%) were classified in cluster 2, and none in cluster 1. This result means that commercial houses had a different gas consumption pattern from the apartment houses classified in cluster 1.
For sales facilities, 52 of a total of 110 buildings (47.27%) were classified in cluster 2. Cluster 2 also contained the largest number of cases for business facilities, religious facilities, educational facilities (kindergarten, secondary, and high school), and public welfare facilities. Since cluster 2 held the largest share of buildings in every educational facility type except elementary schools, its gas consumption pattern was judged to be typical.
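The percentages quoted above follow directly from the case counts given in the text. The short check below recomputes them; the helper name `share` is hypothetical, and only counts stated in the text are used.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# (cases in the named cluster(s), total cases of that building type), as quoted in the text
quoted = {
    "apartment, clusters 1 and 2": (52, 63),
    "apartment, cluster 2": (29, 63),
    "commercial house, cluster 2": (16, 29),
    "sales facility, cluster 2": (52, 110),
}
for name, (count, total) in quoted.items():
    print(f"{name}: {share(count, total)}%")
```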
Fig. 2 illustrates the monthly gas consumption pattern of each cluster at k=25.
3.4. Establishment of building gas energy consumption unit
The basic unit of gas consumption was established on the basis of the cluster analysis results. The source data are given in kWh, so the basic unit is expressed accordingly in kWh/㎡ (unit area: gross floor area).
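A per-building basic unit can be sketched as consumption divided by gross floor area. The text does not spell out the averaging period or how building values are aggregated within a cluster, so the sketch below assumes the mean of the supplied monthly values; the function name and the sample figures are hypothetical.

```python
def basic_unit(monthly_kwh, gross_floor_area_m2):
    """Mean monthly gas consumption per unit gross floor area, in kWh/m2.

    Assumption: the unit is averaged over the supplied months; the source
    text does not state the averaging period explicitly.
    """
    if not monthly_kwh:
        raise ValueError("at least one monthly value is required")
    if gross_floor_area_m2 <= 0:
        raise ValueError("gross floor area must be positive")
    mean_kwh = sum(monthly_kwh) / len(monthly_kwh)
    return mean_kwh / gross_floor_area_m2


# Hypothetical building: four monthly readings (kWh) and 100 m2 of gross floor area.
print(basic_unit([1200, 900, 600, 300], 100.0))  # prints 7.5
```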
Housing facilities in cluster 1 had very little gas consumption compared with those in other clusters. The data suggest that a separate system, such as small-scale group (district) energy, is in use, so these buildings were excluded from the basic unit analysis.
Table 6 presents the basic unit of gas consumption after cluster analysis.
With regard to the basic unit of annual average gas consumption, commercial houses had the highest value, 10.096 kWh/㎡, followed by kindergartens, apartment houses, business facilities, religious facilities, sales facilities, public facilities, elementary schools, secondary schools, and high schools. In particular, commercial houses were judged to have higher gas consumption than sales facilities because restaurants account for a high percentage of them.
For apartment houses, cluster 2 was used to derive the basic unit of gas consumption. Most cases in this cluster were apartments, and the basic unit of gas consumption was 5.748 kWh/㎡. Although not presented in the above table, the apartment houses in cluster 3 were all multi-household or multi-family houses, and their gas consumption was somewhat higher than that of apartments.
The gas consumption and energy consumption characteristics of apartments have been studied steadily. According to previous studies6)7), the gas consumption of apartment houses was about 6~9 kWh/㎡, about 1.5 times the result of this study. The cause of the difference was examined. This study divided the total gas consumption of each lot, as provided by the open building energy information system, by the gross floor area of the buildings on the lot, whereas the previous studies divided the gas consumption of individual households by the exclusive area of each household. In addition, the targets of this study included households that used electric heating appliances such as electric pads instead of gas heating because of the burden of heating costs, as well as vacant households.
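The comparison with previous studies can be checked with simple arithmetic on the figures quoted in the text. The sketch below only divides the quoted range by this study's apartment value; the variable names are hypothetical.

```python
this_study = 5.748           # kWh/m2, apartment basic unit reported in this study
previous_range = (6.0, 9.0)  # kWh/m2, range quoted from previous studies

low_ratio = previous_range[0] / this_study
high_ratio = previous_range[1] / this_study
print(f"{low_ratio:.2f} to {high_ratio:.2f}")  # prints 1.04 to 1.57
```

Under these figures, the factor of about 1.5 corresponds to the upper end of the quoted range.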
In general, public facilities had lower gas consumption than residential and sales facilities, except for kindergartens. Gas consumption of school buildings was particularly low; high schools had the lowest value, 3.048 kWh/㎡.
4. Conclusion
This study applied cluster analysis to the monthly building gas consumption data of each lot, provided by the government, to analyze the gas consumption patterns of various types of buildings, and derived the basic unit of energy consumption for building clusters with similar consumption patterns. The main results are as follows:
In housing land development districts, 291 buildings of 10 types were selected, and their gas consumption was subjected to cluster analysis. The larger the number of clusters k, the smaller the average distance between the center case and the other cases in each cluster. At k=25, cluster 1 contained 76 cases and the center distance fell to 0.8. The reliability of the pattern classification was therefore judged to have improved.
At k=25, the buildings in each cluster and the share of each building type in each cluster were analyzed. The results are as follows:
Fifty-two apartment buildings, or about 82.54% of the apartment cases, fell in cluster 1 and cluster 2. In particular, of a total of 63 apartment cases, 29 cases (46.03%) were classified in cluster 2. For commercial houses, 16 of a total of 29 cases (55.17%) were classified in cluster 2, and none in cluster 1. This result means that commercial houses had a different gas consumption pattern from the apartment houses classified in cluster 1.
For sales facilities, 52 of a total of 110 buildings (47.27%) were classified in cluster 2. Cluster 2 also contained the largest number of cases for business facilities, religious facilities, educational facilities (kindergarten, secondary, and high school), and public welfare facilities. Since cluster 2 held the largest share of buildings in every educational facility type except elementary schools, its gas consumption pattern was judged to be typical.
On the basis of the cluster analysis results, the basic unit of gas consumption was established. Commercial houses had the highest value, 10.096 kWh/㎡, followed by kindergartens, apartment houses, business facilities, religious facilities, sales facilities, public facilities, elementary schools, secondary schools, and high schools.
However, given the limited number of educational facility cases (kindergarten, elementary school, secondary school, and high school), more cases need to be analyzed in future work.
Acknowledgments
This research was supported by a grant (17CTAP-C130211-01) from the Technology Advancement Research Program (TARP) funded by the Ministry of Land, Infrastructure and Transport of the Korean government.
Notes
References
- Lee, Shin-Won, "A Study on Hierarchical Clustering using Advanced K-Means Algorithm for Information Retrieval", Doctoral Dissertation, Department of Computer Engineering Graduate School, Chonbuk National University, (2005).
- Sohn, Hueng-goo, "Electricity Demand Combined Forecasting based on Time Series Clustering Analysis", Doctoral Dissertation, Department of Statistics Graduate School, Chung-Ang University, (2016).
- Park, Mi-Ra, Park, Key-Ho, Ahn, Jae-Seong, "A Study on the Regional Classification Method using Dynamic Time Warping", The Geographical Journal of Korea, Vol.45(No.3), (2011).
- Ministry of Land, Infrastructure, and Transport, Open Building Energy Information System, http://open.greentogether.go.kr.
- Ministry of Land, Infrastructure, and Transport, Integrated Real Estate Service, https://kras.go.kr.
- Chun, Jin-Su, "The study on Energy Consumption Characteristics of Apartment Based on Analysis of City Gas Use", Doctoral Dissertation, Department of Architecture Graduate School, Gyeongsang National University, (2010).
- Lim, In-Seob, "A Study on the Change of Energy Consumption by the Change of Apartment Unit Plan - Focusing on city gas as an energy source -", Master's Thesis, Department of Housing and Urban Development Graduate School, Seoul National University of Science and Technology, (2016).
- Kruskal, J. B., "An overview of sequence comparison: Time warps, string edits, and macromolecules", SIAM review, Vol.25(No.2), (1983). [https://doi.org/10.1137/1025045]
- Berndt, D. J., Clifford, J., "Using Dynamic Time Warping to Find Patterns in Time Series", In KDD workshop, Vol.10(No.16), (1994).
- Shin, Jae-Min, "Comparing the Accuracy of Cost Estimation Model by Adopting Data Mining Techniques in School Building Projects", Master's Thesis, Department of Architectural Engineering Graduate School, Kyonggi University, (2014).