The outage occurred between 08:15 and 08:45 on Thursday this week, and NorNet scientists immediately sat down to understand it. To assess the impact and extent of the event, the researchers leveraged the measurements of Telenor's performance that are collected by the NorNet Edge infrastructure (NNE).
The researchers at NorNet have provided a detailed run-through of the event and their findings, and state that about 40% of NNE's Telenor connections were unable to exchange data during the outage. As illustrated in the figure above, the percentage of impacted Telenor connections, aggregated every five minutes, started to increase steadily from 08:15 and peaked at 08:40. The outage appears to have been resolved around 08:45.
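The five-minute aggregation described above can be illustrated with a short Python sketch. It is not NorNet's actual analysis code: the function name `impacted_share_per_bin`, the `(timestamp, connection_id, ok)` sample format, and the rule that a connection counts as impacted if any of its samples in a bin failed are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def impacted_share_per_bin(samples, bin_minutes=5):
    """Return {bin_start: percent of connections impacted in that bin}.

    samples: iterable of (timestamp, connection_id, ok) tuples, where `ok`
    is False when the connection could not exchange data (assumed format).
    bin_minutes should divide 60 so that bins align with the clock.
    """
    seen = defaultdict(set)    # connections observed in each bin
    failed = defaultdict(set)  # connections with at least one failed sample
    for ts, conn, ok in samples:
        # Floor the timestamp to the start of its bin.
        bin_start = ts - timedelta(
            minutes=ts.minute % bin_minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        seen[bin_start].add(conn)
        if not ok:
            failed[bin_start].add(conn)
    return {b: 100.0 * len(failed[b]) / len(seen[b]) for b in sorted(seen)}


# Made-up samples for two hypothetical connections; the date is arbitrary.
samples = [
    (datetime(2024, 1, 4, 8, 12), "conn-a", True),
    (datetime(2024, 1, 4, 8, 13), "conn-b", True),
    (datetime(2024, 1, 4, 8, 16), "conn-a", False),
    (datetime(2024, 1, 4, 8, 17), "conn-b", True),
]
for bin_start, pct in impacted_share_per_bin(samples).items():
    print(bin_start.strftime("%H:%M"), pct)
```

Counting each connection at most once per bin keeps a connection that retries frequently from inflating the percentage.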
To quote the report, "The severity of the impact varied across connections, two thirds of the affected connections suffered at least a five minute long outage, while the rest suffered shorter degradations. Almost all affected nodes lost their data connection completely at some point during the outage, and nodes that lost their connectivity early on kept on trying to reestablish it. They often succeeded in receiving an IP address from the network but quickly lost the data connection. Looking further into this, we found that these nodes were unable to complete all the steps needed for establishing a data connection. These function are managed by elements in the mobile core networks (e.g. the MME, HSS, HLR). Hence, our observation hints that the outage is caused by a failure in the core network (e.g. between the MME and HSS in 4G)."
NorNet scientists also looked into the location of the impacted nodes. The figure below shows the distribution of the affected nodes; the fact that they are scattered across the country supports the earlier hypothesis that the cause of the failure lies in the core network.
Ahmed Elmokashfi works with the NorNet project as a senior research scientist at Simula, and states that "[t]he scope of the impact is caused by the fact that in today's mobile networks most of the crucial network functions are centralized. Meaning that a failure of such function can have a wide impact. Network operators often keep more than one instance of a network function and load balance demand between them. This explains why not all Telenor users were impacted yesterday."
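Elmokashfi's point about load balancing can be made concrete with a toy Python model. It is purely illustrative: the modulo assignment of subscribers to instances, the instance count, and the function name `impacted_fraction` are assumptions, and nothing here reflects how Telenor actually distributes load or how many instances it runs.

```python
def impacted_fraction(subscriber_ids, n_instances, failed_instances):
    """Share of subscribers whose assigned instance has failed.

    Assumes a static assignment: subscriber s is served by instance
    s % n_instances. Real operators may balance load differently.
    """
    failed = set(failed_instances)
    hit = sum(1 for s in subscriber_ids if s % n_instances in failed)
    return hit / len(subscriber_ids)


# Hypothetical setup: 1000 subscribers spread over 4 instances, one fails.
print(impacted_fraction(range(1000), n_instances=4, failed_instances=[2]))
```

In this model, only the subscribers mapped to the failed instance lose service, which matches the observation that an outage in a centralized function can be wide yet still partial.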
Telenor acted quickly in correcting the network failure, but Elmokashfi argues that there is room for improvement: "Going forward, I believe that there is a need to decentralize the mobile packet core even more. Actually, the ongoing work on defining the next generation of mobile networks, 5G, has a heavy focus on this. More decentralization, however, adds more complexity which can lead to outages. In summary, in future networks we need to balance complexity and the extent of decentralization in order to build better and more robust networks."