Thursday, November 27, 2014

Part 4.1 - Traditional Approach to User Experience Management

To better understand how big data and analytics can dramatically improve network problem identification and resolution for service providers, let's first look at how customer user experience is managed today by Telco operators.

In the telecommunication industry, the management of User Experience is traditionally a well established discipline with clear models and processes (eTOM, ITIL). It is generally based on three main business processes:
 
  • Fault Management (FM): the group of processes for collecting and managing alarms from network and service elements. Broadly speaking, this process is responsible for monitoring service availability. It is highly automated.
  • Performance Management (PM): the group of processes for collecting network and service performance information and aggregating it into high-level indicators, the KPIs and/or KQIs. This process is responsible for monitoring the performance (quality) of the services. It is highly automated.
  • Incident and Problem Management: the group of processes for identifying problems (root cause analysis) and making the changes needed in the network to fix them. This process is manual and time-consuming.
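The PM-to-FM interplay described above can be sketched in a few lines: raw network counters are aggregated into a KPI, and a threshold violation raises an alarm that feeds the incident process. All names, counters and thresholds below are hypothetical, for illustration only.

```python
# Sketch of Performance Management feeding Fault Management:
# counters -> KPI -> threshold check -> KPI-violation alarm.

def call_setup_success_rate(attempts, successes):
    """Hypothetical KPI: percentage of successful call setups."""
    return 100.0 * successes / attempts if attempts else 100.0

def check_kpi(value, threshold=98.0):
    """Return a KPI-violation alarm when the KPI drops below threshold."""
    if value < threshold:
        return {"type": "KPI_VIOLATION",
                "kpi": "call_setup_success_rate",
                "value": value,
                "threshold": threshold}
    return None  # KPI within bounds, no alarm

kpi = call_setup_success_rate(attempts=1000, successes=965)  # 96.5
print(check_kpi(kpi))  # alarm raised, since 96.5 < 98.0
```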

The idea behind this model is very simple: all services are "mapped" onto the network and associated with aggregate performance indicators (KPIs/KQIs) that measure the quality of the service provided. The combination of these components - network alarms, KPIs/KQIs and the service-to-network-element mapping - provides the information needed to manage service quality.
  • KPI status provides a clear and integrated view of the status of the network and of the service quality provided
  • Alarms (network element failures and KPI violations) provide immediate notification of problems
  • The service-to-network mapping makes it possible to perform root cause analysis when alarms (network and KPI) are raised

When a network element fails, a network alarm is raised. Through the mapping between services and network elements it is possible to identify the services impacted. Similarly, when a service KPI/KQI degrades (it goes above or below defined thresholds), a KPI violation alarm is raised, and through the service-to-network mapping it is possible to identify the cause of the problem. So the combination of alarms, KPIs and service mapping makes it possible to manage problems and incidents in the network and/or in the services provided to subscribers.
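The two directions of the mapping-based diagnosis described above can be sketched as follows: from a failed element to the impacted services, and from a degraded service KPI back to the candidate root causes. The mapping, element names and services are invented for illustration.

```python
# Sketch of mapping-based diagnosis: a static service-to-network-element
# mapping used in both directions (impact analysis and root cause analysis).

SERVICE_TO_ELEMENTS = {
    "voice": ["MSC-1", "BSC-3", "BTS-17"],
    "sms":   ["SMSC-1", "MSC-1"],
    "data":  ["SGSN-2", "GGSN-1", "BSC-3"],
}

def impacted_services(failed_element):
    """Fault management: which services does a failed element impact?"""
    return [svc for svc, elems in SERVICE_TO_ELEMENTS.items()
            if failed_element in elems]

def candidate_causes(degraded_service):
    """Root cause analysis: which elements may explain a KPI violation?"""
    return SERVICE_TO_ELEMENTS.get(degraded_service, [])

print(impacted_services("MSC-1"))   # MSC-1 failure hits voice and sms
print(candidate_causes("data"))     # elements to inspect for a data KPI alarm
```

Note the asymmetry the text points out: the forward direction (element to services) is mechanical, while the reverse direction only yields a list of hypotheses, which is exactly the abductive step that still needs a human to complete.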


Note:
The diagnostic approach based on the mapping between services and network elements rests on the analytic model of "abductive inference" between alarms (effects) and network elements (causes). Abductive inference is the process of reasoning from effect to cause. It is a form of logical inference that goes from an observation (e.g. an alarm) to a hypothesis that accounts for the observation (e.g. the network or service element that caused the problem).
 
Although this model is theoretically extremely powerful, in practice it has several strong limitations:
  • the mapping can be extremely complex in large networks,
  • it can diagnose only known problems (the ones that have been modelled),
  • it doesn't provide user impact information,
  • it can't identify problems that have no associated fault (e.g. slow downloads).
   


The picture below shows an example of a Telecom Operator Service Assurance reference architecture and the associated business processes.


Figure 1 - Example of Telecommunication reference model for Assurance and fulfillment processes


  
This model worked well until the mobile internet era. In the 2G and 2.5G world, all, or most, of the services (e.g. voice, SMS, etc.) provided by telecommunication operators were "network services", that is, services provided by the network layer. These services were controlled end-to-end by the service provider, from the switch to the terminal. Moreover, at that time, the mobile network was quite homogeneous and terminals (phones) were quite dumb.

In the 2G/2.5G scenario, the quality of the services was tightly bound to the network service level: monitoring the network, in particular the network signaling, was enough to monitor the quality of the services provided. At the same time, the strong relationship between services and network elements was the key to identifying the root cause of problems. It was the era of SLAs (Service Level Agreements) and KPIs/KQIs (Key Performance/Quality Indicators).

With 3G and the beginning of the mobile internet era this scenario changed: non-"network services" appeared. IP services (e.g. e-mail, web browsing, file download, etc.) started to become a commodity on mobile networks too. These new services had the peculiarity of not being "network services"; they are "application services" provided by applications on top of the network. Quite often these applications are not owned by operators but are provided by external entities (application providers) using the operator's network.

Terminals started to become more intelligent and to run part of the applications/services on the terminal itself. The first smartphones appeared. Mobile networks increased in complexity: the 3G network is a combination of packet data and circuit-switched voice networks, and its infrastructure coexists with the existing 2G/2.5G networks.

These changes have a strong impact on user experience monitoring and assurance:
  • The new IP services are only loosely bound to the network layer: monitoring the control plane is not enough; user plane information is needed too.
  • Moreover, the relationship between services and network elements becomes less clear due to the intrinsic packet-switched behavior of IP protocols, making the problem diagnostic processes extremely complex and time-consuming.

 
Service monitoring and assurance systems evolved to switch their focus from the network to the user, and two new concepts arose: Quality of Experience (QoE) and Customer Experience Management (CEM).


Note:
Quality of Experience (QoE) is a measure of a customer's experience with a service (web browsing, phone call, TV broadcast, etc.). QoE is a purely subjective measure, from the user's perspective, of the overall value of the service provided. That is to say, QoE cannot be taken as simply the effective quality of the service, but must also take into consideration every factor that contributes to overall user value, such as suitability, flexibility, mobility, security, cost, personalization and choice (…). Apart from being user dependent, QoE will invariably be influenced by the user's terminal device (for example Low Definition or High Definition TV), his environment (in the car or at home), his expectations (cellular or corded telephone), the nature of the content and its importance (a simple yes/no message or an orchestral concert) (from Wikipedia).

Even though QoE is a subjective measure, it is possible to identify metrics that are directly related to the quality perceived by the end user, e.g. the time to set up a new call, the time to load a web page or access internet content, the jitter or video buffering time when watching a video, etc. These metrics can be combined into a scoring system used as a global indicator of the quality perceived by subscribers when they use the services, but also when they interact with the service provider itself, e.g. when contacting customer care.
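The kind of scoring system described above can be sketched as a weighted combination of normalized metrics: each measurable quantity is mapped to a 0..1 scale (1 = best) and the weighted sum gives a single 0..100 QoE indicator. The metric names, targets and weights below are hypothetical, chosen only to illustrate the mechanism.

```python
# Sketch of a composite QoE score built from user-perceived metrics.
# metric: (target value, worst acceptable value, weight); weights sum to 1.
QOE_METRICS = {
    "page_load_s":       (1.0, 10.0, 0.4),
    "call_setup_s":      (2.0,  8.0, 0.3),
    "video_buffering_s": (0.5,  6.0, 0.3),
}

def normalize(value, target, worst):
    """1.0 at or below target, 0.0 at or above worst, linear in between."""
    if value <= target:
        return 1.0
    if value >= worst:
        return 0.0
    return (worst - value) / (worst - target)

def qoe_score(measurements):
    """Weighted 0..100 QoE score over the configured metrics."""
    total = sum(normalize(measurements[m], target, worst) * weight
                for m, (target, worst, weight) in QOE_METRICS.items())
    return 100.0 * total

print(qoe_score({"page_load_s": 2.8,
                 "call_setup_s": 2.0,
                 "video_buffering_s": 1.6}))  # roughly 86
```

A real scoring model would also weight metrics per service type (a video session cares about buffering, a call about setup time), but the normalize-then-weight structure is the same.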

 

4G networks and the "apps era" completely reversed the traditional scenario: only a few services are still network services; most services are provided by applications over the network (the so-called Over-The-Top applications).
  • The first and worst consequence of this new paradigm is that telecommunication providers no longer control the end-to-end service; in the best case they control some components of the service, but in most cases they just control the flow of application data inside their networks. At the same time, subscribers still hold service providers responsible for the end-to-end service. If a subscriber has a problem with YouTube, their first thought is that the operator's network is no good, regardless of the real cause of the problem.

  • Even in cases where the service provider controls the whole end-to-end service, e.g. VoLTE, providing a clear view of the real service quality perceived by the final user isn't an easy task. The coexistence of different network technologies, 4G (fully IP), 3G or 2G (traditional) and also WiFi, has dramatically increased the complexity of measuring service quality. A VoLTE call can be provided through a mix of different network technologies, the so-called 3G/4G roaming, and in this case, assessing the quality of the call is largely guesswork.

  • User interaction has completely changed too: subscribers and their devices (often more than one) are always connected to the network (always on). Customers have become extremely demanding in terms of quality and speed of connectivity.

  • A further element of complexity in this new scenario is the so-called Internet of Things (IoT): in the near future the main users of telecommunication networks will no longer be humans, but sensors and devices. This is the era of machine-to-machine communication.


Obviously, service monitoring and assurance systems have evolved to keep up with this changing scenario, e.g.:
  • New data sources have been introduced (e.g. the usage plane) to produce new KPIs/KQIs, correlating control and usage information.
  • Session capture and monitoring has been extended to all subscribers.
  • Deep Packet Inspection (DPI) probes have been introduced in the network to extract IP protocol metadata, giving a better view of user traffic at application and protocol level (layer 7).
  • Some operators have also started to use device status information (CiQ) to extend troubleshooting up to the user's device.
  • Other data sources, such as network topologies, logs, etc., have been added.
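The control/usage correlation mentioned in the first bullet can be sketched as a per-subscriber join: control-plane events (e.g. session setup outcomes) are merged with user-plane measurements (e.g. throughput) into one record per subscriber, from which KQIs can then be computed. Field names and data below are invented for illustration.

```python
# Sketch of correlating control-plane and user-plane records per subscriber.
control_plane = [
    {"imsi": "00101-A", "session_setup_ok": True},
    {"imsi": "00101-B", "session_setup_ok": False},
]
user_plane = [
    {"imsi": "00101-A", "avg_throughput_kbps": 4200},
    {"imsi": "00101-B", "avg_throughput_kbps": 0},
]

def correlate(control, usage):
    """Join the two planes on subscriber id into one record each."""
    usage_by_imsi = {rec["imsi"]: rec for rec in usage}
    merged = []
    for ctrl in control:
        merged.append({**ctrl, **usage_by_imsi.get(ctrl["imsi"], {})})
    return merged

for record in correlate(control_plane, user_plane):
    print(record)
```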

Unfortunately, most of these improvements have focused on identifying and monitoring the subscriber's Quality of Experience. Relatively little effort has been directed at solving the fundamental problem of network and subscriber management: understanding why a problem or alarm happened.


In statistical terms, today's Service Assurance systems are efficient QoE descriptive measurement tools (they let the operator know about the problem), but they provide no diagnostic analysis support: they don't help the operator diagnose and solve the problem.



 

 
 
 

