🚲 Virtual Cycling Coach

For my Master's thesis, I worked with the team at the ETH spin-off magnes, which specializes not only in a magnetic coating technique for materials, but also in developing sensors that allow accurate measurement of a cyclist's power and pedaling force distribution. In essence, I headed the design and development of an intelligent multi-purpose virtual coach that harvests and stores personal cycling activity data safely, recognizes patterns with intelligent clustering and classification methods, and ultimately interprets these patterns to provide diagnostic performance insight. Along these lines, we first discuss the methods for collecting data from multiple servers and repositories. We integrate the recursive selection of the best features, high-quality clustering of activities, an automatic label assignment mechanism, and a perceptron-based activity classification with an average accuracy of 95%. Based on these analyses, we develop a custom-designed Handicap (HC), an intuitive metric that reflects current performance. To compute HC, we partition each activity into segments and assess each segment according to its power, grade, and altitude. From these segment assessments, each activity is evaluated to a Ride Score (RS), and the Ride Scores make up the final calculation of HC. The results for HC are validated on various examples, which clearly exhibit performance progression over a period of time. Finally, a decision tree identifies potential weak spots during training and provides diagnostic feedback to support the cyclist in reaching a pre-defined milestone.

Back in 2014, the renowned IT magazine CIO ran the headline “Wearable Tech’s Dilemma: Too Much Data, Not Enough Insight” [1]. Not much has changed since then: sports platforms such as Strava, Garmin, and Endomondo log users’ athletic activity and visualize it in a social media feed, yet provide neither useful predictive analyses nor an intuitive overview of how individual performance evolves.

Cyclists who seek this type of insight and training recommendation typically consult sports articles or one-size-fits-all online training plans, all of which are suboptimal for the individual because they rely on a biased best-guess fitness estimate based solely on age and gender. Essentially, these approaches lack individualization.

Throughout the cycling community it is widely echoed that individualizing the training stimulus creates optimal performance outcomes [2,3,4]. This individualization includes training adjustments not only according to age and gender, but also according to the rate of progress and previous skill development. Individualizing training can be an extremely time-consuming task, as it requires an athlete to perform multiple high-intensity tests over sessions lasting hours in order to measure specific physiological traits [5].

We present a virtual coach that circumvents these time-intensive tests and instead analyzes past cycling activities, including commutes, races, and exercise sessions, to infer this physiological data. In keeping with this, we tackle the challenge of creating individualized cyclist profiles that accurately capture performance, modeling the frequency and intensity of exercise by analyzing every second of every activity. Moreover, we address the shortcomings of major sports platforms, which fail to evaluate performance dynamically and clearly, by providing a pipeline of deep activity analysis and evaluation. We transform the data from descriptive to diagnostic while providing essential insights with self-developed metrics. This is done in a software pipeline presented in the following three components:


We connect to multiple APIs and repositories of platforms that hold information relevant to personal cycling activity, gathering billions of data points in total. The type of data is visualized in the figure below. We organize this data in a joint database and filter it according to certain criteria.


We feed the collected and filtered data into an analysis pipeline, where the coach selects the optimal subset of features and uses these to find a meaningful underlying clustering structure. In the next step, a decision tree-based approach transforms the unsupervised learning problem into a supervised one, i.e., it assigns known labels to each cycling activity cluster. Several classification methods are investigated to accurately categorize these activity clusters as commutes, races, or workouts of different intensities.
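The label assignment step can be illustrated with a minimal Python sketch. It is not the thesis implementation: the two features (duration and relative intensity), the toy cluster contents, and the hand-written thresholds in `label_cluster` are all assumptions chosen for illustration, standing in for the learned decision tree described above.

```python
from statistics import mean

# Hypothetical clusters from an earlier clustering step.
# Each activity is (duration in hours, mean power relative to threshold power).
clusters = {
    0: [(0.4, 0.55), (0.5, 0.60), (0.3, 0.50)],
    1: [(2.5, 0.95), (3.0, 1.00)],
    2: [(1.5, 0.70), (1.2, 0.75)],
}

def centroid(points):
    """Feature-wise mean of all activities in a cluster."""
    return tuple(mean(dim) for dim in zip(*points))

def label_cluster(duration_h, intensity):
    """Hand-written stand-in for a decision tree; thresholds are invented."""
    if intensity >= 0.9:
        return "race"
    if duration_h < 1.0:
        return "commute"
    return "workout"

# Assign one known label per cluster based on its centroid.
labels = {cid: label_cluster(*centroid(pts)) for cid, pts in clusters.items()}

# Every activity inherits its cluster's label, which yields a labeled
# training set for a supervised classifier.
training_set = [(pt, labels[cid]) for cid, pts in clusters.items() for pt in pts]
```

Once every activity carries a label, any supervised classifier can be trained on `training_set` to categorize new activities.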


The coach performs a stream analysis on the user’s cycling activities, segmenting each activity into uphill, flat, and downhill segments. Each segment is scored individually with a custom Segment Score, which reflects the power distribution and duration of that section. The Segment Scores are then summarized into a Ride Score, a scalar rating of the effort throughout an activity. Finally, the Ride Scores make up the calculation of a personalized Handicap, an intuitive and comparable metric for power-based performance evaluation. Keeping in mind that the coach’s ultimate objective is to capitalize on strengths and minimize existing skill deficiencies, activities with an abnormally low Ride Score are passed through a decision tree in order to identify key problems in the respective activity and to provide suggestions on how to train in the future.
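The segmentation step can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the scoring used in the thesis: the grade thresholds of ±2%, the `Segment` fields, and the duration-weighted mean power used in `ride_score` are placeholders, since the actual Segment Score and Ride Score formulas are not given here.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import mean

# Assumed grade thresholds in percent (placeholders, not thesis values).
UPHILL_MIN = 2.0
DOWNHILL_MAX = -2.0

@dataclass
class Segment:
    kind: str            # "uphill", "flat", or "downhill"
    duration_s: int      # number of one-second samples in the segment
    mean_power_w: float  # average power over the segment, in watts

def terrain(grade_pct):
    """Classify a single sample by its road grade."""
    if grade_pct >= UPHILL_MIN:
        return "uphill"
    if grade_pct <= DOWNHILL_MAX:
        return "downhill"
    return "flat"

def segment_stream(grades, powers):
    """Split per-second samples into consecutive runs of equal terrain type."""
    samples = zip((terrain(g) for g in grades), powers)
    segments = []
    for kind, run in groupby(samples, key=lambda s: s[0]):
        run_powers = [p for _, p in run]
        segments.append(Segment(kind, len(run_powers), mean(run_powers)))
    return segments

def ride_score(segments):
    """Placeholder aggregate: duration-weighted mean power of all segments."""
    total = sum(s.duration_s for s in segments)
    return sum(s.mean_power_w * s.duration_s for s in segments) / total

# Toy activity: 2 s flat, 3 s uphill, 1 s downhill.
segments = segment_stream([0, 0, 3, 3, 3, -4], [200, 200, 300, 300, 300, 50])
```

In practice the grade stream would likely need smoothing before thresholding, so that sensor noise does not fragment a climb into many very short segments.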


The Handicap mentioned earlier is a score from 0 to 100, with which we were able to evaluate athletes based on their power rides.

Here are some results of the historical Handicap (HC) calculation. Generally speaking, the athlete has a high Handicap over the year, ranging from 72 to 93. According to our assumptions, this is typical for an amateur athlete. The progression of the Handicap clearly indicates the monthly training progress and peaks in the vicinity of the races on March 21st and September 22nd. After each race event, we can observe a decay in the Handicap during a tapering phase. As expected, an outlier RS value such as the one in July does not affect the Handicap drastically, as we select the best performances.
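The robustness to outliers can be illustrated with a short Python sketch. The aggregation rule here is an assumption: the text only states that the best performances are selected, so the mean over the `best_k` highest Ride Scores, the value `best_k=3`, and the sample scores are hypothetical.

```python
def handicap(ride_scores, best_k=3):
    """Mean of the best_k highest Ride Scores, clamped to 0..100 (assumed rule)."""
    if not ride_scores:
        raise ValueError("need at least one Ride Score")
    best = sorted(ride_scores, reverse=True)[:best_k]
    return max(0.0, min(100.0, sum(best) / len(best)))

# Hypothetical Ride Scores for one month, with and without one bad day.
month = [88.0, 91.0, 85.0, 90.0]
with_outlier = month + [35.0]

hc_clean = handicap(month)
hc_outlier = handicap(with_outlier)
```

Because only the top scores enter the average, a single unusually low Ride Score leaves the result unchanged in this sketch, which matches the behavior described for the July outlier.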

Semi-professional cyclist.

The chart in the next figure displays the evolution of HC for two amateur triathletes, who happen to be twins and train and race together regularly. These are distinguished by the superscripts in the plot labels (see legend). From this we can perform a direct comparison and check training progress and potential inconsistencies or anomalies.

While both twins have very high scores, with Twin 1 between 73 and 100 and Twin 2 between 73 and 95, Twin 1 has consistently been stronger than Twin 2, which is also apparent from race finish times throughout the season. This is partly due to injuries suffered by Twin 2. On March 5th, for example, Twin 2 suffered a contracture in the left upper leg, leading to a slightly decreased HC2 that is visible until mid-March. According to Twin 2, this setback did not change the training plan much, but it slightly impacted performance for a short period of time. Later on, training performance between the two is on equal terms, until Twin 2 suffers a fatigue fracture on May 16th, after which he drastically reduces training volume until mid-August and fully recovers by the end of that year. This is evident from the drop in HC2, which then runs parallel to HC1 but remains consistently lower until mid-August.

Amateur twins.

As all of these examples support the accuracy of the computed Handicap values, we show that we are able to capture segment efforts, activity efforts, as well as daily, monthly, and yearly effort in a single intuitive measure that is reliable and comparable. Brian Eastwood, the author of the CIO article, mentions in a final remark, “I need wearable tech to tell me what I don’t know – and to do it without being uncomfortable, intrusive or expensive” [1]. Along these lines, this work is an attempt to bring Eastwood a step closer to this reality.


[1] B. Eastwood. Wearable Tech’s Dilemma: Too Much Data, Not Enough Insight. https://www.cio.com/article/2452759/health/wearable-techs-dilemma-too-much-data-not-enough-insight.html, July 2014.

[2] S. Latyshev, G. Korobeynikov, and L. Korobeinikova. Individualization of Training in Wrestlers. International Journal of Wrestling Science, 4(2):28–32, 2014.

[3] K. Zhanneta, S. Irina, B. Tatyana, R. Olena, L. Olena, and I. Anna. The Applying of the Concept of Individualization in Sport. Journal of Physical Education and Sport, 15(2):172, 2015.

[4] L. Vorfolomeeva. Individualization of Training Process as a Leading Construction of Skiers Training Component in Preparation for Higher Achievements. Physical Education of Students, 17(4):15–18, 2013.

[5] H. Allen. Individualizing Your Training With WKO4, June 2015.