07 Aug Inside the Mind of An Analyst
We often get asked for case studies and examples of our work. Clearly this is fine. I’m never sure it quite describes the value we actually add though.
The reason is that these examples show a finished outcome – we started here and got here.
I thought it might be interesting to give an example of something much more raw, much more natural.
The idea of this piece is to let you step inside the analyst’s mind – as if you’re going along for the journey of discovery that comes with starting to look at a new data set.
Actually, I have no idea if this will be useful or even comprehensible, but the thought behind it is sharing the narrative and the processes of data discovery. The hope is that you can see what we do.
If it all seems familiar then you’ve obviously been there, done that. If it seems totally odd then perhaps it was worth it and I hope it helps you see what analysts do.
The situation is that we’ve been doing some research on how to make better use of the Concept 2 rowing machine data. This is a purely Pace Insights self funded and self initiated project.
The rowing machine has a screen. It shows information on the rower’s performance. A typical measure is 500 metre pace – i.e. how long it would take to do 500m at the current rowing pace. This is updated each stroke.
You can also hook up the machine with a heart rate sensor (HR) and you can also store the rowing data on an iPhone app.
The iPhone app shows you more information than you can see on the standard Concept 2 screen: namely, drive length, peak power and stroke-by-stroke drag factor. The drive length is the length in metres of the rower’s last stroke, the peak power is the maximum power generated in the stroke (normally you just get average power for the stroke) and the drag factor is the resistance (normally only set at the beginning and assumed to be constant – it’s not!).
When a rower has finished (their effort) they can add comments to their data on the iPhone app. This can then be saved. It can also then be sync’d to a Concept 2 cloud service.
From the cloud service, you are able to download the data. You can download summary data, for multiple efforts at different times, or, you can download detailed data for an individual effort.
The summary data has the times. It has the average HR (if applicable). It has the date. It has the average stroke rate, the average pace and the overall time for the effort.
The detail data has stroke by stroke information. What this means is that it takes a snapshot of the data each stroke. This includes HR, SR, Pace, Distance, Time.
Sadly, neither the summary nor the detail data is complete on its own. Furthermore, there is information that only appears on the cloud service – the drag factor (averaged). There is also data that the app shows that is not logged – the drive length per stroke, the peak power, the drag factor per stroke. The detail data doesn’t have the finish times for each interval or the effort – these are only in the summary data.
So, whilst there is some great information here, information and data worth doing something with, there is a little work to do.
I chose this as an example as it’s fairly common to have these kinds of situations, ones where the data is kind of all there but it’s not. The analyst’s job therefore starts with “getting hold” of the data, getting it into a form that’s useful and also doing some initial “data discovery” work, or investigation work, to first start to get a feel for the data.
The following is an as-it-happened account of this discovery and sorting process.
It’s long but it flags thoughts along the way, exposes questions and considers theories to explain what’s happening. It uses multiple tools from spreadsheets to coding. It unexpectedly unearths some of the fundamental equations and logic used in the Concept 2 rower to give metrics and to relate performance. It also finds what could be a potential bug in the Concept 2 software that would have significant implications …
Hopefully it gives you some insight.
Inside the Mind of An Analyst
1st issue – csv is only summary data.
Does have comments – could be useful.
Found stroke-by-stroke csv – have to download separately. OK.
Filename same as Concept2 id. No summary info.
No HR on one file – key one too! – find out why.
Stroke-by-stroke only has distance of stroke start, not finish time.
Need to combine summary and stroke-by-stroke somehow.
Tip to rowers – do one more stroke past end – doesn’t log.
Does have pace, stroke, power, Cal/hr, stroke rate – some look averaged.
These should link to historic data.
Not sure if stroke-by-stroke is pre- or post- as distance 10m?
Doesn’t have drag factor. That is on web- but only ave not stroke-by-stroke like on app. Need to add.
Download another. Filename arbitrary from what can tell.
This is an interval training one. Pace goes up during interval.
Good though as get HR all during interval – but only per stroke.
Suggest to Athlete to keep stroking even in rest to monitor / record recovery HR?
Time isn’t zero, nor distance. Need to infer.
Doesn’t have finish times – get from interval table on web.
HR data seems bit odd – recovers mid distance? Like it’s not being recorded properly?
Got 27 data sets for the three months of training.
Try: From season summary, add drag factor and ref to detailed stroke-by-stroke.
What to add back on details? – finish and start time? and each interval as appropriate, also add athlete weight from app for each data set.
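A minimal sketch of one way to do this combining in Python: index the season summary by effort, then copy the missing fields onto every stroke row. The field names (`workout_id`, `drag_factor`, `finish_time_s`, `weight_kg`) are hypothetical, not the real Concept 2 export headers, and the rows are made up for illustration.

```python
def enrich_strokes(summary_rows, stroke_rows_by_id):
    """Attach per-effort summary fields to every stroke row of that effort."""
    summary_by_id = {row["workout_id"]: row for row in summary_rows}
    combined = []
    for workout_id, strokes in stroke_rows_by_id.items():
        summary = summary_by_id.get(workout_id)
        if summary is None:
            continue  # stroke file with no matching summary line: skip it
        for stroke in strokes:
            merged = dict(stroke)  # copy so the original row is untouched
            merged["workout_id"] = workout_id
            for field in ("drag_factor", "finish_time_s", "weight_kg"):
                merged[field] = summary.get(field)
            combined.append(merged)
    return combined


# Made-up example: one effort with two logged strokes.
summary = [
    {"workout_id": "A1", "drag_factor": 125, "finish_time_s": 452.3, "weight_kg": 82.0},
]
strokes = {
    "A1": [
        {"time_s": 2.1, "distance_m": 10.0, "hr": 132},
        {"time_s": 4.2, "distance_m": 19.6, "hr": 135},
    ],
}
rows = enrich_strokes(summary, strokes)
```

Values that only exist on the web page or in the app (drag factor, athlete weight) would still have to be typed into the summary by hand first.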
Playing with basic charting on summary data. Try HR v Power Ave, Ave Power v SR.
Seems to be a frontier? Increase SR approx increase in Power ave?
Also shows SR zones – time in each approx as only summary.
Excel auto axis a pain!
HR v SR is interesting. Dip around SR 26 – look at this again?
Thought: Data is kind of 3D or multi-dimensional —> i.e. need time context of effort as a variant in analysis maybe (like a “physiological cost on HR per stroke” ? or something?) to give better comparative context between underlying performance, recovery (fitness?) – maybe generate a pacing strategy from this?
Dip into Tableau – colour by variable/scale to visualise a bit better / easier than Excel as multi-dimensional plots not built in function of Excel charting really.
Maybe category context with athlete “mood” / effort? – Ask physiology team for guidance on relevance.
Create crude “positivity index” or something to maybe match mood and performance. Aim to normalise (and quantify effect of?) for mood perhaps?
Try: Python NLTK library. Ran some comments through this. Needs teaching properly really. Most sentiment results negative or neutral!
Perhaps not surprising as normally finish with comment about being “broken” !
Maybe effort level would be better metric? Get athlete to add this data to their comments? perhaps 1-10 scale? Can then train classifier.
Broad trend on Pace Ave over time – improving ! Great. How to show that?
Re: sort axis again in Excel. Tried trend. Same trend over time.
Plateau? No. data points too low? 3rd order poly shows reasonable match —> R^2=0.92.
Shows plateau and then dip. Frequency of data not consistent though. Might be bit optimistic – forward trend line extrapolation shows rapid improvement when really a slow in improvement might be more likely?
Not great but starting to get a feel of data at macro level.
Linear trend prob a bit better —> R^2=0.81 but less aggressively positive.
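For reference, the linear trend and its R^2 can be reproduced outside Excel with a few lines of plain Python. This is only a sketch: the data points are made up, and using days since the first session as x (rather than a session count) is an assumption intended to respect the uneven spacing of the efforts.

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up data: days since first session vs average pace in sec/500m.
days = [0, 6, 15, 21, 40, 55, 70, 88]
pace = [128.0, 126.5, 124.0, 123.5, 121.0, 120.5, 119.0, 118.5]
slope, intercept, r2 = linear_fit_r2(days, pace)
print(f"{slope:.3f} sec/500m per day, R^2 = {r2:.2f}")
```

A negative slope here means the pace is getting faster over time.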
Interesting to plot Ave Watts v Pace. Shows the Concept2’s underlying relationship between these variables. Interestingly they’ve made this a non-linear relationship. Wonder what drives that? Maybe some research online?
Trend line —> R^2 = 0.9997 so Poly 2nd order a good fit! 🙂
Changed axis from x time to y time as extending trend line threw Excel into a tailspin.
Question: What Power does Athlete need for faster pace(s)? Google.
For 2k scull World Record time – just to get zone as appreciate weather, environment dependent etc – M1x 6:33.25 at approx 1:38/500m, M8+ 5:19.35 at approx 1:20/500m. datedial.com
Rough zones: 8min approx 2:00/500m, 7:30 – 1:52.5, 7:00 – 1:45, 6:30 – 1:37.5, 6:00 – 1:30/500
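The zone arithmetic is just the 2k time divided by four, assuming an evenly paced piece. A small sketch (the function names are my own, purely for illustration):

```python
def split_for_2k(total_seconds):
    """500m split in seconds for an evenly paced 2,000m piece."""
    return total_seconds / 4


def format_split(seconds):
    """Format seconds as m:ss.s, e.g. 105.0 -> '1:45.0'."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}:{secs:04.1f}"


# 8:00, 7:30, 7:00, 6:30 and 6:00 for 2k, expressed in seconds.
for total in (480, 450, 420, 390, 360):
    print(format_split(total), "->", format_split(split_for_2k(total)), "/500m")
```

For example, a 6:30 2k (390 seconds) works out to a 1:37.5 split.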
Format Excel axis to show logical splits – auto totally random! Google how.
Excel isn’t great with dates. DATEVALUE is key. Converts into number.
Create little converter: time —> number
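The same kind of converter is easy to sketch in Python. It assumes pace text in `m:ss`, `m:ss.s` or `h:mm:ss` form; the second helper relies on Excel storing a time of day as a fraction of a 24-hour day. Function names are hypothetical.

```python
def pace_to_seconds(text):
    """Convert 'm:ss', 'm:ss.s' or 'h:mm:ss' text into seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        # Each colon moves one unit up (seconds -> minutes -> hours).
        seconds = seconds * 60 + float(part)
    return seconds


def seconds_to_excel_fraction(seconds):
    """Excel-style time serial: the fraction of a 24-hour day."""
    return seconds / 86400


print(pace_to_seconds("1:45"))     # 500m pace as seconds
print(pace_to_seconds("6:33.25"))  # 2k time as seconds
```

Working in plain seconds keeps axis gridlines (say every 15 seconds) easy to set, whichever tool ends up drawing the chart.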
45mins on a call. Googling some more. Added grid lines to give context.
15 sec major gridline splits now. 5 sec minor.
Shows that 1:20 / 500m pace approx 515W
OK so can work off critical power curve to monitor this as got relationship and context and targets.
Limits seem to be HR and core strength. Technique more difficult to gauge but consistency of stroke length could be a good measure as seems some missed strokes.
Want to see progression over time of Power v HR – although all data seems to be in high HR zones?
Stroke by stroke data can be used to create a critical power type curve. Python.
See progression over time. Animate this to show better?
Then see how to improve.
Where to focus perhaps? Pacing been an issue.
If gone out too hard then Interesting to see if can do something to recover during effort? Prob not but links to stroke by stroke cost index on HR above maybe?
Want to see relationship of Watts, HR, Time.
Excel can’t do this well. Make multi-data sets. Fix axes. Doesn’t really show increases in fitness it seems that athlete just pushed harder – in HR terms – and produced more power. Weight another variable to add to summary – as changed over time. Normalise? Power / kg might show fitness improvement better?
Want to switch axes. Excel a pain for that!
Sorted order of chart to months. Plot other way. Shows a little better e.g. In Dec HR 168 = approx 126W, In Jan = 170W.
Change scale y.
Fitness target: In HR stripes want to see more Power for given HR.
Will be good for detailed analysis. Look at HR buckets (zones) in histogram?
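A sketch of the HR bucket idea: group readings into HR bands and average the power in each band, per month, so that "more power for a given HR" can be read off directly. The 10 bpm band width and the sample rows are assumptions for illustration only, not real data.

```python
from collections import defaultdict


def mean_power_by_hr_band(rows, band_width=10):
    """rows: iterable of (month, hr, watts). Returns {(month, band_start): mean watts}."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum of watts, count]
    for month, hr, watts in rows:
        band_start = (hr // band_width) * band_width  # e.g. 168 -> 160
        bucket = totals[(month, band_start)]
        bucket[0] += watts
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up rows: (month, HR in bpm, average watts).
rows = [
    ("Dec", 168, 126), ("Dec", 172, 140),
    ("Jan", 168, 170), ("Jan", 175, 190),
]
bands = mean_power_by_hr_band(rows)
for key in sorted(bands):
    print(key, round(bands[key], 1))
```

Comparing the same band across months is then a like-for-like check on power at a given HR.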
Initial summary – shows some rapid progress from Dec-Jan. Then slows. Lack of comparable data as looks like working in different HR zone. (able to?) try harder for longer? Look in detail – should show more clearly.
Frustrating that best power has no HR. Can guess perhaps? Ave likely to have been approx 180-185W. To reach a pace target of sub-7mins at 1:45/500 = 290W average. Maybe focus on power at lower HR?
Gradient kind of showing more HR needed (!) but limited (naturally).
Plot trend lines for data on each month. Kind of shows progression up: for given HR —> more power. Either poly 3 or linear shows it. Again detail might show more.
Thought: Issue is recovery and maintaining power through effort – not underlying ability to generate power per se? Equivalent to “fitness” metric / index.
Saying that it is of whole effort and most efforts the same. 500m intervals for 5k. Then should be reasonably indicative? Consideration of timing.
On linear trend lines it shows quite nicely the power is higher for a given HR. On Dec data one outlier point? Why?
If trend line ignores – can’t as too close to Jan. The issue is the bucket approach here to time in months, as effort on last day of month too close to first day of next month.
Thought: More continuous would be better – critical curve style at a given point in time and then comparing these?
Looking at detail – interval view. Stroke count v HR shows trend and recovery. Look at HR and Interval time. Good as split into intervals. Recovery difficult as not many mid interval (i.e. rest) data points as they are only recorded stroke by stroke. Resting equals no strokes. Shame as it’s shown on the screen and the app.
Hard to chart as 3D again – and overlapping. Switched to scatter plot dots instead of lines to reduce artefacts and also removed gridlines. Would suit an interactive version, say in Power BI, and/or band style. Looking at SR v interval time. Shows density nicely around 29/30 strokes per min. Histogram again might be better. Excel doesn’t support natively so use Python matplotlib. Nice.
Discovery: Concept 2 relationship between watts and Cal/hr calculation. This one is linear. Starting at 300 cal/hr for zero watts of input. Goes up at y=0.2902x-87.258 although this is in scale of axis. Careful Excel.
795 cal/hr = 144W. 1427 cal/hr = 328W. Note max W in this effort was 328W. For sub 7min time needs to be 295W ave. 328W is 1:39/500m —> 6:36min 2k pace. Note that these power values are averages per stroke. The app shows, per stroke, peak power, stroke (drive) length and stroke peak velocity – i.e. technique indicators. Sadly (oddly?) not stored in data.
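The linear trend from the notes can be written as a pair of conversion helpers. This is a sketch using the fitted coefficients quoted above (0.2902 and -87.258), reading y as watts and x as cal/hr, which is my interpretation of the chart axes; it is a spreadsheet fit, not an official Concept 2 formula.

```python
SLOPE = 0.2902       # watts per cal/hr, from the fitted trend line
INTERCEPT = -87.258  # watts at 0 cal/hr (extrapolated)


def watts_from_cal_per_hr(cal_per_hr):
    """Average watts implied by a Cal/hr reading, using the fitted line."""
    return SLOPE * cal_per_hr + INTERCEPT


def cal_per_hr_from_watts(watts):
    """Inverse of the fitted line: Cal/hr for a given average power."""
    return (watts - INTERCEPT) / SLOPE


for cal in (795, 1427):
    print(cal, "cal/hr ->", round(watts_from_cal_per_hr(cal)), "W")
print("0 W ->", round(cal_per_hr_from_watts(0)), "cal/hr")
```

The fit lands within a watt or two of the 144W and 328W readings and gives roughly 300 cal/hr at zero watts, in line with the notes.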
Power v Interval time. Shows v strong first 500m. All above 300W peak. Confirmed in 1:44.4 split. Shows started strong on most efforts. Then dropped down quite a bit.
Check: Comments for why perhaps?
Given non-linear relationship to power and pace, can see that even a little drop in power has less effect on pace. Something to consider when considering pacing strategy —> cost for 1sec/500m in power might be worth quite a bit in terms of mid-effort “recovery” or perhaps better put “capacity until exhaustion” index.
Plotted 500m pace v power to compare to summary data. The pace is in seconds. Seems to show model of diminishing returns. Tried to plot a trend line but couldn’t, as the data for dummy intervals puts pace at infinity (!).
Task: Will have to process and handle this somehow.
Scatter plot anyway shows density of strokes at a particular wattage. By-eye, most seem around 200-300W. Again 3D data really as ideally you’d have HR and some indication of effort time spent variation – as longer goes on, higher HR.
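One simple way to handle this, sketched in Python, is to drop any stroke whose pace is missing, zero or not finite before fitting a trend line. The `pace_s` and `watts` keys are hypothetical, and how the real export encodes the dummy interval rows is not confirmed here.

```python
import math


def usable_strokes(strokes):
    """Keep only strokes with a finite, positive pace (sec/500m)."""
    kept = []
    for stroke in strokes:
        pace = stroke.get("pace_s")
        if pace is not None and math.isfinite(pace) and pace > 0:
            kept.append(stroke)
    return kept


# Made-up rows, including a dummy (rest) row with infinite pace.
strokes = [
    {"pace_s": 104.4, "watts": 308},
    {"pace_s": float("inf"), "watts": 0},
    {"pace_s": 112.0, "watts": 249},
    {"pace_s": None, "watts": 0},
]
clean = usable_strokes(strokes)
```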
HR v Power? Whilst creating did Stroke v Watts. Shows spectrally that stroke rate largely independent of Power – with Max 330W achieved at 29, 30 & 31 SR.
Thought: Maybe recommend to athlete to consider slower SR for pacing —> plot Distance / Stroke – needs to be calculated as not in standard output.
Did quick calc in new column. DPS seems to cluster around 9.25 – 10m per stroke. Fairly independent of SR.
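A sketch of that distance-per-stroke (DPS) column in Python, assuming it is taken as the difference between consecutive cumulative distance readings within one interval. The notes do not say exactly how the column was built, so treat this as one plausible version; the sample distances are made up.

```python
def distance_per_stroke(distances_m):
    """Metres covered between consecutive stroke readings in one interval."""
    return [
        round(later - earlier, 2)
        for earlier, later in zip(distances_m, distances_m[1:])
    ]


# Made-up cumulative distances (metres) logged at successive strokes.
dps = distance_per_stroke([0.0, 10.0, 19.5, 28.8])
print(dps)
```

Where the distance resets between intervals, the differences should be taken per interval so that the reset does not appear as a negative stroke.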
Thought: Keep SR down then? Increase W?
Plot: DPS v Power. DPS v Time – very consistent, around 9.2 – 9.4m in mid section.
27sec to 93 sec. In 0-30sec close to 10 m per stroke. Same at finish.
SR v Stroke shows first 500m all consistent above 10m —> drive length would be good to know really.
Thought: Wonder if can get drive length off the app?
From inspection watching the screen on the app – typically drive length 1.20-1.33m
DPS v Power fairly clear relationship between increasing DPS and Power. Not sure which is result of which though! Prob power. So on X. Interesting some high power and low DPS —> guess this is at the start of intervals? Also long DPS with low power —> guess at end? … it’s not.
Can’t calculate trend line so inferring. Shows some strange drops during effort. It’s not affecting other metrics. Guessing SR is averaged, and pace too?, over 3 readings? … Yes as first two cells blank at start.
Anyway notional relationship is linear —> is this result of poor technique?
10.1m DPS —> 328W, 8.6m—> 202W. Same speed. But largely consistent max speed – by-eye – around 217W —> 8.9m mid, 8.5m min, 9.5m max-ish
Thought: The drop outs are odd. Maybe Concept 2 calc error??
Consistently about 2m short, even though other metrics not changing?
Task: Double check with other data sets. Try to get Pro rower data. Do video session to eliminate issue. If issue – contact Concept 2 as 2m is significant.
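To run the same check across other data sets, the drop-outs could be flagged automatically by comparing each stroke's DPS with the median for the effort. A sketch only: the 1.5m threshold is an arbitrary assumption chosen to catch drops of roughly 2m, and the sample values are made up.

```python
from statistics import median


def flag_short_strokes(dps_values, shortfall_m=1.5):
    """Indexes of strokes whose DPS is at least shortfall_m below the median."""
    typical = median(dps_values)
    return [
        index
        for index, dps in enumerate(dps_values)
        if typical - dps >= shortfall_m
    ]


# Made-up DPS series with two strokes about 2m short.
suspect = flag_short_strokes([9.3, 9.4, 7.3, 9.2, 9.3, 7.2, 9.4])
print(suspect)
```

The flagged indexes can then be lined up against power, pace and SR for the same strokes to see whether anything else moved.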
When plotting DPS and SR, clear trend to show longer DPS with lower SR. DPS v Power has four bands. Odd. Perhaps SR impacts on calcs.
On data viz, interesting how small can make charts and still convey value. Sparkline-esque. Tweak line thickness – smaller is clearer.
Clearly shows that for a lower SR, for given power, equals longer DPS.
Thought: I wonder how long trend lasts (!) Need pace – velocity. This should show same speed across SR, however, lower SR = lower pace = shorter DPS. Lower SR = faster pace (!?!) for given power. Strange?
Target pace 105 sec/500m or less – SR recorded at 31, 30, 29SR – therefore SR 28 opportunity to go faster for equivalent effort?
Thought: Higher power at 28 SR should equal faster pace. Therefore what about 27 or 26 SR? Just seems counter intuitive.
Question: is the chart over influenced by the way the data is calculated? Given SR is assuming an offset. Also power is power average per stroke. What influence does peak power have on boat velocity and DPS? Maybe nothing on Concept 2 but maybe major influencer on water?
Task: Ask team about stroke power profile and boat speed. What is best? What should we be looking to profile on Concept 2 to recreate this?
Pace Insights Ltd. ©2018 All rights reserved.