The 5 Point Trade Quality Scoring System
Using granular metrics to rank our trades bar by bar and determine how well each one captures alpha (part 2 of the 70 Million Trade Experiment series).
Often we have a trading system that generates a huge number of trades (in my case, 70,000,000) and little to no way of understanding what is actually going on. Sure, we get massive printouts and tear sheets full of figures that quantify the strategy as a whole. But what about on a trade-by-trade basis?
What we really need is to understand the quality of our trading systems on a trade-by-trade basis. It’s better to have a system with very high efficiency, one that masters its entries and holds for minimal time, than one that takes trades haphazardly and holds them for long periods, keeping our capital locked up and at risk.
So rather than using the “same old same old,” I devised a trade quality scoring system that lets you drop in your trades and figure out whether they are good, how to improve them, and how to make more money than your peers in the marketplace.
Here’s the thing:
Aggregate strategy metrics are great, but they lack the granularity which can give us clear insights into exactly how our strategy works.
Without these trade-by-trade metrics, you won’t be able to construct and optimize the perfect strategy for your given needs and you’ll be left with unsatisfactory results.
Defining High Quality Trade Metrics
Many metrics already exist for ranking a trading strategy: Sharpe ratio, Sortino ratio, maximum drawdown, CAGR, etc. But these are big-picture metrics.
There are also lesser-known metrics designed to give a deeper, more granular look at how each trade performs bar by bar. Let’s dig into several of them to see how they work and how they give us a better picture of our results.
Measuring Opportunity Capture with Maximum Adverse/Favorable Excursion (MAE / MFE)
Every trade has a point at which you, as a trader, feel the most pain. This point is known as the Maximum Adverse Excursion (keyword ‘adverse’). We measure it as the largest loss from the entry price reached at any bar during the trade, even if the trade didn’t end in the red.
Below is an example:
We take our trade, and it moves against us by at most -4.58% (this is its Maximum Adverse Excursion). However, the trade still closes in profit. It’s a good trade, right? Well, what if that number were -8%? -10%? What does your risk model tolerate? The MAE gives us insight into exactly that on a trade-by-trade basis, so we can find which strategies are the riskiest and potentially even adjust our stop losses.
Conversely, we have the Maximum Favorable Excursion, which is exactly the opposite of the MAE: the furthest point the trade moved in our favor.
But wait, why would we want to measure that? Because there are cases, like the one below, where our trade could have made more money, but our take-profit logic didn’t capture 100% of the move.
So, while a high MFE can be good, a large gap between the MFE and the final return is bad: it tells us the trade was not efficient.
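A minimal sketch of both excursions, assuming a long trade represented by its bar-by-bar closes (a short trade would flip the sign of the returns). MAE/MFE studies often use intrabar highs and lows instead, which can only make the excursions larger. The names `excursions` and `exit_capture` are hypothetical; exit capture here is final return / MFE, the same ratio listed later as Execution Efficiency.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Excursions:
    mae: float           # worst return vs. entry seen during the trade (<= 0)
    mfe: float           # best return vs. entry seen during the trade (>= 0)
    final_return: float  # return at the exit bar
    exit_capture: float  # final_return / mfe, or NaN if the trade never went green


def excursions(prices: list[float]) -> Excursions:
    """MAE/MFE of a long trade from its closes (prices[0] = entry, prices[-1] = exit)."""
    entry = prices[0]
    # Return of every bar relative to the entry price; the entry bar itself is 0.
    returns = [p / entry - 1.0 for p in prices]
    mae = min(returns)
    mfe = max(returns)
    final = returns[-1]
    capture = final / mfe if mfe > 0 else math.nan
    return Excursions(mae, mfe, final, capture)


trade = excursions([100.0, 97.0, 95.42, 103.0, 110.0, 104.0])
print(trade.mae, trade.mfe, trade.exit_capture)  # roughly -0.0458, 0.10, 0.40
```

For profitable trades, ranking by `exit_capture` from highest to lowest gives the same order as ranking by MFE / return from lowest to highest: both favor trades whose exit landed close to their MFE.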
Here I am selecting the top 50 trades on AAPL using the Ichimoku Cloud stop loss I developed, ranked by MFE / return.
As you can see, this gives us the trades whose MFE sits closest to their final return, which favors trades that exit near their peak instead of giving back their gains.
Measuring Trade Smoothness with Monotonicity
It sounds like a mouthful, but monotonicity simply describes how consistently a function moves in one direction. A function is monotonic when increasing values on the x-axis always produce non-decreasing (or always non-increasing) values on the y-axis.
Functions that squiggle up and down like a sine wave are not monotonic. This works great for us because an ideal trade would be monotonic (a drawdown of 0%!). Of course, that almost never happens in practice. But we can measure monotonicity and filter out trades that fall below a chosen threshold.
Monotonicity is defined by taking the bar-to-bar differences of a time series, counting the bars that move in the direction of the trend (up, for a long trade), and dividing that count by the length of the series.
So, if a 20-bar trade goes straight up for 5 bars, then down for 5, then up for 10, we get (5 + 10) / 20, a monotonicity of 75%.
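A minimal sketch of that definition, assuming close prices and a long trade by default. Counting only moves in the trade’s direction is an interpretation of the 20-bar example, and `monotonicity` is a hypothetical helper name; it reproduces the 75% figure.

```python
from __future__ import annotations


def monotonicity(prices: list[float], long: bool = True) -> float:
    """Share of bar-to-bar moves that go in the trade's direction, from 0.0 to 1.0.

    Assumption: a move 'continues the trend' when it is in the trade's favor
    (up for a long, down for a short); flat bars do not count.
    """
    moves = [b - a for a, b in zip(prices, prices[1:])]
    if not moves:
        return 0.0
    direction = 1.0 if long else -1.0
    favorable = sum(1 for m in moves if direction * m > 0)
    return favorable / len(moves)


# The 20-bar example: up 5 bars, down 5, up 10 (21 prices give 20 moves).
prices = [100.0]
for step in [1.0] * 5 + [-1.0] * 5 + [1.0] * 10:
    prices.append(prices[-1] + step)
print(monotonicity(prices))  # 0.75
```

Multiplying the result by 10 gives the 0 → 10 Smoothness axis used later.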
Here are the top 50 trades selected by monotonicity. Notice how, unlike the previous graph, the trades look much more regular when compared with one another. This comes at the cost of final returns (the classic trade-off between return and the standard deviation of returns).
You take on less risk when optimizing for monotonicity, but you potentially give up some returns.
Measuring Psychological Pain with Ulcer Index
The Ulcer Index measures how deep and how long a trade stays under water, averaged over every step of the trade. We can use it to minimize the psychological pain we feel as we watch our trades progress.
It’s calculated as follows:
Taking the prices from entry to exit and turning them into a running equity curve relative to the entry price.
Prices: $1.00, $1.05, $1.03, $0.98, $1.01, $1.00, $1.10
Running Equity: 0%, +5%, +3%, −2%, +1%, 0%, +10%
Walking through each point and tracking the maximum peak so far.
Running Equity: 0%, +5%, +3%, −2%, +1%, 0%, +10%
Peak so far: 0%, 5%, 5%, 5%, 5%, 5%, 10%
Doing the same, but tracking drawdown so far:
Running Equity: 0%, +5%, +3%, −2%, +1%, 0%, +10%
Drawdowns (value − peak): 0%, 0%, −2%, −7%, −4%, −5%, 0%
Squaring and summing the negatives:
Drawdowns (value − peak): 0%, 0%, −2%, −7%, −4%, −5%, 0%
Square negatives: 4, 49, 16, 25 → sum = 94
Averaging the sum over the length of the trade:
Sum: 94
Average over length of trade (7 periods): 94 / 7 ≈ 13.43
Taking the square root of that average:
Average over length of trade (7 periods): 94 / 7 ≈ 13.43
Square root: √13.42857143 ≈ 3.66450153
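A minimal sketch of the steps above, assuming a long trade given as close prices from entry to exit; `ulcer_index` is a hypothetical helper name. It follows the variant in this walkthrough, where drawdown is running equity minus its peak in percentage points. Peter Martin’s original Ulcer Index measures the percentage drop from the peak price instead, which gives slightly different numbers.

```python
import math


def ulcer_index(prices: list[float]) -> float:
    """Ulcer Index of one trade, following the worked example step by step."""
    entry = prices[0]
    # Running equity in percent relative to the entry price.
    equity = [(p / entry - 1.0) * 100.0 for p in prices]
    peak = float("-inf")
    squared_sum = 0.0
    for value in equity:
        # Track the highest equity seen so far.
        peak = max(peak, value)
        # Drawdown from that peak: 0 at a new high, negative otherwise.
        drawdown = value - peak
        # Square and accumulate; zero drawdowns add nothing.
        squared_sum += drawdown ** 2
    # Average over the length of the trade, then take the square root.
    return math.sqrt(squared_sum / len(equity))


print(ulcer_index([1.00, 1.05, 1.03, 0.98, 1.01, 1.00, 1.10]))  # about 3.6645
```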
A high Ulcer Index means the trade spent more time, or sank deeper, below its running peak than a trade with a low Ulcer Index. So our goal is to minimize the Ulcer Index.
Here are the top 50 trades selected by lowest Ulcer Index:
You can see results similar to the monotonicity trades, which is good! Most of the time we want to optimize our strategies for consistent returns rather than raw returns, since in most cases we can simply add leverage to a consistent system.
Combining What We Learned into Categories
Each of the three metrics above deals with a different aspect of what makes a trade ‘good.’ There is no objective ‘good,’ and that is fine. What we need to know is how a trade scores on each of these aspects of ‘good.’
So, I came up with 5 categories of quality that we can measure:
Execution Efficiency
“Exit capture,” or realized returns / maximum favorable excursion.
Risk & Pain
Ulcer Index, transformed with Box-Cox to tame its skewed, roughly logarithmic distribution, then mapped through an optimized Cumulative Distribution Function onto a 0 → 10 scale.
Trained on a distribution of 8.8 million trades.
Smoothness
0 → 10 given the monotonicity of a trade.
Risk-Adjusted Return
Exit Returns / abs(Maximum Adverse Excursion), transformed with Box-Cox, then mapped through an optimized Cumulative Distribution Function onto a 0 → 10 scale.
Trained on a distribution of 8.8 million trades.
Psychological Comfort
Percentage of time the trade is in profit (above the entry price), scaled from 0% → 100% onto 0 → 10.
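The exact fitting routine behind the Box-Cox/CDF axes isn’t given, so the following is only a sketch under stated assumptions: lambda chosen by a coarse maximum-likelihood grid search, a normal distribution fitted to the transformed values, and its CDF scaled to 0 → 10 (inverted when lower is better, as for the Ulcer Index). `fit_scorer` and `comfort_score` are hypothetical names, and only the standard library is used.

```python
from __future__ import annotations

import math
import statistics
from typing import Callable, Sequence


def boxcox(x: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def boxcox_loglik(data: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    transformed = [boxcox(x, lam) for x in data]
    variance = statistics.pvariance(transformed)
    log_jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * len(data) * math.log(variance) + log_jacobian


def fit_scorer(samples: Sequence[float], higher_is_better: bool) -> Callable[[float], float]:
    """Fit Box-Cox + normal CDF on a sample; return a raw value -> 0..10 scorer."""
    if min(samples) <= 0:
        raise ValueError("Box-Cox needs strictly positive samples; shift them first")
    # Coarse grid search for the lambda with the highest likelihood.
    grid = [i / 100 for i in range(-200, 201)]
    lam = max(grid, key=lambda g: boxcox_loglik(samples, g))
    transformed = [boxcox(x, lam) for x in samples]
    dist = statistics.NormalDist(statistics.fmean(transformed), statistics.pstdev(transformed))

    def score(value: float) -> float:
        # Clamp to a tiny positive number so Box-Cox stays defined.
        p = dist.cdf(boxcox(max(value, 1e-12), lam))
        return 10.0 * (p if higher_is_better else 1.0 - p)

    return score


def comfort_score(prices: Sequence[float]) -> float:
    """Psychological Comfort: share of bars after entry closing above entry, as 0..10."""
    entry, bars = prices[0], prices[1:]
    if not bars:
        return 0.0
    return 10.0 * sum(1 for p in bars if p > entry) / len(bars)
```

On millions of trades, `scipy.stats.boxcox` (which estimates lambda by maximum likelihood when `lmbda` is omitted) would be much faster than this pure-Python grid. Box-Cox is only defined for positive inputs, so a perfect trade’s Ulcer Index of 0, or a negative return / |MAE|, would need a shift before fitting.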
You can drop in any of your trades, whether you have prices or returns, and get back a radar chart that scores the trade on these metrics:
Not bad. This does look like a ‘comfortable’ trade, even though it is still quite choppy and carries some risk. So the radar plot seems to work.
Creating a Composite Score and Ranking All Trades
Finally, I calculated the area of the pentagon traced by each trade’s radar chart. This gives a final ‘radar score’ that I can use to sort a list of trades.
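A minimal sketch of the pentagon area, assuming the five scores sit on equally spaced axes (72° apart) as on a standard radar chart; `radar_area` is a hypothetical helper name and the example scores are made up.

```python
from __future__ import annotations

import math
from typing import Sequence


def radar_area(scores: Sequence[float]) -> float:
    """Area of the polygon a radar chart draws for scores on equally spaced axes."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    # Each pair of neighboring axes forms a triangle with the center:
    # area = 0.5 * r_i * r_(i+1) * sin(angle between the axes).
    wedge = math.sin(2 * math.pi / n)
    return 0.5 * wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))


# Axis order: Execution Efficiency, Risk & Pain, Smoothness,
# Risk-Adjusted Return, Psychological Comfort (illustrative scores).
example = [6.0, 4.5, 7.0, 5.5, 8.0]
max_area = radar_area([10.0] * 5)
print(radar_area(example) / max_area)  # about 0.378 of the perfect pentagon
```

Because every score appears in two neighboring triangles, the area rewards balanced profiles and punishes a single weak axis twice. It also depends on axis order, so the order should stay fixed across all trades being compared.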
Furthermore, you may have noticed that the lines in the first few graphs looked very similar. This is because many trades share the same exit day but have slightly different entry days. This skews the data and gives us ‘non-unique’ points on the equity curve.
I created a simple filter that groups trades by exit bar and selects the best trade from each group. This filter works because this analysis used a single stop loss (Ichimoku Cloud) with different entry trigger points. When comparing all of my stops, I will run this filter separately within each stop loss.
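A minimal sketch of that filter, assuming each trade from a single stop loss has already been reduced to one score (for example, the radar area). `ScoredTrade` and `best_per_exit` are hypothetical names; with several stop losses you would call it once per stop, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTrade:
    entry_bar: int
    exit_bar: int
    score: float  # e.g. the radar area


def best_per_exit(trades: list[ScoredTrade]) -> list[ScoredTrade]:
    """Keep only the highest-scoring trade for each exit bar, best first."""
    best: dict[int, ScoredTrade] = {}
    for trade in trades:
        current = best.get(trade.exit_bar)
        if current is None or trade.score > current.score:
            best[trade.exit_bar] = trade
    return sorted(best.values(), key=lambda t: t.score, reverse=True)


trades = [
    ScoredTrade(entry_bar=1, exit_bar=10, score=5.0),
    ScoredTrade(entry_bar=2, exit_bar=10, score=7.0),
    ScoredTrade(entry_bar=3, exit_bar=12, score=6.0),
]
top = best_per_exit(trades)[:50]  # up to 50 trades, one per exit bar
```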
Here is the resulting graph:
And here are those 50 trades plotted on the AAPL equity curve (log-scale on the y-axis):
You can see clearly how this metric is capturing the best ‘upward’ points. The challenge now is figuring out how to take the trade at that entry point (our favorite part!).
Next Steps
Now with some clear metrics, we can rank our trades by quality and pick the ones that do the best, period.
The next challenge with this project is to figure out what entry signals map best to these points. Perhaps by creating an aggregate ‘radar score’ across all trades given a stop loss and feature combination? Perhaps using a decision tree? Perhaps by looking at the raw feature values to see if they have some sort of correlation to these periods of time? The sky is the limit.
But what is clear is that you now have a better grasp of what trade quality actually is. If you want the full code to recreate the findings here, you can purchase it from Gumroad here:
Stay tuned for next time as we move this project forward and dig further into this massive Big Data project.
Happy researching!