Algorithmic backtesting requires knowledge of many areas, including psychology, mathematics, statistics, software development and market/exchange microstructure. I couldn't hope to cover all of those topics in one article, so I'm going to split them into two or three smaller pieces. What will we discuss in this section? I'll begin by defining backtesting and then I will describe the basics of how it is carried out. Then I will elucidate upon the biases we touched upon in the Beginner's Guide to Quantitative Trading. Next I will present a comparison of the various available backtesting software options.

In subsequent articles we will look at the details of strategy implementations that are often barely mentioned or ignored. We will also consider how to make the backtesting process more realistic by including the idiosyncrasies of a trading exchange. Then we will discuss transaction costs and how to correctly model them in a backtest setting. We will end with a discussion on the performance of our backtests and finally provide an example of a common quant strategy, known as a mean-reverting pairs trade.

Let's begin by discussing what backtesting is and why we should carry it out in our algorithmic trading.

Algorithmic trading stands apart from other types of investment classes because we can more reliably provide expectations about future performance from past performance, as a consequence of abundant data availability. The process by which this is carried out is known as backtesting.

In simple terms, backtesting is carried out by exposing your particular strategy algorithm to a stream of historical financial data, which leads to a set of trading signals. Each trade (which we will mean here to be a 'round-trip' of two signals) will have an associated profit or loss. The accumulation of this profit/loss over the duration of your strategy backtest will lead to the total profit and loss (also known as the 'P&L' or 'PnL'). That is the essence of the idea, although of course the "devil is always in the details"!
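As a toy illustration of this accumulation, consider pairing 'BUY' and 'SELL' signals into round-trips and summing the profit or loss of each (the prices and signals here are made-up values, not a real strategy):

```python
# A minimal sketch of turning a signal stream into round-trip P&L.
prices  = [100.0, 101.5, 103.0, 102.0, 104.5]
signals = ["BUY", None, "SELL", "BUY", "SELL"]  # entry/exit pairs

pnl = 0.0
entry_price = None
for price, signal in zip(prices, signals):
    if signal == "BUY" and entry_price is None:
        entry_price = price            # open a round-trip
    elif signal == "SELL" and entry_price is not None:
        pnl += price - entry_price     # close it, accumulate P&L
        entry_price = None

print(pnl)  # → 5.5, the total P&L over the backtest
```

A real backtester would also track position sizing, open positions at the end of the test and, crucially, transaction costs, but the core accumulation loop is no more complicated than this.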

What are key reasons for backtesting an algorithmic strategy?

- Filtration - If you recall from the article on Strategy Identification, our goal at the initial research stage was to set up a strategy pipeline and then filter out any strategy that did not meet certain criteria. Backtesting provides us with another filtration mechanism, as we can eliminate strategies that do not meet our performance needs.
- Modelling - Backtesting allows us to (safely!) test new models of certain market phenomena, such as transaction costs, order routing, latency, liquidity or other market microstructure issues.
- Optimisation - Although strategy optimisation is fraught with biases, backtesting allows us to increase the performance of a strategy by modifying the quantity or values of the parameters associated with that strategy and recalculating its performance.
- Verification - Our strategies are often sourced externally, via our strategy pipeline. Backtesting a strategy ensures that it has not been incorrectly implemented. Although we will rarely have access to the signals generated by external strategies, we will often have access to the performance metrics such as the Sharpe Ratio and Drawdown characteristics. Thus we can compare them with our own implementation.

Backtesting provides a host of advantages for algorithmic trading. However, it is not always possible to straightforwardly backtest a strategy. In general, as the frequency of the strategy increases, it becomes harder to correctly model the microstructure effects of the market and exchanges. This leads to less reliable backtests and thus a trickier evaluation of a chosen strategy. This is a particular problem where the execution system is the key to the strategy performance, as with ultra-high frequency algorithms.

Unfortunately, backtesting is fraught with biases of all types. We have touched upon some of these issues in previous articles, but we will now discuss them in depth.

There are many biases that can affect the performance of a backtested strategy. Unfortunately, these biases have a tendency to inflate the performance rather than detract from it. Thus you should always consider a backtest to be an idealised upper bound on the actual performance of the strategy. It is almost impossible to eliminate biases from algorithmic trading so it is our job to minimise them as best we can in order to make informed decisions about our algorithmic strategies.

There are four major biases that I wish to discuss: Optimisation Bias, Look-Ahead Bias, Survivorship Bias and Psychological Tolerance Bias.

Optimisation Bias

This is probably the most insidious of all backtest biases. It involves adjusting or introducing additional trading parameters until the strategy performance on the backtest data set is very attractive. However, once live, the performance of the strategy can be markedly different. Other names for this bias are "curve fitting" and "data-snooping bias".

Optimisation bias is hard to eliminate as algorithmic strategies often involve many parameters. "Parameters" in this instance might be the entry/exit criteria, look-back periods, averaging periods (i.e. the moving average smoothing parameter) or volatility measurement frequency. Optimisation bias can be minimised by keeping the number of parameters to a minimum and increasing the quantity of data points in the training set. One must also be careful with the latter, however, as older training points can be subject to a prior regime (such as a regulatory environment) and thus may not be relevant to your current strategy.

One method to help mitigate this bias is to perform a sensitivity analysis. This means varying the parameters incrementally and plotting a "surface" of performance. Sound, fundamental reasoning for parameter choices should, with all other factors considered, lead to a smoother parameter surface. If you have a very jumpy performance surface, it often means that a parameter does not reflect a genuine market phenomenon and is instead an artefact of the test data. There is a vast literature on multi-dimensional optimisation algorithms and it is a highly active area of research. I won't dwell on it here, but keep it in the back of your mind when you find a strategy with a fantastic backtest!
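As a rough sketch of such a sensitivity analysis, one could sweep two lookback parameters of a toy moving-average crossover over a synthetic price series and record the performance at each point of the "surface". Everything here, including the strategy rules and the price process, is purely illustrative:

```python
import math

# Synthetic trending/oscillating price series, for illustration only.
prices = [100 + 10 * math.sin(i / 5.0) + 0.1 * i for i in range(200)]

def sma(series, n, i):
    """Simple moving average of the n values ending at index i."""
    return sum(series[i - n + 1:i + 1]) / n

def backtest(fast, slow):
    """Long-only crossover: buy when fast SMA > slow SMA, sell on cross-down."""
    pnl, position, entry = 0.0, 0, 0.0
    for i in range(slow, len(prices)):
        if sma(prices, fast, i) > sma(prices, slow, i) and position == 0:
            position, entry = 1, prices[i]          # enter long
        elif sma(prices, fast, i) < sma(prices, slow, i) and position == 1:
            pnl += prices[i] - entry; position = 0  # exit
    return pnl

# The "surface": performance as a function of (fast, slow).
surface = {(f, s): backtest(f, s)
           for f in (5, 10, 15) for s in (20, 30, 40) if f < s}
for params, perf in sorted(surface.items()):
    print(params, round(perf, 2))
```

If neighbouring parameter pairs produce wildly different P&L, that is a warning sign that the "best" pair is a data artefact rather than a real effect.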

Look-Ahead Bias

Look-ahead bias is introduced into a backtesting system when future data is accidentally included at a point in the simulation where that data would not have actually been available. If we are running the backtest chronologically and we reach time point N, then look-ahead bias occurs if data is included for any point N+k, where k>0. Look-ahead bias errors can be incredibly subtle. Here are three examples of how look-ahead bias can be introduced:

- Technical Bugs - Arrays/vectors in code often have iterators or index variables. Incorrect offsets of these indices can lead to a look-ahead bias by incorporating data at N+k for non-zero k.
- Parameter Calculation - Another common example of look-ahead bias occurs when calculating optimal strategy parameters, such as with linear regressions between two time series. If the whole data set (including future data) is used to calculate the regression coefficients, and thus retroactively applied to a trading strategy for optimisation purposes, then future data is being incorporated and a look-ahead bias exists.
- Maxima/Minima - Certain trading strategies make use of extreme values in any time period, such as incorporating the high or low prices in OHLC data. However, since these maximal/minimal values can only be calculated at the end of a time period, a look-ahead bias is introduced if these values are used -during- the current period. It is always necessary to lag high/low values by at least one period in any trading strategy making use of them.
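The high/low lagging rule above can be sketched in pandas; the one-bar `shift(1)` ensures that the signal at bar t only uses values known at the close of bar t-1. The prices and the breakout threshold here are illustrative:

```python
import pandas as pd

# Hypothetical OHLC bars; column names and values are illustrative.
ohlc = pd.DataFrame({
    "High": [10.0, 11.0, 12.0, 11.5],
    "Low":  [ 9.0,  9.5, 10.5, 10.0],
})

# WRONG: using the current bar's high as a breakout trigger for a trade
# placed during that same bar peeks at a value only known at its close.
signal_biased = ohlc["High"] > 10.5

# RIGHT: lag the extreme values by one period, so the signal at bar t
# only uses information available at the end of bar t-1.
signal_lagged = ohlc["High"].shift(1) > 10.5

print(signal_lagged.tolist())  # → [False, False, True, True]
```

Note that the biased and lagged signals differ at the second bar, which is exactly the kind of one-bar discrepancy that quietly inflates backtest performance.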

As with optimisation bias, one must be extremely careful to avoid introducing look-ahead bias. It is often the main reason why trading strategies significantly underperform their backtests in "live trading".

Survivorship Bias

Survivorship bias is a particularly dangerous phenomenon and can lead to significantly inflated performance for certain strategy types. It occurs when strategies are tested on datasets that do not include the full universe of prior assets that may have been chosen at a particular point in time, but only consider those that have "survived" to the current time.

As an example, consider testing a strategy on a random selection of equities before and after the 2001 market crash. Some technology stocks went bankrupt, while others managed to stay afloat and even prospered. If we had restricted this strategy only to stocks which made it through the market drawdown period, we would be introducing a survivorship bias because they have already demonstrated their success to us. In fact, this is just another specific case of look-ahead bias, as future information is being incorporated into past analysis.

There are two main ways to mitigate survivorship bias in your strategy backtests:

- Survivorship Bias Free Datasets - In the case of equity data it is possible to purchase datasets that include delisted entities, although they are not cheap and only tend to be utilised by institutional firms. In particular, Yahoo Finance data is NOT survivorship bias free, and this is commonly used by many retail algo traders. One can also trade in asset classes that are not prone to survivorship bias, such as certain commodities (and their futures derivatives).
- Use More Recent Data - In the case of equities, utilising a more recent data set mitigates the possibility that the stock selection chosen is weighted to "survivors", simply as there is less likelihood of overall stock delisting in shorter time periods. One can also start building a personal survivorship-bias free dataset by collecting data from the current point onward. After 3-4 years, you will have a solid survivorship-bias free set of equities data with which to backtest further strategies.

We will now consider certain psychological phenomena that can influence your trading performance.

Psychological Tolerance Bias

This particular phenomenon is not often discussed in the context of quantitative trading. However, it is discussed extensively in regard to more discretionary trading methods. It has various names, but I've decided to call it "psychological tolerance bias" because it captures the essence of the problem. When creating backtests over a period of 5 years or more, it is easy to look at an upwardly trending equity curve, calculate the compounded annual return, Sharpe ratio and even drawdown characteristics and be satisfied with the results. As an example, the strategy might possess a maximum relative drawdown of 25% and a maximum drawdown duration of 4 months. This would not be atypical for a momentum strategy. It is straightforward to convince oneself that it is easy to tolerate such periods of losses because the overall picture is rosy. However, in practice, it is far harder!

If historical drawdowns of 25% or more occur in the backtests, then in all likelihood you will see periods of similar drawdown in live trading. These periods of drawdown are psychologically difficult to endure. I have observed first hand what an extended drawdown can be like, in an institutional setting, and it is not pleasant - even if the backtests suggest such periods will occur. The reason I have termed it a "bias" is that often a strategy which would otherwise be successful is stopped from trading during times of extended drawdown and thus will lead to significant underperformance compared to a backtest. Thus, even though the strategy is algorithmic in nature, psychological factors can still have a heavy influence on profitability. The takeaway is to ensure that if you see drawdowns of a certain percentage and duration in the backtests, then you should expect them to occur in live trading environments, and will need to persevere in order to reach profitability once more.

The software landscape for strategy backtesting is vast. Solutions range from fully-integrated institutional grade sophisticated software through to programming languages such as C++, Python and R where nearly everything must be written from scratch (or suitable 'plugins' obtained). As quant traders we are interested in the balance of being able to "own" our trading technology stack versus the speed and reliability of our development methodology. Here are the key considerations for software choice:

- Programming Skill - The choice of environment will in a large part come down to your ability to program software. I would argue that being in control of the total stack will have a greater effect on your long term P&L than outsourcing as much as possible to vendor software. This is due to the downside risk of having external bugs or idiosyncrasies that you are unable to fix in vendor software, which would otherwise be easily remedied if you had more control over your "tech stack". You also want an environment that strikes the right balance between productivity, library availability and speed of execution. I make my own personal recommendation below.
- Execution Capability/Broker Interaction - Certain backtesting software, such as TradeStation, ties in directly with a brokerage. I am not a fan of this approach as reducing transaction costs is often a big component of achieving a higher Sharpe ratio. If you're tied into a particular broker (and TradeStation "forces" you to do this), then you will have a harder time transitioning to new software (or a new broker) if the need arises. Interactive Brokers provide an API which is robust, albeit with a slightly obtuse interface.
- Customisation - An environment like MATLAB or Python gives you a great deal of flexibility when creating algo strategies as they provide fantastic libraries for nearly any mathematical operation imaginable, but also allow extensive customisation where necessary.
- Strategy Complexity - Certain software just isn't cut out for heavy number crunching or mathematical complexity. Excel is one such piece of software. While it is good for simpler strategies, it cannot really cope with numerous assets or more complicated algorithms, at speed.
- Bias Minimisation - Does a particular piece of software or data lend itself more to trading biases? If you choose to create all of the functionality yourself, you need to make sure you don't introduce bugs that can lead to biases.
- Speed of Development - One shouldn't have to spend months and months implementing a backtest engine. Prototyping should only take a few weeks. Make sure that your software is not hindering your progress to any great extent, just to grab a few extra percentage points of execution speed. C++ is the "elephant in the room" here!
- Speed of Execution - If your strategy is completely dependent upon execution timeliness (as in HFT/UHFT) then a language such as C or C++ will be necessary. However, you will be verging on Linux kernel optimisation and FPGA usage for these domains, which is outside the scope of this article!
- Cost - Many of the software environments that you can program algorithmic trading strategies with are completely free and open source. In fact, many hedge funds make use of open source software for their entire algo trading stacks. In addition, Excel and MATLAB are both relatively cheap and there are even free alternatives to each.

Now that we have listed the criteria with which we need to choose our software infrastructure, I want to run through some of the more popular packages and how they compare:

Note: I am only going to include software that is available to most retail practitioners and software developers, as this is the readership of the site. While other software is available such as the more institutional grade tools, I feel these are too expensive to be effectively used in a retail setting and I personally have no experience with them.

Backtesting Software Comparison

Excel

Description: WYSIWYG (what-you-see-is-what-you-get) spreadsheet software. Extremely widespread in the financial industry. Data and algorithm are tightly coupled.

Execution: Yes, Excel can be tied into most brokerages.

Customisation: VBA macros allow more advanced functionality at the expense of hiding implementation.

Strategy Complexity: More advanced statistical tools are harder to implement as are strategies with many hundreds of assets.

Bias Minimisation: Look-ahead bias is easy to detect via cell-highlighting functionality (assuming no VBA).

Development Speed: Quick to implement basic strategies.

Execution Speed: Slow execution speed - suitable only for lower-frequency strategies.

Cost: Cheap or free (depending upon license).

Alternatives: OpenOffice

MATLAB

Description: Programming environment originally designed for computational mathematics, physics and engineering. Very well suited to vectorised operations and those involving numerical linear algebra. Provides a wide array of plugins for quant trading. In widespread use in quantitative hedge funds.

Execution: No native execution capability, MATLAB requires a separate execution system.

Customisation: Huge array of community plugins for nearly all areas of computational mathematics.

Strategy Complexity: Many advanced statistical methods already available and well-tested.

Bias Minimisation: Harder to detect look-ahead bias, requires extensive testing.

Development Speed: Short scripts can create sophisticated backtests easily.

Execution Speed: Assuming a vectorised/parallelised algorithm, MATLAB is highly optimised. Poor for traditional iterated loops.

Cost: ~1,000 USD for a license.

Alternatives: Octave, SciLab

Python

Description: High-level language designed for speed of development. Wide array of libraries for nearly any programmatic task imaginable. Gaining wider acceptance in the hedge fund and investment bank community. Not quite as fast as C/C++ for execution speed.

Execution: Python plugins exist for larger brokers, such as Interactive Brokers. Hence backtest and execution system can all be part of the same "tech stack".

Customisation: Python has a very healthy development community and is a mature language. NumPy/SciPy provide fast scientific computing and statistical analysis tools relevant for quant trading.

Strategy Complexity: Many plugins exist for the main algorithms, but not quite as big a quant community as exists for MATLAB.

Bias Minimisation: Same bias minimisation problems exist as for any high level language. Need to be extremely careful about testing.

Development Speed: Python's main advantage is development speed, with robust built-in testing capabilities.

Execution Speed: Not quite as fast as C++, but scientific computing components are optimised and Python can talk to native C code with certain plugins.

Cost: Free/Open Source

Alternatives: Ruby, Erlang, Haskell

R

Description: Environment designed for advanced statistical methods and time series analysis. Wide array of specific statistical, econometric and native graphing toolsets. Large developer community.

Execution: R possesses plugins to some brokers, in particular Interactive Brokers. Thus an end-to-end system can be written entirely in R.

Customisation: R can be customised with any package, but its strengths lie in statistical/econometric domains.

Strategy Complexity: Mostly useful if performing econometric, statistical or machine-learning strategies due to available plugins.

Bias Minimisation: Similar level of bias possibility for any high-level language such as Python or C++. Thus testing must be carried out.

Development Speed: R is rapid for writing strategies based on statistical methods.

Execution Speed: R is slower than C++, but remains relatively optimised for vectorised operations (as with MATLAB).

Cost: Free/Open Source

Alternatives: SPSS, Stata

C++

Description: Mature, high-level language designed for speed of execution. Wide array of quantitative finance and numerical libraries. Harder to debug and often takes longer to implement than Python or MATLAB. Extremely prevalent on both the buy- and sell-side.

Execution: Most brokerage APIs are written in C++ and Java. Thus many plugins exist.

Customisation: C/C++ allows direct access to underlying memory, hence ultra-high frequency strategies can be implemented.

Strategy Complexity: C++ STL provides wide array of optimised algorithms. Nearly any specialised mathematical algorithm possesses a free, open-source C/C++ implementation on the web.

Bias Minimisation: Look-ahead bias can be tricky to eliminate, but no harder than in any other high-level language. Good debugging tools exist, but one must be careful when dealing with underlying memory.

Development Speed: C++ is quite verbose compared to Python or MATLAB for the same algorithm. More lines-of-code (LOC) often leads to a greater likelihood of bugs.

Execution Speed: C/C++ has extremely fast execution speed and can be well optimised for specific computational architectures. This is the main reason to utilise it.

Cost: Various compilers: Linux/GCC is free, MS Visual Studio has differing licenses.

Alternatives: C#, Java, Scala

Different strategies will require different software packages. HFT and UHFT strategies will be written in C/C++ (these days they are often carried out on GPUs and FPGAs), whereas low-frequency directional equity strategies are easy to implement in TradeStation, due to the "all in one" nature of the software/brokerage.

My personal preference is for Python as it provides the right degree of customisation, speed of development, testing capability and execution speed for my needs and strategies. If I need anything faster, I can "drop in" to C++ directly from my Python programs. One method favoured by many quant traders is to prototype their strategies in Python and then convert the slower execution sections to C++ in an iterative manner. Eventually the entire algo is written in C++ and can be "left alone to trade"!
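As a minimal sketch of what "dropping in" to native code from Python can look like, the `ctypes` module in the standard library loads a shared C library directly. This assumes a Unix-like system where the C math library can be located; a real strategy would of course call its own compiled C++ routines rather than `sqrt`:

```python
import ctypes
import ctypes.util

# Locate the system C math library; fall back to symbols already
# loaded into the process if lookup fails. Assumes a Unix-like OS.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature so ctypes marshals doubles correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # → 1.4142135623730951
```

In practice most quant developers prefer Cython or pybind11 for anything non-trivial, but the principle is the same: keep the research and orchestration code in Python and push only the genuinely hot loops down to native code.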

In the next few articles on backtesting we will take a look at some particular issues surrounding the implementation of an algorithmic trading backtesting system, as well as how to incorporate the effects of trading exchanges. We will discuss strategy performance measurement and finally conclude with an example strategy.

Estimating the risk of loss to an algorithmic trading strategy, or portfolio of strategies, is of extreme importance for long-term capital growth. Many techniques for risk management have been developed for use in institutional settings. One technique in particular, known as Value at Risk or VaR, will be the topic of this article.

We will be applying the concept of VaR to a single strategy or a set of strategies in order to help us quantify risk in our trading portfolio. The definition of VaR is as follows:

**VaR provides an estimate, under a given degree of confidence, of the size of a loss from a portfolio over a given time period.**

In this instance "portfolio" can refer to a single strategy, a group of strategies, a trader's book, a prop desk, a hedge fund or an entire investment bank. The "given degree of confidence" will be a value of, say, 95% or 99%. The "given time period" will be chosen to reflect one that would lead to a minimal market impact if a portfolio were to be liquidated.

For example, a VaR equal to 500,000 USD at 95% confidence level for a time period of a day would simply state that there is a 95% probability of losing no more than 500,000 USD in the following day. Mathematically this is stated as:

P(L≤−5.0×10^5)=0.05

Or, more generally, for loss L exceeding a value VaR with a confidence level c we have:

P(L≤−VaR)=1−c

The "standard" calculation of VaR makes the following assumptions:

- Standard Market Conditions - VaR is not supposed to consider extreme events or "tail risk", rather it is supposed to provide the expectation of a loss under normal "day-to-day" operation.
- Volatilities and Correlations - VaR requires the volatilities of the assets under consideration, as well as their respective correlations. These two quantities are tricky to estimate and are subject to continual change.
- Normality of Returns - VaR, in its standard form, assumes the returns of the asset or portfolio are normally distributed. This leads to more straightforward analytical calculation, but it is quite unrealistic for most assets.

VaR is pervasive in the financial industry, hence you should be familiar with the benefits and drawbacks of the technique. Some of the advantages of VaR are as follows:

- VaR is very straightforward to calculate for individual assets, algo strategies, quant portfolios, hedge funds or even bank prop desks.
- The time period associated with the VaR can be modified for multiple trading strategies that have different time horizons.
- Different values of VaR can be associated with different forms of risk, say broken down by asset class or instrument type. This makes it easy to interpret where the majority of portfolio risk may be clustered, for instance.
- Individual strategies can be constrained as can entire portfolios based on their individual VaR.
- VaR is straightforward to interpret by (potentially) non-technical external investors and fund managers.

However, VaR is not without its disadvantages:

- VaR does not discuss the magnitude of the expected loss beyond the value of VaR, i.e. it will tell us that we are likely to see a loss exceeding a value, but not how much it exceeds it.
- It does not take into account extreme events, but only typical market conditions.
- Since it uses historical data (it is rearward-looking) it will not take into account future market regime shifts that can change volatilities and correlations of assets.

VaR should not be used in isolation. It should always be used with a suite of risk management techniques, such as diversification, optimal portfolio allocation and prudent use of leverage.

As of yet we have not discussed the actual calculation of VaR, either in the general case or a concrete trading example. There are three techniques that will be of interest to us. The first is the variance-covariance method (using normality assumptions), the second is a Monte Carlo method (based on an underlying, potentially non-normal, distribution) and the third is known as historical bootstrapping, which makes use of historical returns information for assets under consideration.

In this article we will concentrate on the Variance-Covariance Method and in later articles will consider the Monte Carlo and Historical Bootstrap methods.

Consider a portfolio of P dollars, with a confidence level c. We are considering daily returns, with asset (or strategy) historical standard deviation σ and mean μ. Then the daily VaR, under the variance-covariance method for a single asset (or strategy) is calculated as:

VaR = P − P(α(1−c)+1)

Where α(1−c) is the inverse of the cumulative distribution function (i.e. the quantile function) of a normal distribution with mean μ and standard deviation σ, evaluated at 1−c.

We can use the SciPy and pandas libraries in Python to calculate these values. If we set P=10^6 and c=0.99, we can use the SciPy ppf method to generate the values for the inverse cumulative distribution function of a normal distribution, with μ and σ obtained from some real financial data, in this case the historical daily returns of Citigroup (we could easily substitute the returns of an algorithmic strategy here):

```python
# var.py

import datetime

import numpy as np
# pandas.io.data has been removed from pandas itself; the separate
# pandas_datareader package supersedes it. Note that the Yahoo Finance
# endpoint has changed over the years and may require updated tooling.
from pandas_datareader import data as web
from scipy.stats import norm


def var_cov_var(P, c, mu, sigma):
    """
    Variance-Covariance calculation of daily Value-at-Risk
    using confidence level c, with mean of returns mu
    and standard deviation of returns sigma, on a portfolio
    of value P.
    """
    alpha = norm.ppf(1 - c, mu, sigma)
    return P - P * (alpha + 1)


if __name__ == "__main__":
    start = datetime.datetime(2010, 1, 1)
    end = datetime.datetime(2014, 1, 1)

    citi = web.DataReader("C", "yahoo", start, end)
    citi["rets"] = citi["Adj Close"].pct_change()

    P = 1e6    # 1,000,000 USD
    c = 0.99   # 99% confidence interval
    mu = np.mean(citi["rets"])
    sigma = np.std(citi["rets"])

    var = var_cov_var(P, c, mu, sigma)
    print("Value-at-Risk: $%0.2f" % var)
```

The calculated value of VaR is given by:

Value-at-Risk: $56510.29

VaR is an extremely useful and pervasive technique in all areas of financial management, but it is not without its flaws. We have yet to discuss the actual value of what could be lost in a portfolio, rather just that it may exceed a certain amount some of the time.

In follow-up articles we will not only discuss alternative calculations for VaR, but also outline the concept of Expected Shortfall (also known as Conditional Value at Risk), which provides an answer to how much is likely to be lost.

The post is suitable for those who are beginning quantitative trading as well as those who have had some experience with the area. The post discusses the common pitfalls of backtesting, as well as some uncommon ones!

It also looks at the different sorts of backtesting mechanisms as well as the software landscape that implements these approaches. Then we discuss whether it is worth building your own backtester, even with the prevalence of open source tools available today.

Finally, we discuss the ins-and-outs of an event-driven backtesting system, a topic that I've covered frequently on QuantStart in prior posts.

A backtest is the application of trading strategy rules to a set of historical pricing data.

That is, if we define a set of mechanisms for entry and exit into a portfolio of assets, and apply those rules to historical pricing data of those assets, we can attempt to understand the performance of this "trading strategy" that might have been attained in the past.

It was once said that "All models are wrong, but some are useful". The same is true of backtests. So what purpose do they serve?

Backtests ultimately help us decide whether it is worth live-trading a set of strategy rules. They provide us with an idea of how a strategy might have performed in the past. Essentially they allow us to filter out bad strategy rules before we allocate any real capital.

It is easy to generate backtests. Unfortunately backtest results are not live trading results. They are instead a model of reality. A model that usually contains many assumptions.

There are two main types of software backtest - the "for-loop" and the "event-driven" systems.

When designing backtesting software there is always a trade-off between accuracy and implementation complexity. The above two backtesting types represent either end of the spectrum for this tradeoff.
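A toy contrast of the two types, under heavily simplified assumptions: the "for-loop" style applies the rules inline as it iterates over bars, while the "event-driven" style pushes market events through a queue and reacts to each one, mirroring the structure of a live-trading system:

```python
from collections import deque

prices = [100.0, 101.0, 99.5, 102.0]  # illustrative bar closes

# For-loop style: rules applied directly per bar.
forloop_signals = []
for i in range(1, len(prices)):
    forloop_signals.append("BUY" if prices[i] > prices[i - 1] else "SELL")

# Event-driven style: market events flow through a queue and a
# strategy component reacts to each one, emitting signals.
events = deque(("MARKET", i, p) for i, p in enumerate(prices))
event_signals = []
last_price = None
while events:
    _, i, price = events.popleft()
    if last_price is not None:
        event_signals.append("BUY" if price > last_price else "SELL")
    last_price = price

print(forloop_signals == event_signals)  # → True: same rules, same signals
```

For identical rules the two styles produce identical signals; the event-driven version's advantage is structural, since the same queue can later carry fill events, order events and live market data without rewriting the strategy logic.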

There are many pitfalls associated with backtesting. They all concern the fact that a backtest is just a model of reality. Some of the more common pitfalls include:

- In-Sample Testing - This occurs when you utilise the same data to "train" your trading models as well as to "test" it. It almost always inflates the performance of a strategy beyond that which would be seen in live trading. This is because it has not been validated on unseen data, which will likely differ markedly from the training data. In essence, it is a form of overfitting.
- Survivorship Bias - For stock market indices like the S&P500, a periodic process of listing and de-listing occurs, changing the composition over time. By failing to take into account this changing composition over a backtest, trading strategies are automatically "picking the winners" by virtue of ignoring all the companies that fell out of the index due to low market capitalisation. Hence it is always necessary to use survivorship-bias free data when carrying out longer-term backtests.
- Look-Ahead Bias - Future data can "sneak in" to backtests in very subtle ways. Consider calculating a linear regression ratio over a particular time-frame. If this ratio is then used in the same sample, then we have implicitly brought in future data and thus will have likely inflated performance. Event-driven backtesters largely solve this problem, as we will discuss below.
- Market Regime Change - This concerns the fact that stock market "parameters" are not stationary. That is, the underlying process generating stock movements need not have parameters that stay constant in time. This makes it hard to generalise parametrised models (of which many trading strategies are instances) and thus performance is likely to be higher in backtests than in live trading.
- Transaction Costs - Many For-Loop backtests do not take into account even basic transaction costs, such as fees or commissions. This is particularly true in academic papers where backtests are largely conducted free of transaction costs. Unfortunately it is all too easy to find strategies that are highly profitable without transaction costs, but make substantial losses when subjected to a real market. Typical costs include spread, market impact and slippage. All of these should be accounted for in realistic backtests.
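As a small illustration of the Look-Ahead Bias point above, here is a hedged pandas sketch (the prices are synthetic) contrasting a z-score computed with full-sample statistics, which silently uses future data, against one computed with rolling statistics that use only information available at each bar:

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (a random walk, purely for illustration)
rng = np.random.default_rng(42)
prices = pd.Series(100 + rng.normal(0, 1, 500).cumsum())

# Biased: every point "knows" the mean/std of the entire sample,
# including data that lies in its future.
z_biased = (prices - prices.mean()) / prices.std()

# Unbiased: rolling statistics only use the trailing 60 bars
# available at the time of each observation.
roll = prices.rolling(60)
z_clean = (prices - roll.mean()) / roll.std()
```

The rolling version is undefined for the first window of bars, which is exactly the point: at those times there was not yet enough history to compute the statistic.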

There are some more subtle issues with backtesting that are not discussed as often, but are still incredibly important to consider. They include:

- OHLC Data - OHLC data, that is the type of daily data taken from free sites such as Yahoo Finance, is often an amalgamation of multiple exchange feeds. Hence it is unlikely that some of the more extreme prices seen (including the High and Low of the day) would have been obtainable by a live trading system. Such "order routing" needs to be considered as part of a model.
- Capacity Constraints - When backtesting it is easy to utilise an "infinite" pot of money. However, in reality capital, as well as margin, is tightly constrained. It is necessary also to think of Average Daily Volume (ADV) limits, especially for small-cap stocks where it is possible that our trades might indeed move the market. Such "market impact" effects would need to be taken into account for risk management purposes.
- Benchmark Choice - Is the choice of benchmark against which the backtested strategy is being measured a good one? For instance if you are trading commodity futures and are neutral to the S&P500 US equity index, does it really make sense to use the S&P500 as your benchmark? Would a basket of other commodity trading funds make more sense?
- Robustness - By varying the starting time of your strategy within your backtest do the results change dramatically? It should not matter for a longer term strategy whether the backtest is started on a Monday or a Thursday. However, if it is sensitive to the "initial conditions" how can you reliably predict future performance when live trading?
- Overfitting/Bias-Variance Tradeoff - We've discussed this a little above in the In-Sample Testing point. However, overfitting is a broader problem for all (supervised) machine learning methods. The only real way to "solve" this problem is via careful use of cross-validation techniques. Even then, we should be extremely careful that we haven't simply fitted our trading strategies to noise in the training set.
- Psychological Tolerance - Psychology is often ignored in quant finance because (supposedly) it is removed by creating an algorithmic system. However, it always creeps in because quants have a tendency to "tinker" or "override" the system once deployed live. In addition, what may seem tolerable in a backtest, might be stomach-churning in live trading. If your backtested equity curve shows a 50% drawdown at some point in its trading history, could you also ride this through in a live trading scenario?

Much has been written about the problems with backtesting. Tucker Balch and Ernie Chan both consider the issues at length.

A For-Loop Backtester is the most straightforward type of backtesting system and the variant most often seen in quant blog posts, purely for its simplicity and transparency.

Essentially the For-Loop system iterates over every trading day (or OHLC bar), performs some calculation related to the price(s) of the asset(s), such as a Moving Average of the close, and then goes long or short a particular asset (often on the same closing price, but sometimes the day after). The iteration then continues. All the while the total equity is being tracked and stored to later produce an equity curve.

Here is the pseudo-code for such an algorithm:

```
for each trading bar:
    do_something_with_prices()
    buy_sell_or_hold_something()
    next_bar()
```

As you can see the design of such a system is incredibly simple. This makes it attractive for getting a "first look" at the performance of a particular strategy ruleset.
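To make this concrete, here is a minimal For-Loop backtest sketch in Python with pandas. The prices are synthetic and the moving-average crossover rule and window lengths are purely illustrative, not a recommended strategy:

```python
import numpy as np
import pandas as pd

# Synthetic daily closes standing in for real OHLC data
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0.05, 1.0, 250).cumsum())

short_ma = close.rolling(20).mean()
long_ma = close.rolling(50).mean()

equity = [1.0]
position = 0  # 1 = long, -1 = short, 0 = flat
for i in range(1, len(close)):
    # Apply today's return to YESTERDAY's position, so the backtest
    # never trades on information it did not yet have
    ret = close.iloc[i] / close.iloc[i - 1] - 1.0
    equity.append(equity[-1] * (1.0 + position * ret))
    # Update the position using only data available at bar i
    if short_ma.iloc[i] > long_ma.iloc[i]:
        position = 1
    elif short_ma.iloc[i] < long_ma.iloc[i]:
        position = -1

equity_curve = pd.Series(equity, index=close.index)
```

Note that this sketch fills at the close with no spread, fees or slippage, which is exactly the unrealism discussed below.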

For-Loop backtesters are straightforward to implement in nearly any programming language and are very fast to execute. The latter advantage means that many parameter combinations can be tested in order to optimise the trading setup.

The main disadvantage with For-Loop backtesters is that they are quite unrealistic. They often have no transaction cost capability unless specifically added. Usually orders are filled immediately "at market" with the midpoint price. As such there is often no accounting for spread.

There is minimal code re-use between the backtesting system and the live-trading system. This means that code often needs to be written twice, introducing the possibility of more bugs.

For-Loop backtesters are prone to Look-Ahead Bias, due to bugs with indexing. For instance, should you have used "i", "i+1" or "i-1" in your panel indexing?
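A common defence in pandas is to lag the signal by one bar with `shift(1)`, so that each day's trade can only be based on the previous day's information. The toy series below is purely illustrative:

```python
import pandas as pd

close = pd.Series([100.0, 101.0, 102.0, 101.0, 103.0])

# Naive signal: long whenever the close is above its 2-bar mean.
# Trading on this at the SAME bar's close would be look-ahead bias.
signal = (close > close.rolling(2).mean()).astype(int)

# Lag by one bar: today's trade uses yesterday's signal only
tradable_signal = signal.shift(1).fillna(0).astype(int)
```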

For-Loop backtesters should really be utilised solely as a filtration mechanism. You can use them to eliminate the obviously bad strategies, but you should remain skeptical of strong performance. Further research is often required. Strategies rarely perform better in live trading than they do in backtests!

Event-Driven Backtesters lie at the other end of the spectrum. They are much more akin to live-trading infrastructure implementations. As such, they are often more realistic in the difference between backtested and live trading performance.

Such systems are run in a large "while" loop that continually looks for "events" of differing types in the "event queue". Potential events include:

- Tick Events - Signify arrival of new market data
- Signal Events - Generation of new trading signals
- Order Events - Orders ready to be sent to market broker
- Fill Events - Fill information from the market broker

When a particular event is identified it is routed to the appropriate module(s) in the infrastructure, which handles the event and then potentially generates new events which go back to the queue.

The pseudo-code for an Event-Driven backtesting system is as follows:

```
while event_queue_isnt_empty():
    event = get_latest_event_from_queue()
    if event.type == "tick":
        strategy.calculate_trading_signals(event)
    elif event.type == "signal":
        portfolio.handle_signal(event)
    elif event.type == "order":
        portfolio.handle_order(event)
    elif event.type == "fill":
        portfolio.handle_fill(event)
sleep(600)  # Sleep for, say, 10 mins
```
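A stripped-down but runnable version of this loop can be sketched with the Python standard library's `queue` module. The `Strategy` and `Portfolio` classes here are hypothetical stand-ins for the real modules, and the always-long rule exists only to push one event through the full tick → signal → order → fill chain:

```python
import queue
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    payload: object = None

class Strategy:
    """Turns tick events into signal events (trivial always-long rule)."""
    def calculate_trading_signals(self, event, events):
        events.put(Event("signal", {"direction": "LONG"}))

class Portfolio:
    """Routes signals to orders and records resulting fills."""
    def __init__(self):
        self.fills = []
    def handle_signal(self, event, events):
        events.put(Event("order", {"quantity": 100, **event.payload}))
    def handle_order(self, event, events):
        # In a backtest an execution handler would simulate the fill here
        events.put(Event("fill", event.payload))
    def handle_fill(self, event, events):
        self.fills.append(event.payload)

events = queue.Queue()
strategy, portfolio = Strategy(), Portfolio()
events.put(Event("tick", {"price": 100.0}))  # one simulated market bar

while not events.empty():
    event = events.get()
    if event.type == "tick":
        strategy.calculate_trading_signals(event, events)
    elif event.type == "signal":
        portfolio.handle_signal(event, events)
    elif event.type == "order":
        portfolio.handle_order(event, events)
    elif event.type == "fill":
        portfolio.handle_fill(event, events)
```

Each handler potentially enqueues further events, which is exactly the message-passing design described above.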

As you can see there is a heavy reliance on the portfolio handler module. Such a module is the "heart" of an Event-Driven backtesting system as we will see below.

There are many advantages to using an Event-Driven backtester:

- Elimination of Look-Ahead Bias - By virtue of its message-passing design, Event-Driven systems are usually free from Look-Ahead Bias, at least at the trading level. There is the possibility of introducing bias indirectly through a pre-researched model, however.
- Code Re-Use - For live trading it is only necessary to replace the data handler and execution handler modules. All strategy, risk/position management and performance measurement code is identical. This means there are usually far fewer bugs to fix.
- Portfolio Level - With an Event-Driven system it is much more straightforward to think at the portfolio level. Introducing groups of instruments and strategies is easy, as are hedging instruments.
- "Proper" Risk/Position Management - The risk and position management can easily be modularised. Leverage and methodologies such as the Kelly Criterion can be introduced straightforwardly, as can sector exposure warnings, ADV limits, volatility limits and illiquidity warnings.
- Remote Deployment/Monitoring - Modular nature of the code makes it easier to deploy in "the cloud" or to co-locate the software near an exchange on a virtualised system.

While the advantages are clear, there are also some strong disadvantages to using such a complex system:

- Tricky to Code - Building a fully-tested Event-Driven system will likely take weeks or months of full-time work. A corollary of this is that there is always a healthy market for freelance/contract quant developers!
- Require Object-Orientation - A modular design necessitates using object-oriented programming (OOP) principles, and thus a language that can support OOP easily. This does however make unit testing far more straightforward.
- Software Engineering - More likely to require good software engineering expertise and capabilities such as logging, unit testing, version control and continuous integration.
- Slow Execution - The message-passing nature of the code makes it far slower to execute compared to a vectorised For-Loop approach. Multiple parameter combinations can take a long time to calculate on unoptimised code.

In this section we will consider software (both open source and commercial) that exists for both For-Loop and Event-Driven systems.

For For-Loop backtesters, the main programming languages/software that are used include Python (with the Pandas library), R (and the quantmod library) and MatLab. There are plenty of code snippets to be found on quant blogs. A great list of such blogs can be found on Quantocracy.

The market for Event-Driven systems is much larger, as clients/users often want the software to be capable of both backtesting and live trading in one package.

The expensive commercial offerings include Deltix and QuantHouse. They are often found in quant hedge funds, family offices and prop trading firms.

Cloud-based backtesting and live trading systems are relatively new. Quantopian is an example of a mature web-based setup for both backtesting and live trading.

Institutional quants often also build their own in house software. This is due to a mix of regulatory constraints, investor relations/reporting and auditability.

Retail quants have a choice between using the "cloud+data" approach of Quantopian or "rolling their own" using a cloud vendor such as Amazon Web Services, Rackspace Cloud or Microsoft Azure, along with an appropriate data vendor such as DTN IQFeed or QuantQuote.

In terms of open source software, there are many libraries available. They are mostly written in Python (for reasons I will outline below) and include Zipline (Quantopian), PyAlgoTrade, PySystemTrade (Rob Carver/Investment Idiocy) and QSTrader (QuantStart's own backtester).

One of the most important aspects, however, is that no matter which piece of software you ultimately use, it must be paired with an equally solid source of financial data. Otherwise you will be in a situation of "garbage in, garbage out" and your live trading results will differ substantially from your backtests.

While software takes care of many details for us, it also hides many implementation details that are often crucial when we wish to expand our trading strategy complexity. At some point it is often necessary to write our own systems and the first question that arises is "Which programming language should I use?".

Despite having a background as a quantitative software developer I am not personally interested in "language wars". There are only so many hours in the day and, as quants, we need to get things done - not spend time arguing language design on internet forums!

We should only be interested in what works. Here are some of the main contenders:

Python is an extremely easy to learn programming language and is often the first language individuals come into contact with when they decide to learn programming. It has a standard library of tools that can read in nearly any form of data imaginable and talk to any other "service" very easily.

It has some exceptional quant/data science/machine learning (ML) libraries in NumPy, SciPy, Pandas, Scikit-Learn, Matplotlib, PyMC3 and Statsmodels. While it is great for ML and general data science, it does suffer a bit for more extensive classical statistical methods and time series analysis.

It is great for building both For-Loop and Event-Driven backtesting systems. In fact, it is perhaps one of the only languages that straightforwardly permits end-to-end research, backtesting, deployment, live trading, reporting and monitoring.

Perhaps its greatest drawback is that it is quite slow to execute when compared to other languages such as C++. However, work is being carried out to improve this problem and over time Python is becoming faster.

R is a statistical programming environment, rather than a full-fledged "first class programming language" (although some might argue otherwise!). It was designed primarily for performing advanced statistical analysis for time series, classical/frequentist statistics, Bayesian statistics, machine learning and exploratory data analysis.

It is widely used for For-Loop backtesting, often via the quantmod library, but is not particularly well suited to Event-Driven systems or live trading. It does however excel at strategy research.

C++ has a reputation for being extremely fast. Nearly all scientific high-performance computing is carried out either in Fortran or C++. This is its primary advantage. Hence if you are considering high frequency trading, or work on legacy systems in large organisations, then C++ is likely to be a necessity.

Unfortunately it is painful for carrying out strategy research. Due to being statically-typed it is quite tricky to easily load, read and format data compared to Python or R.

Despite its relative age, it has recently been modernised substantially with the introduction of C++11/C++14 and further standards refinements.

You may also wish to take a look at Java, Scala, C#, Julia and many of the functional languages. However, my recommendation is to stick with Python, R and/or C++, as the quant trading communities are much larger.

Answer: Yes!

It is a great learning experience to write your own Event-Driven backtesting system. Firstly, it forces you to consider all aspects of your trading infrastructure, not just spend hours tinkering on a particular strategy.

Even if you don't end up using the system for live trading, it will provide you with a huge number of questions that you should be asking of your commercial or FOSS backtesting vendors.

For example: how does your current live system differ from your backtest simulation in terms of execution, transaction costs, spread, slippage and market impact?

While Event-Driven systems are not quick or easy to write, the experience will pay huge educational dividends later on in your quant trading career.

How do you go about writing such a system?

The best way to get started is to simply download Zipline, QSTrader, PyAlgoTrade, PySystemTrade etc and try reading through the documentation and code. They are all written in Python (due to the reasons I outlined above) and thankfully Python is very much like reading pseudo-code. That is, it is very easy to follow.

I've also written many articles on Event-Driven backtest design, which you can find here, that guide you through the development of each module of the system. Rob Carver, at Investment Idiocy also lays out his approach to building such systems to trade futures.

Remember that you don't have to be an expert on day #1. You can take it slowly, day-by-day, module-by-module. If you need help, you can always contact me or other willing quant bloggers. See the end of the article for my contact email.

I'll now discuss the modules that are often found in many Event-Driven backtesting systems. While not an exhaustive list, it should give you a "flavour" of how such systems are designed.

This is where all of the historical pricing data is stored, along with your trading history, once live. A professional system is not just a few CSV files from Yahoo Finance!

Instead, we use a "first class" database or file system, such as PostgreSQL, MySQL, SQL Server or HDF5.

Ideally, we want to obtain and store tick-level data as it gives us an idea of trading spreads. It also means we can construct our own OHLC bars, at lower frequencies, if desired.
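With pandas this construction of lower-frequency bars is essentially a one-liner: tick data indexed by timestamp can be resampled into OHLC bars at any coarser frequency. The ticks below are synthetic, standing in for a real exchange feed:

```python
import numpy as np
import pandas as pd

# Hypothetical tick data: one synthetic trade price per second
idx = pd.date_range("2020-01-06 09:30:00", periods=600, freq="s")
rng = np.random.default_rng(1)
ticks = pd.Series(100 + rng.normal(0, 0.01, 600).cumsum(), index=idx)

# Aggregate the ticks into 1-minute OHLC bars
bars = ticks.resample("1min").ohlc()
```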

We should always be aware of handling corporate actions (such as stock splits and dividends), survivorship bias (stock de-listing) as well as tracking the timezone differences between various exchanges.

Individual/retail quants can compete here as many production-quality database technologies are mature, free and open source. Data itself is becoming cheaper and "democratised" via sites like Quandl.

There are still plenty of markets and strategies that are too small for the big funds to be interested in. This is a fertile ground for retail quant traders.

The trading strategy module in an Event-Driven system generally runs some kind of predictive or filtration mechanism on new market data.

It receives bar or tick data and then uses these mechanisms to produce a trading signal to long or short an asset. This module is NOT designed to produce a quantity; that is carried out by the position-sizing module.

95% of quant blog discussion usually revolves around trading strategies. I personally believe it should be more like 20%. This is because I think it is far easier to increase expected returns by reducing costs through proper risk management and position sizing, rather than chasing strategies with "more alpha".

The "heart" of an Event-Driven backtester is the Portfolio & Order Management system. It is the area which requires the most development time and quality assurance testing.

The goal of this system is to go from the current portfolio to the desired portfolio, while minimising risk and reducing transaction costs.

The module ties together the strategy, risk, position sizing and order execution capabilities of the system. It also handles the position calculations while backtesting to mimic a brokerage's own calculations.

The primary advantage of using such a complex system is that it allows a variety of financial instruments to be handled under a single portfolio. This is necessary for institutional-style portfolios with hedging. Such complexity is very tricky to code in a For-Loop backtesting system.

Separating out the risk management into its own module can be extremely advantageous. The module can modify, add or veto orders that are sent from the portfolio.

In particular, the risk module can add hedges to maintain market neutrality. It can reduce order sizes due to sector exposure or ADV limits. It can completely veto a trade if the spread is too wide, or fees are too large relative to the trade size.

A separate position sizing module can implement volatility estimation and position sizing rules such as Kelly leverage. In fact, utilising a modular approach allows extensive customisation here, without affecting any of the strategy or execution code.

Such topics are not well-represented in the quant blogosphere. However, this is probably the biggest difference between how institutions and some retail traders think about their trading. Perhaps the simplest way to get better returns is to begin implementing risk management and position sizing in this manner.

In real life we are never guaranteed to get a market fill at the midpoint!

We must consider transactional issues such as capacity, spread, fees, slippage, market impact and other algorithmic execution concerns, otherwise our backtesting returns are likely to be vastly overstated.

The modular approach of an Event-Driven system allows us to easily switch-out the BacktestExecutionHandler with the LiveExecutionHandler and deploy to the remote server.

We can also easily add multiple brokerages utilising the OOP concept of "inheritance". This of course assumes that said brokerages have a straightforward Application Programming Interface (API) and don't force us to utilise a Graphical User Interface (GUI) to interact with their system.
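A minimal sketch of this inheritance pattern might look as follows. The class and method names are illustrative, not those of any particular library, and the broker subclass is a hypothetical placeholder:

```python
from abc import ABC, abstractmethod

class ExecutionHandler(ABC):
    """Common interface: the portfolio only ever sees this base class."""
    @abstractmethod
    def execute_order(self, order: dict) -> dict:
        ...

class BacktestExecutionHandler(ExecutionHandler):
    def execute_order(self, order):
        # Naive simulated fill at the requested terms; a realistic
        # version would model spread, slippage and fees
        return {"status": "FILLED", **order}

class HypotheticalBrokerHandler(ExecutionHandler):
    def execute_order(self, order):
        # Would translate the order into a real brokerage's API calls
        raise NotImplementedError("wire up the broker API here")

# Swapping backtest for live trading is a one-line change:
handler: ExecutionHandler = BacktestExecutionHandler()
fill = handler.execute_order({"symbol": "XYZ", "quantity": 100})
```

Because the rest of the system depends only on the `ExecutionHandler` interface, adding another brokerage means adding another subclass, not rewriting the portfolio code.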

One issue to be aware of is that of "trust" with third party libraries. There are many such modules that make it easy to talk to brokerages, but it is necessary to perform your own testing. Make sure you are completely happy with these libraries before committing extensive capital, otherwise you could lose a lot of money simply due to bugs in these modules.

Retail quants can and should borrow the sophisticated reporting techniques utilised by institutional quants. Such tools include live "dashboards" of the portfolio and corresponding risks, a "backtest equity" vs "live equity" difference or "delta", along with all the "usual" metrics such as costs per trade, the returns distribution, high water mark (HWM), maximum drawdown, average trade latency as well as alpha/beta against a benchmark.

Consistent incremental improvements should be made to this infrastructure. This can really enhance returns over the long term, simply by eliminating bugs and improving issues such as trade latency. Don't simply become fixated on improving the "world's greatest strategy" (WGS).

The WGS will eventually erode due to "alpha decay". Others will eventually discover the edge and will arbitrage away the returns. However, a robust trading infrastructure, a solid strategy research pipeline and continual learning are great ways of avoiding this fate.

Infrastructure optimisation may be more "boring" than strategy development but it becomes significantly less boring when your returns are improved!

Deployment to a remote server, along with extensive monitoring of this remote system, is absolutely crucial for institutional grade systems. Retail quants can and should utilise these ideas as well.

A robust system must be remotely deployed in "the cloud" or co-located near an exchange. Home broadband, power supplies and other factors mean that utilising a home desktop/laptop is too unreliable. Often things fail right at the worst time and lead to substantial losses.

The main issues when considering a remote deployment include: monitoring of hardware such as CPU, RAM/swap, disk and network I/O; high availability and redundancy of systems; a well thought through backup AND restoration plan; extensive logging of all aspects of the system; as well as continuous integration, unit testing and version control.

Remember Murphy's Law - "If it can fail it will fail."

There are many vendors on offer that provide relatively straightforward cloud deployments, including Amazon Web Services, Microsoft Azure, Google and Rackspace. For software engineering tasks vendors include Github, Bitbucket, Travis, Loggly and Splunk, as well as many others.

Unfortunately there is no "quick fix" in quant trading. It involves a lot of hard work and learning in order to be successful.

Perhaps a major stumbling block for beginners (and some intermediate quants!) is that they concentrate too much on the best "strategy". Such strategies always eventually succumb to alpha decay and thus become unprofitable. Hence it is necessary to be continually researching new strategies to add to a portfolio. In essence, the "strategy pipeline" should always be full.

It is also worth investing a lot of time in your trading infrastructure. Spend time on issues such as deployment and monitoring. Always try and be reducing transaction costs, as profitability is as much about reducing costs as it is about gaining trading revenue.

I recommend writing your own backtesting system simply to learn. You can either use it and continually improve it or you can find a vendor and then ask them all of the questions that you have discovered when you built your own. It will certainly make you aware of the limitations of commercially available systems.

Finally, always be reading, learning and improving. There are a wealth of textbooks, trade journals, academic journals, quant blogs, forums and magazines which discuss all aspects of trading. For more advanced strategy ideas I recommend SSRN and arXiv - Quantitative Finance.

It might seem that the only important investor objective is to simply "make as much money as possible". However the reality of long-term trading is more complex. Since market participants have differing risk preferences and constraints there are many objectives that investors may possess.

Many retail traders consider the only goal to be the increase of account equity as much as possible, with little or no consideration given to the "risk" of a strategy. A more sophisticated retail investor would be measuring account drawdowns, but might also be able to stomach quite a drop in equity (say 50%) if they were aware that it was optimal, in the sense of growth rate, in the long term.

An institutional investor would think very differently about risk. It is almost certain that they will have a mandated maximum drawdown (say 20%) and that they would be considering sector allocation and average daily volume limits, which would all be additional constraints on the "optimisation problem" of capital allocation to strategies. These factors might even be more important than maximising the long-term growth rate of the portfolio.

Thus we are in a situation where we can strike a balance between maximising long-term growth rate via leverage and minimising our "risk" by trying to limit the duration and extent of the drawdown. The major tool that will help us achieve this is called the Kelly Criterion.

Within this article the Kelly Criterion is going to be our tool to control leverage of, and allocation towards, a set of algorithmic trading strategies that make up a multi-strategy portfolio.

We will define leverage as the ratio of the size of a portfolio to the actual account equity within that portfolio. To make this clear we can use the analogy of purchasing a house with a mortgage. Your down payment (or "deposit" for those of us in the UK!) constitutes your account equity, while the down payment plus the mortgage value constitutes the equivalent of the size of a portfolio. Thus a down payment of 50,000 USD on a 200,000 USD house (with a mortgage of 150,000 USD) constitutes a leverage of (150000+50000)/50000=4. Thus in this instance you would be 4x leveraged on the house. A margin account portfolio behaves similarly. There is a "cash" component and then more stock can be borrowed on margin, to provide the leverage.

Before we state the Kelly Criterion specifically I want to outline the assumptions that go into its derivation, which have varying degrees of accuracy:

- Each algorithmic trading strategy will be assumed to possess a returns stream that is normally distributed (i.e. Gaussian). Further, each strategy has its own fixed mean and standard deviation of returns. The formula assumes that these mean and standard deviation values do not change, i.e. that they are the same in the future as they were in the past. This is clearly not the case with most strategies, so be aware of this assumption.
- The returns being considered here are excess returns, which means they are net of all financing costs such as interest paid on margin and transaction costs. If the strategy is being carried out in an institutional setting, this also means that the returns are net of management and performance fees.
- All of the trading profits are reinvested and no withdrawals of equity are carried out. This is clearly not as applicable in an institutional setting where the above mentioned management fees are taken out and investors often make withdrawals.
- All of the strategies are statistically independent (there is no correlation between strategies) and thus the covariance matrix between strategy returns is diagonal.

These assumptions are not particularly accurate but we will consider ways to relax them in later articles.

Now we come to the actual Kelly Criterion! Let's imagine that we have a set of N algorithmic trading strategies and we wish to determine both how to apply optimal leverage per strategy in order to maximise growth rate (but minimise drawdowns) and how to allocate capital between each strategy. If we denote the allocation to each strategy i as a vector f of length N, s.t. f=(f1,...,fN), then the Kelly Criterion for the optimal allocation to each strategy fi is given by:

fi = μi/σi²

Where μi is the mean excess return and σi is the standard deviation of excess returns for strategy i. This formula essentially describes the optimal leverage that should be applied to each strategy.

While the Kelly Criterion fi gives us the optimal leverage and strategy allocation, we still need to actually calculate our expected long-term compounded growth rate of the portfolio, which we denote by g. The formula for this is given by:

g = r + S²/2

Where r is the risk-free interest rate, which is the rate at which you can borrow from the broker, and S is the annualised Sharpe Ratio of the strategy. The latter is calculated via the annualised mean excess return divided by the annualised standard deviation of excess returns. See this article for more details.

Note: If you would like to read a more mathematical approach to the Kelly formula, please take a look at Ed Thorp's paper on the topic: The Kelly Criterion in Blackjack Sports Betting, And The Stock Market (2007).

Let's consider an example in the single strategy case (i=1). Suppose we go long a mythical stock XYZ that has a mean annual return of m=10.7% and an annual standard deviation of σ=12.4%. In addition suppose we are able to borrow at a risk-free interest rate of r=3.0%. This implies that the mean excess returns are μ=m−r=10.7−3.0=7.7%. This gives us a Sharpe Ratio of S=0.077/0.124=0.62.

With this we can calculate the optimal Kelly leverage via f = μ/σ² = 0.077/0.124² ≈ 5.01. Thus the Kelly leverage says that for a 100,000 USD portfolio we should borrow an additional 401,000 USD to have a total portfolio value of 501,000 USD. In practice it is unlikely that our brokerage would let us trade with such substantial margin and so the Kelly Criterion would need to be adjusted.

We can then use the Sharpe ratio S and the interest rate r to calculate g, the expected long-term compounded growth rate: g = r + S²/2 = 0.03 + 0.62²/2 ≈ 0.22, i.e. 22%. Thus we should expect a return of 22% a year from this strategy.
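These calculations are easily checked in a few lines of Python, reproducing the figures for the mythical stock XYZ above:

```python
# Annual mean return m, annual volatility sigma and risk-free rate r
# for the mythical stock XYZ from the worked example
m, sigma, r = 0.107, 0.124, 0.03

mu = m - r           # mean excess return: 0.077
f = mu / sigma ** 2  # optimal Kelly leverage, approximately 5.01
S = mu / sigma       # annualised Sharpe ratio, approximately 0.62
g = r + S ** 2 / 2   # long-term compounded growth rate, approximately 22%
```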

It is important to be aware that the Kelly Criterion requires a continuous rebalancing of capital allocation in order to remain valid. Clearly this is not possible in the discrete setting of actual trading and so an approximation must be made. The standard "rule of thumb" here is to update the Kelly allocation once a day. Further, the Kelly Criterion itself should be recalculated periodically, using a trailing mean and standard deviation with a lookback window. Again, for a strategy that trades roughly once a day, this lookback should be set to be on the order of 3-6 months of daily returns.
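A trailing recalculation of this kind might be sketched as follows. The daily excess returns are synthetic and the parameters are hypothetical; a window of 126 trading days corresponds to roughly six months:

```python
import numpy as np
import pandas as pd

# Hypothetical daily excess returns for a strategy (synthetic)
rng = np.random.default_rng(7)
excess = pd.Series(rng.normal(0.0004, 0.01, 500))

# Recompute the Kelly leverage on a trailing 126-day window
window = 126
roll_mu = excess.rolling(window).mean() * 252                 # annualised mean
roll_var = (excess.rolling(window).std() * np.sqrt(252)) ** 2  # annualised variance
kelly = roll_mu / roll_var
```

The resulting series gives a daily-updated leverage estimate; in practice one would cap it well below the raw value, as discussed below.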

Here is an example of rebalancing a portfolio under the Kelly Criterion, which can lead to some counter-intuitive behaviour. Let's suppose we have the strategy described above. We have used the Kelly Criterion to borrow cash to size our portfolio to 501,000 USD. Let's assume we make a healthy 5% return on the following day, which boosts our portfolio size to 526,050 USD. The Kelly Criterion tells us that we should borrow more to keep the same leverage factor of 5.01. In particular our account equity is now 125,050 USD (the original 100,000 USD plus the 25,050 USD profit) on a portfolio of 526,050 USD, which means that the current leverage factor is 4.21. To increase it to 5.01, we need to borrow an additional 100,450.50 USD in order to increase our portfolio size to 626,500.50 USD (this is 5.01×125,050).

Now consider that the following day we lose 10% on our portfolio (ouch!). This means that the total portfolio size is now 563,850.45 USD (626,500.50×0.9). Our total account equity is now 62,399.95 USD (125,050−626,500.50×0.1). This means our current leverage factor is 563,850.45/62,399.95=9.04. Hence we need to reduce our account by selling 251,226.70 USD of stock in order to reduce our total portfolio value to 312,623.75 USD, such that we have a leverage of 5.01 again (312,623.75/62,399.95=5.01).

Hence we have bought into a profit and sold into a loss. This process of selling into a loss may be extremely emotionally difficult, but it is mathematically the "correct" thing to do, assuming that the assumptions of Kelly have been met! It is the approach to follow in order to maximise long-term compounded growth rate.

You may have noticed that the absolute amounts of money being re-allocated between days were rather severe. This is a consequence of both the artificial nature of the example and the extensive leverage employed. A 10% loss in a single day is not particularly common in higher-frequency algorithmic trading, but it does serve to show how large the absolute swings can become when extensive leverage is employed.

Since the estimation of means and standard deviations is always subject to uncertainty, in practice many traders tend to use a more conservative leverage regime such as the Kelly Criterion divided by two, affectionately known as "half-Kelly". The Kelly Criterion should really be considered as an upper bound on the leverage to use, rather than a direct specification. If this advice is not heeded then using the full Kelly value can lead to ruin (i.e. account equity dropping to zero) due to the non-Gaussian nature of strategy returns.
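As a rough sketch of how half-Kelly might be estimated from a trailing window, assuming the continuous/Gaussian approximation f = μ/σ² (with μ the annualised mean excess return and σ² the annualised variance; the function name is my own):

```python
import numpy as np

def half_kelly_leverage(daily_excess_returns, periods_per_year=252):
    """Estimate Kelly leverage f = mu / sigma^2 from a trailing window
    of daily excess returns, then halve it ("half-Kelly").
    Assumes the Gaussian approximation of the Kelly Criterion."""
    mu = np.mean(daily_excess_returns) * periods_per_year      # annualised mean
    sigma2 = np.var(daily_excess_returns) * periods_per_year   # annualised variance
    return (mu / sigma2) / 2.0

# Recalculate periodically on a trailing window, e.g. the last
# 126 trading days (roughly six months) of daily returns
```

In practice this would be recomputed on a rolling basis (e.g. daily) as the lookback window advances.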

Every algorithmic trader is different and the same is true of risk preferences. When choosing to employ a leverage strategy (of which the Kelly Criterion is one example) you should consider the risk mandates that you need to work under. In a retail environment you are able to set your own maximum drawdown limits and thus your leverage can be increased. In an institutional setting you will need to consider risk from a very different perspective and the leverage factor will be one component of a much larger framework, usually under many other constraints.

In later articles we will consider other forms of money (and risk!) management, some of which can help with the additional constraints discussed above.

In addition, if we are presented with two strategies possessing identical returns how do we know which one contains more risk? Further, what do we even mean by "more risk"? In finance, we are often concerned with volatility of returns and periods of drawdown. Thus if one of these strategies has a significantly higher volatility of returns we would likely find it less attractive, despite the fact that its historical returns might be similar if not identical.

These problems of strategy comparison and risk assessment motivate the use of the Sharpe Ratio.

William Forsyth Sharpe is a Nobel-prize winning economist, who helped create the Capital Asset Pricing Model (CAPM) and developed the Sharpe Ratio in 1966 (later updated in 1994).

The Sharpe Ratio S is defined by the following relation:

S = E(Ra − Rb) / √Var(Ra − Rb)

Where Ra is the period return of the asset or strategy and Rb is the period return of a suitable benchmark.

The ratio compares the mean average of the excess returns of the asset or strategy with the standard deviation of those returns. Thus a lower volatility of returns will lead to a greater Sharpe ratio, assuming identical returns.

The "Sharpe Ratio" often quoted by those carrying out trading strategies is the annualised Sharpe, the calculation of which depends upon the trading period over which the returns are measured. Assuming there are N trading periods in a year, the annualised Sharpe is calculated as follows:

S_A = √N × E(Ra − Rb) / √Var(Ra − Rb)

Note that the Sharpe ratio itself MUST be calculated from returns measured at that particular period. For a strategy based on a trading period of days, N=252 (as there are 252 trading days in a year, not 365), and Ra, Rb must be the daily returns. Similarly for hours N=252×6.5=1638, not N=252×24=6048, since there are only 6.5 trading hours in an exchange trading day.

The formula for the Sharpe ratio above alludes to the use of a benchmark. A benchmark is used as a "yardstick" or a "hurdle" that a particular strategy must overcome for it to be worth considering. For instance, a simple long-only strategy using US large-cap equities should hope to beat the S&P500 index on average, or match it for less volatility.

The choice of benchmark can sometimes be unclear. For instance, should a sector Exchange Traded Fund (ETF) be utilised as a performance benchmark for individual equities, or the S&P500 itself? Why not the Russell 3000? Equally, should a hedge fund strategy be benchmarking itself against a market index or an index of other hedge funds? There is also the complication of the "risk free rate". Should domestic government bonds be used? A basket of international bonds? Short-term or long-term bills? A mixture? Clearly there are plenty of ways to choose a benchmark! The Sharpe ratio generally utilises the risk-free rate and often, for US equities strategies, this is based on 10-year US Treasury notes.

In one particular instance, for market-neutral strategies, there is a complication regarding whether to make use of the risk-free rate or zero as the benchmark. The market index itself should not be utilised as the strategy is, by design, market-neutral. The correct choice for a market-neutral portfolio is not to subtract the risk-free rate, because the strategy is self-financing. Since you gain a credit interest, Rf, from holding a margin, the actual calculation for returns is: (Ra+Rf)−Rf=Ra. Hence there is no actual subtraction of the risk-free rate for dollar-neutral strategies.

Despite the prevalence of the Sharpe ratio within quantitative finance, it does suffer from some limitations.

Firstly, the Sharpe ratio is backward looking. It only accounts for the historical returns distribution and volatility, not those occurring in the future. When making judgements based on the Sharpe ratio there is an implicit assumption that the past will be similar to the future. This is evidently not always the case, particularly under market regime changes.

The Sharpe ratio calculation assumes that the returns being used are normally distributed (i.e. Gaussian). Unfortunately, markets often suffer from kurtosis above that of a normal distribution. Essentially the distribution of returns has "fatter tails" and thus extreme events are more likely to occur than a Gaussian distribution would lead us to believe. Hence, the Sharpe ratio is poor at characterising tail risk.
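This fat-tail effect is easy to demonstrate numerically. The sketch below compares the sample excess kurtosis of Gaussian draws with Student-t draws (a common stand-in for fat-tailed returns); the helper name is my own:

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: approximately 0 for Gaussian data,
    positive when the distribution has fatter tails."""
    d = x - np.mean(x)
    return np.mean(d**4) / np.mean(d**2)**2 - 3.0

np.random.seed(0)
gaussian = np.random.standard_normal(100000)
fat_tailed = np.random.standard_t(df=5, size=100000)  # fatter tails

# excess_kurtosis(gaussian) comes out close to zero, while
# excess_kurtosis(fat_tailed) is strongly positive
```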

This can be clearly seen in strategies which are highly prone to such risks. For instance, the sale of call options (aka "pennies under a steam roller"). A steady stream of option premia is generated by the sale of call options over time, leading to a low volatility of returns, with a strong excess above a benchmark. In this instance the strategy would possess a high Sharpe ratio (based on historical data). However, it does not take into account that such options may be exercised, leading to significant and sudden drawdowns (or even wipeout) in the equity curve. Hence, as with any measure of algorithmic trading strategy performance, the Sharpe ratio cannot be used in isolation.

Although this point might seem obvious to some, transaction costs MUST be included in the calculation of the Sharpe ratio in order for it to be realistic. There are countless examples of trading strategies that have high Sharpes (and thus a likelihood of great profitability) only to be reduced to low Sharpe, low profitability strategies once realistic costs have been factored in. This means making use of the net returns (i.e. after costs) when calculating the excess over the benchmark. Hence, transaction costs must be factored in upstream of the Sharpe ratio calculation.

One obvious question that has remained unanswered thus far in this article is "What is a good Sharpe Ratio for a strategy?". Pragmatically, you should ignore any strategy that possesses an annualised Sharpe ratio S<1 after transaction costs. Quantitative hedge funds tend to ignore any strategies that possess Sharpe ratios S<2. One prominent quantitative hedge fund that I am familiar with wouldn't even consider strategies that had Sharpe ratios S<3 while in research. As a retail algorithmic trader, if you can achieve a Sharpe ratio S>2 then you are doing very well.

The Sharpe ratio will often increase with trading frequency. Some high frequency strategies will have high single (and sometimes low double) digit Sharpe ratios, as they can be profitable almost every day and certainly every month. These strategies rarely suffer from catastrophic risk and thus minimise their volatility of returns, which leads to such high Sharpe ratios.

This has been quite a theoretical article up to this point. Now we will turn our attention to some actual examples. We will start simply, by considering a long-only buy-and-hold of an individual equity then consider a market-neutral strategy. Both of these examples have been carried out in the Python pandas data analysis library.

The first task is to actually obtain the data and put it into a pandas DataFrame object. In the article on securities master implementation in Python and MySQL I created a system for achieving this. Alternatively, we can make use of this simpler code to grab Yahoo Finance data directly and put it straight into a pandas DataFrame. At the bottom of this script I have created a function to calculate the annualised Sharpe ratio based on a time-period returns stream:

import datetime
import numpy as np
import pandas as pd
import urllib2


def get_historic_data(ticker,
                      start_date=(2000,1,1),
                      end_date=datetime.date.today().timetuple()[0:3]):
    """
    Obtains data from Yahoo Finance and adds it to a pandas DataFrame object.

    ticker: Yahoo Finance ticker symbol, e.g. "GOOG" for Google, Inc.
    start_date: Start date in (YYYY, M, D) format
    end_date: End date in (YYYY, M, D) format
    """
    # Construct the Yahoo URL with the correct integer query parameters
    # for start and end dates. Note that some parameters are zero-based!
    yahoo_url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=%s&b=%s&c=%s&d=%s&e=%s&f=%s" % \
        (ticker, start_date[1] - 1, start_date[2], start_date[0],
         end_date[1] - 1, end_date[2], end_date[0])

    # Try connecting to Yahoo Finance and obtaining the data
    # On failure, print an error message
    try:
        yf_data = urllib2.urlopen(yahoo_url).readlines()
    except Exception, e:
        print "Could not download Yahoo data: %s" % e

    # Create the (temporary) Python data structures to store
    # the historical data
    date_list = []
    hist_data = [[] for i in range(6)]

    # Format and copy the raw text data into datetime objects
    # and floating point values (still in native Python lists)
    for day in yf_data[1:]:  # Avoid the header line in the CSV
        headers = day.rstrip().split(',')
        date_list.append(datetime.datetime.strptime(headers[0], '%Y-%m-%d'))
        for i, header in enumerate(headers[1:]):
            hist_data[i].append(float(header))

    # Create a Python dictionary of the lists and then use that to
    # form a sorted Pandas DataFrame of the historical data
    hist_data = dict(zip(['open', 'high', 'low', 'close', 'volume', 'adj_close'],
                         hist_data))
    pdf = pd.DataFrame(hist_data, index=pd.Index(date_list)).sort()
    return pdf


def annualised_sharpe(returns, N=252):
    """
    Calculate the annualised Sharpe ratio of a returns stream
    based on a number of trading periods, N. N defaults to 252,
    which then assumes a stream of daily returns.

    The function assumes that the returns are the excess of
    those compared to a benchmark.
    """
    return np.sqrt(N) * returns.mean() / returns.std()

Now that we have the ability to obtain data from Yahoo Finance and straightforwardly calculate the annualised Sharpe ratio, we can test out a buy and hold strategy for two equities. We will use Google (GOOG) and Goldman Sachs (GS) from Jan 1st 2000 to May 29th 2013 (when I wrote this article!).

We can create an additional helper function that allows us to quickly see buy-and-hold Sharpe across multiple equities for the same (hardcoded) period:

def equity_sharpe(ticker):
    """
    Calculates the annualised Sharpe ratio based on the daily
    returns of an equity ticker symbol listed in Yahoo Finance.

    The dates have been hardcoded here for the QuantStart article
    on Sharpe ratios.
    """
    # Obtain the equities daily historic data for the desired time period
    # and add to a pandas DataFrame
    pdf = get_historic_data(ticker, start_date=(2000,1,1), end_date=(2013,5,29))

    # Use the percentage change method to easily calculate daily returns
    pdf['daily_ret'] = pdf['adj_close'].pct_change()

    # Assume an average annual risk-free rate over the period of 5%
    pdf['excess_daily_ret'] = pdf['daily_ret'] - 0.05/252

    # Return the annualised Sharpe ratio based on the excess daily returns
    return annualised_sharpe(pdf['excess_daily_ret'])

For Google, the Sharpe ratio for buying and holding is 0.7501. For Goldman Sachs it is 0.2178:

equity_sharpe('GOOG')
0.75013831274645904

equity_sharpe('GS')
0.21777027767830823

Now we can try the same calculation for a market-neutral strategy. The goal of this strategy is to fully isolate a particular equity's performance from the market in general. The simplest way to achieve this is to go short an equal amount (in dollars) of an Exchange Traded Fund (ETF) that is designed to track such a market. The most obvious choice for the US large-cap equities market is the S&P500 index, which is tracked by the SPDR ETF, with the ticker SPY.

To calculate the annualised Sharpe ratio of such a strategy we will obtain the historical prices for SPY and calculate the percentage returns in a similar manner to the previous stocks, with the exception that we will not use the risk-free benchmark. We will calculate the net daily returns, which requires taking the difference between the long and short returns and then dividing by 2, as we now have twice as much trading capital. Here is the Python/pandas code to carry this out:

def market_neutral_sharpe(ticker, benchmark):
    """
    Calculates the annualised Sharpe ratio of a market neutral
    long/short strategy involving the long of 'ticker' with a
    corresponding short of the 'benchmark'.
    """
    # Get historic data for both a symbol/ticker and a benchmark ticker
    # The dates have been hardcoded, but you can modify them as you see fit!
    tick = get_historic_data(ticker, start_date=(2000,1,1), end_date=(2013,5,29))
    bench = get_historic_data(benchmark, start_date=(2000,1,1), end_date=(2013,5,29))

    # Calculate the percentage returns on each of the time series
    tick['daily_ret'] = tick['adj_close'].pct_change()
    bench['daily_ret'] = bench['adj_close'].pct_change()

    # Create a new DataFrame to store the strategy information
    # The net returns are (long - short)/2, since there is twice
    # the trading capital for this strategy
    strat = pd.DataFrame(index=tick.index)
    strat['net_ret'] = (tick['daily_ret'] - bench['daily_ret'])/2.0

    # Return the annualised Sharpe ratio for this strategy
    return annualised_sharpe(strat['net_ret'])

For Google, the Sharpe ratio for the long/short market-neutral strategy is 0.7597. For Goldman Sachs it is 0.2999:

market_neutral_sharpe('GOOG', 'SPY')
0.75966612163452329

market_neutral_sharpe('GS', 'SPY')
0.29991401047248328

Despite the Sharpe ratio being used almost everywhere in algorithmic trading, we need to consider other metrics of performance and risk. In later articles we will discuss drawdowns and how they affect the decision to run a strategy or not.

Futures are a form of contract drawn up between two parties for the purchase or sale of a quantity of an underlying asset at a specified date in the future. This date is known as the delivery or expiration. When this date is reached the buyer must deliver the physical underlying (or cash equivalent) to the seller for the price agreed at the contract formation date.

In practice futures are traded on exchanges (as opposed to Over The Counter - OTC trading) for standardised quantities and qualities of the underlying. The prices are marked to market every day. Futures are incredibly liquid and are used heavily for speculative purposes. While futures were often utilised to hedge the prices of agricultural or industrial goods, a futures contract can be formed on any tangible or intangible underlying such as stock indices, interest rates or foreign exchange values.

A detailed list of all the symbol codes used for futures contracts across various exchanges can be found on the CSI Data site: Futures Factsheet.

The main difference between a futures contract and equity ownership is the fact that a futures contract has a limited window of availability by virtue of the expiration date. At any one instant there will be a variety of futures contracts on the same underlying all with varying dates of expiry. The contract with the nearest date of expiry is known as the near contract. The problem we face as quantitative traders is that at any point in time we have a choice of multiple contracts with which to trade. Thus we are dealing with an overlapping set of time series rather than a continuous stream as in the case of equities or foreign exchange.

The goal of this article is to outline various approaches to constructing a continuous stream of contracts from this set of multiple series and to highlight the tradeoffs associated with each technique.

The main difficulty with trying to generate a continuous contract from the underlying contracts with varying deliveries is that the contracts do not often trade at the same prices. Thus situations arise where they do not provide a smooth splice from one to the next. This is due to contango and backwardation effects. There are various approaches to tackling this problem, which we now discuss.

Unfortunately there is no single "standard" method for joining futures contracts together in the financial industry. Ultimately the method chosen will depend heavily upon the strategy employing the contracts and the method of execution. Despite the fact that no single method exists there are some common approaches:

The first common approach is the Back/Forward ("Panama") Adjustment. This method alleviates the "gap" across multiple contracts by shifting each contract such that the individual deliveries join in a smooth manner to the adjacent contracts. Thus the open/close across the prior contracts at expiry matches up.

The key problem with the Panama method includes the introduction of a trend bias, which will introduce a large drift to the prices. This can lead to negative data for sufficiently historical contracts. In addition there is a loss of the relative price differences due to an absolute shift in values. This means that returns are complicated to calculate (or just plain incorrect).
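A tiny numerical sketch of the Panama adjustment, using made-up prices, shows both the mechanics and the returns distortion:

```python
import numpy as np

# Hypothetical prices: the expiring contract settles at 102.00 while
# the next contract opens at 105.00, leaving a 3.00 USD roll gap
older_contract = np.array([100.0, 101.0, 102.0])
newer_open = 105.0
gap = newer_open - older_contract[-1]

# Panama/back-adjustment: shift the whole older series by the gap
panama_adjusted = older_contract + gap  # [103.0, 104.0, 105.0]

# Absolute price differences survive, but percentage returns do not:
# 101 -> 102 is a 0.990% move, whereas the adjusted 104 -> 105
# is only a 0.962% move
```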

The Proportionality Adjustment approach is similar to the adjustment methodology of handling stock splits in equities. Rather than taking an absolute shift in the successive contracts, the ratio of the older settle (close) price to the newer open price is used to proportionally adjust the prices of historical contracts. This allows a continuous stream without an interruption of the calculation of percentage returns.

The main issue with proportional adjustment is that any trading strategies reliant on an absolute price level will also have to be similarly adjusted in order to execute the correct signal. This is a problematic and error-prone process. Thus this type of continuous stream is often only useful for summary statistical analysis, as opposed to direct backtesting research.
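The proportional adjustment itself is a one-line scaling. A sketch using the same kind of made-up prices as might appear around a roll:

```python
import numpy as np

older_contract = np.array([100.0, 101.0, 102.0])  # hypothetical settles
newer_open = 105.0

# Scale all older prices by the ratio of the newer open to the older settle
ratio = newer_open / older_contract[-1]
prop_adjusted = older_contract * ratio

# Percentage returns are preserved (101/100 - 1 equals the adjusted
# equivalent), but absolute price levels change
```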

The third approach, often known as the Rollover or "Perpetual" series method, creates a continuous contract from successive contracts by taking a linearly weighted proportion of each contract over a number of days to ensure a smoother transition between them.

For example consider five smoothing days. The price on day 1, P1, is equal to 80% of the far contract price (F1) and 20% of the near contract price (N1). Similarly, on day 2 the price is P2=0.6×F2+0.4×N2. By day 5 we have P5=0.0×F5+1.0×N5=N5 and the contract then just becomes a continuation of the near price. Thus after five days the contract is smoothly transitioned from the far to the near.
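The weighting schedule above is just a linear ramp, which can be generated directly (a sketch of the five-day case):

```python
import numpy as np

# Near-contract weights grow linearly from 0.2 to 1.0 over the five
# smoothing days, while the far-contract weights decay from 0.8 to 0.0
near_weights = np.linspace(0.2, 1.0, 5)
far_weights = 1.0 - near_weights

# Day k continuous price: P_k = far_weights[k] * F_k + near_weights[k] * N_k
```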

The problem with the rollover method is that it requires trading on all five days, which can increase transaction costs.

There are other less common approaches to the problem but we will avoid them here.

The remainder of the article will concentrate on implementing the perpetual series method as this is most appropriate for backtesting. It is a useful way to carry out strategy pipeline research.

We are going to stitch together the WTI Crude Oil "near" and "far" futures contract (symbol CL) in order to generate a continuous price series. At the time of writing (January 2014), the near contract is CLF2014 (January) and the far contract is CLG2014 (February).

In order to carry out the download of futures data I've made use of the Quandl plugin. Make sure to set the correct Python virtual environment on your system and install the Quandl package by typing the following into the terminal:

pip install Quandl

Then the necessary libraries can be imported:

import datetime
import numpy as np
import pandas as pd
import Quandl

The main work is carried out in the futures_rollover_weights function. It requires a starting date (the first date of the near contract), a dictionary of contract settlement dates (expiry_dates), the symbols of the contracts and the number of days to roll the contract over (defaulting to five). The comments below explain the code:

def futures_rollover_weights(start_date, expiry_dates, contracts, rollover_days=5):
    """This constructs a pandas DataFrame that contains weights
    (between 0.0 and 1.0) of contract positions to hold in order to
    carry out a rollover of rollover_days prior to the expiration of
    the earliest contract. The matrix can then be 'multiplied' with
    another DataFrame containing the settle prices of each contract
    in order to produce a continuous time series futures contract."""
    # Construct a sequence of dates beginning from the earliest contract
    # start date to the end date of the final contract
    dates = pd.date_range(start_date, expiry_dates[-1], freq='B')

    # Create the 'roll weights' DataFrame that will store the multipliers
    # for each contract (between 0.0 and 1.0)
    roll_weights = pd.DataFrame(np.zeros((len(dates), len(contracts))),
                                index=dates, columns=contracts)
    prev_date = roll_weights.index[0]

    # Loop through each contract and create the specific weightings for
    # each contract depending upon the settlement date and rollover_days
    for i, (item, ex_date) in enumerate(expiry_dates.iteritems()):
        if i < len(expiry_dates) - 1:
            roll_weights.ix[prev_date:ex_date - pd.offsets.BDay(), item] = 1
            roll_rng = pd.date_range(end=ex_date - pd.offsets.BDay(),
                                     periods=rollover_days + 1, freq='B')

            # Create a sequence of roll weights (i.e. [0.0,0.2,...,0.8,1.0])
            # and use these to adjust the weightings of each future
            decay_weights = np.linspace(0, 1, rollover_days + 1)
            roll_weights.ix[roll_rng, item] = 1 - decay_weights
            roll_weights.ix[roll_rng, expiry_dates.index[i+1]] = decay_weights
        else:
            roll_weights.ix[prev_date:, item] = 1
        prev_date = ex_date
    return roll_weights

Now that the weighting matrix has been produced, it is possible to apply this to the individual time series. The main function downloads the near and far contracts, creates a single DataFrame for both, constructs the rollover weighting matrix and then finally produces a continuous series of both prices, appropriately weighted:

if __name__ == "__main__":
    # Download the current Front and Back (near and far) futures contracts
    # for WTI Crude, traded on NYMEX, from Quandl.com. You will need to
    # adjust the contracts to reflect your current near/far contracts
    # depending upon the point at which you read this!
    wti_near = Quandl.get("OFDP/FUTURE_CLF2014")
    wti_far = Quandl.get("OFDP/FUTURE_CLG2014")
    wti = pd.DataFrame({'CLF2014': wti_near['Settle'],
                        'CLG2014': wti_far['Settle']}, index=wti_far.index)

    # Create the dictionary of expiry dates for each contract
    expiry_dates = pd.Series({'CLF2014': datetime.datetime(2013, 12, 19),
                              'CLG2014': datetime.datetime(2014, 2, 21)}).order()

    # Obtain the rollover weighting matrix/DataFrame
    weights = futures_rollover_weights(wti_near.index[0], expiry_dates, wti.columns)

    # Construct the continuous future of the WTI CL contracts
    wti_cts = (wti * weights).sum(1).dropna()

    # Output the merged series of contract settle prices
    print wti_cts.tail(60)

The output is as follows:

2013-10-14 102.230

2013-10-15 101.240

2013-10-16 102.330

2013-10-17 100.620

2013-10-18 100.990

2013-10-21 99.760

2013-10-22 98.470

2013-10-23 97.000

2013-10-24 97.240

2013-10-25 97.950

..

..

2013-12-24 99.220

2013-12-26 99.550

2013-12-27 100.320

2013-12-30 99.290

2013-12-31 98.420

2014-01-02 95.440

2014-01-03 93.960

2014-01-06 93.430

2014-01-07 93.670

2014-01-08 92.330

Length: 60, dtype: float64

It can be seen that the series is now continuous across the two contracts. The next step is to carry this out for multiple deliveries across a variety of years, depending upon your backtesting needs.

In this article (and those that follow it) a basic object-oriented backtesting system written in Python will be outlined. This early system will primarily be a "teaching aid", used to demonstrate the different components of a backtesting system. As we progress through the articles, more sophisticated functionality will be added.

The process of designing a robust backtesting system is extremely difficult. Effectively simulating all of the components that affect the performance of an algorithmic trading system is challenging. Poor data granularity, opaqueness of order routing at a broker, order latency and a myriad of other factors conspire to alter the "true" performance of a strategy versus the backtested performance.

When developing a backtesting system it is tempting to want to constantly "rewrite it from scratch" as more factors are found to be crucial in assessing performance. No backtesting system is ever finished and a judgement must be made at a point during development that enough factors have been captured by the system.

With these concerns in mind the backtester presented here will be somewhat simplistic. As we explore further issues (portfolio optimisation, risk management, transaction cost handling) the backtester will become more robust.

There are generally two types of backtesting system that will be of interest. The first is research-based, used primarily in the early stages, where many strategies will be tested in order to select those for more serious assessment. These research backtesting systems are often written in Python, R or MATLAB, as speed of development is more important than speed of execution in this phase.

The second type of backtesting system is event-based. That is, it carries out the backtesting process in an execution loop similar (if not identical) to the trading execution system itself. It will realistically model market data and the order execution process in order to provide a more rigorous assessment of a strategy.

The latter systems are often written in a high-performance language such as C++ or Java, where speed of execution is essential. For lower frequency strategies (although still intraday), Python is more than sufficient to be used in this context.

The design and implementation of an object-oriented research-based backtesting environment will now be discussed. Object orientation has been chosen as the software design paradigm for the following reasons:

- The interfaces of each component can be specified upfront, while the internals of each component can be modified (or replaced) as the project progresses
- By specifying the interfaces upfront it is possible to effectively test how each component behaves (via unit testing)
- When extending the system new components can be constructed upon or in addition to others, either by inheritance or composition

At this stage the backtester is designed for ease of implementation and a reasonable degree of flexibility, at the expense of true market accuracy. In particular, this backtester will only be able to handle strategies acting on a single instrument. Later the backtester will be modified to handle sets of instruments. For the initial backtester, the following components are required:

- Strategy - A Strategy class receives a Pandas DataFrame of bars, i.e. a list of Open-High-Low-Close-Volume (OHLCV) data points at a particular frequency. The Strategy will produce a list of signals, which consist of a timestamp and an element from the set {1,0,−1} indicating a long, hold or short signal respectively.
- Portfolio - The majority of the backtesting work will occur in the Portfolio class. It will receive a set of signals (as described above) and create a series of positions, allocated against a cash component. The job of the Portfolio object is to produce an equity curve, incorporate basic transaction costs and keep track of trades.
- Performance - The Performance object takes a portfolio and produces a set of statistics about its performance. In particular it will output risk/return characteristics (Sharpe, Sortino and Information Ratios), trade/profit metrics and drawdown information.

As can be seen this backtester does not include any reference to portfolio/risk management, execution handling (i.e. no limit orders) nor will it provide sophisticated modelling of transaction costs. This isn't much of a problem at this stage. It allows us to gain familiarity with the process of creating an object-oriented backtester and the Pandas/NumPy libraries. In time it will be improved.

We will now proceed to outline the implementations for each object.

The Strategy object must be quite generic at this stage, since it will be handling forecasting, mean-reversion, momentum and volatility strategies. The strategies being considered here will always be time series based, i.e. "price driven". An early requirement for this backtester is that derived Strategy classes will accept a list of bars (OHLCV) as input, rather than ticks (trade-by-trade prices) or order-book data. Thus the finest granularity being considered here will be 1-second bars.

The Strategy class will also always produce signal recommendations. This means that it will advise a Portfolio instance in the sense of going long/short or holding a position. This flexibility will allow us to create multiple Strategy "advisors" that provide a set of signals, which a more advanced Portfolio class can accept in order to determine the actual positions being entered.

The interface of the classes will be enforced by utilising an abstract base class methodology. An abstract base class is an object that cannot be instantiated and thus only derived classes can be created. The Python code is given below in a file called backtest.py. The Strategy class requires that any subclass implement the generate_signals method.

In order to prevent the Strategy class from being instantiated directly (since it is abstract!) it is necessary to use the ABCMeta and abstractmethod objects from the abc module. We set a property of the class, called __metaclass__, to be equal to ABCMeta and then decorate the generate_signals method with the @abstractmethod decorator.

# backtest.py

from abc import ABCMeta, abstractmethod


class Strategy(object):
    """Strategy is an abstract base class providing an interface for
    all subsequent (inherited) trading strategies.

    The goal of a (derived) Strategy object is to output a list of
    signals, which has the form of a time series indexed pandas
    DataFrame.

    In this instance only a single symbol/instrument is supported."""

    __metaclass__ = ABCMeta

    @abstractmethod
    def generate_signals(self):
        """An implementation is required to return the DataFrame of
        symbols containing the signals to go long, short or hold
        (1, -1 or 0)."""
        raise NotImplementedError("Should implement generate_signals()!")

While the above interface is straightforward it will become more complicated when this class is inherited for each specific type of strategy. Ultimately the goal of the Strategy class in this setting is to provide a list of long/short/hold signals for each instrument to be sent to a Portfolio.
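To make the interface concrete, here is a toy implementation of it. The class name and the random-signal logic are purely illustrative, not part of the article series:

```python
import numpy as np
import pandas as pd

class RandomSignalStrategy(object):
    """Toy implementation of the Strategy interface: emits a random
    signal from {1, 0, -1} (long, hold, short) for every bar."""
    def __init__(self, bars):
        self.bars = bars  # pandas DataFrame of OHLCV bars

    def generate_signals(self):
        signals = pd.DataFrame(index=self.bars.index)
        np.random.seed(42)  # repeatable for illustration only
        signals['signal'] = np.random.choice([1, 0, -1], size=len(signals))
        return signals

bars = pd.DataFrame({'close': [100.0, 101.0, 99.5, 100.5]},
                    index=pd.date_range('2014-01-01', periods=4, freq='B'))
signals = RandomSignalStrategy(bars).generate_signals()
```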

The Portfolio class is where the majority of the trading logic will reside. For this research backtester the Portfolio is in charge of determining position sizing, risk analysis, transaction cost management and execution handling (i.e. market-on-open, market-on-close orders). At a later stage these tasks will be broken down into separate components. Right now they will be rolled in to one class.

This class makes ample use of pandas and provides a great example of where the library can save a huge amount of time, particularly in regards to "boilerplate" data wrangling. As an aside, the main trick with pandas and NumPy is to avoid iterating over any dataset using the for d in ... syntax. This is because NumPy (which underlies pandas) replaces explicit Python loops with optimised vectorised operations. Thus you will see few (if any!) direct iterations when utilising pandas.
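As a minimal illustration of the vectorisation point:

```python
import pandas as pd

prices = pd.Series([100.0, 101.0, 99.0, 102.0])

# Vectorised: one call computes every bar-to-bar return, no Python loop
returns = prices.pct_change()

# The loop equivalent, far slower on large datasets, would be:
# [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
```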

The goal of the Portfolio class is to ultimately produce a sequence of trades and an equity curve, which will be analysed by the Performance class. In order to achieve this it must be provided with a list of trading recommendations from a Strategy object. Later on, this will be a group of Strategy objects.

The Portfolio class will need to be told how capital is to be deployed for a particular set of trading signals, how to handle transaction costs and which forms of orders will be utilised. The Strategy object is operating on bars of data and thus assumptions must be made in regard to prices achieved at execution of an order. Since the high/low price of any bar is unknown a priori it is only possible to use the open and close prices for trading. In reality it is impossible to guarantee that an order will be filled at one of these particular prices when using a market order, so it will be, at best, an approximation.

In addition to assumptions about orders being filled, this backtester will ignore all concepts of margin/brokerage constraints and will assume that it is possible to go long and short in any instrument freely without any liquidity constraints. This is clearly a very unrealistic assumption, but is one that can be relaxed later.

The following listing continues backtest.py:

```python
# backtest.py

class Portfolio(object):
    """An abstract base class representing a portfolio of positions
    (including both instruments and cash), determined on the basis of
    a set of signals provided by a Strategy."""

    __metaclass__ = ABCMeta

    @abstractmethod
    def generate_positions(self):
        """Provides the logic to determine how the portfolio positions
        are allocated on the basis of forecasting signals and available
        cash."""
        raise NotImplementedError("Should implement generate_positions()!")

    @abstractmethod
    def backtest_portfolio(self):
        """Provides the logic to generate the trading orders and
        subsequent equity curve (i.e. growth of total equity), as a sum
        of holdings and cash, and the bar-period returns associated with
        this curve based on the 'positions' DataFrame.

        Produces a portfolio object that can be examined by other
        classes/functions."""
        raise NotImplementedError("Should implement backtest_portfolio()!")
```

At this stage the Strategy and Portfolio abstract base classes have been introduced. We are now in a position to generate some concrete derived implementations of these classes, in order to produce a working "toy strategy".

We will begin by generating a subclass of Strategy called RandomForecastingStrategy, the sole task of which is to produce randomly chosen long/short signals! While this is clearly a nonsensical trading strategy, it will serve our needs by demonstrating the object-oriented backtesting framework. Thus we will begin a new file called random_forecast.py, with the listing for the random forecaster as follows:

```python
# random_forecast.py

import numpy as np
import pandas as pd
import Quandl  # Necessary for obtaining financial data easily

from backtest import Strategy, Portfolio


class RandomForecastingStrategy(Strategy):
    """Derives from Strategy to produce a set of signals that are
    randomly generated long/shorts. Clearly a nonsensical strategy, but
    perfectly acceptable for demonstrating the backtesting
    infrastructure!"""

    def __init__(self, symbol, bars):
        """Requires the symbol ticker and the pandas DataFrame of bars"""
        self.symbol = symbol
        self.bars = bars

    def generate_signals(self):
        """Creates a pandas DataFrame of random signals."""
        signals = pd.DataFrame(index=self.bars.index)
        signals['signal'] = np.sign(np.random.randn(len(signals)))

        # The first five elements are set to zero in order to minimise
        # upstream NaN errors in the forecaster.
        signals['signal'][0:5] = 0.0
        return signals
```

Now that we have a "concrete" forecasting system, we must create an implementation of a Portfolio object. This object will encompass the majority of the backtesting code. It is designed to create two separate DataFrames, the first of which is a positions frame, used to store the quantity of each instrument held at any particular bar. The second, portfolio, actually contains the market price of all holdings for each bar, as well as a tally of the cash, assuming an initial capital. This ultimately provides an equity curve on which to assess strategy performance.

The Portfolio object, while extremely flexible in its interface, requires specific choices regarding how to handle transaction costs, market orders etc. In this basic example I have assumed that it is possible to go long/short an instrument easily with no restrictions or margin, that buys and sells occur directly at the open price of the bar, that transaction costs are zero (encompassing slippage, fees and market impact) and that the quantity of stock to purchase is specified directly for each trade.

Here is the continuation of the random_forecast.py listing:

```python
# random_forecast.py

class MarketOnOpenPortfolio(Portfolio):
    """Inherits Portfolio to create a system that purchases 100 units
    of a particular symbol upon a long/short signal, assuming the
    market open price of a bar.

    In addition, there are zero transaction costs and cash can be
    immediately borrowed for shorting (no margin posting or interest
    requirements).

    Requires:
    symbol - A stock symbol which forms the basis of the portfolio.
    bars - A DataFrame of bars for a symbol set.
    signals - A pandas DataFrame of signals (1, 0, -1) for each symbol.
    initial_capital - The amount in cash at the start of the portfolio."""

    def __init__(self, symbol, bars, signals, initial_capital=100000.0):
        self.symbol = symbol
        self.bars = bars
        self.signals = signals
        self.initial_capital = float(initial_capital)
        self.positions = self.generate_positions()

    def generate_positions(self):
        """Creates a 'positions' DataFrame that simply longs or shorts
        100 of the particular symbol based on the forecast signals of
        {1, 0, -1} from the signals DataFrame."""
        positions = pd.DataFrame(index=self.signals.index).fillna(0.0)
        positions[self.symbol] = 100*self.signals['signal']
        return positions

    def backtest_portfolio(self):
        """Constructs a portfolio from the positions DataFrame by
        assuming the ability to trade at the precise market open price
        of each bar (an unrealistic assumption!).

        Calculates the total of cash and the holdings (market price of
        each position per bar), in order to generate an equity curve
        ('total') and a set of bar-based returns ('returns').

        Returns the portfolio object to be used elsewhere."""

        # Construct the portfolio DataFrame to use the same index
        # as 'positions' and with a set of 'trading orders' in the
        # 'pos_diff' object, assuming market open prices.
        portfolio = self.positions*self.bars['Open']
        pos_diff = self.positions.diff()

        # Create the 'holdings' and 'cash' series by running through
        # the trades and adding/subtracting the relevant quantity from
        # each column
        portfolio['holdings'] = (self.positions*self.bars['Open']).sum(axis=1)
        portfolio['cash'] = self.initial_capital - \
            (pos_diff*self.bars['Open']).sum(axis=1).cumsum()

        # Finalise the total and bar-based returns based on the 'cash'
        # and 'holdings' figures for the portfolio
        portfolio['total'] = portfolio['cash'] + portfolio['holdings']
        portfolio['returns'] = portfolio['total'].pct_change()
        return portfolio
```

Note that the original listing referenced a bare `signals` inside generate_positions; it must be `self.signals`, as the DataFrame is stored on the instance.
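To make the cash and holdings arithmetic concrete, here is a tiny worked example with invented open prices and positions (these numbers are illustrative, not data from the article). Notice how the diff of the positions frame yields the trades, and the cumulative sum of their cost yields the cash balance:

```python
import pandas as pd

# Four bars of invented open prices and target positions:
# flat, long 100 shares, hold, then reverse to short 100.
opens = pd.DataFrame({"SPY": [100.0, 101.0, 102.0, 103.0]})
positions = pd.DataFrame({"SPY": [0, 100, 100, -100]})

pos_diff = positions.diff()  # trades per bar: NaN, +100, 0, -200
trade_cash = (pos_diff * opens).sum(axis=1)  # cash spent (+) or received (-)
cash = 100000.0 - trade_cash.cumsum()

holdings = (positions * opens).sum(axis=1)
total = cash + holdings
print(total.tolist())  # [100000.0, 100000.0, 100100.0, 100200.0]
```

The equity curve starts flat, then gains $100 per bar once a position is on, because the open price drifts up by $1 each bar while 100 shares are held (long or, via the reversal trade, short the other way).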

This gives us everything we need to generate an equity curve based on such a system. The final step is to tie it all together within a `__main__` block:

```python
if __name__ == "__main__":
    # Obtain daily bars of SPY (ETF that generally follows the S&P500)
    # from Quandl (requires 'pip install Quandl' on the command line)
    symbol = 'SPY'
    bars = Quandl.get("GOOG/NYSE_%s" % symbol, collapse="daily")

    # Create a set of random forecasting signals for SPY
    rfs = RandomForecastingStrategy(symbol, bars)
    signals = rfs.generate_signals()

    # Create a portfolio of SPY
    portfolio = MarketOnOpenPortfolio(symbol, bars, signals,
                                      initial_capital=100000.0)
    returns = portfolio.backtest_portfolio()

    print returns.tail(10)
```

The output of the program is as follows. Yours will differ from the output below depending upon the date range you select and the random seed used:

```
               SPY  holdings    cash  total   returns
Date
2014-01-02  -18398    -18398  111486  93088  0.000097
2014-01-03   18321     18321   74844  93165  0.000827
2014-01-06   18347     18347   74844  93191  0.000279
2014-01-07   18309     18309   74844  93153 -0.000408
2014-01-08  -18345    -18345  111534  93189  0.000386
2014-01-09  -18410    -18410  111534  93124 -0.000698
2014-01-10  -18395    -18395  111534  93139  0.000161
2014-01-13  -18371    -18371  111534  93163  0.000258
2014-01-14  -18228    -18228  111534  93306  0.001535
2014-01-15   18410     18410   74714  93124 -0.001951
```

In this instance the strategy lost money, which is unsurprising given the stochastic nature of the forecaster! The next steps are to create a Performance object that accepts a Portfolio instance and provides a list of performance metrics upon which to base a decision to filter the strategy out or not.
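A minimal sketch of such a Performance object might look like the following (the class and method names here are hypothetical, not part of the article's code; it assumes the portfolio DataFrame produced by backtest_portfolio with its 'total' and 'returns' columns):

```python
import numpy as np
import pandas as pd

class Performance(object):
    """Sketch of a performance object (names are illustrative). It
    accepts the portfolio DataFrame produced by backtest_portfolio()
    and computes two common filtering metrics."""

    def __init__(self, portfolio):
        self.portfolio = portfolio

    def annualised_sharpe(self, periods=252):
        """Annualised Sharpe ratio of the bar-period returns, assuming
        a zero benchmark and 252 trading bars per year."""
        returns = self.portfolio['returns'].dropna()
        return np.sqrt(periods) * returns.mean() / returns.std()

    def max_drawdown(self):
        """Largest peak-to-trough decline of the equity curve, as a
        (negative) fraction of the running peak."""
        equity = self.portfolio['total']
        drawdown = (equity - equity.cummax()) / equity.cummax()
        return drawdown.min()

# Illustrative usage with a toy equity curve
toy = pd.DataFrame({'total': [100000.0, 101000.0, 99000.0, 102000.0]})
toy['returns'] = toy['total'].pct_change()
perf = Performance(toy)
print(perf.max_drawdown())  # roughly -0.0198
```

Metrics like these make it possible to compare strategies on a like-for-like basis before committing further research time to any of them.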

We can also improve the Portfolio object to have a more realistic handling of transaction costs (such as Interactive Brokers commissions and slippage). We can also straightforwardly include a forecasting engine into a Strategy object, which will (hopefully) produce better results. In the following articles we will explore these concepts in more depth.
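A first step toward such realism might be a simple cost function. The per-share rate, minimum charge and slippage figure below are illustrative assumptions loosely in the spirit of published per-share retail commission schedules, not Interactive Brokers' actual fees:

```python
def simulate_transaction_cost(quantity, price, per_share=0.005,
                              minimum=1.0, slippage_bps=2.0):
    """Illustrative transaction cost model (all numbers are
    assumptions): a fixed per-share commission with a minimum charge,
    plus slippage expressed in basis points of traded notional."""
    commission = max(minimum, abs(quantity) * per_share)
    slippage = abs(quantity) * price * slippage_bps / 10000.0
    return commission + slippage

# Buying 100 shares at $100: the $1.00 minimum commission applies
# (100 * $0.005 = $0.50 is below it), plus $2.00 of slippage.
print(simulate_transaction_cost(100, 100.0))  # 3.0
```

Subtracting such a cost from the cash series on every non-zero row of pos_diff would make the equity curve markedly more honest.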

I'm certainly not a great programmer, but writing this project taught me a lot (and kept me occupied). Most of my code was written on FMZ.COM, and if I were to refactor the Python code I would use a more object-oriented model. Nonetheless, I was pleasantly surprised with the results I got and the bot has made almost 100% ether profit so far.

What does it do?

It is an arbitrage bot. That means that it earns money from trading the difference between prices on two (or more) exchanges. As of now it is unidirectional and only trades between EtherDelta and Bittrex: they share approximately twenty ETH/token pairs. Here's a diagram to illustrate how it works:

Words followed by parentheses are Ethereum transactions that invoke a smart contract function call.
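The core decision behind each cycle of such a bot can be sketched as follows (a simplified illustration, not the bot's actual code; the fee fraction is an assumption):

```python
def arbitrage_opportunity(buy_ask, sell_bid, fee_fraction=0.005):
    """Simplified unidirectional arbitrage check (illustrative): given
    the ask price on the cheaper venue and the bid on the dearer one,
    return the net profit fraction per unit after paying an assumed
    fee on both legs, or None when the spread does not cover fees."""
    gross = sell_bid / buy_ask - 1.0
    net = gross - 2.0 * fee_fraction
    return net if net > 0 else None

# A token quoted at 0.0100 ETH on one venue and 0.0106 on the other
print(arbitrage_opportunity(0.0100, 0.0106))  # about 0.05
print(arbitrage_opportunity(0.0100, 0.0100))  # None: no spread
```

In practice the check must also account for gas costs, order book depth and the latency between observing the spread and both legs settling.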

The Code

I could have used the FMZ.COM platform's Python editor to create the transactions and function calls, and it would have been fairly straightforward. But I needed something more reliable; a failed transaction means losing money. Every single one of my GET requests needed a reply, even if the TCP packet got lost or the webserver on the other end was temporarily down. Therefore I decided to implement my own Python Etherscan API wrapper, and used pythereum to create the transactions and Etherscan to publish them. I also wrote my own requests.get decorator: a while loop that only exits once the reply is satisfactory.
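A retry wrapper of the kind described might be sketched like this (the names are hypothetical and the bot's actual code is not shown; the idea is simply a decorator whose loop only exits on a satisfactory result):

```python
import time
import functools

def retry_until_ok(is_ok, delay=1.0, max_attempts=None):
    """Sketch of a retry decorator: re-invoke the wrapped function,
    sleeping between attempts, until is_ok(result) is true. Exceptions
    (e.g. a dropped connection) simply trigger another attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = 0
            while True:
                attempts += 1
                try:
                    result = func(*args, **kwargs)
                    if is_ok(result):
                        return result
                except Exception:
                    pass  # e.g. a lost TCP packet or a 5xx reply
                if max_attempts is not None and attempts >= max_attempts:
                    raise RuntimeError(
                        "no satisfactory reply after %d attempts" % attempts)
                time.sleep(delay)
        return wrapper
    return decorator

# Example: a flaky fetch that only succeeds on the third call
state = {"calls": 0}

@retry_until_ok(lambda r: r == "ok", delay=0.0, max_attempts=10)
def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise IOError("server temporarily down")
    return "ok"

print(flaky_fetch())  # "ok", after two failed attempts
```

An unbounded loop is the behaviour the author describes; the optional max_attempts cap is a safety valve so a permanently dead endpoint cannot hang the bot forever.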

Here is the code I used to encode the etherdelta json API responses as hexadecimal, rlp encoded, ethereum transactions (not for the faint hearted):

The raw hexadecimal values in the closure at the bottom are the function selectors that correspond to each function. A selector is the first four bytes of the Keccak-256 hash of the function's signature (its name and argument types). It must be placed at the start of the data parameter of a transaction, followed by the encoded data that makes up the arguments. In total my code is around 400 lines long and contained in 5 different files.
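As an illustration of that encoding scheme (this helper is not from the author's code; the selector used below is the well-known one for the ERC-20 transfer function, hard-coded since computing Keccak-256 needs a third-party library):

```python
def encode_call_data(selector_hex, address_hex, amount):
    """Sketch of assembling a transaction's data field: per the
    Solidity ABI, the 4-byte function selector is followed by each
    argument left-padded to 32 bytes (64 hex characters)."""
    addr = address_hex.lower().replace("0x", "").rjust(64, "0")
    amt = format(amount, "x").rjust(64, "0")
    return "0x" + selector_hex + addr + amt

# a9059cbb is the first four bytes of the Keccak-256 hash of the
# string "transfer(address,uint256)".
data = encode_call_data(
    "a9059cbb",
    "0x1111111111111111111111111111111111111111",
    10**18,  # one token with 18 decimals
)
print(data)
```

The resulting hex string is what gets signed, RLP-encoded with the other transaction fields and published to the network.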

The Outcome

I made a couple of graphs from the data I logged using matplotlib.

Conclusion

Overall the entire project took me around two weeks during my spare time at school and it was a blast all round. I’ve taken a break from coding vigorously and am currently in the process of planning arbitrage bot v2. The next version is going to include 86 different exchanges and a whole lot of trading pairs.

To the moon!

The beauty of algorithmic trading is that there is no need to test out one's knowledge on real capital, as many brokerages provide highly realistic market simulators. While there are certain caveats associated with such systems, they provide an environment to foster a deep level of understanding, with absolutely no capital risk.

A common question that I receive from readers of QuantStart is "How do I get started in quantitative trading?". I have already written a beginner's guide to quantitative trading, but one article cannot hope to cover the diversity of the subject. Thus I've decided to recommend my favourite entry-level quant trading books in this article.

The first task is to gain a solid overview of the subject. I have found it to be far easier to avoid heavy mathematical discussions until the basics are covered and understood. The best books I have found for this purpose are as follows:

**1) Quantitative Trading** by Ernest Chan - This is one of my favourite finance books. Dr. Chan provides a great overview of the process of setting up a "retail" quantitative trading system, using MatLab or Excel. He makes the subject highly approachable and gives the impression that "anyone can do it". Although there are plenty of details that are skipped over (mainly for brevity), the book is a great introduction to how algorithmic trading works. He discusses alpha generation ("the trading model"), risk management, automated execution systems and certain strategies (particularly momentum and mean reversion). This book is the place to start.

**2) Inside the Black Box** by Rishi K. Narang - In this book Dr. Narang explains in detail how a professional quantitative hedge fund operates. It is pitched at a savvy investor who is considering whether to invest in such a "black box". Despite the seeming irrelevance to a retail trader, the book actually contains a wealth of information on how a "proper" quant trading system should be carried out. For instance, the importance of transaction costs and risk management are outlined, with ideas on where to look for further information. Many retail algo traders could do well to pick this up and see how the 'professionals' carry out their trading.

**3) Algorithmic Trading & DMA** by Barry Johnson - The phrase 'algorithmic trading', in the financial industry, usually refers to the execution algorithms used by banks and brokers to execute efficient trades. I am using the term to cover not only those aspects of trading, but also *quantitative* or *systematic* trading. This book is mainly about the former, being written by Barry Johnson, who is a quantitative software developer at an investment bank. Does this mean it is of no use to the retail quant? Not at all. Possessing a deeper understanding of how exchanges work and "market microstructure" can aid immensely the profitability of retail strategies. Despite it being a heavy tome, it is worth picking up.

Once the basic concepts are grasped, it is necessary to begin developing a trading strategy. This is usually known as the *alpha model* component of a trading system. Strategies are straightforward to find these days; however, the true value comes in determining your own trading parameters via extensive research and backtesting. The following books discuss certain types of trading and execution systems and how to go about implementing them:

**4) Algorithmic Trading** by Ernest Chan - This is the second book by Dr. Chan. In the first book he alluded to momentum, mean reversion and certain high frequency strategies. This book discusses such strategies in depth and provides significant implementation details, albeit with more mathematical complexity than in the first (e.g. Kalman Filters, Stationarity/Cointegration, CADF etc). The strategies, once again, make extensive use of MatLab but the code can be easily modified to C++, Python/pandas or R for those with programming experience. It also provides updates on the latest market behaviour, as the first book was written a few years back.

**5) Trading and Exchanges** by Larry Harris - This book concentrates on *market microstructure*, which I personally feel is an essential area to learn about, even at the beginning stages of quant trading. Market microstructure is the "science" of how market participants interact and the dynamics that occur in the *order book*. It is closely related to how exchanges function and *what actually happens* when a trade is placed. This book is less about trading strategies as such, but more about things to be aware of when designing execution systems. Many professionals in the quant finance space regard this as an excellent book and I also highly recommend it.

At this stage, as a retail trader, you will be in a good place to begin researching the other components of a trading system such as the execution mechanism (and its deep relationship with transaction costs), as well as risk and portfolio management. I will discuss books for these topics in later articles.
