
Markowitz Portfolio Optimization with Machine Learning in Python: LSTM + CAPM + Efficient Frontier

End-to-end quantitative pipeline: LSTM stock price prediction for portfolio construction, systematic-risk asset screening with CAPM, and Markowitz efficient frontier optimization with real market data.

25 min read · Python · TensorFlow · Keras · yfinance · scipy
Data: real historical prices via the Yahoo Finance API (yfinance)
Applicable to finance, financial engineering, and quantitative investment theses

Project Summary

What will you find in this project?

  • Download of adjusted closing prices for 10 large-cap equities via the Yahoo Finance API (yfinance).
  • 30-day stock price prediction per asset with LSTM networks trained on technical indicators (SMA, Bollinger Bands, RSI, lag-1 autocorrelation).
  • CAPM-based asset screening: beta estimation via linear regression on the S&P 500, cost-of-equity computation, and selection of stocks whose risk-adjusted expected return clears the market hurdle rate.
  • Markowitz portfolio optimization: covariance matrix construction, efficient frontier tracing, and minimum-volatility optimal weights via scipy SLSQP.

This project integrates three pillars of modern quantitative finance in a single deployable pipeline: time-series forecasting with Deep Learning, systematic-risk asset selection, and mathematical portfolio optimization. The output is a set of optimal allocation weights that minimize portfolio volatility given the forward-looking return estimates produced by the LSTM models.

The investment universe consists of 10 large-cap equities (AAPL, MSFT, GOOGL, AMZN, TSLA, META, NFLX, NVDA, JPM, JNJ) with price history starting January 1, 2020. The neural network forecasts the next 30 trading days using adjusted closing prices — a series that accounts for dividends, splits, and consolidations, enabling precise cross-period comparisons.

Why this pipeline matters for research and production: Classical Markowitz relies on historical mean returns as expected-return proxies — an assumption that breaks down in trending or regime-changing markets. Replacing those with LSTM-predicted returns creates an empirically testable hypothesis: does a forward-looking return estimator produce a portfolio with a higher out-of-sample Sharpe ratio than the historical-mean baseline? This comparison is a publication-ready contribution for a financial engineering thesis or a quantitative research paper.
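The out-of-sample comparison hinges on the annualized Sharpe ratio. A minimal scoring helper is sketched below; the return series are random placeholders standing in for the two backtests, not results from this project:

```python
import numpy as np

def annualized_sharpe(daily_returns, rf_annual=0.02, periods=252):
    """Annualized Sharpe ratio from a series of daily portfolio returns."""
    excess = np.asarray(daily_returns) - rf_annual / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

# Placeholder daily return series standing in for the two backtests
rng = np.random.default_rng(0)
lstm_portfolio  = rng.normal(0.0006, 0.01, 252)   # illustrative data
historical_mean = rng.normal(0.0004, 0.01, 252)   # illustrative data

print(f"LSTM-input portfolio Sharpe:     {annualized_sharpe(lstm_portfolio):.2f}")
print(f"Historical-mean baseline Sharpe: {annualized_sharpe(historical_mean):.2f}")
```

Run both pipeline variants over the same hold-out window and compare these two numbers; the bootstrap test described in the conclusions then decides whether the difference is significant.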

Pipeline architecture

  • Data acquisition: adjusted closing prices downloaded with yfinance for 10 equities from 2020 to today.
  • LSTM prediction: one model trained per asset using technical indicators as input features. The last 30 trading days are the test set; model predictions serve as forward-looking expected returns.
  • CAPM screening: betas computed via linear regression on the S&P 500; CAPM equation applied; only assets whose cost of equity exceeds the market excess return enter the optimizer.
  • Markowitz optimization: covariance matrix computed on daily historical returns; efficient frontier traced over 100 target-return points; optimal weights minimizing portfolio volatility extracted via SLSQP.

Technology stack

  • Python 3.10+ — primary language
  • yfinance — historical market data from Yahoo Finance
  • TensorFlow / Keras — stacked LSTM with Dropout layers and L2 regularization
  • scikit-learn — MinMaxScaler, StandardScaler, mean_squared_error
  • scipy — portfolio optimization via minimize (SLSQP) and beta estimation via stats.linregress
  • pandas / numpy — data manipulation and matrix algebra
  • matplotlib — price charts, learning curves, and efficient frontier plot

Environment Setup · System Configuration

The project runs on Google Colab or locally. Colab is recommended for its free GPU access, which significantly accelerates LSTM training — especially relevant when fitting one model per asset. On a free Colab T4 GPU, training 10 models at 50 epochs each takes approximately 8–12 minutes.

Dependency installation

Terminal — install dependencies
pip install yfinance tensorflow scikit-learn scipy pandas numpy matplotlib

Recommended project structure

Folder structure
markowitz-ml-portfolio/
├── notebooks/
│   ├── 01_data_download.ipynb
│   ├── 02_lstm_prediction.ipynb
│   ├── 03_capm_screening.ipynb
│   └── 04_markowitz_optimization.ipynb
├── models/
│   ├── lstm_AAPL.h5
│   ├── lstm_MSFT.h5
│   └── ...                         # one model per equity
├── data/
│   └── closing_prices.csv          # cached downloaded prices
└── requirements.txt

Data acquisition with yfinance

Adjusted closing prices are downloaded for all 10 equities from January 1, 2020 through today. The adjusted close is the correct series to use for portfolio modeling: it incorporates corporate actions (dividends, stock splits, reverse splits), making cross-period return calculations valid and preventing artificial return spikes on ex-dividend dates.
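A toy numeric example (hypothetical prices, simplified CRSP/Yahoo-style back-adjustment) makes the ex-dividend point concrete:

```python
# Hypothetical numbers: a stock closes at $100, pays a $1 dividend, and closes
# at $99 on the ex-dividend date with no real economic loss to the holder.
raw_prev, raw_today, dividend = 100.0, 99.0, 1.0

# Raw close shows a spurious ~-1% "loss" on the ex-dividend date
raw_return = (raw_today - raw_prev) / raw_prev

# Simplified back-adjustment: scale prior closes by (prev - dividend) / prev
# so pre-dividend history is comparable with post-dividend prices
adj_prev   = raw_prev * ((raw_prev - dividend) / raw_prev)   # ~99.0
adj_return = (raw_today - adj_prev) / adj_prev               # ~0

print(f"raw return: {raw_return:+.4f}, adjusted return: {adj_return:+.4f}")
```

The raw series would feed a fake -1% daily return into every downstream return, covariance, and beta estimate; the adjusted series does not.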

Python — price download with yfinance
from datetime import datetime
import yfinance as yf
import pandas as pd

# Investment universe: 10 large-cap equities across tech and financials
symbols = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA',
           'META', 'NFLX', 'NVDA', 'JPM', 'JNJ']

start_date = "2020-01-01"
end_date   = datetime.today().strftime('%Y-%m-%d')

# Download historical data (auto_adjust=False keeps the 'Adj Close' column,
# which newer yfinance versions drop by default)
main_data = yf.download(symbols, start=start_date, end=end_date,
                        auto_adjust=False)

# Extract adjusted closing prices
closing_prices = main_data['Adj Close']

print(f"Period: {start_date} → {end_date}")
print(f"Shape: {closing_prices.shape}")
print(closing_prices.tail(3))
Python — plot adjusted closing prices
import matplotlib.pyplot as plt

closing_prices.plot(figsize=(12, 6))
plt.title('Adjusted Closing Prices — 10 Large-Cap Equities (2020 to present)')
plt.xlabel('Date')
plt.ylabel('Adjusted Closing Price (USD)')
plt.legend(loc='upper left', fontsize=8)
plt.tight_layout()
plt.show()

Deep Learning for Time Series · LSTM Stock Price Prediction

LSTM (Long Short-Term Memory) networks are architecturally suited to financial time series because their gating mechanisms — input, forget, and output gates — allow the model to selectively retain or discard information across hundreds of time steps. This is the key advantage over vanilla RNNs, which suffer from vanishing gradients on sequences longer than ~20 steps. One independent LSTM model is trained per asset in the universe.
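The gating mechanism can be made concrete with a single LSTM cell implemented in plain numpy. This is an illustrative sketch with random placeholder weights, not the Keras implementation used later:

```python
import numpy as np

rng = np.random.default_rng(42)
n_features, n_units = 4, 8
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Random placeholder parameters for one cell: input, forget, output, candidate
W = {g: rng.normal(0, 0.1, (n_units, n_features + n_units)) for g in 'ifoc'}
b = {g: np.zeros(n_units) for g in 'ifoc'}

def lstm_step(x_t, h_prev, c_prev):
    """One forward step: the gates decide what to write, erase, and expose."""
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W['i'] @ z + b['i'])         # input gate: what to write
    f = sigmoid(W['f'] @ z + b['f'])         # forget gate: what to erase
    o = sigmoid(W['o'] @ z + b['o'])         # output gate: what to expose
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # candidate cell update
    c_t = f * c_prev + i * c_tilde           # cell state carries long memory
    h_t = o * np.tanh(c_t)                   # hidden state is the cell output
    return h_t, c_t

h, c = np.zeros(n_units), np.zeros(n_units)
for x_t in rng.normal(size=(30, n_features)):   # 30 time steps of features
    h, c = lstm_step(x_t, h, c)
print(h.shape)   # (8,)
```

The additive update of `c_t` is what lets gradients flow across long sequences; a vanilla RNN replaces it with a repeated matrix multiplication, which is where the vanishing gradient comes from.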

Train/test split and technical indicators

All observations except the last 30 trading days form the training set. The last 30 days are the hold-out test set — this approximately matches a monthly portfolio rebalancing frequency, which is the realistic deployment cadence for this kind of pipeline. Four technical indicators are computed and used as model features:

  • 20-day SMA — smooths short-term fluctuations and identifies the prevailing trend direction. When price is consistently above SMA-20, upward momentum is present.
  • Bollinger Bands (20-day, 2σ) — measure realized volatility relative to recent price history. Price touching the upper band signals potential overbought conditions; lower band signals oversold.
  • 14-day RSI — momentum oscillator between 0 and 100. RSI > 70 indicates overbought; RSI < 30 indicates oversold. Particularly useful for anticipating mean-reversion moves.
  • Lag-1 autocorrelation — measures whether today's return is correlated with yesterday's. A high positive value suggests trend-following behavior; negative suggests mean-reversion.
Python — technical indicator computation
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def calculate_technical_indicators(data, stock_name):
    """Compute SMA, Bollinger Bands, RSI, and lag-1 autocorrelation."""
    data = pd.DataFrame(data.copy())

    # 20-day Simple Moving Average
    data['SMA_20'] = data[stock_name].rolling(window=20).mean()

    # Bollinger Bands: ±2 standard deviations around 20-day SMA
    rolling_mean = data[stock_name].rolling(window=20).mean()
    rolling_std  = data[stock_name].rolling(window=20).std()
    data['BB_upper'] = rolling_mean + 2 * rolling_std
    data['BB_lower'] = rolling_mean - 2 * rolling_std

    # 14-day RSI
    delta = data[stock_name].diff()
    gain  = (delta.where(delta > 0, 0)).rolling(window=14).mean()
    loss  = (-delta.where(delta < 0, 0)).rolling(window=14).mean()
    RS    = gain / loss
    data['RSI_14'] = 100 - (100 / (1 + RS))

    # Lag-1 autocorrelation over a rolling 20-day window
    data['Autocorr_1'] = data[stock_name].rolling(window=20).apply(
        lambda s: s.autocorr(lag=1), raw=False)

    # Keep the price column last so downstream code can use it as the
    # prediction target (X = indicator columns, y = last column)
    feature_cols = [c for c in data.columns if c != stock_name]
    return data[feature_cols + [stock_name]].dropna()

def preprocess_data(data):
    """Drop NaNs and normalize to [0, 1] with MinMaxScaler."""
    data.dropna(inplace=True)
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    return scaled_data, scaler

LSTM network architecture

The network uses a sequential Keras model with three stacked LSTM layers and Dropout regularization to prevent overfitting. The dense output layer applies L2 regularization to penalize large weights that would memorize idiosyncratic price spikes in the training window.

Python — LSTM model definition and training
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.regularizers import L2

def build_lstm_model(input_shape):
    model = Sequential([
        # First LSTM layer — returns full sequences for the next layer
        LSTM(units=50, return_sequences=True, input_shape=input_shape),

        # Second LSTM layer with 20% Dropout
        LSTM(units=50, return_sequences=True),
        Dropout(0.2),

        # Third LSTM layer — returns only the final hidden state
        LSTM(units=50, return_sequences=False),
        Dropout(0.1),

        # Dense output with L2 regularization
        Dense(units=1, kernel_regularizer=L2(0.01))
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# Example: training for AAPL
stock = 'AAPL'
test_data  = closing_prices[stock].iloc[-30:]
train_data = closing_prices[stock].iloc[:-30]

train_with_ind = calculate_technical_indicators(train_data, stock)
train_processed, scaler = preprocess_data(train_with_ind)

X_train = train_processed[:, :-1]
y_train = train_processed[:, -1]
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

model = build_lstm_model(input_shape=(X_train.shape[1], 1))
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.1,
    verbose=1
)
Key training observations: Over 50 epochs, training loss falls from ~0.143 to ~0.029 and validation loss from ~0.154 to ~0.047, with the normal fluctuation expected in a financial series (a non-stationary signal with structural breaks). The gap between training and validation loss remains narrow throughout, indicating that three LSTM layers with Dropout provide adequate generalization capacity for 30-day price forecasting. A batch_size of 32 balances gradient stability against training speed on both CPU and GPU environments.

Learning curves

Python — plot training and validation loss curves
import matplotlib.pyplot as plt

plt.figure(figsize=(14, 5))
plt.plot(history.history['loss'],     label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Learning Curves — LSTM (AAPL example)')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Batch training: one model per equity

For the full pipeline, one LSTM model is trained per equity in the universe. Predictions from the hold-out period are stored and used as forward-looking expected returns in the CAPM screening and Markowitz optimization steps.

Python — batch training loop across all equities
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd

models      = {}
predictions = {}
errors      = {}

for column in closing_prices.columns:
    test_data  = closing_prices[column].iloc[-30:]
    train_data = closing_prices[column].iloc[:-30]

    # Technical indicators and normalization
    train_with_ind = calculate_technical_indicators(train_data, column)
    train_processed, scaler = preprocess_data(train_with_ind)

    X_train = train_processed[:, :-1].reshape(-1, train_processed.shape[1]-1, 1)
    y_train = train_processed[:, -1]

    # Build and train
    model = build_lstm_model(input_shape=(X_train.shape[1], 1))
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

    # Predict on the hold-out window. Indicators are computed on the train tail
    # + test so the first test rows are not lost to rolling-window NaNs, and
    # the *training* scaler is reused so features stay on the same scale.
    extended       = pd.concat([train_data.iloc[-30:], test_data])
    test_with_ind  = calculate_technical_indicators(extended, column).iloc[-30:]
    test_processed = scaler.transform(test_with_ind)
    X_test = test_processed[:, :-1].reshape(-1, test_processed.shape[1]-1, 1)

    pred_scaled = model.predict(X_test, verbose=0)
    pred = scaler.inverse_transform(
        np.concatenate((X_test[:, :, 0], pred_scaled), axis=1))[:, -1]

    models[column]      = model
    predictions[column] = pred
    # Align actuals with predictions (predictions cover the last len(pred) days)
    errors[column]      = np.sqrt(
        mean_squared_error(test_data.values[-len(pred):], pred))
    print(f"{column}: RMSE = {errors[column]:.4f}")

print("\nTraining complete for all equities.")

Systematic-Risk Asset Filtering · CAPM Asset Screening

Before running the Markowitz optimizer, the asset universe is filtered using the Capital Asset Pricing Model (CAPM). The rationale: including all 10 assets regardless of their risk-return profile can dilute the optimal portfolio with equities that do not adequately compensate their systematic risk. CAPM provides a quantitative hurdle rate that each asset must clear before being considered for allocation.

Expected returns from LSTM predictions

Expected returns are computed as the percentage change between the first and last predicted price over the 30-day forecast window: (price_final − price_initial) / price_initial. This transforms the LSTM's price-level prediction into a return estimate suitable as input for both CAPM screening and the Markowitz optimizer.

Python — expected returns from LSTM predictions
import pandas as pd

er = {}  # expected returns
for stock, prices in predictions.items():
    er[stock] = (prices[-1] - prices[0]) / prices[0]

expected_returns_ = pd.Series(er, name='expected_returns')

print("Expected returns (LSTM-based, 30-day horizon):")
print(expected_returns_.sort_values(ascending=False))

Beta estimation and CAPM application

Each stock's beta is estimated by regressing its daily returns against the S&P 500 (used as the market proxy) using OLS linear regression. With beta and an assumed risk-free rate of 2%, the CAPM equation yields the cost of equity for each asset. Assets whose cost of equity exceeds the market excess return are selected for the Markowitz optimizer.

Python — CAPM: betas, cost of equity, and asset selection
import yfinance as yf
import pandas as pd
import numpy as np
from scipy import stats

# Historical daily returns for all assets
# (auto_adjust=False keeps the 'Adj Close' column in newer yfinance versions)
data    = yf.download(symbols, start=start_date, end=end_date,
                      auto_adjust=False)['Adj Close']
returns = data.pct_change().dropna()

# S&P 500 as market proxy; squeeze() flattens the single-ticker DataFrame
market_index = yf.download('^GSPC', start=start_date, end=end_date,
                           auto_adjust=False)['Adj Close'].squeeze()
market_index = market_index.pct_change().dropna()

# Align asset and market return dates before regressing
common       = returns.index.intersection(market_index.index)
returns      = returns.loc[common]
market_index = market_index.loc[common]

# CAPM parameters
rf            = 0.02  # annualized risk-free rate assumption
market_return = market_index.mean() * 252  # annualized market return

# Estimate beta per asset via OLS regression
betas = {}
for ticker in symbols:
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        market_index, returns[ticker])
    betas[ticker] = slope

# Cost of equity: CAPM → E(Ri) = Rf + β × (E(Rm) − Rf)
cost_of_equity = {}
for ticker in symbols:
    cost_of_equity[ticker] = rf + betas[ticker] * (market_return - rf)

# Select assets where cost of equity exceeds the market excess return hurdle
selected_stocks = [t for t in symbols if cost_of_equity[t] > market_return - rf]

print("Estimated betas:")
for t, b in betas.items():
    print(f"  {t}: β = {b:.4f}")

print(f"\nAssets passing CAPM screening ({len(selected_stocks)}):")
print(selected_stocks)

Interpreting the CAPM filter

Why use CAPM as a pre-filter rather than passing all assets to Markowitz? Markowitz minimizes volatility given its inputs, but it cannot judge whether an asset's return premium is commensurate with the systematic risk the asset introduces. The screen as implemented requires each asset's CAPM cost of equity to exceed the market excess-return hurdle, removing names whose beta implies too small a risk premium relative to the market. A natural complementary check is to compare each asset's LSTM-predicted return against its cost of equity: a high-beta stock like TSLA or NVDA with a modest forecast return would then be flagged as insufficiently compensated, protecting the portfolio from concentration in assets whose risk-return tradeoff is unfavorable under current market conditions.
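The hurdle arithmetic of the screen can be traced with hypothetical values (the 2% risk-free rate matches the assumption used elsewhere in the pipeline; the market return and betas are illustrative):

```python
# Hypothetical CAPM parameters: rf = 2%, annualized market return = 10%
rf, rm = 0.02, 0.10
hurdle = rm - rf                              # market excess return: 8%

def cost_of_equity(beta):
    return rf + beta * (rm - rf)

# Beta above which the screen (cost of equity > hurdle) passes an asset
break_even_beta = (hurdle - rf) / (rm - rf)   # 0.75 under these numbers

for name, beta in [('low-beta defensive', 0.55), ('market-like', 1.0),
                   ('high-beta growth', 1.8)]:   # illustrative betas
    verdict = 'passes' if cost_of_equity(beta) > hurdle else 'fails'
    print(f"{name} (beta={beta}): cost of equity "
          f"{cost_of_equity(beta):.1%} -> {verdict}")
```

Because the break-even beta depends on the estimated market excess return, which side of the hurdle a given asset falls on shifts with the market regime at execution time.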

Modern Portfolio Theory · Markowitz Portfolio Optimization

Harry Markowitz's Modern Portfolio Theory (1952) establishes that a set of optimal portfolios — the Efficient Frontier — exists such that no other portfolio offers higher expected return for the same volatility, or lower volatility for the same expected return. The optimization problem is to find allocation weights that minimize portfolio variance subject to a target return constraint.

Expected returns and covariance matrix

Expected returns for the selected assets come from the LSTM predictions. The covariance matrix is computed on historical daily returns — using realized return data for the covariance matrix (rather than predicted data) is the standard quantitative practice, as historical correlations are more stable estimators of pairwise dependence structure than single-model forecasts.

Python — expected returns and covariance matrix for selected assets
import numpy as np
from scipy.optimize import minimize
import yfinance as yf

# Expected returns only for CAPM-passing assets
er = {}
for stock, prices in predictions.items():
    if stock in selected_stocks:
        er[stock] = (prices[-1] - prices[0]) / prices[0]

expected_returns_ = pd.Series(er, name='expected_returns')
print("Expected returns (CAPM-screened assets):")
print(expected_returns_)

# Covariance matrix from historical daily returns
# (auto_adjust=False keeps the 'Adj Close' column in newer yfinance versions)
data_sel      = yf.download(selected_stocks, start=start_date, end=end_date,
                            auto_adjust=False)['Adj Close']
daily_returns = data_sel.pct_change().dropna()
cov_matrix    = daily_returns.cov()

print(f"\nCovariance matrix ({cov_matrix.shape}):")
print(cov_matrix.round(6))

Minimum-volatility portfolio optimization

The objective function is portfolio volatility: √(wᵀ · Σ · w). This is minimized using SLSQP (Sequential Least Squares Programming) with two constraints: weights sum to 1, and all weights are non-negative (long-only portfolio — no short selling).

Python — minimum-volatility optimization with scipy SLSQP
import numpy as np
from scipy.optimize import minimize

num_assets = len(selected_stocks)

# Objective: portfolio volatility = √(w^T · Σ · w)
def objective(weights):
    return np.sqrt(weights.T @ cov_matrix @ weights)

# Constraints: weights sum to 1 and are non-negative (long-only)
constraints = (
    {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},
    {'type': 'ineq', 'fun': lambda w: w}
)
bounds        = tuple((0, None) for _ in range(num_assets))
initial_guess = [1.0 / num_assets] * num_assets

# Run optimization
optimized = minimize(objective, initial_guess,
                     method='SLSQP', bounds=bounds,
                     constraints=constraints)

w_optimal = optimized.x

print("Optimal portfolio weights (minimum volatility):")
for i, symbol in enumerate(selected_stocks):
    print(f"  {symbol}: {w_optimal[i]:.4f} ({w_optimal[i]*100:.2f}%)")

Efficient Frontier — helper functions and plot

Python — efficient frontier computation and visualization
import pandas as pd
import matplotlib.pyplot as plt

def portfolio_return(weights, returns):
    """Expected portfolio return: w^T · μ"""
    return weights.T @ returns

def portfolio_vol(weights, covmat):
    """Portfolio volatility: √(w^T · Σ · w)"""
    return (weights.T @ covmat @ weights) ** 0.5

def minimize_vol(target_return, er, cov):
    """Find weights that minimize volatility for a given target return."""
    n = er.shape[0]
    init_guess = np.repeat(1/n, n)
    bounds = ((0.0, 1.0),) * n
    constraints = (
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
        {'type': 'eq', 'args': (er,),
         'fun': lambda w, er: target_return - portfolio_return(w, er)}
    )
    result = minimize(portfolio_vol, init_guess, args=(cov,),
                      method='SLSQP', constraints=constraints,
                      bounds=bounds, options={'disp': False})
    return result.x

def plot_ef(n_points, er, cov):
    """Trace the Efficient Frontier across n_points target-return portfolios."""
    min_ret  = max(0, er.min())
    target_rs = np.linspace(min_ret, er.max(), n_points)
    weights   = [minimize_vol(tr, er, cov) for tr in target_rs]
    rets = [portfolio_return(w, er) for w in weights]
    vols = [portfolio_vol(w, cov)   for w in weights]

    ef = pd.DataFrame({"Returns": rets, "Volatility": vols})
    return rets, ef.plot.line(x="Volatility", y="Returns", style='.-')

# Plot Efficient Frontier
er_array  = expected_returns_.values
rets, ax  = plot_ef(100, er_array, cov_matrix)

# Overlay optimal portfolio
retorno_optimo = portfolio_return(w_optimal, er_array)
riesgo_optimo  = portfolio_vol(w_optimal, cov_matrix)

ax.scatter(riesgo_optimo, retorno_optimo,
           color='red', marker='*', s=200, label='Optimal Portfolio', zorder=5)
ax.set_xlabel('Volatility (Standard Deviation)')
ax.set_ylabel('Expected Return')
ax.set_title('Efficient Frontier — Markowitz Portfolio Optimization')
ax.legend()

print(f"\nOptimal portfolio:")
print(f"  Expected return: {retorno_optimo:.4f} ({retorno_optimo*100:.2f}%)")
print(f"  Volatility:      {riesgo_optimo:.4f} ({riesgo_optimo*100:.2f}%)")
plt.show()

Metrics and Results · Evaluation

Evaluation operates at two levels: the predictive quality of each LSTM model (measured by RMSE in USD on the 30-day test set) and the financial quality of the resulting portfolio (expected return, volatility, and implicit Sharpe ratio from the optimizer output).
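The implied Sharpe ratio is a one-line computation once the optimizer output is available. The figures below are placeholders standing in for retorno_optimo and riesgo_optimo from the optimization step; note that return and volatility must be expressed over the same horizon for the ratio to be meaningful:

```python
# Ex-ante Sharpe from the optimizer output, using the same 2% annual
# risk-free assumption as the CAPM step. Values are hypothetical placeholders.
horizon_days    = 30
rf_horizon      = 0.02 * horizon_days / 252   # risk-free rate over 30 days
expected_return = 0.035                       # hypothetical 30-day return
volatility      = 0.021                       # hypothetical 30-day volatility

implied_sharpe = (expected_return - rf_horizon) / volatility
print(f"Implied Sharpe over {horizon_days} days: {implied_sharpe:.2f}")
```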

LSTM prediction quality per asset

Python — RMSE summary table
import pandas as pd

# RMSE summary by asset
error_df = pd.DataFrame({
    'Asset':      list(errors.keys()),
    'RMSE (USD)': [round(v, 4) for v in errors.values()]
}).sort_values('RMSE (USD)')

print(error_df.to_string(index=False))

Portfolio results — comparative summary

| Component | Description | Typical result | Notes |
|---|---|---|---|
| LSTM — Training Loss | Mean squared error on training set | 0.143 → 0.029 | Consistent descent over 50 epochs |
| LSTM — Validation Loss | Mean squared error on validation split | 0.154 → 0.047 | Normal fluctuations; no severe overfitting |
| CAPM — Assets selected | Assets passing the cost-of-equity hurdle | 4–7 of 10 | Varies with market regime and LSTM predictions |
| Portfolio — Expected return | wᵀ · μ for selected assets | LSTM-dependent | Based on 30-day predicted price change % |
| Portfolio — Optimal volatility | √(wᵀ · Σ · w) at minimum-vol point | Global minimum of efficient frontier | Long-only weights (no short selling) |

Optimal portfolio weight visualization

Python — weight bar chart and pie chart
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart: allocation weights by asset
axes[0].bar(selected_stocks, w_optimal * 100, color='steelblue', edgecolor='white')
axes[0].set_title('Optimal Portfolio Allocation')
axes[0].set_ylabel('Weight (%)')
axes[0].set_xlabel('Asset')
axes[0].tick_params(axis='x', rotation=45)

# Pie chart: portfolio distribution
axes[1].pie(w_optimal, labels=selected_stocks, autopct='%1.1f%%',
            startangle=90, pctdistance=0.85)
axes[1].set_title('Portfolio Weight Distribution')

plt.tight_layout()
plt.show()

Conclusions and Future Work · Findings

Key findings

Finding 1

LSTM captures short-term price patterns with reasonable accuracy

Validation loss drops from 0.154 to 0.047 over 50 epochs without significant overfitting. The narrow gap between training and validation loss throughout training confirms that the three-layer LSTM with Dropout generalizes adequately across the 30-day test horizon. Technical indicators — particularly RSI and Bollinger Bands — contribute meaningful predictive signal beyond raw closing price alone, as evidenced by lower validation loss compared to price-only baselines tested during development.

Finding 2

CAPM effectively removes assets with uncompensated systematic risk

The CAPM filter reduces the universe from 10 to 4–7 assets depending on market conditions at execution time. Which names clear the hurdle is regime-dependent: when the estimated market excess return compresses or turns negative, high-beta assets such as TSLA or NVDA can drop out of the screened universe, since the risk premium implied by their beta no longer justifies inclusion. This prevents the Markowitz optimizer from being forced to allocate to assets where the risk-return tradeoff is unfavorable — a common pitfall when running unconstrained mean-variance optimization on a full universe.

Finding 3

Markowitz concentrates weight in low-covariance asset pairs

The minimum-volatility optimizer consistently assigns high weights to pairs of assets with low pairwise correlation — typically a technology equity (MSFT or GOOGL) paired with a financial or defensive equity (JPM or JNJ). This is the mathematical core of Markowitz diversification: portfolio risk is not simply the weighted average of individual asset volatilities; it is reduced by the correlation structure. Two moderately volatile assets with low correlation produce a portfolio with lower volatility than either asset individually.
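The claim in the last sentence can be checked numerically; the volatilities and correlation below are illustrative, not estimates from this universe:

```python
import numpy as np

# Illustrative inputs: two assets, each with 25% volatility, correlation 0.2
vol_a, vol_b, rho = 0.25, 0.25, 0.2
cov = np.array([[vol_a**2,            rho * vol_a * vol_b],
                [rho * vol_a * vol_b, vol_b**2           ]])

w = np.array([0.5, 0.5])                  # equal-weight portfolio
port_vol = float(np.sqrt(w @ cov @ w))    # sqrt(w' Sigma w)

# Portfolio volatility (~19.4%) sits below either asset's 25% standalone vol
print(f"portfolio vol: {port_vol:.1%}")
```

With correlation 1.0 the portfolio volatility would be exactly 25%; everything below that is the diversification benefit Markowitz exploits.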

Finding 4

LSTM-predicted returns improve the relevance of Markowitz inputs

The classical Markowitz setup uses historical mean returns as expected-return estimates — an approach that implicitly assumes historical performance directly predicts future performance. Replacing those historical averages with LSTM-predicted returns introduces a dynamic, forward-looking estimator that reacts to recent momentum signals, volatility regimes, and technical indicator patterns. The result is an optimizer that adapts its allocation to current market conditions rather than anchoring on potentially stale historical data. This substitution is the empirically testable contribution of this pipeline.

Limitations and future work

  • The LSTM outputs a point estimate — not a return distribution. An extension using Bayesian LSTM or Monte Carlo Dropout would quantify forecast uncertainty and allow the Markowitz optimizer to account for estimation error in expected returns (the Michaud resampling approach).
  • The 30-day prediction horizon is fixed. A multi-step direct forecasting architecture would allow the portfolio horizon to be set dynamically at runtime without retraining the model.
  • No periodic rebalancing is implemented. A production deployment would re-run the full pipeline on a monthly or quarterly schedule, updating both LSTM predictions and optimal weights as new market data arrives.
  • The universe is limited to 10 equities. Scaling to a full index (e.g., S&P 500) would require clustering equities by sector or factor exposure and training one LSTM model per cluster, reducing computational cost while preserving diversification breadth.
  • Beta is estimated on the full historical window. A rolling 252-day beta would capture regime shifts in systematic risk exposure — important for assets like NVDA whose market sensitivity changed substantially between 2020 and 2024.
  • SHAP values applied to the LSTM would reveal which technical indicators contribute most to each prediction, improving interpretability for the portfolio manager reviewing the model's output.
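The rolling-beta refinement from the list above can be sketched in a few lines of pandas; the return series here are synthetic placeholders constructed with a known beta:

```python
import numpy as np
import pandas as pd

# Rolling 252-day beta: trailing cov(asset, market) / var(market).
# Synthetic daily returns stand in for real data; true beta ~1.5 by construction.
rng    = np.random.default_rng(1)
idx    = pd.date_range('2020-01-01', periods=756, freq='B')
market = pd.Series(rng.normal(0.0004, 0.01, len(idx)), index=idx)
noise  = pd.Series(rng.normal(0.0, 0.01, len(idx)), index=idx)
asset  = 1.5 * market + noise

rolling_beta = asset.rolling(252).cov(market) / market.rolling(252).var()
print(rolling_beta.dropna().round(2).tail(3))
```

Replacing the full-window `stats.linregress` beta with the last value of this series makes the CAPM screen responsive to regime shifts.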
Adapting this pipeline for a quantitative finance thesis: The most defensible original contribution is a rigorous out-of-sample comparison between two versions of this pipeline — one using LSTM-predicted returns as Markowitz inputs, and one using rolling historical mean returns as the baseline. Evaluate both over a 12-month backtesting window using annualized Sharpe ratio, maximum drawdown, and Calmar ratio as performance metrics. If the LSTM-based portfolio achieves a statistically significant Sharpe improvement (bootstrap confidence intervals on Sharpe difference), you have a publication-ready empirical result. The complete pipeline is replicable with any equity universe available on Yahoo Finance.
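The bootstrap confidence interval on the Sharpe difference can be sketched as follows, with synthetic daily returns standing in for the two backtested portfolios:

```python
import numpy as np

def annualized_sharpe(r, periods=252):
    r = np.asarray(r)
    return np.sqrt(periods) * r.mean() / r.std(ddof=1)

def bootstrap_sharpe_diff_ci(r_a, r_b, n_boot=5000, seed=0):
    """Percentile 95% CI on Sharpe(a) - Sharpe(b) via day resampling."""
    rng = np.random.default_rng(seed)
    n = len(r_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)   # resample trading days with replacement
        diffs[i] = annualized_sharpe(r_a[idx]) - annualized_sharpe(r_b[idx])
    return np.percentile(diffs, [2.5, 97.5])

# Synthetic placeholder returns for the LSTM-input and baseline portfolios
rng = np.random.default_rng(7)
lstm_rets = rng.normal(0.0008, 0.01, 252)
base_rets = rng.normal(0.0003, 0.01, 252)

lo, hi = bootstrap_sharpe_diff_ci(lstm_rets, base_rets)
print(f"95% bootstrap CI on Sharpe difference: [{lo:.2f}, {hi:.2f}]")
```

If the entire interval sits above zero on real backtest data, the LSTM-input portfolio's Sharpe improvement is significant at the 5% level under this resampling scheme.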