A dual-pipeline deep learning system for Fast-Moving Consumer Goods (FMCG) inventory planning, extracted and sanitized from a production system serving an Indonesian distributor operating 6 darkstores. All database dependencies are replaced with a synthetic data generator — fully runnable out of the box.
Problem Statement
Accurate demand forecasting in Indonesian FMCG is complicated by pronounced cultural seasonality — Ramadan, Eid al-Fitr (Lebaran), payday periods, and rainy seasons create irregular demand spikes that naive models miss. The challenge was to build a forecasting system that handles these patterns while also producing calibrated uncertainty bounds for safety-stock sizing, and to package it as a portfolio project that runs without any live database connection.
Solution: Dual-Pipeline Architecture
The system is split into two independent pipelines sharing a common data layer and calendar module:
- Pipeline 1 — Demand: Unit-level, per-product-warehouse daily demand forecasting. Uses an LSTM encoder with Bahdanau-style additive attention feeding into an autoregressive GRU decoder, trained with quantile loss to produce probabilistic P10/P50/P90 outputs. STL decomposition separates trend from residual before training; trend is linearly extrapolated at inference time.
- Pipeline 2 — Sales: Revenue and COGS forecasting with two sub-models: a per-region `SeasonalFinancialForecaster` (LSTM + MultiheadAttention, sinusoidal seasonal features) and an aggregated `FinancialForecaster` (plain LSTM, day-of-week one-hots). A downstream purchase recommendation engine then redistributes non-working-day orders to the preceding business days using a 60/40 split.
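The 60/40 redistribution can be sketched as follows. This is an illustrative implementation, not the repo's actual `recommendations.py`; the function names and the rule that Sundays and holidays are non-working days (per the pipeline diagram) are assumptions.

```python
from datetime import date, timedelta

def is_working_day(d: date, holidays: set) -> bool:
    """Sundays and national holidays are treated as non-working days."""
    return d.weekday() != 6 and d not in holidays

def redistribute(forecast: dict, holidays: set) -> dict:
    """Move each non-working-day quantity onto the two prior business days (60/40)."""
    out = {d: q for d, q in forecast.items() if is_working_day(d, holidays)}
    for d, qty in forecast.items():
        if is_working_day(d, holidays):
            continue
        # walk backwards to find the two nearest business days
        targets, cur = [], d
        while len(targets) < 2:
            cur -= timedelta(days=1)
            if is_working_day(cur, holidays):
                targets.append(cur)
        out[targets[0]] = out.get(targets[0], 0.0) + 0.6 * qty  # nearest day gets 60%
        out[targets[1]] = out.get(targets[1], 0.0) + 0.4 * qty  # next-nearest gets 40%
    return out
```

Total ordered quantity is preserved; only its placement shifts, so purchase recommendations never land on days when no deliveries can be received.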
Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ │
│ scripts/generate_data.py ──▶ data/synthetic/ │
│ ├── products.csv (80 SKUs, 5 categories) │
│ ├── orders.csv (daily orders, payday spikes, seasonal patterns) │
│ ├── sales.csv (aggregated sales + COGS with realistic noise) │
│ └── calendar.csv (Indonesian FMCG features, 13+ columns) │
└─────────────────────────────────────────────────────────────────────────┘
│
┌───────────────┴────────────────┐
▼ ▼
┌───────────────────────┐ ┌──────────────────────────────┐
│ PIPELINE 1: DEMAND │ │ PIPELINE 2: SALES │
│ │ │ │
│ preprocess_demand │ │ preprocess_raw_data() │
│ ├─ OOS detection │ │ └─ date adjustment │
│ ├─ outlier removal │ │ (Sundays/holidays) │
│ ├─ lag features │ │ │
│ └─ Lebaran signal │ │ ┌─────────┬──────────────┐ │
│ │ │ │ │Individual│ Merged │ │
│ TimeSeriesForecaster │ │ │(per R-G) │ (aggregate) │ │
│ ├─ STL decomp. │ │ │SeasonalF.│ FinancialF. │ │
│ ├─ MinMaxScaler │ │ │LSTM+MHA │ Plain LSTM │ │
│ ├─ TimeSeriesSplit │ │ │sin/cos │ DOW one-hot │ │
│ └─ LSTM+GRU decoder│ │ └────┬────┴──────┬───────┘ │
│ │ │ │ │ sales+COGS│ │
│ demand_forecast.csv │ │ ▼ ▼ │
└───────────────────────┘ │ recommendations.py │
│ └─ 60/40 redistribution │
│ sales_forecast/ + rec_buy/ │
└──────────────────────────────┘
Model Architecture — Demand Pipeline
LSTM Encoder + Autoregressive GRU Decoder
──────────────────────────────────────────
Input features (19 total):
├── sales lags (1,2,3,7,14,21 days)
├── rolling stats (mean/std 7,14,28 days)
├── STL residual
├── cyclical encodings (dow_sin/cos, month_sin/cos)
└── calendar flags (Ramadan, Lebaran, payday, rainy season, ...)
Categorical embeddings:
product_embed (dim=8) ─┐
gudang_embed (dim=4) ─┼─▶ concat with continuous features
┌─────────────────────────────────────────────────────┐
│ LSTM Encoder │
│ input_dim=19+12, hidden_dim=64, num_layers=2 │
│ dropout=0.15, unidirectional │
└──────┬───────────────────────┬──────────────────────┘
│ hidden state h_n │ all hidden states
│ ▼
│ ┌──────────────────────┐
│ │ Additive Attention │ Bahdanau-style
│ │ (weights over seq) │
│ └──────────┬───────────┘
│ │ context vector
▼ │
┌──────────────────────────────┘
│ enc_to_dec projection (64 → 64)
│ GRU Decoder hidden_dim=64, num_layers=1
│ Autoregressive: each step's output fed as next step's input
└──────────────────────────────────────────────────────────┐
▼
Output: (batch, horizon=30, quantiles=3) → P10 / P50 / P90
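The attention step in the diagram can be sketched in NumPy. This is a minimal illustration of Bahdanau-style additive attention with the diagram's `hidden_dim=64`, not the repo's PyTorch module; the random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, seq_len = 64, 30

W_enc = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # projects encoder states
W_dec = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # projects decoder query
v = rng.normal(scale=0.1, size=hidden_dim)                    # scoring vector

def additive_attention(enc_states, dec_query):
    """enc_states: (seq_len, hidden_dim); dec_query: (hidden_dim,)."""
    # score_t = v^T tanh(W_enc h_t + W_dec q), then softmax over the sequence
    scores = np.tanh(enc_states @ W_enc.T + dec_query @ W_dec.T) @ v
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ enc_states, weights  # context vector, attention weights

enc = rng.normal(size=(seq_len, hidden_dim))
context, attn = additive_attention(enc, rng.normal(size=hidden_dim))
```

The context vector is a convex combination of all encoder hidden states, which is what lets the decoder attend to, e.g., the last payday window rather than only the final encoder state.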
Indonesian FMCG Calendar
A custom calendar module encodes 13+ Indonesian market-specific features as model inputs. These are the dominant drivers of demand spikes and troughs in Indonesian FMCG:
- Ramadan/Lebaran: `is_ramadan_month`, `is_lebaran_peak_week`, `is_thr_payout_week`, plus a continuous `lebaran_proximity_signal` (0→1 ramp starting 60 days before Eid) to model the gradual demand build-up.
- Payday period: `is_payday_period` covers the 25th to the 5th of the next month, capturing discretionary spending surges.
- Seasonal patterns: Rainy season (Oct–Apr), back-to-school periods, Chinese New Year week, Idul Adha week, long weekends, and the year-end holiday period.
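The `lebaran_proximity_signal` ramp can be sketched as a simple piecewise-linear function. This is an illustrative reconstruction from the description above (0 until 60 days out, linear ramp to 1 on Eid day), not necessarily the repo's exact parameterisation.

```python
def lebaran_proximity_signal(days_until_eid: int, ramp: int = 60) -> float:
    """0 outside the ramp window; rises linearly to 1.0 on Eid day."""
    if days_until_eid < 0 or days_until_eid > ramp:
        return 0.0
    return 1.0 - days_until_eid / ramp

# 60 days out -> 0.0, 30 days out -> 0.5, Eid day -> 1.0
```

A continuous ramp gives the model a gradient to learn the pre-Lebaran stock-up curve, which binary flags alone cannot express.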
Key Engineering Decisions
- Autoregressive GRU decoder over a single FC layer: A simple fully-connected output layer produced constant forecasts across the 30-day horizon. The GRU decoder instead feeds each step's output back as the next step's input, producing non-constant multi-step sequences.
- Separate target scalers per series: Fitting `MinMaxScaler` on the full feature matrix caused near-zero range on the target column, producing flat forecasts. Scalers are fit only on the detrended target series for each product-warehouse pair.
- All-fold CV with best-model selection: All 5 `TimeSeriesSplit` folds are trained; the fold with the lowest validation loss is kept, not just the last fold. This prevents a lucky or unlucky fold from dominating.
- STL decomposition before scaling: Separating trend from residual before `MinMaxScaler` prevents the trend from dominating the feature range. The trend is linearly extrapolated (slope-clamped) at inference time.
- No database dependencies: All ClickHouse/MySQL connections are replaced with CSV-based I/O and a synthetic data generator that reproduces realistic FMCG patterns (payday spikes, Ramadan surge, rainy-season dip, day-of-week seasonality, stock-outs, outliers).
Results
Evaluated on the final 30 business days held out from training (backtest window):
Backtest RMSE=18.4 | MAE=13.2 | MAPE=22.4% | WAPE=18.1% | Bias=-2.3% | Skill=0.31
Backtest 80% PI coverage = 76.4%
- Naive Skill = 0.31 — 31% lower MAE than a 7-day rolling average baseline.
- WAPE = 18.1% — strong for intermittent FMCG demand; target is <15% for the full production dataset.
- 80% PI coverage = 76.4% — the P10/P90 uncertainty band is close to its nominal 80% coverage on held-out data, i.e. reasonably well-calibrated.
- Best validation loss ~0.037 (quantile pinball loss, normalised), with early stopping at 30–190 epochs depending on fold.
- 54 tests, 93% coverage across both pipelines.
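The headline metrics above can be computed as follows. This is a hedged sketch of the standard definitions (WAPE, skill as 1 minus the MAE ratio against a 7-day rolling-average baseline, and empirical interval coverage), not the repo's evaluation code.

```python
import numpy as np

def wape(actual, forecast):
    """Weighted absolute percentage error: total |error| over total actuals."""
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

def naive_skill(actual, forecast, history, window=7):
    """Skill = 1 - MAE(model) / MAE(rolling-average baseline)."""
    series = np.concatenate([history, actual])
    # baseline for step t = mean of the `window` values preceding it
    baseline = np.array([series[i - window:i].mean()
                         for i in range(len(history), len(series))])
    mae_model = np.abs(actual - forecast).mean()
    mae_naive = np.abs(actual - baseline).mean()
    return 1.0 - mae_model / mae_naive

def pi_coverage(actual, p10, p90):
    """Fraction of actuals falling inside the P10-P90 band (nominal 0.80)."""
    return np.mean((actual >= p10) & (actual <= p90))
```

Under this definition, Skill = 0.31 corresponds directly to the "31% lower MAE than the baseline" claim above.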