Testing Best Practices
======================

This guide provides best practices for writing robust tests in quActuary,
especially when dealing with stochastic methods and numerical computations.

.. important::

   Stochastic methods require careful tolerance selection. Tests that are too
   strict will fail randomly, while tests that are too loose won't catch real
   issues.

Tolerance Guidelines
--------------------

Choosing Appropriate Tolerances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different types of calculations require different tolerance levels:

.. list-table:: Recommended Tolerances by Calculation Type
   :widths: 40 20 40
   :header-rows: 1

   * - Calculation Type
     - Relative Tolerance
     - Notes
   * - Deterministic calculations
     - ``1e-10``
     - Should match exactly (accounting for floating point)
   * - Small sample Monte Carlo (n < 1,000)
     - ``0.1`` (10%)
     - High variance expected
   * - Medium sample Monte Carlo (n ~ 10,000)
     - ``0.05`` (5%)
     - Reasonable for most tests
   * - Large sample Monte Carlo (n > 100,000)
     - ``0.01`` (1%)
     - More stable results
   * - Quasi-Monte Carlo convergence
     - ``0.05`` (5%)
     - Better convergence than standard MC
   * - Moment calculations (mean, variance)
     - ``0.02`` (2%)
     - Central limit theorem helps
   * - Tail statistics (VaR 99.5%)
     - ``0.1`` (10%)
     - Fewer samples in tails
   * - Correlation estimates
     - ``0.05`` (5%)
     - Depends on sample size
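
These recommendations track the usual Monte Carlo error scaling: the standard
error of a sample mean shrinks roughly like ``sigma / sqrt(n)``, so each
hundredfold increase in simulations supports roughly a tenfold tighter
tolerance. The following self-contained sketch (plain NumPy and pytest, no
quActuary API) illustrates the medium- and large-sample rows of the table:

.. code-block:: python

   import numpy as np
   import pytest


   def test_tolerance_matches_sample_size():
       """Sanity-check the table: larger samples justify tighter tolerances."""
       rng = np.random.default_rng(42)
       true_mean = 1000.0

       # Exponential severities with a known mean, estimated by simulation.
       medium_sample = rng.exponential(scale=true_mean, size=10_000).mean()
       large_sample = rng.exponential(scale=true_mean, size=250_000).mean()

       # Tolerances from the table above: 5% around n ~ 10,000, 1% for n > 100,000.
       assert medium_sample == pytest.approx(true_mean, rel=0.05)
       assert large_sample == pytest.approx(true_mean, rel=0.01)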

Setting Random Seeds
~~~~~~~~~~~~~~~~~~~~

Always set random seeds for reproducible tests:

.. code-block:: python

   import numpy as np
   import pytest


   @pytest.fixture
   def set_random_seed():
       """Ensure reproducible random numbers."""
       np.random.seed(42)
       yield
       # Reset to random state after test
       np.random.seed(None)


   def test_monte_carlo_mean(set_random_seed):
       # Test will use same random numbers each run
       result = simulate_portfolio(n_sims=10000)
       assert abs(result.mean - expected_mean) / expected_mean < 0.02

Testing Stochastic Methods
--------------------------

Basic Pattern
~~~~~~~~~~~~~

.. code-block:: python

   import numpy as np


   def test_stochastic_calculation():
       # Set up
       np.random.seed(42)
       n_sims = 10000

       # Run simulation
       result = run_simulation(n_sims=n_sims)

       # Check with appropriate tolerance
       expected_mean = 1000.0
       relative_error = abs(result.mean - expected_mean) / expected_mean

       # Use generous tolerance for stochastic results
       assert relative_error < 0.05, f"Mean {result.mean} not within 5% of {expected_mean}"

Testing Convergence
~~~~~~~~~~~~~~~~~~~

For methods that should converge, test the trend rather than absolute values:

.. code-block:: python

   def test_qmc_convergence():
       """Test that QMC converges faster than standard MC."""
       sample_sizes = [100, 1000, 10000]
       qmc_errors = []
       mc_errors = []
       true_value = 1000.0

       for n in sample_sizes:
           # QMC simulation
           qmc_result = simulate_with_qmc(n_sims=n)
           qmc_error = abs(qmc_result.mean - true_value) / true_value
           qmc_errors.append(qmc_error)

           # Standard MC simulation
           mc_result = simulate_with_mc(n_sims=n)
           mc_error = abs(mc_result.mean - true_value) / true_value
           mc_errors.append(mc_error)

       # QMC should converge faster (errors decrease more)
       qmc_improvement = qmc_errors[0] / qmc_errors[-1]
       mc_improvement = mc_errors[0] / mc_errors[-1]

       assert qmc_improvement > mc_improvement * 1.5  # QMC at least 50% better

Hardware-Dependent Tests
------------------------

When to Skip Tests
~~~~~~~~~~~~~~~~~~

Some tests depend on hardware or environment. Use ``pytest.mark.skip``
appropriately:

.. code-block:: python

   import pytest
   import psutil


   @pytest.mark.skip(reason="Requires stable baseline data from CI environment")
   def test_performance_regression():
       """This test should only run in a controlled CI environment."""
       result = benchmark_function()
       assert result.time < baseline_time * 1.1


   @pytest.mark.skipif(
       psutil.virtual_memory().available < 8 * 1024**3,
       reason="Requires at least 8GB free memory"
   )
   def test_large_portfolio():
       """Test handling of large portfolios."""
       portfolio = generate_large_portfolio(n_policies=1_000_000)
       result = process_portfolio(portfolio)
       assert result.success

Conditional Testing
~~~~~~~~~~~~~~~~~~~

For tests that may behave differently on different hardware:

.. code-block:: python

   import time

   import pytest


   def test_parallel_speedup():
       """Test parallel processing provides speedup."""
       import multiprocessing
       n_cores = multiprocessing.cpu_count()

       if n_cores < 4:
           pytest.skip("Parallel speedup test requires at least 4 cores")

       # Time sequential processing
       start = time.time()
       sequential_result = process_sequential(data)
       sequential_time = time.time() - start

       # Time parallel processing
       start = time.time()
       parallel_result = process_parallel(data, n_workers=n_cores)
       parallel_time = time.time() - start

       # Expect speedup, but not perfect scaling
       min_speedup = min(2.0, n_cores * 0.5)  # At least 50% efficiency
       actual_speedup = sequential_time / parallel_time
       assert actual_speedup > min_speedup

Numerical Accuracy Testing
--------------------------

Testing Floating Point Calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use appropriate comparison methods for floating point:

.. code-block:: python

   import numpy as np
   import pytest


   def test_numerical_stability():
       """Test calculations are numerically stable."""
       # Don't use exact equality for floats
       result = complex_calculation()
       expected = 1.23456789

       # BAD: May fail due to floating point precision
       # assert result == expected

       # GOOD: Use np.allclose for arrays
       assert np.allclose(result, expected, rtol=1e-7, atol=1e-10)

       # GOOD: Use pytest.approx for scalars
       assert result == pytest.approx(expected, rel=1e-7, abs=1e-10)

Testing Edge Cases
~~~~~~~~~~~~~~~~~~

Always test numerical edge cases:

.. code-block:: python

   import pytest


   def test_distribution_edge_cases():
       """Test distribution behavior at extremes."""
       # Test near-zero values
       dist = Exponential(scale=1.0)

       # Small probabilities
       assert dist.ppf(1e-10) == pytest.approx(1e-10, rel=0.01)

       # Large probabilities
       assert dist.ppf(0.99999) < 20  # Reasonable upper bound

       # Test parameter boundaries
       with pytest.raises(ValueError):
           Exponential(scale=-1.0)  # Invalid parameter
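
Round-trip consistency between the CDF and PPF is another inexpensive
edge-case check that catches many numerical problems in the tails. A minimal
sketch, using ``scipy.stats.expon`` as a stand-in for the library's
``Exponential`` distribution:

.. code-block:: python

   import numpy as np
   from scipy import stats


   def test_ppf_cdf_round_trip():
       """PPF and CDF should be numerical inverses, including near the tails."""
       dist = stats.expon(scale=1.0)
       probs = np.array([1e-10, 0.01, 0.5, 0.99, 1 - 1e-10])

       # cdf(ppf(q)) should recover q to tight tolerance across the support.
       assert np.allclose(dist.cdf(dist.ppf(probs)), probs, rtol=1e-9, atol=1e-12)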

Integration Test Patterns
-------------------------

Testing End-to-End Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def test_portfolio_pricing_workflow():
       """Test complete pricing workflow with appropriate tolerances."""
       # Create test portfolio
       portfolio = Portfolio([
           Policy(
               frequency=Poisson(mu=2.0),
               severity=Lognormal(shape=1.0, scale=1000)
           )
           for _ in range(100)
       ])

       # Configure for testing (faster but less accurate)
       config = OptimizationConfig(
           use_vectorization=True,
           use_qmc=True,
           qmc_method="sobol"
       )

       # Run simulation with reasonable sample size
       model = PricingModel(portfolio)
       result = model.simulate(
           n_sims=10000,
           optimization_config=config
       )

       # Check results with appropriate tolerances
       assert result.mean == pytest.approx(2000, rel=0.05)    # 5% tolerance
       assert result.std == pytest.approx(1500, rel=0.10)     # 10% for std
       assert result.var_95 == pytest.approx(4000, rel=0.10)  # 10% for VaR

Testing Different Optimization Strategies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import numpy as np
   import pytest


   @pytest.mark.parametrize("use_jit,use_parallel,use_qmc", [
       (False, False, False),  # Baseline
       (True, False, False),   # JIT only
       (False, True, False),   # Parallel only
       (False, False, True),   # QMC only
       (True, True, True),     # All optimizations
   ])
   def test_optimization_consistency(use_jit, use_parallel, use_qmc):
       """Test that all optimization strategies give consistent results."""
       np.random.seed(42)

       config = OptimizationConfig(
           use_jit=use_jit,
           use_parallel=use_parallel,
           use_qmc=use_qmc,
           n_workers=2  # Limit for testing
       )

       # `model` and `expected_mean` are assumed to come from fixtures or
       # module-level setup in the real test module.
       result = model.simulate(
           n_sims=5000,
           optimization_config=config
       )

       # All strategies should give similar results
       assert result.mean == pytest.approx(expected_mean, rel=0.1)

Common Pitfalls and Solutions
-----------------------------

Issue: Flaky Tests
~~~~~~~~~~~~~~~~~~

**Problem**: Tests pass sometimes but fail randomly.

**Solution**:

.. code-block:: python

   # BAD: No seed, tight tolerance
   def test_simulation():
       result = simulate(n_sims=100)
       assert result.mean == pytest.approx(1000, rel=0.001)


   # GOOD: Seed set, appropriate tolerance
   def test_simulation():
       np.random.seed(42)
       result = simulate(n_sims=10000)  # Larger sample
       assert result.mean == pytest.approx(1000, rel=0.05)

Issue: Slow Tests
~~~~~~~~~~~~~~~~~

**Problem**: Tests take too long to run regularly.

**Solution**:

.. code-block:: python

   # Mark slow tests
   @pytest.mark.slow
   def test_large_portfolio_performance():
       """Full performance test - only run before release."""
       portfolio = generate_portfolio(n_policies=100000)
       # ... expensive test


   # Create a fast version for regular testing
   def test_small_portfolio_performance():
       """Quick performance check for CI."""
       portfolio = generate_portfolio(n_policies=100)
       # ... fast test

Run regular tests with: ``pytest -m "not slow"``
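
Note that pytest will warn about the custom ``slow`` marker unless it is
registered. If the project does not already declare it in its pytest
configuration, one possible way (a sketch; the marker description text is
arbitrary) is to register it from ``conftest.py``:

.. code-block:: python

   # conftest.py


   def pytest_configure(config):
       """Register the custom ``slow`` marker so pytest does not warn about it."""
       config.addinivalue_line(
           "markers", "slow: long-running test, deselect with -m 'not slow'"
       )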

Issue: Platform-Dependent Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: Tests pass on one platform but fail on another.

**Solution**:

.. code-block:: python

   import platform


   def test_numerical_precision():
       """Test with platform-appropriate tolerance."""
       result = complex_calculation()

       # Different platforms may have different precision
       if platform.system() == "Windows":
           tolerance = 1e-10
       else:
           tolerance = 1e-12

       assert abs(result - expected) < tolerance

Best Practices Summary
----------------------

1. **Set random seeds** for reproducibility
2. **Use appropriate tolerances** based on calculation type
3. **Test trends** rather than absolute values for convergence
4. **Skip hardware-dependent tests** when the environment isn't suitable
5. **Use larger sample sizes** for more stable test results
6. **Parameterize tests** to cover multiple scenarios efficiently
7. **Mark slow tests** to keep regular test runs fast
8. **Document why** specific tolerances were chosen

Example: Complete Test Module
-----------------------------

.. code-block:: python

   """
   Tests for portfolio pricing with best practices.
   """
   import numpy as np
   import pytest

   from quactuary.pricing import PricingModel
   from quactuary.distributions import Poisson, Lognormal

   # Portfolio, Policy, OptimizationConfig and the helpers used below
   # (generate_portfolio, is_ci_environment, load_baseline_duration) are
   # assumed to be importable in the test environment.


   @pytest.fixture
   def random_seed():
       """Ensure reproducible randomness."""
       np.random.seed(42)
       yield
       np.random.seed(None)


   @pytest.fixture
   def small_portfolio():
       """Create a small test portfolio."""
       return Portfolio([
           Policy(
               frequency=Poisson(mu=2.0),
               severity=Lognormal(shape=1.0, scale=1000)
           )
           for _ in range(10)
       ])


   class TestPricingModel:
       """Test pricing model with appropriate tolerances."""

       def test_mean_estimation(self, small_portfolio, random_seed):
           """Test mean estimation with stochastic tolerance."""
           model = PricingModel(small_portfolio)
           result = model.simulate(n_sims=10000)

           expected_mean = 2.0 * 1000  # illustrative: frequency mean * severity scale
           assert result.mean == pytest.approx(expected_mean, rel=0.05)

       def test_convergence(self, small_portfolio, random_seed):
           """Test that results converge with more simulations."""
           model = PricingModel(small_portfolio)

           # Run with increasing sample sizes
           results = []
           for n_sims in [1000, 10000, 100000]:
               result = model.simulate(n_sims=n_sims)
               results.append(result.mean)

           # Later results should be more stable
           diff_small = abs(results[1] - results[0]) / results[0]
           diff_large = abs(results[2] - results[1]) / results[1]
           assert diff_large < diff_small * 0.5  # Convergence

       @pytest.mark.slow
       def test_large_portfolio_memory(self):
           """Test memory handling for large portfolios."""
           large_portfolio = generate_portfolio(n_policies=100000)
           model = PricingModel(large_portfolio)

           result = model.simulate(
               n_sims=1000,
               optimization_config=OptimizationConfig(
                   use_memory_optimization=True,
                   batch_size=10000
               )
           )
           assert result.success

       @pytest.mark.skipif(
           not is_ci_environment(),
           reason="Performance regression tests only run in CI"
       )
       def test_performance_regression(self, small_portfolio):
           """Test performance hasn't regressed."""
           import time

           model = PricingModel(small_portfolio)

           start = time.time()
           model.simulate(n_sims=10000)
           duration = time.time() - start

           baseline_duration = load_baseline_duration()
           assert duration < baseline_duration * 1.2  # 20% margin

See Also
--------

* :doc:`/development/contributing` - General contribution guidelines
* :doc:`/development/testing_guidelines` - Testing guidelines and best practices
* `pytest documentation <https://docs.pytest.org/>`_ - Official pytest docs