Resampling

To estimate the accuracy and precision of a GRID results, you can analyze a set of randomly chosen values from the measured survival time distributions and repeat this process a certain number of times. This can be done with resample_and_fit().

This function works very similar to the other fitting functions, but you need to provide some extra information. To perform the resampling you need to define the parameters as described here. You will also need to define the number of times you want to perform the resampling (n), the percentage of data you want to use to create a random set (perc) and the fitting mode (fit_mode). Furthermore, you can perform the resampling in a multiprocessed way or in a sequential way. To perform the resampling in a multiprocessed way (which is a faster) you need to set multiprocess_flag to True, and you can then also set the maximum number of workers, which is limited and defaults to the number of logical cores on your pc - 1.

Warning

If you want to perform the resampling in a multiprocessed manner then you need to encapsulate resample_and_fit() in a specific type of if-statement, namely the following:

import gridlib

if __name__ == "__main__":
    fit_result_full, fit_results_resampled = gridlib.resample_and_fit(...)

Here is an example, where we perform 200 resamples with sets of 80% randomly chosen data points and perform GRID fitting on this. This resampling and fitting is done in a multiprocessed way. See the example:

import numpy as np
import matplotlib.pyplot as plt
import gridlib
import gridlib.io
import gridlib.plot

# if __name__ == "__main__" is required for the multiprocessing in the
# resampling_grid function to work.
if __name__ == "__main__":  # required for multiprocessing, not required for sequential

    # Load the data
    data = gridlib.io.read_data_survival_function("examples/data/example1.csv")

    # Set the parameters for the GRID fitting
    parameters = {
        "k_min": 10 ** (-3),
        "k_max": 10**1,
        "N": 200,
        "scale": "log",
        "reg_weight": 0.01,
        "fit_a": True,
        "a_fixed": None,
    }

    # Perform the resampling, the number of resamplings is set to 200, the percentage
    # of data to use per resampling is set to 80% and the fitting mode is set to the
    # GRID fitting procedure.
    fit_result_full, fit_results_resampled = gridlib.resample_and_fit(
        parameters,
        data,
        n=200,
        perc=0.8,
        fit_mode="grid",
        multiprocess_flag=True,
    )

    # Uncomment the next lines and change the path str to the preferred path to save the
    # fit results
    # gridlib.io.write_data_grid_resampling(
    #     "path/to/file_resampling_data.mat", fit_result_full, fit_results_resampled
    # )

    # Plot the resampled data
    fig1, ax1 = gridlib.plot.event_spectrum_heatmap(
        fit_result_full, fit_results_resampled
    )
    fig2, ax2 = gridlib.plot.state_spectrum_heatmap(
        fit_result_full, fit_results_resampled
    )

    # Set the titles
    ax1.set_title("Resampling event spectrum")
    ax2.set_title("Resampling state spectrum")

    plt.show()