Skip to content
Unverified Commit 90afa9ae authored by Lars H. B. Olsen's avatar Lars H. B. Olsen Committed by GitHub
Browse files

Harmonize batch distribution ++ (#359)



* Bugfix in `prepare_data()` related to vector of approaches. When using several approaches the old version only used the first approach. Verified this by adding a print in each prepare_data.approach() function and saw that only the first approach in internal$parameters$approach was used.
Can maybe remove code comments before pull request is accepted. Maybe a better method to get the approach?

Also updated roxygen2 for the function, as it seemed that it was reflecting the old version of shapr(?) due to arguments which are no longer present.

However, one then get a warning when creating the roxygen2 documentation. Discuss some solutions as comments below. Discuss with Martin.

* # Lars have added `n_combinations` - 1 as a possibility, as the function `check_n_batches` threw an error for the vignette with gaussian approach with `n_combinations` = 8 and `n_batches = NULL`, as this function here then set `n_batches` = 10, which was too large. We subtract 1 as `check_n_batches` function specifies that `n_batches` must be strictly less than `n_combinations`.

* Samll typo.

* Fixed bug. All messages says "n_combinations is larger than or equal to 2^m", but all the test only tested for "larger than". I.e., if the user specified n_combinations = 2^m in the call to shapr::explain, the function would not treat it as exact.

* Added script demonstrating the bug that shapr does not enter the exact mode when `n_combinations = 2^m`, before the bugfix.

* Added (tentative) test that checks that shapr enters exact mode when `n_combinations >= 2^m`. Remove the large comment after discussing that with Martin.

* Added script that demonstrates the bug before the bugfix, and added test checking that we do not get an error when runing the code after the bugfix has been applied.

* Fixed lint warnings in `approach.R`.

* Added two parameters to the `internal$parameters` list which contains the number of approaches and the number of unique approaches.

This is for example useful to check that the provided `n_batches` is a valid value. (see next commits)

* Added test to check that `n_batches` must be larger than or equal to the number of unique approaches. Before the user could, e.g., set `n_batches = 2`, but use 4 approaches and then shapr would use 4 but not update `n_batches` and without giwing a warning to the user.

* Updated `get_default_n_batches` to take into consideration the number of unique approaches that is used. This was not done before and gave inconsistency in what number shapr would reccomend and use when `n_batches` was set to `null` by the user.

* Changed where seed is set such that it applies for both regular and combined approaches.
Furthermore, added if test, because previous version resulted in not reproducible code, as setting seed to `null` ruins that we set seed in `explain()`.

Just consider this small example:
# Set seed to get same values twice
set.seed(123)
rnorm(1)

# Seting the same seed gives the same value
set.seed(123)
rnorm(1)

# If we also include null then the seed is removed and we do not get the same value
set.seed(123)
set.seed(NULL)
rnorm(1)

# Setining seed to null actually gives a new "random" number each time.
set.seed(123)
set.seed(NULL)
rnorm(1)

* Typo

* Added test to check that setting the seed works for combined approaches.

* typo in test function

* Added file to demonstrate the bugs (before the bugfix)

* Added new test

* Updated tests by removing n_samples

* Added a bugfix to shapr not using the correct number of batches. Maybe not the most elegant solution.

* Updated the demonstration script

* Added last test and fixed lintr

* Lint again.

* styler

* minor edits to tests

* simplifies comment

* comb files ok

* Updated bug in independence approach related to categorical features which caused shapr to crash later. Added comments when I debuged to understand what was going on. I have added some comments about some stuff I did no understand/agree with. Discuss with Martin and correct this before merge.

* Updated bug in independence approach related to categorical features which caused shapr to crash later. Added comments when I debuged to understand what was going on. I have added some comments about some stuff I did no understand/agree with. Discuss with Martin and correct this before merge.

* lint warning

* Lint

* lint

* updated test files after accepting new values

* adjustments to comments and Lars' TODO-comments

* update snapshot file after weight adjustment

* cleaned up doc

* rerun doc

* style

* Changed to `n_batches = 10` in the combined approaches, as the previous value (`n_batches = 1`) is not allowed anymore as it is lower than the number of unique used approaches.

* accept OK test changes

* additonal Ok test files

* change batches in test files

* accept new files

* handle issue with a breaking change update in the testthat package

* + these

* removing last (unused) input of approach

* updating tests

* + update setup tests/snaps

* correcting unique length

* update linting and vignette

* update docs

* fix example issue

* temporary disable tests on older R systems

* remove unecessary if-else test

* data.table style on Lars's batch adjustment suggestion

* del comment

* lint

---------

Co-authored-by: default avatarMartin <jullum@nr.no>
parent a62005f6
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment