Title: | Report on Diversity and Inclusion in a Corporate Setting |
---|---|
Description: | Facilitate the analysis of teams in a corporate setting: assess the diversity per grade and job, present the results, search for bias (in hiring and/or promoting processes). It also provides methods to simulate the effect of bias, random team-data, etc. White paper: 'Philippe J.S. De Brouwer' (2021) <http://www.de-brouwer.com/assets/div/div-white-paper.pdf>. Book (chapter 36): 'Philippe J.S. De Brouwer' (2020, ISBN:978-1-119-63272-6) and 'Philippe J.S. De Brouwer' (2020) <doi:10.1002/9781119632757>. |
Authors: | Philippe J.S. De Brouwer [aut, cre] |
Maintainer: | Philippe J.S. De Brouwer <[email protected]> |
License: | AGPL (>= 3) |
Version: | 0.3.1 |
Built: | 2024-10-31 18:37:06 UTC |
Source: | https://github.com/cran/div |
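This function adds a column that labels each observation according to whether its value in a numeric column falls in the lower or the upper half of the observations (split at the median)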
div_add_median_label( d, colName = "age", value1 = "T", value2 = "F", newColName = "isYoung" )
d |
tibble, a tibble with team data columns as defined in the documentation (at least the column colName (as set by next parameter), 'grade', and 'jobID') |
colName |
the name of the column that contains the numeric values to be split at the median (defaults to 'age') |
value1 |
character, the label to be used for the first half of observations (the smallest ones) |
value2 |
character, the label to be used for the second half of observations (the biggest ones) |
newColName |
character, the name of the new column that will hold the values value1 and value2 |
tibble, the input data with one extra column (named by newColName) that holds the labels value1 and value2
df <- div_add_median_label(div_fake_team())
colnames(df)
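The labelling idea can be illustrated without the package. The following is a minimal base-R sketch, not the package's implementation: the helper name add_median_label_sketch is hypothetical, and sending values equal to the median to the first half is an assumption.

```r
# Hypothetical helper (not part of div): label rows by a median split.
add_median_label_sketch <- function(d, colName = "age", value1 = "T",
                                    value2 = "F", newColName = "isYoung") {
  med <- median(d[[colName]], na.rm = TRUE)
  # Assumption: values equal to the median are assigned to the first half.
  d[[newColName]] <- ifelse(d[[colName]] <= med, value1, value2)
  d
}

team <- data.frame(age = c(25, 31, 38, 44, 52, 60))
labelled <- add_median_label_sketch(team)
labelled$isYoung
```

The new column can then serve as the explaining dimension in a paygap analysis, in the same way as 'gender'.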
Function to calculate the confidence interval for the median
div_ci_median(x, conf = 0.95)
x |
numeric, data from which the median is calculated |
conf |
numeric, the confidence level, expressed as 1 - P(x < x0) |
ci (confidence interval object)
x <- 1:100
div_ci_median(x)
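For readers who want to see how a confidence interval for a median can be built, here is a sketch of one common distribution-free approach based on order statistics and the binomial distribution. This is a swapped-in technique for illustration only: the method used inside div_ci_median() is not described here, and ci_median_sketch is a hypothetical name.

```r
# Hypothetical helper (not part of div): order-statistic CI for the median.
ci_median_sketch <- function(x, conf = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  alpha <- 1 - conf
  # qbinom() returns the smallest q with P(B <= q) >= alpha / 2 for
  # B ~ Binomial(n, 0.5), hence P(B <= q - 1) < alpha / 2.
  # Clamp to 1 so that very small samples fall back to the sample range.
  j <- max(1, qbinom(alpha / 2, size = n, prob = 0.5))
  list(
    lower    = x[j],
    upper    = x[n - j + 1],
    # Actual coverage of [x_(j), x_(n - j + 1)]; may be below conf if j was clamped.
    coverage = 1 - 2 * pbinom(j - 1, size = n, prob = 0.5)
  )
}

ci <- ci_median_sketch(1:100)
ci$lower
ci$upper
ci$coverage
```

Because the interval endpoints are observed values, the achieved coverage is usually somewhat higher than the requested level.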
This function returns a colour (R named colour) based on the confidence level
div_conf_colour(x)
x |
the string associated with the paygap confidence: NA, '', '.', '*', '**', '***' |
string (named colour)
div_conf_colour("*")
This function generates a data frame with data for a team (with salaries, gender, FTE, etc.). This is a good starting point to test the package and to experiment with what level of bias will be visible in the paygap, for example.
div_fake_team(
  seed = 100,
  N = 200,
  genders = c("F", "M", "O"),
  gender_prob = c(0.4, 0.58, 0.02),
  gender_salaryBias = c(1, 1.1, 1),
  jobIDs = c("sales", "analytics"),
  jobID_prob = c(0.6, 0.4),
  citizenships = c("Polish", "German", "Italian", "Indian", "Other"),
  citizenship_prob = c(0.6, 0.2, 0.1, 0.05, 0.05)
)
seed |
numeric, the seed to be used in set.seed() |
N |
numeric, the size of the team to be used (default = 200) |
genders |
character, a vector of the genders to be used |
gender_prob |
numeric, relative probabilities of the different genders to occur (must have the same length as 'genders') |
gender_salaryBias |
numeric, vector with the relative salaries of the different genders (must have the same length as 'genders') |
jobIDs |
character, a vector with the labels of the job categories in the team (they will appear in each grade) |
jobID_prob |
numeric, a vector with the relative sizes of the different jobs in the team (must have the same length as 'jobIDs') |
citizenships |
character, a vector of the citizenships to be generated |
citizenship_prob |
numeric, relative probabilities of the different citizenships to occur (must have the same length as 'citizenships') |
dataframe (employees of the random team)
library(div)
d <- div_fake_team()
head(d)
diversity(table(d$gender))
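To make the roles of genders, gender_prob and gender_salaryBias concrete, the following base-R sketch draws a small random team by hand. It is not the package's generator: the log-normal base salary and its parameters are assumptions chosen only for illustration, and no grades, jobs or citizenships are simulated.

```r
set.seed(100)
N <- 200
genders <- c("F", "M", "O")
gender_prob <- c(0.4, 0.58, 0.02)      # relative probabilities per gender
gender_salaryBias <- c(1, 1.1, 1)      # relative salary per gender

# Draw one gender per employee with the given relative probabilities.
gender <- sample(genders, size = N, replace = TRUE, prob = gender_prob)

# Assumption: a log-normal base salary (not taken from the package).
base_salary <- rlnorm(N, meanlog = log(5000), sdlog = 0.2)

# Apply the multiplicative bias that belongs to each employee's gender.
salary <- base_salary * gender_salaryBias[match(gender, genders)]

team <- data.frame(gender = factor(gender, levels = genders), salary = salary)
table(team$gender)
```

With a bias of 1.1 for "M", median salaries for that group are expected to sit about 10% higher, which is the kind of effect a paygap analysis should be able to pick up.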
This function produces one or more gauge plots coloured in red (R), amber (A) or green (G) for a value between 0 and 1.
div_gauge_plot(df, breaks = c(0, 0.8, 0.95, 1), ncol = NULL, nbrSize = 6)
df |
tibble, a tibble with columns "value" and "label" (value = the values between 0 and 1; label = the text to show, e.g. paste("group", colnames(t))) |
breaks |
numeric vector with the lower limit, the border between green and amber, the border between amber and red, and the upper limit |
ncol |
numeric, the number of columns to produce |
nbrSize |
numeric, the font size for the label |
ggplot object
d <- div_fake_team()
tbl_gender_div <- table(d$gender, d$grade) %>%
  apply(2, diversity, prior = c(50.2, 49.8)) %>%
  tibble(value = ., label = paste("Grade", names(.)))
div_gauge_plot(tbl_gender_div, ncol = 2, nbrSize = 4)
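The breaks argument splits the interval from 0 to 1 into three bands. As a small illustration of how values fall into those bands, the base-R sketch below uses cut() with the default breaks. The neutral band names are hypothetical; which colour the plot assigns to each band is left to the function.

```r
breaks <- c(0, 0.8, 0.95, 1)    # default breaks from the usage above
values <- c(0.50, 0.90, 0.97)   # example diversity values

# Assign each value to one of the three bands defined by the breaks.
band <- cut(values, breaks = breaks,
            labels = c("lowest", "middle", "highest"),
            include.lowest = TRUE)
as.character(band)
```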
This function formats the paygap matrix (created by div_paygap()) and prepares it for printing via the function knitr::kable()
div_parse_paygap(
  pg,
  label = NULL,
  min_nbr_show = NULL,
  max_length_jobID = 12,
  max_length_colnames = 9
)
pg |
paygap object as created by div::div_paygap(). This is an S3 object with a specific structure |
label |
character, the label to be used in the caption of the kable object |
min_nbr_show |
numeric, if provided then only groups that have more than min_nbr_show employees in both categories (selectedValue and others) will be shown |
max_length_jobID |
numeric, if provided the maximal length of the column jobID (in characters) |
max_length_colnames |
numeric, if provided the maximal length of the column names (in characters) |
knitr::kable object (for LaTeX)
d <- div_fake_team()
pg <- div_paygap(d)
div_parse_paygap(pg)
This function calculates the paygap per grade and jobID for a chosen dimension (such as gender), comparing one selected value with all others
div_paygap(d, x = "gender", y = "salary", x_ctrl = "F", ctrl_var = "age")
d |
tibble, a tibble with team data columns as defined in the documentation |
x |
the name of the column that contains the factor object to be used as explaining dimension for the paygap (defaults to 'gender') |
y |
the name of the column that contains the numeric value to be used to calculate the paygap (could be salary or bonus, for example) |
x_ctrl |
the value in the column defined by x that should be isolated (this versus the others), defaults to 'F' |
ctrl_var |
a control variable to be added (shows median per group for that variable) |
dataframe (with columns grade, jobID, salary_x_ctrl, salary_others, n_x_ctrl, n_others, paygap, confidence), where "confidence" is one of the following: NA = not available (numbers are too low), "" = no bias detectable, "." = there might be some bias, but we're not sure, "*" = bias detected with some degree of confidence, "**" = quite sure there is bias, "***" = trust us, this is biased.
df <- div_paygap(div_fake_team())
df
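The structure of the returned data frame can be illustrated with a simplified base-R sketch. This is not the package's algorithm: paygap_sketch is a hypothetical helper, it groups by grade only (no jobID), it omits the confidence column, and defining the paygap as the median of the others divided by the median of the selected group is an assumption.

```r
# Hypothetical helper (not part of div): ratio of median salaries per grade.
paygap_sketch <- function(d, x = "gender", y = "salary", x_ctrl = "F") {
  rows <- lapply(split(d, d$grade), function(g) {
    sel <- g[[y]][g[[x]] == x_ctrl]   # selected value, e.g. "F"
    oth <- g[[y]][g[[x]] != x_ctrl]   # everybody else
    data.frame(
      grade         = g$grade[1],
      salary_x_ctrl = median(sel),
      salary_others = median(oth),
      n_x_ctrl      = length(sel),
      n_others      = length(oth),
      # Assumption: paygap expressed as others relative to the selected group.
      paygap        = median(oth) / median(sel)
    )
  })
  do.call(rbind, rows)
}

team <- data.frame(
  grade  = c(1, 1, 1, 1, 2, 2, 2, 2),
  gender = c("F", "F", "M", "M", "F", "F", "M", "M"),
  salary = c(1000, 1200, 1100, 1320, 2000, 2200, 2100, 2520)
)
pg <- paygap_sketch(team)
pg
```

In this toy team the other employees earn 10% more than the selected group in both grades, so under the assumed definition the paygap is 1.1 in each row; a value of 1 would mean no gap.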
Plots a histogram, a normal distribution with the same standard deviation and mean as well as one with a mean centred around 1
div_plot_paygap_distribution(x, label = "Gender", mu_unbiased = 1)
x |
numeric vector, column of paygap observations |
label |
character, prefix for the title |
mu_unbiased |
numeric, the mean of the unbiased distribution (for paygaps this should be 1) |
ggplot2 object
d <- div_fake_team()
pg <- div_paygap(d)
div_plot_paygap_distribution(pg$data$paygap)
This function rounds all numbers to zero decimals, except the paygap (which is rounded to 2 decimals)
div_round_paygap(x)
x |
paygap object (output of div::div_paygap()) |
the paygap data-frame (tibble only, not the whole paygap object)
d <- div_fake_team()
pg <- div_paygap(d)
div_round_paygap(pg)
This function calculates the entropy of a system with discrete states
diversity(x, prior = NULL)
x |
numeric vector, observed probabilities of the classes |
prior |
numeric vector, the prior probabilities of the classes |
the entropy or diversity measure
x <- c(0.4, 0.6)
diversity(x)
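As an illustration of an entropy-based diversity measure, the sketch below computes the Shannon entropy of the class proportions and divides it by its maximum, so that the result lies between 0 and 1. Whether diversity() uses exactly this normalisation is an assumption, the prior argument is not modelled here, and normalised_entropy is a hypothetical name.

```r
# Hypothetical helper (not part of div): Shannon entropy scaled to [0, 1].
normalised_entropy <- function(x) {
  k <- length(x)
  if (k < 2) return(0)      # a single class carries no diversity
  p <- x / sum(x)           # turn counts or weights into proportions
  p <- p[p > 0]             # treat 0 * log(0) as 0
  -sum(p * log(p)) / log(k) # divide by the maximum entropy log(k)
}

balanced  <- normalised_entropy(c(0.5, 0.5))
one_sided <- normalised_entropy(c(1, 0))
mixed     <- normalised_entropy(c(0.4, 0.6))
c(balanced = balanced, one_sided = one_sided, mixed = mixed)
```

A perfectly balanced team scores 1, a team with a single class present scores 0, and the 40/60 split lands slightly below 1.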
Print the paygap object in the terminal
## S3 method for class 'paygap'
print(x, ...)
x |
paygap object, as created by the function div_paygap() |
... |
arguments passed on to the generic print function: print(x$data) |
text output
library(div)
div_fake_team() %>% div_paygap %>% print
Summarise the paygap object
## S3 method for class 'paygap'
summary(object, ...)
object |
paygap S3 object, as created by the function div_paygap() |
... |
passed on to summary() |
a summary of the paygap object
library(div)
d <- div_fake_team()
pg <- div_paygap(d)
summary(pg)