Make coding suggestions based on a user's open-ended text input.
Source: R/suggestions.R
get_job_suggestions.Rd
Given a text input, find up to num_suggestions possible occupation categories.
Usage
get_job_suggestions(
text,
suggestion_type = "auxco-1.2.x",
num_suggestions = 5,
suggestion_type_options = list(),
aggregate_score_threshold = 0.02,
item_score_threshold = 0,
distinctions = TRUE,
steps = list(simbased_wordwise = list(algorithm = algo_similarity_based_reasoning,
parameters = list(sim_name = "wordwise")), simbased_substring = list(algorithm =
algo_similarity_based_reasoning, parameters = list(sim_name = "substring"))),
include_general_id = FALSE
)
Arguments
- text
The raw text input from the user.
- suggestion_type
Which type of suggestion to use / provide. Possible options are "auxco-1.2.x" and "kldb-2010".
- num_suggestions
The maximum number of suggestions to show. This is an upper bound and fewer suggestions may be returned. Defaults to 5.
- suggestion_type_options
A list with options for generating suggestions. Supported options:
datasets: Pass specific datasets to be used when adding information to predictions, e.g. use a specific version of the KldB or Auxco. Supported datasets are: "auxco-1.2.x", "kldb-2010". By default, the datasets bundled with this package are used.
- aggregate_score_threshold
A single value or named list of thresholds between 0 and 1. If it is a list, each entry should correspond to one of the steps. If it is a single value, it will apply to all steps. Results from a step will only be returned if the sum of their scores is equal to or greater than the specified threshold. With an aggregate_score_threshold of 0, results will always be returned (if there are any).
- item_score_threshold
A threshold between 0 and 1 (usually very small, default 0). Results from any step will only be returned if their score is greater than the specified threshold. Allows the removal of highly implausible suggestions.
- distinctions
Whether or not to add additional distinctions between similar occupational categories to the suggestions. Defaults to TRUE.
- steps
A list with the algorithms to use and their parameters. Each entry of the list should contain a nested list with two entries: algorithm (the algorithm's function itself) and parameters (the parameters to pass on to the algorithm). Each algorithm will also always have access to a default set of three parameters:
text_processed: The input text after preprocessing
suggestion_type: Which type of suggestion to output
num_suggestions: How many suggestions shall be returned
These parameters must not be specified manually and will be provided automatically instead. Defaults to:
list(
  # try similarity "one word at most 1 letter different" first
  list(
    algorithm = algo_similarity_based_reasoning,
    parameters = list(
      sim_name = "wordwise",
      min_aggregate_prob = 0.535
    )
  ),
  # since everything else failed, try "substring" similarity
  list(
    algorithm = algo_similarity_based_reasoning,
    parameters = list(
      sim_name = "substring",
      min_aggregate_prob = 0.02
    )
  )
)
- include_general_id
Whether a general column called "id" should always be returned. This will automatically contain the appropriate id for the chosen suggestion_type, e.g. for "auxco-1.2.x" it will contain the same data as the column "auxco_id".
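The arguments above can be combined as in the following sketch. It assumes that the package providing get_job_suggestions() and algo_similarity_based_reasoning is attached (the call is skipped otherwise), and that the names in a threshold list are matched against the names used in steps. The threshold values are purely illustrative, not recommended settings.

```r
# Per-step thresholds, named after the steps they should apply to.
# Assumption: names are matched against the names of `steps`.
# The numeric values are illustrative only.
thresholds <- list(
  simbased_wordwise = 0.535,
  simbased_substring = 0.02
)

# Guard so the sketch can be sourced even when the package is not attached.
if (exists("get_job_suggestions") && exists("algo_similarity_based_reasoning")) {
  # Same two steps as in the Usage section, written out explicitly.
  steps <- list(
    simbased_wordwise = list(
      algorithm = algo_similarity_based_reasoning,
      parameters = list(sim_name = "wordwise")
    ),
    simbased_substring = list(
      algorithm = algo_similarity_based_reasoning,
      parameters = list(sim_name = "substring")
    )
  )

  suggestions <- get_job_suggestions(
    "Koch",
    suggestion_type = "kldb-2010",
    num_suggestions = 3,
    aggregate_score_threshold = thresholds[names(steps)],
    item_score_threshold = 0.001,
    steps = steps,
    include_general_id = TRUE
  )
}
```

Naming each step makes it possible to set a stricter threshold for the first step and a more lenient one for the fallback, instead of a single value for all steps.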
Details
The procedure implemented here is, roughly speaking, as follows:
1. Predict categories from KldB 2010, including their scores. The first algorithm mentioned in steps is used (default: algo_similarity_based_reasoning()).
2. Convert the predicted KldB 2010 categories to suggestion_type (default: auxco-1.2.x, an n:m mapping; scores are mapped accordingly). See internal function convert_suggestions() for details.
3. Remove predicted categories if their score is below item_score_threshold and only keep the num_suggestions top-ranked suggestions.
4. Start anew, trying the next algorithm in steps, if the top-ranked suggestions have a low chance of being correct. (Technically, this happens if the summed score of the num_suggestions top-ranked suggestions is below aggregate_score_threshold.)
5. If suggestion_type == "auxco-1.2.x" and distinctions == TRUE, insert additional and (highly) similar categories or replace existing ones. See internal function add_distinctions_auxco(). Reorder and keep only the num_suggestions top-ranked suggestions. Auxco categories which were added during this step can be identified by their scores: the score equals 0.05 for categories with high similarity and 0.005 for categories with medium similarity.
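The fallback logic of the procedure above (filter by item score, keep the top-ranked suggestions, move on to the next step if the summed score is too low) can be illustrated with a small, self-contained sketch. This is not the package's implementation: suggest_sketch(), algo_strict() and algo_lenient() are hypothetical names with made-up scores, the preprocessing is a placeholder, and the conversion to suggestion_type and the distinctions step are left out. Returning NULL when no step passes is also an assumption of this sketch.

```r
# Hypothetical toy algorithms: each returns ids with made-up scores.
algo_strict <- function(text_processed, num_suggestions, ...) {
  data.frame(id = c("A", "B"), score = c(0.10, 0.05), stringsAsFactors = FALSE)
}
algo_lenient <- function(text_processed, num_suggestions, ...) {
  data.frame(id = c("C", "D", "E"), score = c(0.40, 0.30, 0.0001),
             stringsAsFactors = FALSE)
}

suggest_sketch <- function(text, steps, num_suggestions = 5,
                           aggregate_score_threshold = 0.02,
                           item_score_threshold = 0) {
  text_processed <- tolower(trimws(text))  # placeholder preprocessing
  for (step in steps) {
    # Default parameters are supplied automatically, step parameters are appended.
    args <- c(
      list(text_processed = text_processed, num_suggestions = num_suggestions),
      step$parameters
    )
    result <- do.call(step$algorithm, args)
    # Remove items whose score is below the item threshold.
    result <- result[result$score >= item_score_threshold, , drop = FALSE]
    # Keep only the num_suggestions top-ranked items.
    result <- result[order(result$score, decreasing = TRUE), , drop = FALSE]
    result <- head(result, num_suggestions)
    # Accept this step only if the summed score reaches the aggregate threshold;
    # otherwise start anew with the next step.
    if (nrow(result) > 0 && sum(result$score) >= aggregate_score_threshold) {
      return(result)
    }
  }
  NULL  # assumption of this sketch: nothing is returned if no step passes
}

toy_steps <- list(
  strict = list(algorithm = algo_strict, parameters = list()),
  lenient = list(algorithm = algo_lenient, parameters = list())
)

out <- suggest_sketch(
  "Koch", toy_steps,
  num_suggestions = 2,
  aggregate_score_threshold = 0.5,
  item_score_threshold = 0.001
)
out$id  # ids "C" and "D": the first step sums to 0.15, so the second step is used
```

With a lower aggregate_score_threshold (for example 0.1), the first step already passes and its results are returned without trying the second step.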
Examples
if (FALSE) { # \dontrun{
if (interactive()) {
get_job_suggestions("Koch")
}
if (interactive()) {
get_job_suggestions("Schlosser")
}
} # }