Make coding suggestions based on a user's open-ended text input.
Given a text input, find up to num_suggestions possible occupation categories.
get_job_suggestions(
  text,
  suggestion_type = "auxco-1.2.x",
  num_suggestions = 5,
  suggestion_type_options = list(),
  aggregate_score_threshold = 0.02,
  item_score_threshold = 0,
  distinctions = TRUE,
  steps = list(
    simbased_wordwise = list(
      algorithm = algo_similarity_based_reasoning,
      parameters = list(sim_name = "wordwise")
    ),
    simbased_substring = list(
      algorithm = algo_similarity_based_reasoning,
      parameters = list(sim_name = "substring")
    )
  ),
  include_general_id = FALSE
)
text: The raw text input from the user.
suggestion_type: Which type of suggestion to provide. Possible options are "auxco-1.2.x" and "kldb-2010".
num_suggestions: The maximum number of suggestions to show. This is an upper bound and fewer suggestions may be returned. Defaults to 5.
suggestion_type_options: A list with options for generating suggestions. Supported options:
- datasets: Pass specific datasets to be used when adding information to predictions, e.g. to use a specific version of the KldB or auxco. Supported datasets are: "auxco-1.2.x", "kldb-2010". By default, the datasets bundled with this package are used.
aggregate_score_threshold: A single value or named list of thresholds between 0 and 1. If it is a list, each entry should correspond to one of the steps. If it is a single value, it applies to all steps. Results from a step are only returned if the sum of their scores is equal to or greater than the specified threshold. With an aggregate_score_threshold of 0, results are always returned (if there are any).
item_score_threshold: A threshold between 0 and 1 (usually very small, default 0). A result from any step is only returned if its score is greater than the specified threshold. This allows the removal of highly implausible suggestions.
distinctions: Whether or not to add additional distinctions to similar occupational categories to the source code. Defaults to TRUE.
steps: A list with the algorithms to use and their parameters. Each entry of the list should contain a nested list with two entries: algorithm (the algorithm's function itself) and parameters (the parameters to pass on to the algorithm). Each algorithm also always has access to a default set of three parameters:
- text_processed: The input text after preprocessing.
- suggestion_type: Which type of suggestion to output.
- num_suggestions: How many suggestions shall be returned.
These parameters must not be specified manually; they are provided automatically. Defaults to:

  list(
    # try similarity "one word at most 1 letter different" first
    list(
      algorithm = algo_similarity_based_reasoning,
      parameters = list(
        sim_name = "wordwise",
        min_aggregate_prob = 0.535
      )
    ),
    # since everything else failed, try "substring" similarity
    list(
      algorithm = algo_similarity_based_reasoning,
      parameters = list(
        sim_name = "substring",
        min_aggregate_prob = 0.02
      )
    )
  )

include_general_id: Whether a general column called "id" should always be returned. It automatically contains the appropriate id for the chosen suggestion_type, e.g. for "auxco-1.2.x" it contains the same data as the column "auxco_id".
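A minimal usage sketch of the arguments described here. It assumes that the function is exported by a package named occupationMeasurement (the package name does not appear in this section and is an assumption), and that entries of a named aggregate_score_threshold list are matched to the default step names shown in the usage. The wrapper suggest_if_available() is hypothetical and only guards against the package not being installed.

```r
# Hypothetical wrapper: returns NULL when the (assumed) package is missing,
# otherwise forwards the call to get_job_suggestions().
suggest_if_available <- function(text, pkg = "occupationMeasurement") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NULL)
  }
  get_job_suggestions <- getExportedValue(pkg, "get_job_suggestions")

  get_job_suggestions(
    text = text,
    suggestion_type = "auxco-1.2.x",
    num_suggestions = 3,
    # Assumption: one threshold per step, matched by step name.
    aggregate_score_threshold = list(
      simbased_wordwise = 0.535,
      simbased_substring = 0.02
    ),
    # Drop only items whose score is not above 0 (the documented default).
    item_score_threshold = 0,
    # Always include the general "id" column alongside "auxco_id".
    include_general_id = TRUE
  )
}
```

Calling suggest_if_available("Friseur") (an arbitrary example input) would then return at most three suggestions, or NULL if the package is unavailable.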
The procedure implemented here is, roughly speaking, as follows:
1. Predict categories from KldB 2010, including their scores. The first algorithm mentioned in steps is used (default: algo_similarity_based_reasoning with sim_name = "wordwise").
2. Convert the predicted KldB 2010 categories to suggestion_type (default: auxco-1.2.x, an n:m mapping; scores are mapped accordingly). See the corresponding internal function.
3. Remove predicted categories if their score is below item_score_threshold and only keep the num_suggestions top-ranked suggestions.
4. Start anew, trying the next algorithm in steps, if the top-ranked suggestions have a low chance of being correct. (Technically, this happens if the summed score of the num_suggestions top-ranked suggestions is below aggregate_score_threshold.)
5. If suggestion_type == "auxco-1.2.x" and distinctions == TRUE, insert additional and (highly) similar categories or replace existing ones. See internal function add_distinctions_auxco(). Reorder and keep only the num_suggestions top-ranked suggestions. Auxco categories which were added during this step can be identified by their scores: the score equals 0.05 for categories with high similarity and 0.005 for categories with medium similarity.
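The fallback logic of this procedure can be illustrated with a small, self-contained sketch in base R. It is not the package's implementation: make_toy_algorithm() and run_steps() are hypothetical helpers, the toy algorithms return fixed scores instead of predicting KldB 2010 categories, and the conversion and distinction steps are left out. Returning NULL when no step succeeds is also an assumption.

```r
# Toy stand-in for an algorithm: returns a data frame of ids and scores.
# (Hypothetical; the real algorithms predict KldB 2010 categories.)
make_toy_algorithm <- function(ids, scores) {
  function(text_processed, suggestion_type, num_suggestions, ...) {
    data.frame(id = ids, score = scores, stringsAsFactors = FALSE)
  }
}

run_steps <- function(text, steps, num_suggestions = 5,
                      aggregate_score_threshold = 0.02,
                      item_score_threshold = 0) {
  for (step_name in names(steps)) {
    step <- steps[[step_name]]
    # The three default parameters are supplied automatically,
    # followed by the step's own parameters.
    args <- c(
      list(
        text_processed = tolower(trimws(text)),
        suggestion_type = "toy",
        num_suggestions = num_suggestions
      ),
      step$parameters
    )
    result <- do.call(step$algorithm, args)

    # Drop implausible items, then keep the top-ranked ones.
    result <- result[result$score > item_score_threshold, , drop = FALSE]
    result <- result[order(-result$score), , drop = FALSE]
    result <- head(result, num_suggestions)

    # Look up the threshold for this step (single value or named list).
    threshold <- if (is.list(aggregate_score_threshold)) {
      aggregate_score_threshold[[step_name]]
    } else {
      aggregate_score_threshold
    }
    # Accept this step only if the summed score reaches the threshold;
    # otherwise fall through to the next step.
    if (nrow(result) > 0 && sum(result$score) >= threshold) {
      result$step <- step_name
      return(result)
    }
  }
  NULL # assumption: nothing is returned if every step falls short
}

toy_steps <- list(
  first = list(
    algorithm = make_toy_algorithm(c("A", "B"), c(0.20, 0.10)),
    parameters = list()
  ),
  second = list(
    algorithm = make_toy_algorithm(c("C", "D", "E"), c(0.50, 0.30, 0.00)),
    parameters = list()
  )
)

toy_result <- run_steps(
  "  Example  ", toy_steps,
  num_suggestions = 2,
  aggregate_score_threshold = list(first = 0.5, second = 0.02)
)
```

In this toy run the first step is rejected because its summed score stays below its threshold of 0.5, so the suggestions come from the second step, where the zero-score item has already been removed by item_score_threshold.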