bgp package

Subpackages

Submodules

bgp.base module

Base objects for symbolic regression.

Contains:
  • Class: SymbolSet

  • Class: CalculatePrecisionSet

  • Class: SymbolTree

  • others

class bgp.base.CalculatePrecisionSet(pset, scoring=None, score_pen=(1,), filter_warning=True, cv=1, cal_dim=True, dim_type=None, fuzzy=False, add_coef=True, inter_add=True, inner_add=False, vector_add=False, out_add=False, flat_add=False, n_jobs=1, batch_size=20, tq=True, details=False, classification=False, score_object='y', batch_para=False)

Bases: bgp.base.SymbolSet

Adds scoring methods to SymbolSet. The object can be built from a prepared SymbolSet object.

Parameters
  • pset (SymbolSet) – SymbolSet.

  • scoring (Callable, default is sklearn.metrics.r2_score) – See also sklearn.metrics.

  • filter_warning (bool) – bool.

  • score_pen (tuple of float) –

    1 : best is positive, worst is -np.inf.

    -1 : best is negative, worst is np.inf.

    0 : best is positive, worst is 0.

  • cal_dim (bool) – calculate the dimension or not; if not, return dless (dimensionless).

  • add_coef (bool) – bool.

  • inter_add (bool) – bool.

  • inner_add (bool) – bool.

  • fuzzy (bool) – fuzzy or not.

  • dim_type (object) – if None, use the y_dim.

  • n_jobs (int) – number of cores to run on.

  • batch_size (int) – batch size; batch_size*n_jobs = inds is advised.

  • tq (bool) – bool.

  • cv (sklearn.model_selection._split._BaseKFold, int) –

    The shuffle of the splitter must be False!

    Use the cv split for scoring and return the mean_test_score.

    Use the cv split for prediction and return the cv_predict_y (not used).

    Notes:

    If cv is used with refit, all the data are refit to determine the coefficients.

    The score re-calculated from the resulting expression is therefore not consistent with this score.

  • details (bool) – return the expr and predicted y or not.

  • classification (bool) – classification or not.

  • score_object – score by y or delta y (for implicit function).
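The batch_size advice above (batch_size*n_jobs = inds) can be turned into a small helper. This is only a sketch, assuming "inds" means the number of individuals scored per generation; suggest_batch_size and split_batches are hypothetical names and not part of bgp.

```python
import math


def suggest_batch_size(n_inds, n_jobs):
    """Pick batch_size so that batch_size * n_jobs covers all individuals."""
    return math.ceil(n_inds / n_jobs)


def split_batches(inds, batch_size):
    """Cut the list of individuals into consecutive batches."""
    return [inds[i:i + batch_size] for i in range(0, len(inds), batch_size)]


inds = list(range(1000))  # stand-ins for 1000 SymbolTree individuals
batch_size = suggest_batch_size(len(inds), n_jobs=10)
batches = split_batches(inds, batch_size)
```

With these numbers each of the 10 workers would receive one batch; other ratios may suit a given machine better.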

calculate_cv_score(ind)

Only used to score a single individual or for checking.

calculate_detail(ind)

Only used to calculate the final best result for display.

Calculates the best expression.

Parameters

ind (SymbolTree) – best expression.

calculate_expr(expr)

Only used to calculate the final result for display.

Parameters

expr (sympy.Expr) –

calculate_score(ind)

Only used to score a single individual, or for checking, with cv=1.

Parameters

ind (SymbolTree) –

calculate_simple(ind)

Only used for re_Tree and for display.

Calculates the best expression.

Parameters

ind (SymbolTree) –

compile_context(ind)

transform SymbolTree to sympy.Expr.

hasher

alias of str

parallelize_calculate_expr(exprs)

Only used for final results; calculates exprs.

parallelize_score(inds)

The main score in each generation of GP!

Parameters

inds (list of SymbolTree) – list of expressions

parallelize_try_add_coef_times(exprs, grid_x=None, resample_number=500)

to be continued

try_add_coef_times(expr, grid_x=None)

Only used for the best result; tries to add coefficients to expr.

update(pset)

Update self from the input pset.

update_with_X_y(X, y)

replace x, y data.

class bgp.base.ShortStr(st)

Bases: object

Short version of a tree; only the name is kept, to simplify storage and transmission.

class bgp.base.SymbolPrimitive(name, arity)

Bases: object

General operator type, do not use directly, but use SymbolPrimitiveDetail.

Parameters
  • name (str) – function name.

  • arity (int) – number of input parameters of the function, e.g. 2 for +, 1 for ln.

format_repr(*args)
format_str(*args)
class bgp.base.SymbolPrimitiveDetail(name, arity, func, prob, np_func=None, dim_func=None, sym_func=None)

Bases: bgp.base.SymbolPrimitive

General operator type with more details.

Parameters
  • func (Callable) –

    The function; a sympy.Function type is preferred.

    For maintainers:

    If the function is self-defined and cannot be simplified to a sympy.Function or an elementary function, the corresponding functions for function.np_map() and dim.dim_map() should be defined.

  • name (str) – function name.

  • arity (int) – function input numbers.

  • prob (float) – default 1.

capsule()

return short one.

class bgp.base.SymbolSet(name='PSet')

Bases: object

Define the preparation set of operations, features, and fixed constants.

Parameters

name (str) – name.

add_accumulative_operation(categories=None, categories_prob='balance', self_categories=None, special_prob=None)

add accumulative operation.

Parameters
  • categories (tuple of str) – categories=(“Self”, “MAdd”, “MSub”, “MMul”, “MDiv”)

  • categories_prob (None, "balance" or float.) –

    probability of categories in (0, 1], except ("Self", "MAdd", "MSub", "MMul", "MDiv");

    "balance" is 1/n_categories.

    "MSub", "MMul" and "MDiv" only work when the group size is 2; otherwise they behave like "Self".

    Notes:

    ("Self", "MAdd", "MSub", "MMul", "MDiv") are set to 1 as the standard.

  • self_categories (list of dict, None) –

    The dict can be generated by newfuncD or defined by yourself.

    The dict should contain at least:

    {"func": func, "name": name, "np_func": npf, "dim_func": dimf, "sym_func": gsymf}

    1.func:sympy.Function(name) object, which needs the added attributes is_jump and keep.

    2.name:name

    3.np_func:numpy function

    4.dim_func:dimension function

    5.sym_func:NewArray function. (unpack the group, used just for shown)

    See Also bgp.newfunc.newfuncV

  • special_prob (None or dict) –

    Examples:

    {“MAdd”:0.5, “Self”:0.5}

add_constants(c, c_dim=1, c_prob=None)

Add constants with dimension and probability.

Parameters
  • c_dim (1, list of Dim) – the same size as c.

  • c (float, list) – list of float.

  • c_prob (None, float, list of float) – the same size as c.

add_features(X, y, x_dim=1, y_dim=1, x_prob=None, x_group=None, feature_name=None)

Add features with dimension and probability.

Parameters
  • X (np.ndarray) – 2D data.

  • y (np.ndarray) – 1D data.

  • feature_name (None, list of str) – the same size as X.shape[1].

  • x_dim (1 or list of Dim) – the same size as X.shape[1]; the default 1 means dless for all x.

  • y_dim (1, Dim) – dim of y.

  • x_prob (None, list of float) – the same size as X.shape[1].

  • x_group (None or list of list, int) – features group.

add_features_and_constants(X, y, c=None, x_dim=1, y_dim=1, c_dim=1, x_prob=None, c_prob=None, x_group=None, feature_name=None)

Combination of add_constants and add_features.

add_operations(power_categories=None, categories=None, self_categories=None, power_categories_prob='balance', categories_prob='balance', special_prob=None)

Add operations with probability.

Parameters
  • power_categories (Sized, tuple, None) –

    Examples:

    (0.5, 2, 3)

  • categories (tuple of str) –

    map table:

    {"Add": sympy.Add, "Sub": Sub, "Mul": sympy.Mul, "Div": Div,
    "sin": sympy.sin, "cos": sympy.cos, "exp": sympy.exp, "ln": sympy.ln,
    "Abs": sympy.Abs, "Neg": functools.partial(sympy.Mul, -1.0),
    "Rec": functools.partial(sympy.Pow, e=-1.0)}

    Others:

    "Rem": f(x)=1-x, if x true

    "Self": f(x)=x, if x true

  • categories_prob ("balance", float) – probability of categories, except (+, -, /), in (0, 1]. “balance” is 1/n_categories. The (+, -, /) are set as 1 to be a standard.

  • special_prob (None, dict) –

    prob for special name.

    Examples:{“Mul”:0.6, “Add”:0.4, “exp”:0.1}

  • power_categories_prob ("balance", float) – float in (0, 1]. probability of power categories, “balance” is 1/power_categories_prob.

  • self_categories (list of dict, None) –

    The dict can be generated by newfuncV or defined by yourself.

    the function at least containing: {“func”: func, “name”: name, “arity”:2, “np_func”: npf, “dim_func”: dimf, “sym_func”: gsymf}

    1.func:sympy.Function(name) object

    2.name:name

    3.arity:int, the number of parameter

    4.np_func:numpy function

    5.dim_func:dimension function

    6.sym_func:NewArray function. (unpack the group, used just for shown)

    See Also bgp.newfunc.newfuncV
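The interplay of categories_prob, "balance" and special_prob described above can be illustrated as follows. This is a sketch under stated assumptions, not bgp's implementation: resolve_probabilities is a hypothetical helper, the standard set is taken literally from the text as ("Add", "Sub", "Div"), and n_categories is assumed to be the number of listed categories.

```python
def resolve_probabilities(categories, categories_prob="balance",
                          special_prob=None, standard=("Add", "Sub", "Div")):
    """Assign a selection probability to every category name."""
    if categories_prob == "balance":
        base = 1 / len(categories)          # assumed meaning of 1/n_categories
    else:
        if not 0 < categories_prob <= 1:
            raise ValueError("categories_prob must be in (0, 1]")
        base = categories_prob
    # Standard operators keep probability 1; all others get the base value.
    probs = {name: (1.0 if name in standard else base) for name in categories}
    # special_prob overrides individual names.
    for name, prob in (special_prob or {}).items():
        probs[name] = prob
    return probs


categories = ("Add", "Sub", "Mul", "Div", "exp", "ln")
probs = resolve_probabilities(categories, "balance", special_prob={"exp": 0.1})
```

Here "Add", "Sub" and "Div" stay at 1, "Mul" and "ln" receive 1/6, and "exp" is overridden to 0.1.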

add_tree_to_features(Tree, prob=0.3)

Add the individual as a new feature to the initial features. Success is not guaranteed, because the value and name are checked and must differ from the existing ones.

Parameters
  • Tree (SymbolTree) – individual or expression

  • prob (float) – probability of this individual

bonding_personal_maps(pers)

Add personal preferences to premap; more control is available through pset.premap.

Bond the points with a ratio; the other bonds are penalized.

For example, after setting [1, 2, 0.9], the other bonds such as (1, 2), (1, 3), (1, 4),…,(2, 3), (2, 4)… get a small prob.

Parameters

pers (list of list) –

Examples:

[[index1, index2, prob][…]]; the prob is in [0, 1), 1 means forced binding.

property data_x
property dim_ter_con_list
property dispose

accumulate operators

property free_symbol
static get_values(v, mean=False)

get list of dict values

property init_free_symbol
property primitives

operators

property prob_dispose_list
property prob_pri_list
property prob_ter_con_list
register(primitives_dict='all', dispose_dict='all', ter_con_dict='all')

Register and capsule for simplify.

Parameters
  • primitives_dict (None, str, dict) –

  • dispose_dict (None, str, dict) –

  • ter_con_dict (None, str, dict) –

replace(X, y=None, tree_X=None)
set_personal_maps(pers)

Add personal preferences to premap; more control is available through pset.premap.

Only set the given couples of points and do not change the others.

Parameters

pers (list of list) –

Examples:

[[index1, index2, prob]], the prob in [0, 1).

property terminalRatio

Return the ratio of the number of terminals to the number of all kinds of primitives.

property terminals_and_constants
property terminals_and_constants_repr
property types
class bgp.base.SymbolTerminal(name, init_name=None)

Bases: object

General feature type, do not use directly.

The name for show (str) and calculation (repr) are set to different string for avoiding repeated calculations.

Parameters
  • name (str) – Represent name. Default “xi”.

  • init_name (str) –

    Just for show, rather than calculate.

    Examples:

    init_name=[x1, x2]; compact features need [].

    init_name=(x1*x4-x3); an expr needs ().

format_repr()

representing name

format_str()

represented name
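The split between a display name and a calculation name can be sketched with a tiny class. This is an illustration only: the Terminal class below is hypothetical, and it assumes that format_str returns init_name (for display) while format_repr returns name (for calculation), which is how the description above reads.

```python
class Terminal:
    """Minimal stand-in for a feature node with two names."""

    def __init__(self, name, init_name=None):
        self.name = name
        # Fall back to the calculation name when no display name is given.
        self.init_name = init_name if init_name is not None else name

    def format_str(self):
        return self.init_name   # shown to the user

    def format_repr(self):
        return self.name        # used as the key during calculation

    __str__ = format_str
    __repr__ = format_repr


combined = Terminal("x5", init_name="(x1*x4-x3)")
plain = Terminal("x0")
```

Keeping a short calculation name such as "x5" would let a composite feature be cached under one key while still printing its full expression.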

class bgp.base.SymbolTerminalDetail(values, name, dim=None, prob=None, init_sym=None, init_name=None)

Bases: bgp.base.SymbolTerminal

General feature type.

The name for show (str) and calculation (repr) are set to different string for avoiding repeated calculations.

Parameters
  • values (None, number or np.ndarray) – xi value, the shape can be (n, ) or (n_x, n), n is number of samples, n_x is numbers of feature.

  • name (str) – Represent name. Default “xi”

  • dim (bgp.dim.Dim or None) – None.

  • prob (float or None) – None.

  • init_sym (list, sympy.Expr) – list.

  • init_name (str or None) –

    Just for show, rather than calculate.

    Examples:

    init_name="[x1, x2]"; compact features need [].

    init_name="(x1*x4-x3)"; an expr needs ().

capsule()
class bgp.base.SymbolTree(*arg, **kwargs)

Bases: bgp.base._ExprTree

Individual tree; each tree is one expression. A SymbolTree is only generated by the genGrow and genFull methods.

property capsule

return the short one

compress()

Drop unnecessary attributes.

depart()

Take the expression apart.

classmethod genFull(pset, min_, max_, per=False)

details in genGrow function

classmethod genGrow(pset, min_, max_, per=False)

details in genGrow function

ppprint(pset, feature_name=False)

get a user friendly version

reset()

Keep these attributes refreshed.

ter_site()

site for feature and constants node

terminals()

Return terminals that occur in the expression tree.

to_expr(pset)

transformed to sympy.Expr

bgp.flow module

Loop definitions for the genetic algorithm. All loops share the same run method.

Contains:

-Class: BaseLoop

one node mate and one tree mutate.

-Class: MultiMutateLoop

one node mate and (one tree mutate, one node Replacement mutate, shrink mutate, difference mutate).

-Class: OnePointMutateLoop

one node Replacement mutate: (keep height of tree)

-Class: DimForceLoop

Select with dimension : (keep dimension of tree)

class bgp.flow.BaseLoop(pset, pop=500, gen=20, mutate_prob=0.5, mate_prob=0.8, hall=1, re_hall=1, re_Tree=None, initial_min=None, initial_max=3, max_value=5, scoring=(<function r2_score>, ), score_pen=(1, ), filter_warning=True, cv=1, add_coef=True, inter_add=True, inner_add=False, vector_add=False, out_add=False, flat_add=False, cal_dim=False, dim_type=None, fuzzy=False, n_jobs=1, batch_size=40, random_state=None, stats=None, verbose=True, migrate_prob=0, tq=True, store=False, personal_map=False, stop_condition=None, details=False, classification=False, score_object='y', sub_mu_max=1, limit_type='h_bgp', batch_para=False)

Bases: mgetool.packbox.Toolbox

Base loop for BGP.

Examples:

from bgp.base import SymbolSet
from bgp.flow import BaseLoop

if __name__ == "__main__":
    # Features, target and operations should be added to pset before running.
    pset = SymbolSet()
    # Stop once the first fitness value of the best individual reaches the threshold.
    stop = lambda ind: ind.fitness.values[0] >= 0.880963

    bl = BaseLoop(pset=pset, gen=10, pop=1000, hall=1, batch_size=40, re_hall=3,
                  n_jobs=12, mate_prob=0.9, max_value=5, initial_min=1, initial_max=2,
                  mutate_prob=0.8, tq=True, dim_type="coef", stop_condition=stop,
                  re_Tree=0, store=False, random_state=1, verbose=True,
                  stats={"fitness_dim_max": ["max"], "dim_is_target": ["sum"]},
                  add_coef=True, inter_add=True, inner_add=False, cal_dim=True, vector_add=False,
                  personal_map=False)

    bl.run()
Parameters
  • pset (SymbolSet) – the feature x and target y and others should have been added.

  • pop (int) – number of population.

  • gen (int) – number of generation.

  • mutate_prob (float) – probability of mutate.

  • mate_prob (float) – probability of mate(crossover).

  • initial_max (int) – max initial size of expression when first producing.

  • initial_min (None,int) – min initial size of expression when first producing.

  • max_value (int) – max size of expression.

  • limit_type ("height", "length" or "h_bgp") – limitation type for max_value; it does not affect initial_max or initial_min.

  • hall (int,>=1) – number of HallOfFame (elite) to maintain.

  • re_hall (None or int>=2) – number of HallOfFame individuals to add to the next generation. Notes: only valid when hall is used.

  • re_Tree (int) – number of new features to add to the next generation; 0 means none are added.

  • personal_map (bool or "auto") –

    “auto” is using ‘premap’ and with auto refresh the ‘premap’ with individual.

    True is just using constant ‘premap’.

    False is just use the prob of terminals.

  • scoring (list of Callable, default is [sklearn.metrics.r2_score,]) – See Also sklearn.metrics

  • score_pen (tuple of 1, -1 or nonzero float) –

    >0 : max problem, best is positive, worst is -np.inf. <0 : min problem, best is negative, worst is np.inf.

    Notes:

    If multiple scoring methods are used, the scores must be brought to the same scale in preprocessing or weighted by score_pen, because all selection is based on mean(w_i*score_i).

    Examples:

    scoring = [r2_score,]
    score_pen= [1,]
    

  • cv (sklearn.model_selection._split._BaseKFold,int) – the shuffler must be False, default=1 means no cv.

  • filter_warning (bool) – filter warning or not.

  • add_coef (bool) – add coef in expression or not.

  • inter_add (bool) – add intercept constant or not.

  • inner_add (bool) – add inner coefficients or not.

  • out_add (bool) – add out coefficients or not.

  • flat_add (bool) – add flat coefficients or not.

  • n_jobs (int) – default 1, advise 6.

  • batch_size (int) – default 40, depends on the machine.

  • random_state (int) – None,int.

  • cal_dim (bool) – escape the dim calculation.

  • dim_type (Dim or None or list of Dim) –

    “coef”: af(x)+b. a,b have dimension,f(x)’s dimension is not dnan.

    ”integer”: af(x)+b. f(x) is with integer dimension.

    [Dim1,Dim2]: f(x)’s dimension in list.

    Dim: f(x) ~= Dim. (see fuzzy)

    Dim: f(x) == Dim.

    None: f(x) == pset.y_dim

  • fuzzy (bool) – choose the dim with same base with dim_type, such as m,m^2,m^3.

  • stats (dict) –

    details of logbook to show.

    Map:

    values = {"max": np.max, "mean": np.mean, "min": np.mean, "std": np.std, "sum": np.sum}

    keys = {

    "fitness": just see fitness[0],

    "fitness_dim_max": max problem, see fitness with demand dim,

    "fitness_dim_min": min problem, see fitness with demand dim,

    "dim_is_target": demand dim,

    "coef": dim is True, coef have dim,

    "integer": dim is integer,

    … }

    If stats is None, the default is:

    for cal_dim=True:

    stats = {"fitness_dim_max": ("max",), "dim_is_target": ("sum",)}

    for cal_dim=False:

    stats = {"fitness": ("max",)}

    For a self-defined entry, the key is a function that gets an attribute of each ind.

    Examples:

    def func(ind):
        return ind.fitness[0]
    stats = {func: ("mean",), "dim_is_target": ("sum",)}
    

  • verbose (bool) – print verbose logbook or not.

  • tq (bool) – print progress bar or not.

  • store (bool or path) – bool or path.

  • stop_condition (callable) –

    Stop condition evaluated on the best ind of the hall; it returns a bool, and True means the loop stops.

    Examples:

    def func(ind):
        c = ind.fitness.values[0]>=0.90
        return c
    

  • details (bool) – return expr and predict_y or not.

  • classification (bool) – classification or not.

  • score_object – score by y or delta y (for implicit function).
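The score_pen note above says that selection is based on mean(w_i*score_i). The following sketch shows that arithmetic in isolation; combined_score is a hypothetical helper, the score values are made up for illustration, and bgp's internal handling may differ.

```python
def combined_score(scores, score_pen):
    """Mean of w_i * score_i over all scoring functions."""
    if len(scores) != len(score_pen):
        raise ValueError("scores and score_pen must have the same length")
    if any(w == 0 for w in score_pen):
        raise ValueError("score_pen entries must be nonzero")
    return sum(w * s for w, s in zip(score_pen, scores)) / len(scores)


# One score to maximise (weight 1) and one error to minimise (weight -1).
value = combined_score(scores=[0.75, 0.25], score_pen=[1, -1])
```

A negative weight flips a minimisation metric so that a larger combined value is always better, which is why the scores should be on comparable scales before they are averaged.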

check_height_length(pop, site='')
maintain_halls(population)

maintain the best p expression

re_add()

add the expression as a feature

re_fresh_by_name(*arr)
run(warm_start=False, new_gen=None)
Parameters
  • warm_start (bool) – warm_start from last result.

  • new_gen – new generations for warm_start, default is the initial generations.

to_csv(data_all)

store to csv

top_n(n=10, gen=-1, key='value', filter_dim=True, ascending=False)

Return the best n results.

Note

Only valid in store=True.

Parameters
  • n (int) –

  • gen – the generation, default is -1.

  • key (str) – sort key, default is "value".

  • filter_dim – filter no-dim expressions or not.

  • ascending – reverse.

Returns

  • top n results.

  • pd.DataFrame

varAnd(*arg, **kwargs)
class bgp.flow.DimForceLoop(*args, **kwargs)

Bases: bgp.flow.MultiMutateLoop

Force select the individual with target dim for next generation

See also BaseLoop

class bgp.flow.MultiMutateLoop(*args, **kwargs)

Bases: bgp.flow.BaseLoop

Multiple mutation methods.

See also BaseLoop

varAnd(population, toolbox, cxpb, mutpb)
class bgp.flow.OnePointMutateLoop(*args, **kwargs)

Bases: bgp.flow.BaseLoop

Limits the height of the population; only uses the mutNodeReplacementVerbose method.

See also BaseLoop

varAnd(population, toolbox, cxpb, mutpb)

bgp.gp module

Notes

This part is a copy of deap, with random replaced by numpy.random.

bgp.gp.Statis_func(stats=None)
bgp.gp.checks_number(func)
bgp.gp.checkss(func)
bgp.gp.cxOnePoint(ind10, ind20)

Randomly select crossover point in each individual and exchange each subtree with the point as root between each individual.

Parameters
  • ind10 – First tree participating in the crossover.

  • ind20 – Second tree participating in the crossover.

Returns

A tuple of two trees.
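One-point subtree crossover can be sketched on trees stored as prefix lists of (name, arity) tokens. This is an illustration only: the data layout, cx_one_point, subtree_slice and is_valid are hypothetical, the standard library random module stands in for numpy.random, and the root is excluded from the crossover points as an assumption.

```python
import random


def subtree_slice(tree, begin):
    """Slice covering the subtree rooted at index begin (prefix order)."""
    end = begin + 1
    pending = tree[begin][1]          # children still to be consumed
    while pending > 0:
        pending += tree[end][1] - 1
        end += 1
    return slice(begin, end)


def cx_one_point(t1, t2, rng):
    """Swap one randomly chosen subtree between two trees."""
    if len(t1) < 2 or len(t2) < 2:
        return t1, t2
    s1 = subtree_slice(t1, rng.randrange(1, len(t1)))
    s2 = subtree_slice(t2, rng.randrange(1, len(t2)))
    child1 = t1[:s1.start] + t2[s2] + t1[s1.stop:]
    child2 = t2[:s2.start] + t1[s1] + t2[s2.stop:]
    return child1, child2


def is_valid(tree):
    """Check that the token list encodes exactly one complete tree."""
    need = 1
    for _, arity in tree:
        if need == 0:
            return False
        need += arity - 1
    return need == 0


t1 = [("Add", 2), ("x0", 0), ("Mul", 2), ("x1", 0), ("c0", 0)]   # x0 + x1*c0
t2 = [("ln", 1), ("Sub", 2), ("x2", 0), ("x0", 0)]               # ln(x2 - x0)
child1, child2 = cx_one_point(t1, t2, random.Random(1))
```

Because whole subtrees are exchanged, both children remain well-formed expressions and the total number of nodes is conserved.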

bgp.gp.depart(individual)

Take the expression apart.

bgp.gp.genFull(pset, min_, max_, personal_map=False)

Generate an expression where each leaf has the same depth between min and max.

Parameters
  • pset – Primitive set from which primitives are selected.

  • min_ – Minimum height of the produced trees.

  • max_ – Maximum height of the produced trees.

  • personal_map

Returns

A full tree with all leaves at the same depth.

bgp.gp.genGrow(pset, min_, max_, personal_map=False)

Generate an expression where each leaf might have a different depth between min and max.

Parameters
  • pset – Primitive set from which primitives are selected.

  • min_ – Minimum height of the produced trees.

  • max_ – Maximum height of the produced trees.

  • personal_map – bool.

Returns

A grown tree with leaves at possibly different depths.

bgp.gp.genHalf(pset, min_, max_, personal_map=False)
bgp.gp.generate(pset, min_, max_, condition, personal_map=False, *kwargs)

generate expression.

Parameters
  • pset (SymbolSet) – pset

  • min_ (int) – Minimum height of the produced trees.

  • max_ (int) – Maximum height of the produced trees.

  • condition (collections.Callable) – The condition is a function that takes two arguments, the height of the tree to build and the current depth in the tree.

  • kwargs (None) – placeholder for future

  • personal_map (bool) – premap
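The role of the condition callback, and how the full and grow strategies differ, can be sketched as follows. The code is a simplified illustration and not bgp's implementation: the primitive and terminal tables, the terminal_ratio value and all function names are hypothetical, and the standard library random module stands in for numpy.random.

```python
import random

PRIMITIVES = [("Add", 2), ("Mul", 2), ("ln", 1)]   # (name, arity)
TERMINALS = [("x0", 0), ("x1", 0), ("c0", 0)]


def generate(min_, max_, condition, rng):
    """Build a prefix-ordered expression; condition decides where leaves go."""
    height = rng.randint(min_, max_)
    expr, stack = [], [0]                 # stack holds depths of open slots
    while stack:
        depth = stack.pop()
        if condition(height, depth):
            expr.append(rng.choice(TERMINALS))
        else:
            prim = rng.choice(PRIMITIVES)
            expr.append(prim)
            stack.extend([depth + 1] * prim[1])
    return expr


def gen_full(min_, max_, rng):
    """Every leaf sits at the same depth."""
    return generate(min_, max_, lambda height, depth: depth == height, rng)


def gen_grow(min_, max_, rng, terminal_ratio=0.5):
    """Leaves may appear at any depth from min_ up to the chosen height."""
    def condition(height, depth):
        return depth == height or (depth >= min_ and rng.random() < terminal_ratio)
    return generate(min_, max_, condition, rng)


def leaf_depths(expr):
    """Depth of every terminal in a prefix-ordered expression."""
    depths, stack = [], [0]
    for _, arity in expr:
        depth = stack.pop()
        if arity == 0:
            depths.append(depth)
        else:
            stack.extend([depth + 1] * arity)
    return depths


full_tree = gen_full(2, 2, random.Random(0))
grown_tree = gen_grow(1, 3, random.Random(0))
```

The only difference between the two strategies is the condition: the full variant places terminals exactly at the chosen height, while the grow variant may stop a branch early once min_ is reached.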

bgp.gp.mutDifferentReplacementVerbose(individual, pset, personal_map=False)

Choice over terminals_and_constants (verbose). Replaces a randomly chosen primitive from individual by a randomly chosen primitive with the same number of arguments from the pset attribute of the individual. Decreases the probability of choosing the same terminals.

Parameters
  • individual – The normal or typed tree to be mutated.

  • pset – SymbolSet

  • personal_map – bool

Returns

A tuple of one tree.

bgp.gp.mutNodeReplacementVerbose(individual, pset, personal_map=False)

Choice over terminals_and_constants (verbose). Replaces a randomly chosen primitive from individual by a randomly chosen primitive with the same number of arguments from the pset attribute of the individual.

Parameters
  • individual – The normal or typed tree to be mutated.

  • pset – SymbolSet

  • personal_map – bool

Returns

A tuple of one tree.

bgp.gp.mutShrink(individual, pset=None)

This operator shrinks the individual by choosing randomly a branch and replacing it with one of the branch’s arguments (also randomly chosen).

Parameters
  • individual – The tree to be shrunk.

  • pset – SymbolSet.

Returns

A tuple of one tree.
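The shrink operation can be sketched on a prefix list of (name, arity) tokens. This is an illustration under assumptions: the data layout and the names mut_shrink and subtree_end are hypothetical, the root is allowed as a shrink point here, and the standard library random module stands in for numpy.random.

```python
import random


def subtree_end(tree, begin):
    """Index just past the subtree rooted at begin (prefix order)."""
    end = begin + 1
    pending = tree[begin][1]
    while pending > 0:
        pending += tree[end][1] - 1
        end += 1
    return end


def mut_shrink(tree, rng):
    """Replace a random branch by one of its own arguments."""
    branches = [i for i, (_, arity) in enumerate(tree) if arity > 0]
    if not branches:
        return tree                      # a single terminal cannot shrink
    index = rng.choice(branches)
    # Walk to the start of a randomly chosen argument of the branch.
    start = index + 1
    for _ in range(rng.randrange(tree[index][1])):
        start = subtree_end(tree, start)
    argument = tree[start:subtree_end(tree, start)]
    return tree[:index] + argument + tree[subtree_end(tree, index):]


tree = [("Add", 2), ("x0", 0), ("Mul", 2), ("x1", 0), ("c0", 0)]  # x0 + x1*c0
shrunk = mut_shrink(tree, random.Random(0))
```

Since the replacement is always a proper part of the removed branch, the result is strictly smaller than the input whenever the tree contains at least one operator.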

bgp.gp.mutUniform(individual, expr, pset)

Randomly select a point in the tree individual, then replace the subtree at that point as a root by the expression generated using method expr().

Parameters
  • individual – The tree to be mutated.

  • expr – A function object that can generate an expression when called.

  • pset – SymbolSet

Returns

A tuple of one tree.

bgp.gp.selBest(individuals, k, fit_attr='fitness')

Select the k best individuals among the input individuals. The list returned contains references to the input individuals.

Parameters
  • individuals – A list of individuals to select from.

  • k – The number of individuals to select.

  • fit_attr – The attribute of individuals to use as selection criterion

Returns

A list containing the k best individuals.

bgp.gp.selKbestDim(pop, K_best=10, dim_type=None, fuzzy=False, fit_attr='fitness', force_number=False)

Select the individual with dim limitation.

Parameters
  • pop (SymbolTree) – A list of individuals to select from.

  • K_best (int) – The number of individuals to select.

  • dim_type (Dim) –

  • fuzzy (bool) – the dim or the dim with same base. such as m,m^2,m^3

  • fit_attr (str) – The attribute of individuals to use as selection criterion, default attr is “fitness”.

  • force_number (bool) – return exactly K_best individuals; default False.

Returns

Return type

A list of selected individuals.

bgp.gp.selRandom(individuals, k)

Select k individuals at random from the input individuals with replacement. The list returned contains references to the input individuals.

Parameters
  • individuals – A list of individuals to select from.

  • k – The number of individuals to select.

Returns

A list of selected individuals.

This function uses the numpy.random.choice() function

bgp.gp.selTournament(individuals, k, tournsize, fit_attr='fitness')

Select the best individual among tournsize randomly chosen individuals, k times. The list returned contains references to the input individuals.

Parameters
  • individuals – A list of individuals to select from.

  • k – The number of individuals to select.

  • tournsize – The number of individuals participating in each tournament.

  • fit_attr – The attribute of individuals to use as selection criterion

Returns

A list of selected individuals.

This function uses the numpy.random.choice() function
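Tournament selection as described above can be sketched in a few lines. This is an illustration, not bgp's code: it uses the standard library random module in place of numpy.random.choice, and the Ind class with a plain float fitness is a hypothetical stand-in for real individuals.

```python
import random
from operator import attrgetter


class Ind:
    """Hypothetical individual with a comparable fitness attribute."""

    def __init__(self, name, fitness):
        self.name = name
        self.fitness = fitness


def sel_tournament(individuals, k, tournsize, fit_attr="fitness", rng=random):
    """Run k tournaments of tournsize random entrants and keep each winner."""
    chosen = []
    for _ in range(k):
        aspirants = [rng.choice(individuals) for _ in range(tournsize)]
        chosen.append(max(aspirants, key=attrgetter(fit_attr)))
    return chosen


population = [Ind("a", 0.2), Ind("b", 0.9), Ind("c", 0.5), Ind("d", 0.7)]
winners = sel_tournament(population, k=5, tournsize=3, rng=random.Random(0))
```

The returned list holds references to the original individuals, and the same individual may win several tournaments; a larger tournsize increases the selection pressure.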

bgp.gp.staticLimit(key, max_value)
bgp.gp.varAnd(population, toolbox, cxpb, mutpb)
bgp.gp.varAndfus(population, toolbox, cxpb, mutpb, fus, mutpb_list=1.0)
Parameters
  • population

  • toolbox

  • cxpb

  • mutpb

  • fus

  • mutpb_list (float,list,None) –

bgp.postprocess module

bgp.postprocess.acf(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, built_format_input=False)

Add coef fitting.

Try to calculate the predicted y from a sympy expression with coefficients. If an error is raised, return the expr itself.

Parameters
  • expr01 (sympy.Expr) – expr for fitting.

  • x (list of np.ndarray or np.ndarray) – real data with: [x1,x2,x3,…,x_n_feature].

  • y (np.ndarray with shape (n_sample,)) – real data of target.

  • init_c (list of float or float,None) – default 1.

  • terminals (List of sympy.Symbol,None) – placeholder for xi, with the same features in expr01.

  • c_terminals (List of sympy.Symbol,None) – placeholder for ci, with the same coefficients/constants in expr01.

  • np_maps (dict,default is None) –

    For self-defined functions:

    1. Make your function with sympy.Function and arrange it in expr01.

    >>> x1, x2, x3, c1, c2, c3, c4 = sympy.symbols("x1,x2,x3,c1,c2,c3,c4")
    >>> Seg = sympy.Function("Seg")
    >>> expr01 = Seg(x1*x2)

    2. Write the numpy calculation method for this function.

    >>> def np_seg(x):
    ...     res = x
    ...     res[res>1] = -res[res>1]
    ...     return res

    3. Pass the np_maps parameter.

    >>> np_maps = {"Seg": np_seg}

    In summary, when expr01 is parsed, the numpy function is looked up in this order: (np_maps -> numpy's function -> system -> Error)

  • classification (bool) – classification or not, default False.

  • built_format_input (bool) – use format_input function to check input parameters. Just used for temporary test or single case, due to format_input is repetitive.

Returns

  • pre_y – np.array or None

  • expr01 (Expr) – New expr.

bgp.postprocess.acfng(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, no_gradient_coef=-1, no_gradient_coef_range=array([-1, 0]), n_jobs=1, scoring='r2')

Add coefficients with no gradient coefficient.

Try to calculate the predicted y from a sympy expression with coefficients. If an error is raised, return the expr itself.

Parameters
  • scoring (str) – score in sklearn.metrics

  • n_jobs (int) – number of parallel jobs.

  • no_gradient_coef (int,sympy.Symbol) – coefficient in the no-gradient function, default is the last one. Examples: no_gradient_coef=sympy.Symbol("c2"), no_gradient_coef=0

  • no_gradient_coef_range – range of the special coef.

  • expr01 (sympy.Expr) – expr for fitting.

  • x (list of np.ndarray or np.ndarray) – real data with: [x1,x2,x3,…,x_n_feature].

  • y (np.ndarray with shape (n_sample,)) – real data of target.

  • init_c (list of float or float,None) – default 1.

  • terminals (List of sympy.Symbol,None) – placeholder for xi, with the same features in expr01.

  • c_terminals (List of sympy.Symbol,None) – placeholder for ci, with the same coefficients/constants in expr01.

  • np_maps (dict,default is None) –

    For self-defined functions:

    1. Make your function with sympy.Function and arrange it in expr01.

    >>> x1, x2, x3, c1, c2, c3, c4 = sympy.symbols("x1,x2,x3,c1,c2,c3,c4")
    >>> Seg = sympy.Function("Seg")
    >>> expr01 = Seg(x1*x2)

    2. Write the numpy calculation method for this function.

    >>> def np_seg(x):
    ...     res = x
    ...     res[res>1] = -res[res>1]
    ...     return res

    3. Pass the np_maps parameter.

    >>> np_maps = {"Seg": np_seg}

    In summary, when expr01 is parsed, the numpy function is looked up in this order: (np_maps -> numpy's function -> system -> Error)

  • classification (bool) – classification or not, default False.

Returns

  • pre_y – np.array or None

  • expr01 (Expr) – New expr.

bgp.postprocess.acfs(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, built_format_input=False, scoring='r2')

Add coefficients and score.

See also add_coef_fitting (acf).

bgp.postprocess.acfsng(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, no_gradient_coef=-1, no_gradient_coef_range=array([-1, 0]), n_jobs=1, scoring='r2')

Add coefficients and score with no gradient coefficient.

See also add_coef_fitting (acf).

bgp.postprocess.cla(pre_y, cl=True)
bgp.postprocess.format_input(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, x_mark='x', c_mark='c')

Check and format_input for add_coef_fitting.

Parameters
  • expr01 (sympy.Expr) – expr for fitting.

  • x (list of np.ndarray or np.ndarray) – real data with: [x1,x2,x3,…,x_n_feature] or x with shape (n_sample,n_feature).

  • y (np.ndarray with shape (n_sample,)) – real data of target.

  • init_c (list of float or float.) – default 1.

  • terminals (list of sympy.Symbol) – placeholder for xi, with the same features in expr01.

  • c_terminals (list of sympy.Symbol) – placeholder for ci, with the same coefficients/constants in expr01.

  • np_maps (dict,default is None) –

    For self-defined functions:

    1. Make your function with sympy.Function and arrange it in expr01.

    >>> x1, x2, x3, c1, c2, c3, c4 = sympy.symbols("x1,x2,x3,c1,c2,c3,c4")
    >>> Seg = sympy.Function("Seg")
    >>> expr01 = Seg(x1*x2, c1)

    2. Write the numpy calculation method for this function.

    >>> def np_seg(x, c):
    ...     res = -x
    ...     res[res>-c] = 0
    ...     return res

    3. Pass the np_maps parameter.

    >>> np_maps = {"Seg": np_seg}

    In summary, when expr01 is parsed, the numpy function is looked up in this order: (np_maps -> numpy's function -> system -> Error)

  • x_mark (str) – mark for x

  • c_mark (str) – mark for c

Returns

format_parameters – (expr01, x, y, init_c, terminals, c_terminals, np_maps)

Return type

tuple

bgp.postprocess.top_n(loop, n=10, gen=-1, key='value', ascending=False)

Return the top results of loop. Pending deprecation.

please use loop.top_n() directly.

bgp.preprocess module

class bgp.preprocess.MagnitudeTransformer(standard=1, tolerate=0)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Transform x, y or c to values near 1, and store the magnitude of the transform.

fit(X, y=None, group=2, apply=None, keep=None)
Parameters
  • X (np.ndarray) –

  • y (np.ndarray) –

  • group (group index of x) –

  • apply (specific which index of x to transform) –

  • keep (specific which index of x to not transform) –

fit_constant(c)
fit_transform_all(X, y, **fit_params)
fit_transform_constant(c)
inverse_transform(X)
inverse_transform_constant(c)
inverse_transform_y(y)
transform(X)
transform_constant(c)
transform_y(y)
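The idea of rescaling values to be near 1 while remembering the scale can be sketched as follows. This is an illustration under an assumption: the magnitude is taken here as the power of ten closest to the mean absolute value, which may differ from what MagnitudeTransformer actually stores, and the three function names are hypothetical.

```python
import math


def fit_magnitude(values):
    """Power of ten closest (on a log scale) to the mean absolute value."""
    mean_abs = sum(abs(v) for v in values) / len(values)
    if mean_abs == 0:
        return 1.0                      # nothing to rescale
    return 10.0 ** round(math.log10(mean_abs))


def transform(values, magnitude):
    """Divide by the stored magnitude so that values land near 1."""
    return [v / magnitude for v in values]


def inverse_transform(values, magnitude):
    """Multiply back to recover the original scale."""
    return [v * magnitude for v in values]


x = [1200.0, 3400.0, 800.0]
magnitude = fit_magnitude(x)
scaled = transform(x, magnitude)
restored = inverse_transform(scaled, magnitude)
```

Storing the magnitude is what makes the inverse step possible, so fitted coefficients can be reported in the original units.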

bgp.skflow module

class bgp.skflow.SymbolLearning(loop, *args, **kwargs)

Bases: sklearn.base.BaseEstimator, sklearn.base.MultiOutputMixin, sklearn.base.TransformerMixin

A simplified guide for flow.

1. SymbolLearning is time-consuming and not suited to GridSearchCV; cross-validation is embedded.

2. For classification problems, use classification=True, and carefully set suitable classification metrics for scoring and score_pen.

This code does not check or identify the certainty of the data.

Web of SymbolLearning: https://bgp.readthedocs.io/en/latest/src/bgp.html#bgp.skflow.SymbolLearning

Parameters
  • loop (str,None) –

    bgp.flow.BaseLoop

    [‘BaseLoop’, ‘MultiMutateLoop’, ‘OnePointMutateLoop’, ‘DimForceLoop’ …].

  • pop (int) – number of population.

  • gen (int) – number of generation.

  • mutate_prob (float) – probability of mutate.

  • mate_prob (float) – probability of mate(crossover).

  • initial_max (int) – max initial size of expression when first producing.

  • initial_min (None,int) – min initial size of expression when first producing.

  • max_value (int) – max size of expression.

  • hall (int,>=1) – number of HallOfFame (elite) to maintain.

  • re_hall (None or int>=2) – number of HallOfFame individuals to add to the next generation. Notes: only valid when hall is used.

  • re_Tree (int) – number of new features to add to the next generation; 0 means none are added.

  • personal_map (bool or "auto") –

    “auto” uses ‘premap’ and automatically refreshes the ‘premap’ with individuals.

    True uses a constant ‘premap’ only.

    False uses only the prob of terminals.

  • scoring (list of Callable, default is [sklearn.metrics.r2_score,]) – See Also sklearn.metrics

  • score_pen (tuple of 1, -1, or nonzero float) –

    >0 : max problem, best is positive, worst is -np.inf.

    <0 : min problem, best is negative, worst is np.inf.

    Notes: with multiple scoring methods, the scores must be brought to the same scale in preprocessing or weighted by score_pen, because all selection is based on mean(w_i*score_i).

    Examples:

    scoring = [r2_score,]
    score_pen= [1,]
    

  • cv (sklearn.model_selection._split._BaseKFold,int) – the splitter must not shuffle (shuffle=False); default=1 means no cv.

  • filter_warning (bool) – filter warning or not.

  • add_coef (bool) – add coef in expression or not.

  • inter_add (bool) – add intercept constant or not.

  • inner_add (bool) – add inner coefficients or not.

  • out_add (bool) – add out coefficients or not.

  • flat_add (bool) – add flat coefficients or not.

  • n_jobs (int) – default 1; 6 is advised.

  • batch_size (int) – default 40; depends on the machine.

  • random_state (None,int) – random state.

  • cal_dim (bool) – calculate dim or not; if False, the dim calculation is skipped.

  • dim_type (Dim or None or list of Dim) –

    “coef”: af(x)+b. a and b have dimensions; f(x)’s dimension is not dnan.

    “integer”: af(x)+b. f(x) has an integer dimension.

    [Dim1,Dim2]: f(x)’s dimension is in the list.

    Dim: f(x) ~= Dim (see fuzzy).

    Dim: f(x) == Dim.

    None: f(x) == pset.y_dim

  • fuzzy (bool) – accept dims with the same base as dim_type, such as m, m^2, m^3.

  • stats (dict) –

    details of logbook to show.

    Map:

    values = {“max”: np.max, “mean”: np.mean, “min”: np.min, “std”: np.std, “sum”: np.sum}

    keys = {

    “fitness”: just see fitness[0],

    “fitness_dim_max”: max problem, see fitness with demand dim,

    “fitness_dim_min”: min problem, see fitness with demand dim,

    “dim_is_target”: demand dim,

    “coef”: dim is True, coef have dim,

    “integer”: dim is integer,

    … }

    If stats is None, the default is:

    for cal_dim=True:

    stats = {“fitness_dim_max”: (“max”,), “dim_is_target”: (“sum”,)}

    for cal_dim=False:

    stats = {“fitness”: (“max”,)}

    For a self-defined statistic, the key is a function that gets an attribute of each ind.

    Examples:

    def func(ind):
        return ind.fitness[0]
    stats = { func: ("mean",), "dim_is_target":("sum",)}
    

  • verbose (bool) – print verbose logbook or not.

  • tq (bool) – print progress bar or not.

  • store (bool or path) – bool or path.

  • stop_condition (callable) –

    stop condition evaluated on the best ind of hall; it returns a bool, and True means stop the loop.

    Examples:

    def func(ind):
        c = ind.fitness.values[0]>=0.90
        return c
    

  • pset (SymbolSet) – the feature x and target y and others should have been added.

  • details (bool) – return expr and predict_y or not.

  • classification (bool) – classification or not.
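
The score_pen note and the stop_condition parameter above can be illustrated together. The sketch below is not bgp code: `weighted_fitness` and the `SimpleNamespace` stand-in for an individual are hypothetical, and the score values are made up for illustration. It only shows how signed weights combine several scores into mean(w_i*score_i), and how a stop-condition callable returns a bool for the best individual.

```python
from types import SimpleNamespace


def weighted_fitness(scores, score_pen):
    """Mean of w_i * score_i, the quantity the score_pen note refers to."""
    if len(scores) != len(score_pen):
        raise ValueError("need exactly one weight per score")
    return sum(w * s for w, s in zip(score_pen, scores)) / len(scores)


def stop_condition(ind):
    """Same shape as the documented example: True means stop the loop."""
    return ind.fitness.values[0] >= 0.90


# Illustrative scores: one to maximise (weight 1), one error to minimise (weight -1).
scores = [0.9, 0.2]
score_pen = [1, -1]
combined = weighted_fitness(scores, score_pen)

# Hypothetical stand-in exposing fitness.values like the documented example.
best = SimpleNamespace(fitness=SimpleNamespace(values=(0.93,)))
should_stop = stop_condition(best)
```

Because the weighted scores are averaged, a score on a much larger scale would dominate the mean, which is why the note asks for comparable scales or compensating weights.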

cv_result(refit=False)

Return the cv_result of the best expression. Only valid when cv != 1.

Parameters

refit (bool) – re-fit the data or not. If true, use all the data on the best expression.
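
Cross-validation results depend on the cv splitter, which the parameter list requires to be unshuffled. As a rough illustration of what an unshuffled K-fold split looks like, here is a standard-library sketch; `contiguous_folds` is a hypothetical helper, not part of bgp. It follows the usual convention that the first n_samples % n_splits folds get one extra sample.

```python
def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) index lists for an unshuffled K-fold split."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        size = base + (1 if k < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        # Everything outside the contiguous test block is used for training.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(contiguous_folds(7, 3))
test_blocks = [test for _, test in folds]
```

Since rows are never reordered, every sample lands in exactly one test block and the blocks keep the original row order.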

fit(X=None, y=None, c=None, x_group=None, x_dim=1, y_dim=1, c_dim=1, x_prob=None, c_prob=None, pset=None, power_categories=(2, 3, 0.5), categories=('Add', 'Mul', 'Sub', 'Div'), warm_start=False, new_gen=None)

Method 1. fit with x, y.

Examples:

from bgp.skflow import SymbolLearning

sl = SymbolLearning(loop="MultiMutateLoop")
sl.fit(x, y, ...)

Method 2. fit with a customized pset. If more customization is needed, pass a pre-defined SymbolSet object as pset.

Examples:

from bgp.base import SymbolSet
from bgp.skflow import SymbolLearning

pset = SymbolSet()
pset.add_features_and_constants(...)
pset.add_operations(...)
...
sl = SymbolLearning(loop="MultiMutateLoop")
sl.fit(pset=pset)
Parameters
  • X (np.ndarray) – data.

  • y (np.ndarray) –

  • c (list of float, None) – constants.

  • x_dim (1 or list of Dim) – the same size as x.shape[1]; default 1 means dless for all x.

  • y_dim (1,Dim) – dim of y.

  • c_dim (1,list of Dim) – the same size as c.shape; default 1 means dless for all c.

  • x_prob (None,list of float) – the same size as x.shape[1].

  • c_prob (None,list of float) – the same size as c.

  • x_group (list of list) –

    Group of x.

    Examples:

    x_group=[[1,2],] or x_group=2

    See Also bgp.base.SymbolSet.add_features()

  • power_categories (Sized,tuple, None) – Examples:(0.5,2,3)

  • categories (tuple of str) –

    map table:

    {“Add”: sympy.Add, ‘Sub’: Sub, ‘Mul’: sympy.Mul, ‘Div’: Div}

    {“sin”: sympy.sin, ‘cos’: sympy.cos, ‘exp’: sympy.exp, ‘ln’: sympy.ln}

    {‘Abs’: sympy.Abs, “Neg”: functools.partial(sympy.Mul, -1.0), “Rec”: functools.partial(sympy.Pow, e=-1.0)}

    Others:

    “Rem”: f(x)=1-x, if x true

    “Self”: f(x)=x, if x true

  • pset (SymbolSet) – See Also SymbolSet.

  • warm_start (bool) –

    warm start or not.

    Note:

    If you provide pset in advance, check the number of features carefully, especially when using re_Tree, because new features are added.

    Reference:

    CalculatePrecisionSet.update_with_X_y.

  • new_gen (None,int) – warm_start generation.
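
To make the categories map table more concrete, the sketch below builds a purely numeric analogue with the standard library and selects entries by name, the way a tuple such as ('Add', 'Mul', 'Sub', 'Div') selects operations. It is an assumption-laden stand-in: bgp maps these names to sympy objects, whereas `numeric_map` and `select_operations` are hypothetical names that operate on plain floats.

```python
import functools
import math
import operator

# Numeric stand-ins for the documented names (the real table maps to sympy).
numeric_map = {
    "Add": operator.add,
    "Sub": operator.sub,
    "Mul": operator.mul,
    "Div": operator.truediv,
    "sin": math.sin,
    "cos": math.cos,
    "exp": math.exp,
    "ln": math.log,
    "Abs": abs,
    "Neg": functools.partial(operator.mul, -1.0),  # mirrors partial(sympy.Mul, -1.0)
    "Rec": lambda x: x ** -1.0,                    # mirrors Pow with e=-1.0
    "Rem": lambda x: 1 - x,
    "Self": lambda x: x,
}


def select_operations(categories):
    """Return the callables for the requested names, rejecting unknown ones."""
    unknown = [name for name in categories if name not in numeric_map]
    if unknown:
        raise KeyError(f"unknown categories: {unknown}")
    return {name: numeric_map[name] for name in categories}


ops = select_operations(("Add", "Mul", "Sub", "Div"))
```

Failing early on an unknown name keeps a typo in categories from silently shrinking the search space.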

predict(X)

predict y from X.

Parameters

X (np.ndarray) – data.

score(X, y, scoring)
Parameters
  • X (np.ndarray) – data.

  • y (np.ndarray) – true y.

  • scoring (str) – scoring method, default is “r2”.
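
For reference, the “r2” score named above is the coefficient of determination, 1 - SS_res / SS_tot. The plain-Python sketch below shows the formula only; the function name `r2` and the sample values are illustrative, and it is not the code path that score() uses.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        # A constant target has no variance to explain; left undefined in this sketch.
        raise ValueError("r2 is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
score_value = r2(y_true, y_pred)
perfect = r2(y_true, y_true)
```

A perfect prediction scores 1, and predicting the mean of y everywhere scores 0, which is why r2 pairs with a positive score_pen weight.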

Module contents