bgp package
Subpackages
Submodules
bgp.base module
Base objects for symbolic regression.
- Contains:
Class: SymbolSet
Class: CalculatePrecisionSet
Class: SymbolTree
Others
- class bgp.base.CalculatePrecisionSet(pset, scoring=None, score_pen=(1,), filter_warning=True, cv=1, cal_dim=True, dim_type=None, fuzzy=False, add_coef=True, inter_add=True, inner_add=False, vector_add=False, out_add=False, flat_add=False, n_jobs=1, batch_size=20, tq=True, details=False, classification=False, score_object='y', batch_para=False)
Bases:
bgp.base.SymbolSet
Adds score methods to SymbolSet. The object can be obtained from a working SymbolSet object.
- Parameters
pset (SymbolSet) – SymbolSet.
scoring (Callable, default is sklearn.metrics.r2_score) – See Also sklearn.metrics.
filter_warning (bool) – filter warnings or not.
score_pen (tuple of float) –
1 : best is positive, worst is -np.inf.
-1 : best is negative, worst is np.inf.
0 : best is positive, worst is 0.
cal_dim (bool) – calculate the dimension or not; if not, return dless (dimensionless).
add_coef (bool) – add coefficients to the expression or not.
inter_add (bool) – add an intercept constant or not.
inner_add (bool) – add inner coefficients or not.
fuzzy (bool) – also accept dims with the same base as dim_type (such as m, m^2, m^3) or not.
dim_type (object) – if None, use the y_dim.
n_jobs (int) – number of cores to run on.
batch_size (int) – batch size; batch_size*n_jobs = number of individuals (inds) is advised.
tq (bool) – print a progress bar or not.
cv (sklearn.model_selection._split._BaseKFold, int) –
the splitter's shuffle must be False!
For scoring, the cv split is used and the mean_test_score is returned.
For prediction, the cv split is used and the cv_predict_y is returned (not used).
- Notes:
If cv and refit are used, all the data is refit to determine the coefficients.
Thus the expression may not be consistent with these scores when they are re-calculated from this expression.
details (bool) – return the expr and predicted y or not.
classification (bool) – classification or not.
score_object – score by y or delta y (for implicit functions).
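The cv behaviour described above (score on each unshuffled split and return the mean test score) can be sketched in pure Python. This is an illustration under assumptions, not bgp's code: contiguous folds stand in for an sklearn splitter with shuffle=False, the candidate expression is assumed to be already evaluated to one column f, add_coef/inter_add are modelled as fitting y ≈ a*f + b by least squares on each training fold, and all helper names are hypothetical. The cv=1 (no cv) case is not modelled.

```python
# Hypothetical helpers illustrating cv scoring; not bgp's implementation.


def kfold_indices(n, k):
    """Contiguous, unshuffled folds, like KFold(n_splits=k, shuffle=False)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def fit_ab(f, y):
    """Least-squares a, b for y ~ a*f + b (coefficient plus intercept)."""
    n = len(f)
    mf, my = sum(f) / n, sum(y) / n
    sff = sum((v - mf) ** 2 for v in f)
    a = sum((v - mf) * (w - my) for v, w in zip(f, y)) / sff if sff else 0.0
    return a, my - a * mf


def r2(y, pred):
    """Coefficient of determination, as in sklearn.metrics.r2_score."""
    my = sum(y) / len(y)
    ss_tot = sum((v - my) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, pred))
    return 1.0 - ss_res / ss_tot


def mean_test_score(f, y, cv=5):
    """Fit a, b on each training fold, score on the test fold, then average."""
    scores = []
    for train, test in kfold_indices(len(y), cv):
        a, b = fit_ab([f[i] for i in train], [y[i] for i in train])
        scores.append(r2([y[i] for i in test], [a * f[i] + b for i in test]))
    return sum(scores) / len(scores)
```

For an exactly linear target such as y = 3*f + 2, every fold scores close to 1.0, so the mean test score is close to 1.0. Keeping the folds unshuffled means each fold is a contiguous block of samples, which matches the requirement that the splitter not shuffle.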
- calculate_cv_score(ind)
Used only for scoring a single individual or for checking.
- calculate_detail(ind)
Used only to calculate the final best result for display.
Calculates the best expression.
- Parameters
ind (SymbolTree) – best expression.
- calculate_expr(expr)
Used only to calculate the final result for display.
- Parameters
expr (sympy.Expr) –
- calculate_score(ind)
Used only for scoring a single individual or for checking, with cv=1.
- Parameters
ind (SymbolTree) –
- calculate_simple(ind)
Used only for re_Tree and for display.
Calculates the best expression.
- Parameters
ind (SymbolTree) –
- compile_context(ind)
Transform SymbolTree to sympy.Expr.
- hasher
alias of str
- parallelize_calculate_expr(exprs)
Used only for the final results; calculates exprs.
- parallelize_score(inds)
The main scoring method, called in each generation of GP.
- Parameters
inds (list of SymbolTree) – list of expressions.
- parallelize_try_add_coef_times(exprs, grid_x=None, resample_number=500)
Not yet documented.
- try_add_coef_times(expr, grid_x=None)
Used only for the best result; tries to add coefficients to expr.
- update(pset)
Update self from the input pset.
- update_with_X_y(X, y)
Replace the x, y data.
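The batch_size advice (batch_size*n_jobs equal to the number of individuals) means each worker receives exactly one batch; for example, 480 individuals with n_jobs=12 suggests batch_size=40. A minimal sketch of batched, order-preserving scoring in the spirit of parallelize_score, assuming a thread pool as a stand-in for bgp's own parallel backend; the helper names batches and parallel_score and the score_one callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def parallel_score(inds, score_one, n_jobs=1, batch_size=20):
    """Score every individual, one batch per task, keeping the input order."""
    chunks = batches(list(inds), batch_size)

    def run(chunk):
        return [score_one(ind) for ind in chunk]

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in the order of the submitted chunks.
        results = list(pool.map(run, chunks))
    return [s for chunk_scores in results for s in chunk_scores]
```

Sending whole batches rather than single individuals reduces per-task overhead, which is the usual reason for a batch_size knob.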
- class bgp.base.ShortStr(st)
Bases:
object
Short version of a tree that keeps only the names, to simplify storage and transmission.
- class bgp.base.SymbolPrimitive(name, arity)
Bases:
object
General operator type; do not use it directly, use SymbolPrimitiveDetail instead.
- Parameters
name (str) – function name.
arity (int) – number of input parameters of the function, such as 2 for + and 1 for ln.
- format_repr(*args)
- format_str(*args)
- class bgp.base.SymbolPrimitiveDetail(name, arity, func, prob, np_func=None, dim_func=None, sym_func=None)
Bases:
bgp.base.SymbolPrimitive
General operator type with more details.
- Parameters
func (Callable) –
function; a sympy.Function type is preferred.
- For Maintainer:
If the function is self-defined and cannot be simplified to a sympy.Function or an elementary function, the functions for function.np_map() and dim.dim_map() should be defined.
name (str) – function name.
arity (int) – number of function inputs.
prob (float) – default 1.
- capsule()
Return the short version.
- class bgp.base.SymbolSet(name='PSet')
Bases:
object
Defines the preparation set of operations, features, and fixed constants.
- Parameters
name (str) – name.
- add_accumulative_operation(categories=None, categories_prob='balance', self_categories=None, special_prob=None)
Add accumulative operations.
- Parameters
categories (tuple of str) – categories=("Self", "MAdd", "MSub", "MMul", "MDiv")
categories_prob (None, "balance" or float) –
probability of categories in (0, 1], except ("Self", "MAdd", "MSub", "MMul", "MDiv").
"balance" is 1/n_categories.
"MSub", "MMul" and "MDiv" only work when the group size is 2; otherwise they work like "Self".
- Notes:
("Self", "MAdd", "MSub", "MMul", "MDiv") are set to 1 as a standard.
self_categories (list of dict, None) –
the dict can be generated by newfuncD or self-defined.
the dict must contain at least:
{"func": func, "name": name, "np_func": npf, "dim_func": dimf, "sym_func": gsymf}
1.func: sympy.Function(name) object, which needs the added attributes is_jump and keep.
2.name: name
3.np_func: numpy function
4.dim_func: dimension function
5.sym_func: NewArray function (unpacks the group; used only for display).
See Also bgp.newfunc.newfuncV
special_prob (None or dict) –
- Examples:
{"MAdd": 0.5, "Self": 0.5}
- add_constants(c, c_dim=1, c_prob=None)
Add constants with dimension and probability.
- Parameters
c_dim (1, list of Dim) – the same size as c.
c (float, list) – list of float.
c_prob (None, float, list of float) – the same size as c.
- add_features(X, y, x_dim=1, y_dim=1, x_prob=None, x_group=None, feature_name=None)
Add features with dimension and probability.
- Parameters
X (np.ndarray) – 2D data.
y (np.ndarray) – 1D data.
feature_name (None, list of str) – the same size as X.shape[1].
x_dim (1 or list of Dim) – the same size as X.shape[1]; default 1 means dless (dimensionless) for all x.
y_dim (1, Dim) – dim of y.
x_prob (None, list of float) – the same size as X.shape[1].
x_group (None or list of list, int) – feature groups.
- add_features_and_constants(X, y, c=None, x_dim=1, y_dim=1, c_dim=1, x_prob=None, c_prob=None, x_group=None, feature_name=None)
Combination of add_constants and add_features.
- add_operations(power_categories=None, categories=None, self_categories=None, power_categories_prob='balance', categories_prob='balance', special_prob=None)
Add operations with probability.
- Parameters
power_categories (Sized, tuple, None) –
- Examples:
(0.5, 2, 3)
categories (tuple of str) –
- map table:
{"Add": sympy.Add, "Sub": Sub, "Mul": sympy.Mul, "Div": Div}
{"sin": sympy.sin, "cos": sympy.cos, "exp": sympy.exp, "ln": sympy.ln}
{"Abs": sympy.Abs, "Neg": functools.partial(sympy.Mul, -1.0), "Rec": functools.partial(sympy.Pow, e=-1.0)}
Others:
"Rem": f(x)=1-x, if x true
"Self": f(x)=x, if x true
categories_prob ("balance", float) – probability of categories, except (+, -, /), in (0, 1]. "balance" is 1/n_categories. The (+, -, /) are set to 1 as a standard.
special_prob (None, dict) –
prob for special names.
Examples: {"Mul": 0.6, "Add": 0.4, "exp": 0.1}
power_categories_prob ("balance", float) – float in (0, 1]. Probability of power categories; "balance" is 1/len(power_categories).
self_categories (list of dict, None) –
the dict can be generated by newfuncV or self-defined.
the dict must contain at least: {"func": func, "name": name, "arity": 2, "np_func": npf, "dim_func": dimf, "sym_func": gsymf}
1.func: sympy.Function(name) object
2.name: name
3.arity: int, the number of parameters
4.np_func: numpy function
5.dim_func: dimension function
6.sym_func: NewArray function (unpacks the group; used only for display).
See Also bgp.newfunc.newfuncV
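The categories_prob rule above ("balance" is 1/n_categories, with the standard operators fixed at 1) can be illustrated with a small sketch. Several points are assumptions not confirmed by this reference: n_categories is taken as the number of non-standard categories, special_prob overrides the weight of a named operator, and the weights are normalized into selection probabilities. The helper operator_weights is hypothetical; the default standard set mirrors the (+, -, /) listed above.

```python
# Hypothetical sketch; the weighting and normalization rules are assumptions.


def operator_weights(categories, categories_prob="balance", special_prob=None,
                     standard=("Add", "Sub", "Div")):
    """Return normalized selection probabilities for each operator name."""
    others = [c for c in categories if c not in standard]
    if categories_prob == "balance":
        # Assumption: n_categories counts only the non-standard categories.
        p = 1.0 / len(others) if others else 1.0
    elif 0 < categories_prob <= 1:
        p = float(categories_prob)
    else:
        raise ValueError("categories_prob must be 'balance' or a float in (0, 1]")
    # Standard operators are fixed at weight 1; the rest get weight p.
    weights = {c: (1.0 if c in standard else p) for c in categories}
    # Explicit per-name probabilities override the defaults.
    for name, prob in (special_prob or {}).items():
        if name in weights:
            weights[name] = float(prob)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```

With categories ("Add", "Mul", "Sub", "Div", "exp", "ln"), the three non-standard names each get weight 1/3, so "Add" ends up with probability 1/4 and "exp" with 1/12.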
- add_tree_to_features(Tree, prob=0.3)
Add the individual as a new feature to the initial features. Success is not guaranteed, because the value and name are checked and must differ from the existing ones.
- Parameters
Tree (SymbolTree) – individual or expression.
prob (float) – probability of this individual.
- bonding_personal_maps(pers)
Add personal preferences to premap; more control can be found in pset.premap.
Bond the points with a ratio; the other pairs are penalized.
For example, with [1, 2, 0.9], the other pairs such as (1, 3), (1, 4), …, (2, 3), (2, 4), … get a small probability.
- Parameters
pers (list of list) –
- Examples:
[[index1, index2, prob], […]], where prob is in [0, 1] and 1 means forced binding.
- property data_x
- property dim_ter_con_list
- property dispose
Accumulative operators.
- property free_symbol
- static get_values(v, mean=False)
Get the list of dict values.
- property init_free_symbol
- property primitives
Operators.
- property prob_dispose_list
- property prob_pri_list
- property prob_ter_con_list
- register(primitives_dict='all', dispose_dict='all', ter_con_dict='all')
Register and encapsulate, for simplification.
- Parameters
primitives_dict (None, str, dict) –
dispose_dict (None, str, dict) –
ter_con_dict (None, str, dict) –
- replace(X, y=None, tree_X=None)
- set_personal_maps(pers)
Add personal preferences to premap; more control can be found in pset.premap.
Sets only the given pairs of points and does not change the others.
- Parameters
pers (list of list) –
- Examples:
[[index1, index2, prob]], where prob is in [0, 1).
- property terminalRatio
Return the ratio of the number of terminals to the number of all kinds of primitives.
- property terminals_and_constants
- property terminals_and_constants_repr
- property types
- class bgp.base.SymbolTerminal(name, init_name=None)
Bases:
object
General feature type; do not use it directly.
The names for display (str) and calculation (repr) are set to different strings to avoid repeated calculations.
- Parameters
name (str) – representation name. Default "xi".
init_name (str) –
Only for display, not for calculation.
- Examples:
init_name="[x1, x2]": compact features need [].
init_name="(x1*x4-x3)": expressions need ().
- format_repr()
Name used for calculation (repr).
- format_str()
Name used for display (str).
- class bgp.base.SymbolTerminalDetail(values, name, dim=None, prob=None, init_sym=None, init_name=None)
Bases:
bgp.base.SymbolTerminal
General feature type.
The names for display (str) and calculation (repr) are set to different strings to avoid repeated calculations.
- Parameters
values (None, number or np.ndarray) – xi value; the shape can be (n,) or (n_x, n), where n is the number of samples and n_x is the number of features.
name (str) – representation name. Default "xi".
dim (bgp.dim.Dim or None) – default None.
prob (float or None) – default None.
init_sym (list, sympy.Expr) – list.
init_name (str or None) –
Only for display, not for calculation.
- Examples:
init_name="[x1, x2]": compact features need [].
init_name="(x1*x4-x3)": expressions need ().
- capsule()
- class bgp.base.SymbolTree(*arg, **kwargs)
Bases:
bgp.base._ExprTree
Individual tree; each tree is one expression. A SymbolTree is generated only by the methods genGrow and genFull.
- property capsule
Return the short version.
- compress()
Drop unnecessary attributes.
- depart()
Take apart the expression.
- classmethod genFull(pset, min_, max_, per=False)
Details in the genFull function.
- classmethod genGrow(pset, min_, max_, per=False)
Details in the genGrow function.
- ppprint(pset, feature_name=False)
Get a user-friendly version.
- reset()
Keep these attributes refreshed.
- ter_site()
Sites of the feature and constant nodes.
- terminals()
Return terminals that occur in the expression tree.
- to_expr(pset)
Transform to sympy.Expr.
bgp.flow module
Some loop definitions for the genetic algorithm. All the loops share the same run method.
Contains:
-Class: BaseLoop
one-node mate and one-tree mutate.
-Class: MultiMutateLoop
one-node mate and (one-tree mutate, one-node replacement mutate, shrink mutate, difference mutate).
-Class: OnePointMutateLoop
one-node replacement mutate (keeps the height of the tree).
-Class: DimForceLoop
Select with dimension (keeps the dimension of the tree).
- class bgp.flow.BaseLoop(pset, pop=500, gen=20, mutate_prob=0.5, mate_prob=0.8, hall=1, re_hall=1, re_Tree=None, initial_min=None, initial_max=3, max_value=5, scoring=(<function r2_score>, ), score_pen=(1, ), filter_warning=True, cv=1, add_coef=True, inter_add=True, inner_add=False, vector_add=False, out_add=False, flat_add=False, cal_dim=False, dim_type=None, fuzzy=False, n_jobs=1, batch_size=40, random_state=None, stats=None, verbose=True, migrate_prob=0, tq=True, store=False, personal_map=False, stop_condition=None, details=False, classification=False, score_object='y', sub_mu_max=1, limit_type='h_bgp', batch_para=False)
Bases:
mgetool.packbox.Toolbox
Base loop for BGP.
Examples:
    from bgp.base import SymbolSet
    from bgp.flow import BaseLoop

    if __name__ == "__main__":
        pset = SymbolSet()
        # add X, y and operations to pset first, e.g. pset.add_features(X, y) and pset.add_operations(...)
        stop = lambda ind: ind.fitness.values[0] >= 0.880963
        bl = BaseLoop(pset=pset, gen=10, pop=1000, hall=1, batch_size=40, re_hall=3,
                      n_jobs=12, mate_prob=0.9, max_value=5, initial_min=1, initial_max=2,
                      mutate_prob=0.8, tq=True, dim_type="coef", stop_condition=stop,
                      re_Tree=0, store=False, random_state=1, verbose=True,
                      stats={"fitness_dim_max": ["max"], "dim_is_target": ["sum"]},
                      add_coef=True, inter_add=True, inner_add=False, cal_dim=True,
                      vector_add=False, personal_map=False)
        bl.run()
- Parameters
pset (SymbolSet) – the feature x, target y and others should already have been added.
pop (int) – population size.
gen (int) – number of generations.
mutate_prob (float) – probability of mutation.
mate_prob (float) – probability of mating (crossover).
initial_max (int) – max initial size of expressions when first generated.
initial_min (None, int) – min initial size of expressions when first generated.
max_value (int) – max size of expressions.
limit_type ("height", "length" or "h_bgp") – limitation type for max_value; it does not affect initial_max or initial_min.
hall (int, >=1) – number of HallOfFame (elite) individuals to maintain.
re_hall (None or int>=2) – number of HallOfFame individuals to add to the next generation. Notes: only valid together with hall.
re_Tree (int) – number of new features to add to the next generation; 0 means none are added.
personal_map (bool or "auto") –
"auto" uses 'premap' and refreshes 'premap' automatically with individuals.
True uses only a constant 'premap'.
False uses only the probabilities of the terminals.
scoring (list of Callable, default is [sklearn.metrics.r2_score,]) – See Also sklearn.metrics
score_pen (tuple of 1, -1 or any non-zero float) –
>0 : max problem, best is positive, worst is -np.inf. <0 : min problem, best is negative, worst is np.inf.
- Notes:
If multiple score methods are used, the scores must be brought to the same scale in preprocessing or weighted by score_pen, because all selection is based on mean(w_i*score_i).
Examples:
scoring = [r2_score,], score_pen = [1,]
cv (sklearn.model_selection._split._BaseKFold, int) – the splitter's shuffle must be False; default=1 means no cv.
filter_warning (bool) – filter warnings or not.
add_coef (bool) – add coefficients to the expression or not.
inter_add (bool) – add an intercept constant or not.
inner_add (bool) – add inner coefficients or not.
out_add (bool) – add outer coefficients or not.
flat_add (bool) – add flat coefficients or not.
n_jobs (int) – default 1; 6 is advised.
batch_size (int) – default 40; depends on the machine.
random_state (int) – None or int.
cal_dim (bool) – calculate the dimension or not.
dim_type (Dim or None or list of Dim) –
"coef": af(x)+b; a and b have dimensions, and f(x)'s dimension is not dnan.
"integer": af(x)+b; f(x) has an integer dimension.
[Dim1, Dim2]: f(x)'s dimension is in the list.
Dim: f(x) ~= Dim (see fuzzy).
Dim: f(x) == Dim.
None: f(x) == pset.y_dim
fuzzy (bool) – choose dims with the same base as dim_type, such as m, m^2, m^3.
stats (dict) –
details of the logbook to show.
Map:
- values
= {"max": np.max, "mean": np.mean, "min": np.min, "std": np.std, "sum": np.sum}
- keys
= {
"fitness": only fitness[0],
"fitness_dim_max": max problem, fitness of individuals with the demanded dim,
"fitness_dim_min": min problem, fitness of individuals with the demanded dim,
"dim_is_target": demanded dim,
"coef": dim is True, coef has dim,
"integer": dim is integer,
… }
if stats is None, the default is:
- for cal_dim=True:
stats = {"fitness_dim_max": ("max",), "dim_is_target": ("sum",)}
- for cal_dim=False:
stats = {"fitness": ("max",)}
for self-definition, the key is a function that gets an attribute of each ind.
Examples:
    def func(ind):
        return ind.fitness.values[0]
    stats = {func: ("mean",), "dim_is_target": ("sum",)}
verbose (bool) – print the verbose logbook or not.
tq (bool) – print a progress bar or not.
store (bool or path) – store the results or not; may also be a path.
stop_condition (callable) –
stop condition on the best ind of the hall, which returns bool; True means stop the loop.
Examples:
    def func(ind):
        c = ind.fitness.values[0] >= 0.90
        return c
details (bool) – return expr and predict_y or not.
classification (bool) – classification or not.
score_object – score by y or delta y (for implicit functions).
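The score_pen notes above state that all selection is based on mean(w_i*score_i). A minimal sketch of that combination, assuming (our reading, not confirmed by the reference) that a failed evaluation takes the "worst" value for its direction, -inf for w > 0 and +inf for w < 0, so its weighted term is always -inf; following BaseLoop's note, zero weights are rejected. The helper combined_fitness is hypothetical.

```python
import math


def combined_fitness(scores, score_pen):
    """Weighted mean of scores used for selection (higher is better).

    score_pen holds one non-zero weight per scoring function: w > 0 for a
    maximization metric (e.g. R^2), w < 0 for a minimization metric (e.g. MAE).
    A failed evaluation (None or NaN) takes the worst value for its direction,
    -inf when w > 0 and +inf when w < 0, so its weighted term is always -inf.
    """
    if len(scores) != len(score_pen):
        raise ValueError("scores and score_pen must have the same length")
    weighted = []
    for s, w in zip(scores, score_pen):
        if w == 0:
            raise ValueError("score_pen weights must be non-zero")
        if s is None or math.isnan(s):
            s = -math.inf if w > 0 else math.inf
        weighted.append(w * s)
    return sum(weighted) / len(weighted)
```

For example, an R² of 0.9 (w=1) combined with an error of 0.2 (w=-1) gives (0.9 - 0.2)/2 = 0.35. Because the raw scores are averaged directly, metrics on very different scales dominate one another unless score_pen rescales them, which is why the notes ask for scores on the same scale.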
- check_height_length(pop, site='')
- maintain_halls(population)
Maintain the best p expressions.
- re_add()
Add the expression as a feature.
- re_fresh_by_name(*arr)
- run(warm_start=False, new_gen=None)
- Parameters
warm_start (bool) – warm start from the last result.
new_gen – new generations for warm_start; default is the initial number of generations.
- to_csv(data_all)
Store to csv.
- top_n(n=10, gen=-1, key='value', filter_dim=True, ascending=False)
Return the best n results.
Note
Only valid with store=True.
- Parameters
n (int) –
gen – the generation, default is -1.
key (str) – sort key, default is "value".
filter_dim – filter out no-dim expressions or not.
ascending – sort in ascending order or not.
- Returns
top n results (pd.DataFrame).
- varAnd(*arg, **kwargs)
- class bgp.flow.DimForceLoop(*args, **kwargs)
Bases:
bgp.flow.MultiMutateLoop
Force selection of individuals with the target dim for the next generation.
See also BaseLoop
- class bgp.flow.MultiMutateLoop(*args, **kwargs)
Bases:
bgp.flow.BaseLoop
Multiple mutation methods.
See also BaseLoop
- varAnd(population, toolbox, cxpb, mutpb)
- class bgp.flow.OnePointMutateLoop(*args, **kwargs)
Bases:
bgp.flow.BaseLoop
Limits the height of the population; uses only the mutNodeReplacementVerbose method.
See also BaseLoop
- varAnd(population, toolbox, cxpb, mutpb)
bgp.gp module
Notes
This part is copied from deap, with random replaced by numpy.random.
- bgp.gp.Statis_func(stats=None)
- bgp.gp.checks_number(func)
- bgp.gp.checkss(func)
- bgp.gp.cxOnePoint(ind10, ind20)
Randomly select a crossover point in each individual and exchange the subtrees rooted at those points between the individuals.
- Parameters
ind10 – First tree participating in the crossover.
ind20 – Second tree participating in the crossover.
- Returns
A tuple of two trees.
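Because this module is described as a copy of deap's operators, the one-point crossover above can be sketched on a toy prefix-order tree: a list of (name, arity) pairs standing in for SymbolTree. The representation and the helper names subtree_slice and cx_one_point are assumptions for illustration; as in deap, the root is excluded when choosing the crossover point, and Python's random stands in for numpy.random.

```python
import random

# Toy prefix-order tree: a list of (name, arity) pairs, e.g.
# Add(x1, Mul(x2, x3)) -> [("Add", 2), ("x1", 0), ("Mul", 2), ("x2", 0), ("x3", 0)]


def subtree_slice(tree, begin):
    """Return the slice covering the subtree rooted at index `begin`."""
    end = begin + 1
    total = tree[begin][1]  # number of children still to consume
    while total > 0:
        total += tree[end][1] - 1
        end += 1
    return slice(begin, end)


def cx_one_point(ind1, ind2, rng=random):
    """Swap one randomly chosen (non-root) subtree between two prefix trees."""
    if len(ind1) < 2 or len(ind2) < 2:
        return ind1, ind2  # nothing to exchange
    s1 = subtree_slice(ind1, rng.randrange(1, len(ind1)))
    s2 = subtree_slice(ind2, rng.randrange(1, len(ind2)))
    # The right-hand side copies both slices before either assignment happens.
    ind1[s1], ind2[s2] = ind2[s2], ind1[s1]
    return ind1, ind2
```

Storing the tree in prefix order means every subtree is a contiguous slice, so crossover reduces to swapping two list slices.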
- bgp.gp.depart(individual)
Take apart the expression.
- bgp.gp.genFull(pset, min_, max_, personal_map=False)
Generate an expression where each leaf has the same depth, between min_ and max_.
- Parameters
pset – Primitive set from which primitives are selected.
min_ – Minimum height of the produced trees.
max_ – Maximum height of the produced trees.
personal_map – bool.
- Returns
A full tree with all leaves at the same depth.
- bgp.gp.genGrow(pset, min_, max_, personal_map=False)
Generate an expression where each leaf might have a different depth, between min_ and max_.
- Parameters
pset – Primitive set from which primitives are selected.
min_ – Minimum height of the produced trees.
max_ – Maximum height of the produced trees.
personal_map – bool.
- Returns
A grown tree with leaves at possibly different depths.
- bgp.gp.genHalf(pset, min_, max_, personal_map=False)
- bgp.gp.generate(pset, min_, max_, condition, personal_map=False, *kwargs)
Generate an expression.
- Parameters
pset (SymbolSet) – pset
min_ (int) – Minimum height of the produced trees.
max_ (int) – Maximum height of the produced trees.
condition (collections.Callable) – a function that takes two arguments, the height of the tree to build and the current depth in the tree.
kwargs (None) – placeholder for future use.
personal_map (bool) – premap
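The condition argument is what distinguishes genFull from genGrow. A minimal sketch following deap's generate, which this module is described as copying: the toy PRIMITIVES/TERMINALS lists and the prefix-list representation are assumptions, the early-stop probability in gen_grow uses the share of terminals among all building blocks (deap's rule; SymbolSet.terminalRatio above is the analogous quantity), and personal_map is not modelled.

```python
import random

# Toy building blocks as (name, arity) pairs; stand-ins for a SymbolSet.
PRIMITIVES = [("Add", 2), ("Mul", 2), ("ln", 1)]
TERMINALS = [("x1", 0), ("x2", 0), ("c1", 0)]


def generate(min_, max_, condition, rng=random):
    """deap-style generation of a prefix-order tree of random height."""
    expr = []
    height = rng.randint(min_, max_)
    stack = [0]  # depths of the nodes still to be created
    while stack:
        depth = stack.pop()
        if condition(height, depth):
            expr.append(rng.choice(TERMINALS))
        else:
            name, arity = rng.choice(PRIMITIVES)
            expr.append((name, arity))
            stack.extend([depth + 1] * arity)
    return expr


def gen_full(min_, max_, rng=random):
    # Place terminals only at the chosen height: all leaves share one depth.
    return generate(min_, max_, lambda height, depth: depth == height, rng)


def gen_grow(min_, max_, rng=random):
    # Stop at the chosen height, or earlier (once depth >= min_) with a
    # probability equal to the share of terminals among all building blocks.
    ratio = len(TERMINALS) / (len(TERMINALS) + len(PRIMITIVES))

    def condition(height, depth):
        return depth == height or (depth >= min_ and rng.random() < ratio)

    return generate(min_, max_, condition, rng)
```

gen_full(2, 2) always returns a tree whose leaves all sit at depth 2, while gen_grow(1, 3) returns trees whose leaves may sit anywhere between depth 1 and depth 3.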
- bgp.gp.mutDifferentReplacementVerbose(individual, pset, personal_map=False)
Verbose choice of terminals_and_constants. Replaces a randomly chosen primitive of the individual by a randomly chosen primitive with the same number of arguments from the pset attribute of the individual, decreasing the probability of choosing the same terminals.
- Parameters
individual – The normal or typed tree to be mutated.
pset – SymbolSet
personal_map – bool
- Returns
A tuple of one tree.
- bgp.gp.mutNodeReplacementVerbose(individual, pset, personal_map=False)
Verbose choice of terminals_and_constants. Replaces a randomly chosen primitive of the individual by a randomly chosen primitive with the same number of arguments from the pset attribute of the individual.
- Parameters
individual – The normal or typed tree to be mutated.
pset – SymbolSet
personal_map – bool
- Returns
A tuple of one tree.
- bgp.gp.mutShrink(individual, pset=None)
This operator shrinks the individual by randomly choosing a branch and replacing it with one of the branch's arguments (also randomly chosen).
- Parameters
individual – The tree to be shrunk.
pset – SymbolSet.
- Returns
A tuple of one tree.
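The shrink mutation described above can be sketched on the same kind of toy prefix-order tree ((name, arity) pairs, an assumed stand-in for SymbolTree). As in deap, the root is never chosen as the branch to shrink; the helper names subtree_slice and mut_shrink are hypothetical.

```python
import random


def subtree_slice(tree, begin):
    """Return the slice covering the subtree rooted at index `begin`."""
    end, total = begin + 1, tree[begin][1]
    while total > 0:
        total += tree[end][1] - 1
        end += 1
    return slice(begin, end)


def mut_shrink(ind, rng=random):
    """Replace a random non-root branch by one of its own arguments."""
    prims = [i for i in range(1, len(ind)) if ind[i][1] > 0]
    if len(ind) < 3 or not prims:
        return (ind,)  # nothing to shrink
    index = rng.choice(prims)
    arg_idx = rng.randrange(ind[index][1])
    # Walk over the children of the chosen primitive to reach argument arg_idx.
    child = index + 1
    for _ in range(arg_idx):
        child = subtree_slice(ind, child).stop
    subtree = ind[subtree_slice(ind, child)]
    ind[subtree_slice(ind, index)] = subtree
    return (ind,)
```

Applied to Add(x1, Mul(x2, x3)), the only non-root branch is Mul, so the result is either Add(x1, x2) or Add(x1, x3). The tree always gets shorter, which is why this operator is useful for controlling bloat.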
- bgp.gp.mutUniform(individual, expr, pset)
Randomly select a point in the tree individual, then replace the subtree rooted at that point by the expression generated using the method expr().
- Parameters
individual – The tree to be mutated.
expr – A function object that can generate an expression when called.
pset – SymbolSet
- Returns
A tuple of one tree.
- bgp.gp.selBest(individuals, k, fit_attr='fitness')
Select the k best individuals among the input individuals. The list returned contains references to the input individuals.
- Parameters
individuals – A list of individuals to select from.
k – The number of individuals to select.
fit_attr – The attribute of individuals to use as selection criterion.
- Returns
A list containing the k best individuals.
- bgp.gp.selKbestDim(pop, K_best=10, dim_type=None, fuzzy=False, fit_attr='fitness', force_number=False)
Select individuals with a dim limitation.
- Parameters
pop (list of SymbolTree) – A list of individuals to select from.
K_best (int) – The number of individuals to select.
dim_type (Dim) –
fuzzy (bool) – match only the dim itself, or also dims with the same base, such as m, m^2, m^3.
fit_attr (str) – The attribute of individuals to use as selection criterion, default "fitness".
force_number (bool) – return exactly K_best individuals or not.
- Returns
A list of selected individuals.
- Return type
list
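To illustrate the fuzzy option above, here is a sketch that does not use bgp.dim.Dim. Assumptions: dimensions are modelled as tuples of base-unit exponents (for example (m, kg, s)), "same base" is read as a positive multiple of the target exponents, individuals are dicts with hypothetical "dim" and "fitness" keys, higher fitness is better, and force_number is not modelled.

```python
from fractions import Fraction


def same_base(dim, target):
    """True if dim is a positive multiple of target, e.g. m, m^2, m^3."""
    if not any(target):
        return not any(dim)  # dimensionless only matches dimensionless
    ratio = None
    for d, t in zip(dim, target):
        if t == 0:
            if d != 0:
                return False  # dim uses a base unit the target lacks
            continue
        r = Fraction(d) / Fraction(t)
        if ratio is None:
            ratio = r
        elif r != ratio:
            return False
    return ratio > 0


def sel_k_best_dim(pop, k_best, dim_type, fuzzy=False):
    """Keep individuals whose dim matches dim_type, then take the k best."""
    match = same_base if fuzzy else (lambda d, t: tuple(d) == tuple(t))
    ok = [ind for ind in pop if match(ind["dim"], dim_type)]
    return sorted(ok, key=lambda ind: ind["fitness"], reverse=True)[:k_best]
```

With target m = (1, 0, 0), m^2 = (2, 0, 0) passes only in fuzzy mode, while m/s = (1, 0, -1) never passes.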
- bgp.gp.selRandom(individuals, k)
Select k individuals at random from the input individuals with replacement. The list returned contains references to the input individuals.
- Parameters
individuals – A list of individuals to select from.
k – The number of individuals to select.
- Returns
A list of selected individuals.
This function uses the numpy.random.choice() function.
- bgp.gp.selTournament(individuals, k, tournsize, fit_attr='fitness')
Select the best individual among tournsize randomly chosen individuals, k times. The list returned contains references to the input individuals.
- Parameters
individuals – A list of individuals to select from.
k – The number of individuals to select.
tournsize – The number of individuals participating in each tournament.
fit_attr – The attribute of individuals to use as selection criterion.
- Returns
A list of selected individuals.
This function uses the numpy.random.choice() function.
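The two selection operators above can be sketched with Python's random.choices standing in for numpy.random.choice. Individuals are modelled as dicts with a hypothetical "fitness" key where higher is better; the helper names sel_random and sel_tournament are illustrative, not bgp's functions.

```python
import random


def sel_random(individuals, k, rng=random):
    """Pick k individuals uniformly at random, with replacement."""
    return rng.choices(individuals, k=k)


def sel_tournament(individuals, k, tournsize, key=lambda ind: ind["fitness"], rng=random):
    """Run k tournaments; each returns the fittest of `tournsize` random picks."""
    chosen = []
    for _ in range(k):
        aspirants = sel_random(individuals, tournsize, rng)
        chosen.append(max(aspirants, key=key))
    return chosen
```

Larger tournsize values increase selection pressure: with tournsize=1 the operator reduces to sel_random, while a large tournament almost always returns the best individual.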
- bgp.gp.staticLimit(key, max_value)
- bgp.gp.varAnd(population, toolbox, cxpb, mutpb)
- bgp.gp.varAndfus(population, toolbox, cxpb, mutpb, fus, mutpb_list=1.0)
- Parameters
population –
toolbox –
cxpb –
mutpb –
fus –
mutpb_list (float,list,None) –
bgp.postprocess module
- bgp.postprocess.acf(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, built_format_input=False)
Add coefficient fitting.
Try to calculate the predicted y from the sympy expression with coefficients; if an error occurs, return the expr itself.
- Parameters
expr01 (sympy.Expr) – expr for fitting.
x (list of np.ndarray or np.ndarray) – real data: [x1, x2, x3, …, x_n_feature].
y (np.ndarray with shape (n_sample,)) – real data of the target.
init_c (list of float or float, None) – default 1.
terminals (list of sympy.Symbol, None) – placeholders for xi, matching the features in expr01.
c_terminals (list of sympy.Symbol, None) – placeholders for ci, matching the coefficients/constants in expr01.
np_maps (dict, default is None) –
for self-definition:
1. make your function with sympy.Function and arrange it in expr01.
>>> import sympy
>>> x1, x2, x3, c1, c2, c3, c4 = sympy.symbols("x1,x2,x3,c1,c2,c3,c4")
>>> Seg = sympy.Function("Seg")
>>> expr01 = Seg(x1*x2)
2. write the numpy calculation method for this function.
>>> def np_seg(x):
...     res = x.copy()  # copy so that the input array is not modified in place
...     res[res > 1] = -res[res > 1]
...     return res
3. pass the np_maps parameter.
>>> np_maps = {"Seg": np_seg}
In total, when parsing expr01, the numpy function is looked up in the sequence: np_maps -> numpy's functions -> system -> Error.
classification (bool) – classification or not, default False.
built_format_input (bool) – use the format_input function to check input parameters. Only for temporary tests or single cases, because format_input is repetitive.
- Returns
pre_y – np.array or None
expr01 (Expr) – new expr.
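The lookup order stated above (np_maps, then numpy's functions, then the system, then an error) can be sketched as a small resolver. Assumptions: Python's math module stands in for numpy, "system" is read as Python's builtins, and the helper resolve_function and the scalar seg function are hypothetical.

```python
import builtins
import math


def resolve_function(name, np_maps=None, numeric_module=math):
    """Find a numeric implementation for a function name.

    Lookup order mirrors the documented one: user np_maps first, then the
    numeric library (numpy in bgp; `math` here as a stand-in), then Python
    builtins, and finally an error.
    """
    if np_maps and name in np_maps:
        return np_maps[name]
    if hasattr(numeric_module, name):
        return getattr(numeric_module, name)
    if hasattr(builtins, name):
        return getattr(builtins, name)
    raise NameError(f"no numeric implementation found for {name!r}")


def seg(v):
    # Scalar analogue of np_seg above: flip the sign of values greater than 1.
    return -v if v > 1 else v
```

Because np_maps is consulted first, a user mapping can also override a library function of the same name, for example resolve_function("sqrt", {"sqrt": seg}) returns seg.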
- bgp.postprocess.acfng(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, no_gradient_coef=-1, no_gradient_coef_range=array([-1, 0]), n_jobs=1, scoring='r2')
Add coefficients with a no-gradient coefficient.
Try to calculate the predicted y from the sympy expression with coefficients; if an error occurs, return the expr itself.
- Parameters
scoring (str) – score in sklearn.metrics.
n_jobs (int) – number of parallel jobs.
no_gradient_coef (int, sympy.Symbol) – coefficient inside a function without a gradient; default is the last one. Examples: no_gradient_coef=sympy.Symbol("c2") or no_gradient_coef=0
no_gradient_coef_range – range of the special coefficient.
expr01 (sympy.Expr) – expr for fitting.
x (list of np.ndarray or np.ndarray) – real data: [x1, x2, x3, …, x_n_feature].
y (np.ndarray with shape (n_sample,)) – real data of the target.
init_c (list of float or float, None) – default 1.
terminals (list of sympy.Symbol, None) – placeholders for xi, matching the features in expr01.
c_terminals (list of sympy.Symbol, None) – placeholders for ci, matching the coefficients/constants in expr01.
np_maps (dict, default is None) –
for self-definition:
1. make your function with sympy.Function and arrange it in expr01.
>>> import sympy
>>> x1, x2, x3, c1, c2, c3, c4 = sympy.symbols("x1,x2,x3,c1,c2,c3,c4")
>>> Seg = sympy.Function("Seg")
>>> expr01 = Seg(x1*x2)
2. write the numpy calculation method for this function.
>>> def np_seg(x):
...     res = x.copy()  # copy so that the input array is not modified in place
...     res[res > 1] = -res[res > 1]
...     return res
3. pass the np_maps parameter.
>>> np_maps = {"Seg": np_seg}
In total, when parsing expr01, the numpy function is looked up in the sequence: np_maps -> numpy's functions -> system -> Error.
classification (bool) – classification or not, default False.
- Returns
pre_y – np.array or None
expr01 (Expr) – new expr.
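acfng treats one coefficient (no_gradient_coef) separately and searches it within no_gradient_coef_range (default [-1, 0]). A minimal sketch of that idea, assuming a plain grid search (bgp's actual search strategy is not described here): for each grid value of the special coefficient c, the remaining coefficients of the hypothetical model y = a*step(x - c) + b are fitted by least squares, and the c with the best R² is kept. The model, the helper names and the grid size are all illustrative.

```python
def fit_linear(f, y):
    """Least-squares a, b for y ~ a*f + b; returns (a, b)."""
    n = len(f)
    mf, my = sum(f) / n, sum(y) / n
    sff = sum((fi - mf) ** 2 for fi in f)
    if sff == 0:
        return 0.0, my  # f is constant: only the intercept can be fitted
    a = sum((fi - mf) * (yi - my) for fi, yi in zip(f, y)) / sff
    return a, my - a * mf


def r2(y, pred):
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def grid_no_gradient(x, y, lo=-1.0, hi=0.0, steps=21):
    """Grid-search the threshold c in y = a*step(x - c) + b.

    step() has no useful gradient with respect to c, so c is scanned on a
    grid while a and b are fitted in closed form for every candidate.
    Returns (score, c, a, b) for the best candidate.
    """
    best = None
    for i in range(steps):
        c = lo + (hi - lo) * i / (steps - 1)
        f = [1.0 if xi > c else 0.0 for xi in x]
        a, b = fit_linear(f, y)
        score = r2(y, [a * fi + b for fi in f])
        if best is None or score > best[0]:
            best = (score, c, a, b)
    return best
```

Only the non-differentiable coefficient is scanned; the others still get an exact fit, so the cost grows with the grid size rather than with the number of coefficients.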
- bgp.postprocess.acfs(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, built_format_input=False, scoring='r2')
Add coefficients and score.
See also add_coef_fitting (acf).
- bgp.postprocess.acfsng(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, classification=False, no_gradient_coef=-1, no_gradient_coef_range=array([-1, 0]), n_jobs=1, scoring='r2')
Add coefficients and score with a no-gradient coefficient.
See also add_coef_fitting (acf).
- bgp.postprocess.cla(pre_y, cl=True)
- bgp.postprocess.format_input(expr01, x, y, init_c=None, terminals=None, c_terminals=None, np_maps=None, x_mark='x', c_mark='c')
Check and format the input for add_coef_fitting.
- Parameters
expr01 (sympy.Expr) – expr for fitting.
x (list of np.ndarray or np.ndarray) – real data: [x1, x2, x3, …, x_n_feature] or x with shape (n_sample, n_feature).
y (np.ndarray with shape (n_sample,)) – real data of the target.
init_c (list of float or float) – default 1.
terminals (list of sympy.Symbol) – placeholders for xi, matching the features in expr01.
c_terminals (list of sympy.Symbol) – placeholders for ci, matching the coefficients/constants in expr01.
np_maps (dict, default is None) –
for self-definition:
1. make your function with sympy.Function and arrange it in expr01.
>>> import sympy
>>> x1, x2, x3, c1, c2, c3, c4 = sympy.symbols("x1,x2,x3,c1,c2,c3,c4")
>>> Seg = sympy.Function("Seg")
>>> expr01 = Seg(x1*x2, c1)
2. write the numpy calculation method for this function.
>>> def np_seg(x, c):
...     res = -x
...     res[res > -c] = 0
...     return res
3. pass the np_maps parameter.
>>> np_maps = {"Seg": np_seg}
In total, when parsing expr01, the numpy function is looked up in the sequence: np_maps -> numpy's functions -> system -> Error.
x_mark (str) – mark for x.
c_mark (str) – mark for c.
- Returns
format_parameters – (expr01, x, y, init_c, terminals, c_terminals, np_maps)
- Return type
tuple
- bgp.postprocess.top_n(loop, n=10, gen=-1, key='value', ascending=False)
Return the top results of the loop. PendingDeprecation:
please use loop.top_n() directly.
bgp.preprocess module
- class bgp.preprocess.MagnitudeTransformer(standard=1, tolerate=0)
Bases:
sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Transform x, y or c to values close to 1, and store the transformation magnitude.
- fit(X, y=None, group=2, apply=None, keep=None)
- Parameters
X (np.ndarray) –
y (np.ndarray) –
group – group index of x.
apply – the indices of x to transform.
keep – the indices of x not to transform.
- fit_constant(c)
- fit_transform_all(X, y, **fit_params)
- fit_transform_constant(c)
- inverse_transform(X)
- inverse_transform_constant(c)
- inverse_transform_y(y)
- transform(X)
- transform_constant(c)
- transform_y(y)
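MagnitudeTransformer is described only as moving values close to 1 and storing the magnitude. A minimal pure-Python sketch of that idea, assuming a per-column power-of-ten scale chosen from the mean absolute value. The class name MagnitudeSketch is hypothetical, and the standard, tolerate, group, apply and keep options of the real transformer are not modelled.

```python
import math


class MagnitudeSketch:
    """Scale each column by a power of ten so its values are close to 1."""

    def fit(self, columns):
        # One exponent per column, from the mean absolute value (assumption).
        self.exponents_ = []
        for col in columns:
            m = sum(abs(v) for v in col) / len(col)
            self.exponents_.append(math.floor(math.log10(m)) if m > 0 else 0)
        return self

    def transform(self, columns):
        return [[v / 10 ** e for v in col] for col, e in zip(columns, self.exponents_)]

    def inverse_transform(self, columns):
        return [[v * 10 ** e for v in col] for col, e in zip(columns, self.exponents_)]
```

For columns [1200, 3400] and [0.002, 0.004] the stored exponents are 3 and -3, so the transformed values are about [1.2, 3.4] and [2.0, 4.0]; inverse_transform multiplies the stored magnitudes back in, up to floating-point rounding.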
bgp.skflow module
- class bgp.skflow.SymbolLearning(loop, *args, **kwargs)
Bases:
sklearn.base.BaseEstimator, sklearn.base.MultiOutputMixin, sklearn.base.TransformerMixin
A simplified guide to the flow.
1. SymbolLearning is time-consuming and not suited to GridSearchCV; cross-validation is already embedded.
2. For classification problems, please use classification=True, and carefully set suitable classification metrics for scoring and score_pen.
This code does not check or identify the certainty of the data.
See also: Web of SymbolLearning <https://bgp.readthedocs.io/en/latest/src/bgp.html#bgp.skflow.SymbolLearning>
- Parameters
loop (str, None) – one of ['BaseLoop', 'MultiMutateLoop', 'OnePointMutateLoop', 'DimForceLoop' …]. See also bgp.flow.BaseLoop.
pop (int) – population size.
gen (int) – number of generations.
mutate_prob (float) – probability of mutation.
mate_prob (float) – probability of mating (crossover).
initial_max (int) – max initial size of expressions when first generated.
initial_min (None, int) – min initial size of expressions when first generated.
max_value (int) – max size of expressions.
hall (int, >=1) – number of HallOfFame (elite) individuals to maintain.
re_hall (None or int>=2) – number of HallOfFame individuals to add to the next generation. Notes: only valid together with hall.
re_Tree (int) – number of new features to add to the next generation; 0 means none are added.
personal_map (bool or "auto") –
"auto" uses 'premap' and refreshes 'premap' automatically with individuals.
True uses only a constant 'premap'.
False uses only the probabilities of the terminals.
scoring (list of Callable, default is [sklearn.metrics.r2_score,]) – See Also sklearn.metrics
score_pen (tuple of 1, -1 or any non-zero float) –
>0 : max problem, best is positive, worst is -np.inf. <0 : min problem, best is negative, worst is np.inf.
Notes: If multiple score methods are used, the scores must be brought to the same scale in preprocessing or weighted by score_pen, because all selection is based on mean(w_i*score_i).
Examples:
scoring = [r2_score,], score_pen = [1,]
cv (sklearn.model_selection._split._BaseKFold, int) – the splitter's shuffle must be False; default=1 means no cv.
filter_warning (bool) – filter warnings or not.
add_coef (bool) – add coefficients to the expression or not.
inter_add (bool) – add an intercept constant or not.
inner_add (bool) – add inner coefficients or not.
out_add (bool) – add outer coefficients or not.
flat_add (bool) – add flat coefficients or not.
n_jobs (int) – default 1; 6 is advised.
batch_size (int) – default 40; depends on the machine.
random_state (int) – None or int.
cal_dim (bool) – calculate the dimension or not.
dim_type (Dim or None or list of Dim) –
"coef": af(x)+b; a and b have dimensions, and f(x)'s dimension is not dnan.
"integer": af(x)+b; f(x) has an integer dimension.
[Dim1, Dim2]: f(x)'s dimension is in the list.
Dim: f(x) ~= Dim (see fuzzy).
Dim: f(x) == Dim.
None: f(x) == pset.y_dim
fuzzy (bool) – choose dims with the same base as dim_type, such as m, m^2, m^3.
stats (dict) –
details of the logbook to show.
Map:
- values
= {"max": np.max, "mean": np.mean, "min": np.min, "std": np.std, "sum": np.sum}
- keys
= {
"fitness": only fitness[0],
"fitness_dim_max": max problem, fitness of individuals with the demanded dim,
"fitness_dim_min": min problem, fitness of individuals with the demanded dim,
"dim_is_target": demanded dim,
"coef": dim is True, coef has dim,
"integer": dim is integer,
… }
if stats is None, the default is:
- for cal_dim=True:
stats = {"fitness_dim_max": ("max",), "dim_is_target": ("sum",)}
- for cal_dim=False:
stats = {"fitness": ("max",)}
for self-definition, the key is a function that gets an attribute of each ind.
Examples:
    def func(ind):
        return ind.fitness.values[0]
    stats = {func: ("mean",), "dim_is_target": ("sum",)}
verbose (bool) – print the verbose logbook or not.
tq (bool) – print a progress bar or not.
store (bool or path) – store the results or not; may also be a path.
stop_condition (callable) –
stop condition on the best ind of the hall, which returns bool; True means stop the loop.
Examples:
    def func(ind):
        c = ind.fitness.values[0] >= 0.90
        return c
pset (SymbolSet) – the feature x, target y and others should already have been added.
details (bool) – return expr and predict_y or not.
classification (bool) – classification or not.
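The stats mapping above pairs a key (a name, or a function that extracts a value from each individual) with reducer names. A minimal sketch of how one logbook row could be built from such a mapping, assuming stdlib reducers as stand-ins for the numpy functions (statistics.pstdev matches np.std's default population standard deviation) and individuals modelled as dicts whose string keys are read directly; in bgp these values come from individual attributes, and the helper record is hypothetical.

```python
import statistics

# Reducers named in the stats "values" map (stdlib stand-ins for numpy).
REDUCERS = {"max": max, "mean": statistics.mean, "min": min,
            "std": statistics.pstdev, "sum": sum}


def record(population, stats):
    """Build one logbook row: {(key_name, reducer_name): value}."""
    row = {}
    for key, reducer_names in stats.items():
        # A callable key extracts a value from each individual; a string key
        # is read here as a (hypothetical) dict entry of the individual.
        getter = key if callable(key) else (lambda ind, k=key: ind[k])
        values = [getter(ind) for ind in population]
        name = key.__name__ if callable(key) else key
        for r in reducer_names:
            row[(name, r)] = REDUCERS[r](values)
    return row
```

Summing a boolean key such as "dim_is_target" counts the individuals that meet the target dimension, which is why the default stats pair it with "sum".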
- cv_result(refit=False)
Return the cv_result of the best expression. Only valid when cv != 1.
- Parameters
refit (bool) – re-fit the data or not. If True, use all the data on the best expression.
- fit(X=None, y=None, c=None, x_group=None, x_dim=1, y_dim=1, c_dim=1, x_prob=None, c_prob=None, pset=None, power_categories=(2, 3, 0.5), categories=('Add', 'Mul', 'Sub', 'Div'), warm_start=False, new_gen=None)
Method 1. Fit with x, y.
Examples:
    sl = SymbolLearning()
    sl.fit(x, y, ...)
Method 2. Fit with a customized pset. If more self-definition is needed, pass a defined SymbolSet object to pset.
Examples:
    pset = SymbolSet()
    pset.add_features_and_constants(...)
    pset.add_operations(...)
    ...
    sl = SymbolLearning()
    sl.fit(pset=pset)
- Parameters
X (np.ndarray) – data.
y (np.ndarray) –
c (list of float, None) – constants.
x_dim (1 or list of Dim) – the same size as X.shape[1]; default 1 means dless for all x.
y_dim (1, Dim) – dim of y.
c_dim (1, list of Dim) – the same size as c; default 1 means dless for all c.
x_prob (None, list of float) – the same size as X.shape[1].
c_prob (None, list of float) – the same size as c.
x_group (list of list) –
Group of x.
Examples:
x_group=[[1, 2],] or x_group=2
See Also
bgp.base.SymbolSet.add_features()
power_categories (Sized, tuple, None) – Examples: (0.5, 2, 3)
categories (tuple of str) –
- map table:
{"Add": sympy.Add, "Sub": Sub, "Mul": sympy.Mul, "Div": Div}
{"sin": sympy.sin, "cos": sympy.cos, "exp": sympy.exp, "ln": sympy.ln}
{"Abs": sympy.Abs, "Neg": functools.partial(sympy.Mul, -1.0), "Rec": functools.partial(sympy.Pow, e=-1.0)}
Others:
"Rem": f(x)=1-x, if x true
"Self": f(x)=x, if x true
- pset: SymbolSet
See Also SymbolSet.
- warm_start: bool
warm start or not.
- Note:
If you provide the pset yourself, please check the feature numbers carefully, especially when using re_Tree, because new features are added.
- Reference:
CalculatePrecisionSet.update_with_X_y.
- new_gen: None, int
warm_start generation.
- predict(X)
Predict y from X.
- Parameters
X (np.ndarray) – data.
- score(X, y, scoring)
- Parameters
X (np.ndarray) – data.
y (np.ndarray) – true y.
scoring (str) – scoring method, default is "r2".