analytics_packages package
Submodules
analytics_packages.custom_pandas module
These are all very outdated; I wrote them several years ago. There are probably much better ways to achieve what you need. -James
- analytics_packages.custom_pandas.add_to_bottom(df, column_names, values)[source]
Adds rows to the bottom of the dataframe, using ‘column_names’ for the columns and the 2D list ‘values’ for the rows. See append_to_df_with_df for appending dataframes.
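In current pandas the same effect can usually be had with `pd.concat`. The following is a minimal sketch, assuming ‘values’ holds one inner list per new row, ordered like ‘column_names’. The helper name `add_rows` is hypothetical and is not part of this package.

```python
import pandas as pd


def add_rows(df, column_names, values):
    """Hypothetical stand-in: append rows given as a 2D list to the bottom of df."""
    new_rows = pd.DataFrame(values, columns=column_names)
    # ignore_index=True renumbers the combined rows 0..n-1
    return pd.concat([df, new_rows], ignore_index=True)


df = pd.DataFrame({"Name": ["James"], "Age": [21]})
out = add_rows(df, ["Name", "Age"], [["Michael", 25], ["Ana", 30]])
```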
- analytics_packages.custom_pandas.append_to_df_with_df(og_df, new_df, reset_ind=True)[source]
Appends ‘new_df’ to ‘og_df’ and returns the new dataframe.
- analytics_packages.custom_pandas.boxplot(df, column_with_values, group_by=None)[source]
Shows a boxplot of the values found in ‘column_with_values’, with the option to group by another column.
- analytics_packages.custom_pandas.ceiling_filter(df, max_val, column)[source]
Returns the rows of df whose value in ‘column’ is below ‘max_val’.
- analytics_packages.custom_pandas.check_column_in_list(df, column, list, new_column)[source]
Returns a dataframe with a boolean value in ‘new_column’ indicating whether the row's value in ‘column’ is one of the values in ‘list’.
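A plain pandas way to build such a flag column is `Series.isin`. This is only a sketch of the idea, not the package's implementation; the column and variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["James", "Michael", "Ana"]})
wanted = ["James", "Ana"]

# Assumption: the flag is True when the row's value appears in the list
df["is_wanted"] = df["Name"].isin(wanted)
```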
- analytics_packages.custom_pandas.dict_to_df(dictionary)[source]
Return a dataframe from ‘dictionary’
- analytics_packages.custom_pandas.drop_these_cols(df, cols)[source]
Returns dataframe without columns ‘cols’
- analytics_packages.custom_pandas.drop_these_rows(df, rows)[source]
Returns dataframe without ‘rows’ based on index value
- analytics_packages.custom_pandas.filter_df_by_dates(df, date_col_dt, lower_datetime, upper_datetime, low_inc=True, up_inc=False)[source]
Takes a pandas dataframe and returns it filtered to the rows between the lower and upper dates.
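The filtering can be sketched with boolean masks. This assumes that ‘low_inc’ and ‘up_inc’ control whether each bound is inclusive, which the signature suggests but the docstring does not confirm. The function name `filter_by_dates` is hypothetical.

```python
import pandas as pd


def filter_by_dates(df, date_col, lower, upper, low_inc=True, up_inc=False):
    """Hypothetical equivalent: keep rows whose date lies between lower and upper."""
    col = df[date_col]
    # Assumption: the *_inc flags switch between inclusive and exclusive bounds
    lower_ok = col >= lower if low_inc else col > lower
    upper_ok = col <= upper if up_inc else col < upper
    return df[lower_ok & upper_ok]


df = pd.DataFrame({
    "when": pd.to_datetime(["2020-01-01", "2020-01-15", "2020-02-01"]),
    "n": [1, 2, 3],
})
jan = filter_by_dates(df, "when", pd.Timestamp("2020-01-01"), pd.Timestamp("2020-02-01"))
```

With the defaults, the row dated exactly at the lower bound is kept and the row dated exactly at the upper bound is dropped.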
- analytics_packages.custom_pandas.floor_filter(df, min_val, column)[source]
Returns the rows of df whose value in ‘column’ is above ‘min_val’.
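Both floor_filter and ceiling_filter come down to a single comparison in plain pandas. A short sketch, assuming the bounds are exclusive (the docstrings say "above" and "below"); the data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Age": [18, 21, 25, 30]})

# Assumption: both bounds are exclusive
above_min = df[df["Age"] > 21]   # floor-style filter
below_max = df[df["Age"] < 30]   # ceiling-style filter
```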
- analytics_packages.custom_pandas.get_df(file_name, **params)[source]
Reads a CSV file from the local directory and returns a dataframe.
- analytics_packages.custom_pandas.grab_rows_with_certain_values(df, column, values, return_not_in_values=False)[source]
Returns a version of the dataframe in which every row's value in “column” is found in the list “values”.
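In plain pandas this is an `isin` mask. The sketch below assumes that ‘return_not_in_values’ simply inverts the selection, which is a guess from the parameter name; the data is invented.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["James", "Michael", "Ana"], "Age": [21, 25, 30]})
values = ["James", "Ana"]

mask = df["Name"].isin(values)
matching = df[mask]        # rows whose Name is in values
not_matching = df[~mask]   # assumed behaviour of return_not_in_values=True
```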
- analytics_packages.custom_pandas.histogram(two_dim_list, x_axis_titles, legend_labels, x_label, y_label, graph_title, text_size=11, axesfont=26, titlesize=32, opacity=0.6)[source]
Prints a histogram. ‘two_dim_list’ should contain n lists of numerical values (n is the number of differently colored series to plot), each l values long. ‘x_axis_titles’ is a list of l strings. ‘legend_labels’ is a list of n strings, one per list in ‘two_dim_list’.
Example:
two_dim_list = [ [.1, .4, .3, .2], [.2, .4, .3, .1], [.2, .3, .3, .2] ]
x_axis_titles = ['Pepperoni', 'Sausage', 'Cheese', 'Vegetable']
legend_labels = ['under 20 years old', '20-50', '50+']
- analytics_packages.custom_pandas.keep_these_cols(df, cols)[source]
Returns dataframe with columns ‘cols’
- analytics_packages.custom_pandas.keep_these_rows(df, rows)[source]
Returns dataframe with index values contained in ‘rows’
- analytics_packages.custom_pandas.map_df_col_to_new_id(df, col, new_col_name, df2, id_col, map_col)[source]
- analytics_packages.custom_pandas.multiple_filters(df, columns, two_dim_list)[source]
Example: columns = ['Age', 'Name'], two_dim_list = [ [21, 25], ['James', 'Michael'] ]
With these arguments the function returns the df rows with age values of 21 or 25 and name values of James or Michael.
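The example above can be sketched in plain pandas by combining one `isin` mask per column. This assumes the per-column filters are combined with AND, which the docstring implies but does not state outright.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 25, 30, 21],
    "Name": ["James", "Michael", "James", "Ana"],
})
columns = ["Age", "Name"]
two_dim_list = [[21, 25], ["James", "Michael"]]

# Start with every row selected, then narrow with one mask per column
mask = pd.Series(True, index=df.index)
for column, allowed in zip(columns, two_dim_list):
    mask &= df[column].isin(allowed)  # assumption: filters are ANDed together

filtered = df[mask]
```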
- analytics_packages.custom_pandas.prep_datetime(df, time_col, dt_col, format='%Y-%m-%d %H:%M:%S')[source]
- analytics_packages.custom_pandas.rename_cols(df, existing, new)[source]
Renames the columns found in ‘existing’ to match those found in ‘new’.
- analytics_packages.custom_pandas.replace_in_df(dataframe, column, to_find, to_replace)[source]
Replaces the list or string “to_find” with the list or string “to_replace”. Looks up all instances in “column” of the dataframe and replaces them. Returns the dataframe.
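pandas covers both the string and the list case with `Series.replace`. A brief sketch with invented data; it illustrates the built-in call rather than this function's internals.

```python
import pandas as pd

df = pd.DataFrame({"City": ["NYC", "LA", "NYC"]})

# Single value replaced by a single value
df["City"] = df["City"].replace("NYC", "New York")

# Lists are matched position by position: "LA" -> "Los Angeles"
df["City"] = df["City"].replace(["LA"], ["Los Angeles"])
```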
- analytics_packages.custom_pandas.split_by_time_filter(df, how='hours', new_poss_values=[])[source]
Returns dfs which have been sifted based on hours/days/months etc. Refer to params.py -> time_splits for reference.
df1                  df2
customer  hour       customer  hour
0         0          0         1
1         0          1         1
2         0          2         1
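A split like the df1/df2 illustration can be approximated with `groupby`. This sketch assumes the dataframe already has an ‘hour’ column and does not reproduce whatever params.py -> time_splits defines.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [0, 1, 2, 0, 1, 2],
    "hour": [0, 0, 0, 1, 1, 1],
})

# One dataframe per distinct hour value; groupby yields groups in sorted key order
dfs = [group.reset_index(drop=True) for _, group in df.groupby("hour")]
df1, df2 = dfs
```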
- analytics_packages.custom_pandas.split_df_into_equal_time(df, time_chunk, datetime_col, format='seconds')[source]
Returns dfs after the input is split into separate dfs by a time interval. Example: starting from time 0, separate into chunks of 3 weeks at a time.
ISOTIME
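The 3-week example can be sketched by numbering each row with the chunk it falls into and grouping on that number. This assumes "time 0" means the earliest timestamp in the column; the column names and data are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "ts": pd.to_datetime(["2020-01-01", "2020-01-10", "2020-01-25", "2020-02-20"]),
    "n": [1, 2, 3, 4],
})
chunk = pd.Timedelta(weeks=3)

# Assumption: time 0 is the earliest timestamp in the column
elapsed = df["ts"] - df["ts"].min()
chunk_id = elapsed // chunk  # integer index of the chunk each row falls in

chunks = [group for _, group in df.groupby(chunk_id)]
```

Chunks that contain no rows do not appear in the result, since `groupby` only yields groups that exist in the data.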
- analytics_packages.custom_pandas.value_counts_df(original_dataframe, column=None)[source]
Outputs a dataframe with the counts of the unique values in each column of the input dataframe. In the output, each column of unique values is paired with another column holding the counts for those values.
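For a single column, the value/count pairing can be sketched with `Series.value_counts`. The output column name "count" is an assumption chosen for illustration and may not match what this function produces.

```python
import pandas as pd

df = pd.DataFrame({"Topping": ["Cheese", "Cheese", "Sausage"]})

counts = df["Topping"].value_counts()
# Turn the Series into a two-column dataframe: unique value, then its count
counts_df = counts.rename_axis("Topping").reset_index(name="count")
```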
analytics_packages.custom_xlwings module
These are all very outdated; I wrote them several years ago. There are probably much better ways to achieve what you need. -James
- analytics_packages.custom_xlwings.alpha_from_index(integer)[source]
Takes a 0-based index (integer) and returns the corresponding column header
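One way such a conversion can work is sketched below, assuming "column header" means Excel-style column letters (0 is 'A', 25 is 'Z', 26 is 'AA'). The function name `column_letters` is hypothetical, and the package's own implementation may differ.

```python
def column_letters(index):
    """Hypothetical sketch: convert a 0-based index to Excel-style column letters."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    n = index + 1  # work 1-based, since column letters have no zero digit
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters
```

The `n - 1` inside `divmod` is what makes 'Z' roll over to 'AA' rather than to a two-letter code starting with a zero-like digit.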
- analytics_packages.custom_xlwings.change_cell_color(ws, top_left_cell, cell_color, bottom_right_cell=None)[source]
Changes a range of cells to a certain color.
- analytics_packages.custom_xlwings.combine_string_columns(df, col1, col2, new_column)[source]
Returns df with a new column holding the combined string of col1 and col2.
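In plain pandas, two columns can be combined into one string column with `astype(str)` and `+`. The "_" separator here is purely an assumption, since the docstring does not say how the two values are joined; the data is invented.

```python
import pandas as pd

df = pd.DataFrame({"first": ["James", "Ana"], "id": [7, 12]})

# astype(str) lets non-string columns take part; the "_" separator is an assumption
df["label"] = df["first"].astype(str) + "_" + df["id"].astype(str)
```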
- analytics_packages.custom_xlwings.get_column(ws, col_index, nested=True)[source]
Gets a column from the worksheet ‘ws’.
- analytics_packages.custom_xlwings.remove_slash_from_ws_name(string, replace=True, char='-')[source]
- analytics_packages.custom_xlwings.sort_ws(ws, column_alphas)[source]
Takes the active worksheet and a list of column alphas and sorts the worksheet.